Methods for statistical inference on correlated data: application to genomic data
Eleonora De Leonardis (LPS)

The availability of huge amounts of data has changed the role of physics
with respect to other disciplines. Within this dissertation I will explore
the innovations introduced in molecular biology thanks to statistical
physics approaches. Over the last 20 years the size of genome databases has
increased exponentially, and extracting information from raw data has
therefore become a major topic in statistical physics. After the success
in protein structure prediction, surprising results have finally been
achieved in the related field of RNA structure characterisation as well.
However, recent studies have revealed that, even though databases are
growing, inference is often performed in the undersampling regime, and
new computational schemes are needed to overcome this intrinsic
limitation of real data. This dissertation will
discuss inference methods and their application to RNA structure
prediction. We will also examine some heuristic approaches that have been
successfully applied in recent years, even though they remain poorly
understood theoretically. The last part of the work will focus on the
development of a tool for the inference of generative models, in the hope
that it will pave the way towards novel applications.