Michele Clamp, Tim Hubbard, Arek Kasprzyk, Arne Stabenau, Val Curwen, Eduardo
Eyras, James Stalker
Genomes are a unique resource in molecular biology in that they are a closed (or
effectively closed) dataset, in which every activity of an organism must be found,
and that they are entirely amenable to computational storage and analysis. Over
the last decade, impressive improvements in technology have allowed the genome
sequences of all the main model organisms to be determined, in particular human,
mouse and rat in the mammalian clade, and worm and fruit fly.
However, genomes are a headache for most people to use: the datasets are large,
and most people would prefer to work with information derived from the genome,
e.g. the set of proteins encoded by the genome, rather than with the
raw genome sequence itself. Ensembl is one of the leading systems for
the storage, management and general analysis of genomes. I will present some
of the challenges Ensembl has to overcome along with some vignettes
of its use.
Protein interactions in three dimensions
Protein interactions are central to most biological processes, and are currently the
subject of great interest. Despite the many recently developed methods to
identify protein interactions, little attention has been paid to one of the
best sources of data: complexes of known three-dimensional (3D) structure
as determined by X-ray crystallography.
In this talk I will discuss such complexes, and how they can be used to study
and predict protein interactions and to interrogate interaction networks proposed
by methods such as two-hybrid screens or affinity purifications.
1. P. Aloy, R.B. Russell (2002) Interrogating protein interaction networks
through structural biology. Proc. Natl. Acad. Sci. USA 99, 5896-5901.
2. P. Aloy, F.D. Ciccarelli, C. Leutwein, A.C. Gavin, G. Superti-Furga, P.
Bork, B. Boettcher, R.B. Russell (2002) A complex prediction: three-dimensional
model of the yeast exosome. EMBO reports 3, 628-635.
3. P. Aloy, R.B. Russell (2002) Potential artefacts in protein-interaction
networks. FEBS Lett. 530, 253-254.
4. P. Aloy, R.B. Russell (2003) InterPreTS: Protein interaction prediction
through tertiary structure. Bioinformatics 19, 161-162.
5. P. Aloy, R.B. Russell (2002) The third dimension for protein interactions
and complexes. Trends Biochem. Sci. 27, 633-638.
Probabilistic graphical models for gene function and gene interaction
Birkbeck College, University of London, UK
The ultimate purpose of high-throughput experiments in molecular biology such
as microarrays, screens for protein-protein interactions, or localisation
of transcription factor binding sites is to unravel gene regulatory networks.
Such experiments can be combined with a bioinformatics analysis and comparison
of genomic data or data on protein structures. It is clear that such combined
analysis can produce insights into the workings of biological systems on a much
more comprehensive scale than more traditional approaches that focus on isolated
On the other hand, high-throughput data differ in their quality from data
obtained in more focused experiments on specific systems. They are characterised
by comparatively large false-positive or false-negative rates and their interpretation
requires careful statistical modelling. A further challenge is that each single
source of information might be inconclusive if considered in isolation. Probabilistic
models can exploit combined evidence to reach much stronger conclusions. Due
to their intuitive interpretation in terms of causal or conceptual connections,
graphical models in particular are gaining popularity for interpreting
high-throughput experimental and bioinformatics data. Variants of graphical
models, for example hidden Markov models or artificial neural networks, are
already used very successfully for classification and prediction tasks in
bioinformatics and microarray analysis.
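
As a rough illustration of how combined evidence can support a stronger conclusion than any single noisy source, the sketch below updates the odds that a candidate interaction is real with one likelihood ratio per source. It assumes the sources are conditionally independent given the truth (a naive Bayes simplification); the function name and all error rates are hypothetical and are not taken from the talk.

```python
def combine_evidence(prior, sources):
    """Posterior probability that a candidate interaction is real.

    prior   -- prior probability of a true interaction (illustrative value)
    sources -- list of (observed, true_positive_rate, false_positive_rate);
               the rates are assumed, not measured.
    Assumes the sources are conditionally independent given the truth.
    """
    odds = prior / (1.0 - prior)
    for observed, tpr, fpr in sources:
        if observed:
            odds *= tpr / fpr                   # likelihood ratio of a positive call
        else:
            odds *= (1.0 - tpr) / (1.0 - fpr)   # likelihood ratio of a negative call
    return odds / (1.0 + odds)


# One noisy screen alone versus two independent screens that agree.
one_screen = combine_evidence(0.01, [(True, 0.5, 0.05)])
two_screens = combine_evidence(0.01, [(True, 0.5, 0.05), (True, 0.5, 0.05)])
print(round(one_screen, 3), round(two_screens, 3))
```

With these assumed rates a single positive call leaves the interaction unlikely, whereas two agreeing calls bring it to roughly even odds.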
More recently, probabilistic graphical models such as Markov networks and
Bayesian networks have been suggested to model functional relationships and
gene interaction networks. One main attraction of probabilistic models is
their integrative nature. The same network linking functional knowledge with
knowledge on gene or protein interaction, for example, can be used to infer
function from interaction or interaction from function, or solve a mixture
of both inference tasks. Missing values are also easily dealt with in this
essentially Bayesian framework.
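
The integrative use of a single network can be sketched with a toy three-variable Bayesian network, queried by brute-force enumeration. The structure (shared function F influencing both an observed interaction I and co-expression C), the variable names, and every probability are assumptions made for illustration only. Variables left out of the evidence are summed out, which is how a missing value is handled here.

```python
from itertools import product

# Hypothetical network: F -> I and F -> C. All numbers are illustrative.
P_F = {1: 0.1, 0: 0.9}            # P(F): two genes share a function
P_I_GIVEN_F = {1: 0.3, 0: 0.02}   # P(I=1 | F): interaction is observed
P_C_GIVEN_F = {1: 0.6, 0: 0.1}    # P(C=1 | F): genes are co-expressed


def joint(f, i, c):
    """Joint probability P(F=f, I=i, C=c) under the assumed network."""
    p_i = P_I_GIVEN_F[f] if i else 1.0 - P_I_GIVEN_F[f]
    p_c = P_C_GIVEN_F[f] if c else 1.0 - P_C_GIVEN_F[f]
    return P_F[f] * p_i * p_c


def query(target, evidence):
    """P(target=1 | evidence); variables absent from evidence are summed out."""
    names = ("F", "I", "C")
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=3):
        world = dict(zip(names, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(world["F"], world["I"], world["C"])
        denominator += p
        if world[target] == 1:
            numerator += p
    return numerator / denominator


function_from_interaction = query("F", {"I": 1})   # infer function from interaction
interaction_from_function = query("I", {"F": 1})   # infer interaction from function
interaction_f_missing = query("I", {"C": 1})       # F unobserved, marginalised out
```

The same `joint` table answers all three queries; only the evidence passed to `query` changes. Enumeration is exponential in the number of variables, so it only suits toy networks like this one.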
We will illustrate the flexibility of graphical models and discuss the deduction
of gene regulatory networks from microarray data as well as the probabilistic
inference of functional assignments from a variety of sources such as known
protein-protein interactions, microarray data, genome structure, and known
interactions in related organisms. We will also discuss strategies to overcome
a principal problem with graphical models (and indeed Bayesian models in general):
their computation is usually time- and resource-intensive. Various sampling
and approximation schemes can be applied to keep computing time manageable.
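
As one example of a sampling scheme, the sketch below uses likelihood weighting to approximate a posterior in a two-variable network without enumerating every configuration. The abstract does not say which schemes are used in the talk, so the choice of technique, the function name, and all probabilities are assumptions for illustration.

```python
import random


def likelihood_weighting(p_f, p_i_given_f, n_samples, seed=0):
    """Approximate P(F=1 | I=1) for a network F -> I.

    F is sampled from its prior; each sample is weighted by the
    likelihood of the evidence I=1 given the sampled value of F.
    p_f and p_i_given_f are illustrative, assumed parameters.
    """
    rng = random.Random(seed)
    weighted_hits = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        f = 1 if rng.random() < p_f else 0
        weight = p_i_given_f[f]          # P(I=1 | F=f)
        total_weight += weight
        if f == 1:
            weighted_hits += weight
    return weighted_hits / total_weight


# Exact answer for these numbers is 0.03 / 0.048 = 0.625.
estimate = likelihood_weighting(0.1, {1: 0.3, 0: 0.02}, n_samples=50_000)
print(round(estimate, 2))
```

The cost grows with the number of samples rather than with the number of joint configurations, which is what makes such schemes attractive for larger networks; the price is an estimate with sampling error instead of an exact value.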