Organised By

Geoff Barton


Ensembl - A genome information infrastructure

Ewan Birney

Coauthor(s): Michele Clamp, Tim Hubbard, Arek Kasprzyk, Arne Stabenau, Val Curwen, Eduardo Eyras, James Stalker

Genomes are a unique resource in molecular biology: they are a closed (or effectively closed) dataset, in which the basis of every activity of an organism must be found, and they are entirely amenable to computational storage and analysis. Over the last decade, impressive improvements in technology have allowed the genome sequences of all the main model organisms to be determined, in particular human, mouse and rat in the mammalian clade, as well as the worm and the fruitfly.

However, genomes are a headache for most people to use: the datasets are large, and most people would prefer to work with information derived from the genome, e.g. the set of proteins it encodes, rather than with the raw genome sequence itself. Ensembl is one of the leading systems for the storage, management and general analysis of genomes. I will present some of the challenges Ensembl has to overcome, along with some vignettes of its use.


Protein-protein interactions in three dimensions

Rob Russell
EMBL, Germany

Protein interactions are central to most biological processes and are currently the subject of great interest. Despite the many recently developed methods for identifying protein interactions, little attention has been paid to one of the best sources of data: complexes of known three-dimensional (3D) structure, as determined by X-ray crystallography.

In this talk I will discuss such complexes, and how they can be used to study and predict protein interactions and to interrogate interaction networks proposed by methods such as two-hybrid screens or affinity purifications.


1. P. Aloy, R.B. Russell (2002) Interrogating protein interaction networks through structural biology. Proc. Natl. Acad. Sci. USA 99, 5896-5901.

2. P. Aloy, F.D. Ciccarelli, C. Leutwein, A.C. Gavin, G. Superti-Furga, P. Bork, B. Boettcher, R.B. Russell (2002) A complex prediction: three-dimensional model of the yeast exosome. EMBO Rep. 3, 628-635.

3. P. Aloy, R.B. Russell (2002) Potential artefacts in protein-interaction networks. FEBS Lett. 530, 253-254.

4. P. Aloy, R.B. Russell (2003) InterPreTS: Protein interaction prediction through tertiary structure. Bioinformatics 19, 161-162.

5. P. Aloy, R.B. Russell (2002) The third dimension for protein interactions and complexes. Trends Biochem. Sci. 27, 633-638.

Probabilistic graphical models for gene function and gene interaction

Lorenz Wernisch
Birkbeck College, University of London, UK

The ultimate purpose of high-throughput experiments in molecular biology, such as microarrays, screens for protein-protein interactions or the localisation of transcription factor binding sites, is to unravel gene regulatory networks. Such experiments can be combined with bioinformatics analysis and comparison of genomic data or data on protein structures. Combined analysis of this kind can produce insights into the workings of biological systems on a much more comprehensive scale than more traditional approaches that focus on isolated aspects.

On the other hand, high-throughput data differ in quality from data obtained in more focused experiments on specific systems. They are characterised by comparatively high false positive or false negative rates, and their interpretation requires careful statistical modelling. A further challenge is that each single source of information might be inconclusive if considered in isolation. Probabilistic models can exploit combined evidence to reach much stronger conclusions. Because they can be interpreted intuitively in terms of causal or conceptual connections, graphical models in particular are gaining in popularity for the interpretation of high-throughput experimental and bioinformatics data. Variants of graphical models, for example hidden Markov models or artificial neural networks, are already used very successfully for classification and prediction tasks in bioinformatics and microarray analysis.
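As a minimal sketch of how individually weak evidence can combine into a stronger conclusion, the Python code below applies Bayes' rule in log-odds form. It assumes, as a strong simplification, that the evidence sources are conditionally independent given the hypothesis (a naive Bayes model, not a method taken from the talk). Each source is described by hypothetical true and false positive rates; the function names and all numbers are illustrative only.

```python
import math


def likelihood_ratio(tpr, fpr, observed):
    """P(observation | H) / P(observation | not H) for one binary evidence source.

    tpr: probability the source reports a hit when H is true (sensitivity).
    fpr: probability the source reports a hit when H is false (false positive rate).
    """
    if observed:
        return tpr / fpr
    # A negative result lowers the odds of H whenever tpr > fpr.
    return (1.0 - tpr) / (1.0 - fpr)


def combine_evidence(prior, sources):
    """Posterior P(H | all observations) under a naive Bayes assumption.

    sources: iterable of (tpr, fpr, observed) triples, assumed conditionally
    independent given H. Working in log-odds turns the product of
    likelihood ratios into a sum.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for tpr, fpr, observed in sources:
        log_odds += math.log(likelihood_ratio(tpr, fpr, observed))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical rates: two-hybrid (TPR 0.5, FPR 0.1), co-expression (TPR 0.4, FPR 0.1).
two_hybrid = (0.5, 0.1, True)
coexpression = (0.4, 0.1, True)
print(combine_evidence(0.01, [two_hybrid]))                # ≈ 0.048
print(combine_evidence(0.01, [two_hybrid, coexpression]))  # ≈ 0.168
```

With a prior of 0.01 that two genes interact, a positive two-hybrid result alone (likelihood ratio 5) lifts the posterior only to about 0.048, while an additional positive co-expression signal (likelihood ratio 4) lifts it to about 0.168. If false positives are correlated between sources, the independence assumption overstates the combined evidence, which is one reason richer graphical models are attractive.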

More recently, probabilistic graphical models such as Markov networks and Bayesian networks have been suggested for modelling functional relationships and gene interaction networks. One main attraction of probabilistic models is their integrative nature. The same network linking functional knowledge with knowledge of gene or protein interactions, for example, can be used to infer function from interaction, interaction from function, or a mixture of both. Missing values are also easily dealt with in this essentially Bayesian framework.
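To make inference in both directions concrete, here is a minimal sketch of a three-node Bayesian network. It assumes a toy structure, invented for illustration, in which two genes' binary function labels F_A and F_B (whether each has some function X) both influence whether the genes interact (I). Inference is by brute-force enumeration of the joint distribution; any variable left out of the evidence is summed out, which is how missing values are handled in this setting.

```python
from itertools import product

# Hypothetical parameters, invented for illustration.
P_F = 0.2  # prior probability that a gene has function X
# P(I = 1 | F_A, F_B): interaction is more likely when both genes share function X.
P_I_GIVEN_F = {(0, 0): 0.05, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.6}


def joint(fa, fb, i):
    """P(F_A = fa, F_B = fb, I = i), factorised along the network's edges."""
    p = (P_F if fa else 1 - P_F) * (P_F if fb else 1 - P_F)
    p_i = P_I_GIVEN_F[(fa, fb)]
    return p * (p_i if i else 1 - p_i)


def query(target, evidence):
    """P(target = 1 | evidence) by enumeration; unobserved variables are summed out."""
    num = den = 0.0
    for fa, fb, i in product((0, 1), repeat=3):
        world = {"F_A": fa, "F_B": fb, "I": i}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this configuration contradicts the evidence
        p = joint(fa, fb, i)
        den += p
        if world[target] == 1:
            num += p
    return num / den


print(query("F_A", {"I": 1, "F_B": 1}))  # function from interaction: ≈ 0.75
print(query("I", {"F_A": 1, "F_B": 1}))  # interaction from function: ≈ 0.6
print(query("F_A", {"I": 1}))            # F_B missing, summed out: ≈ 0.444
```

The same `joint` function answers all three questions; only the choice of target and evidence changes, which is the integrative property described above.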

We will illustrate the flexibility of graphical models and discuss the deduction of gene regulatory networks from microarray data, as well as the probabilistic inference of functional assignments from a variety of sources such as known protein-protein interactions, microarray data, genome structure and known interactions in related organisms. We will also discuss strategies to overcome a principal problem with graphical models (and indeed with Bayesian models in general): their computation is usually time- and resource-intensive. Various sampling and approximation schemes can be applied to keep computing time manageable.
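Exact inference by enumerating every configuration grows exponentially with the number of variables, which is why sampling schemes are used. As an illustrative sketch only (Gibbs sampling is one standard scheme, not necessarily one the talk covers), the Python code below estimates P(F_A = 1 | I = 1) in a hypothetical three-node network in which two genes' binary function labels F_A and F_B both influence whether the genes interact (I), with F_B unobserved. All probabilities are invented. Each step resamples one unobserved variable from its conditional distribution given the others, which needs only the unnormalised joint probability.

```python
import random

# Hypothetical parameters, invented for illustration.
P_F = 0.2  # prior probability that a gene has function X
P_I_GIVEN_F = {(0, 0): 0.05, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.6}


def weight(fa, fb, i):
    """Joint probability P(F_A = fa, F_B = fb, I = i); normalisation is not needed."""
    p = (P_F if fa else 1 - P_F) * (P_F if fb else 1 - P_F)
    p_i = P_I_GIVEN_F[(fa, fb)]
    return p * (p_i if i else 1 - p_i)


def gibbs_estimate(i_obs=1, n_samples=50_000, burn_in=1_000, seed=0):
    """Estimate P(F_A = 1 | I = i_obs) by Gibbs sampling over F_A and F_B."""
    rng = random.Random(seed)
    fa, fb = 0, 0
    hits = 0
    for step in range(burn_in + n_samples):
        # Resample F_A from P(F_A | F_B, I), proportional to the joint.
        w1, w0 = weight(1, fb, i_obs), weight(0, fb, i_obs)
        fa = 1 if rng.random() < w1 / (w1 + w0) else 0
        # Resample F_B from P(F_B | F_A, I) using the freshly updated F_A.
        w1, w0 = weight(fa, 1, i_obs), weight(fa, 0, i_obs)
        fb = 1 if rng.random() < w1 / (w1 + w0) else 0
        if step >= burn_in:  # discard early samples before counting
            hits += fa
    return hits / n_samples


print(gibbs_estimate())  # should be close to 4/9 ≈ 0.444
```

For this toy network the exact answer, obtained by enumerating the four (F_A, F_B) configurations consistent with I = 1, is 4/9, so the estimate can be checked directly. On realistic networks each per-variable update stays cheap, whereas full enumeration does not.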