Protein Structure Prediction

Background

Membrane proteins are involved in a wide range of essential functions, including communication between cells and the transport of nutrients, ions, and waste products across biological membranes. These proteins, which are estimated to constitute 25% of the proteins encoded at a genomic scale, play key roles in an equally wide range of diseases such as diabetes, hypertension, depression, arthritis and cancer. They are also common drug targets: over 75% of pharmaceuticals in use today act on them. Determining membrane protein structures is essential for understanding how drugs interfere with cellular communication and regulation. However, current knowledge of the detailed 3D structures of membrane proteins is limited, because these structures are difficult to study with traditional experimental methods.

Research rationale

The idea is to use computational techniques to enhance our knowledge of membrane proteins. However, developing algorithms capable of predicting the three-dimensional structure of proteins at atomic detail is a very difficult task. Instead of tertiary structure determination, we focused our research on two complementary aspects: the structural classification of proteins, which allows potential membrane proteins to be identified, and the prediction of transmembrane alpha-helices in membrane proteins.

Structural classification of proteins

Periodic patterns and tandem repeats of residues are often found in DNA and protein sequences. In proteins, their presence helps in understanding the molecular structure of fibrous or structural proteins through the principle of conformational equivalence, and it may suggest modes of ultramolecular assembly into higher-order structures. Characteristic examples are the periodicities found in the sequences of a number of fibrous proteins (e.g. tropomyosin, myosin, keratins and collagen). We used Fourier analysis to highlight hidden periodicities in protein sequences [1, 2] and developed a tool (FT) that biologists can access through the Internet.
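As an illustration of how Fourier analysis can reveal a hidden periodicity, the following minimal sketch computes the power spectrum of a sequence after encoding each residue as hydrophobic (1) or not (0). The residue set, the binary encoding and the function names are assumptions made for this example; they are not taken from the FT tool.

```python
import numpy as np

# Residues treated as hydrophobic in this illustration (an assumption,
# not the encoding used by the FT tool).
HYDROPHOBIC = set("AVLIMFWC")


def power_spectrum(sequence):
    """Return (periods, power) for a binary hydrophobicity encoding."""
    signal = np.array([1.0 if aa in HYDROPHOBIC else 0.0 for aa in sequence])
    signal -= signal.mean()                # remove the zero-frequency term
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal))   # in cycles per residue
    # Drop index 0, which corresponds to an infinite period.
    return 1.0 / freqs[1:], power[1:]


def dominant_period(sequence):
    """Return the period (in residues) with the highest spectral power."""
    periods, power = power_spectrum(sequence)
    return float(periods[np.argmax(power)])
```

For a heptad-like repeat such as `"LKELEKK" * 10`, with hydrophobic residues at the first and fourth positions, the strongest peak falls at a period of 3.5 residues (two hydrophobic positions per seven residues) rather than at 7.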

In the continuation of this work, we explored the use of hierarchical, artificial neural networks for the generalized classification of proteins into several distinct classes - transmembrane, fibrous, globular, and mixed - from information solely encoded in their amino acid sequences [3–5]. The use of our implementations (PRED-TMR2, PRED-CLASS) to analyze various test sets and complete proteomes of several organisms demonstrates that such methods could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods.

Prediction of transmembrane alpha-helices in membrane proteins

Locating the transmembrane segments, determining their secondary structure, and identifying the packing modes of the secondary structure elements are all important, because together they define the architecture of a transmembrane protein. Equally important is the determination of topology, which defines the "polarity" of an integral membrane protein, that is, the orientation of its segments relative to the two sides of the membrane.

Researchers have identified several characteristics that are common to a large proportion of transmembrane segments. They observed, for example, that transmembrane segments are mainly composed of hydrophobic residues, that positively charged residues are more frequent in the non-transmembrane segments on the inner side of the cell, and that a high propensity of tyrosine and tryptophan indicates the outer side of the cell. To extend this knowledge, we performed several statistical analyses of known transmembrane segments to find other characteristics of transmembrane regions. We determined, among other things, the distribution of transmembrane segment lengths, the propensity of each amino acid to be in a transmembrane region, and the precise profiles of the potential termini ("edges", i.e. starts and ends) of transmembrane regions. We combined this information with several scoring functions to predict the precise position of transmembrane segments [6] and their topology [7]. The accuracy of our method compares well with that of other popular existing methods. This work led to the implementation of several tools freely available on the Internet: PRED-TMR, OrienTM, CoPreTHi and DAM-BIO [8–11].
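The two general observations above (hydrophobic segments, and positively charged residues on the inner side) can be sketched in a few lines. This is a generic illustration, not the PRED-TMR or OrienTM algorithm: it uses the published Kyte-Doolittle hydropathy scale instead of propensities derived from SwissProt, and the window size, threshold and function names are assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy values (a standard published scale, used here
# in place of the database-derived propensities described in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def predict_segments(sequence, window=19, threshold=1.6):
    """Return (start, end) pairs (0-based, end exclusive) of candidate segments.

    A window is a candidate when its mean hydropathy reaches the threshold;
    overlapping candidate windows are merged into one segment.
    """
    segments = []
    for i in range(len(sequence) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in sequence[i:i + window]) / window
        if mean < threshold:
            continue
        if segments and i <= segments[-1][1]:
            segments[-1] = (segments[-1][0], i + window)  # extend last segment
        else:
            segments.append((i, i + window))
    return segments


def predict_topology(sequence, segments):
    """Predict the side of the N-terminus with the positive-inside rule.

    Loops alternate sides of the membrane, so K/R counts are summed over
    even-numbered and odd-numbered loops; the richer side is called "in".
    Returns None when there are no segments.
    """
    if not segments:
        return None
    bounds = [0] + [pos for seg in segments for pos in seg] + [len(sequence)]
    loops = [sequence[bounds[i]:bounds[i + 1]] for i in range(0, len(bounds), 2)]
    charges = [loop.count("K") + loop.count("R") for loop in loops]
    same_side = sum(charges[0::2])      # loops on the N-terminal side
    opposite_side = sum(charges[1::2])  # loops on the other side
    return "in" if same_side >= opposite_side else "out"
```

Because a fixed-width window still scores highly when it only partly covers a hydrophobic stretch, the segments returned by this sketch tend to extend a few residues beyond the true ends. Locating the ends precisely is the kind of problem that the edge profiles mentioned above address.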

References

1. Pasquier C, Promponas V, Varvayannis N, Hamodrakas S: A web interface for FT: a tool dedicated to the study of periodicities in sequences. In 20th conference of the hellenic society for biological sciences. Samos Island; 1998.

2. Pasquier C, Promponas V, Varvayannis N, Hamodrakas S: A web server to locate periodicities in a sequence. Bioinformatics 1998, 14:749–50.

3. Pasquier C, Promponas V, Palaios G, Hamodrakas I, Hamodrakas S: PRED-TMR2: An hierarchical neural network to classify proteins as transmembrane and a novel method to predict transmembrane segments. In Proceedings of the 21st conference of the hellenic society for biological sciences. Galissas, Syros island; 1999.

4. Pasquier C, Hamodrakas S: An hierarchical artificial neural network system for the classification of transmembrane proteins. Protein Engineering Design and Selection 1999, 12:631–634.

5. Pasquier C, Promponas V, Hamodrakas S: PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications. Proteins: Structure, Function, and Bioinformatics 2001, 44:361–9.

6. Pasquier C, Promponas V, Palaios G, Hamodrakas I, Hamodrakas S: A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Engineering Design and Selection 1999, 12:381–385.

7. Liakopoulos T, Pasquier C, Hamodrakas S: A novel tool for the prediction of transmembrane protein topology based on a statistical analysis of the SwissProt database: the OrienTM algorithm. Protein Engineering Design and Selection 2001, 14:387–390.

8. Promponas V, Palaios G, Pasquier C, Hamodrakas I, Hamodrakas S: CoPreTHi: a program to combine the results of transmembrane protein segment prediction methods. In 20th conference of the hellenic society for biological sciences. Samos Island; 1998.

9. Promponas V, Palaios G, Pasquier C, Hamodrakas I, Hamodrakas S: CoPreTHi: a Web tool which combines transmembrane protein segment prediction methods. In silico biology 1999, 1:159–62.

10. Liakopoulos T, Palaios G, Promponas V, Hamodrakas I, Pasquier C, Hamodrakas S: A workbench for computational analysis of protein sequence and structure on the Internet. In Proceedings of the 22nd conference of the hellenic society for biological sciences. Skiathos island; 2000.

11. Liakopoulos T, Harkiolakis N, Promponas V, Pasquier C, Hamodrakas I, Papandreou N, Iconomidou V, Papandreou N, Tzafestas E, Tzafestas S, Eliopoulos E, Hamodrakas S: DAM-BIO: Bioinformatics internet workbench for protein analysis. New modules and applications to biological problems. In 23rd conference of the hellenic society for biological sciences. Chios island; 2001.