Analysis of genomic sequences: automatic classification of proteins and determination of the functional role of alternative splicing

Kriventseva, Evgenia Viktorovna

VAK RF 03.00.02, Biophysics

Dissertation contents, Candidate of Physical and Mathematical Sciences, Kriventseva, Evgenia Viktorovna

This work was carried out at the Department of Molecular Biophysics, Faculty of Molecular and Biological Physics, Moscow Institute of Physics and Technology (State University); in the Laboratory of Computational and Structural Analysis of Biopolymers, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences; and in the Sequence Database Group at the European Bioinformatics Institute (EBI), an outstation of EMBL, in Cambridge, United Kingdom. The Sequence Database Group produces leading biological databases: the protein databases SWISS-PROT and TrEMBL, the EMBL nucleotide sequence database, the InterPro resource of protein domains and families, and others.

The dissertation consists of seven chapters. The first chapter, a brief review of the literature on the topic, gives an introduction to protein evolution, presents algorithms for comparing biological sequences and their parameters, and lists the available protein and nucleotide sequence resources and the approaches to classifying them.

The second chapter motivates the need for automatic protein classification, describes the chosen CluSTr methodology, and reports its testing. It also describes the database that was created and the interface that was developed.
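As a rough illustration of grouping proteins hierarchically by sequence similarity, the sketch below applies single-linkage clustering at several score thresholds. Single linkage, the union-find implementation, the protein identifiers, the scores and the thresholds are all assumptions made for illustration; this text does not specify the CluSTr algorithm at that level of detail.

```python
from collections import defaultdict


def single_linkage(proteins, scores, threshold):
    """Group proteins linked by any pair scoring >= threshold (union-find)."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(proteins, scores, thresholds):
    """Clusters at each threshold; stricter thresholds split looser clusters."""
    return {t: single_linkage(proteins, scores, t) for t in sorted(thresholds)}


# Hypothetical identifiers and pairwise similarity scores (not real data).
PROTEINS = ["P1", "P2", "P3", "P4", "P5"]
SCORES = {
    ("P1", "P2"): 30.0,
    ("P2", "P3"): 12.0,
    ("P4", "P5"): 18.0,
    ("P3", "P4"): 4.0,
}

if __name__ == "__main__":
    for t, clusters in hierarchy(PROTEINS, SCORES, [5, 15, 25]).items():
        print(t, clusters)
```

Because a pair that passes a strict threshold also passes every looser one, the clusters at different thresholds nest inside one another, which is what gives the grouping its hierarchical shape.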

The third chapter describes the resource for the analysis of complete proteomes and presents the comparative analysis that was carried out.

The fourth chapter presents other applications of CluSTr, such as automatic protein annotation, discovery of new protein domains, selection of controls for microarray experiments, and targeted selection of proteins for experimental structure determination.

The fifth chapter presents an analysis of the impact of alternative splicing on the functional diversity of the proteome.

Conclusion of the dissertation on the topic "Biophysics", Kriventseva, Evgenia Viktorovna

Conclusions

1. The proposed procedure for automatic classification of protein sequences is shown to reproduce protein families defined by expert analysis, both on the basis of sequence similarity (Pfam) and on the basis of structural organization (SCOP). The parameter space of the procedure was explored and its overall performance was tested.

2. The proteins in the leading database SWISS-PROT+TrEMBL were classified into hierarchically organized groups by sequence similarity. The resulting data are stored in a purpose-built relational database running on the ORACLE DBMS and are publicly available on the Internet (CluSTr, http://www.ebi.ac.uk/clustr/, (6)). The web interface developed for it provides database search, a visual representation of the cluster hierarchy, and integration with other resources.

3. CluSTr was designed and implemented for use in the procedures being developed for automatic annotation of proteins in SWISS-PROT+TrEMBL. The resulting data were used for comparative analysis of complete genomes, the results of which are available on the Internet (http://www.ebi.ac.uk/proteome/) (7); for studying transmembrane protein families (8); for describing new protein domains (5); for selecting controls for microarray gene expression experiments; and for identifying promising proteins for experimental structure determination (14).

4. The influence of alternative splicing on the functional diversity of the proteome is demonstrated statistically, namely: a) alternative splicing tends to insert or remove complete protein domains, whereas disruption of domains and other structural elements is observed less often than expected. This effect is shown not to be explained by the correlation between domain and exon boundaries. b) In most cases where an alternatively spliced region partially overlaps a domain, the functional effect of alternative splicing is equivalent to complete removal of the domain. c) In cases where alternative splicing occurs within a domain and does not obviously alter the protein structure, there is a tendency for functionally important sites to be disrupted.

Thus, a substantial influence of positive selection on the evolution of alternative splicing is demonstrated.
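One way to make "observed less often than expected" concrete is a permutation test: count how many spliced segments cut through a domain, then compare that count with what random placement of same-length segments would give. The sketch below is a simplified illustration under stated assumptions, not the procedure used in the dissertation. The interval representation, the uniform-placement null model, the function names and the toy data are all hypothetical, and the sketch ignores exon boundaries, which the dissertation controls for separately.

```python
import random


def disrupts(segment, domains):
    """True if the segment overlaps some domain without covering it entirely."""
    s, e = segment
    for d_start, d_end in domains:
        overlaps = s < d_end and d_start < e
        covers = s <= d_start and d_end <= e
        if overlaps and not covers:
            return True
    return False


def permutation_p_value(events, n_iter=2000, seed=0):
    """Return (observed disruptions, one-sided p-value for 'as few or fewer').

    events: list of (protein_length, segment, domains); all intervals are
    half-open residue ranges (start, end). Assumed null model: each segment
    is re-placed uniformly at random along its own protein.
    """
    rng = random.Random(seed)
    observed = sum(disrupts(seg, doms) for _, seg, doms in events)
    as_low = 0
    for _ in range(n_iter):
        count = 0
        for length, (s, e), doms in events:
            seg_len = e - s
            new_start = rng.randint(0, length - seg_len)
            count += disrupts((new_start, new_start + seg_len), doms)
        if count <= observed:
            as_low += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (as_low + 1) / (n_iter + 1)


# Toy data (not real): 20 proteins of 300 residues, one domain at 100-200,
# and a spliced segment that removes exactly that domain.
EVENTS = [(300, (100, 200), [(100, 200)]) for _ in range(20)]

if __name__ == "__main__":
    observed, p = permutation_p_value(EVENTS)
    print(observed, p)
```

A small p-value here would indicate that the segments avoid cutting domains more than chance placement predicts, which is the direction of the effect described in conclusion 4a.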

6. Author's publications

BOOKS

1. R. Apweiler, M. Biswas, W. Fleischmann, E. V. Kriventseva, N. Mulder. Automation of Protein Sequence Characterization and Its Application in Whole Proteome Analysis. Book chapter. Gene Regulation and Metabolism: Post-Genomic Computational Approaches, eds. J. Collado-Vides and R. Hofestadt, MIT Press 2002, 19-47.

JOURNALS

2. Boue S, Vingron M, Kriventseva E, Koch I. Theoretical analysis of alternative splice forms using computational methods. Bioinformatics. 2002 Oct;18 Suppl 2:S65-S73.

3. Biswas M, O'Rourke JF, Camon E, Fraser G, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva E, Mittard V, Mulder N, Phan I, Servant F, Apweiler R. Applications of InterPro in protein annotation and genome analysis. Brief Bioinform. 2002 Sep;3(3):285-95.

4. A. Kanapin, R. Apweiler, M. Biswas, W. Fleischmann, Y. Karavidopoulou, P. Kersey, E.V. Kriventseva, V. Mittard, N. Mulder, T. Oinn, I. Phan, F. Servant, E. Zdobnov. Interactive InterPro-based comparisons of proteins in whole genomes. Bioinformatics. 2002 Feb;18(2):374-5.

5. E. V. Kriventseva, M. Biswas and R. Apweiler. Clustering and analysis of protein families. (2001) Curr Opin Struct Biol 11(3): 334-9.

6. E.V. Kriventseva, W. Fleischmann, E. M. Zdobnov, R. Apweiler. CluSTr: a database of Clusters of SWISS-PROT+TrEMBL proteins. (2001) Nucleic Acids Res 29(1): 33-6.

7. R. Apweiler, M. Biswas, W. Fleischmann, A. Kanapin, Y. Karavidopoulou, P. Kersey, E. V. Kriventseva, V. Mittard, N. Mulder, I. Phan and E. Zdobnov. Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. (2001) Nucleic Acids Res 29(1): 44-8.

8. S. Moller, E. V. Kriventseva, R. Apweiler A collection of well characterised integral membrane proteins. (2000) Bioinformatics 16(12): 1159-60.

9. E.V. Kriventseva, V.l. Makeev, M.S. Gel'fand. Statistical analysis of the exon-intron structure of higher eukaryote genes. (1999) Biofizika 44(4): p. 595-600.

10. E.V. Kriventseva and M.S. Gelfand. Statistical analysis of the exon-intron structure of higher and lower eukaryote genes. (1999) J Biomol Struct Dyn 17(2): p. 281-8.

CONFERENCES

11. E.V. Kriventseva, F. Servant, T. Bruls, R. Apweiler. CluSTr - the database of Clusters of SWISS-PROT+TrEMBL proteins. ISMB02 Abstract 173B. http://www.ismb02.org/posters/poster/Kriventseva.pdf

12. V. Mittard, R. Apweiler, D. Barreil, U. Das, W. Fleischmann, A. Kanapin, P. Kersey, E. Kriventseva, P. McNeil, N. Mulder, F. Servant. Sequence and Structural Integration within the InterPro, Proteome Analysis and SWISS-PROT Databases. ISMB02 Abstract 176A. http://www.ismb02.org/genomeannotation.htm#Toc13331903

13. E. V. Kriventseva, S. Möller, R. Apweiler. Domain-finding with CluSTr: Re-occuring motifs determined with a database of mutual sequence similarity. (2001) ISMB'01, Abstract 110. http://ismb01.cbs.dtu.dk/ProteinFamilies.html#A110

14. E.V. Kriventseva, M. Biswas, A. Kanapin, P. Kersey, N. Mulder, I. Phan, R. Apweiler. Target selection for structural genomics by comparative analysis of the predicted proteomes of Saccharomyces cerevisiae, Caenorhabditis elegans and D. melanogaster. (2001) 42nd Annual Drosophila Research Conference Abstract 945. p 326a. http://www.faseb.org/genetics/dros01/html/f945.htm.

15. E. V. Kriventseva, W. Fleischmann, R. Apweiler. Evaluating the CluSTr methodology. (2000) Poster presentation at the German Conference on Bioinformatics.

16. R. Apweiler, M. Biswas, W. Fleischmann, A. Kanapin, Y. Karavidopoulou, P. Kersey, E. Kriventseva, V. Mittard, N. Mulder, T. Oinn, I. Phan, E. Zdobnov. Proteome analysis: application of InterPro and CluSTr for the functional classification of proteins in whole genomes. In: Proceedings of the German Conference on Bioinformatics (GCB'00) pp. 149-157.

17. E. V. Kriventseva, W. Fleischmann, A. Kanapin, R. Apweiler. Clustering of proteins from SWISS-PROT and TrEMBL. (1999) ISMB'99 Abstract http://ismb99.gmd.de/PosterAbstracts/KriventsevaFKA.pdf.

18. E.V. Kriventseva and M.S. Gelfand. Statistical analysis of the exon-intron structure and splicing sites of several eukaryotes. (1998) 'Theoretical Biophysics. Current topics'. Abstract 42. http://www.biophys.msu.ru/awse/confer/nlw98/abs42.htm

Bibliography of the dissertation in biology, Candidate of Physical and Mathematical Sciences, Kriventseva, Evgenia Viktorovna, Moscow

1. Altschul S.F., and Gish W. (1996) Local alignment statistics. Methods Enzymol 266:460-80.

2. Altschul S.F., Gish W., Miller W., Myers E.W., and Lipman D.J. (1990) Basic local alignment search tool. J Mol Biol 215:403-10.

3. Altschul S.F., and Koonin E.V. (1998) Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci 23:444-7.

4. Altschul S.F., and Lipman D.J. (1990) Protein database searches for multiple alignments. Proc Natl Acad Sci U S A 87:5509-13.

5. Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., and Lipman D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-402.

6. Apweiler R. (2000) Protein sequence databases. Adv Protein Chem 54:31-71.

7. Attwood T.K. (2000) The role of pattern databases in sequence analysis. Brief Bioinform 1:45-59.

8. Attwood T.K., Croning M.D., Flower D.R., Lewis A.P., Mabey J.E., Scordis P., Selley J.N., and Wright W. (2000) PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res 28:225-7.

9. Bairoch A., and Apweiler R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28:45-8.

10. Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R., Griffiths-Jones S., Howe K.L., Marshall M., and Sonnhammer E.L. (2002) The Pfam protein families database. Nucleic Acids Res 30:276-80.

11. Boise L.H., Gonzalez-Garcia M., Postema C.E., Ding L., Lindsten T., Turka L.A., Mao X., Nunez G., and Thompson C.B. (1993) bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell 74:597-608.

12. Bork P., and Koonin E.V. (1998) Predicting functions from protein sequences—where are the bottlenecks? Nat Genet 18:313-8.

13. Brenner S.E. (2000) Target selection for structural genomics. Nat Struct Biol 7 Suppl:967-9.

14. Brenner S.E. (2001) A tour of structural genomics. Nat Rev Genet 2:801-9.

15. Brenner S.E., and Levitt M. (2000) Expectations from structural genomics. Protein Sci 9:197-200.

16. Brett D., Hanke J., Lehmann G., Haase S., Delbruck S., Krueger S., Reich J., and Bork P. (2000) EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett 474:83-6.

17. Brown M., Hughey R., Krogh A., Mian I.S., Sjolander K., and Haussler D. (1993) Using Dirichlet mixture priors to derive hidden Markov models for protein families. Proc Int Conf Intell Syst Mol Biol 1:47-55.

18. Burley S.K., and Bonanno J.B. (2002) Structural genomics of proteins from conserved biochemical pathways and processes. Curr Opin Struct Biol 12:383-91.

19. Buvoli M., Biamonti G., Tsoulfas P., Bassi M.T., Ghetti A., Riva S., and Morandi C. (1988) cDNA cloning of human hnRNP protein A1 reveals the existence of multiple mRNA isoforms. Nucleic Acids Res 16:3751-70.

20. Bystroff C., and Shao Y. (2002) Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics 18 Suppl 1:S54-61.

21. Bystroff C., Thorsson V., and Baker D. (2000) HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J Mol Biol 301:173-90.

22. Caceres J.F., and Kornblihtt A.R. (2002) Alternative splicing: multiple control mechanisms and involvement in human disease. Trends Genet 18:186-93.

23. Cavalier-Smith T. (1985) Selfish DNA and the origin of introns. Nature 315:283-4.

24. Cavalier-Smith T. (1991) Intron phylogeny: a new hypothesis. Trends Genet 7:145-8.

25. Clamp M. (1998) JalView. http://www2.ebi.ac.uk/~michele/jalview/.

26. Comet J.P., Aude J.C., Glemet E., Risler J.L., Henaut A., Slonimski P.P., and Codani J.J. (1999) Significance of Z-value statistics of Smith-Waterman scores for protein alignments. Comput Chem 23:317-31.

27. Consortium. (1999) IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN) and Nomenclature Committee of IUBMB (NC-IUBMB), newsletter 1999. Eur J Biochem 264:607-9.

28. Consortium. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815.

29. Consortium M.G.S. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature Nov.

30. Corpet F., Gouzy J., and Kahn D. (1999) Recent improvements of the ProDom database of protein domain families. Nucleic Acids Res 27:263-7.

31. Coulson A. (1996) The Caenorhabditis elegans genome project. C. elegans Genome Consortium. Biochem Soc Trans 24:289-91.

32. Dayhoff M.O. (1976) The origin and evolution of protein superfamilies. Fed Proc 35:2132-8.

33. Dengler U., Siddiqui A.S., and Barton G.J. (2001) Protein structural domains: analysis of the 3Dee domains database. Proteins 42:332-44.

34. DeRisi J.L., Iyer V.R., and Brown P.O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-6.

35. Dietmann S., Park J., Notredame C., Heger A., Lappe M., and Holm L. (2001) A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucleic Acids Res 29:55-7.

36. Doye V., Soubrier F., Bauw G., Boutterin M.C., Beretta L., Koppel J., Vandekerckhove J., and Sobel A. (1989) A single cDNA encodes two isoforms of stathmin, a developmentally regulated neuron-enriched phosphoprotein. J Biol Chem 264:12134-7.

37. Eddy S.R. (1998) Profile hidden Markov models. Bioinformatics 14:755-63.

38. Elofsson A., and Sonnhammer E.L. (1999) A comparison of sequence and structure protein domain families as a basis for structural genomics. Bioinformatics 15:480-500.

39. Enright A.J., Iliopoulos I., Kyrpides N.C., and Ouzounis C.A. (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86-90.

40. Etzold T., Ulyanov A., and Argos P. (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol 266:114-28.

41. Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J., Hofmann K., and Bairoch A. (2002) The PROSITE database, its status in 2002. Nucleic Acids Res 30:235-8.

42. Fields S., Kohara Y., and Lockhart D.J. (1999) Functional genomics. Proc Natl Acad Sci U S A 96:8825-6.

43. Fleischmann W., Moller S., Gateau A., and Apweiler R. (1999) A novel method for automatic functional annotation of proteins. Bioinformatics 15:228-33.

44. Galperin M.Y., and Koonin E.V. (1998) Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol 1:55-67.

45. Garavelli J.S., Hou Z., Pattabiraman N., and Stephens R.M. (2001) The RESID Database of protein structure modifications and the NRL-3D Sequence-Structure Database. Nucleic Acids Res 29:199-201.

46. Gasteiger E., Jung E., and Bairoch A. (2001) SWISS-PROT: connecting biomolecular knowledge via a protein database. Curr Issues Mol Biol 3:47-55.

47. Gilbert W. (1978) Why genes in pieces? Nature 271:501.

48. Gilbert W., and Glynias M. (1993) On the ancient nature of introns. Gene 135:137-44.

49. Gilbert W., Marchionni M., and McKnight G. (1986) On the antiquity of introns. Cell 46:151-3.

50. Glemet E., and Codani J.J. (1997) LASSAP, a LArge Scale Sequence compArison Package. Comput Appl Biosci 13:137-43.

51. Go M. (1981) Correlation of DNA exonic regions with protein structural units in haemoglobin. Nature 291:90-2.

52. Gotoh O. (1982) An improved algorithm for matching biological sequences. J Mol Biol 162:705-8.

53. Gough J., and Chothia C. (2002) SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res 30:268-72.

54. Gracy J., and Argos P. (1998) DOMO: a new database of aligned protein domains. Trends Biochem Sci 23:495-7.

55. Hadley C., and Jones D.T. (1999) A systematic comparison of protein structure classifications: SCOP, CATH and FSSP. Structure Fold Des 7:1099-112.

56. Haft D.H., Loftus B.J., Richardson D.L., Yang F., Eisen J.A., Paulsen I.T., and White O. (2001) TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res 29:41-3.

57. Hamosh A., Scott A.F., Amberger J., Bocchini C., Valle D., and McKusick V.A. (2002) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 30:52-5.

58. Henikoff J.G., Greene E.A., Pietrokovski S., and Henikoff S. (2000) Increased coverage of protein families with the blocks database servers. Nucleic Acids Res 28:228-30.

59. Henikoff S., and Henikoff J.G. (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89:10915-9.

60. Henikoff S., and Henikoff J.G. (1994) Protein family classification based on searching a database of blocks. Genomics 19:97-107.

61. Henikoff S., Henikoff J.G., and Pietrokovski S. (1999) Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics 15:471-9.

62. Holm L., and Sander C. (1996) The FSSP database: fold classification based on structure-structure alignment of proteins. Nucleic Acids Res 24:206-9.

63. Holm L., and Sander C. (1999) Protein folds and families: sequence and structure alignments. Nucleic Acids Res 27:244-7.

64. Huang J.Y., and Brutlag D.L. (2001) The EMOTIF database. Nucleic Acids Res 29:202-4.

65. Hurst L.D., and McVean G.T. (1996) A difficult phase for introns-early. Molecular evolution. Curr Biol 6:533-6.

66. Johnson M.S., and Overington J.P. (1993) A structural basis for sequence comparisons. An evaluation of scoring methodologies. J Mol Biol 233:716-38.

67. Kan Z., Rouchka E.C., Gish W.R., and States D.J. (2001) Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res 11:889-900.

68. Karlin S., and Altschul S.F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci U S A 87:2264-8.

69. Karsch-Mizrachi I., and Ouellette B.F. (2001) The GenBank sequence database. Methods Biochem Anal 43:45-63.

70. Keirsebilck A., Bonne S., Staes K., van Hengel J., Nollet F., Reynolds A., and van Roy F. (1998) Molecular cloning of the human p120ctn catenin gene (CTNND1): expression of multiple alternatively spliced isoforms. Genomics 50:129-46.

71. Kersey P., Hermjakob H., and Apweiler R. (2000) VARSPLIC: alternatively-spliced protein sequences derived from SWISS-PROT and TrEMBL. Bioinformatics 16:1048-9.

72. Koonin E.V. (2000) Bridging the gap between sequence and function. Trends Genet 16:16.

73. Krause A., Stoye J., and Vingron M. (2000) The SYSTERS protein sequence cluster set. Nucleic Acids Res 28:270-2.

74. Kretschmann E., Fleischmann W., and Apweiler R. (2001) Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT. Bioinformatics 17:920-6.

75. Kriventseva E.V., Fleischmann W., Zdobnov E.M., and Apweiler R. (2001) CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res 29:33-6.

76. Marcotte E.M., Pellegrini M., Ng H.L., Rice D.W., Yeates T.O., and Eisenberg D. (1999) Detecting protein function and protein-protein interactions from genome sequences. Science 285:751-3.

77. Marcotte E.M., Pellegrini M., Thompson M.J., Yeates T.O., and Eisenberg D. (1999) A combined algorithm for genome-wide prediction of protein function. Nature 402:83-6.

78. McGarvey P.B., Huang H., Barker W.C., Orcutt B.C., Garavelli J.S., Srinivasarao G.Y., Yeh L.S., Xiao C., and Wu C.H. (2000) PIR: a new resource for bioinformatics. Bioinformatics 16:290-1.

79. Mewes H.W., Frishman D., Guldener U., Mannhaupt G., Mayer K., Mokrejs M., Morgenstern B., Munsterkotter M., Rudd S., and Weil B. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30:31-4.

80. Mironov A.A., Fickett J.W., and Gelfand M.S. (1999) Frequent alternative splicing of human genes. Genome Res 9:1288-93.

81. Modrek B., Resch A., Grasso C., and Lee C. (2001) Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res 29:2850-9.

82. Mott R.F., Kirkwood T.B., and Curnow R.N. (1990) Tests for the statistical significance of protein sequence similarities in data-bank searches. Protein Eng 4:149-54.

83. Mushegian A.R., Garey J.R., Martin J., and Liu L.X. (1998) Large-scale taxonomic profiling of eukaryotic model organisms: a comparison of orthologous proteins encoded by the human, fly, nematode, and yeast genomes. Genome Res 8:590-8.

84. Needleman S.B., and Wunsch C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443-53.

85. Nielsen H., Brunak S., and von Heijne G. (1999) Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng 12:3-9.

86. Oakley A.J., Harnnoi T., Udomsinprasert R., Jirajaroenrat K., Ketterman A.J., and Wilce M.C. (2001) The crystal structures of glutathione S-transferases isozymes 1-3 and 1-4 from Anopheles dirus species B. Protein Sci 10:2176-85.

87. O'Donovan C., Martin M.J., Gattiker A., Gasteiger E., Bairoch A., and Apweiler R. (2002) High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief Bioinform 3:275-84.

88. O'Donovan C., Martin M.J., Glemet E., Codani J.J., and Apweiler R. (1999) Removing redundancy in SWISS-PROT and TrEMBL. Bioinformatics 15:258-9.

89. Okayama T., Tamura T., Gojobori T., Tateno Y., Ikeo K., Miyazaki S., Fukami-Kobayashi K., and Sugawara H. (1998) Formal design and implementation of an improved DDBJ DNA database with a new schema and object-oriented library. Bioinformatics 14:472-8.

90. Pearl F.M., Lee D., Bray J.E., Buchan D.W., Shepherd A.J., and Orengo C.A. (2002) The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Sci 11:233-44.

91. Pearson W.R. (1995) Comparison of methods for searching protein sequence databases. Protein Sci 4:1145-60.

92. Pearson W.R. (1996) Effective protein sequence comparison. Methods Enzymol 266:227-58.

93. Pearson W.R. (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol 132:185-219.

94. Pearson W.R., and Lipman D.J. (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85:2444-8.

95. Ponting C.P., and Russell R.R. (2002) The natural history of protein domains. Annu Rev Biophys Biomol Struct 31:45-71.

96. Pruitt K.D., and Maglott D.R. (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 29:137-40.

97. Roberts G.C., and Smith C.W. (2002) Alternative splicing: combinatorial output from the genome. Curr Opin Chem Biol 6:375-83.

98. Rost B. (1999) Twilight zone of protein sequence alignments. Protein Eng 12:85-94.

99. Rost B., Honig B., and Valencia A. (2002) Bioinformatics in structural genomics. Bioinformatics 18:897-8.

100. Russell R.B., and Barton G.J. (1994) Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts secondary structure and accessibility. J Mol Biol 244:332-50.

101. Rzhetsky A., Ayala F.J., Hsu L.C., Chang C., and Yoshida A. (1997) Exon/intron structure of aldehyde dehydrogenase genes supports the "introns-late" theory. Proc Natl Acad Sci U S A 94:6820-5.

102. Schmucker D., Clemens J.C., Shu H., Worby C.A., Xiao J., Muda M., Dixon J.E., and Zipursky S.L. (2000) Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell 101:671-84.

103. Servant F., Bru C., Carrere S., Courcelle M., Gouzy J., Peyruc D., and D. K. (2002) ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics :246-251.

104. Sigrist C.J., Cerutti L., Hulo N., Gattiker A., Falquet L., Pagni M., Bairoch A., and Bucher P. (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 3:265-74.

105. Smith T.F., and Waterman M.S. (1981) Identification of common molecular subsequences. J Mol Biol 147:195-7.

106. Spingola M., Grate L., Haussler D., and Ares M., Jr. (1999) Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 5:221-34.

107. Taylor W.R. (1996) A non-local gap-penalty for profile alignment. Bull Math Biol 58:1-18.

108. Tittiger C., Whyard S., and Walker V.K. (1993) A novel intron site in the triosephosphate isomerase gene from the mosquito Culex tarsalis. Nature 361:470-2.

109. Valencia A., and Pazos F. (2002) Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 12:368-73.

110. Vlahovicek K., Murvai J., Barta E., and Pongor S. (2002) The SBASE protein domain library, release 9.0: an online resource for protein domain identification. Nucleic Acids Res 30:273-5.

111. Wang Y., Addess K.J., Geer L., Madej T., Marchler-Bauer A., Zimmerman D., and Bryant S.H. (2000) MMDB: 3D structure data in Entrez. Nucleic Acids Res 28:243-5.

112. Westbrook J., Feng Z., Jain S., Bhat T.N., Thanki N., Ravichandran V., Gilliland G.L., Bluhm W., Weissig H., Greer D.S., Bourne P.E., and Berman H.M. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res 30:245-8.

113. Winzeler E.A., Richards D.R., Conway A.R., Goldstein A.L., Kalman S., McCullough M.J., McCusker J.H., Stevens D.A., Wodicka L., Lockhart D.J., and Davis R.W. (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-7.

114. Shpakovski G.Y., Ussery D., Barrell B.G., and Nurse P. (2002) The genome sequence of Schizosaccharomyces pombe. Nature 415:871-80.

115. Wu C.H., Xiao C., Hou Z., Huang H., and Barker W.C. (2001) iProClass: an integrated, comprehensive and annotated protein classification database. Nucleic Acids Res 29:52-4.

116. Wu C.H., Zhao S., and Chen H.L. (1996) A protein class database organized with ProSite protein groups and PIR superfamilies. J Comput Biol 3:547-61.

117. Yona G., Linial N., and Linial M. (2000) ProtoMap: automatic classification of protein sequences and hierarchy of protein families. Nucleic Acids Res 28:49-55.

118. Yu L., White J.V., and Smith T.F. (1998) A homology identification method that combines protein sequence and structure information. Protein Sci 7:2499-510.

119. Zdobnov E.M., and Apweiler R. (2001) InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847-8.

Information about the work

Kriventseva, Evgenia Viktorovna
Candidate of Physical and Mathematical Sciences
Moscow, 2002
VAK 03.00.02
