List of gene prediction software

From Wikipedia, the free encyclopedia

This is a list of software tools and web portals used for gene prediction.

Name Description Species References
FINDER Automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences Eukaryotes [1]
FragGeneScan Predicts genes in complete genomes and in short, error-prone sequencing reads Prokaryotes, Metagenomes [2]
ATGpr Identifies translational initiation sites in cDNA sequences Human [3]
Prodigal Its name stands for Prokaryotic Dynamic Programming Genefinding Algorithm. It is based on log-likelihood functions and does not use hidden or interpolated Markov models. Prokaryotes, Metagenomes (metaProdigal) [4]
AUGUSTUS Eukaryotic gene predictor Eukaryotes [5]
BGF Ab initio gene prediction program based on a hidden Markov model (HMM) and dynamic programming [6]
DIOGENES Fast detection of coding regions in short genome sequences
Dragon Promoter Finder Program to recognize vertebrate RNA polymerase II promoters Vertebrates [7]
EasyGene Gene finder based on a hidden Markov model (HMM) that is automatically estimated for each new genome Prokaryotes [8][9]
EuGene Integrative gene finding Prokaryotes, Eukaryotes [10][11]
FGENESH HMM-based gene structure prediction of multiple genes on both DNA strands Eukaryotes [12]
FrameD Finds genes and frameshifts in G+C-rich prokaryotic sequences Prokaryotes, Eukaryotes [13]
GeMoMa Homology-based gene prediction based on amino acid and intron position conservation as well as RNA-Seq data [14][15]
GENIUS II Links ORFs in complete genomes to protein 3D structures Prokaryotes, Eukaryotes [16]
geneid Program to predict genes, exons, splice sites, and other signals along DNA sequences Eukaryotes [17]
GeneParser Parses DNA sequences into introns and exons Eukaryotes [18]
GeneMark Family of self-training gene prediction programs Prokaryotes, Eukaryotes, Metagenomes [19][20][21][22]
GeneTack Predicts genes with frameshifts in prokaryote genomes Prokaryotes [23]
GenomeScan Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms; the GENSCAN server is GenomeScan's predecessor Vertebrates, Arabidopsis, Maize [24]
GENSCAN Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms Vertebrates, Arabidopsis, Maize [25][26][27]
GLIMMER Finds genes in microbial DNA Prokaryotes [28][29][30]
GLIMMERHMM Eukaryotic gene-finding system Eukaryotes [31]
GrailEXP Predicts exons, genes, promoters, poly(A) sites, CpG islands, EST similarities, and repeat elements in DNA sequence Human, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster [32][33]
mGene Support-vector machine (SVM) based system to find genes Eukaryotes [34]
mGene.ngs SVM-based system to find genes using heterogeneous information such as RNA-seq and tiling array data Eukaryotes [35]
MORGAN Decision tree system to find genes in vertebrate DNA Eukaryotes [36]
BioNIX Web tool to combine results from different programs: GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCAN Prokaryotes, Eukaryotes [37]
NNPP Neural network promoter prediction Prokaryotes, Eukaryotes [38]
NNSPLICE Neural network splice site prediction Drosophila, Human [39]
ORFfinder Graphical analysis tool to find all open reading frames Prokaryotes, Eukaryotes [40]
Regulatory Sequence Analysis Tools Series of modular computer programs to detect regulatory signals in non-coding sequences Fungi, Prokaryotes, Metazoa, Protists, Plants [41][42]
PHANOTATE Tool to annotate phage genomes Phages [43]
SplicePredictor Method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models Eukaryotes [44]
VEIL Hidden Markov model server to find genes in vertebrate DNA Eukaryotes [45]
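
Several tools in the table, most directly ORFfinder, start from the idea of an open reading frame: a stretch of codons that begins with a start codon and runs to an in-frame stop codon. The Python sketch below scans all six reading frames (three per strand) for ATG-to-stop ORFs. It is an illustration under stated assumptions (standard genetic code, ATG as the only start codon, and a hypothetical `min_codons` length cutoff), not the implementation used by ORFfinder or any other listed program.

```python
"""Minimal six-frame ORF scanner (illustrative sketch only)."""

# Assumption: standard genetic code stop codons; ATG is the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs on both strands.

    Coordinates are 0-based and half-open, relative to the strand scanned
    (reverse-strand hits use reverse-complement coordinates). The length
    test counts the stop codon. ORFs without an in-frame stop before the
    end of the sequence are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step codon by codon; i + 3 <= len(s) for every codon read.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    # First in-frame ATG after the previous stop (or the
                    # sequence start) opens an ORF; nested ATGs are ignored.
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))
```

On the short example sequence this reports one forward-strand ORF, `("+", 2, 14)` (ATG AAA TTT TAG). Real gene finders go much further, scoring candidate ORFs with statistical models rather than a fixed length cutoff.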

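Several entries (for example BGF, EasyGene, FGENESH and VEIL) are described as hidden Markov model based. A common step in such tools is decoding: finding the most probable sequence of hidden states (for instance coding versus non-coding) for a DNA sequence, typically with the Viterbi algorithm. The Python sketch below shows log-space Viterbi decoding on a deliberately tiny two-state model. The states, the assumption that coding DNA is GC-rich, and all probabilities are hypothetical toy values; real gene finders use many more states (codon positions, exons, introns, splice sites) and parameters trained on real genomes.

```python
import math

# Hypothetical toy model: "N" = non-coding, "C" = coding.
# All probabilities below are made up for illustration.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.95, "C": 0.05}, "C": {"N": 0.05, "C": 0.95}}
EMIT = {
    "N": {"A": 0.30, "T": 0.30, "G": 0.20, "C": 0.20},
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},  # assumed GC-rich
}


def viterbi(seq: str) -> str:
    """Return the most probable state path for seq as a string of labels."""
    log = math.log
    # score[s]: best log-probability of any path ending in state s so far.
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []  # one dict per position after the first
    for base in seq[1:]:
        best_prev, new_score = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            p = max(STATES, key=lambda q: score[q] + log(TRANS[q][s]))
            best_prev[s] = p
            new_score[s] = score[p] + log(TRANS[p][s]) + log(EMIT[s][base])
        backpointers.append(best_prev)
        score = new_score
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("AT" * 4 + "GC" * 10 + "AT" * 4))
```

With these toy parameters a long GC-rich run is labelled coding, because the emission gain outweighs the cost of the two state switches, while a short GC-rich run stays non-coding. Working in log space avoids numerical underflow on long sequences.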
See also

References

  1. ^ Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (Apr 2021). "FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences". BMC Bioinformatics. 22 (1): 205. doi:10.1186/s12859-021-04120-9. PMC 8056616. PMID 33879057.
  2. ^ Rho M, Tang H, Ye Y (November 2010). "FragGeneScan: predicting genes in short and error-prone reads". Nucleic Acids Research. 38 (20): e191. doi:10.1093/nar/gkq747. PMC 2978382. PMID 20805240.
  3. ^ Nishikawa, Tetsuo; Ota, Toshio; Isogai, Takao (2000-11-01). "Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences". Bioinformatics. 16 (11): 960–967. doi:10.1093/bioinformatics/16.11.960. ISSN 1367-4803. PMID 11159307.
  4. ^ Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (March 2010). "Prodigal: prokaryotic gene recognition and translation initiation site identification". BMC Bioinformatics. 11: 119. doi:10.1186/1471-2105-11-119. PMC 2848648. PMID 20211023.
  5. ^ Keller O, Kollmar M, Stanke M, Waack S (March 2011). "A novel hybrid gene prediction method employing protein multiple sequence alignments". Bioinformatics. 27 (6): 757–63. doi:10.1093/bioinformatics/btr010. hdl:11858/00-001M-0000-0011-F244-D. PMID 21216780.
  6. ^ Li, Heng; Liu, Jin-Song; Xu, Zhao; Jin, Jiao; Fang, Lin; Gao, Lei; Li, Yu-Dong; Xing, Zi-Xing; Gao, Shao-Gen; Liu, Tao; Li, Hai-Hong (2005-07-01). "Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome". Journal of Computer Science and Technology. 20 (4): 446–453. doi:10.1007/s11390-005-0446-x. ISSN 1860-4749. S2CID 13497894.
  7. ^ Bajic, Vladimir B.; Seah, Seng Hong; Chong, Allen; Zhang, Guanglan; Koh, Judice L. Y.; Brusic, Vladimir (2002-01-01). "Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters". Bioinformatics. 18 (1): 198–199. doi:10.1093/bioinformatics/18.1.198. ISSN 1367-4803. PMID 11836231.
  8. ^ Nielsen, P.; Krogh, A. (2005-12-15). "Large-scale prokaryotic gene prediction and comparison to genome annotation". Bioinformatics. 21 (24): 4322–4329. doi:10.1093/bioinformatics/bti701. ISSN 1367-4803. PMID 16249266.
  9. ^ Larsen, Thomas Schou; Krogh, Anders (2003-06-03). "EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance". BMC Bioinformatics. 4 (1): 21. doi:10.1186/1471-2105-4-21. ISSN 1471-2105. PMC 521197. PMID 12783628.
  10. ^ Foissac S, Gouzy J, Rombauts S, Mathé C, Amselem J, Sterck L, de Peer YV, Rouzé P, Schiex T (May 2008). "Genome annotation in plants and fungi: EuGene as a model platform". Current Bioinformatics. 3 (2): 87–97. doi:10.2174/157489308784340702.
  11. ^ Sallet, Erika; Gouzy, Jérôme; Schiex, Thomas (2019), Kollmar, Martin (ed.), "EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes", Gene Prediction: Methods and Protocols, Methods in Molecular Biology, vol. 1962, New York, NY: Springer, pp. 97–120, doi:10.1007/978-1-4939-9173-0_6, ISBN 978-1-4939-9173-0, PMID 31020556, S2CID 131776381, retrieved 2021-11-24
  12. ^ Salamov AA, Solovyev VV (April 2000). "Ab initio gene finding in Drosophila genomic DNA". Genome Research. 10 (4): 516–22. doi:10.1101/gr.10.4.516. PMC 310882. PMID 10779491.
  13. ^ Schiex T, Gouzy J, Moisan A, de Oliveira Y (July 2003). "FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences". Nucleic Acids Research. 31 (13): 3738–41. doi:10.1093/nar/gkg610. PMC 169016. PMID 12824407.
  14. ^ Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F (May 2016). "Using intron position conservation for homology-based gene prediction". Nucleic Acids Research. 44 (9): e89. doi:10.1093/nar/gkw092. PMC 4872089. PMID 26893356.
  15. ^ Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J (May 2018). "Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi". BMC Bioinformatics. 19 (1): 189. doi:10.1186/s12859-018-2203-5. PMC 5975413. PMID 29843602.
  16. ^ Yabuki, Yukimitsu; Mukai, Yuri; Swindells, Mark B.; Suwa, Makiko (2004-03-01). "GENIUS II: a high-throughput database system for linking ORFs in complete genomes to known protein three-dimensional structures". Bioinformatics. 20 (4): 596–598. doi:10.1093/bioinformatics/btg478. ISSN 1367-4803. PMID 14751990.
  17. ^ Blanco, Enrique; Parra, Genís; Guigó, Roderic (June 2007), "Using geneid to Identify Genes", Current Protocols in Bioinformatics, Chapter 4, John Wiley & Sons, Inc.: 4.3.1–4.3.28, doi:10.1002/0471250953.bi0403s18, ISBN 978-0471250951, PMID 18428791
  18. ^ Snyder, Eric E.; Stormo, Gary D. (1995-04-21). "Identification of Protein Coding Regions In Genomic DNA". Journal of Molecular Biology. 248 (1): 1–18. doi:10.1006/jmbi.1995.0198. ISSN 0022-2836. PMID 7731036.
  19. ^ Lukashin AV, Borodovsky M (February 1998). "GeneMark.hmm: new solutions for gene finding". Nucleic Acids Research. 26 (4): 1107–15. doi:10.1093/nar/26.4.1107. PMC 147337. PMID 9461475.
  20. ^ Besemer J, Lomsadze A, Borodovsky M (June 2001). "GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions". Nucleic Acids Research. 29 (12): 2607–18. doi:10.1093/nar/29.12.2607. PMC 55746. PMID 11410670.
  21. ^ Lomsadze A, Burns PD, Borodovsky M (September 2014). "Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm". Nucleic Acids Research. 42 (15): e119. doi:10.1093/nar/gku557. PMC 4150757. PMID 24990371.
  22. ^ Zhu W, Lomsadze A, Borodovsky M (July 2010). "Ab initio gene identification in metagenomic sequences". Nucleic Acids Research. 38 (12): e132. doi:10.1093/nar/gkq275. PMC 2896542. PMID 20403810.
  23. ^ Antonov I, Borodovsky M (June 2010). "Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm". Journal of Bioinformatics and Computational Biology. 8 (3): 535–51. doi:10.1142/S0219720010004847. PMID 20556861.
  24. ^ Yeh, Ru-Fang; Lim, Lee P.; Burge, Christopher B. (2001-05-01). "Computational Inference of Homologous Gene Structures in the Human Genome". Genome Research. 11 (5): 803–816. doi:10.1101/gr.175701. ISSN 1088-9051. PMC 311055. PMID 11337476.
  25. ^ Burge, Chris; Karlin, Samuel (1997-04-25). "Prediction of complete gene structures in human genomic DNA". Journal of Molecular Biology. 268 (1): 78–94. doi:10.1006/jmbi.1997.0951. ISSN 0022-2836. PMID 9149143.
  26. ^ Burge, Christopher B. (1998-01-01), Salzberg, Steven L.; Searls, David B.; Kasif, Simon (eds.), "Chapter 8 - Modeling dependencies in pre-mRNA splicing signals", New Comprehensive Biochemistry, Computational Methods in Molecular Biology, vol. 32, Elsevier, pp. 129–164, doi:10.1016/S0167-7306(08)60465-2, ISBN 978-0-444-82875-0, retrieved 2021-11-24
  27. ^ Burge, Christopher B; Karlin, Samuel (1998-06-01). "Finding the genes in genomic DNA". Current Opinion in Structural Biology. 8 (3): 346–354. doi:10.1016/S0959-440X(98)80069-9. ISSN 0959-440X. PMID 9666331.
  28. ^ Delcher, Arthur L.; Bratke, Kirsten A.; Powers, Edwin C.; Salzberg, Steven L. (2007-01-19). "Identifying bacterial genes and endosymbiont DNA with Glimmer". Bioinformatics. 23 (6): 673–679. doi:10.1093/bioinformatics/btm009. ISSN 1460-2059. PMC 2387122. PMID 17237039.
  29. ^ Delcher, A. (1999-12-01). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. ISSN 1362-4962. PMC 148753. PMID 10556321.
  30. ^ Salzberg, S. L.; Delcher, A. L.; Kasif, S.; White, O. (1998-01-01). "Microbial gene identification using interpolated Markov models". Nucleic Acids Research. 26 (2): 544–548. doi:10.1093/nar/26.2.544. ISSN 0305-1048. PMC 147303. PMID 9421513.
  31. ^ Majoros WH, Pertea M, Salzberg SL (November 2004). "TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders". Bioinformatics. 20 (16): 2878–9. doi:10.1093/bioinformatics/bth315. PMID 15145805.
  32. ^ Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2004). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Bioinformatics. 8 (1): 4.9.1–4.9.15. doi:10.1002/0471250953.bi0409s04. ISSN 1934-340X. PMID 18428726.
  33. ^ Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2003). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Human Genetics. 39 (1): 6.5.1–6.5.15. doi:10.1002/0471142905.hg0605s39. ISSN 1934-8258. PMID 18428363. S2CID 21431978.
  34. ^ Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, et al. (November 2009). "mGene: accurate SVM-based gene finding with an application to nematode genomes". Genome Research. 19 (11): 2133–43. doi:10.1101/gr.090597.108. PMC 2775605. PMID 19564452.
  35. ^ Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, et al. (August 2011). "Multiple reference genomes and transcriptomes for Arabidopsis thaliana". Nature. 477 (7365): 419–23. Bibcode:2011Natur.477..419G. doi:10.1038/nature10414. PMC 4856438. PMID 21874022.
  36. ^ "MORGAN". sites.stat.washington.edu. Retrieved 2021-11-24.
  37. ^ Bedő, Justin; Di Stefano, Leon; Papenfuss, Anthony T (November 2020). "Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix". GigaScience. 9 (11). doi:10.1093/gigascience/giaa121. ISSN 2047-217X. PMC 7672450. PMID 33205815.
  38. ^ Reese, Martin G (2001-12-01). "Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome". Computers & Chemistry. 26 (1): 51–56. doi:10.1016/S0097-8485(01)00099-7. ISSN 0097-8485. PMID 11765852.
  39. ^ Reese, Martin G.; Eeckman, Frank H.; Kulp, David; Haussler, David (1997-01-01). "Improved Splice Site Detection in Genie". Journal of Computational Biology. 4 (3): 311–323. doi:10.1089/cmb.1997.4.311. PMID 9278062.
  40. ^ "Home - ORFfinder - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-11-24.
  41. ^ Santana-Garcia, Walter; Rocha-Acevedo, Maria; Ramirez-Navarro, Lucia; Mbouamboua, Yvon; Thieffry, Denis; Thomas-Chollier, Morgane; Contreras-Moreira, Bruno; van Helden, Jacques; Medina-Rivera, Alejandra (2019-01-01). "RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding". Computational and Structural Biotechnology Journal. 17: 1415–1428. doi:10.1016/j.csbj.2019.09.009. ISSN 2001-0370. PMC 6906655. PMID 31871587.
  42. ^ Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques (2018-05-02). "RSAT 2018: regulatory sequence analysis tools 20th anniversary". Nucleic Acids Research. 46 (W1): W209–W214. doi:10.1093/nar/gky317. ISSN 0305-1048. PMC 6030903. PMID 29722874.
  43. ^ McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A.; Souza, Brian; Edwards, Robert A. (2019-11-01). "PHANOTATE: a novel approach to gene identification in phage genomes". Bioinformatics. 35 (22): 4537–4542. doi:10.1093/bioinformatics/btz265. ISSN 1367-4803. PMC 6853651. PMID 31329826.
  44. ^ Brendel, V.; Xing, L.; Zhu, W. (2004-02-05). "Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus". Bioinformatics. 20 (7): 1157–1169. doi:10.1093/bioinformatics/bth058. ISSN 1367-4803. PMID 14764557.
  45. ^ Henderson, John; Salzberg, Steven; Fasman, Kenneth H. (1997-01-01). "Finding Genes in DNA with a Hidden Markov Model". Journal of Computational Biology. 4 (2): 127–141. doi:10.1089/cmb.1997.4.127. hdl:1903/8004. PMID 9228612.