Gene Ontology Meta Annotator for Plants
Making a genome sequence accessible and useful involves three basic steps: genome assembly, structural annotation, and functional annotation. The quality of the data generated at each step influences the accuracy of the inferences that can be made: high-quality analyses produce better datasets, which in turn support stronger hypotheses for downstream experimentation.
Gene Ontology Meta Annotator for Plants (GOMAP) is a high-throughput pipeline that assigns Gene Ontology (GO) terms to plant protein sequences in a high-confidence, reproducible manner. It combines sequence-similarity, domain-presence, and mixed-method approaches, and we are currently applying it to the reference genomes of several agriculturally important plants (maize, wheat, rice, cotton, soybean). All generated annotation datasets, as well as the source code, are publicly available.
For more information on how the pipeline works as well as instructions on how to run it yourself, please check out the documentation on GitHub.
GOMAP was developed from maize-GAMER (Publication), a collaborative project to improve the status of gene functional annotation in maize.
© 2018 Dill-PICL, Iowa State University
Maize-GAMER
maize-GAMER is a collaborative project to improve the status of gene functional annotation in maize. The project has three main areas of focus:
- Design a pipeline for the functional annotation of maize genes.
- Use manually curated test data to evaluate the annotations and select the best subset of annotations for use.
- Design a user-friendly review system through which the community can provide feedback on, and endorsements of, the annotations.
Predicting Annotations
GO annotations are generated using three different approaches in the pipeline.
- Sequence similarity to Arabidopsis (TAIR) genes and to other plant genes with curated GO annotations.
- InterProScan, to detect protein domains that have GO terms annotated to them.
- CAFA (Critical Assessment of Functional Annotation) tools (Argot2, FANNGO, PANNZER), which use a combination of machine learning and statistics to predict GO terms for input genes.
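For the domain-based approach, a minimal sketch of collecting GO terms from InterProScan results is shown below. It assumes InterProScan was run with tab-separated output and GO lookup enabled (`-f tsv -goterms`), in which case the 14th column lists the GO terms attached to each matched signature, separated by `|`. The function name and the toy rows are hypothetical; this is not GOMAP's own parser.

```python
import re
from collections import defaultdict

# Matches GO IDs whether written as "GO:0004672" or "GO:0004672(InterPro)".
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_from_interproscan(lines):
    """Map each protein ID to the GO terms attached to its matched domains."""
    annotations = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # blank line or row without a GO column
        protein_id, go_field = fields[0], fields[13]
        # A "-" in the GO column yields no matches, so nothing is added.
        annotations[protein_id].update(GO_ID.findall(go_field))
    # Drop proteins whose domains carried no GO terms.
    return {pid: terms for pid, terms in annotations.items() if terms}


# Toy rows (hypothetical protein IDs) in InterProScan TSV column order.
rows = [
    "protA\tmd5\t300\tPfam\tPF00069\tProtein kinase domain\t10\t250\t1.2E-40\tT\t01-01-2020"
    "\tIPR000719\tProtein kinase domain\tGO:0004672|GO:0006468|GO:0005524",
    "protB\tmd5\t120\tCoils\tCoil\tCoil\t5\t30\t-\tT\t01-01-2020\t-\t-\t-",
]
print(go_terms_from_interproscan(rows))
```

Because a protein usually matches several signatures, the terms from all of its rows are merged into one set per protein.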
The annotations produced by these approaches will be compared to the GO annotations for maize available from Gramene, which generates its GO annotations with the Ensembl Compara pipeline.
RBH: Reciprocal Best Hit
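The RBH abbreviation refers to reciprocal best hits, a common criterion for pairing genes before transferring annotations by sequence similarity. A minimal sketch of the idea follows, assuming BLAST searches have already been run in both directions (maize against the reference set and back) and reduced to `(query, subject, bit score)` tuples. The function names and toy identifiers are hypothetical, and any E-value, identity, or coverage filtering the pipeline may apply is left out.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-bit-score subject per query."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (q, s) where s is q's best hit AND q is s's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}


def transfer_go(rbh_pairs, reference_go):
    """Copy the curated GO terms of each reference gene onto its RBH partner."""
    annotations = {}
    for query, ref in rbh_pairs:
        terms = reference_go.get(ref, set())
        if terms:
            annotations.setdefault(query, set()).update(terms)
    return annotations


# Toy data (hypothetical IDs): geneB's best hit is refY, but refY's best hit
# is geneA, so only (geneA, refX) is reciprocal.
forward = [("geneA", "refX", 250.0), ("geneA", "refY", 180.0), ("geneB", "refY", 90.0)]
reverse = [("refX", "geneA", 245.0), ("refY", "geneA", 190.0), ("refY", "geneB", 80.0)]
reference_go = {"refX": {"GO:0003700"}, "refY": {"GO:0005634"}}

pairs = reciprocal_best_hits(forward, reverse)
print(transfer_go(pairs, reference_go))
```

Requiring the match in both directions discards one-sided hits such as geneB to refY, which is the point of the criterion: it favors likely one-to-one counterparts over merely similar genes.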
Evaluating Annotations
This part of the pipeline evaluates the annotations by calculating and comparing performance measures.
- The test dataset is a gold standard of manually curated annotations from MaizeGDB, covering about 4% of maize protein-coding genes.
- Protein-centric evaluation metrics from the CAFA project are currently used to evaluate the different tools.
- Precision (PR) is the mean, across proteins, of the proportion of a protein's predicted annotations that are correct.
- Recall (RC) is the mean, across proteins, of the proportion of a protein's annotations in the test dataset that were correctly predicted.
- F-score is a single value that reflects a tool's accuracy; it is the harmonic mean of PR and RC.
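To make the three metrics concrete, here is a minimal sketch. It assumes predictions and gold-standard annotations are dictionaries mapping protein IDs to sets of GO term IDs, scores only proteins present in the gold standard, averages precision over proteins with at least one prediction and recall over all gold-standard proteins (the usual CAFA convention), and ignores GO-hierarchy propagation and score thresholds. The function name `evaluate` and the toy data are hypothetical.

```python
def evaluate(predicted, gold):
    """Protein-centric precision (PR), recall (RC) and F-score.

    predicted, gold: dicts mapping protein ID -> set of GO term IDs.
    Only proteins present in the gold standard are scored.
    """
    precisions, recalls = [], []
    for protein, true_terms in gold.items():
        if not true_terms:
            continue  # nothing to recall for this protein
        pred_terms = predicted.get(protein, set())
        correct = len(pred_terms & true_terms)
        recalls.append(correct / len(true_terms))
        if pred_terms:  # precision is undefined when nothing was predicted
            precisions.append(correct / len(pred_terms))
    pr = sum(precisions) / len(precisions) if precisions else 0.0
    rc = sum(recalls) / len(recalls) if recalls else 0.0
    # Harmonic mean of PR and RC.
    f = 2 * pr * rc / (pr + rc) if pr + rc > 0 else 0.0
    return pr, rc, f


# Toy data: p3 has gold annotations but no predictions.
gold = {"p1": {"GO:A", "GO:B"}, "p2": {"GO:C"}, "p3": {"GO:D", "GO:E"}}
predicted = {"p1": {"GO:A", "GO:X"}, "p2": {"GO:C"}}
print(evaluate(predicted, gold))  # → (0.75, 0.5, 0.6)
```

Here p1 contributes 0.5 to both precision and recall, p2 contributes 1.0 to both, and p3 lowers recall without affecting precision, giving PR = 0.75, RC = 0.5, and F = 0.6.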
Reviewing Annotations
This section outlines the review system, which will be implemented after the evaluation step.
- The Basic View will show the minimal information that subject experts need to quickly review genes of their choice.
- The Evidence View will let users see which tools support a particular GO annotation. Each supporting tool will have a simple graphic showing the details of its annotation; e.g., sequence-similarity methods will show a diagram of the target, coverage, identity, and E-value of the corresponding BLAST hit.
All annotation data will be available for download for use in downstream analyses of your own experiments. Non-reviewed annotations will be released as soon as the evaluation of the pipeline results is complete. Reviewed annotations will be released after the review tool has been made available to the maize community and enough time has passed for the community to revise the non-reviewed annotations.
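The BLAST graphic in the Evidence View needs only a few numbers per hit. A minimal sketch of deriving them from a BLAST+ tabular hit (`-outfmt 6`, whose default 12 columns are listed in the code) follows. Query coverage is not among the default columns, so the sketch assumes the query length is supplied separately (for example, from the input FASTA) and defines coverage as the aligned query span divided by the query length. The class name, function name, and toy row are hypothetical.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class HitEvidence:
    target: str
    identity: float  # percent identity as reported by BLAST
    coverage: float  # percent of the query spanned by the alignment
    evalue: float


def evidence_from_blast_row(row, query_length):
    """Summarize one -outfmt 6 row; query_length must be supplied by the caller."""
    rec = dict(zip(OUTFMT6, row.rstrip("\n").split("\t")))
    qstart, qend = int(rec["qstart"]), int(rec["qend"])
    span = abs(qend - qstart) + 1  # inclusive alignment span on the query
    return HitEvidence(
        target=rec["sseqid"],
        identity=float(rec["pident"]),
        coverage=100.0 * span / query_length,
        evalue=float(rec["evalue"]),
    )


# Toy hit (hypothetical IDs): a 200-residue alignment on a 250-residue query.
row = "geneA\trefX\t78.50\t200\t43\t0\t1\t200\t5\t204\t1e-90\t310"
print(evidence_from_blast_row(row, query_length=250))
```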
People
Principal Investigator
Carolyn Dill
Associate Professor
GDCB, BCB & Agronomy Iowa State University
Support Staff
Darwin Campbell
Database Manager
Dill PICL Iowa State University
Scott Zarecor
Programmer
Dill PICL Iowa State University
Students
Gokul Wimalanathan
Doctoral Candidate
BCB Dill PICL & Vollbrecht Lab Iowa State University
Dennis Psaroudakis
Fulbright scholar
Dill PICL Iowa State University
Collaborators
Carson Andorf
Director
MaizeGDB USDA-ARS
Iddo Friedberg
Associate Professor
VMPM & BCB Iowa State University
Alumni
Chris Lawrence
Undergraduate
Senior Genetics Dill PICL Iowa State University
Datasets
Publicly available GOMAP Datasets
These GOMAP-generated datasets are new high-coverage, reproducible functional annotation sets of protein-coding genes, based on Gene Ontology (GO) term assignments. Follow the DOI link for more information, access, and download. See the main project page to learn more about how the datasets were generated and how you can run the pipeline yourself.
Release | Species | Germplasm/Line | Assembly/Annotation | DOI | Data Wrangler |
---|---|---|---|---|---|
2022 | Humulus lupulus (Common Hop) | Cascade | HopBase | 10.25739/ew58-8d76 | Leila Fattel |
2022 | Zea mays (Maize) | B73 | Zm-B73-REFERENCE-NAM-5.0 | 10.25739/g1rt-b278 | Olivia Johnson |
2022 | Capsicum annuum (Pepper) | cv CM334 | Pepper Genome Platform (PGP) | 10.25739/cqn7-yt96 | Leila Fattel |
2021 | Camellia sinensis (Tea) | sinensis | TPIA CSS_ChrLev_20200506 | 10.25739/w1fz-9313 | Leila Fattel |
2021 | Vaccinium corymbosum L. (Highbush Blueberry) | Draper | GigaDB V_corymbosum_v1.0 | 10.25739/q7rq-e992 | Leila Fattel |
2021 | Brassica napus (Canola) | ZS11 | BnPIR ZS11 genome | 10.25739/xgmr-hr31 | Dollye Starr |
2021 | Coffea canephora (Coffee) | Coffea canephora | CGH C. canephora genome v1.0 | 10.25739/rm4j-3580 | Leila Fattel |
2021 | Solanum pennellii (Tomato) | Solanum pennellii | Bolger2014.v1 | 10.25739/fhr4-cx67 | Dennis Psaroudakis |
2021 | Solanum lycopersicum (Tomato) | Heinz 1706 | ITAG4.1 | 10.25739/zh2v-4p15 | Dennis Psaroudakis |
2021 | Musa acuminata (Banana) | DH-Pahang | NCBI ASM31385v2 | 10.25739/yt7w-gs55 | Leila Fattel |
2021 | Theobroma cacao (Cacao) | B97-61/B2 | NCBI Criollo_cocoa_genome_V2 | 10.25739/9qc0-n310 | Leila Fattel |
2021 | Vitis vinifera (Grape) | Pinot Noir PN40024 | Genoscope 2010 genome 12X | 10.25739/jtfk-q888 | Haley Dostalik |
2020 | Pinus lambertiana (Sugar Pine) | Sugar Pine | TreeGenesDB sugar pine assembly v1.5 | 10.25739/jvs4-xr88 | Colleen Yanarella |
2020 | Cannabis sativa (Cannabis) | Hemp | NCBI Cannabis sativa GCA_900626175.1 | 10.25739/ab9z-2z86 | Kevin Chiteri |
2020 | Gossypium raimondii (Cotton) | Cotton D | Gossypium raimondii JGI v2.1 | 10.25739/a13t-zh47 | Parnal Joshi |
2019 | Glycine max (Soybean) | Williams 82 | Joint Genome Institute (JGI) Glycine max genome assembly Wm82.a4.v1 (genotype Williams 82, assembly 4.0, gene model annotation 1.0) | 10.25739/59ec-1719 | Dennis Psaroudakis |
2019 | Oryza sativa (Rice) | japonica | IRGSP 1.0 | 10.25739/53g0-j859 | Ha Vu |
2019 | Triticum aestivum (Wheat) | Chinese Spring | IWGSC's RefSeq 1.1 | 10.25739/65kf-jz20 | Dennis Psaroudakis |
2019 | Zea mays (Maize) | Mo17 | Zm-Mo17-REFERENCE-CAU-1.0 | 10.25739/m634-cn58 | Kokulapalan Wimalanathan |
2019 | Zea mays (Maize) | PH207 | Zm-PH207-REFERENCE_NS-UIUC_UMN-1.0 | 10.25739/dm9s-aa15 | Kokulapalan Wimalanathan |
2019 | Zea mays (Maize) | W22 | Zm-W22-REFERENCE-NRGENE-2.0 Zm00004b.1 | 10.25739/e4va-9f09 | Kokulapalan Wimalanathan |
2019 | Hordeum vulgare (Barley) | | IBSC_PGSB_r1 | 10.25739/zvgv-8e37 | Colleen Yanarella |
2019 | Brachypodium distachyon (Stiff Brome) | Bd21 | Bd21.v3.1.r1 | 10.25739/dw2t-3g82 | Kokulapalan Wimalanathan |
2019 | Sorghum bicolor (Sorghum) | BTx623 | BTx623.v3.0.1.r1 | 10.25739/4ty0-ye98 | Kokulapalan Wimalanathan |
2019 | Arachis hypogaea (Peanut) | Tifrunner | Arachis hypogaea assembly 1.0 | 10.25739/chab-0e35 | Dennis Psaroudakis |
2019 | Medicago truncatula (Barrel Clover) | R108_HM340 | R108: v1.0 | 10.25739/2sqc-j140 | Dennis Psaroudakis |
2019 | Medicago truncatula (Barrel Clover) | A17_HM341 | Mt4.0v2 | 10.25739/py38-yb08 | Dennis Psaroudakis |
2019 | Phaseolus vulgaris (Common Bean) | G19833 | DOE-JGI and USDA-NIFA Phaseolus vulgaris annotation 2.0 | 10.25739/1ywe-ew96 | Dennis Psaroudakis |
2019 | Vigna unguiculata (Cowpea) | IT97K-499-35 | JGI annotation v1.1 | 10.25739/cdx9-wr97 | Dennis Psaroudakis |
2017 | Zea mays (Maize) | B73 | RefGen_v3 5b+ | 10.7946/P2S62P | Kokulapalan Wimalanathan |
2017 | Zea mays (Maize) | B73 | RefGen_V4 Zm00001d.2 | 10.7946/P2M925 | Kokulapalan Wimalanathan |