=Paper= {{Paper |id=Vol-1546/poster_75 |storemode=property |title=UniProt-GOA: A Central Resource for Data Integration and GO Annotation |pdfUrl=https://ceur-ws.org/Vol-1546/poster_75.pdf |volume=Vol-1546 |authors=Mélanie Courtot,Aleksandra Shypitsyna,Elena Speretta,Alexander Holmes,Tony Sawford,Tony Wardell,Maria J. Martin,Claire O'Donovan |dblpUrl=https://dblp.org/rec/conf/swat4ls/CourtotSSHSWMO15 }} ==UniProt-GOA: A Central Resource for Data Integration and GO Annotation== https://ceur-ws.org/Vol-1546/poster_75.pdf
    UniProt-GOA: A central resource for data
        integration and GO annotation.

Mélanie Courtot, Aleksandra Shypitsyna, Elena Speretta, Alexander Holmes,
    Tony Sawford, Tony Wardell, Maria J. Martin, and Claire O’Donovan

   European Molecular Biology Laboratory, European Bioinformatics Institute
 (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
                              United Kingdom
                            mcourtot@ebi.ac.uk



     Abstract. The Gene Ontology (GO) is a well-established, structured
     vocabulary used in the functional annotation of gene products. GO terms
     are used to replace the multiple nomenclatures used by scientific databases
     that can hamper data integration. Currently, GO consists of more than
     41,000 terms describing the molecular function, biological process and
     subcellular location of a gene product in a generic cell.
     The UniProt-Gene Ontology Annotation (UniProt-GOA) project [1] pro-
     vides high-quality manual and electronic GO annotations, historically to
     proteins within the UniProt Knowledgebase. Recently, support for anno-
     tation of RNAs via RNAcentral IDs and to macromolecular complexes,
     identified by IntAct Complex Portal IDs, was added. For many species,
     no experimental data is available: electronic annotations are the only
     source of information for biological investigation, and it is therefore crit-
     ical that solid pipelines for data integration across multiple resources
     be implemented. For example, we rely on Ensembl [2] to project GO
     annotations automatically according to orthology between species, or
     InterPro [3] to identify proteins with similar signatures to which GO
     terms describing the conserved function or location can be associated. In
     September 2015, an additional 1.5 million annotations from the UniProt
     Unified Rule (UniRule [4]) system were added electronically.
     In addition to increasing the number of annotations available, UniProt-
     GOA also supports, as part of manual curation, the addition of infor-
     mation about the context of a GO term, such as the target gene or the
     location of a molecular function, via annotations extensions. For exam-
     ple, we can now describe that a gene product is located in a specific
     compartment of a specific cell type (e.g., gene product that localizes to
     the nucleus of a keratinocyte [5]). Annotation extensions are amenable to
     sophisticated queries and reasoning. A typical use case is for researchers
     studying a protein that is causative of a specific rare cardiac phenotype:
     they will be more interested in specific cardiomyocytes cell differentiation
     proteins than all proteins involved in cell differentiation.
     Annotation files for various reference proteomes are released monthly, in-
     cluding human, mouse, rat, zebrafish, cow, chicken, dog, pig, Arabidopsis
     and Dictyostelium, as well as a file for the multiple species within UniPro-
     tKB. The UniProt-GOA dataset can be queried through our user-friendly
       QuickGO browser6 or downloaded in a parsable format via the EMBL-
       EBI [7] and GO Consortium FTP [8] sites.
       UniProt-GOA is the largest and most comprehensive open-source con-
       tributor of annotations to the GO Consortium annotation effort. The
       UniProt-GOA dataset has increasingly been integrated into tools that
       aid in the analysis of large datasets resulting from high-throughput ex-
       periments thus assisting researchers in biological interpretation of their
       results.

       Keywords: gene ontology, curation, data integration, uniprot


References
1. UniProt-GOA website, http://www.ebi.ac.uk/GOA
2. The Ensembl project, http://www.ensembl.org/index.html
3. InterPro: protein sequence analysis & classification, http://www.ebi.ac.uk/interpro/
4. UniRule,       Rule-based       evidence        for      UniProtKB       annotation,
   http://www.uniprot.org/help/unirule
5. Huntley, Rachael P., et al. A method for increasing expressivity of Gene Ontology
   annotations using a compositional approach. BMC bioinformatics 15.1 (2014): 155.
6. QuckGO, A fast browser for Gene Ontology terms and annotations,
   http://www.ebi.ac.uk/QuickGO
7. EBI FTP server, ftp://ftp.ebi.ac.uk/pub/databases/GO/goa
8. Gene Ontology FTP, ftp://ftp.geneontology.org/pub/go/gene-associations