----------------------------------------------------------------------------
Pentacon NSAID Project Curation
Princeton University, NJ, USA
University of Pennsylvania, PA, USA
----------------------------------------------------------------------------

Readme file name: Curation_AAP_Readme_20131220_production.txt

Readme for the following files:
Curation_AAP_All_Topics_20131220_production.txt
Curation_AAP_IVM_20131220_production.txt
Curation_AAP_Kinetics_20131220_production.txt

Total Number of Genes: 134
Number of Gold Standard (Direct) Genes: 103
Number of Likely (Indirect) Genes: 12
Number of Predicted Genes: 18

Corresponding gene file: AAP_genelist_20131220_production.txt
Version: production
Date: 12/20/13

Curation Overview
-----------------
Curation was performed by Pentacon curators based on information available in UniProt, BRENDA, BindingDB, and the published literature. The curation process involved verifying a specific list of genes for their involvement in certain pathways based on experimental evidence or valid reviews.

Data pertaining to kinetics and tissue/cell type expression of these genes/proteins were parsed from UniProt, BRENDA, and BindingDB. OntoMaton spreadsheets were pre-populated with the resulting data types, and the data types were reviewed for accuracy by curation of the associated papers. In some cases data were added directly from the published literature rather than from the databases.

The pre-populated data types were converted to the requested ontology terms and corresponding IDs in the Google spreadsheet using OntoMaton. Information concerning OntoMaton is available here:
http://isatools.wordpress.com/2012/07/13/introducing-ontomaton-ontology-search-tagging-for-google-spreadsheets/

Gene Lists:
-----------
The curation in this file has been carried out for genes listed in the associated AAP_genelist_20131220_production.txt file. This gene list includes the curated "Gene Set" AAP (Arachidonic Acid Pathway).
Each gene list includes information on the various curated "Gene Sets" and "Gene Set Qualifiers" that are used to rank the genes as being "Gold Standard (Direct)", "Likely (Indirect)", or "Predicted" based on evidence codes and details for each gene, as described below:

Evidence codes used in gene lists:
The evidence code C is used to denote review articles.
The evidence code E is used for articles that present experimental evidence including, but not limited to, tissue distribution and enzyme characterization.
The evidence code P is used for publications that (1) predict presence based on evidence in mice/rabbits, (2) use bioinformatics tools to identify human genes, and (3) contain non-traceable author statements. Bioinformatics approaches include using conserved sequence motifs to identify candidate genes, and using a known human gene to identify sequences with significant identity (and finding the cDNA in an EST database).
If there are multiple PubMed IDs in the PubMed ID column but only one evidence code in the Evidence Type column, all PubMed IDs were assigned the same evidence code. If there are multiple evidence codes, each evidence code applies to the corresponding PMID in the PubMed ID column.

Gold Standard (Direct):
The gene set qualifier "Gold Standard" is assigned when experimental evidence demonstrates involvement of the gene in the arachidonic acid metabolism and arachidonic acid remodeling pathways. Experimental evidence means that an enzyme has been assayed with substrates that are in these pathways, a receptor binds ligands in these pathways, or the protein interacts with another protein in the pathway. Genes assigned the "Gold Standard" gene set qualifier can be used for computational analyses.

Likely (Indirect):
The gene set qualifier "Likely" is assigned when genes 'likely' participate in the arachidonic acid metabolism and remodeling pathway. Genes are assigned "Likely" when there is experimental evidence for the predicted/expected activity with a relevant probe substrate, but no definitive experimental evidence for participation in the arachidonic acid metabolism and remodeling pathway. For example, an enzyme expected to be involved in AA remodeling for which activity was demonstrated using palmitic, oleic, or linoleic acid, but not arachidonic acid, would be assigned the gene set qualifier "Likely". These genes can be included in a computational analysis at the programmer's discretion.

Predicted:
The gene set qualifier "Predicted" is assigned when genes have been inferred to be involved in the arachidonic acid metabolism and remodeling pathways based on (1) evidence from other organisms or (2) homology. "Predicted" is assigned to genes for which participation in the arachidonic acid pathway is purely predicted and/or whose gene products have not been characterized. These genes should not be used in a computational analysis.

--------------------------------------------------------------
Curation File Information
--------------------------------------------------------------
Pentacon UniProt Parsing Reference available at this link:
https://docs.google.com/document/d/1q0a0K1GBP8OCpKL_H69AtKHGpWUIJjxoRSeS7pUWt5E/edit

Pentacon BRENDA Parsing Reference available at this link:
https://docs.google.com/document/d/12qg9NNOgD7yY9AVzH2cBci4O4g016My303ziFO6vkYY/edit?usp=sharing

Pentacon BindingDB Parsing Reference available at this link:
https://docs.google.com/document/d/1-LwkRlLiGVuCunMlWzHVAZERQ1Wl58nZ40mh8L_sQLE/edit?usp=sharing

These documents describe how the OntoMaton curation spreadsheets are generated by parsing UniProt, BRENDA, or BindingDB entries. Data in the spreadsheets are then verified by curators who make use of OntoMaton to convert the curated information into standardized ontology terms that may be used in computational analyses.
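
Example (illustrative only): the evidence-code and gene set qualifier rules from the Gene Lists section above could be applied in code roughly as sketched below. This is a minimal sketch, not part of the Pentacon release. The function names, the separator used between multiple values in a cell, the assumption that codes and PMIDs appear in the same order, and the exact spelling of the qualifier strings are all assumptions and should be checked against the actual gene list file.

```python
def pair_evidence(pubmed_ids, evidence_codes, sep=","):
    """Pair each PubMed ID with its evidence code (C, E, or P).

    Assumption: multiple values in a cell are separated by `sep`;
    the readme does not state which separator the gene list uses.
    """
    pmids = [p.strip() for p in pubmed_ids.split(sep) if p.strip()]
    codes = [c.strip() for c in evidence_codes.split(sep) if c.strip()]
    # A single evidence code applies to every PubMed ID in the cell.
    if len(codes) == 1:
        codes = codes * len(pmids)
    # Otherwise each code is taken to belong to the PMID in the same position.
    if len(codes) != len(pmids):
        raise ValueError("evidence codes do not line up with PubMed IDs")
    return list(zip(pmids, codes))


def usable_for_analysis(qualifier, include_likely=False):
    """Apply the readme's guidance on which genes to analyse.

    Assumption: qualifier values start with "Gold Standard",
    "Likely", or "Predicted".
    """
    q = qualifier.strip().lower()
    if q.startswith("gold standard"):
        return True            # can be used for computational analyses
    if q.startswith("likely"):
        return include_likely  # left to programmer discretion
    return False               # "Predicted" genes should not be used


pairs_shared = pair_evidence("111, 222", "E")
pairs_each = pair_evidence("111, 222", "E, C")
```

The PubMed IDs in the usage lines are placeholders, not real references. Leaving include_likely off by default reflects the readme's statement that "Likely" genes are included only at the programmer's discretion.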

Curation Topics and Ontologies:
-------------------------------
For each curation topic, entries are grouped by source database (UniProt, BRENDA, BindingDB).

Curation Topic: Any conditional (drug)
    UniProt data type: Various UniProt fields
        Conversion to Ontology: ChEBI
        Ontology: http://www.obofoundry.org/cgi-bin/detail.cgi?id=chebi
    BRENDA data type: Various BRENDA fields
        Conversion to Ontology: ChEBI
        Ontology: http://www.obofoundry.org/cgi-bin/detail.cgi?id=chebi
    BindingDB data type: Various BindingDB fields
        Conversion to Ontology: ChEBI
        Ontology: http://www.obofoundry.org/cgi-bin/detail.cgi?id=chebi

Curation Topic: Any conditional (environmental)
    UniProt data type: None
        Conversion to Ontology: PATO
        Ontology: http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=PATO

Curation Topic: Cell-type specificity/dependency
    UniProt data type: Comment/tissue specificity
        Ontology: Cell Ontology
        Ontology URL: http://purl.obolibrary.org/obo/cl.owl
    BRENDA data type: Comment/tissue specificity
        Ontology: BRENDA Tissue Ontology
        Ontology URL: http://purl.bioontology.org/ontology/BTO
    Other Ontology: Cell Ontology
    Other Ontology URL: http://purl.obolibrary.org/obo/cl.owl

Curation Topic: Cell line
    UniProt data type: Comment/tissue specificity
        Ontology: Cell Line Ontology
        Ontology URL: http://purl.obolibrary.org/obo/cl.owl
    BRENDA data type: Comment/tissue specificity
        Ontology: BRENDA Tissue Ontology
        Ontology URL: http://purl.bioontology.org/ontology/BTO
    Ontology: BRENDA Tissue Ontology
    Ontology URL: http://purl.bioontology.org/ontology/BTO

Curation Topic: Cellular location
    UniProt data type: Comment/subcellular location
        Conversion to Ontology: GO Cellular Component
        Ontology: http://www.obofoundry.org/cgi-bin/detail.cgi?id=cellular_component

Curation Topic: Pathway
    UniProt data type: Comment/pathway; Comment/function
        Conversion to Ontology: GO Biological Process
        Ontology: http://www.obofoundry.org/cgi-bin/detail.cgi?id=biological_process

Curation Topic: Human Disease
    UniProt data type: Comment/Involvement in disease
        Conversion to Ontology: Human Disease Ontology
        Ontology URL: http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=DOID
    BRENDA data type: Various BRENDA fields
        Conversion to Ontology: Human Disease Ontology
        Ontology URL: http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=DOID
    Other Ontology: SNOMED
    Other Ontology URL: http://purl.bioontology.org/ontology/SNOMEDCT
    Note that disease associations for BRENDA data are curated only for cell lines or cells that are derived from tissue in the diseased state.
    For disease information derived from UniProt, notes are included in the "Secondary Source Notes" column of the file that indicate whether a disease is caused by a mutation in a gene, or else is associated with variations in a gene.

Curation Topic: Tissue specificity/dependency
    UniProt data type: Comment/tissue specificity
        Conversion to Ontology: BRENDA Tissue Ontology
        Ontology URL: http://purl.bioontology.org/ontology/BTO
        Other Ontology: Uberon
        Other Ontology URL: http://purl.bioontology.org/ontology/UBERON
    BRENDA data type: Comment/tissue specificity
        Conversion to Ontology: BRENDA Tissue Ontology
        Ontology URL: http://purl.bioontology.org/ontology/BTO
        Other Ontology: Uberon
        Other Ontology URL: http://purl.bioontology.org/ontology/UBERON

Curation Topic: Developmental stage
    UniProt data type: Comment/developmental stage
        Conversion to Ontology: BRENDA Tissue Ontology
        Ontology URL: http://purl.bioontology.org/ontology/BTO
        Other Ontology: Human Developmental Anatomy Ontology
        Other Ontology URL: not used

Curation Topic: Phenotypes
    UniProt data type: Comment/polymorphism OR Comment/disruption phenotype
        Conversion to Ontology: Human Phenotype Ontology
        Ontology URL: Not used for human disease phenotypes

Curation Topic: Isoforms
    UniProt data type: Comment/alternative products/isoform
        Conversion to Ontology: None
        Ontology URL: None
        Secondary IDs: UniProt VSP_ IDs

Curation Topic: Variants/Genetic Background
    UniProt data type: Feature/sequence variant
        Conversion to Ontology: None
        Ontology URL: None
        Secondary IDs: UniProt VAR_ IDs and dbSNP rs IDs

Curation Topic: Kinetics
    UniProt data type: Comment/biophysicochemical properties/KM or Vmax
        Conversion to Ontology: None
        Ontology URL: None
    BRENDA data type: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off
        Conversion to Ontology: None
        Ontology URL: None
    BindingDB data type: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off
        Conversion to Ontology: None
        Ontology URL: None

NOTE: For the following files, rows with Secondary Source = Pentacon were added by Pentacon curators and tied to relevant PubMed IDs. Otherwise, the rows are based on information parsed from other resources, converted to ontology terms where applicable, and verified by Pentacon curators.

Curation files:
----------------------
Curation_AAP_All_Topics_20131220_production.txt:
All curation topics above except for isoforms, variants and kinetics

Curation_AAP_IVM_20131220_production.txt:
Compilation of Isoforms, sequence Variants, and Mutagenesis sites (IVMs) for the curated genes. The information was initially parsed from UniProt which is identified as a "Secondary Source" (external resource), as applicable. UniProt does not assign any specific ID to the full-length protein in each entry if there aren't any alternative splice forms. For this reason, for each UniProt entry that only has a full-length protein documented, and does not have any Alternative products specified, an entry for "Comment/alternative products/isoform" was manually generated by PENTACON, along with a Data Value of "-1; Isoform 1" (e.g. P04180-1;Isoform 1 for LCAT). For these cases, the Secondary Source was marked as "UniProt-derived" and the internal PENTACON Notes includes the information that the isoform is "full length (doesn't exist in UniProtKB)", meaning the -1 ID does not exist in UniProt.
The corresponding "Secondary Source Version Date" and "Secondary Source Version" contain the date and version of the UniProt entry that was used to retrieve other protein information, e.g. Feature/mutagenesis site or Feature/sequence variant.

Curation_AAP_Kinetics_20131220_production.txt:

Columns for "Curation_AAP_All_Topics_20131220_production.txt" and "Curation_AAP_IVM_20131220_production.txt" files:
----------------------------------------------------------------------------
(---Notes provided for specific columns where applicable.)

Curator Name
UniProt ID
NCBI Gene ID
Gene Name
Species Name
Taxonomy ID
Data Type ---See curation topics above
Data Value ---Text parsed according to UniProt or BRENDA data type listed in curation topics above, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
GO Term ---GO CC or BP terms (some GO MF terms). Terms already associated with a gene in GO have Use = N.
GO ID
Uberon Term
Uberon ID
BRENDA Tissue Ontology Term
BRENDA Tissue Ontology ID
Anatomy Ontology Term ---Not used for human curation, used for model organism curation.
Anatomy Ontology ID ---Not used for human curation, used for model organism curation.
Phenotype Ontology Term ---Not used for human curation, used for model organism curation.
Phenotype Ontology ID ---Not used for human curation, used for model organism curation.
Developmental Stage Term (BRENDA)
Developmental Stage ID (BRENDA)
Cell Type Ontology Term (CTO or CL)
Cell Type Ontology ID (CTO or CL)
Cell Line Ontology Term (CLO)
Cell Line Ontology ID (CLO)
PATO Qualifier
PATO ID
OMIM ID ---Parsed from UniProt text (Comment/Involvement in disease)
Human Disease Ontology Term
Human Disease Ontology ID
SNOMED Term ---Added if no applicable Human Disease Ontology term found
SNOMED ID
ChEBI Chemical Term
ChEBI Chemical ID
Secondary Source IDs ---IDs that are assigned by the database from which the data was originally parsed.
For example, secondary source IDs may be UniProt IDs (e.g. variants have VAR_IDs, isoforms have VSP_IDs), EC numbers (assigned by BRENDA), or BindingDB IDs. The source of the ID is noted in the Secondary Source column (see below). When the Secondary Source is PENTACON, there is no Secondary Source ID.
Secondary Source Notes ---Notes parsed from secondary resources (e.g. UniProt, BRENDA). For disease information derived from UniProt, notes are included in the "Secondary Source Notes" column that indicate whether a disease is caused by a mutation in a gene, or else is associated with variations in a gene.
Reviewed (Y/N) ---Reviewed by a Pentacon curator (Y=yes, N=no)
Use (Y/N) ---Curation for that row is valid and can be used (Y=yes, N=no)
Evidence Code ---Evidence Code Ontology ID (ECO:0000311 = Imported Information, ECO:0000006 = Experimental, ECO:0000033 = traceable author statement, ECO:0000035 = no biological data found)
Primary Source ---Typically the PubMed ID
Secondary Source ---Typically an external resource, e.g. UniProtKB, BRENDA. If Secondary Source = Pentacon, then the curated information and ontology terms were added by Pentacon curators and associated with relevant PubMed IDs.
PENTACON Notes ---Pentacon curator notes; may contain relevant citation information for the associated papers, and also additional notes that are preceded by the phrase "Pentacon Notes". Additional rows of data added by Pentacon curators and not directly parsed from another resource are indicated as "Added by Pentacon".
Gene Set ---Abbreviation used to indicate the corresponding gene collection relevant to a certain biological pathway or network, e.g. AAP is used for genes directly involved in the 'arachidonic acid pathway', AAE is used for genes related to the arachidonic acid pathway, and BP refers to genes related to the phenotype of blood pressure.
Secondary Source Version Date ---Version date of downloaded information from external resource (e.g. UniProtKB, BRENDA, BindingDB) for each annotation row.
Secondary Source Version ---Version number of downloaded information from external resource (e.g. UniProtKB, BRENDA, BindingDB) for each annotation row.
PENTACON Annotation No ---Unique annotation ID assigned by Pentacon.

Columns for "Curation_AAP_Kinetics_20131220_production.txt" file:
----------------------------------------------------------------------------
(---Notes provided for specific columns where applicable.)

Curator Name
UniProt ID
NCBI Gene ID
Gene Name
Species Name
Taxonomy ID
Data Type ---See curation topics above
Data Value ---Text parsed according to UniProt, BRENDA, or BindingDB data type listed in curation topics above, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
Value ---Text parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
Unit ---Units (e.g. pmol/min/mg) parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, units added by Pentacon curators.
ChEBI Chemical Term A ---Chemical term parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, and converted to ChEBI term. In cases where Secondary Source = PENTACON, chemical terms were added by Pentacon curators.
ChEBI Chemical ID A
ChEBI Chemical Term B ---Additional chemical entry converted to ChEBI term when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate.
ChEBI Chemical ID B
Secondary Source IDs ---IDs that are assigned by the database from which the data was originally parsed.
For example, secondary source IDs may be UniProt IDs (e.g. variants have VAR_IDs, isoforms have VSP_IDs), EC numbers (assigned by BRENDA), or BindingDB IDs. The source of the ID is noted in the Secondary Source column (see below). When the Secondary Source is PENTACON, there is no Secondary Source ID.
Secondary Source Notes ---Notes parsed from secondary resources (e.g. UniProt, BRENDA, BindingDB)
Reviewed (Y/N) ---Reviewed by a Pentacon curator (Y=yes, N=no)
Use (Y/N) ---Curation for that row is valid and can be used (Y=yes, N=no)
Evidence Code ---Evidence Code Ontology ID (ECO:0000311 = Imported Information, ECO:0000006 = Experimental, ECO:0000033 = traceable author statement, ECO:0000035 = no biological data found)
Primary Source ---Typically the PubMed ID
Secondary Source ---Typically an external resource, e.g. UniProtKB, BRENDA, BindingDB. If Secondary Source = Pentacon, then the curated information and ontology terms were added by Pentacon curators and associated with relevant PubMed IDs.
PENTACON Notes ---Pentacon curator notes; may contain relevant citation information for the associated papers, and also additional notes that are preceded by the phrase "Pentacon Notes". Additional rows of data added by Pentacon curators and not directly parsed from another resource are indicated as "Added by Pentacon".
Gene Set ---Abbreviation used to indicate the corresponding gene collection relevant to a certain biological pathway or network, e.g. AAP is used for genes directly involved in the 'arachidonic acid pathway', AAE is used for genes related to the arachidonic acid pathway, and BP refers to genes related to the phenotype of blood pressure.
Secondary Source Version Date ---Version date of downloaded information from external resource (e.g. UniProtKB) for each annotation row.
Secondary Source Version ---Version number of downloaded information from external resource (e.g. UniProtKB) for each annotation row.
PENTACON Annotation No ---Unique annotation ID assigned by Pentacon.

----------------------------------------------------------------------------
For questions please contact Rose Oughtred (rose at genomics.princeton.edu).
----------------------------------------------------------------------------
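
Example (illustrative only): the "Reviewed (Y/N)" and "Use (Y/N)" columns described above can be used to keep only rows that a curator has reviewed and marked as valid. The sketch below is not part of the Pentacon release. It assumes the curation files are tab-delimited text with a header row whose names match the column lists in this readme; neither point is stated here, so check the files first. The function name and the gene names in the sample are hypothetical.

```python
import csv
import io


def usable_rows(lines, delimiter="\t"):
    """Return rows with Reviewed (Y/N) = Y and Use (Y/N) = Y.

    `lines` is any iterable of text lines, e.g. an open file.
    Assumption: the first line is a header naming the columns.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    kept = []
    for row in reader:
        reviewed = (row.get("Reviewed (Y/N)") or "").strip().upper()
        use = (row.get("Use (Y/N)") or "").strip().upper()
        if reviewed == "Y" and use == "Y":
            kept.append(row)
    return kept


# Tiny in-memory sample with hypothetical gene names, not real data.
sample = io.StringIO(
    "Gene Name\tReviewed (Y/N)\tUse (Y/N)\n"
    "GENE_A\tY\tY\n"
    "GENE_B\tY\tN\n"
    "GENE_C\tN\tY\n"
)
kept = usable_rows(sample)
kept_names = [row["Gene Name"] for row in kept]
```

With a real file, the same function could be called on a handle opened with open(path, newline=""). Rows with Use = N are dropped; according to the column notes, these include GO terms already associated with a gene in GO.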