PROFESS

Release 1.5
(2013-08-25)







Database Statistics

Function

eggNOG Clusters 224,847
Enzyme Classes 5,093
GO Terms 27,606
Ligands 9,330
PFAM 10,340
Protein Interactions 67,113

Evolution

Essential Genes 6,099

Structure

CATH Classes 2,178
Protein Structures 56,699
Structure Comparisons 401,967

Sequence

Protein Sequences 10,891,633

Disease

Pancreatic cancer 2,013

Other

Taxonomy 558,282
Last update: 2013-08-25

Development Team

Contact:    Thomas Triplet


Database Design and Programming
Peter Revesz
Thomas Triplet


Genomics
Mark A. Griep
Robert Powers
Matthew D. Shortridge
Jaime Stark

Frequently Asked Questions




General Questions | Data Available | Querying PROFESS | For Developers



General Questions


What is PROFESS?


PROFESS (PROtein Function, Evolution, Structure and Sequence) is a general framework that integrates various biological databases. PROFESS aims to give an overview of biological systems by integrating protein annotations at different levels: function, evolution, structure and sequence.


What is not PROFESS?


PROFESS should not be considered a replacement for existing databases, but rather a complementary tool. In particular, PROFESS includes the data necessary to give users a broad overview, whereas the details of the annotations are left in the original databases. For example, each orthologous cluster is associated with a list of protein families (PFAM) and a short description, but PROFESS does not include the details of those families, such as the parameters used to generate the Markov models.


Who will be interested in PROFESS?


Anyone working in biology, bioinformatics, genomics or proteomics may be interested. More specifically, large-scale analyses of molecular biological data will benefit most from PROFESS.


How to cite PROFESS?


If PROFESS was useful to your research, please cite:

T. Triplet, M. Shortridge, M. Griep, J. Stark, R. Powers, and P. Revesz. PROFESS: a PROtein Function, Evolution, Structure and Sequence database. Database: the journal of biological databases and curation, 2010, p. baq011.

T. Triplet, M. Shortridge, M. Griep, R. Powers, and P. Revesz. PROFESS: PROtein Functions, Evolution, Structures and Sequences. 11th International Congress on Amino Acids, Peptides and Proteins, Vienna, Austria: 2009, p. 95.

The full-text article is freely available online.


Which browsers are compatible with PROFESS?


PROFESS is compatible with most recent browsers: Chrome, Firefox 3, Internet Explorer 8, Opera 9 and Safari 4. However, some clusters contain a very large amount of data, which may cause errors in Internet Explorer, particularly IE7 and older. We therefore do not recommend Internet Explorer. The web interface is based on AJAX, so JavaScript must be enabled (it usually is by default).




Data Available


What are the core databases of PROFESS?


PROFESS integrates the following core data sources covering the function, evolution, structure and sequence of proteins:

Function:    eggNOG, Database of Interacting Proteins, EC, Gene Ontology, KEGG Ligands, PFAM, Protein interactions in E. coli.
Evolution:   Database of Essential Genes.
Structure:   CATH, PDB, SCOP.
Sequence:  Swiss-Prot, TrEMBL.


We also provide links to GenBank, PubMed and UniProtKB Taxonomy. The UniProtKB mapping service was also used to map entries between some of the core databases.


What other data can I find?


In addition, we generated all-against-all pairwise structural comparisons for all protein structures within their respective orthologous cluster using DaliLite. This data gives an overview of how well conserved a protein structure is within a particular orthologous group.


We also provide a sequence-based phylogenetic tree for each orthologous cluster. For orthologous groups with more than three protein structures, we also report a structure-based phylogenetic tree when possible.


The databases I normally use are not there. What can I do?


Contact us! PROFESS will keep growing based on user feedback, and highly requested databases will be given higher priority.


How often will you update PROFESS?


Data from the core databases will be updated every semester. The pool of core databases will grow based on user feedback.


Why is eggNOG used as the main classification?


The eggNOG dataset was used because of the robust nature of defining orthologous groups and the relative ease of linking with the Protein Data Bank (PDB) using associated UniProt accession numbers.


Why do some eggNOG clusters not have structure comparisons?


If an orthologous cluster is missing structure comparison information, it is because fewer than two structures have been solved for that cluster.
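To illustrate why a cluster needs at least two structures, the all-against-all scheme can be sketched as enumerating every pair of structures in a cluster. This is only an illustration under stated assumptions, not the PROFESS pipeline: the function name and the structure IDs are hypothetical, and the sketch treats each comparison as an unordered pair, which may differ from how the DaliLite runs were actually organised.

```python
from itertools import combinations


def comparison_pairs(structure_ids):
    """Return every unordered pair of distinct structures in one cluster.

    Hypothetical helper: it only enumerates the pairs that would be
    compared; it does not run any structural alignment.
    """
    unique = sorted(set(structure_ids))
    if len(unique) < 2:
        # Fewer than two structures: there is nothing to compare.
        return []
    return list(combinations(unique, 2))


# Placeholder IDs, not real database entries.
pairs = comparison_pairs(["1abc", "2def", "3ghi", "4jkl"])
print(len(pairs))  # n * (n - 1) / 2 pairs for n structures
```

The number of pairs grows quadratically with cluster size, which is one reason large clusters can produce a lot of comparison data.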


Why do some eggNOG clusters not have structure-based phylogenetic trees?


Structure-based phylogenetic trees are built from structure comparisons. If fewer than three structures have been solved for a particular orthologous cluster, the tree would be meaningless.


How are the sequence-based trees generated?


All sequence-based trees were downloaded from the eggNOG database. Images were generated using the Drawtree program implemented in PHYLIP. Because many orthologous clusters have many branches, the images are of low resolution, but they give an overall view of the relative bushiness of a cluster. For more information about sequence-based trees, please refer to the eggNOG database.


How are the structure-based trees generated?


Structure similarity trees were generated from the structure-based sequence alignments produced by MAMMOTH-mult. Our in-house software measured branch distances, which were minimized by the Neighbor-Joining method. Images were generated using the Drawtree program implemented in PHYLIP.




Querying PROFESS


What is the PROFESSor?


The "PROFESSor" refers to the unified text field used to quickly query PROFESS. It will mine any data from any integrated database.


How to use the PROFESSor?


Just type a keyword, for example selenocysteine. The PROFESSor will suggest entries from all integrated databases. Suggestions may help you refine your query, although selecting a suggestion is optional.



You may also restrict suggestions to a particular database of your choice by prefixing the keyword with [KEY], where KEY depends on the database and may be one of the following: CATH, EC, GO, LIGAND, NOG, PDB, PFAM, TAXON.
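As a rough illustration of the prefix syntax, the sketch below splits a query into its optional database key and the remaining keyword. It is not the PROFESSor's own parser: the function name is hypothetical, and the handling of whitespace after the bracket, of key case, and of unknown keys is assumed.

```python
import re

# Keys listed above; assumed here to be upper case and case sensitive.
KEYS = {"CATH", "EC", "GO", "LIGAND", "NOG", "PDB", "PFAM", "TAXON"}

# Matches "[KEY]" at the start of the query, then optional whitespace.
_PREFIX = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def split_query(query):
    """Return (key, keyword); key is None when no known prefix is present."""
    query = query.strip()
    match = _PREFIX.match(query)
    if match and match.group(1) in KEYS:
        return match.group(1), match.group(2)
    # Assumption: an unknown or missing prefix means a plain keyword search.
    return None, query


print(split_query("[PDB] kinase"))
print(split_query("selenocysteine"))
```

A client could use such a helper to decide which database a suggestion request should be limited to before sending the keyword.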



Alternatively, you can type the ID commonly used to refer to an item. For example, you can type the PDB ID of a protein structure.

Note that if you search for carboxylase, for example, decarboxylase entries will be returned only if no carboxylase entry can be found.


How to run more advanced queries?


To run more advanced queries, go to the Advanced Search page. The advanced search is similar to the PROFESSor, except that you may specify a value for several databases at once. Note that the query will return only results that match all completed fields (logical AND).
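The logical AND behaviour can be pictured with a small filter: empty fields are ignored, and a record is kept only if every completed field matches. This is a minimal sketch, not the server's query logic. The record layout, the field names used as dictionary keys, the sample values, and the case-insensitive substring matching are all assumptions made for illustration.

```python
def matches_all(record, criteria):
    """True if the record matches every non-empty criterion (logical AND)."""
    # Fields left empty in the form do not constrain the search.
    completed = {field: value for field, value in criteria.items() if value.strip()}
    return all(
        value.strip().lower() in record.get(field, "").lower()
        for field, value in completed.items()
    )


# Hypothetical records, not taken from PROFESS.
records = [
    {"PFAM": "Biotin carboxylase", "TAXON": "Escherichia coli"},
    {"PFAM": "Biotin carboxylase", "TAXON": "Homo sapiens"},
]
criteria = {"PFAM": "carboxylase", "TAXON": "coli", "GO": ""}
hits = [record for record in records if matches_all(record, criteria)]
print(hits)
```

Filling in more fields can therefore only narrow the result set, never widen it.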




For Developers


How to download parseable data?


All tables can be downloaded as plain text in CSV (Comma-Separated Values) format. We provide two ways to download the data:

  1. Using the web interface, just click on the green down arrow in the header of each module.

     
  2. Using HTTP GET requests of the form:
     
    http://cse.unl.edu/~profess/get_csv/module_@@.php?cogNumber=##

    where @@ is the name of the module and ## is the COG number.

    The following modules are currently available (case sensitive):
    function_summary_pfam
    linkCogPdb
    function
    daliStructureComparison
    essentialGenes
    ligands
    structure
    function_summary_go
    function_summary_ec
    sequence
    proteinInteractions
    e_sequence
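The request format above can be scripted. The following is a minimal sketch using only the Python standard library. The URL pattern and module names come from the list above, while the function names, the example COG identifier format, the timeout, and the UTF-8 decoding are assumptions. The availability and exact output of the service are not verified here.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://cse.unl.edu/~profess/get_csv"

# Module names as listed above (case sensitive).
MODULES = {
    "function_summary_pfam", "linkCogPdb", "function",
    "daliStructureComparison", "essentialGenes", "ligands",
    "structure", "function_summary_go", "function_summary_ec",
    "sequence", "proteinInteractions", "e_sequence",
}


def build_csv_url(module, cog_number):
    """Return the download URL for one module and one COG number."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module!r}")
    query = urlencode({"cogNumber": cog_number})
    return f"{BASE_URL}/module_{module}.php?{query}"


def parse_csv(text):
    """Split CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def fetch_rows(module, cog_number, timeout=30):
    """Download one table and parse it. Assumes the response is UTF-8 text."""
    with urlopen(build_csv_url(module, cog_number), timeout=timeout) as response:
        return parse_csv(response.read().decode("utf-8"))
```

For example, build_csv_url("ligands", "COG0001") produces a URL of the documented form, where "COG0001" is a placeholder identifier. Calling fetch_rows with the same arguments would perform the actual HTTP GET request.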


What about FTP?


Files are generated on demand and are not stored on our server, so we do not provide FTP access to our data. However, the data can be downloaded in CSV format (see above).


How to write new modules?


We do not currently offer this possibility. Eventually, we would like to provide an API so that researchers can develop their own modules to identify new relations and share them with the community.