PROFESS

Release 1.1
(2009-11-06)







Database Statistics

Function

COG Clusters 9,725
Enzyme Classes 5,093
GO Terms 27,606
Ligands 9,330
PFAM 10,340
Protein Interactions 11,422

Evolution

Essential Genes 0

Structure

CATH Classes 2,178
Protein Structures 56,699
Structure Comparisons 59,520

Sequence

Protein Sequences 292,953
Sequence Comparisons 95,021
Last update: 2009-11-06

Development Team

Contact:    Thomas Triplet


Database Design and Programming
Peter Revesz
Thomas Triplet


Genomics
Mark A. Griep
Robert Powers
Matthew D. Shortridge

Frequently Asked Questions







General Questions



What is PROFESS?


PROFESS (PROtein Function, Evolution, Structure and Sequence) is a general framework that integrates various biological databases. PROFESS aims at giving an overview of genome biology systems by integrating protein annotations at different levels: function, evolution, structure and sequence.


What is not PROFESS?


PROFESS should not be considered a replacement for existing databases, but rather a complementary tool. In particular, PROFESS includes the data needed to give users a broad overview, whereas the details of the annotations remain in the original databases. For example, each COG cluster is associated with a list of Protein Families (PFAM) and a short description, but PROFESS does not include the details of those families, such as the parameters used to generate the Markov models.


Who will be interested in PROFESS?


Anyone working on genome biology systems may be interested. More specifically, large-scale analyses will benefit most from PROFESS.


How to cite PROFESS?


Coming soon...


Which browsers are compatible with PROFESS?


PROFESS is compatible with most recent browsers: Chrome, Firefox 3, Internet Explorer 8, Opera 9 and Safari 4. However, a few COG clusters contain a very large amount of data, which may cause errors in Internet Explorer, particularly in IE7 and older. Hence, we do not recommend Internet Explorer. The web interface is based on AJAX, so JavaScript must also be enabled (which is usually the default setting).




Data Available



What are the core databases of PROFESS?


PROFESS integrates the following core data sources covering the function, evolution, structure and sequence of proteins:

Function:    COG, Database of Interacting Proteins, EC, GO, KEGG Ligands, PFAM, Protein interactions in E. coli.
Evolution:   Database of Essential Genes.
Structure:   CATH, PDB, SCOP.
Sequence:  COG, PDB.


We also provide links to GenBank, PubMed and UniProtKB Taxonomy.


What other data can I find?


We also provide the output of the BLAST alignments used to correlate sequences in the PDB with COG sequences.


In addition, we generated all-against-all pairwise structural comparisons within and between the Proteobacteria and Firmicutes using DaliLite.


We also give sequence and structure-based phylogenetic trees for each COG cluster (when possible).


The databases I normally use are not there. What can I do?


Contact us! PROFESS will keep growing based on user feedback: highly requested databases will therefore have a higher priority.


How often will you update PROFESS?


Data from the core databases will be updated every semester. The pool of core databases will grow based on user feedback.


Why use COG as the main classification?


We also tried the Enzyme Classification, but the resulting clusters were too sparse.


Why do some COG clusters not have structure comparisons?


We generate all-against-all pairwise structural comparisons within and between the Proteobacteria and Firmicutes only for COG clusters with at least 2 structures in each phylum.
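The eligibility rule above can be illustrated with a short Python sketch. The function name and the input format (one phylum label per solved structure in a cluster) are assumptions made for illustration; they do not describe how PROFESS stores or processes its data.

```python
from collections import Counter

# The two phyla named in the rule above.
PHYLA = ("Proteobacteria", "Firmicutes")


def has_structure_comparisons(structure_phyla, minimum=2):
    """Return True if a cluster has at least `minimum` structures in each phylum.

    `structure_phyla` is a hypothetical input: one phylum name per structure
    assigned to the COG cluster.
    """
    counts = Counter(structure_phyla)
    # Counter returns 0 for a phylum that never appears.
    return all(counts[phylum] >= minimum for phylum in PHYLA)
```

Under this sketch, a cluster with three Proteobacteria structures and one Firmicutes structure would not qualify, because the Firmicutes count is below two.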


Why do some COG clusters not have phylogenetic trees?


Structure-based phylogenetic trees are based on structure comparisons. All structure comparisons were manually filtered within their respective COG clusters to remove redundantly solved structures, structures that were solved in multiple or non-functionally relevant conformations (mutant protein, non-native experimental conditions, inhibited ligand complex), and structures that only had partial domains solved. All ribosomal COGs were also removed from the dataset, because the BLAST search proved not robust enough to distinguish between the multiple subunits of the ribosome. As a result, fewer phylogenetic trees are available, but they are more reliable.


How are the structure-based trees generated?


Bootstrapped structure-similarity trees were generated by in-house software that reads the output of MAMMOTH-mult. Distances were minimized by the Fitch-Margoliash method implemented in Phylip.




Querying PROFESS



What is the PROFESSor?


The "PROFESSor" refers to the unified text field used to quickly query PROFESS. It will mine any data from any integrated database.


How to use the PROFESSor?


Just type a keyword, for example selenocysteine. The PROFESSor will suggest entries from all integrated databases. Suggestions may help you refine your query, although selecting a suggestion is optional.



You may also restrict suggestions to a particular database of your choice by prefixing the keyword with [KEY], where KEY depends on the database and may be one of the following: EC, COG, GO, LIGAND, PDB, PFAM.
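As an illustration, a restricted query string could be assembled as in the Python sketch below. The helper name is hypothetical, and the single space between the prefix and the keyword, as well as the upper-case spelling of the keys, are assumptions; the text above only states that the keyword is prefixed with [KEY].

```python
# Database keys listed above.
KNOWN_KEYS = ("EC", "COG", "GO", "LIGAND", "PDB", "PFAM")


def prefixed_query(key, keyword):
    """Build a query that restricts suggestions to one database.

    Hypothetical helper: the exact spacing after the prefix is an assumption.
    """
    if key not in KNOWN_KEYS:
        raise ValueError(f"unknown database key: {key!r}")
    return f"[{key}] {keyword.strip()}"
```

For example, `prefixed_query("PFAM", "selenocysteine")` produces the text to type into the PROFESSor field to see only PFAM suggestions.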



Alternatively, you can type the ID commonly used to refer to an item. For example, you can type the PDB ID of a protein structure.

Note that if you search for carboxylase, for example, decarboxylase entries will not be returned unless no carboxylase entry can be found.
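One way to picture this matching behaviour is a two-pass search: first look for entries in which a word starts with the keyword, and fall back to plain substring matching only if nothing is found. The Python sketch below is an assumption about how such a rule could work, not the actual PROFESS implementation, and the entry names are invented for the example.

```python
import re


def suggest(keyword, entries):
    """Return entries matching `keyword`, preferring word-start matches."""
    # Pass 1: the keyword must begin at a word boundary,
    # so "carboxylase" does not match inside "decarboxylase".
    pattern = re.compile(r"\b" + re.escape(keyword), re.IGNORECASE)
    word_matches = [entry for entry in entries if pattern.search(entry)]
    if word_matches:
        return word_matches
    # Pass 2 (fallback): accept the keyword anywhere in the entry.
    needle = keyword.lower()
    return [entry for entry in entries if needle in entry.lower()]


entries = ["pyruvate carboxylase", "ornithine decarboxylase"]
print(suggest("carboxylase", entries))      # only the carboxylase entry
print(suggest("carboxylase", entries[1:]))  # falls back to the decarboxylase entry
```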


How to run more advanced queries?


To run more advanced queries, go to the Advanced Search page. The advanced search is similar to the PROFESSor, except that you may specify a value for several databases at once. Note that the query returns only results that match all completed fields (logical AND).
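The logical AND over completed fields can be sketched in Python as follows. The record layout, the field names, and the case-insensitive substring comparison are all assumptions chosen for illustration; only the AND semantics and the fact that empty fields are ignored come from the description above.

```python
def matches_all(record, criteria):
    """Return True if `record` satisfies every completed field in `criteria`."""
    for field, wanted in criteria.items():
        if not wanted:
            # A field left empty places no constraint on the result.
            continue
        value = str(record.get(field, ""))
        if wanted.lower() not in value.lower():
            return False
    return True


def advanced_search(records, criteria):
    """Keep only the records that match all completed fields (logical AND)."""
    return [record for record in records if matches_all(record, criteria)]
```

With this sketch, filling in two fields narrows the results to records that satisfy both, while a field left blank is simply skipped.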




For Developers



How to download parseable data?


All tables can be downloaded as plain text in CSV (Comma Separated Values) format. We provide two ways to download the data:

  1. Using the web interface, just click on the green down arrow in the header of each module.

     
  2. Using HTTP GET requests of the form:
     
    http://cse.unl.edu/~profess/get_csv/module_@@.php?cogNumber=##

    where @@ is the name of the module and ## is the COG number.

    The following modules are currently available (case sensitive):
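As a minimal sketch of the GET request form described above, the Python code below builds the download URL from a module name and a COG number, fetches it, and parses the CSV text. The function names, the module name `ExampleModule`, the COG number 583, and the UTF-8 decoding are placeholders and assumptions, not values confirmed by PROFESS; the server may also no longer respond at this address.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Base of the URL pattern given above.
BASE_URL = "http://cse.unl.edu/~profess/get_csv"


def build_csv_url(module, cog_number):
    """Return the download URL for one module and one COG number."""
    # Module names are case sensitive, so the name is used exactly as given.
    query = urlencode({"cogNumber": cog_number})
    return f"{BASE_URL}/module_{module}.php?{query}"


def parse_csv(text):
    """Split CSV text into a list of rows, each row a list of strings."""
    return list(csv.reader(io.StringIO(text)))


def fetch_table(module, cog_number, timeout=30.0):
    """Download one table and return its rows (requires network access)."""
    url = build_csv_url(module, cog_number)
    with urlopen(url, timeout=timeout) as response:
        # Assumption: the server returns UTF-8 compatible text.
        text = response.read().decode("utf-8")
    return parse_csv(text)


# "ExampleModule" and 583 are hypothetical values used only to show the URL shape.
print(build_csv_url("ExampleModule", 583))
```

Keeping URL construction separate from the download makes it possible to check the generated address without contacting the server.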


What about FTP?


Files are generated on demand and are not stored on our server. Hence, we do not provide FTP access to our data. However, we provide other means to download data in CSV format (see above).


How to write new modules?


We currently do not offer this possibility. Eventually, we would like to provide an API so that researchers can develop their own modules to identify new relations and share them with the community.