
Release 1.1
(2009-11-06)
Help and Documentation
Documentation
• Getting started
• Advanced queries
• Core databases
• Modules
FAQ
• What is and what is not PROFESS?
• Who will be interested in PROFESS?
• How to query PROFESS?
• How to download the data?
Database Statistics
Function
| • COG Clusters | 9,725 |
| • Enzyme Classes | 5,093 |
| • GO Terms | 27,606 |
| • Ligands | 9,330 |
| • PFAM | 10,340 |
| • Protein Interactions | 11,422 |
Evolution
| • Essential Genes | 0 |
Structure
| • CATH Classes | 2,178 |
| • Protein Structures | 56,699 |
| • Structure Comparisons | 59,520 |
Sequence
| • Protein Sequences | 292,953 |
| • Sequence Comparisons | 95,021 |
Development Team
Contact: Thomas Triplet
Database Design and Programming
• Peter Revesz
• Thomas Triplet
Genomics
• Mark A. Griep
• Robert Powers
• Matthew D. Shortridge
General Questions | Data Available | Querying PROFESS | For Developers
PROFESS (PROtein Function, Evolution, Structure and Sequence) is a general framework that integrates various biological databases. PROFESS aims at giving an overview of genome biology systems by integrating protein annotations at different levels: function, evolution, structure and sequence.
PROFESS should not be considered a replacement for current databases, but rather a complementary tool. In particular, PROFESS includes the data necessary to give users a broad overview, whereas the details of the annotations are left in the original databases. For example, each COG cluster is associated with a list of Protein Families (PFAM) and a short description, but PROFESS does not include the details of the Protein Families, such as the parameters used to generate the Markov models.
Anyone working on genome biology systems may be interested. More specifically, large-scale analyses will benefit most from PROFESS.
Coming soon...
PROFESS is compatible with most recent browsers: Chrome, Firefox 3, Internet Explorer 8, Opera 9 and Safari 4. However, a few COG clusters contain a very large amount of data, which may cause errors in Internet Explorer, particularly in IE7 and older. Hence, we do not recommend Internet Explorer. The web interface is based on AJAX, so JavaScript must also be enabled (which is usually the default setting).
PROFESS integrates the following core data sources covering the function, evolution, structure and sequence of proteins:
• Function: COG, Database of Interacting Proteins, EC, GO, KEGG Ligands, PFAM, Protein interactions in E. coli.
• Evolution: Database of Essential Genes.
• Structure: CATH, PDB, SCOP.
• Sequence: COG, PDB.
We also provide links to GenBank, PubMed and UniProtKB Taxonomy.
We also provide the output of the BLAST alignments used to correlate sequences in the PDB with COG sequences.
In addition, we generated all-against-all pairwise structural comparisons within and between the Proteobacteria and Firmicutes using DaliLite.
We also give sequence and structure-based phylogenetic trees for each COG cluster (when possible).
Contact us! PROFESS will keep growing based on user feedback: highly requested databases will therefore have a higher priority.
Data from the core databases will be updated every semester. The pool of core databases will grow based on user feedback.
We also tried the Enzyme Classification, but the resulting clusters were too sparse.
We generate all-against-all pairwise structural comparisons within and between the Proteobacteria and Firmicutes only for COG clusters with at least 2 structures in each phylum.
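The eligibility rule above (at least 2 structures in each phylum) can be sketched as a simple counting filter. This is only an illustration, not the PROFESS implementation: the function name `eligible_cogs`, the `(cog_id, phylum)` input format and the placeholder COG identifiers are all assumptions made for the example.

```python
from collections import Counter

PHYLA = ("Proteobacteria", "Firmicutes")


def eligible_cogs(structures, min_per_phylum=2):
    """Return the COG IDs that have enough structures in every phylum.

    `structures` is an iterable of (cog_id, phylum) pairs, one per solved
    structure. This input format is hypothetical.
    """
    # Count structures per (COG, phylum), ignoring other phyla.
    counts = Counter(
        (cog, phylum) for cog, phylum in structures if phylum in PHYLA
    )
    cogs = {cog for cog, _ in counts}
    # A missing (COG, phylum) pair counts as 0 and fails the threshold.
    return sorted(
        cog
        for cog in cogs
        if all(counts[(cog, phylum)] >= min_per_phylum for phylum in PHYLA)
    )


# Made-up records with placeholder COG IDs (not real PROFESS data).
example = [
    ("COG_A", "Proteobacteria"), ("COG_A", "Proteobacteria"),
    ("COG_A", "Firmicutes"), ("COG_A", "Firmicutes"),
    ("COG_B", "Proteobacteria"), ("COG_B", "Proteobacteria"),
    ("COG_B", "Firmicutes"),
]
print(eligible_cogs(example))
```

Here `COG_B` is excluded because it has only one Firmicutes structure.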
Structure-based phylogenetic trees are based on structure comparisons. All structure comparisons were manually filtered within their respective COG clusters to remove redundantly solved structures, any structures that were solved in multiple or non-functionally relevant conformations (mutant protein, non-native experimental conditions, inhibited ligand complex), or structures that only had partial domains solved. All ribosomal COGs were removed from the dataset because the BLAST search was not robust enough to distinguish between the multiple sub-units of the ribosome. As a result, fewer phylogenetic trees are available, but they are more reliable.
Bootstrapped structure-similarity trees were generated by in-house software reading the output of MAMMOTH-mult. Distances were minimized by the Fitch-Margoliash method implemented in PHYLIP.
The "PROFESSor" refers to the unified text field used to quickly query PROFESS. It will mine any data from any integrated database.
Just type a keyword, selenocysteine for example. The PROFESSor will suggest entries from all integrated databases. Suggestions may help users refine their query, although selecting a suggestion is optional.

You may also restrict suggestions to a particular database of your choice by prefixing the keyword with [KEY], where KEY depends on the database and may be one of the following: EC, COG, GO, LIGAND, PDB, PFAM.
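As a rough illustration of the prefix convention, the sketch below splits a query such as `[GO] selenocysteine` into its database key and keyword. It is not the PROFESSor's actual parser: the function name `split_query`, the case-insensitive handling of the key and the optional space after the bracket are assumptions.

```python
import re

# Keys listed in the text above.
KNOWN_KEYS = {"EC", "COG", "GO", "LIGAND", "PDB", "PFAM"}

# Optional "[KEY]" prefix followed by the keyword (assumed syntax).
_PREFIX = re.compile(r"^\s*\[([A-Za-z]+)\]\s*(.*)$")


def split_query(query):
    """Return (key, keyword); key is None when no known prefix is present."""
    match = _PREFIX.match(query)
    if match and match.group(1).upper() in KNOWN_KEYS:
        return match.group(1).upper(), match.group(2).strip()
    # No prefix, or an unrecognized one: treat the whole text as the keyword.
    return None, query.strip()


print(split_query("[GO] selenocysteine"))
print(split_query("selenocysteine"))
```

An unrecognized prefix is left untouched here, so the whole string is searched as a plain keyword; the real interface may behave differently.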

Alternatively, you can type the ID commonly used to refer to an item. For example, you can type the PDB ID of a protein structure.
Note that if you search for carboxylase, for example, decarboxylase entries will not be returned unless no carboxylase entry can be found.
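One way to reproduce the carboxylase/decarboxylase behavior is a two-pass search: first look for entries where the keyword starts a word, and only fall back to substring matches when that first pass finds nothing. The sketch below assumes this rule; the exact matching logic used by PROFESS is not documented here, and the function name `search` and the entry strings are illustrative only.

```python
import re


def search(keyword, entries):
    """Prefer entries where `keyword` starts a word; else use substrings."""
    # \b requires a word boundary before the keyword, so "carboxylase"
    # does not match inside "decarboxylase" in this first pass.
    word_start = re.compile(r"\b" + re.escape(keyword), re.IGNORECASE)
    preferred = [entry for entry in entries if word_start.search(entry)]
    if preferred:
        return preferred
    # Fallback: plain case-insensitive substring match.
    needle = keyword.lower()
    return [entry for entry in entries if needle in entry.lower()]


# Illustrative entry descriptions.
both = ["pyruvate carboxylase", "glutamate decarboxylase"]
only_decarboxylase = ["glutamate decarboxylase"]

print(search("carboxylase", both))
print(search("carboxylase", only_decarboxylase))
```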
To run more advanced queries, go to the Advanced Search page. The advanced search is similar to the PROFESSor, except that you may specify a value for several databases at once. Note that the query returns only results that match all completed fields (logical AND).
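The logical AND over completed fields can be pictured as follows. This is a minimal sketch under stated assumptions: records are modeled as plain dictionaries, the field names (`cog`, `pfam`, `go`) and values are placeholders, and exact equality stands in for whatever matching the Advanced Search page really performs.

```python
def advanced_search(records, **criteria):
    """Return the records that match every non-empty criterion (AND)."""
    # Fields left empty in the form are ignored.
    active = {field: value for field, value in criteria.items() if value}
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in active.items())
    ]


# Placeholder records (hypothetical field names and values).
records = [
    {"cog": "COG_A", "pfam": "PF_X", "go": "GO_1"},
    {"cog": "COG_A", "pfam": "PF_Y", "go": "GO_1"},
    {"cog": "COG_B", "pfam": "PF_X", "go": "GO_2"},
]

print(advanced_search(records, cog="COG_A", pfam="PF_X", go=""))
```

Only the first record is returned: it is the only one that satisfies both completed fields, and the empty `go` field imposes no constraint.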
All tables can be downloaded as plain text in CSV (Comma Separated Values) format. We provide 2 means to download the data:
Files are generated on demand and are not stored on our server. Hence, we do not provide FTP access to our data. However, we provide other means to download data in CSV format (see above).
We currently do not offer this possibility. Eventually, we would like to provide an API so that researchers can develop their own modules to identify new relations and share them with the community.
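Once a table has been downloaded, it can be read with any CSV library. The sketch below uses Python's standard `csv` module; the column names and rows are made-up placeholders, since the actual column layout of PROFESS exports is not described here.

```python
import csv
import io


def read_table(csv_text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Placeholder content standing in for a downloaded file (hypothetical
# columns). Quoted fields may contain commas.
sample = (
    "id,description\n"
    "X1,first placeholder row\n"
    'X2,"second row, with a comma"\n'
)

rows = read_table(sample)
for row in rows:
    print(row["id"], "->", row["description"])
```

For a file saved to disk, the same `csv.DictReader` can be applied to a file opened with `open(path, newline="")`.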