Assigned Friday, March 4
Due Monday, March 21 at 11:59 p.m.
Total points: 90
When you hand in your results from this homework, you should submit the following, in separate files:
Submit everything by the due date and time using the web-based handin program.
On this homework, you must work on your own and submit your own results written in your own words.
(90 pts) In this assignment, you will implement a program to infer a profile hidden Markov model from a multiple alignment and use it to search a database for related proteins.
You will build your model from this global multiple alignment. The file contains a multiple alignment of several related protein sequences (here's an overview of the MSF format; you can search for "MSF format" to find others). After parsing this file, your program will determine the model's architecture and then infer the model's parameters from the multiple alignment, using Laplace's rule as a prior. Once the model is inferred, your program will search this database for proteins related to the model.
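To make the two inference steps concrete, here is a minimal Python sketch, not a required design. It assumes the alignment has already been parsed into a list of equal-length strings. It also assumes one common heuristic for the architecture: a column becomes a match state when fewer than half of its rows are gaps. The assignment does not prescribe that rule, and the names `match_columns`, `laplace_emissions`, `max_gap_fraction`, and `toy_alignment` are hypothetical. Laplace's rule is applied as an add-one pseudocount to each of the 20 amino acid counts. Transition probabilities, which need the same treatment, are left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = "-."  # assumption: both symbols denote gaps in the parsed rows


def match_columns(alignment, max_gap_fraction=0.5):
    """Return indices of columns treated as match states.

    Assumed heuristic: a column is a match column when its gap
    fraction is strictly below max_gap_fraction.
    """
    n_rows = len(alignment)
    columns = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for row in alignment if row[j] in GAP_CHARS)
        if gaps / n_rows < max_gap_fraction:
            columns.append(j)
    return columns


def laplace_emissions(alignment, column):
    """Emission probabilities for one column using Laplace's rule.

    Each of the 20 amino acids gets its observed count plus one;
    gaps and non-standard symbols are ignored.
    """
    counts = Counter(
        row[column] for row in alignment if row[column] not in GAP_CHARS
    )
    total = sum(counts[a] for a in AMINO_ACIDS) + len(AMINO_ACIDS)
    return {a: (counts[a] + 1) / total for a in AMINO_ACIDS}


# Toy alignment, invented purely for illustration.
toy_alignment = [
    "AC-DE",
    "AC-DE",
    "A-GDE",
    "ACGD-",
]
cols = match_columns(toy_alignment)
emissions = laplace_emissions(toy_alignment, 0)
```

In the toy alignment, the third column is half gaps, so under the assumed threshold it is modeled by an insert state rather than a match state. Because of the pseudocounts, every amino acid keeps a nonzero emission probability even when it never appears in a column.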
You are to submit a detailed, well-written report, with conclusions. In your report, discuss each hit, reporting all relevant information. If some of the alignments are especially long, you may omit them from your report, but you must still discuss them and include the alignments in a text file in your .tar.gz file. Of course, this is merely the minimum required in your report.
Return to the CSCE 471/871 (Fall 2011) Home Page
Last modified 16 August 2011; please report problems to sscott AT cse.