Assigned Tuesday, March 3
Due Monday, March 23 (originally Sunday, March 22) at 11:59 p.m.
Total points: 100
When you hand in your results from this homework, you should submit the following, in separate files:
Submit everything by the due date and time using the web-based handin program.
On this homework, you must work on your own and submit your own results written in your own words.
(100 pts) In this assignment, you will implement a program to infer a profile hidden Markov model from a multiple alignment and use it to search a database for related proteins and align these hits against the original alignment.
You will build your model from this global multiple alignment. The file contains a multiple alignment of several related protein sequences (here's an overview of the MSF format; you can google "MSF format" for others). (The format is self-explanatory once you recognize that gaps are represented by both a dot "." and a tilde "~".) After your program parses this file, it will determine how to define the model's architecture and then infer the model's parameters from the multiple alignment, using Laplace's rule as a prior. After your program infers the model, it will search this database for proteins related to the model. Then, your program will align each hit against the original multiple alignment.
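As a rough illustration of the architecture and parameter steps, here is a minimal sketch in Python. It assumes the alignment has already been parsed into equal-length strings, one per sequence. The names match_columns and laplace_emissions, the 50% gap threshold for choosing match columns, and the toy alignment are all assumptions made for this example, not requirements of the assignment. Laplace's rule is applied by adding one pseudocount to each of the 20 amino acid counts before normalizing.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = {".", "~"}  # both characters mark gaps in the alignment file


def match_columns(rows, max_gap_fraction=0.5):
    """Return the indices of columns to model as match states.

    Assumption: a column becomes a match state when no more than
    max_gap_fraction of its rows are gaps. Other heuristics are possible.
    """
    n = len(rows)
    selected = []
    for j in range(len(rows[0])):
        gaps = sum(1 for row in rows if row[j] in GAP_CHARS)
        if gaps / n <= max_gap_fraction:
            selected.append(j)
    return selected


def laplace_emissions(rows, col):
    """Estimate emission probabilities for one column with Laplace's rule.

    Each of the 20 amino acids gets its observed count plus one.
    Gaps and any non-standard residue codes are ignored.
    """
    counts = Counter(row[col].upper() for row in rows if row[col] not in GAP_CHARS)
    total = sum(counts[a] for a in AMINO_ACIDS) + len(AMINO_ACIDS)
    return {a: (counts[a] + 1) / total for a in AMINO_ACIDS}


if __name__ == "__main__":
    # Toy alignment (hypothetical data, three sequences of length four).
    rows = ["AC.A", "AC~G", "AG.A"]
    for j in match_columns(rows):
        probs = laplace_emissions(rows, j)
        print(j, round(probs["A"], 3), round(probs["G"], 3))
```

The same add-one idea carries over to transition counts between match, insert, and delete states, with the pseudocount added to each transition leaving a given state.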
You are to submit a detailed, well-written report, with conclusions supported by evidence. In your report, discuss each hit, reporting all relevant information. If some of the alignments are especially long, you may omit those alignments from your report. But you must still discuss them in your report and include the alignments in a text file in your .zip file. Of course, this is merely the minimum that is required in your report.
Return to the CSCE 471/871 (Spring 2015) Home Page
Last modified 26 March 2015; please report problems to sscott AT cse.