Assigned Monday, February 9
Monday, February 23
Friday, February 27
Total points: 70
When you hand in your results from this homework, you should submit the following, in separate files:
Submit everything by the due date and time using the web-based handin program.
On this homework, you must work on your own and submit your own results written in your own words.
You will use these two data sets: data1.txt and data2.txt. Each of these files consists of several lines. Each line is a sequence of outcomes of dice rolls (numbered 0–5). In contrast with Homework 1, no state information is embedded in these sequences, but you may assume that only three states emit symbols in this model (and the begin and end states determine each sequence's length).
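As a starting point, reading the data might look like the sketch below. The handout does not specify how symbols are separated within a line, so this sketch assumes that every character from 0 to 5 is one roll and ignores everything else (spaces, commas). The function name parse_sequences is hypothetical; adjust it if the actual files use a different layout.

```python
def parse_sequences(text):
    """Return one list of ints per non-empty line of text.

    Assumption: each character '0'..'5' is one dice outcome; any other
    character (spaces, commas, tabs) is treated as a separator.
    """
    sequences = []
    for line in text.splitlines():
        symbols = [int(ch) for ch in line if ch in "012345"]
        if symbols:  # skip blank lines
            sequences.append(symbols)
    return sequences


def read_sequences(path):
    """Read a data file such as data1.txt into a list of sequences."""
    with open(path) as handle:
        return parse_sequences(handle.read())
```

For example, read_sequences("data1.txt") would return a list with one list of integers per line of the file.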
You will repeat the following steps three times for each of the two data sets. First, read in the data set. Second, randomly subsample half of the sequences in the data set and use Baum-Welch to infer a hidden Markov model from the subsampled half. Third, compute the log likelihood of the other half of the data set given the model you just inferred. (Put another way, the subsampled half is the training set and the other half is the test set.) Once you have done this three times for a data set, graphically display in your report the model with the highest test-set log likelihood. Then repeat the entire process for the second data set. Thus, at the end, you will have built six models, computed six log likelihoods, and graphically displayed two models (one from data1 and one from data2) in your report.
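The split and the test-set evaluation can be sketched as follows. This is only an illustration, not a required design: the function names and the parameter layout (start, trans, emit, end as plain lists) are assumptions, and the Baum-Welch training step itself is deliberately left out. The log likelihood is computed with the forward algorithm, rescaled at each step to avoid underflow on long sequences. It assumes non-empty sequences that have nonzero probability under the model.

```python
import math
import random


def split_half(sequences, rng):
    """Randomly split sequences into (training half, test half)."""
    shuffled = list(sequences)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def log_likelihood(seq, start, trans, emit, end):
    """Return log P(seq | model) using the scaled forward algorithm.

    Assumed parameter layout (hypothetical, for illustration only):
      start[k]    = P(begin -> state k)
      trans[j][k] = P(state j -> state k)
      emit[k][s]  = P(symbol s | state k)
      end[k]      = P(state k -> end)
    """
    n = len(start)
    alpha = [start[k] * emit[k][seq[0]] for k in range(n)]
    log_p = 0.0
    for symbol in seq[1:]:
        # Rescale so alpha sums to 1, remembering the factor in log space.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        # Propagate one step and emit the next symbol.
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n)) * emit[k][symbol]
            for k in range(n)
        ]
    # Finish with the transition into the end state.
    final = sum(alpha[k] * end[k] for k in range(n))
    return log_p + math.log(final)


def total_log_likelihood(sequences, start, trans, emit, end):
    """Sum the log likelihoods of independent sequences."""
    return sum(log_likelihood(s, start, trans, emit, end) for s in sequences)
```

One round would then call split_half(sequences, random.Random(seed)) with a different seed each time, train on the first half, and pass the second half to total_log_likelihood. Because the test sequences are treated as independent, their log likelihoods add.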
You are to submit a detailed, well-written report, with conclusions. In particular, you should answer the following questions. How much variance was there in the measured log likelihoods across the three models for each data set? Are you comfortable with running each training set three times and taking the maximum, or are more rounds necessary? Can you think of other ways to avoid getting trapped in local maxima? Of course, this is merely the minimum that is required in your report. Other experiments that you run and other interesting questions that you answer might yield extra points.