Assigned Monday, November 17
Due Sunday, December 7 at 11:59:59 p.m.
When you hand in your results from this homework, you should submit the following, in separate files:
Implement the boosting algorithm. Use it to build an ensemble of either (1) single-node ANNs, using your GD/EG implementation from Homework 2, or (2) decision stumps (depth-1 decision trees), using your ID3 implementation from Homework 1 (you may use Chris Hammack's version of ID3 if you wish). If you use ID3, be sure to limit the trees to depth 1. Your weak learners may train on resampled data sets, or the algorithm may use its knowledge of the distribution over the training set to directly minimize weighted training error; a sketch of the latter approach appears below. (You can do this for ANNs by modifying the objective function that GD or EG minimizes; it is worth many extra points.)
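For reference, here is a minimal sketch of one boosting variant (AdaBoost, assuming that is the algorithm intended here) with exhaustive decision stumps, assuming binary labels in {-1, +1} and a NumPy feature matrix. The stump search train_stump is a hypothetical stand-in for your ID3 code; none of these names are required.

import numpy as np

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump minimizing
    weighted training error under the distribution w."""
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, thresh, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thresh, polarity = stump
    return np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)

def adaboost(X, y, rounds=50):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)           # initial uniform distribution D_1
    ensemble = []                     # list of (alpha, stump) pairs
    for t in range(rounds):
        stump, err = train_stump(X, y, w)
        err = max(err, 1e-10)         # guard against divide-by-zero
        if err >= 0.5:                # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)  # upweight misclassified examples
        w /= w.sum()                    # renormalize to a distribution
        ensemble.append((alpha, stump))
    return ensemble

def ensemble_predict(ensemble, X):
    votes = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.where(votes >= 0, 1, -1)

Training directly on the weighted distribution, as train_stump does here, corresponds to the second option above; to use resampling instead, draw a bootstrap sample from the training set according to w each round and train an unweighted stump on it.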
Run your classifier on the same LASSO experiments that you conducted in Homework 2, Problem 3. Track your error on the training set after each round of boosting, and use these values to generate a plot like those in Figure 4.9 on p. 110: training error versus training round, where error is sample error [p. 130], not squared error. (For extra credit, you may also plot error on an independent validation set; to do this, I recommend pulling out a subset of the training set before training begins.) Also report your final error results on the test set with 95% confidence intervals (a sketch of the interval computation follows below) and compare them to your results from Homework 2. When did training error reach zero? Did overfitting occur? As usual, a well-written report is expected. This part of your report should be similar to that of Homework 2, including (at a minimum) answers to the relevant questions from that homework.
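One common way to compute the intervals is the normal approximation to the binomial, error +/- z * sqrt(error * (1 - error) / n) with z = 1.96 for 95%; check that this matches the formula in your text before using it. The sketch below also shows one way to record per-round sample error for the plot; it reuses ensemble_predict from the earlier sketch, and all names are illustrative.

import numpy as np

def round_errors(ensemble, X, y):
    """Sample error of the partial ensemble after rounds 1..T,
    suitable for plotting error versus training round."""
    errs = []
    for t in range(1, len(ensemble) + 1):
        pred = ensemble_predict(ensemble[:t], X)
        errs.append(np.mean(pred != y))
    return errs

def error_ci(e, n, z=1.96):
    """Approximate confidence interval for sample error e measured
    on n examples (z = 1.96 gives a 95% interval)."""
    half = z * np.sqrt(e * (1.0 - e) / n)
    return max(0.0, e - half), min(1.0, e + half)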
The following problem is only for students registered for CSCE 878. CSCE 478 students who do it will receive extra credit, but the amount will be less than the number of points indicated.