Assigned Monday, November 1
Due Wednesday, November 17 at 11:59 p.m.
When you hand in your results from this homework, you should submit the following, in separate files:
Implement the boosting algorithm. Use it to build an ensemble of either (1) single-node ANNs using your GD/EG implementation from Homework 2, or (2) decision stumps (depth-1 decision trees) using your ID3 implementation from Homework 1 (you may use the Weka version of these [or related] algorithms if you wish). If you use ID3, make sure you limit the trees to depth 1. You may have your learners train on data sets resampled according to the boosting distribution, or you may have each learner use its knowledge of the distribution over the training set to directly minimize weighted training error. (You can do the latter for ANNs by modifying the objective function that GD or EG minimizes; this is worth extra points.)
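As one point of reference, here is a minimal sketch of an AdaBoost loop over decision stumps, not a required design. The names WeightedStump, adaboost, and ensemble_predict are hypothetical stand-ins for your own ID3 (depth-limited) or Weka learner; the sketch assumes binary labels in {-1, +1} and trains each stump directly on the current distribution (the second option described above).

```python
# Minimal AdaBoost sketch with weighted decision stumps.
# WeightedStump, adaboost, and ensemble_predict are illustrative names,
# not part of any assigned interface; substitute your own learner.
import numpy as np

class WeightedStump:
    """Depth-1 decision tree trained to minimize weighted training error."""
    def fit(self, X, y, w):
        n, d = X.shape
        best_err = np.inf
        for j in range(d):                      # try each feature
            for thresh in np.unique(X[:, j]):   # and each split point
                for sign in (1, -1):            # and both polarities
                    pred = sign * np.where(X[:, j] <= thresh, 1, -1)
                    err = np.sum(w[pred != y])  # distribution-weighted error
                    if err < best_err:
                        best_err = err
                        self.j, self.thresh, self.sign = j, thresh, sign
        return self

    def predict(self, X):
        return self.sign * np.where(X[:, self.j] <= self.thresh, 1, -1)

def adaboost(X, y, rounds=50):
    """Return the weak hypotheses and their vote weights (alphas)."""
    n = len(y)
    D = np.full(n, 1.0 / n)                     # uniform initial distribution
    hyps, alphas = [], []
    for t in range(rounds):
        h = WeightedStump().fit(X, y, D)        # train on the distribution
        pred = h.predict(X)
        eps = np.sum(D[pred != y])              # weighted training error
        if eps == 0 or eps >= 0.5:              # weak-learning assumption fails
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        D /= D.sum()                            # renormalize to a distribution
        hyps.append(h)
        alphas.append(alpha)
    return hyps, alphas

def ensemble_predict(hyps, alphas, X):
    """Weighted majority vote of the weak hypotheses."""
    votes = sum(a * h.predict(X) for h, a in zip(hyps, alphas))
    return np.sign(votes)
```

If you choose the resampling option instead, you could draw a bootstrap sample from the training set according to D (e.g., with np.random.choice) each round and train an unweighted learner on that sample.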
Run your boosted classifier on the same data sets that you used in the previous homeworks. Keep track of your error on the training set after each round of boosting, and use these values to generate a plot like those in Figure 4.9 on p. 110: training error versus boosting round, where error is the sample error on the set [p. 130], not squared error. (For extra credit, you may also plot error on an independent validation set.) Also report your final error results on the test set, either with confidence intervals or with an ROC curve. When did the training error go to zero? Did overfitting occur?
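One hedged sketch of how the per-round error curve and a confidence interval could be computed, assuming the adaboost helper from the sketch above; X_train, y_train, and the output file name are placeholders for your own data sets:

```python
# Sketch of per-round error tracking and a sample-error confidence
# interval; adaboost() is the hypothetical helper sketched earlier,
# and X_train / y_train are placeholders for your own data.
import numpy as np
import matplotlib.pyplot as plt

def error_per_round(hyps, alphas, X, y):
    """Sample error of the partial ensemble after each boosting round."""
    errs = []
    votes = np.zeros(len(y))
    for h, a in zip(hyps, alphas):
        votes += a * h.predict(X)                  # add round t's weighted vote
        errs.append(np.mean(np.sign(votes) != y))  # fraction misclassified
    return errs

def confidence_interval(err, n, z=1.96):
    """Approximate 95% CI for a sample error measured on n examples."""
    half = z * np.sqrt(err * (1 - err) / n)
    return err - half, err + half

hyps, alphas = adaboost(X_train, y_train, rounds=100)
train_errs = error_per_round(hyps, alphas, X_train, y_train)

plt.plot(range(1, len(train_errs) + 1), train_errs)
plt.xlabel("Boosting round")
plt.ylabel("Training sample error")
plt.savefig("boosting_error.png")
```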
The following problem is only for students registered for CSCE 878. CSCE 478 students who do it will receive extra credit, but the amount will be less than the number of points indicated.
Last modified 16 August 2011; please report problems to sscott.