CSCE 478/878 (Fall 2003) Homework 3

Assigned Monday, November 17
Due Sunday, December 7 at 11:59:59 p.m.


When you hand in your results from this homework, you should submit the following, in separate files:

  1. A single .tar.gz or .tar.Z file (make sure you use a UNIX-based compression program) called username.tar.gz, where username is your username on cse. In this tar file, put your source code and any files needed to build and run it.
  2. A single .pdf file with your writeup of the results for all the homework problems, including the last problem. Only PDF will be accepted, and you should submit exactly one PDF file, named username.pdf, where username is your username on cse. Include all your plots in this file, as well as a detailed summary of your experimental setup, results, and conclusions. If you have several plots, you might put a few representative ones in the main text and defer the rest to an appendix. Remember that the quality of your writeup strongly affects your grade. See the web page on "Tips on Presenting Technical Material".
Submit everything by the due date and time using the web-based handin program.


You may work with a partner on this homework. If you do, then you will be graded more rigorously, especially on the written presentation of your results.
  1. (10 pts) Do Problem 6.1 on p. 198.

  2. (60 pts) You will run experiments similar to those for Problem 3 in Homework 2, but with an ensemble of classifiers.

    Implement the boosting algorithm. Use it to build an ensemble of either (1) single-node ANNs using your GD/EG implementation from Homework 2, or (2) decision stumps (depth-1 decision trees) using your ID3 implementation from Homework 1 (you may use Chris Hammack's version of ID3 if you wish). If you use ID3, make sure you limit the depth of the trees. You may have your learners train on resampled data sets, or you may have the algorithm use its knowledge of the distribution over the training set to directly minimize weighted training error; a sketch of the latter appears below. (You can do the latter for ANNs by modifying the function that GD or EG minimizes; this is worth many extra points.)
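
    To make the structure concrete, here is a minimal Python sketch of AdaBoost in its distribution-reweighting form, with exhaustively searched decision stumps over numeric attributes and labels in {-1, +1}. All names (train_stump, adaboost, and so on) are illustrative, not part of the assignment.

      import numpy as np

      def train_stump(X, y, w):
          """Pick the (feature, threshold, polarity) stump that minimizes
          weighted training error, by exhaustive search."""
          best, best_err = None, float('inf')
          for j in range(X.shape[1]):
              for thresh in np.unique(X[:, j]):
                  for polarity in (1, -1):
                      pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
                      err = w[pred != y].sum()
                      if err < best_err:
                          best, best_err = (j, thresh, polarity), err
          return best, best_err

      def stump_predict(stump, X):
          j, thresh, polarity = stump
          return np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)

      def ensemble_predict(ensemble, X):
          score = sum(a * stump_predict(s, X) for a, s in ensemble)
          return np.where(score >= 0, 1, -1)

      def adaboost(X, y, rounds=20):
          """AdaBoost: reweight the training distribution each round rather
          than resampling. Returns the ensemble and the training (sample)
          error after each round, for the error-vs-round plot."""
          n = len(y)
          w = np.full(n, 1.0 / n)            # distribution over examples
          ensemble, train_errors = [], []
          for _ in range(rounds):
              stump, err = train_stump(X, y, w)
              err = max(err, 1e-10)          # guard against division by zero
              alpha = 0.5 * np.log((1.0 - err) / err)
              pred = stump_predict(stump, X)
              w *= np.exp(-alpha * y * pred)  # up-weight misclassified examples
              w /= w.sum()
              ensemble.append((alpha, stump))
              train_errors.append(np.mean(ensemble_predict(ensemble, X) != y))
          return ensemble, train_errors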

    Run your classifier on the same LASSO experiments that you conducted in Homework 2, Problem 3. Keep track of your error on the training set after each round of boosting, and use these values to generate a plot like those in Figure 4.9 on p. 110: training error versus training round, where error is sample error on the set [p. 130], not squared error. (For extra credit, you may also plot error on an independent validation set; to do this, I recommend pulling out a subset of the training set before training starts.) Also, report your final error results (on the test set) with 95% confidence intervals (a sketch of the interval computation appears below) and compare them to the results from Homework 2. When did training error go to zero? Did overfitting occur? As usual, a well-written report is expected. This part of your report should be similar to that of Homework 2, including (at a minimum) answers to the questions from that homework that are relevant.
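
    For the confidence intervals, one standard choice is the normal approximation to the binomial discussed in the text: with sample error e measured on n independently drawn test examples, the 95% interval is e +/- 1.96 * sqrt(e(1-e)/n). A minimal sketch:

      import math

      def sample_error_interval(num_errors, n, z=1.96):
          """Two-sided confidence interval for sample error, using the
          normal approximation to the binomial (z = 1.96 for 95%)."""
          e = num_errors / n
          half_width = z * math.sqrt(e * (1.0 - e) / n)
          return e - half_width, e + half_width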

    The following problem is only for students registered for CSCE 878. CSCE 478 students who do it will receive extra credit, but the amount will be less than the number of points indicated.

  3. (40 pts) Implement a naive Bayes classifier, using m-estimates for the probabilities to handle attribute values that never appear in the training set (a sketch of m-estimate smoothing appears below). Run your classifier on the same LASSO experiments that you conducted in Homework 2, Problem 3. Report your error results with 95% confidence intervals and compare them to the results from Homework 2. This part of your report should be similar to that of Homework 2, including (at a minimum) answers to the questions from that homework that are relevant.
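
    As a possible starting point, here is a minimal naive Bayes sketch with m-estimate smoothing, P(v | c) = (n_cv + m*p) / (n_c + m), where n_c is the number of training examples of class c, n_cv the number of those with value v for the attribute, and p a uniform prior over the attribute's values. It assumes discrete attributes, and all names are illustrative.

      import math
      from collections import Counter, defaultdict

      def train_naive_bayes(X, y, m=1.0):
          """X: list of attribute-value tuples; y: list of class labels.
          Returns a classifier closure that uses m-estimate-smoothed
          conditionals and log probabilities."""
          n_attrs = len(X[0])
          classes = Counter(y)
          # values seen for each attribute; if the full attribute domain is
          # known in advance, use that instead for the prior p
          values = [set(x[j] for x in X) for j in range(n_attrs)]
          counts = defaultdict(Counter)      # counts[(c, j)][v] = n_cv
          for x, c in zip(X, y):
              for j, v in enumerate(x):
                  counts[(c, j)][v] += 1

          def classify(x):
              best_c, best_score = None, -math.inf
              for c, n_c in classes.items():
                  score = math.log(n_c / len(y))      # log class prior
                  for j, v in enumerate(x):
                      p = 1.0 / len(values[j])         # uniform prior over values
                      n_cv = counts[(c, j)][v]         # 0 for unseen values
                      score += math.log((n_cv + m * p) / (n_c + m))
                  if score > best_score:
                      best_c, best_score = c, score
              return best_c

          return classify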

  4. (5 pts) State how many hours you spent on each problem of this homework assignment.


Last modified 16 August 2011; please report problems to sscott AT cse.