CSCE 478/878 (Fall 2001) Homework 3

Assigned Monday, November 5
Due Friday, December 7 at 11:59:59 p.m.

When you hand in your results from this homework, you should submit the following, in separate files:

  1. Source code in the language of your choice (in plain text files).
  2. A makefile and a README file facilitating compilation and running of your code (include a description of command line options). If we cannot easily re-create your experiments, you might not get full credit.
  3. All your data (in plain text files).
  4. Your writeup (in pdf format) of the results for all the homework problems, including the last problem (only pdf will be accepted, and you should only submit one pdf file). Include all your plots in this file, as well as a detailed summary of your experimental setup, results, and conclusions. If you have several plots, you might put a few example ones in the main text and defer the rest to an appendix. Remember that the quality of your writeup strongly affects your grade. See the web page on ``Tips on Presenting Technical Material''.
Submit everything by the due date and time using the handin program.


You may work with a partner on this homework. If you do, then you will be graded more rigorously, especially on the written presentation of your results.
  1. (5 pts) Do Problem 6.1 on p. 198.

  2. (10 pts) Do Problem 6.2 on p. 198.

  3. (70 pts for 478, 85 pts for 878) You will run experiments similar to those for Problem 4 in Homework 2, but with new classifiers and with combinations of your old classifiers.

    (a)
    Implement a naive Bayes classifier and a k-nearest neighbor classifier. For the naive Bayes classifier, use m-estimates for the probabilities so that attribute values that never appear in the training set do not zero out the class posteriors. For the k-nearest neighbor classifier, you will need to define an appropriate distance function, depending on your data sets. There are several standard choices of distance for numeric attributes, but for symbolic attributes (e.g., an attribute ``color'' that takes on values ``red'', ``green'', and ``blue''), you'll need another approach. One possibility is the Hamming distance, where you count the number of attributes whose values differ between the two instances. (A brief illustrative sketch of both ideas appears at the end of this part.)

    You will run your two classifiers on the same experiments (with the same UCI data sets and the same splits into training and test sets) that you conducted in Homework 2, Problem 4. Report your error results with confidence intervals and compare them to the results from Homework 2. (Of course, with these classifiers, there is no such thing as a ``training round'', so the only errors you will report are final training error and final testing error.) In addition to reporting test error, you should also report average time to evaluate each test example (for both classifiers). (See my slides on performance measurement for tips on this part.) If you go back and time your EG/GD classifier(s) from Homework 2, you will get a few extra points.

    478 students only need to experiment with k=1 and a single value of m for the m-estimate. 878 students must try three different values each of m and k. Also, 878 students must choose at least two of the four learning algorithms and run a paired t test to determine which one is better on each of your data sets (assuming that one can be declared superior in a statistically significant way).

    This part of your report should be similar to that of Homework 2, including (at a minimum) answering the questions from that homework that are relevant to these classifiers.
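
    To make the expectations for part (a) concrete, below is a brief illustrative Python sketch (not required code; the names, the choice of Python, and the use of log probabilities are arbitrary assumptions of the sketch) of the m-estimate P(v|c) = (n_cv + m*p)/(n_c + m) with a uniform prior p = 1/|values(a)|, and of a Hamming-distance k-nearest-neighbor predictor. You may implement these in any language and structure your code however you like.

        import math
        from collections import Counter, defaultdict

        def hamming_distance(x, y):
            # Number of attributes on which the two instances disagree.
            return sum(1 for a, b in zip(x, y) if a != b)

        def knn_predict(train, query, k=1):
            # train is a list of (instance, label) pairs; return the majority
            # label among the k nearest neighbors under Hamming distance.
            nearest = sorted(train, key=lambda ex: hamming_distance(ex[0], query))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]

        class NaiveBayes:
            # Naive Bayes with m-estimates: P(v|c) = (n_cv + m*p) / (n_c + m),
            # where p = 1/|values(a)| is a uniform prior over attribute a's values.
            def __init__(self, m=1.0):
                self.m = m

            def fit(self, instances, labels, attribute_values):
                # attribute_values[i] is the list of legal values of attribute i.
                self.attribute_values = attribute_values
                self.class_counts = Counter(labels)
                self.total = len(labels)
                # counts[c][i][v] = number of class-c examples with value v for attribute i
                self.counts = defaultdict(lambda: defaultdict(Counter))
                for x, c in zip(instances, labels):
                    for i, v in enumerate(x):
                        self.counts[c][i][v] += 1

            def predict(self, x):
                best, best_score = None, float('-inf')
                for c, n_c in self.class_counts.items():
                    score = math.log(n_c / self.total)      # log P(c)
                    for i, v in enumerate(x):
                        p = 1.0 / len(self.attribute_values[i])
                        n_cv = self.counts[c][i][v]
                        score += math.log((n_cv + self.m * p) / (n_c + self.m))
                    if score > best_score:
                        best, best_score = c, score
                return best

    For the timing requirement, wrapping each predict call in time.perf_counter() and averaging is one simple option. The usual two-sided confidence interval from the text, error +/- z_N * sqrt(error * (1 - error) / n), applies to the test errors you report here. For the 878 paired t test, scipy.stats.ttest_rel(errors_A, errors_B) on paired error measurements is one readily available implementation.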

    (b)
    Now you will test your learners in an ensemble setting.

    Of all the classifiers you trained in this homework and the previous one (consider each ANN architecture/learning rate pair from Homework 2 to be a distinct classifier), choose the one that was best on its test set. Then choose the three worst classifiers. Place these four classifiers in a pool of experts and run weighted majority (WM) on them; a brief sketch of the WM loop appears at the end of this part. (You will need a new training set for WM; I recommend using some of the original training set and some of the test set, but hold some of the test set out for testing WM.) Generate a plot like those in Figure 4.9 on p. 110 (training error and test error versus training round, where error is sample error on the set [p. 130], not squared error). Report training and test error for each classifier and for WM; thus you will have 10 curves in your plot. You need only one value of beta for your experiments.

    For 878 students only: Implement the bagging or boosting algorithm. Use it to build an ensemble of either (1) single-node ANNs using your GD/EG implementation from Homework 2, or (2) decision stumps (depth-1 decision trees) using your ID3 implementation from Homework 1. You may have your learners train on resampled data sets, or you may have each learner use the algorithm's distribution over the training set to directly minimize training error (this is worth extra points). Generate a plot like those in Figure 4.9 on p. 110 and give confidence intervals (again, this is sample error, not squared error). When did training error go to zero? Did overfitting occur? As usual, a well-written report is expected.
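
    As a point of reference for part (b), here is a brief illustrative sketch of the weighted-majority loop. It assumes the four chosen classifiers have already been trained and each exposes a predict(x) method; that interface and the default beta are assumptions of the sketch, not requirements.

        from collections import defaultdict

        def weighted_majority(experts, examples, beta=0.5):
            # experts: list of already-trained classifiers with a predict(x) method.
            # examples: list of (instance, label) pairs used to train WM.
            weights = [1.0] * len(experts)
            mistakes = 0
            for x, y in examples:
                predictions = [e.predict(x) for e in experts]
                votes = defaultdict(float)
                for w, pred in zip(weights, predictions):
                    votes[pred] += w
                if max(votes, key=votes.get) != y:   # weighted-vote prediction
                    mistakes += 1
                # Demote every expert that was wrong on this example.
                weights = [w * beta if pred != y else w
                           for w, pred in zip(weights, predictions)]
            return weights, mistakes / len(examples)  # final weights, sample error

    To obtain the WM test-error curve, one reasonable choice is to periodically freeze the current weights and measure sample error on the held-out test examples using the same weighted vote.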

  4. (5 pts) State how many hours you spent on each problem of this homework assignment.
