CSCE 478/878 (Fall 2001) Homework 1

Assigned Wednesday, September 26
Originally due Wednesday, October 10 at 11:59:59 p.m.
Now due Friday, October 12 at 11:59:59 p.m.

Submit all files, including all source code (in the language of your choice), data files, and a single PDF file containing your entire writeup. Only PDF will be accepted this time, and you should submit only one PDF file. Submit everything by the due date and time using the handin program.


  1. (25 pts) Implement the ID3 algorithm from Table 3.1 (p. 56) and run it at least 30 times on randomly generated data sets, where the examples are generated and labeled as described in Problem 2.10 (p. 50). Use a different training-set size for each run. Evaluate each resulting decision tree on a randomly generated test set of 100 examples (use the same test set for every training set), then plot test-set performance against training-set size. How did increasing the training-set size affect generalization error? Did overfitting occur? If not, can you push the learner to the point of overfitting? Why or why not? Hand in your source code and data sets as part of your solution to this problem, along with a brief report of your results.

  2. (10 pts) Do Problem 2.3 on p. 48.

  3. (5 pts) Do Problem 3.2 on p. 77.

  4. (15 pts) Do Problem 7.2 on p. 227.

  5. (5 pts) State how many hours you spent on each problem of this homework assignment (for CSCE 878 students, this includes the next two problems).

    The following two problems are only for students registered for CSCE 878. CSCE 478 students who do these will receive extra credit, but the amount will be substantially less than the number of points indicated.

  6. (20 pts) Do Problem 7.6 on p. 228. Hand in your source code and data sets as part of your solution to this problem, as well as a brief report of your results.

  7. (10 pts) A binary decision stump is a depth-1 decision tree, i.e., it has a root node and two leaves. What is the VC dimension of the hypothesis class of binary decision stumps defined over the real plane? Argue that your answer is correct.
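For Problem 1, the following is a minimal Python sketch of an ID3-style learner (information-gain splits, one branch per possible value of the chosen attribute, and a majority-label leaf when a branch receives no examples), together with the 30-run experiment. Check it against Table 3.1 before relying on it. The instance space (`ATTR_VALUES`) and the labeling rule (`target`) are hypothetical placeholders, not the generator from Problem 2.10; substitute that problem's procedure before drawing any conclusions. The script prints one (training-set size, test accuracy) pair per run, which can then be plotted with any tool.

```python
import math
import random
from collections import Counter

# Hypothetical instance space for illustration: six attributes with three values each.
# Replace ATTR_VALUES and target() with the generation/labeling rule of Problem 2.10.
ATTR_VALUES = {a: (0, 1, 2) for a in range(6)}


def target(x):
    """Hypothetical conjunctive target concept (a placeholder, not Problem 2.10's)."""
    return x[0] == 0 and x[1] == 1


def random_examples(n, rng):
    """Draw n instances, each attribute value chosen uniformly, labeled by target()."""
    examples = []
    for _ in range(n):
        x = tuple(rng.choice(ATTR_VALUES[a]) for a in sorted(ATTR_VALUES))
        examples.append((x, target(x)))
    return examples


def entropy(examples):
    """Entropy (in bits) of the labels in a non-empty list of (x, label) pairs."""
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def gain(examples, a):
    """Information gain from splitting examples on attribute a."""
    remainder = 0.0
    for v in ATTR_VALUES[a]:
        subset = [e for e in examples if e[0][a] == v]
        if subset:
            remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder


def majority(examples):
    """Most common label among the examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


def id3(examples, attributes):
    """Build a tree from non-empty examples.

    Returns a leaf label (bool) or an internal node (attribute, {value: subtree}).
    """
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()          # all examples agree: leaf
    if not attributes:
        return majority(examples)    # no attributes left: majority leaf
    best = max(attributes, key=lambda a: gain(examples, a))
    branches = {}
    for v in ATTR_VALUES[best]:      # one branch per possible value of best
        subset = [e for e in examples if e[0][best] == v]
        branches[v] = id3(subset, attributes - {best}) if subset else majority(examples)
    return (best, branches)


def classify(tree, x):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[x[attr]]
    return tree


def accuracy(tree, examples):
    """Fraction of examples whose label the tree predicts correctly."""
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)


if __name__ == "__main__":
    rng = random.Random(0)
    test_set = random_examples(100, rng)   # one fixed test set of size 100
    for size in range(1, 31):              # 30 runs, each with a different training-set size
        tree = id3(random_examples(size, rng), frozenset(ATTR_VALUES))
        print(size, accuracy(tree, test_set))
```

The sizes 1 to 30 are an arbitrary choice; any set of at least 30 distinct training-set sizes meets the requirement stated in the problem.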
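For Problem 7, a brute-force check like the following Python sketch can test whether a particular finite set of points in the plane is shattered by decision stumps. It assumes each stump's root tests a single coordinate against a threshold (`x[i] <= t`) and that each of the two leaves may carry either label; both are assumptions about the intended hypothesis class. A check of this kind only confirms or refutes shattering for the specific sets you try; showing that no larger set can be shattered still requires an argument.

```python
from itertools import product


def stump_labelings(points):
    """Return every labeling of `points` realizable by one decision stump.

    Assumed stump form: the root tests x[i] <= t for coordinate i in {0, 1},
    and each of the two leaves is labeled True or False independently.
    """
    if not points:
        return {()}
    realizable = set()
    for i in (0, 1):
        coords = sorted({p[i] for p in points})
        # A threshold below every point sends all points right; using each
        # observed value as t covers every distinct cut along coordinate i.
        for t in [coords[0] - 1.0] + coords:
            for left, right in product((False, True), repeat=2):
                realizable.add(tuple(left if p[i] <= t else right for p in points))
    return realizable


def shattered(points):
    """True if stumps realize all 2**n labelings of the n given points."""
    return len(stump_labelings(points)) == 2 ** len(points)


if __name__ == "__main__":
    print(shattered([(0.0, 0.0), (1.0, 1.0)]))  # True: two distinct points
    print(shattered([(0.0, 0.0), (0.0, 0.0)]))  # False: a duplicated point
```

Because the candidate thresholds cover every distinct way a single coordinate cut can split the given points, a `False` result is conclusive for that particular set under the assumed stump form.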


Last modified 16 August 2011; please report problems to sscott AT cse.