Assigned Monday, October 27
Due Monday, November 17 at 11:59:59 p.m.
When you hand in your results from this homework, you should submit the following, in separate files:
You are to plug your classifier into Lasso and train and test on the web data that we provide you. There are three labelings of the data, so you are to run three different experiments, saving three different classifiers. Further, you are to run your ID3 implementation from Homework 1 on these same data sets. (If you do not have a fully functional implementation from Homework 1, you may use Chris Hammack's version instead.) Thus you will run six different experiments and you are to compute 95% confidence intervals for each experimental result.
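As a reminder of what a 95% confidence interval on a test error looks like, here is a minimal sketch in Python. It assumes the usual normal approximation to the binomial for an error rate measured on n independent test examples, that is, error ± 1.96 · sqrt(error · (1 − error) / n). The function name and the example counts are hypothetical and chosen only for illustration; the assignment does not prescribe this particular method, and the approximation is rough when n is small or the error is close to 0 or 1.

```python
import math


def error_confidence_interval(num_errors, num_test, z=1.96):
    """Approximate two-sided confidence interval for the true error rate.

    Uses the normal approximation to the binomial; z=1.96 corresponds
    to a 95% interval. (Assumption: test examples are independent and
    were not used for training.)
    """
    if num_test <= 0:
        raise ValueError("num_test must be positive")
    if not 0 <= num_errors <= num_test:
        raise ValueError("num_errors must be between 0 and num_test")

    error = num_errors / num_test
    half_width = z * math.sqrt(error * (1.0 - error) / num_test)

    # Clip to [0, 1], since an error rate cannot leave that range.
    return max(0.0, error - half_width), min(1.0, error + half_width)


if __name__ == "__main__":
    # Hypothetical result: 30 mistakes on a 200-example test set.
    low, high = error_confidence_interval(num_errors=30, num_test=200)
    print(f"test error = {30 / 200:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

You would call this once per experiment, so six times in all, with the counts from your own test runs.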
You are to submit a detailed, well-written report, with real conclusions and everything. In particular, you should answer the following questions for both your new classifier and ID3. Did training error go to 0? Did overfitting occur? Should you have stopped training early? Was there a statistically significant difference between the performance of ID3 and that of the ANN/SVM? Was there a statistically significant difference in performance of the same algorithm on different labelings of the data? What algorithm would you recommend for web page classification in general? Of course, this is merely the minimum that is required in your report.
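For the questions about statistical significance, one common choice when two classifiers are tested on the same examples is McNemar's test. The sketch below is only an illustration under that assumption; the assignment does not require this test, and the function name, constant name, and toy data are hypothetical. It counts the test examples that exactly one of the two classifiers gets right and compares the continuity-corrected statistic against 3.841, the chi-squared critical value for one degree of freedom at the 0.05 level.

```python
# Chi-squared critical value, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL_95 = 3.841


def mcnemar_statistic(labels, preds_a, preds_b):
    """Continuity-corrected McNemar statistic for two classifiers.

    All three sequences must refer to the same test examples,
    in the same order.
    """
    if not (len(labels) == len(preds_a) == len(preds_b)):
        raise ValueError("labels and predictions must have equal length")

    # Examples where exactly one classifier is correct.
    a_only = sum(1 for y, pa, pb in zip(labels, preds_a, preds_b)
                 if pa == y and pb != y)
    b_only = sum(1 for y, pa, pb in zip(labels, preds_a, preds_b)
                 if pa != y and pb == y)

    disagreements = a_only + b_only
    if disagreements == 0:
        return 0.0  # The classifiers never differ in correctness.
    return (abs(a_only - b_only) - 1) ** 2 / disagreements


if __name__ == "__main__":
    # Toy data: classifier A is always right, B misses the first ten.
    labels = [1] * 20
    preds_a = [1] * 20
    preds_b = [0] * 10 + [1] * 10
    stat = mcnemar_statistic(labels, preds_a, preds_b)
    print(f"McNemar statistic = {stat:.2f}, "
          f"significant at 0.05: {stat > CHI2_CRITICAL_95}")
```

The chi-squared approximation is unreliable when the two classifiers disagree on only a handful of examples, so report the disagreement counts alongside the statistic.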
Extra credit opportunities include (but are not limited to) running on extra data sets, using other activation functions, using this multiclass data set, and running experiments on more ANN architectures/SVM kernels and/or with more learning rates. As always, the amount of extra credit is commensurate with the level of extra effort and the quality of your report of the results.
The following problem is only for students registered for CSCE 878. CSCE 478 students who do it will receive extra credit, but the amount will be less than the number of points indicated.
Last modified 16 August 2011; please report problems to sscott AT cse.