Assigned Friday, October 12
Due Monday, November 5 at 11:59:59 p.m.
Now due Tuesday, November 6 at 11:59:59 p.m.
When you hand in your results for this homework, you should submit the following, in separate files:
Split each data set into independent training and test (validation) sets. Then choose three different network architectures, and train each on the training set while evaluating on the test set. Evaluation should be done after every round of training, so that you can generate plots like those in Figure 4.9 on p. 110. Also, for each error value plotted, compute its 95% confidence interval (you may incorporate the intervals into the plots if you wish, which will improve readability). Repeat this for three values of the learning rate. Make sure that the three architectures differ significantly from one another, and likewise the three learning rates, so that your experiments cover a very broad range. Thus, 478 students will run 3 x 3 x 3 = 27 experiments (3 data sets, 3 architectures, 3 learning rate values) and 878 students will run 54. A sketch of one such experiment appears below.
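As a rough illustration (not required code), here is a minimal Python sketch of one experiment's train-and-evaluate loop, together with the usual normal-approximation 95% confidence interval for an error rate measured on n examples, err +/- 1.96 * sqrt(err * (1 - err) / n). The helpers train_one_epoch and error_rate are hypothetical placeholders for your own network-training and evaluation code.

    import math
    import random

    def error_ci95(err, n):
        # 95% confidence interval for an error rate measured on n examples,
        # via the normal approximation: err +/- 1.96 * sqrt(err*(1-err)/n).
        half = 1.96 * math.sqrt(err * (1.0 - err) / n)
        return max(0.0, err - half), min(1.0, err + half)

    def run_experiment(data, train_one_epoch, error_rate, epochs=200,
                       test_fraction=0.3, seed=0):
        # One (data set, architecture, learning rate) experiment:
        # split the data, then record train/test error after every epoch.
        # train_one_epoch and error_rate are placeholders for your own
        # network-training and evaluation routines.
        rng = random.Random(seed)
        examples = list(data)
        rng.shuffle(examples)
        split = int(len(examples) * (1.0 - test_fraction))
        train, test = examples[:split], examples[split:]

        history = []   # (epoch, train error, test error, test 95% CI)
        for epoch in range(1, epochs + 1):
            train_one_epoch(train)            # one pass of backprop (or EG)
            tr, te = error_rate(train), error_rate(test)
            history.append((epoch, tr, te, error_ci95(te, len(test))))
        return history

One call per (architecture, learning rate, data set) combination then yields the data for one plot; the 27 (or 54) experiments are just nested loops over data sets, architectures, and learning rates.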
You are to submit a detailed, well-written report, with real conclusions and everything. In particular, you should answer the following questions: Did the training error go to 0? If so, when? Did overfitting occur? When should you have stopped training? Of course, this is merely the minimum that is required in your report.
Extra credit opportunities include (but are not limited to) running on extra data sets, using other activation functions, using multiclass data sets, handling unspecified attribute values, and running experiments on more architectures and with more learning rates. As always, the amount of extra credit is commensurate with the level of extra effort and the quality with which you report your results.
NOTE: When implementing EG (exponentiated gradient), you need to be able to represent negative weights, since EG's multiplicative updates keep each weight positive.
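One standard way to do this (a sketch of one possible approach, not necessarily the one intended here) is the EG+/- scheme of Kivinen and Warmuth: represent each weight as the difference w = w_pos - w_neg of two positive components and update both components multiplicatively. A minimal Python sketch for a single linear unit with squared loss, where eta is the learning rate and U is the total weight budget:

    import math

    def eg_pm_update(w_pos, w_neg, x, y, eta=0.1, U=1.0):
        # One EG+/- update for a linear unit with squared loss.
        # The effective i-th weight is w_pos[i] - w_neg[i]; both component
        # vectors stay positive, so negative effective weights remain
        # representable.
        y_hat = sum((wp - wn) * xi for wp, wn, xi in zip(w_pos, w_neg, x))
        grad = [2.0 * (y_hat - y) * xi for xi in x]   # d/dw of (y_hat - y)^2
        rp = [wp * math.exp(-eta * g) for wp, g in zip(w_pos, grad)]
        rn = [wn * math.exp(+eta * g) for wn, g in zip(w_neg, grad)]
        z = sum(rp) + sum(rn)                         # shared normalizer
        return [U * r / z for r in rp], [U * r / z for r in rn]

In a multilayer network the per-weight gradients would come from backprop rather than the closed-form expression above, but the two-component representation and multiplicative update are the same.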
Last modified 16 August 2011; please report problems to sscott AT cse.