Homework 2: CSCE 478/878 (Fall 2016)

Assigned on Thursday, September 29
Due Sunday, October 16 at 11:59 p.m.

When you hand in your results from this homework, you should submit the following, in separate files:

A single zip file called username.zip, where username is your username on cse. In this file, put:
- Source code in the language of your choice (in plain text files).
- A makefile and/or a README.txt file facilitating compilation and running of your code (include a description of command line options). If we cannot easily re-create your experiments, you might not get full credit.
- All your data and results (in plain text files).
A single .pdf file with your writeup of the results for all the homework problems, including the last problem. Only pdf will be accepted, and you should only submit one pdf file, with the name username.pdf, where username is your username on cse. Include all your plots in this file, as well as a detailed summary of your experimental setup, results, and conclusions. If you have several plots, you might put a few example ones in the main text and defer the rest to an appendix. Remember that the quality of your writeup strongly affects your grade. See the web page on "Tips on Presenting Technical Material".

Submit everything by the due date and time using the handin program.

On this homework, you must work with your homework partner.

(15 pts) Suppose a hypothesis commits 10 errors over a sample of 65 independently drawn test examples. What is the 90% two-sided confidence interval for the true error rate? What is the 95% one-sided interval? What is the 90% one-sided interval?
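As a reminder of the mechanics, here is a minimal sketch of the normal-approximation interval for a sample error rate, error ± z · sqrt(error · (1 − error) / n). The function names are hypothetical, and the numbers in the usage lines (12 errors in 40 examples) are illustrative only and are deliberately not those in the problem. The sketch assumes the sample is large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_interval(errors, n, confidence):
    """Approximate two-sided interval for the true error rate."""
    e = errors / n
    # z such that the central `confidence` mass of N(0, 1) lies in [-z, z]
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width


def one_sided_upper_bound(errors, n, confidence):
    """Approximate bound U such that true error <= U with the given confidence."""
    e = errors / n
    # A one-sided bound puts all of the excluded mass in a single tail
    z = NormalDist().inv_cdf(confidence)
    return e + z * sqrt(e * (1 - e) / n)


# Illustrative numbers only (assumption: 12 errors observed on 40 test examples).
low, high = two_sided_interval(12, 40, 0.95)
print(f"95% two-sided: [{low:.3f}, {high:.3f}]")
print(f"95% one-sided upper bound: {one_sided_upper_bound(12, 40, 0.95):.3f}")
```

Note that a one-sided bound at confidence level 1 − α/2 uses the same z value as a two-sided interval at level 1 − α, which is why the two kinds of interval are easy to convert between.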

(20 pts) Consider a two-layer feedforward ANN with two inputs a and b, one hidden unit c, and one output unit d. Units c and d each use the sigmoid function to squash the weighted sum of their inputs. This network has five weights [w_ca, w_cb, w_c0, w_dc, w_d0], where w_x0 represents the threshold weight for unit x. Initialize all these weights to 0.2. Using a learning rate of η = 0.5, give the weights after each of two full passes of backpropagation through the following training examples using on-line (incremental) updates. In the table, a and b are the input attribute values, and r is the target (label) value.

a  b  r
1  0  1
0  1  0
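For checking hand calculations, one on-line update for a network of this shape can be sketched as follows. This is a sketch under stated assumptions, not a reference solution: it assumes squared-error loss and a threshold input fixed at 1, the dictionary keys and function names are hypothetical, and the weights, learning rate, and training example in the usage lines are illustrative values that are deliberately different from those in the problem.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def forward(w, a, b):
    """Return (c, d): the hidden and output activations."""
    c = sigmoid(w["ca"] * a + w["cb"] * b + w["c0"])
    d = sigmoid(w["dc"] * c + w["d0"])
    return c, d


def online_update(w, a, b, r, eta):
    """One incremental backpropagation step; returns a new weight dict."""
    c, d = forward(w, a, b)
    # Error terms for sigmoid units under squared-error loss (assumption)
    delta_d = d * (1 - d) * (r - d)
    # The hidden-unit error term uses w_dc as it was BEFORE this update
    delta_c = c * (1 - c) * w["dc"] * delta_d
    return {
        "ca": w["ca"] + eta * delta_c * a,
        "cb": w["cb"] + eta * delta_c * b,
        "c0": w["c0"] + eta * delta_c,  # threshold input is 1
        "dc": w["dc"] + eta * delta_d * c,
        "d0": w["d0"] + eta * delta_d,  # threshold input is 1
    }


# Illustrative values only (assumptions, not the ones given in the problem).
weights = {"ca": 0.1, "cb": 0.1, "c0": 0.1, "dc": 0.1, "d0": 0.1}
updated = online_update(weights, a=1, b=1, r=1, eta=0.3)
print(forward(weights, 1, 1)[1], forward(updated, 1, 1)[1])
```

With on-line updates, the weights returned for one training example are the ones used in the forward pass for the next example, so a full pass chains one call per example in order.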

(20 pts) Design a single-layer, two-input perceptron that implements the boolean function A ∧ [∼ B], where ∼ is logical negation. Design a multi-layer network of perceptrons to implement [A ⊕ B] ⊕ C, where ⊕ represents exclusive OR.
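A candidate design can be checked mechanically against a truth table. The sketch below assumes inputs and outputs encoded as 0/1 and a unit that fires when the weighted sum exceeds 0 (some texts use −1/+1 instead). The helper names are hypothetical, and the example weights implement A ∧ B, which is chosen for illustration and is not one of the functions asked for above.

```python
from itertools import product


def perceptron(weights, inputs):
    """Threshold unit; weights[0] is the threshold (bias) weight w0."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0


def implements(weights, target, n_inputs):
    """True if the perceptron matches `target` on every 0/1 input combination."""
    return all(
        perceptron(weights, x) == target(*x)
        for x in product([0, 1], repeat=n_inputs)
    )


# Illustrative function only (assumption): A AND B with weights [w0, wA, wB].
and_weights = [-1.5, 1.0, 1.0]
print(implements(and_weights, lambda a, b: int(a and b), 2))  # True
```

For a multi-layer design, the same check applies if the outputs of the first-layer units are fed as the inputs of the next unit.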

(85 pts) Implement an artificial neural network (ANN) with at least one hidden layer. You may hard-code the sizes of the input and hidden layers, or you may set them dynamically based on parameters passed to the program. Your ANN will be trained by the Backpropagation algorithm. If you use discrete-valued attributes or multiclass labels, explain in your report how you implemented that in your ANN.

You are to compare your ANN's results to those from ID3 on the same UCI data sets you used for Homework 1 (if you were unsuccessful in getting your ID3 implementation working, you may utilize an existing implementation, such as Weka's or Quinlan's C4.5). Your goal is to convince the reader that, for each data set, either one of the two algorithms is superior to the other (and give a significance level as well) or that there is no statistically significant difference between them. To accomplish this task, you may use any tools from the lecture that you wish, under two conditions: (1) you must use the tools correctly and thoroughly corroborate your assertion, and (2) you must have at least one confidence interval or statistical test and at least one ROC curve in your report.
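For the ROC requirement, the points of a curve can be computed by sweeping a threshold over the classifier's real-valued outputs. The following is a minimal sketch, assuming binary labels encoded as 1 (positive) and 0 (negative), at least one example of each class, and scores where higher means more positive. The function names and the scores and labels in the usage lines are made up for illustration.

```python
def roc_points(scores, labels):
    """Return a list of (false positive rate, true positive rate) points.

    Assumes labels are 1 or 0 and that both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank examples from most to least confidently positive
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of tied scores
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts, auc(pts))
```

For the ANN, the output unit's activation is a natural score. A decision tree needs some other source of scores, such as the class proportions at each leaf, and whichever choice you make should be described in the report.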

You are to submit a detailed, well-written report, with conclusions that you can justify with your results. In particular, you should answer the following questions for both your new classifier and ID3. Did training error go to 0? Did overfitting occur? Should you have stopped training early? Was there a statistically significant difference between the performance of ID3 and that of the ANN? Which algorithm would you recommend for your data sets? Of course, this is merely the minimum that is required in your report.

Extra credit opportunities include (but are not limited to) running on extra data sets, using other activation functions, using multiclass data, and running experiments on more ANN architectures and/or with more learning rates. As always, the amount of extra credit is commensurate with the level of extra effort and the quality of your report of the results.


Last modified 03 November 2016; please report problems to sscott AT cse.