CSCE 478/878 (Fall 2006) Homework 2

Assigned Sunday, October 1
Due Friday, October 27 at 11:59 a.m. (extended from Tuesday, October 24, and then from Thursday, October 26 at 11:59 p.m.)

When you hand in your results from this homework, you should submit the following, in separate files:

  1. A single .tar.gz or .tar.Z file (make sure you use a UNIX-based compression program) called username.tar.gz, where username is your username on cse. In this tar file, put your source code and any supporting files (a packaging sketch follows this list).
  2. A single .pdf file with your writeup of the results for all the homework problems, including the last problem. Only PDF will be accepted, and you should submit only one PDF file, with the name username.pdf, where username is your username on cse. Include all your plots in this file, as well as a detailed summary of your experimental setup, results, and conclusions. If you have several plots, you might put a few example ones in the main text and defer the rest to an appendix. Remember that the quality of your writeup strongly affects your grade. See the web page on "Tips on Presenting Technical Material".
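Any UNIX tar and gzip will produce the required archive; for concreteness, here is a minimal sketch using Python's standard tarfile module, where "src/" is a hypothetical directory holding your files:

```python
import tarfile

# Package the submission as username.tar.gz (a gzip-compressed UNIX tar file).
# "src/" is a hypothetical directory containing the files to submit.
with tarfile.open("username.tar.gz", "w:gz") as tar:
    tar.add("src/")
```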
Submit everything by the due date and time using the web-based handin program.


On this homework, you must work on your own and submit your own results written in your own words.
  1. (10 pts) Simulate the running of Kernel Perceptron on the following training set:
    x        y
    (1, 4)   +1
    (2, 2)   −1
    (0, 2)   +1
    Start with the first training example, updating the αs one at a time, and iterate over the training set until convergence. Use η = 0.5 and a degree-2 polynomial kernel: K(x1, x2) = (⟨x1, x2⟩)². Show all steps.
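    A minimal sketch of one common formulation of the kernelized perceptron loop, handy for checking a hand trace (illustrative only; follow the exact update rule from lecture if it differs). It uses the problem's η = 0.5 and degree-2 polynomial kernel:

```python
def kernel(x1, x2):
    """Degree-2 polynomial kernel: K(x1, x2) = <x1, x2>^2."""
    dot = sum(a * b for a, b in zip(x1, x2))
    return dot ** 2

def kernel_perceptron(X, y, eta=0.5, max_epochs=100):
    """Iterate over the training set, updating alpha_i on each mistake,
    until a full pass makes no updates (convergence). Labels in {+1, -1}."""
    n = len(X)
    alpha = [0.0] * n
    for _ in range(max_epochs):
        updated = False
        for i in range(n):
            # Prediction via the kernel expansion: sum_j alpha_j y_j K(x_j, x_i).
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * s <= 0:       # mistake (zero activation counts as one)
                alpha[i] += eta     # additive update on the mistaken example
                updated = True
        if not updated:
            break                   # converged: a full pass with no mistakes
    return alpha

X = [(1, 4), (2, 2), (0, 2)]
y = [+1, -1, +1]
print(kernel_perceptron(X, y))
```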

  2. (10 pts) Do Problem 5.3 on p. 152 of the textbook.

  3. (10 pts) Consider a kernel K : X × X → ℝ that takes two vectors from X and returns a real number. From class we know that since K is a kernel, it computes a dot product in a feature space induced by some remapping Φ. Specifically, K(xi, xj) = ⟨Φ(xi), Φ(xj)⟩. Express the squared Euclidean distance between Φ(xi) and Φ(xj) (i.e. ∥ Φ(xi) − Φ(xj) ∥²) in terms of K.

  4. (60 pts) Implement an artificial neural network (ANN) with at least one hidden layer. The input layer will have n nodes (i.e. there are n features per example) and the hidden layer will have k nodes. Here, n and k are parameters passed to the learner (i.e. do not hard-code these values into your program). The number of output nodes may be fixed at 1 if you wish, though if you plan to handle multi-class data, you should make this dynamic as well. Your ANN will be trained by the Backpropagation algorithm, either GD- or EG-based. (If you implement EG-based backpropagation, you will receive extra credit. Note that in this case you need to be able to represent negative weights.)
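    For orientation, here is a minimal sketch of GD-based backpropagation with one hidden layer, assuming sigmoid units, a single output node, and stochastic (per-example) updates; n, k, and the learning rate are parameters rather than hard-coded values. It is a sketch under those assumptions, not a complete learner (no stopping criterion, no multi-class support):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ANN:
    """One-hidden-layer network trained by GD-based backpropagation.
    n = number of input features, k = number of hidden units;
    a single sigmoid output unit is assumed."""

    def __init__(self, n, k, eta=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        self.eta = eta
        # Small random initial weights; the extra column holds each unit's bias.
        self.W1 = rng.uniform(-0.05, 0.05, size=(k, n + 1))
        self.W2 = rng.uniform(-0.05, 0.05, size=(1, k + 1))

    def forward(self, x):
        x = np.append(x, 1.0)       # input plus constant bias input
        h = sigmoid(self.W1 @ x)    # hidden-layer activations
        h = np.append(h, 1.0)       # hidden activations plus bias input
        o = sigmoid(self.W2 @ h)    # output activation
        return x, h, o

    def train_example(self, x, t):
        """One stochastic gradient step on example (x, t), target t in {0, 1}."""
        x, h, o = self.forward(x)
        delta_o = o * (1 - o) * (t - o)                   # output error term
        delta_h = h[:-1] * (1 - h[:-1]) * (self.W2[:, :-1].T @ delta_o)
        self.W2 += self.eta * np.outer(delta_o, h)        # update output weights
        self.W1 += self.eta * np.outer(delta_h, x)        # update hidden weights
```

    Training repeats train_example over the data for many epochs; a prediction thresholds the output at 0.5.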

    You are to compare your ANN's results to those from ID3 on the same UCI data sets you used for Homework 1. Your goal is to convince the reader that, for each data set, either one of the two algorithms is superior to the other (and give a significance level as well) or that there is no statistically significant difference between them. To accomplish this task, you may use any tools from Lecture 5 that you wish, under two conditions: (1) you must use the tools correctly and thoroughly corroborate your assertion, and (2) you must have at least one confidence interval and at least one ROC curve in your report. Note that in order to use certain statistical tools correctly, you may need to run a few additional experiments with ID3.
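    As a reminder of what two of the required tools compute, here is a minimal sketch of a normal-approximation confidence interval on sample error and of ROC points generated by sweeping a decision threshold (illustrative; use the exact formulas from Lecture 5):

```python
import math

def error_confidence_interval(errors, n, z=1.96):
    """Approximate CI on true error from the error count on n test examples,
    using the normal approximation to the binomial (z = 1.96 gives 95%)."""
    p = errors / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) pairs from sweeping the
    decision threshold over the classifier's scores; labels in {+1, -1}.
    Assumes both classes are present in labels."""
    pos = sum(1 for y in labels if y == +1)
    neg = len(labels) - pos
    points = []
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == +1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == -1)
        points.append((fp / neg, tp / pos))
    return points
```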

    You are to submit a detailed, well-written report, with real conclusions and everything. In particular, you should answer the following questions for both your new classifier and ID3. Did training error go to 0? Did overfitting occur? Should you have stopped training early? Was there a statistically significant difference between the performance of ID3 and that of the ANN? What algorithm would you recommend for your data sets? Of course, this is merely the minimum that is required in your report.

    Extra credit opportunities include (but are not limited to) running on extra data sets, using other activation functions, using multiclass data, and running experiments on more ANN architectures and/or with more learning rates. As always, the amount of extra credit is commensurate with the level of extra effort and the quality of your report of the results.

  5. (5 pts) State how many hours you spent on each problem of this homework assignment. For CSCE 878 students, this includes the next two problems.

    The following problems are only for students registered for CSCE 878. CSCE 478 students who do one or both of them will receive extra credit, but the amount will be less than the number of points indicated.

  6. (25 pts) Do Problem 4.10 on p. 125 of the textbook, for both additive and multiplicative updates.

  7. (15 pts) A self-normalizing kernel is one in which all vectors in the induced feature space have unit length, i.e. ∥ Φ_K(x) ∥ = 1 for all x, where Φ_K is the remapping induced by the kernel K. Explain how any kernel K can be converted to a self-normalizing kernel, i.e. to a kernel K' such that ∥ Φ_K'(x) ∥ = 1 for all x.


Last modified 16 August 2011; please report problems to sscott AT cse.