Assigned Monday, January 29
Due Sunday, February 11 at 11:59 p.m.
When you hand in your results from this homework, you should submit the following, in separate files:
Submit all seven files by the due date and time using the handin program.
On this homework, you must work with your homework partner.
Form homework groups of size 2–3 and report the members of your group, including full names and cse usernames.
Consider the leaky Rectified Linear Unit (leaky ReLU):
R_w1(x) = max(0.01 w1 · x, w1 · x)

Draw a computation graph G1 for this unit, where the weight vector w1 and the input vector x are each of dimension 3 (ignore bias terms in this exercise). Then extend computation graph G1 to a new graph G2 in which the 3-input leaky ReLU unit feeds into a second, 1-input leaky ReLU unit, computing Rw2(Rw1(x)) (i.e., the first unit's output becomes the input of the second unit, which has its own single weight w2). This gives you a two-layer network that computes ŷ = Rw2(Rw1(x)). In your computation graphs, use only the primitives multiplication, addition, max, subtraction, and squaring.
You will now further extend G2 to compute the gradient ∇J(w) for square loss function J(w)=(y–ŷ)², where w is a vector of all the weights (i.e., the concatenation of w1 and w2). Specifically, let the training instance (x,y) be features x=[1.0, 2.0, 3.0]T, and label y=2. Let the weights be w1=[1.5,2.5,–2.5]T and w2=[–50.0]T. What is the gradient ∇J(w) in this case? What are the new weights after updating via standard gradient descent using learning rate η=0.01?
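One way to sanity-check the gradient you derive by hand is a finite-difference check. The sketch below (names like `numeric_grad` and `leaky_relu` are ours, not part of the assignment) evaluates the loss J(w) directly and approximates each partial derivative with central differences; your hand-computed ∇J(w) should match its output closely.

```python
import numpy as np

def leaky_relu(z):
    # Leaky ReLU: max(0.01*z, z)
    return np.maximum(0.01 * z, z)

def loss(w, x, y):
    # Two-layer network yhat = R_w2(R_w1(x)) with square loss (y - yhat)^2.
    # w is the concatenation of w1 (first 3 entries) and w2 (last entry).
    w1, w2 = w[:3], w[3]
    a1 = leaky_relu(np.dot(w1, x))
    yhat = leaky_relu(w2 * a1)
    return (y - yhat) ** 2

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)
    g = np.zeros_like(w)
    for i in range(len(w)):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (f(wp) - f(wm)) / (2 * eps)
    return g

# The training instance and weights from the exercise
x = np.array([1.0, 2.0, 3.0])
y = 2.0
w = np.array([1.5, 2.5, -2.5, -50.0])
g = numeric_grad(lambda w: loss(w, x, y), w)
```

Because both pre-activations here are well away from the kink at zero, the finite-difference estimate is reliable; near a kink, the leaky ReLU is not differentiable and the check can disagree with the subgradient you use.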
In this part of the homework, your group will work with a variation of the MNIST dataset called Fashion-MNIST, which is meant to serve as a direct drop-in replacement for MNIST. You will take data as numpy arrays, partition it into separate training and testing sets, use this to train and evaluate your models, and report the results.
Your trained network will take as input a feature vector of dimension 784 (corresponding to the pixel values of 28×28 images), each an integer from 0–255. The class labels are in the following table.
Here are some sample images:
Design and implement at least two architectures for this problem. You may vary the number and sizes of the layers, but you must use at least one hidden layer and you must use ReLU (not convolutional nodes) for all hidden nodes and softmax for the output layer. You will measure loss with cross-entropy after mapping the class labels to one-hot vectors.
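The one-hot mapping and the cross-entropy loss can be written in a few lines of numpy. This is only a sketch of the math (the helper names are ours); in practice your framework likely provides fused, numerically safer versions of these operations.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Map integer class labels to one-hot row vectors
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, onehot):
    # Mean cross-entropy: -sum_k y_k * log(p_k), averaged over examples
    return -np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=-1))
```

The small constant added inside the log guards against log(0) when a predicted probability underflows.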
The data is on crane in the folder

/work/cse496dl/shared/homework/01

The folder has two numpy files: fmnist_train_data.npy and fmnist_train_labels.npy.
These numpy arrays provide all the data available to you for this assignment. It is up to you to partition it into (1) a single training set and a single testing set, or (2) k subsets for k-fold cross validation. Option (1) is quicker and easier, but option (2) provides a more thorough analysis of your model, and might give you an edge in the competition. Whichever option you choose, you must ensure that you do not test on the training set.
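For option (1), a single shuffled split suffices. Below is a minimal sketch (the `split` helper and the 20% test fraction are our choices, not a requirement); the data-loading lines assume the file names given above.

```python
import numpy as np

# Load the assignment data (paths as given on crane):
# data = np.load("fmnist_train_data.npy")
# labels = np.load("fmnist_train_labels.npy")

def split(data, labels, test_frac=0.2, seed=0):
    # Shuffle once with a fixed seed, then hold out test_frac for testing.
    # Shuffling first avoids any ordering bias in the stored file.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_test = int(len(data) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return data[train_idx], labels[train_idx], data[test_idx], labels[test_idx]
```

Fixing the seed makes the partition reproducible across your 8+ training runs, so every configuration is evaluated on the same held-out examples.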
For each of your architectures, you will use Adam to optimize on the training set and then you will test on the test set. For each training run, you will use at least two sets of hyperparameters. You must also choose a regularizer and evaluate your system's performance with and without it. Thus, you will perform at least 2×2×2=8 training runs. (You will run a factor of k times more if you do k-fold cross validation.)
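If you choose k-fold cross validation, each configuration is trained k times, once per held-out fold. One way to generate the fold index sets (the `kfold_indices` helper is our own illustration):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    # Yield (train_idx, val_idx) pairs: each fold serves once as the
    # held-out set while the remaining k-1 folds form the training set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

Averaging test accuracy over the k folds gives a lower-variance estimate of each configuration's performance than a single split.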
You are to submit a detailed, well-written report, with conclusions that you can justify with your results. Your report should include a description of the learning problem (what is to be learned), a detailed description of your architecture, activation functions, and regularizer, the values of the hyperparameters you tested, and how you partitioned your data. All these design decisions must be justified. You should then describe your experimental results, including a confusion matrix, and draw conclusions from your results (conclusions must be justified by your presented results). In particular, you should discuss the impact that your hyperparameter settings and regularizer had on performance, including overfitting.
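The confusion matrix required in the report can be tallied directly from predicted and true labels. A minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    # cm[i, j] counts test examples whose true class is i and whose
    # predicted class is j; the diagonal holds the correct predictions.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Rows that bleed heavily into a particular off-diagonal column reveal which pairs of clothing classes your model confuses, which is useful evidence for the conclusions section.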
As part of your submission, you will include files representing the best model of the ones that you created and tested. After the deadline, we will evaluate each team's best model on a held-out data set (separate from the files you can access on crane). Bonus points will be awarded to the three teams with the best submitted models, as measured by classification accuracy. To help you gauge how good your model is relative to your competitors', each night we will evaluate every team's submitted model on our data set and post the accuracies. These evaluations will begin on Monday, February 5 and will run daily until the homework deadline.
To submit your program, you must submit the following three files:
To submit your model for the competition, you must submit the following three files:
saver.save(session, "homework_1")

Your model must use the following two tensors:
Also, submit a file called team.txt with your team name as you want it to appear in the posted standings.
Finally, you must submit your report (including your responses to exercises 1 and 2) as username1-username2.pdf, where username1 and username2 are your team members' user names on cse.
x = x / 255.0

where x is the tensor.
Last modified 09 February 2018; please report problems to sscott.