Assigned Monday, February 12
Due Tuesday, February 27 at 11:59 p.m. (originally Sunday, February 25)
When one person from your group hands in your team's results from this homework, they should submit via handin the following, in separate files:
Also, one person from your group (the same one who handed in the .py and .pdf files) will submit your team's model files for the competition. On crane, you should copy your model files to $WORK/handin. This folder has been created for you. Do not delete the handin folder and recreate it, as that will break the permission settings needed for us to evaluate your model. You will submit two sets of three model files each, with the following names:
On this homework, you must work with your homework partner(s).
For this problem, you will apply convolutional neural networks to the problem of emotion classification of short (3–5 second) utterances from the Berlin Database of Emotional Speech (EMODB). All utterances were resampled to a sampling rate of 16 kHz prior to any processing and then converted into spectrograms. A spectrogram is an image that visualizes the variation of energy at different frequencies across time: the vertical axis represents frequency, the horizontal axis represents time, and the energy (intensity) is encoded either by level of darkness or by color. The data in this experiment are represented as wide-band spectrograms, which have higher time resolution than their narrow-band counterparts. All spectrogram images were then resized to 129×129 pixels and z-normalized to have zero mean and standard deviation close to one.
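The z-normalization step described above can be sketched as follows. This is a minimal sketch assuming per-image normalization (the provided data may have been normalized with global statistics instead); the small epsilon guards against division by zero.

```python
import numpy as np

def z_normalize(spectrogram, eps=1e-8):
    """Z-normalize a spectrogram to zero mean and (near) unit std.

    Per-image statistics are assumed here; the dataset may have been
    normalized with global statistics instead.
    """
    mean = spectrogram.mean()
    std = spectrogram.std()
    return (spectrogram - mean) / (std + eps)

# Demo on a random 129x129 stand-in for a spectrogram image.
img = np.random.rand(129, 129).astype(np.float32)
norm = z_normalize(img)
```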
Here is a sample spectrogram:
Your trained network will take as input a feature vector of dimension 16641 (corresponding to the pixel values of the 129×129 spectrogram images), each a real number. The class labels are in the following table.
Label Value (one-hot index) | Meaning
0 | Happy
1 | Sad
2 | Angry
3 | Scared
4 | Bored
5 | Disgusted
6 | Neutral
Design and implement at least two convolutional architectures for this problem. You may vary the number and sizes of the layers, but you must use at least two convolutional+pooling layers and at least one fully connected layer (with ReLU), followed by softmax for the output layer. You will measure loss with cross-entropy, since the class labels are one-hot vectors.
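A minimal sketch of one such architecture, written in the TensorFlow 1.x graph style this assignment uses (via `tf.compat.v1` for compatibility with newer installs). The filter counts, kernel sizes, dense-layer width, and learning rate here are illustrative assumptions, not values prescribed by the assignment; the Adam optimizer and cross-entropy loss are as specified above.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def conv_pool(h, c_in, c_out):
    """One 5x5 convolution (ReLU) followed by 2x2 max pooling."""
    w = tf.Variable(tf.random.truncated_normal([5, 5, c_in, c_out], stddev=0.1))
    b = tf.Variable(tf.zeros([c_out]))
    conv = tf.nn.relu(tf.nn.conv2d(h, w, strides=[1, 1, 1, 1], padding='SAME') + b)
    return tf.nn.max_pool2d(conv, ksize=2, strides=2, padding='SAME')

# Flattened 129x129 spectrogram in, 7-way classification out.
x = tf.placeholder(tf.float32, [None, 16641], name='input_placeholder')
y = tf.placeholder(tf.float32, [None, 7])

images = tf.reshape(x, [-1, 129, 129, 1])
h = conv_pool(images, 1, 32)   # -> 65 x 65 x 32
h = conv_pool(h, 32, 64)       # -> 33 x 33 x 64
flat = tf.reshape(h, [-1, 33 * 33 * 64])

# One fully connected ReLU layer, then linear logits (softmax is in the loss).
w_d = tf.Variable(tf.random.truncated_normal([33 * 33 * 64, 256], stddev=0.05))
b_d = tf.Variable(tf.zeros([256]))
dense = tf.nn.relu(tf.matmul(flat, w_d) + b_d)
w_o = tf.Variable(tf.random.truncated_normal([256, 7], stddev=0.05))
b_o = tf.Variable(tf.zeros([7]))
logits = tf.matmul(dense, w_o) + b_o

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```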
The data is on crane in the folder /work/cse496dl/shared/homework/02, which contains the folder EMODB-German. In that folder you will find the following numpy files:
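The data are plain .npy arrays, loadable with numpy. The file names in this sketch are placeholders (substitute whichever .npy files you actually find in EMODB-German), and the demo uses a temporary folder standing in for the data directory so it is self-contained.

```python
import os
import tempfile
import numpy as np

def load_split(folder, name):
    """Load one .npy array from the data folder.

    On crane, folder would be /work/cse496dl/shared/homework/02/EMODB-German;
    'train_x.npy' below is a hypothetical file name.
    """
    return np.load(os.path.join(folder, name))

# Self-contained demo: a temp dir stands in for the cluster data folder.
demo_dir = tempfile.mkdtemp()
np.save(os.path.join(demo_dir, 'train_x.npy'),
        np.zeros((4, 16641), dtype=np.float32))
train_x = load_split(demo_dir, 'train_x.npy')
```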
For each of your architectures, you will use Adam to optimize on each training set and then you will test on each test set. For each training run, you will use at least two sets of hyperparameters. You must also choose a regularizer.
You are to submit a detailed, well-written report, with conclusions that you can justify with your results. Your report should include a description of the learning problem (what is to be learned), a detailed description of your architecture, activation functions, and regularizer, and the values of the hyperparameters you tested. All these design decisions must be justified. You should then describe your experimental results, including a confusion matrix, and draw conclusions from your results (conclusions must be justified by your presented results). In particular, you should discuss the impact of your hyperparameter settings.
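The confusion matrix requested above can be computed with a few lines of numpy; predictions would come from an argmax over your network's softmax outputs. This is one straightforward construction (rows = true classes, columns = predicted classes), not a required implementation.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes=7):
    """Count (true, predicted) pairs; rows are true, columns are predicted."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

# Tiny example: one Happy utterance misclassified as Sad.
true = [0, 0, 1, 2]
pred = [0, 1, 1, 2]
cm = confusion_matrix(true, pred)
```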
As part of your submission, you will include files representing your best model of the ones that you created and tested. After the deadline, we will evaluate each team's best model on a held-out data set (separate from the files you will access on crane). Bonus points will be awarded to the three teams with the best submitted models, as measured by classification accuracy. To help you determine how good your model is relative to your competitors, each night we will evaluate each team's submitted model on our data set and post the accuracies. These evaluations will begin on Monday, February 19 and will be done daily until the homework deadline.
To submit your program, you must submit the following three files, via handin:
To submit your model for the competition, you must submit the following three files by copying them to $WORK/handin on crane:
saver.save(session, "emodb_homework_2")

Your emodb model must use the following two tensors:
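The `saver.save` call above produces the checkpoint files you copy to $WORK/handin. A minimal end-to-end sketch, assuming TensorFlow 1.x-style saving (via `tf.compat.v1`): the graph here is a trivial stand-in for your real model, the tensor names `input_placeholder` and `output` are assumptions (use whatever two tensor names the assignment specifies), and a temporary directory stands in for $WORK/handin.

```python
import os
import tempfile
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Trivial stand-in graph with explicitly named input and output tensors.
# The names 'input_placeholder' and 'output' are illustrative assumptions.
x = tf.placeholder(tf.float32, [None, 16641], name='input_placeholder')
w = tf.Variable(tf.zeros([16641, 7]))
output = tf.identity(tf.matmul(x, w), name='output')

saver = tf.train.Saver()
out_dir = tempfile.mkdtemp()  # on crane this would be $WORK/handin
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # Produces the .data, .index, and .meta checkpoint files.
    prefix = saver.save(session, os.path.join(out_dir, 'emodb_homework_2'))
```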
Finally, via handin, you must submit your report as username1-username2.pdf, where username1 and username2 are your team members' user names on cse.
In this exercise you will perform transfer learning from the EMODB learning task to a related one based on the Surrey Audio-Visual Expressed Emotion (SAVEE) database. You will take your best model from the previous exercise, fix the weights of the convolutional layers, and refine the weights of the dense layers with a new data set that has the same format and input/output dimensions as the EMODB dataset. The following table summarizes the class labels for the new problem.
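One common way to fix the convolutional weights while refining the dense layers is to pass only the dense-layer variables to the optimizer's `var_list`. A minimal sketch in the TensorFlow 1.x graph style (via `tf.compat.v1`): the tiny stand-in graph and the scope names `conv` and `dense` are assumptions for illustration; in your model you would select the variables of your actual dense layers.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Stand-in graph: 'conv' variables stay frozen, 'dense' variables are refined.
with tf.variable_scope('conv'):
    w_conv = tf.get_variable('w', shape=[5, 5, 1, 8])   # frozen
with tf.variable_scope('dense'):
    w_dense = tf.get_variable('w', shape=[8, 7])        # trainable

x = tf.placeholder(tf.float32, [None, 8])
y = tf.placeholder(tf.float32, [None, 7])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w_dense) - y))

# Restricting var_list to the 'dense' scope means the optimizer never
# computes or applies gradients for the convolutional weights.
dense_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='dense')
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=dense_vars)
```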
Label Value (one-hot index) | Meaning
0 | Happy
1 | Sad
2 | Angry
3 | Scared
4 | Surprised
5 | Disgusted
6 | Neutral
The data is on crane in the folder /work/cse496dl/shared/homework/02, which contains the folder SAVEE-British. In that folder you will find the following numpy files:
To submit your program, you must submit the following three files, via handin:
To submit your model for the second competition, you must submit the following three files by copying them to $WORK/handin on crane:
saver.save(session, "savee_homework_2")

Your savee model must use the following two tensors:
Finally, via handin, you must submit your report as username1-username2.pdf, where username1 and username2 are your team members' user names on cse. Your report for this exercise will be merged with that from the previous exercise. It should assess the impact that transfer learning had on performance, including whether you had to modify the values of the hyperparameters for the new dataset.
Last modified 26 February 2018; please report problems to