Assigned Monday, February 12
Due Tuesday, February 27 at 11:59 p.m. (originally Sunday, February 25)
When one person from your group hands in your team's results from this homework, they should submit via handin the following, in separate files:
Also, one person from your group (the same one who handed in the .py and .pdf files) will submit your team's model files for the competition. On crane, you should copy your model files to $WORK/handin. This folder has been created for you. Do not delete the handin folder and recreate it, as that will break the permission settings needed for us to evaluate your model. You will submit two sets of three model files each, with the following names:
On this homework, you must work with your homework partner(s).
For this problem, you will apply convolutional neural networks to the problem of emotion classification of short (3–5 second) utterances from the Berlin Database of Emotional Speech (EMODB). All utterances were resampled to a sampling rate of 16 kHz prior to any processing and then converted into spectrograms. A spectrogram is an image that visualizes the variation of energy at different frequencies across time: the vertical axis represents frequency, the horizontal axis represents time, and the energy (intensity) is encoded either by level of darkness or by color. The data in this experiment are represented as wide-band spectrograms, which have higher time resolution than their narrow-band counterparts. All spectrogram images were then resized to 129×129 pixels and z-normalized to have zero mean and standard deviation close to one.
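The z-normalization step described above can be sketched as follows. This is a minimal sketch assuming per-image normalization (the provided data may have been normalized with global statistics instead); the small epsilon guards against division by zero.

```python
import numpy as np

def z_normalize(spectrogram, eps=1e-8):
    """Z-normalize a spectrogram to zero mean and (near) unit std.

    Per-image statistics are assumed here; the dataset may have been
    normalized with global statistics instead.
    """
    mean = spectrogram.mean()
    std = spectrogram.std()
    return (spectrogram - mean) / (std + eps)

# Demo on a random 129x129 stand-in for a spectrogram image.
img = np.random.rand(129, 129).astype(np.float32)
norm = z_normalize(img)
```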
Here is a sample spectrogram:
Your trained network will take as input a feature vector of dimension 16641 (corresponding to the pixel values of the 129×129 spectrogram images), each a real number. The class labels are in the following table.
Label Value (one-hot index) | Meaning
0 | Happy
1 | Sad
2 | Angry
3 | Scared
4 | Bored
5 | Disgusted
6 | Neutral
Design and implement at least two convolutional architectures for this problem. You may vary the number and sizes of the layers, but you must use at least two convolutional+pooling layers and at least one fully connected layer (with ReLU), followed by softmax for the output layer. You will measure loss with cross-entropy, since the class labels are one-hot vectors.
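A minimal sketch of one such architecture, written in the TensorFlow 1.x graph style this assignment uses (via `tf.compat.v1` for compatibility with newer installs). The filter counts, kernel sizes, dense-layer width, and learning rate here are illustrative assumptions, not values prescribed by the assignment; the Adam optimizer and cross-entropy loss are as specified above.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def conv_pool(h, c_in, c_out):
    """One 5x5 convolution (ReLU) followed by 2x2 max pooling."""
    w = tf.Variable(tf.random.truncated_normal([5, 5, c_in, c_out], stddev=0.1))
    b = tf.Variable(tf.zeros([c_out]))
    conv = tf.nn.relu(tf.nn.conv2d(h, w, strides=[1, 1, 1, 1], padding='SAME') + b)
    return tf.nn.max_pool2d(conv, ksize=2, strides=2, padding='SAME')

# Flattened 129x129 spectrogram in, 7-way classification out.
x = tf.placeholder(tf.float32, [None, 16641], name='input_placeholder')
y = tf.placeholder(tf.float32, [None, 7])

images = tf.reshape(x, [-1, 129, 129, 1])
h = conv_pool(images, 1, 32)   # -> 65 x 65 x 32
h = conv_pool(h, 32, 64)       # -> 33 x 33 x 64
flat = tf.reshape(h, [-1, 33 * 33 * 64])

# One fully connected ReLU layer, then linear logits (softmax is in the loss).
w_d = tf.Variable(tf.random.truncated_normal([33 * 33 * 64, 256], stddev=0.05))
b_d = tf.Variable(tf.zeros([256]))
dense = tf.nn.relu(tf.matmul(flat, w_d) + b_d)
w_o = tf.Variable(tf.random.truncated_normal([256, 7], stddev=0.05))
b_o = tf.Variable(tf.zeros([7]))
logits = tf.matmul(dense, w_o) + b_o

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```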
The data is on crane in the folder /work/cse496dl/shared/homework/02, which contains the folder EMODB-German. In that folder you will find the following numpy files:
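The data are plain .npy arrays, loadable with numpy. The file names in this sketch are placeholders (substitute whichever .npy files you actually find in EMODB-German), and the demo uses a temporary folder standing in for the data directory so it is self-contained.

```python
import os
import tempfile
import numpy as np

def load_split(folder, name):
    """Load one .npy array from the data folder.

    On crane, folder would be /work/cse496dl/shared/homework/02/EMODB-German;
    'train_x.npy' below is a hypothetical file name.
    """
    return np.load(os.path.join(folder, name))

# Self-contained demo: a temp dir stands in for the cluster data folder.
demo_dir = tempfile.mkdtemp()
np.save(os.path.join(demo_dir, 'train_x.npy'),
        np.zeros((4, 16641), dtype=np.float32))
train_x = load_split(demo_dir, 'train_x.npy')
```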
For each of your architectures, you will use Adam to optimize on each training set and then you will test on each test set. For each training run, you will use at least two sets of hyperparameters. You must also choose a regularizer.
You are to submit a detailed, well-written report, with conclusions that you can justify with your results. Your report should include a description of the learning problem (what is to be learned), a detailed description of your architecture, activation functions, and regularizer, and the values of the hyperparameters you tested. All these design decisions must be justified. You should then describe your experimental results, including a confusion matrix, and draw conclusions from your results (conclusions must be justified by your presented results). In particular, you should discuss the impact of your hyperparameter settings.
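The confusion matrix requested above can be computed with a few lines of numpy; predictions would come from an argmax over your network's softmax outputs. This is one straightforward construction (rows = true classes, columns = predicted classes), not a required implementation.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes=7):
    """Count (true, predicted) pairs; rows are true, columns are predicted."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

# Tiny example: one Happy utterance misclassified as Sad.
true = [0, 0, 1, 2]
pred = [0, 1, 1, 2]
cm = confusion_matrix(true, pred)
```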
As part of your submission, you will include files representing your best model of the ones that you created and tested. After the deadline, we will evaluate each team's best model on a held-out data set (separate from the files you will access on crane). Bonus points will be awarded to the three teams with the best submitted models, as measured by classification accuracy. To help you determine how good your model is relative to your competitors, each night we will evaluate each team's submitted model on our data set and post the accuracies. These evaluations will begin on Monday, February 19 and will be done daily until the homework deadline.
To submit your program, you must submit the following three files, via handin:
To submit your model for the competition, you must submit the following three files by copying them to $WORK/handin on crane:
saver.save(session, "emodb_homework_2")

Your emodb model must use the following two tensors:
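The `saver.save` call above produces the checkpoint files you copy to $WORK/handin. A minimal end-to-end sketch, assuming TensorFlow 1.x-style saving (via `tf.compat.v1`): the graph here is a trivial stand-in for your real model, the tensor names `input_placeholder` and `output` are assumptions (use whatever two tensor names the assignment specifies), and a temporary directory stands in for $WORK/handin.

```python
import os
import tempfile
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Trivial stand-in graph with explicitly named input and output tensors.
# The names 'input_placeholder' and 'output' are illustrative assumptions.
x = tf.placeholder(tf.float32, [None, 16641], name='input_placeholder')
w = tf.Variable(tf.zeros([16641, 7]))
output = tf.identity(tf.matmul(x, w), name='output')

saver = tf.train.Saver()
out_dir = tempfile.mkdtemp()  # on crane this would be $WORK/handin
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # Produces the .data, .index, and .meta checkpoint files.
    prefix = saver.save(session, os.path.join(out_dir, 'emodb_homework_2'))
```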
Finally, via handin, you must submit your report as username1-username2.pdf, where username1 and username2 are your team members' user names on cse.
In this exercise you will perform transfer learning from the EMODB learning task to a related one based on the Surrey Audio-Visual Expressed Emotion (SAVEE) database. You will take your best model from the previous exercise, fix the weights of the convolutional layers, and refine the weights of the dense layers with a new data set that has the same format and input/output dimensions as the EMODB dataset. The following table summarizes the class labels for the new problem.
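One common way to fix the convolutional weights while refining the dense layers is to pass only the dense-layer variables to the optimizer's `var_list`. A minimal sketch in the TensorFlow 1.x graph style (via `tf.compat.v1`): the tiny stand-in graph and the scope names `conv` and `dense` are assumptions for illustration; in your model you would select the variables of your actual dense layers.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Stand-in graph: 'conv' variables stay frozen, 'dense' variables are refined.
with tf.variable_scope('conv'):
    w_conv = tf.get_variable('w', shape=[5, 5, 1, 8])   # frozen
with tf.variable_scope('dense'):
    w_dense = tf.get_variable('w', shape=[8, 7])        # trainable

x = tf.placeholder(tf.float32, [None, 8])
y = tf.placeholder(tf.float32, [None, 7])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w_dense) - y))

# Restricting var_list to the 'dense' scope means the optimizer never
# computes or applies gradients for the convolutional weights.
dense_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='dense')
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=dense_vars)
```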
Label Value (one-hot index) | Meaning
0 | Happy
1 | Sad
2 | Angry
3 | Scared
4 | Surprised
5 | Disgusted
6 | Neutral
The data is on crane in the folder /work/cse496dl/shared/homework/02, which contains the folder SAVEE-British. In that folder you will find the following numpy files:
To submit your program, you must submit the following three files, via handin:
To submit your model for the second competition, you must submit the following three files by copying them to $WORK/handin on crane:
saver.save(session, "savee_homework_2")

Your savee model must use the following two tensors:
Finally, via handin, you must submit your report as username1-username2.pdf, where username1 and username2 are your team members' user names on cse. Your report for this exercise will be merged with that from the previous exercise. It should assess the impact that transfer learning had on performance, including whether you had to modify the values of the hyperparameters for the new dataset.
Last modified 26 February 2018; please report problems to