CSCE 970 (Spring 2003) Homework 1

Assigned Thursday, January 23
Due Tuesday, February 18 at 11:59:59 p.m.
Total points: 100


When you hand in your results from this homework, you should submit the following, in separate files:

  1. A single .tar.gz or .tar.Z file (make sure you use a UNIX-based compression program) called username.tar.gz, where username is your username on cse. In this tar file, put your source code and any data-reformatting scripts (see Problem 3).
  2. A single .pdf file with your writeup of the results for all the homework problems, including the last problem. Only PDF will be accepted, and you should submit exactly one PDF file, named username.pdf, where username is your username on cse. Include all your plots in this file, as well as a detailed summary of your experimental setup, results, and conclusions. If you have several plots, you might put a few representative ones in the main text and defer the rest to an appendix. Remember that the quality of your writeup strongly affects your grade. See the web page "Tips on Presenting Technical Material".
Submit everything by the due date and time using the web-based handin program.

On this homework, you must work on your own and submit your own results written in your own words.


  1. (20 pts) Do Exercise 2.7 on page 48 of Theodoridis.

  2. (20 pts) Do Exercise 2.12 on page 49 of Theodoridis.

  3. (50 pts) Implement one classifier from set (a) and one classifier from set (b) (two total).

    (a)
    A Bayesian classifier that assumes the pdf for each class is a Gaussian and uses maximum likelihood estimation to estimate both the mean vector and the covariance matrix for each class. For this part you should report what your estimates were for each class for each data set (a minimal sketch of the ML estimates appears after the data-set notes below).
    OR
    A Bayesian classifier that uses Parzen windows to estimate the pdf for each class. Use a Gaussian function with 0 mean and unit variance as your kernel function.

    (b)
    The Perceptron, using at least three different learning rates and at least three different numbers of passes through the training data. Train this with the Pocket algorithm (a minimal sketch follows these options).
    OR
    Winnow, using at least three different learning rates and at least three different numbers of passes through the training data. Train this with the Pocket algorithm and allow for both positive and negative weights. Explain how you handle both signs of weights, and justify your procedure.
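
    For concreteness, here is a minimal sketch of a pocket-trained two-class perceptron, assuming NumPy and labels coded as -1/+1 (the function name and default parameter values are placeholders, not a prescribed interface):

        import numpy as np

        def pocket_perceptron(X, y, eta=0.1, passes=10):
            # X: (n, l) array of feature vectors; y: labels in {-1, +1}.
            # Augment each vector with a constant 1 so the bias is w[0].
            Xa = np.hstack([np.ones((len(X), 1)), X])
            w = np.zeros(Xa.shape[1])
            best_w, best_errs = w.copy(), len(y) + 1
            for _ in range(passes):
                for xi, yi in zip(Xa, y):
                    if yi * (w @ xi) <= 0:            # misclassified: perceptron update
                        w = w + eta * yi * xi
                        errs = int(np.sum(y * (Xa @ w) <= 0))
                        if errs < best_errs:          # "pocket" the best w seen so far
                            best_w, best_errs = w.copy(), errs
            return best_w                             # classify x by sign(best_w @ [1, x])

    Varying eta and passes by factors of two or more then gives the grid of runs the problem asks for. For Winnow, the additive update above would be replaced by a multiplicative one; how you extend it to both signs of weights is part of what you must explain.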

    You will train and test your two classifiers on six data sets. For five of them, the number of classes is M=2 and the number of dimensions is l=2. For the sixth, the number of classes is M=4 and the number of dimensions is l=4. Because M>2 there, you must use a multi-class technique (e.g. Kessler's construction or ECOC, sketched below) for the Perceptron and Winnow.
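
    One way to realize Kessler's construction without explicitly forming the long M(l+1)-dimensional vectors is to keep one augmented weight vector per class and, whenever a competing class scores at least as high as the true class, move the two vectors apart. A sketch under the same NumPy assumptions (integer labels 0..M-1 are an assumed coding):

        import numpy as np

        def kessler_perceptron(X, labels, M, eta=0.1, passes=10):
            # Multi-class perceptron via Kessler's construction:
            # enforce W[c] @ x > W[j] @ x for the true class c and every j != c.
            Xa = np.hstack([np.ones((len(X), 1)), X])   # augmented vectors
            W = np.zeros((M, Xa.shape[1]))
            for _ in range(passes):
                for xi, c in zip(Xa, labels):
                    scores = W @ xi
                    for j in range(M):
                        if j != c and scores[j] >= scores[c]:
                            W[c] += eta * xi            # reward the true class
                            W[j] -= eta * xi            # penalize the violator
            return W                                    # classify by argmax_i W[i] @ [1, x]

    The pocket step from the previous sketch can be wrapped around this loop in the same way. An ECOC scheme would instead train several binary classifiers on coded class labels and decode by nearest codeword.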

    The data sets were generated according to different probability distributions, and some are linearly separable while others are not (each testing set is generated in the same fashion as its corresponding training set). In your Bayesian classifiers, do not forget that all classes might not be equally probable!
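
    For option (a)'s Gaussian choice, the ML estimates are just the per-class sample mean and (biased) sample covariance, and the priors can be estimated from the class frequencies in the training set. A minimal sketch, again assuming NumPy (function names are placeholders):

        import numpy as np

        def fit_gaussian_ml(class_data):
            # class_data: list of (n_i, l) arrays, one per class.
            n_total = sum(len(Xi) for Xi in class_data)
            params = []
            for Xi in class_data:
                mu = Xi.mean(axis=0)                       # ML mean vector
                Sigma = (Xi - mu).T @ (Xi - mu) / len(Xi)  # ML covariance (divide by n_i)
                params.append((mu, Sigma, len(Xi) / n_total))
            return params

        def classify(x, params):
            # Pick the class maximizing ln p(x | w_i) + ln P(w_i); the constant
            # -(l/2) ln(2*pi) is the same for every class, so it is dropped.
            scores = []
            for mu, Sigma, prior in params:
                d = x - mu
                g = (-0.5 * d @ np.linalg.solve(Sigma, d)
                     - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))
                scores.append(g)
            return int(np.argmax(scores))

    For the Parzen option, fit_gaussian_ml would instead be replaced by a kernel density estimate built from the Gaussian kernel; the prior term is handled the same way.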

    When you choose values for your different parameters, choose widely varying numbers (e.g. differing by factors of 2 or more), so you can see how the parameter values impact training speed and classification error.

    Report on the performance of these classifiers on the different data sets with the different parameter values (one approach is to hold all parameter values fixed while varying one and measuring its effect). You might also try varying the number of training examples to see which classifiers perform better with less data. Include comments on the time (in number of iterations and in real time) to train, the time to test, and the error rates during training and testing. Also, for each data set, note whether a classifier tended to make a particular type of error. That is, were all of its misclassified feature vectors near each other? If so, explain this phenomenon.

    Contrast the performances (error rates and times) of these classifiers on the different sets. From these results, what can you infer about the characteristics of each data set in terms of linear separability, probability distribution, etc.? Based on the characteristics of each data set, which classifier do you feel is most appropriate? Do you think better results could be obtained by using other methods (including those from Chapter 4)? Which methods do you think would improve performance and why?

    When you hand in your results, submit source code electronically as well as a well-written, detailed report (much of your grade will depend on your presentation).

    The data sets are available on the web. Each directory has 2M files: classi.train and classi.test for i=1,...,M. The *.train files are for training the classifier and the *.test files are for testing it (if you wish to experiment with various amounts of training data, you can prune some data out of the training sets). The file classi.* contains one feature vector from class omega_i per line, with the real numbers separated by spaces (a minimal loader sketch follows). You may write scripts to reformat the data before it goes to your executables, so long as you keep the training and testing sets separate. However, you must provide these scripts in your .tar.gz file; that is, I should be able to run your main scripts on the original data sets to evaluate your programs.
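
    A loader sketch for the stated file format (the directory layout and function name are assumptions based on the description above):

        import numpy as np

        def load_class_files(directory, M, suffix="train"):
            # Read class1.<suffix> ... classM.<suffix>; each line is one
            # feature vector with real numbers separated by spaces.
            X_parts, y_parts = [], []
            for i in range(1, M + 1):
                Xi = np.loadtxt(f"{directory}/class{i}.{suffix}", ndmin=2)
                X_parts.append(Xi)
                y_parts.append(np.full(len(Xi), i))
            return np.vstack(X_parts), np.concatenate(y_parts)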

  4. (5 pts) List at least three topics that you feel would be good subjects for a class project. These topics may be on any material related to pattern recognition; they need not be related to topics we have covered in class so far.

  5. (5 pts) State the approximate amount of time you spent on each problem, especially for each classifier you designed.

Extra Credit (15 pts) Find a good SVM or RBF toolset that is freely available (or write your own). Download and compile it (if necessary) and run it on different training and testing sets (including the ones I generated, if you wish). Briefly report on its performance with different network architectures and parameters. Also report on its ease of use.
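
One freely available option is an RBF-kernel SVM, e.g. via scikit-learn; a minimal run might look like the following (the gamma and C values are placeholders to be varied, and load_class_files is the loader sketched in Problem 3):

    import numpy as np
    from sklearn.svm import SVC

    # load_class_files is the reader sketched in Problem 3; gamma and C
    # are placeholder parameters to vary when reporting performance.
    X_train, y_train = load_class_files("data", M=2, suffix="train")
    X_test,  y_test  = load_class_files("data", M=2, suffix="test")

    clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
    clf.fit(X_train, y_train)
    print("test error rate:", np.mean(clf.predict(X_test) != y_test))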



Last modified 16 August 2011; please report problems to sscott AT cse.

