Before starting

In homework 1, you wrote a classifier that conformed to the LASSO API. We'll now see how conforming to an API lets us tap into other resources. If you did not violate the API, the next few steps should be easy.

What we are going to do is remove the learner wrapper and turn our programs into Web Services, using SOAP. You do not need to know anything about web services or SOAP to proceed. We will be using the AXIS SOAP implementation for Java and the gSOAP implementation for C++. Download the plug-in code here (Java) and here (C++). Please use ONLY these libraries: there are some bugs in AXIS that I have fixed, and without these changes in the custom version you will have problems.
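To give a rough idea of what "turning your program into a web service" means here: with AXIS, a plain Java class can be exposed so that its public methods become SOAP operations. The sketch below is only illustrative; the class name, method names, and String-based signatures are assumptions, and the plug-in code you download defines the actual interface and deployment details.

    // MyLearnerService.java -- illustrative only; the real entry points are
    // defined by the provided plug-in code, not by this sketch.
    public class MyLearnerService {

        // Hypothetical operation: train on CSV-formatted examples.
        public String train(String csvData) {
            // ... call into your existing homework 1 learner here ...
            return "trained";
        }

        // Hypothetical operation: classify CSV-formatted examples and
        // return labeled data.
        public String classify(String csvData) {
            // ... run your trained classifier and format the output ...
            return csvData;
        }
    }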

Important: This code has only been tested on Solaris (cse), Linux and MacOS X. It may work on Windows, but I have not personally tested this.

Important #2: It is probably best to run this on a cse machine, because issues such as firewalls and/or slow connection speeds may cause problems for you.

Important #3: Test your code first on the command line. Use LearnerWrapper, run simple sanity checks, etc. It is MUCH easier to debug on the command line than in the web service.
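For example, a quick offline sanity check (the Learner class and its methods here are hypothetical stand-ins for your own homework 1 code and LearnerWrapper) might look like:

    // SanityCheck.java -- run this on the command line before plugging in.
    public class SanityCheck {
        public static void main(String[] args) throws Exception {
            Learner learner = new Learner();           // your homework 1 learner
            learner.train("train.csv");                // small training file
            String labels = learner.classify("test.csv");
            System.out.println(labels);                // eyeball the predictions
        }
    }

If this does not behave on a small file, it will not behave inside the web service either.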

Important #4: Don't bang your head against the wall for hours on plugging-in issues. This process really should not be time-intensive. A simple e-mail to the TA or coming in during office hours for five minutes might save you hours if you're stuck!

Important #5: I tested several of your ID3 algorithms. Unfortunately, several of them have severe memory problems when the number of examples and the dimensionality increase (2000 examples x 64 features). In particular, they blow out the JVM entirely (if you're using Java), which causes your program to crash. If you have questions about this, please see me.
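If your Java program is running out of heap space at that scale (and the memory use is legitimate rather than a leak), one thing worth trying is raising the JVM heap limit with the standard -Xmx flag, for example:

    java -Xmx512m YourLearnerMainClass ...    (the class name is a placeholder)

This only postpones the problem if your ID3 implementation copies the full example set at every node, so it is worth checking for that as well.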

Plugging In


Registration:

  1. Go to LASSO. Click on Register in the upper right. Fill out the form to get an account.
  2. After you have an account, log into the system (log in is on the left hand side).
  3. After logging in, you will be on the Main Menu. Choose Component Manager.
  4. Click "Add" next to Learner.
  5. Type in the name of your learner, and its service URL (as described above). You only need to do this once, even if you change your program and kill and restart your web service.
  6. After saving, go back to component manager and click "Add" next to Classifier. Choose the Learner you registered. Next to "dataformat in" choose "CSVData". For "dataformat out", choose "Labeled Data". You should create a new classifier for each learning task.

  7. Now go to pipeline manager via the Main Menu.
  8. Click add and create a pipeline. (Note that you will create a new pipeline for each learning task.)
  9. For Task, choose Ontology Membership.
  10. Choose "News Ontology".
  11. Click the "Append New Stage" Button.
  12. Choose "DataSource" as the type, and choose "478 Prepared Data" as the DataSource.
  13. Click the "Append New Stage" Button again.
  14. Choose Classifier as the type, and choose your classifier as the ID.
  15. Click Validate. You should not get any errors. If you have problems, contact the TA.
  16. Click Save and Exit.
  17. Now go to the job manager.
  18. Select your pipeline, and set up a training task. This should now start running the pipeline. Note that the learning task you choose must match the "Ontology Destination" you selected above (e.g. "NewsTask Data (Train)" matches "NewsOntology"). Select "Replace Existing Data", which is necessary for you to get correct results.
  19. After training completes successfully (i.e. "Status: Completed"), submit a classification task in a similar fashion. If you instead get "Status: Stopped (Error?)", do not bother submitting the classification task until you successfully complete training.
  20. Check back in a few minutes to see if your job has completed. If it has, continue:
  21. Click on the Data Analysis Tool.
  22. Choose the job number (probably the last one in the list) and get the summary results. The results are presented as a "confusion matrix", where the horizontal axis corresponds to the true label of a web page and the vertical axis corresponds to the label that your classifier predicted. The lower-left number, for instance, counts positive examples (True Label = "yes") that the classifier predicted as "no", i.e. false negative errors; a 13 there means 13 false negatives. (See the layout sketched after this list.)
  23. Congratulations! You did it!
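To make the confusion matrix layout in step 22 concrete, here is a sketch of how to read it (only the 13 comes from the description above; the other cells are placeholders, not real results):

                             True label
                            yes       no
    Predicted    yes         a         b     <- b = false positives
                 no         13         d     <- 13 = false negatives (true "yes" predicted "no")

Accuracy would then be (a + d) / (a + b + 13 + d).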