Before starting
In homework 1, you wrote a classifier that conformed to the LASSO
API. We will now see how conforming to an API lets us tap into other
resources. If your code does not violate the API, the next
few steps should be easy.
What we are going to do is remove the learner wrapper and turn our
programs into Web Services. We're going to use SOAP to help
us with this. You do not need to know anything about web services or
SOAP to proceed. We will be using the AXIS SOAP implementation for
Java and the gSOAP implementation for C++. Download the plug-in
code here: (Java) (C++). Please use ONLY these libraries: the stock
AXIS release has some bugs that I have fixed in this custom version,
and without those fixes you will have problems.
Important: This code has only been tested on Solaris (cse),
Linux and MacOS X. It may work on Windows, but I have not personally
tested this.
Important #2: It is probably best to run this on a cse
machine, because issues such as firewalls and/or slow connection
speeds may cause problems for you.
Important #3: Test your code first on the command
line. Use LearnerWrapper, run simple sanity checks, etc. It is MUCH easier
to debug on the command line than in the web service.
Important #4: Don't bang your head against the
wall for hours on plugging-in issues. This process really should not be
time-intensive. A simple e-mail to the TA or coming in during office hours
for five minutes might save you hours if you're stuck!
Important #5: I tested several of your ID3
implementations. Unfortunately, several of them have severe memory
problems as the number of examples and the dimension grow (e.g., 2000 examples x 64 dimensions).
If you are using Java, these programs exhaust the JVM's memory, which
makes them crash. If you have questions about this, please see me.
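If your Java learner crashes on larger data sets, it can help to see how much heap the JVM actually has and how much your learner uses. The sketch below is a hypothetical diagnostic helper (the class name HeapReport is ours, not part of LASSO or AXIS) built only on the standard java.lang.Runtime memory methods. The heap limit itself can be raised with the HotSpot -Xmx option (e.g. java -Xmx512m ...), but if your ID3 code's memory use grows too fast, a larger heap may only postpone the crash.

```java
// Hypothetical diagnostic helper (not part of the LASSO code): reports the JVM's
// heap limits so you can see how close your learner gets to running out of memory.
public class HeapReport {

    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Returns a one-line summary of the current heap state. */
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // upper bound the heap may grow to (set by -Xmx)
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused part of the reserved heap
        long used = total - free;
        return "heap used=" + toMiB(used) + " MiB, reserved=" + toMiB(total)
                + " MiB, max=" + toMiB(max) + " MiB";
    }

    public static void main(String[] args) {
        // Print this before and after training on a large data set
        // (e.g. 2000 x 64) to see how much memory your learner consumes.
        System.out.println(summary());
    }
}
```

Calling HeapReport.summary() from your own command-line test harness, before and after training, is a quick way to check whether memory use is the problem before you plug in.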
Plugging In
Java
- Download the Java web service code.
- Open the LearnerGlue.java file. Replace DumbLearner with the name of your learner class.
- Modify your class so that it implements edu.unl.lasso.Learner.MyLearner.MyLearner instead of MyLearner (one line code change).
- Type "make".
- We are going to start a pseudo-webserver to run our service in.
You need to pick a service port: any four-digit number greater
than 4000 that is not already in use should work. From the directory your code is in, run:
java -classpath .:lasso.jar org.apache.axis.transport.http.SimpleAxisServer -p 8080 &
where 8080 is the port number of your choice.
- Next, start the service. Also from the source directory, run:
java -classpath .:lasso.jar org.apache.axis.client.AdminClient -p 8080 deploy.wsdd
again where 8080 is the port number of your choice.
- We are now running a web service! As long as the machine you're running on does not crash or get rebooted, this service should be available.
- The URL to your service is: http://machinename:8080/axis/services/MyLearner where machinename is the computer's name and 8080 is the port number.
- Note: if you make changes to your code after starting the server, you may need to restart the server. Kill the old process first.
If you restart on the same port number, you do not
have to add a new learner.
- Proceed with the registration steps below.
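Before starting SimpleAxisServer, you may want to confirm that the port you picked is free, and to print the exact service URL you will register. The sketch below is a hypothetical helper (ServiceSetup, isPortFree, and serviceUrl are our names, not part of LASSO or AXIS) that uses only java.net.ServerSocket and java.net.InetAddress. It assumes the URL pattern given above, http://machinename:PORT/axis/services/MyLearner; note that the host name Java reports may not be the name other machines use to reach you, so double-check it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

// Hypothetical helper (not part of the LASSO or AXIS code): checks whether a
// port is free and builds the service URL in the form described above.
public class ServiceSetup {

    /** Returns true if this process can bind the port, i.e. nothing else holds it. */
    static boolean isPortFree(int port) {
        // Binding fails with an IOException if another process holds the port;
        // try-with-resources releases the probe socket immediately.
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Builds the AXIS service URL for the given host name and port. */
    static String serviceUrl(String host, int port) {
        return "http://" + host + ":" + port + "/axis/services/MyLearner";
    }

    public static void main(String[] args) throws UnknownHostException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        if (port <= 4000 || port > 9999) {
            System.out.println("Pick a four-digit port greater than 4000.");
            return;
        }
        if (!isPortFree(port)) {
            System.out.println("Port " + port + " is in use; pick another.");
            return;
        }
        String host = InetAddress.getLocalHost().getHostName();
        System.out.println("Service URL to register: " + serviceUrl(host, port));
    }
}
```

For example, java ServiceSetup 8080 prints the URL to register if port 8080 is free on that machine. The check is only a snapshot: another process could still grab the port before you start the server.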
C++
- Download the C++ web service code.
- Copy your learner to the same directory as the files extracted
from the web service archive.
- Adjust the Makefile that comes with the new code: replace DumbLearner with the name of your class, and add compilation rules for any additional classes you have written.
- Run "make". You should get a learner.cgi file.
- You can run learner.cgi in one of two ways:
  - Put the file in your public_html directory. You can now access the service at:
    http://cse/~myusername/learner.cgi
    This is very convenient; however, it is difficult to capture debugging information.
  - Run it in standalone debugging mode as follows: ./learner.cgi 8080 where 8080 is a four-digit port number of your choice.
- If you use the standalone/debug mode, your service URL is http://machinename:8080/
(Note: if you make changes to your code after starting the server, you need to restart the server. Kill the old process first.
If you restart on the same port number, you do not
have to add a new learner.)
- Proceed with the registration steps below.
Registration:
- Go to LASSO. Click on Register in the upper right. Fill out the form to get an account.
- After you have an account, log into the system (the log-in area is on the left-hand side).
- After logging in, you will be on the Main Menu.
Choose Component Manager.
- Click "Add" next to Learner.
- Type in the name of your learner, and its service URL (as described above). You only need to do this once, even if you change your program
and kill and restart your web service.
- After saving, go back to Component Manager and click "Add" next to
Classifier. Choose the Learner you
registered. Next to "dataformat in", choose "CSVData". For
"dataformat out", choose "Labeled Data". You should create a new classifier
for each learning task.
- Now go to Pipeline Manager via the Main Menu.
- Click add and create a pipeline. (Note that you will create a new
pipeline for each learning task.)
- For Task, choose Ontology Membership.
- Choose "News Ontology"
- Click the "Append New Stage" Button.
- Choose "DataSource" as the type, and choose "478 Prepared Data" as the DataSource.
- Click the "Append New Stage" Button again.
- Choose Classifier as the type, and choose your classifier as the ID.
- Click Validate. You should not get any errors. If you have problems, contact the TA.
- Click Save and Exit.
- Now go to the Job Manager.
- Select your pipeline and set up a training task.
This should start running the pipeline. Note that the learning task you
choose must match the "Ontology Destination" you selected above (e.g.,
"NewsTask Data (Train)" matches "NewsOntology"). Select "Replace Existing
Data"; this is necessary for you to get correct results.
- After training completes successfully (i.e. "Status: Completed"),
submit a classification task in a similar fashion.
If you instead get "Status: Stopped (Error?)", do not bother submitting the
classification task until you successfully complete training.
- Check back in a few minutes to see if your job has completed. If it
has, continue:
- Click on the Data Analysis Tool.
- Choose the job number (probably the last one in the list).
Get the summary results.
The results below are presented as a "confusion matrix", in which
the columns (horizontal axis) correspond to the true label of a web page and
the rows (vertical axis)
correspond to the label that your classifier predicted. So in the matrix
below, the lower-left entry indicates that there were 13 positive examples
(True Label = "yes") that the classifier predicted as "no", i.e., false negative
errors.
- Congratulations! You did it!
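To make the confusion-matrix layout from the results step concrete, here is a small Java sketch that tallies one from true and predicted labels. It is a hypothetical illustration, not the Data Analysis Tool's code. It assumes the labels are the strings "yes" and "no", ordered "yes" first in both rows and columns, which is the ordering that puts false negatives in the lower-left cell as described above.

```java
// Hypothetical illustration (not the LASSO Data Analysis Tool's code): builds a
// 2x2 confusion matrix with rows for the predicted label and columns for the
// true label, both ordered "yes", "no".
public class ConfusionMatrix {

    static final String[] LABELS = {"yes", "no"};

    /** Maps a label to its row/column index: "yes" -> 0, "no" -> 1. */
    static int index(String label) {
        for (int i = 0; i < LABELS.length; i++) {
            if (LABELS[i].equals(label)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unexpected label: " + label);
    }

    /** Returns counts[predicted][actual]: row = predicted label, column = true label. */
    static int[][] build(String[] trueLabels, String[] predicted) {
        if (trueLabels.length != predicted.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int[][] counts = new int[2][2];
        for (int i = 0; i < trueLabels.length; i++) {
            counts[index(predicted[i])][index(trueLabels[i])]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] truth = {"yes", "yes", "no", "no", "yes"};
        String[] guess = {"yes", "no", "no", "yes", "no"};
        int[][] m = build(truth, guess);
        // Lower-left cell: predicted "no" (bottom row), true "yes" (left column).
        System.out.println("true positives  = " + m[0][0]);
        System.out.println("false positives = " + m[0][1]);
        System.out.println("false negatives = " + m[1][0]);
        System.out.println("true negatives  = " + m[1][1]);
    }
}
```

With the sample labels in main, two "yes" examples are predicted as "no", so the lower-left cell (false negatives) is 2, and each of the other three cells is 1.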