CSCE 496/896-005 (Spring 2018) Project Ideas

From the syllabus:

In this course you and your team will do a substantial project, in which you will characterize a significant problem amenable to a deep learning solution, study the related work to this problem, develop one or more deep learning approaches to this problem, and evaluate your approaches.

You will summarize your project results in a written report and an oral presentation. The written report must use a professional writing style similar to that found in a refereed conference or journal (e.g., ACM, IEEE, ICML, ICLR, NIPS), including abstract, introduction, summary of related work, your contribution, references, and an appendix (if necessary). The oral presentation will be to the entire class at the end of the semester: during the fifteenth week (April 23–27), and if necessary, during the fourteenth week (April 16–20). You will submit your written report no later than 11:59 p.m. on April 25. In accordance with UNL policies, you have now been informed in writing of the nature and scope of this project prior to the eighth week of classes.

Later this semester (late February to early March) we will set a deadline for submission of 1–3 paragraph proposals on your projects. Also, in late March, you will submit to us a brief progress report and meet with us for a check-in on your project. You must do both of these in order to get full credit for your project, and you must get our approval on your proposal before starting work on your project.

Projects are due by 11:59 pm on Wednesday, April 25. See the rules on projects for more information.

✔ ~~Project oral presentations are the weeks of April 16 and April 23. See the presentation schedule for more information.~~

✔ Project check-in reports are due Sunday, April 8. You should submit yours to sscott@cse and pquint@cse in text format in the body of an email before 11:59 pm on that day. The report should include:

A summary of what you've accomplished on your project so far.

A description of what major elements remain for your project, including what is left over from your proposal as well as any new elements that you've added since your proposal.

An overview of any significant hurdles that have arisen.

A list of what questions you need answered to move forward.

On Monday, April 9, we will spend the hack-a-thon period reviewing each group's check-in report.

✔ PROPOSAL DEADLINE: The proposal submission deadline is Sunday, March 4. You should submit it to sscott@cse and pquint@cse in text format in the body of an email before 11:59 pm on that day. The proposal should include:

A brief statement of your project topic.

Motivation for your topic (why it is important and interesting).

A precise work plan: what you plan to do, what data sets you will test on, how you will evaluate performance, etc.

At least three references (at least two published journal or conference papers).

Project ideas

The following is a list of possible projects, suggested by a variety of individuals. If you want to know more about a particular one, let us know and we can put you in contact with those who can provide more details.

You are welcome to suggest your own project as well, e.g., one related to your own research. It should contain some form of an experimental component, be relevant to the course, and be a non-trivial extension beyond what we covered in class.

Classification

Object Detection on Aerial and Satellite Imagery

An application of graph convolutional kernels

Problem: Graph convolutional kernels are analogous to convolutions on images, but they try to identify features in the adjacency matrices of graphs. Useful in node labeling, edge identification, and graph compression.
Data: Graph data from the Stanford Large Network Dataset Collection (SNAP), and other graphs
Related papers and blog posts

GRAPH CONVOLUTIONAL NETWORKS blog post
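
To make the analogy with image convolutions concrete, here is a minimal sketch of a single graph convolution step operating on an adjacency matrix. It assumes one common formulation (the symmetric-normalized propagation rule with self-loops from Kipf and Welling), which this page does not prescribe. The toy graph, features, and weights are hypothetical and chosen only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features when aggregating.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Hypothetical toy input: a 3-node path graph 0 - 1 - 2.
ADJ = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
FEATURES = [[1.0], [0.0], [0.0]]  # one scalar feature per node
WEIGHTS = [[1.0]]                 # identity weight, so only propagation acts

hidden = gcn_layer(ADJ, FEATURES, WEIGHTS)
```

After one step, node 0's feature has spread to its neighbor (node 1) but not to node 2, which is two hops away. Stacking layers widens this neighborhood, much as stacking image convolutions widens the receptive field.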

Student progression through UNL's academic programs

Problem: Modeling student progression, major migration and related success factors
Data: Vanessa Roof, Director of Student Success Reporting & Analytics
Notes: Anonymized UNL student data; specific problem(s) TBD; might leverage graphical models

EEG analysis

Problems:

An EEG dataset where subjects are seeing a series of face, scene, and object images on the screen. The classification problem is to predict which category the person is seeing. They have 17 subjects with pretty good data (and many more subjects with lower-quality data), with a few hundred instances per subject per category.
An EEG dataset where subjects are doing a visual short-term memory task: they see either 1 or 2 items (small colored discs), on either the left or right side of the screen, and then they have to remember the locations of the items for a second or two and then report the location of one of them. The classification problem is to predict whether subjects are seeing 1 or 2 items, and whether those items are on the left or right.
An fMRI dataset where subjects are viewing a movie that's about 5 minutes long, and they see the same movie several times throughout the experiment. They have about 20 subjects' worth of data. An issue is that one cannot easily combine data across human beings with fMRI the way one can with EEG because the anatomy is so different between people, so there is no straightforward way to correspond the features to each other across subjects. However, they have an fMRI image every second, so each person probably has a couple thousand instances to work with, where each instance would correspond to 1 second of the video.

Defense against adversarial examples

Problem: Several image classification systems have been shown to be tricked into gross misclassification by making small, sometimes imperceptible changes to the input image, e.g., tricking the system into thinking a panda is a gibbon. An open problem is how to modify an architecture or regularizer to make classifiers more robust against such minor changes.
Related papers
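
As a small illustration of how such perturbations can be constructed, the sketch below applies one well-known attack, the fast gradient sign method (FGSM), to a toy logistic-regression classifier. The choice of FGSM, the classifier, and all weights and inputs are assumptions made for this example and are not part of the project description. A defense would be evaluated against inputs like `x_adv`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(value):
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """Move each input coordinate by epsilon in the direction that
    increases the cross-entropy loss for the true label."""
    p = predict(weights, bias, x)
    # For logistic regression, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, for illustration only.
WEIGHTS = [2.0, -3.0, 1.0]
BIAS = 0.0
x = [0.2, 0.1, 0.1]  # classified as class 1 (probability above 0.5)

x_adv = fgsm(WEIGHTS, BIAS, x, label=1, epsilon=0.1)
clean_prob = predict(WEIGHTS, BIAS, x)
adv_prob = predict(WEIGHTS, BIAS, x_adv)
```

Here no coordinate moves by more than `epsilon`, yet the predicted class flips, which is the behavior a more robust architecture or regularizer would aim to prevent.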

Computer-aided smart user interface design

Problem: Computer-aided smart user interface (UI) design is a promising deep learning application. In practice, assessing and refactoring a UI involves human participants. However, evaluating designs using historical data can reduce the need for human intervention. The problem can be posed in either a supervised or a semi-supervised setting.
Related papers

Human-robot interaction

Problem: The objective is to use human-robot interaction data to predict the user's locus of control (LOC), which is a user quality measuring the control a person feels they have over their life. Large amounts of data can be used to train a model, but the classification would ideally be based on a small number of interactions.
Data: A study with 30 human participants had each participant control a robot to navigate an obstacle course. Each participant made two runs, for a total number of 60 runs, each capturing data from approximately 30,000 position points. Data gathered includes duration of each run, distance of each run, number of commands sent to the robot each run, etc.

Application of convolutional neural networks for genomic motif finding

Problem: CNNs have been applied to natural language processing (NLP). The phrases are encoded into a format that looks like Braille and input to the network. This project is to determine whether that approach could be used for finding genomic motifs for bacterial species. Conventionally, these motifs are k-mers, and their frequencies in a species' genome are considered a signature for that species. Perhaps one could use a CNN + autoencoder to come up with relevant motifs instead of enumerating all k-mers. The data in this case would be sequencing reads.
Related links
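
For reference, the conventional k-mer signature mentioned above can be sketched as follows. This is only the enumeration baseline that a learned approach would be compared against. The function name and the example read are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return the relative frequency of every k-mer in a sequence."""
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical sequencing read, for illustration only.
READ = "ACGTACGT"
freqs = kmer_frequencies(READ, 3)
```

The number of possible k-mers grows as 4^k, which is one reason to look for learned motifs rather than enumerating them all.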

Automated table analysis

Problem: A classification problem in which the rows and columns of a complete table are segmented into header, data, and ancillary regions, solved with unsupervised learning over complete tables. In this formulation, one would treat each table as a picture with categorical cell pixels (each cell would be one pixel, analogous to a pixel in a color picture). The input to the program would be formatting/stylistic cell features derived from an .xls representation of the table. Available are 1320 tables in .xls format, along with classification results produced by an alternate algorithmic method, which can be used to evaluate results.

Association training

Problem: Assist an ongoing project to train a system to map inputs (images, text, sounds, etc.) from input space X to an embedded vector in ℝ^d such that two similar instances from X have embeddings that are near each other in ℝ^d under some distance measure such as Euclidean distance. This would potentially allow the use of locality-sensitive hashing on the embeddings to enable very fast information retrieval of similar instances from X. Similar to a classification problem, but labels are associated with pairs of inputs. E.g., two instances x₁, x₂ ∈ X are a positively labeled pair if they are known to be similar in input space (e.g., both pictures of cats), and a negatively labeled pair if they are known not to be similar. The goal is to successfully train using a relatively small number of positive and negative pairs.
Data: MNIST, CIFAR-10, text data
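
One way to express the pairwise objective described above is a contrastive-style loss on the Euclidean distance between two embeddings. The sketch below is an assumption about how such a loss might look, not the ongoing project's actual method. The `margin` parameter and the example vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Pairwise loss on two embeddings.

    Positive pairs are penalized by their squared distance, so training
    pulls them together. Negative pairs are penalized only while they are
    closer than `margin`, so training pushes them apart up to that point.
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-d embeddings, for illustration only.
far_positive = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True)
far_negative = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False)
close_negative = contrastive_loss([0.0, 0.0], [0.0, 0.0], similar=False)
```

A positive pair that is far apart and a negative pair that coincides both incur loss, while a negative pair already beyond the margin incurs none.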

Autoencoding

Hard instance generation

Problem: Some computational problems are known to be intractable in the worst case, but are still widely studied at a heuristic level. One such problem is 3-CNF-SAT: given a boolean 3-CNF formula φ, answer whether there is an assignment to its variables that satisfies it. Heuristics for 3-CNF-SAT have been studied extensively, but there are still some formulas that are very difficult to solve (requiring days of processing). This project's goal is to apply autoencoders to characterize these hard problems, to lend insight into why hard instances are hard.
Data from competitions
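
For readers new to the problem, the sketch below shows one way to represent a 3-CNF formula and check it by exhaustive search. It assumes the signed-integer literal convention used by the DIMACS CNF format (a positive integer is a variable, a negative integer is its negation). The example formula is hypothetical, and the exponential-time search is only meant to define the problem, not to stand in for the heuristics mentioned above.

```python
from itertools import product


def satisfies(formula, assignment):
    """True if every clause has at least one literal made true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_sat(formula, num_vars):
    """Try all 2**num_vars assignments; return a satisfying one or None."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        if satisfies(formula, assignment):
            return assignment
    return None


# Hypothetical formula:
# (x1 or x2 or not x3) and (not x1 or x2 or x3) and (x1 or not x2 or not x3)
FORMULA = [(1, 2, -3), (-1, 2, 3), (1, -2, -3)]
model = brute_force_sat(FORMULA, num_vars=3)

# An unsatisfiable formula: x1 and (not x1), padded to three literals.
UNSAT = [(1, 1, 1), (-1, -1, -1)]
no_model = brute_force_sat(UNSAT, num_vars=1)
```

A list of integer tuples like this is also a convenient starting point for building fixed-size inputs to an autoencoder.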

Action-Conditional Video Prediction

Create a model which, conditioned on the current observation of a stochastic MDP environment (e.g., Atari or robotic pushing environment) and the actions taken by an existing policy, predicts future states of the stochastic environment
Very strongly related to reinforcement learning theory and applications
Proposer: Paul Quint, who is very willing to collaborate long-term after the semester and to push ideas to publication
Related papers and blog posts

Reinforcement Learning

Play a new game with AlphaZero

Problem: Re-implement DeepMind's famous approach to Go/Chess/Shogi for a new game
Data: Environment of students' choosing, ideally some difficult, perfect-information, deterministic, two-player game
Proposer: Paul Quint, with willingness to collaborate after the semester
Related papers and blog posts

Play StarCraft II Minigames

Problem: Play minigames in the StarCraft II Learning Environment, a cutting-edge problem in deep RL
Environment: https://github.com/deepmind/pysc2
Notes: This project could go in a lot of directions involving planning, AI, multi-agent systems, and more
Proposer: Paul Quint, with willingness to collaborate after the semester
Related papers and blog posts

Last modified 01 May 2018; please report problems to sscott.