CSCE 496/896-005 (Spring 2019) Project Ideas

From the syllabus:

In this course you and your team will do a substantial project, in which you will characterize a significant problem amenable to a deep learning solution, study the related work to this problem, develop one or more deep learning approaches to this problem, and evaluate your approaches.

You will summarize your project results in a written report and an oral presentation. The written report must use a professional writing style similar to that found in a refereed conference or journal (e.g., ACM, IEEE, ICML, ICLR, NIPS), including abstract, introduction, summary of related work, your contribution, references, and an appendix (if necessary). The oral presentation will be to the entire class at the end of the semester: during the fifteenth week (April 22–26), and if necessary, during the fourteenth week (April 15–19). You will submit your written report no later than 11:59 p.m. on April 24. In accordance with UNL policies, you have now been informed in writing of the nature and scope of this project prior to the eighth week of classes.

Later this semester (late February to early March) we will set a deadline for submission of 1–3 paragraph proposals on your projects. Also, in late March, you will submit to us a brief progress report and meet with us for a check-in on your project. You must do both of these in order to get full credit for your project, and you must get our approval on your proposal before starting work on your project.

Projects are officially due by 11:59 pm on Wednesday, April 24. See the rules on projects for more information.

Project oral presentations are the weeks of April 15 and April 22. See the presentation schedule for more information.

✔ Project check-in reports are due Sunday, April 7. You should submit yours to sscott@cse and pquint@cse in text format in the body of an email before 11:59 pm on that day. The report should include:

A summary of what you've accomplished on your project so far.

A description of what major elements remain for your project, including what is left over from your proposal as well as any new elements that you've added since your proposal.

An overview of any significant hurdles that have arisen.

A list of what questions you need answered to move forward.

On Monday, April 8, we will spend the hack-a-thon period reviewing each group's check-in report.

✔ PROPOSAL DEADLINE: The proposal submission deadline is Sunday, March 3. You should submit it to sscott@cse and pquint@cse in text format in the body of an email before 11:59 pm on that day. The proposal should include:

A brief statement of your project topic.

Motivation for your topic (why it is important and interesting).

A precise work plan: what you plan to do, what data sets you will test on, how you will evaluate performance, etc.

At least three references (at least two published journal or conference papers).

Project Ideas

The following is a list of possible projects, suggested by a variety of individuals. If you want to know more about a particular one, let us know and we can put you in contact with those who can provide more details.

You are welcome to suggest your own project as well, e.g., one related to your own research. It should contain some form of an experimental component, be relevant to the course, and be a non-trivial extension beyond what we covered in class.

Classification

Classification of wildlife camera trap images

Problem: The goal of this pilot project is to reduce the burden of manual viewing and classification of thousands of wildlife images using image classification tools. The images come from camera traps (motion-activated cameras), which are widely used by ecologists to collect various information on wildlife populations such as habitat use and prey vigilance. The first part of this project is to identify the species of the animal in the image. The second part is to indicate the location of each animal in the image, and to count them. The work will entail adapting existing approaches trained on African wildlife to identify Nebraska wildlife. Could become a thesis and a paper.
Proposer: Andrew Little
Related papers:

Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning by Norouzzadeh et al., 2018
Deep Learning Object Detection Methods for Ecological Camera Trap Data by Schneider et al., 2018
Machine learning to classify animal species in camera trap images: Applications in ecology by Tabak et al., 2018
Identifying Animal Species in Camera Trap Images using Deep Learning and Citizen Science by Willi et al., 2018
Researchers Successfully Train Computers to Identify Animals in Photos

Classification of programmer expertise

Problem: The goal of this project is to classify the expertise level of a programmer based on eye tracking data (time series or fixed window) collected while the programmer debugs code. Could become a thesis and a paper.
Proposer: Bonita Sharif
Related papers:

Tracing software developers' eyes and interactions for change tasks by Kevic et al., 2015.

Prediction of Stack Overflow tags

Problem: The goal of this project is to predict the tags of Stack Overflow questions. Data comes from the SOTorrent Dataset and is part of the MSR 2019 Mining Challenge. Could become a thesis and a paper.
Proposer: Bonita Sharif
Related papers:

SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts by Baltes et al., 2018

Classification of animal feed types

Problem: Given chemical analysis results on animal feed, predict its type (e.g., soy-based, corn-based). Could become a thesis and a paper.

Deep learning in wireless networks

Problem: Apply deep learning to problems in wireless networking. Data collected from the NEXTT testbed of a cloud-radio access network (C-RAN)-based experimental wireless network. Possibilities include, but are not limited to, channel estimation, bandwidth sharing, and others related to massive MIMO. Could become a thesis and a paper.

Defense against adversarial examples

Problem: Several image classification systems have been shown to be tricked into gross misclassification by making small, sometimes imperceptible changes to the input image, e.g., tricking the system into thinking a panda is a gibbon. An open problem is how to modify an architecture or regularizer to make classifiers more robust against such minor changes.
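
To make the idea of a small, targeted input change concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method, applied to a toy logistic-regression model. The model, its weights, the input, and the value of epsilon are all hypothetical and chosen only for illustration; a real project would evaluate defenses against attacks on deep image classifiers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm_perturb(weights, bias, x, y, epsilon):
    """Shift every feature by epsilon in the direction that increases the loss.

    For cross-entropy loss with a logistic model, dLoss/dx_i = (p - y) * w_i.
    """
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumptions, not a real classifier).
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.5, 0.1, 0.2]  # assumed true label: 1
y = 1

x_adv = fgsm_perturb(weights, bias, x, y, epsilon=0.2)
p_clean = predict(weights, bias, x)      # above 0.5: classified as 1
p_adv = predict(weights, bias, x_adv)    # below 0.5: classified as 0
```

No feature moves by more than epsilon, yet the predicted class flips. A defense would aim to keep the prediction stable under such bounded perturbations.
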
Related papers:

Association training

Problem: Assist an ongoing project to train a system to map inputs (images, text, sounds, etc.) from input space X to an embedded vector in ℝ^d such that two similar instances from X have embeddings that are near each other in ℝ^d under some distance measure such as Euclidean distance. This would potentially allow the use of locality-sensitive hashing on the embeddings to enable very fast information retrieval of similar instances from X. Similar to a classification problem, but labels are associated with pairs of inputs. E.g., two instances x₁, x₂ ∈ X are a positively labeled pair if they are known to be similar in input space (e.g., both pictures of cats), and a negatively labeled pair if they are known not to be similar. The goal is to successfully train using a relatively small number of positive and negative pairs.
Data: MNIST, CIFAR-10, text data
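
One common way to train on labeled pairs is a contrastive loss, which pulls positive pairs together and pushes negative pairs apart until they are at least a margin away. The sketch below is an assumption about how such a loss could look, not the ongoing project's actual objective; the function names and the margin value are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embeddings in R^d."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, similar, margin=1.0):
    """Loss for one labeled pair of embeddings.

    Positive pairs are penalized by their squared distance.
    Negative pairs are penalized only if closer than `margin`.
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical 2-d embeddings of two inputs.
loss_pos = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True)
loss_neg_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False)
loss_neg_near = contrastive_loss([0.0, 0.0], [0.6, 0.8], similar=False, margin=2.0)
```

In a real system the embeddings would be the outputs of a network applied to x₁ and x₂, and the loss would be averaged over the available positive and negative pairs.
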

EEG analysis

Problems:

An EEG dataset where subjects are seeing a series of face, scene, and object images on the screen. The classification problem is to predict which category the person is seeing. They have 17 subjects with pretty good data (and many more subjects with lower-quality data), with a few hundred instances per subject per category.
An EEG dataset where subjects are doing a visual short-term memory task: they see either 1 or 2 items (small colored discs), on either the left or right side of the screen, and then they have to remember the locations of the items for a second or two and then report the location of one of them. The classification problem is to predict whether subjects are seeing 1 or 2 items, and whether those items are on the left or right.
An fMRI dataset where subjects are viewing a movie that's about 5 minutes long, and they see the same movie several times throughout the experiment. They have about 20 subjects' worth of data. An issue is that one cannot easily combine data across human beings with fMRI the way one can with EEG because the anatomy is so different between people, so there is no straightforward way to correspond the features to each other across subjects. However, they have an fMRI image every second, so each person probably has a couple thousand instances to work with, where each instance would correspond to 1 second of the video.

Reinforcement Learning

Learning how to apply genetic algorithm operators for software assurance

Problem: Use reinforcement learning to learn a policy to select which genetic algorithm (GA) action is most appropriate for evolving solutions for formal methods of software assurance.
Data: Kodkod relational models are fed into a GA for solving. The RL agent will learn which GA operators to select for fastest solving.
Proposer: Hamid Bagheri
Related Paper:

Selecting evolutionary operators using reinforcement learning: Initial explorations
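
As a rough illustration of operator selection, the sketch below uses an epsilon-greedy bandit that keeps a running average reward per operator. This is a deliberate simplification: it has no state, the operator names and reward values are hypothetical, and it is not the method of the paper listed above. In the actual project the reward would come from how quickly the GA solves the model.

```python
import random

class OperatorSelector:
    """Epsilon-greedy selection over a fixed set of GA operators."""

    def __init__(self, operators, epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {op: 0 for op in self.operators}
        self.values = {op: 0.0 for op in self.operators}

    def select(self):
        # Try every operator once before trusting the estimates.
        untried = [op for op in self.operators if self.counts[op] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda op: self.values[op])

    def update(self, op, reward):
        # Incremental running mean of the rewards seen for this operator.
        self.counts[op] += 1
        self.values[op] += (reward - self.values[op]) / self.counts[op]

# Hypothetical fixed rewards standing in for measured GA progress.
HYPOTHETICAL_REWARD = {"mutation": 0.2, "crossover": 0.8, "elitism": 0.5}

selector = OperatorSelector(HYPOTHETICAL_REWARD)
for _ in range(200):
    op = selector.select()
    selector.update(op, HYPOTHETICAL_REWARD[op])

best = max(selector.values, key=selector.values.get)
```

A full RL formulation would condition the choice on the state of the GA population rather than treating every step identically.
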

Play a new game with AlphaZero

Problem: Re-implement DeepMind's famous approach to Go/Chess/Shogi for a new game.
Data: Environment of students' choosing, ideally some difficult, perfect-information, deterministic, two-player game
Proposer: Eleanor Quint, with willingness to collaborate after the semester
Related papers and blog posts:

Play StarCraft II Minigames

Problem: Play minigames in the StarCraft II Learning Environment, a cutting-edge problem in deep RL
Environment: https://github.com/deepmind/pysc2
Notes: This project could go in a lot of directions involving planning, AI, multi-agent systems, and more
Proposer: Eleanor Quint, with willingness to collaborate after the semester
Related papers and blog posts:

Imputation

Predicting missing flight recorder data

Problem: Given data (time series or fixed window) from an airplane's flight recorder, predict values that are missing in the sequence. Could be a thesis or a paper.

Autoencoding

Hard instance generation

Problem: Some computational problems are known to be intractable in the worst case, but are still widely studied at a heuristic level. One such problem is 3-CNF-SAT: given a boolean 3-CNF formula φ, answer whether there is an assignment to its variables that satisfies it. Heuristics for 3-CNF-SAT have been studied extensively, but there are still some formulas that are very difficult to solve (requiring days of processing). This project's goal is to apply autoencoders to characterize these hard problems, to lend insight into why hard instances are hard.
Data from competitions
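
For readers unfamiliar with the problem, here is a minimal sketch of how a 3-CNF formula can be represented and checked. It assumes signed-integer literals in the style of the DIMACS CNF format (3 means x3, -3 means not x3). The brute-force search is exponential in the number of variables and is only meant to define the problem, not to stand in for a real solver.

```python
from itertools import product

def satisfies(formula, assignment):
    """Return True if every clause has at least one true literal.

    formula: list of clauses, each a tuple of three nonzero ints.
    assignment: dict mapping variable number -> bool.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def brute_force_sat(formula):
    """Try all assignments; return a satisfying one, or None."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(formula, assignment):
            return assignment
    return None

# Hypothetical example formulas.
phi = [(1, 2, -3), (-1, 2, 3), (-2, -3, 1)]
# All eight sign patterns over x1, x2, x3: no assignment satisfies every clause.
unsat = [(a * 1, b * 2, c * 3) for a in (1, -1) for b in (1, -1) for c in (1, -1)]

model = brute_force_sat(phi)
no_model = brute_force_sat(unsat)
```

An autoencoder for this project would need some fixed-size encoding of such clause lists as input; choosing that encoding is part of the work.
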

Action-Conditional Video Prediction

Problem: Create a model which, conditioned on the current observation of a stochastic MDP environment (e.g., Atari or robotic pushing environment) and the actions taken by an existing policy, predicts future states of the stochastic environment. Very strongly related to reinforcement learning theory and applications.
Proposer: Eleanor Quint, with high willingness in long-term collaboration after the semester and to push ideas to publication
Related papers and blog posts:

Miscellaneous Applications

An application of graph convolutional kernels

Problem: Graph convolutional kernels are analogous to convolutions on images, but they try to identify features in the adjacency matrices of graphs. Useful in node labeling, edge identification, and graph compression.
Data: Graph data from the Stanford Large Network Dataset Collection (SNAP), and other graphs
Related papers and blog posts:

GRAPH CONVOLUTIONAL NETWORKS blog post
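
As a sketch of what a single graph convolution computes, the code below implements the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, D is the degree matrix of A + I, H holds the node features, and W holds the layer weights. The graph, features, and weights here are hypothetical, and plain Python lists are used only to keep the example self-contained; a real project would use a tensor library.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weights):
    """One layer: aggregate neighbor features, apply weights, then ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

# Hypothetical path graph 0 - 1 - 2 with one feature per node.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]

out = gcn_layer(adj, features, weights)
```

After one layer, node 0's feature has spread to its neighbor (node 1) but not to node 2, which is two hops away. Stacking layers widens the neighborhood each node can see.
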

Computer-aided smart user interface design

Problem: Computer-aided smart user interface (UI) design can be a cool deep learning application. In practice, assessing and refactoring a UI involves human participants. However, evaluating designs by looking at historical data can reduce the need for human intervention. Can be in both supervised and semi-supervised settings.
Related papers:

Application of convolutional neural networks for genomic motif finding

Problem: CNNs have been applied to natural language processing (NLP). The phrases are encoded into a format that looks like Braille and input to the network. This project is to determine whether that approach could be used for finding genomic motifs for bacterial species. Conventionally, these motifs are k-mers, and their frequencies in a species' genome are considered a signature for that species. Perhaps one could use a CNN and autoencoder to come up with relevant motifs instead of enumerating all k-mers. The data in this case would be sequencing reads.
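
For reference, the conventional k-mer signature mentioned above can be sketched as follows. This is the enumeration baseline that a learned approach would try to replace; the function names and the example read are hypothetical.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def kmer_frequencies(sequence, k):
    """Normalize k-mer counts so they sum to 1 (the 'signature')."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

# Hypothetical short read; real input would be sequencing reads.
read = "ACGTACG"
counts = kmer_counts(read, 3)
signature = kmer_frequencies(read, 3)
```

The number of possible k-mers grows as 4^k, which is one reason to look for learned motifs instead of enumerating all of them.
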
Related links:

Last modified 15 April 2019; please report problems to sscott.