[madPL] Summer-internship opportunities for summer 2022


Date: Thu, 11 Nov 2021 21:26:25 +0000
From: Thomas Reps <reps@xxxxxxxxxxx>
Subject: [madPL] Summer-internship opportunities for summer 2022

Hi,

 

From Wei Zhang, who did his Ph.D. with Shan Lu before she moved to U. Chicago.

 

Tom

From: Wei Zhang <weiz@xxxxxxxxxx>
Sent: Thursday, November 11, 2021 2:53 PM
To: Thomas Reps <reps@xxxxxxxxxxx>
Subject: hi from Wei Zhang (IBM Research) and looking for summer interns 2022

 

Hi Tom, 

 

How are you? This is Wei, Shan's student. It has been 8 years since I graduated! I hope you are doing well!

 

A little update from me: After I joined IBM, I worked in the programming languages department (my 2nd-line manager was Michael Hind and my 1st-line manager was David Grove) for about 2 years before I moved on to AI research. In the past 6 years or so, I have had a lot of fun collaborating with different sets of researchers (speech recognition researchers, computer architects, physicists, and materials scientists). Most of my research has been around systems for machine learning. My publication list is at https://researcher.watson.ibm.com/researcher/view_person_pubs.php?person=us-weiz&t=1

 

 

In the last year, I have been working on a project called CodeNet (see https://arxiv.org/abs/2105.12655), which was accepted to the NeurIPS 2021 dataset track. It is a dataset curated by our group. The dataset consists of 14 million source code files (written in different languages, e.g., C/C++/Java/Python) from programming exercise websites (i.e., LeetCode-like websites). The GitHub repo is at https://github.com/IBM/Project_CodeNet; so far it has over 1000 stars :) I have been very happy with this project, as it is a combination of what I did when I was at Wisconsin (PL/SE) and what I have been doing at IBM (AI).

 

Our group is looking for summer interns for 2022. As I see it, we can explore many traditional PL/SE tasks -- bug finding, bug fixing, runtime performance prediction, etc. -- since we have lots of metadata for this dataset (e.g., whether the program passed its tests, CPU run time, and memory consumption). The programs are self-contained (one file per program), so this could be an ideal case study -- let's see if AI can help solve these self-contained tasks first. So far, we have been applying language modeling, graph neural networks, and Transformer-like machine translation techniques to the bug-fixing problem. The project is quite open-ended -- at this stage we want to explore different AI techniques (one area that we haven't touched is reinforcement learning), and we could even explore building test-input generation platforms via traditional PL/SE techniques (e.g., symbolic execution) so that our dataset can be reasonably augmented.

 

I myself have been applying some GNNs to the parse trees in CodeNet and have found that they are quite good at identifying semantically similar problems! I thought it would be great if we could add more control-flow and data-flow information to the graphs :) Julian (Dolby) did that for the Java benchmark via WALA, but we don't have a tool as sophisticated as CodeSurfer for C++ programs :) (C++ programs are by far the largest subset of our dataset.)

 

I think it will be a great fit for your students if they are interested in applying some AI techniques to PL/SE problems and seeing how smart (or stupid) AI is! If you want to hear more about this project, please let me know. We don't have a very concrete plan yet, as we are still trying to define what our end goals are, so you and your students are welcome to brainstorm about what interesting things to do!

 

Orthogonal to this topic, our group is also looking at large-model support (i.e., model parallelism), since we are very likely going to use some large (GPT-like) models. This falls more into distributed-training-style research, which I have been doing a lot of in the past few years. Students would be welcome to work on either of these two topics (AI for Code or Systems for AI for Code)!

 

Finally, I will also write a similar email to Somesh to see if he knows any students who might be interested in this. Other than you and Somesh, I don't know who else at Wisconsin I should contact regarding this project. Please feel free to forward this email to other faculty or students who might be interested.

 

Thanks!

 

Wei 

 

P.S.: Our CEO did talk a little about this project in the press (e.g., https://www.forbes.com/sites/moorinsights/2021/06/04/ibm-codenet-artificial-intelligence-that-can-program-computers-and-solve-a-100-billion-legacy-code-problem/?sh=28ac21086cdc ). But that is more about hyping things up.

 
