Hi all,
Next Wednesday, 9/12, we will discuss a recent paper, code2vec [1], on generating embeddings for code.
The purpose of an embedding is to map symbolic entities into a vector space while preserving the relations between them as distances. It can be viewed as a feature extraction step and is often the first layer of a deep learning model that works with symbolic inputs.
The authors introduce the idea of viewing source code as a bag of abstract syntax tree (AST) paths, and use this representation to learn better method and token embeddings and to predict method names.
In a closely related work [2], the same authors use this representation to predict variable types and to tackle other ML tasks over source code in Java, Python and JavaScript.
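For anyone who wants a concrete picture of "AST paths" before the meeting, here is a minimal sketch. It is not the authors' extractor: it assumes Python 3.8+ and uses Python's built-in ast module as a stand-in for the parsers used in the papers, and the function names (leaf_token, collect_leaves, path_contexts) and the "^" (up) and "_" (down) separators are made up for illustration. Each result is a triple (start token, path through the tree, end token).

```python
import ast
from itertools import combinations


def leaf_token(node):
    """Return the token of a terminal node, or None for non-terminals."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return None


def collect_leaves(node, ancestors=()):
    """Yield (token, chain), where chain holds the nodes from the root to the leaf."""
    chain = ancestors + (node,)
    token = leaf_token(node)
    if token is not None:
        yield token, chain
        return  # treat this node as terminal; do not descend further
    for child in ast.iter_child_nodes(node):
        yield from collect_leaves(child, chain)


def path_contexts(source):
    """Return (token_a, path, token_b) for every pair of leaves in the AST."""
    leaves = list(collect_leaves(ast.parse(source)))
    contexts = []
    for (tok_a, chain_a), (tok_b, chain_b) in combinations(leaves, 2):
        # Count the ancestors the two leaves share (at least the Module root).
        shared = 0
        while (shared < min(len(chain_a), len(chain_b))
               and chain_a[shared] is chain_b[shared]):
            shared += 1
        # Walk up from leaf a to the lowest common ancestor, then down to leaf b.
        up = [type(n).__name__ for n in reversed(chain_a[shared:])]
        top = type(chain_a[shared - 1]).__name__
        down = [type(n).__name__ for n in chain_b[shared:]]
        contexts.append((tok_a, "^".join(up + [top]) + "_" + "_".join(down), tok_b))
    return contexts


if __name__ == "__main__":
    for context in path_contexts("def f(x):\n    return x + 1"):
        print(context)
```

The bag of such triples is the kind of input the papers then embed and aggregate; this sketch stops at extraction and enumerates all leaf pairs without any limit on path length or width.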
Time: 4pm, Wednesday, September 12
Location: CS 3310
Presenter: Jinman Zhao (jz@xxxxxxxxxxx)
Paper:
[1] code2vec: Learning Distributed Representations of Code
Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav
FLoC 2018 Machine Learning for Programming
https://arxiv.org/abs/1803.09473
Related paper (in case you want to go deeper):
[2] A General Path-Based Representation for Predicting Program Properties
Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav
PLDI 2018
https://arxiv.org/abs/1803.09544
Hope to see you there!
Jinman Zhao