
Update awesome-machine-learning-on-source-code with the recent papers #91

Open
vmarkovtsev opened this issue Jul 30, 2019 · 1 comment

@vmarkovtsev (Collaborator)

I feel that we are missing many new papers.

@EgorBu commented Aug 2, 2019

I will add:

We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. Using this specification, AlphaNPI is able to train NPI models effectively with RL for the first time, completely eliminating the need for strong supervision in the form of execution traces. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks.
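
As an illustration of the sparse reward described above (a minimal sketch, not the paper's code): the agent receives 1 only when the finished execution satisfies the task specification, e.g. for a sorting task:

```python
def sparse_reward(final_state, specification):
    """Return 1.0 if the executed program satisfies the spec, else 0.0."""
    return 1.0 if specification(final_state) else 0.0

# Hypothetical specification for a sorting task: the environment's list
# must be sorted when the top-level program terminates.
is_sorted = lambda state: all(a <= b for a, b in zip(state, state[1:]))
print(sparse_reward([1, 2, 3], is_sorted))  # 1.0
print(sparse_reward([3, 1, 2], is_sorted))  # 0.0
```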

Program synthesis of general-purpose source code from natural language specifications is challenging due to the need to reason about high-level patterns in the target program and low-level implementation details at the same time. In this work, we present PATOIS, a system that allows a neural program synthesizer to explicitly interleave high-level and low-level reasoning at every generation step. It accomplishes this by automatically mining common code idioms from a given corpus, incorporating them into the underlying language for neural synthesis, and training a tree-based neural synthesizer to use these idioms during code generation. We evaluate PATOIS on two complex semantic parsing datasets and show that using learned code idioms improves the synthesizer's accuracy.
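
For a rough flavour of the idiom-mining step (a toy sketch, not the PATOIS mining algorithm, which extracts larger tree fragments), one can count shallow AST patterns across a corpus:

```python
import ast
from collections import Counter

def shallow_patterns(source):
    """Yield (node type, child node types) pairs for every AST node."""
    for node in ast.walk(ast.parse(source)):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        if children:
            yield (type(node).__name__, children)

corpus = [
    "for i in range(n):\n    total += i",
    "for x in items:\n    total += x",
]
counts = Counter(p for src in corpus for p in shallow_patterns(src))
print(counts.most_common(3))  # the most frequent "idiom" shapes
```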

When building machine learning models that operate on source code, several decisions have to be made to model source-code vocabulary. These decisions can have a large impact: some can make models impossible to train at all, while others significantly affect performance, particularly for Neural Language Models. Yet, these decisions are not often fully described. This paper lists important modeling choices for source code vocabulary, and explores their impact on the resulting vocabulary on a large-scale corpus of 14,436 projects. We show that a subset of decisions have decisive characteristics, allowing accurate Neural Language Models to be trained quickly on a large corpus of 10,106 projects.
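
One of the vocabulary modeling choices discussed above is how to split compound identifiers; a small illustration (a toy example, not the paper's pipeline) of how subword splitting shrinks the vocabulary through reuse:

```python
import re

def split_identifier(token):
    """Split on underscores and camelCase boundaries, lowercase the parts."""
    parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", token)
    return [p.lower() for p in parts if p]

tokens = ["getUserName", "setUserName", "get_user_id", "set_user_id",
          "userName", "user_id"]
raw_vocab = set(tokens)
subword_vocab = {sub for t in tokens for sub in split_identifier(t)}
print(len(raw_vocab), sorted(raw_vocab))          # 6 distinct raw identifiers
print(len(subword_vocab), sorted(subword_vocab))  # 5 reusable subwords
```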

We address the problem of automatic decompilation, converting a program in low-level representation back to a higher-level human-readable programming language. The problem of decompilation is extremely important for security researchers. Finding vulnerabilities and understanding how malware operates is much easier when done over source code.
The importance of decompilation has motivated the construction of hand-crafted rule-based decompilers. Such decompilers have been designed by experts to detect specific control-flow structures and idioms in low-level code and lift them to source level. The cost of supporting additional languages or new language features in these models is very high.
We present a novel approach to decompilation based on neural machine translation. The main idea is to automatically learn a decompiler from a given compiler. Given a compiler from a source language S to a target language T, our approach automatically trains a decompiler that can translate (decompile) T back to S. We used our framework to decompile both LLVM IR and x86 assembly to C code with high success rates. Using our LLVM and x86 instantiations, we were able to successfully decompile over 97% and 88% of our benchmarks respectively.
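
The data-generation idea behind this is that the compiler itself produces the parallel corpus; a minimal sketch (assuming clang is installed, and not the authors' actual pipeline) for building (LLVM IR, C) pairs:

```python
import os
import subprocess
import tempfile

def c_to_llvm_ir(c_source):
    """Compile a C snippet to LLVM IR text with clang."""
    with tempfile.TemporaryDirectory() as tmp:
        c_path = os.path.join(tmp, "snippet.c")
        ir_path = os.path.join(tmp, "snippet.ll")
        with open(c_path, "w") as f:
            f.write(c_source)
        subprocess.run(["clang", "-S", "-emit-llvm", "-O0", c_path, "-o", ir_path],
                       check=True)
        with open(ir_path) as f:
            return f.read()

snippet = "int add(int a, int b) { return a + b; }"
pair = (c_to_llvm_ir(snippet), snippet)  # one (target, source) training pair
print(pair[0][:200])
```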

But there are many more on http://arxiv-sanity.com.
I will review my saved articles and add the missed ones, so we will have better recommendations of articles soon.
