Structured Generative Models of Natural Source Code

Chris Maddison, Daniel Tarlow
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):649-657, 2014.

Abstract

We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have two key properties: First, they incorporate both sequential and hierarchical structure. Second, they are capable of integrating closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the probability of generating test programs.
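
To make concrete the abstract's combination of sequential and hierarchical structure, here is a minimal illustrative sketch in Python (not the paper's model): a toy generative process over a simplified syntax tree in which each child node is sampled conditioned on both its parent (hierarchical context) and its previous sibling (sequential context). The grammar, symbols, and probabilities below are invented purely for illustration.

# Minimal illustrative sketch (NOT the paper's model): a toy generative
# process over a simplified AST. Each child symbol is sampled conditioned
# on BOTH its parent (hierarchical context) and its previous sibling
# (sequential context). Grammar and probabilities are made up for illustration.
import random

# Toy conditional distribution P(child | parent, previous sibling).
CHILD_DIST = {
    ("Block", None):     {"Assign": 0.6, "If": 0.3, "Return": 0.1},
    ("Block", "Assign"): {"Assign": 0.4, "If": 0.3, "Return": 0.3},
    ("Block", "If"):     {"Assign": 0.3, "Return": 0.7},
    ("If", None):        {"Cond": 1.0},
    ("If", "Cond"):      {"Block": 1.0},
}

TERMINALS = {"Assign", "Return", "Cond"}


def sample_children(parent, rng, max_children=3):
    """Sample a left-to-right sequence of child symbols for `parent`."""
    children, prev = [], None
    for _ in range(max_children):
        dist = CHILD_DIST.get((parent, prev))
        if dist is None:          # no production for this context: stop
            break
        symbols, weights = zip(*dist.items())
        prev = rng.choices(symbols, weights=weights)[0]
        children.append(prev)
        if prev == "Return":      # a Return statement closes the block
            break
    return children


def sample_tree(symbol, rng, depth=0, max_depth=4):
    """Recursively expand `symbol` into a (symbol, children) tree."""
    if symbol in TERMINALS or depth >= max_depth:
        return (symbol, [])
    kids = [sample_tree(child, rng, depth + 1, max_depth)
            for child in sample_children(symbol, rng)]
    return (symbol, kids)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_tree("Block", rng))

Running the sketch prints a small nested (symbol, children) tree. The paper's models replace the hand-made table with distributions learned from a code corpus and condition on richer, compiler-derived context, such as which variables are in scope when an identifier token is generated.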

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-maddison14,
  title     = {Structured Generative Models of Natural Source Code},
  author    = {Maddison, Chris and Tarlow, Daniel},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {649--657},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/maddison14.pdf},
  url       = {https://proceedings.mlr.press/v32/maddison14.html}
}
APA
Maddison, C. & Tarlow, D. (2014). Structured Generative Models of Natural Source Code. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):649-657. Available from https://proceedings.mlr.press/v32/maddison14.html.

Related Material

Download PDF: http://proceedings.mlr.press/v32/maddison14.pdf