On the Naturalness of Software: A Research Vision

What: On the Naturalness of Software: A Research Vision
Who: Earl Barr (University College London)
When: June 20th, 2014, 15:00 – 15:45
Where: Inria Lille, Salle Plénière, http://goo.gl/maps/32z7m

Abstract: Natural languages like French are rich, complex, and powerful. The highly creative and graceful use of languages like French and Tamil, by masters like Flaubert and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. Our work rests on the hypothesis that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations, and thus, like natural language, it is also likely to be repetitive and predictable. Using the widely adopted n-gram model, we validate this hypothesis and show how one can exploit the local repetitiveness of code to increase programmer productivity, using code completion as our motivating example. This work has opened a new avenue of research, the application of large-corpus statistical inference directly to code, which has seen much recent activity, including papers at all the top software engineering venues tackling problems such as machine translation applied to the code porting problem (ASE), coding convention and idiom inference (FSE), test case generation (ICST), syntactic error localization (MSR), detecting errors in API documentation (OOPSLA), and code synthesis (PLDI).
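To make the n-gram idea concrete, here is a minimal sketch of token-level code completion. It is not the speaker's implementation: the function names (train_ngram, suggest), the toy corpus, and the whitespace tokenization are all assumptions made for illustration, and a realistic model would add smoothing and train on a much larger corpus.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n=3):
    """Count which token follows each (n-1)-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_token = tokens[i + n - 1]
        model[context][next_token] += 1
    return model


def suggest(model, context, k=3):
    """Return up to k of the most frequent next tokens for a context."""
    counts = model.get(tuple(context), Counter())
    return [token for token, _ in counts.most_common(k)]


# Hypothetical toy corpus: two near-identical loops, split on whitespace.
corpus = (
    "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; } "
    "for ( int j = 0 ; j < m ; j ++ ) { sum += b [ j ] ; }"
).split()

model = train_ngram(corpus, n=3)
print(suggest(model, ["for", "("]))  # ['int']
```

Because both loops start the same way, the context "for (" is always followed by "int" in this corpus, so that token is the only suggestion. This is the local repetitiveness that the abstract proposes to exploit.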

Biography: Earl Barr is a lecturer at University College London. He received his Ph.D. in Computer Science from the University of California, Davis, in 2009. He won the I3P Fellowship from the US Department of Homeland Security in 2010. He is a co-investigator on SemaMatch, a UK EPSRC grant on malware. Dr. Barr’s research interests include testing and program analysis, empirical software engineering, computer security, and distributed systems. His recent work focuses on testing and analysis of numerical software, debugging, malware, obfuscation, and the application of NLP techniques to source code.