Advisors: Zhendong Su and Sumit Gulwani Thesis: Program Synthesis for Empowering End Users and Stress-Testing Compilers
Microsoft Program Synthesis using Examples (PROSE) SDK is a framework of technologies for programming by examples – automatic generation of programs from input-output examples at runtime. Given a domain-specific language (DSL) and some input-output examples for the desired program’s behavior, PROSE algorithms synthesize a ranked set of DSL programs that are consistent with the given examples. The PROSE SDK also includes a pre-defined set of specific technologies for various data wrangling domains, such as: FlashFill, a library of string transformations from Microsoft Excel 2013; text extraction library with a DSL of programs that find desired regions in a textual file; web extraction library with a DSL of programs that select elements on a web page.
An umbrella project whose goal is to improve the quality of all aspects of critical software, such as compilers and database engines. We introduce Equivalence Modulo Inputs, a novel methodology to generate valid tests from existing programs to find bugs in compiler optimizers [PLDI14a, OOPSLA15]. We are also developing techniques to find bugs in other important compiler components: link-time optimizers [ISSTA15] and warning infrastructures [ICSE16]. In total, we have reported 600+ new bugs (majority are miscompilation) in GCC and LLVM, most of which have been confirmed and fixed by developers.
A project that combines various program synthesis techniques and tools (e.g., FlashExtract, FlashFill, and NLyze) to enable users to perform more sophisticated tasks that otherwise cannot be done using a single technique alone [UIST15]. For example, we enable editing by examples by using FlashExtract to highlight regions that need to be edited repetitively, and applying transformation tools such as FlashFill to transform these regions.
A general framework that allows users to extract relevant data from semi-structured documents (text files, web pages, spreadsheets) using examples (demo video) [PLDI14b]. FlashExtract has been shipped as part of PowerShell in Windows 10 and as the Custom Fields feature in Microsoft Operations Management Suite.
An end-to-end programming system for synthesizing smartphone automation scripts from natural language descriptions [MobiSys13a]. SmartSynth has been integrated into TouchDevelop, a popular programming environment on mobile devices (demo video) [MobiSys13b].
A practical symbolic execution system that was specifically designed for detecting floating-point exceptions [POPL13]. Ariadne has found 2,091 floating-point exceptions in the GNU Scientific Library (GSL).
Working on the PROSE framework.
Worked on the FlashM project that combines multiple synthesis paradigms. Improving FlashExtract logics and ranking. Collaborating with other product groups in Microsoft to bring FlashExtract technology to more customers.
Worked with the PowerShell team to help transfer FlashExtract to PowerShell (shipped with Windows 10).
Designed and developed FlashExtract, a general program synthesis framework that enables data extraction from semi-structured documents using examples (published in PLDI 2014).
Designed and developed a natural language front-end for TouchDevelop to synthesize TouchDevelop scripts from English keywords (integrated into TouchDevelop).