Vũ Lê

PortraitI am a Senior Research Software Engineer at Microsoft, working on the PROSE framework.
I obtained my PhD in 2015 at UC Davis, advised by Zhendong Su and Sumit Gulwani. My research is about developing novel and practical techniques to help improve the quality of critical software (e.g., compilers and database engines) and make programming accessible for end users.
  • Email:
  • Phone: 425-421-4582
  • Address: 1 Microsoft Way
    Redmond, WA 98052
  • Curriculum Vitae: Download


University of California, Davis, USA

Ph.D. (2009 - 2015)

Advisors: Zhendong Su and Sumit Gulwani
Thesis: Program Synthesis for Empowering End Users and Stress-Testing Compilers

University of Technology - Vietnam National University

B.Eng. in Computer Science (2001 - 2006)


  • PLDI Distinguished Paper Award, PLDI '14, Edinburgh, 2014
  • Best Graduate Researcher Award, Computer Science Department, UC Davis, 2014
  • Gold Medal at ACM Student Research Competition, PLDI '13, Seattle, 2013
  • PhD Fellowship, Vietnam Education Foundation, 2009


Research Projects



  • Period: July 2015 - Present
  • Collaborators: PROSE team

Microsoft Program Synthesis using Examples (PROSE) SDK is a framework of technologies for programming by examples – automatic generation of programs from input-output examples at runtime. Given a domain-specific language (DSL) and some input-output examples for the desired program’s behavior, PROSE algorithms synthesize a ranked set of DSL programs that are consistent with the given examples. The PROSE SDK also includes a pre-defined set of specific technologies for various data wrangling domains, such as: FlashFill, a library of string transformations from Microsoft Excel 2013; text extraction library with a DSL of programs that find desired regions in a textual file; web extraction library with a DSL of programs that select elements on a web page.

Critical Software Validation

UC Davis

  • Period: January 2013 - Present
  • Collaborators: Merhdad Afshari, Zhendong Su, Chengnian Sun

An umbrella project whose goal is to improve the quality of all aspects of critical software, such as compilers and database engines. We introduce Equivalence Modulo Inputs, a novel methodology to generate valid tests from existing programs to find bugs in compiler optimizers [PLDI14a, OOPSLA15]. We are also developing techniques to find bugs in other important compiler components: link-time optimizers [ISSTA15] and warning infrastructures [ICSE16]. In total, we have reported 600+ new bugs (majority are miscompilation) in GCC and LLVM, most of which have been confirmed and fixed by developers.


Microsoft Research

  • Period: June 2014 - June 2015
  • Main Collaborator: Sumit Gulwani

A project that combines various program synthesis techniques and tools (e.g., FlashExtract, FlashFill, and NLyze) to enable users to perform more sophisticated tasks that otherwise cannot be done using a single technique alone [UIST15]. For example, we enable editing by examples by using FlashExtract to highlight regions that need to be edited repetitively, and applying transformation tools such as FlashFill to transform these regions.


Microsoft Research

  • Period: April 2013 - present
  • Collaborator: Sumit Gulwani

A general framework that allows users to extract relevant data from semi-structured documents (text files, web pages, spreadsheets) using examples (demo video) [PLDI14b]. FlashExtract has been shipped as part of PowerShell in Windows 10 and as the Custom Fields feature in Microsoft Operations Management Suite.


UC Davis, Microsoft Research

  • Period: January 2011 - September 2013
  • Collaborators: Peli de Halleux, Sumit Gulwani, Zhendong Su

An end-to-end programming system for synthesizing smartphone automation scripts from natural language descriptions [MobiSys13a]. SmartSynth has been integrated into TouchDevelop, a popular programming environment on mobile devices (demo video) [MobiSys13b].


UC Davis

  • Period: June 2010 - July 2012
  • Collaborators: Earl Barr, Zhendong Su, Thanh Vo

A practical symbolic execution system that was specifically designed for detecting floating-point exceptions [POPL13]. Ariadne has found 2,091 floating-point exceptions in the GNU Scientific Library (GSL).

Work Experience

Research Software Engineer

Microsoft Corporation

  • Period: July 2015 - present
  • Team: PROSE

Working on the PROSE framework.


Microsoft Research Redmond

  • Period: October 2014 - June 2015
  • Supervisor: Sumit Gulwani

Worked on the FlashM project that combines multiple synthesis paradigms. Improving FlashExtract logics and ranking. Collaborating with other product groups in Microsoft to bring FlashExtract technology to more customers.

Research Intern

Microsoft Research Redmond

  • Period: June 2014 - September 2014
  • Supervisor: Sumit Gulwani

Worked with the PowerShell team to help transfer FlashExtract to PowerShell (shipped with Windows 10).

Research Intern (twice)

Microsoft Research Redmond

  • Period: April 2013 - September 2013
  • Supervisor: Sumit Gulwani

Designed and developed FlashExtract, a general program synthesis framework that enables data extraction from semi-structured documents using examples (published in PLDI 2014).

Research Intern

Microsoft Research Redmond

  • Period: June 2012 - September 2012
  • Supervisor: Peli de Halleux and Sumit Gulwani

Designed and developed a natural language front-end for TouchDevelop to synthesize TouchDevelop scripts from English keywords (integrated into TouchDevelop).