Large-scale program analysis for language evolution

Filip Křikava, 6 Dec 2021

Programming languages need to evolve constantly; otherwise they fall out of favor, become neglected, and are eventually lost. The hard part of growing a language is making changes as minimally disruptive as possible. Each change has to be carefully reviewed for its impact on the ecosystem. However, until recently, language designers and engineers had few means of understanding the impact of a language change. Cloud code-hosting websites have changed that by making code a shared resource and giving everyone access to a huge number of open-source projects. This has opened up entirely new opportunities in language evolution: we can use program analysis to gather empirical evidence about how a language is used in real-world code.

In this talk I will present some of our experience in conducting large-scale program analyses of public code repositories. Concretely, I will show four different analyses with which we try to answer the following questions. (1) How well can automated, trace-based unit test extraction actually work in practice for R? The aim is to reduce the burden of writing comprehensive unit test suites in cases where a tool could extract them automatically from client code. (2) What expressive power do we need to ascribe types to R functions? The goal is to retrofit a type system onto R that would help users make their code more reliable and increase our confidence in the data analyses for which R is so widely used. (3) How are Scala implicits used in the wild? The aim is to provide empirical evidence on the use and misuse of this distinctive Scala feature. And finally, (4) how is eval used in R? To employ pragmatic static analysis in R, we first need to understand the scope and scale of eval in R programs.