Checking machine-checked proofs

I happen to serve on a regular basis as a reviewer for submissions dealing with formalized mathematics. These submissions often consist of regular papers, accompanied with the source code of a library of formal proofs. This source code is meant to be machine-checked with a certain proof assistant. Inspecting a formal library submitted along with a paper is not an easy task and there are several layers of verification that a reviewer can perform:

Check that the files are there, that they are not corrupted, that they look like source files for the target proof assistant;
Check that the formalization actually compiles using the documented version of the indicated proof assistant;
Look for the list of global axioms, parameters, unfinished proofs, etc. and check that they coincide with the ones acknowledged in the paper description;
Check that the code contains all the material (definitions and theorems) claimed in the paper, and that the paper description is faithful to the code.
Evaluate the relevance of the formalization choices.
Assess the overall quality of the code.

Submitting a paper describing a formal library at a conference like the International Conference on Interactive Theorem Proving (ITP) or the International Conference on Certified Programs and Proofs conference (CPP) is demanding: these venues impose the submission of source files along with any submission describing a formal development. Thus not only should the authors write a good paper, but they also have to submit a formal library, to be scrutinized by the referees. Being a reviewer for such a conference is demanding as well: not only should the referees read and evaluate the paper, but they also have to assess the quality of the formal library, whatever that means, in the time frame of the conference.

Ideally, each submission should go through each of these assessments but my experience is that it is far from always the case. Steps 4., 5. and 6. are what I would call the ‘normal’ duty of a reviewer: only a human eye can check that the formalization does not call a cat a dog, that it avoids studying the numerous properties of the empty set, or that it pays attention to the readability and maintainability of their sources. Step 1. is often performed by the chairs/editors of the conference/journal, which helps in particular preserving the anonymity of reviewers. Ideally, steps 2. and 3. could largely be automated. One could even say that it is one of the features and purposes of machine-checked mathematics… But in the current state of affairs, these are the steps which waste a lot of energy. And I must confess that I often skip these checks myself, in order to keep enough time to devote to steps 4. 5. and 6.

For instance, not all formal developments are based on a mainstream version of a mainstream system, so in order to review a paper, you may well end up needing first to install a (new) proof assistant on your machine, plus maybe a couple of prerequisite formal libraries. Second, in any case, you need to understand how to compile the submission in batch mode, and also to know the commands of the proof assistant useful for the sanity check step c). None of the mainstream proof assistant I am aware of provide such a quick reference guide for reviewers, so I usually have to delve into the user manual of each of them to find this information. And when something goes wrong with compilation, and I am not an expert of the system under consideration, I usually just do not have enough time to understand the problem. So I just give up and make the assumption that the problem comes from my local naive configuration. I suspect that this unsatisfactory situation also incurs a fairness issue: depending on the reviewing process lottery, not all submissions are inspected along the same protocol. From my experience again, it is furthermore not easy to know from their report which nature of inspection the reviewers were able to perform on a given submission.

These issues are not new of course. Back in 2013, Andrej Bauer wrote a very interesting blog post about “How to review formalized mathematics“. This post discusses the ideal reviewing process for formalized mathematics as well as the issues of submitting, reviewing and publishing a paper when formal proofs are an essential part thereof. Since then, we have seen more and more submitted articles which describe some pieces of formalized mathematics or even present a new result with an accompanying formal proof. Beside specialized venues like CPP or ITP, the programs of several mainstream conferences in computer science, like the annual Principles Of Programming Languages Symposium (POPL) or the annual International Conference on Functional Programming (ICFP), feature each year papers which mention an accompanying formal proof. These conference have a much broader spectrum than ITP or CPP. Not all the papers are related with pieces of software, and these pieces of software, when relevant, are also of a broader nature. In 2011, a team of researchers concerned with evaluation of code in conferences related to the topics of programming languages and software engineering started the Artifact Evaluation Process initiative. It seems that this initiative was quite successful and it made its way through several high profile venues like POPL, CAV, PLDI,…

The case of conferences devoted to formal verification is probably a bit more specific and the situation might improve significantly if all the players make a move in the same direction. Here are a few examples:

Suggestions for system developers: Post accessible, referee-friendly documentation of the system with information on:

the quickest way to build the system and test a development for correctness
how to determine what axioms and assumptions were used, what parts of the code are incomplete (e.g. with “admit” or “sorry”).

Suggestions for authors of papers:

Provide clear instructions for how to build and test the artifacts;
Make sure the source code is packaged to make that as easy as possible;
Explicit the links between the paper description and the formalized material, either in the paper, possibly in an appendix, or as comments in the sources or may be even in both.

Suggestions for conference organizers and journals:

Clarify to authors the expectations for submissions of formal artifacts;
Clarify to referees the expectations for reviewing;
When artifact submissions are mandatory, make sure that at least
one referee has inspected the code (whatever that means).

Suggestions for reviewers:

Give your best to evaluate artifacts. Even a glance is useful;
Be up front with co-referees, editors, etc. about what you were able to do.

All the opinions expressed here are mine but I would like to thank Jeremy Avigad, Yves Bertot and Laurent Théry for their comments and suggestions.

Leave a Reply Cancel reply