📤 Submit Your Results

We welcome submissions from the research community! Follow the guidelines below to add your proof synthesis system to the leaderboard.

🎯 Benchmarks

You can submit results for either or both of the benchmarks tracked on the leaderboard.

📦 Getting the Benchmarks

Both benchmarks are available in the benchmarks/ directory of our repository; each benchmark's subdirectory contains a tasks.jsonl file for programmatic access.
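
For example, here is a minimal sketch of loading the tasks programmatically, assuming one JSON object per line (the subdirectory name and task fields are illustrative, not the repository's exact layout):

```python
import json
from pathlib import Path

def load_tasks(benchmark_dir: str):
    """Yield one task dict per line of the benchmark's tasks.jsonl."""
    tasks_path = Path(benchmark_dir) / "tasks.jsonl"
    with tasks_path.open() as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

# Hypothetical usage; substitute the actual benchmark subdirectory.
for task in load_tasks("benchmarks/your-benchmark"):
    print(task.keys())
```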

📋 Submission Process

1. Run Evaluation: Run your proof synthesis system on the benchmark tasks, verifying each candidate with the Verus verifier. A task counts as solved only if Verus verification succeeds.
2. Collect Results: Record the number of solved tasks, the average time per task, and the average cost per task (in USD). If possible, include a per-source/per-project breakdown. (A sketch of one possible evaluation loop follows this list.)
3. Create Submission JSON: Format your results according to the schema below, and include links to your paper and code repository.
4. Submit via Pull Request: Open a pull request against our GitHub repository that adds your entry to the appropriate leaderboard/data/*.json file, along with a brief description of your approach.

📄 Submission Schema

Your submission should follow this JSON format:

{
  "submission_id": "your-system-model-version",
  "system_name": "Your System Name",
  "model": "LLM Model Used",
  "date": "YYYY-MM-DD",
  "results": {
    "solved": 135,
    "total": 150,
    "percent_solved": 90.0,
    "avg_time_seconds": 28.5,
    "avg_cost_usd": 0.25
  },
  "breakdown": [
    {"category": "CloverBench", "solved": 11, "total": 11},
    {"category": "MBPP", "solved": 72, "total": 78}
  ],
  "paper_url": "https://arxiv.org/abs/...",
  "code_url": "https://github.com/...",
  "verified": false,
  "notes": "Brief description of your approach"
}

Required Fields

submission_id, system_name, model, date, and results (with solved, total, percent_solved, avg_time_seconds, and avg_cost_usd).

Optional Fields

breakdown, paper_url, code_url, and notes. Leave verified set to false; the maintainers update it after reviewing your submission (see Verification Levels below).
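
Before opening a pull request, you may want to sanity-check your entry. A small sketch, assuming your entry is saved locally (the file path is hypothetical; the required-field list mirrors the schema above):

```python
import json

REQUIRED = ["submission_id", "system_name", "model", "date", "results"]

with open("leaderboard/data/my-entry.json") as f:  # hypothetical path
    entry = json.load(f)

missing = [field for field in REQUIRED if field not in entry]
assert not missing, f"missing required fields: {missing}"

# percent_solved should agree with solved/total (e.g. 135/150 -> 90.0).
results = entry["results"]
expected = round(100.0 * results["solved"] / results["total"], 1)
assert abs(results["percent_solved"] - expected) < 0.05, \
    f"percent_solved should be roughly {expected}"
print("entry looks consistent")
```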

⚠️ Rules & Guidelines

No Cheating

Submissions that use trivial escape hatches (e.g., assume(false) or #[verifier::external_body]) to fake verification will be rejected. We may run spot-checks on submitted solutions.
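
As a self-check before submitting, you might scan your generated solutions for such patterns. A rough sketch (the pattern list and output directory are illustrative, not the maintainers' exact check):

```python
from pathlib import Path

# Escape hatches that trivially bypass verification; illustrative, not exhaustive.
BANNED_PATTERNS = ["assume(false)", "#[verifier::external_body]", "admit()"]

def flag_trivial_solutions(solutions_dir: str) -> None:
    for rs_file in sorted(Path(solutions_dir).rglob("*.rs")):
        text = rs_file.read_text()
        for pattern in BANNED_PATTERNS:
            if pattern in text:
                print(f"{rs_file}: contains {pattern}")

flag_trivial_solutions("solutions/")  # hypothetical output directory
```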

✅ Verification Levels

Submissions are labeled with a verification status:

- Unverified (verified: false): results are self-reported by the submitting authors. This is the default for new submissions.
- Verified (verified: true): results have been reproduced or spot-checked by the maintainers.

To expedite verification, please provide detailed reproduction instructions and consider making your evaluation scripts publicly available.

📧 Contact

For questions about submissions, open an issue on our GitHub repository or email the maintainers.