MigrationBench

1. 📖 Overview
- 1.1 MigrationBench: Dataset and Evaluation Framework
- 1.2 SDFeedback: Migration with LLMs
2. 🤗 MigrationBench Datasets
3. Code Migration Evaluation
4. 📚 Citation

1. 📖 Overview

MigrationBench is a library to assess code migration success in an automated and robust way.

Reference paper: MigrationBench: Repository-Level Code Migration Benchmark from Java 8

1.1 MigrationBench: Dataset and Evaluation Framework

The name MigrationBench is used for both the dataset and the evaluation framework for code migration success:

🤗 MigrationBench is a large-scale code migration benchmark dataset at the repository level, across multiple programming languages.
- The current and initial release includes Java 8 repositories with the Maven build system, as of May 2025.
MigrationBench (the current GitHub package) is the evaluation framework to assess code migration success, from Java 8 to 17 or any other long-term support (LTS) version.

The evaluation approximates functional equivalence by checking the following:

- The repo is able to build and pass all tests
- Compiled classes' major versions are consistent with the target Java version
  - 52 and 61 for Java 8 and 17 respectively
- Test methods are invariant after code migration
- The number of test cases is non-decreasing after code migration
- The repo's dependency libraries match their latest major versions
  - Optional for minimal migration by definition, while
  - Required for maximal migration
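
The class-version check can be illustrated with a minimal sketch. It relies only on the standard class-file header layout (a 4-byte magic number, then 2-byte minor and major versions, all big-endian). The function names and the directory walk are hypothetical and are not taken from the MigrationBench source.

```python
import struct
from pathlib import Path

# Class-file major versions for the two Java releases named above.
JAVA_MAJOR_VERSIONS = {8: 52, 17: 61}


def read_class_major_version(data: bytes) -> int:
    """Return the major version stored in a compiled .class file header."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("missing 0xCAFEBABE magic number")
    return major


def classes_match_target(root: str, java_version: int) -> bool:
    """Hypothetical helper: True if every .class file under root matches the target.

    Returns False when no .class files are found at all.
    """
    expected = JAVA_MAJOR_VERSIONS[java_version]
    class_files = list(Path(root).rglob("*.class"))
    return bool(class_files) and all(
        read_class_major_version(path.read_bytes()) == expected
        for path in class_files
    )
```

For example, `classes_match_target("target/classes", 17)` would inspect a Maven build output directory, assuming the default `target/classes` layout.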

1.2 SDFeedback: Migration with LLMs

SDFeedback is a separate GitHub package to conduct code migration with LLMs as a baseline solution, and it relies on the current package for the final evaluation.

It builds an ECR image and then runs both code migration and final evaluation with Elastic Map Reduce (EMR) Serverless in a scalable way.

2. 🤗 MigrationBench Datasets

There are three datasets in 🤗 MigrationBench:

All repositories included in the datasets are available on GitHub, under the MIT or Apache-2.0 license.

Index	Dataset	Size	Notes
1	🤗 `AmazonScience/migration-bench-java-full`	5,102	Each repo has a test directory or at least one test case
2	🤗 `AmazonScience/migration-bench-java-selected`	300	A subset of 🤗 `migration-bench-java-full`
3	🤗 `AmazonScience/migration-bench-java-utg`	4,814	The unit test generation (utg) dataset, disjoint with 🤗 `migration-bench-java-full`

3. Code Migration Evaluation

We support running code migration evaluation for MigrationBench in two modes:

- Single eval mode: for a single repository, and
- Batch eval mode: for multiple repositories

3.1 Get Started

To get started with code migration evaluation from Java 8 to 17, under either minimal migration or maximal migration (see the arXiv paper for the definitions):

3.1.1 Basic Setup

Verify that Java 17, Maven 3.9.6 and conda (optional) are installed locally:

# java
~ $ java --version
openjdk 17.0.15 2025-04-15 LTS
OpenJDK Runtime Environment Corretto-17.0.15.6.1 (build 17.0.15+6-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.15.6.1 (build 17.0.15+6-LTS, mixed mode, sharing)

# maven
~ $ mvn --version
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /usr/local/bin/apache-maven-3.9.6
Java version: 17.0.15, vendor: Amazon.com Inc., runtime: /usr/lib/jvm/java-17-amazon-corretto.x86_64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.236-208.928.amzn2int.x86_64", arch: "amd64", family: "unix"

# conda (optional)
~ $ conda --version
conda 25.1.1
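
The Java check above can also be scripted. The following is a minimal sketch, assuming the first line of the `java --version` output looks like the sample shown; `parse_java_major` and `local_java_major` are hypothetical helper names, not part of MigrationBench.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from the first line of version output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.search(r"(\d+)(?:\.(\d+))?", lines[0])
    if match is None:
        raise ValueError(f"no version number found in: {lines[0]!r}")
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report versions such as "1.8.0_392".
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def local_java_major() -> int:
    """Run `java --version` locally and parse its output (requires Java 9+)."""
    result = subprocess.run(
        ["java", "--version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stdout)
```

With the setup above, `local_java_major()` would be expected to return 17.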

3.1.2 Install MigrationBench

git clone https://github.com/amazon-science/MigrationBench.git

cd MigrationBench

# The conda steps below are optional if you don't need a conda env
# export CONDA_ENV=migration-bench
# conda create -n $CONDA_ENV python=3.9
# conda activate $CONDA_ENV

pip install -r requirements.txt -e .

Next, to run a single job or a batch of jobs, refer to the file-level comments in src/migration_bench/run_eval.py.

3.2 Single Eval

To run eval for a single repository, provide the GitHub URL, a git diff file and, optionally, more flags:

3.2.1 Unsuccessful Eval

# cd .../src/migration_bench

GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=...

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE

One may see the following output, as the git diff file is invalid:

...
[single] Migration success (count) `False`: `('https://github.com/0xShamil/java-xid', '...')`.
...

3.2.2 Successful Eval

python run_eval.py --github_url $GITHUB_URL --require_compiled_java_major_version 52

With the code migration target redirected to Java 8 (through --require_compiled_java_major_version 52), the eval should succeed without any code changes:

...
[single] Migration success (count) `True`: `('https://github.com/0xShamil/java-xid', None)`.
...
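
For scripted use, the two invocations above can be assembled programmatically. This is a minimal sketch that only builds the argument list from the flags shown in this section. The helper name `build_single_eval_command` is hypothetical, and any other flags `run_eval.py` may accept are not covered.

```python
import sys
from typing import List, Optional


def build_single_eval_command(
    github_url: str,
    git_diff_filename: Optional[str] = None,
    major_version: Optional[int] = None,
) -> List[str]:
    """Build the argv list for a single-repository run of run_eval.py."""
    cmd = [sys.executable, "run_eval.py", "--github_url", github_url]
    if git_diff_filename is not None:
        cmd += ["--git_diff_filename", git_diff_filename]
    if major_version is not None:
        # 52 targets Java 8 and 61 targets Java 17, as listed in the overview.
        cmd += ["--require_compiled_java_major_version", str(major_version)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, cwd=...)`, with `cwd` pointing at the directory that contains `run_eval.py`.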

3.3 Batch Eval

To run eval in batch mode for multiple repositories, provide a predictions file in JSON format.

3.3.1 Sample Predictions File

For each repo, provide the GitHub URL and either the git diff content or a git diff file:

$ cat predictions.json
[
  {
      "github_url": "https://github.com/0xShamil/java-xid",
      "git_diff_file": "eval/testdata/java-xid.diff"
  },
  {
      "github_url": "https://github.com/0xShamil/java-xid",
      "git_diff": ""
  }
]
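
A predictions file like the one above can be generated with a short script. The sketch below uses the three keys from the sample (`github_url`, `git_diff_file`, `git_diff`). The helper names are hypothetical, and requiring exactly one of the two diff fields per entry is an assumption made here, not a documented rule.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def make_prediction(
    github_url: str,
    git_diff: Optional[str] = None,
    git_diff_file: Optional[str] = None,
) -> Dict[str, str]:
    """Build one predictions entry with either inline diff content or a diff file path."""
    # Assumption: each entry carries exactly one of the two diff fields.
    if (git_diff is None) == (git_diff_file is None):
        raise ValueError("provide exactly one of git_diff or git_diff_file")
    entry = {"github_url": github_url}
    if git_diff is not None:
        entry["git_diff"] = git_diff
    else:
        entry["git_diff_file"] = git_diff_file
    return entry


def dump_predictions(predictions: List[Dict[str, str]]) -> str:
    """Serialize the entries as a JSON array, matching the sample layout."""
    return json.dumps(predictions, indent=2)


def write_predictions(predictions: List[Dict[str, str]], path: str) -> None:
    """Write the JSON array to a predictions file."""
    Path(path).write_text(dump_predictions(predictions) + "\n", encoding="utf-8")
```

For example, `write_predictions([make_prediction(url, git_diff_file="eval/testdata/java-xid.diff")], "predictions.json")` would produce a one-entry file.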

3.3.2 Run Batch Eval

# cd .../src/migration_bench

PREDICTIONS=predictions.json
python run_eval.py --predictions_filename $PREDICTIONS  # --require_compiled_java_major_version 52

Without valid git diff content or a valid git diff file, one may see the following output:

...
[batch] Final eval result: Success = 0 out of 2.
...
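
When batch eval runs inside a larger pipeline, the summary line can be extracted from captured output. The sketch below assumes the log format matches the sample line above exactly. That format is not a documented interface and may change, and `parse_batch_result` is a hypothetical helper.

```python
import re
from typing import Tuple

# Pattern copied from the sample output; treat it as an assumption.
_BATCH_RESULT = re.compile(
    r"\[batch\] Final eval result: Success = (\d+) out of (\d+)\."
)


def parse_batch_result(log_text: str) -> Tuple[int, int]:
    """Return (successes, total) from the batch eval summary line."""
    match = _BATCH_RESULT.search(log_text)
    if match is None:
        raise ValueError("batch summary line not found")
    return int(match.group(1)), int(match.group(2))
```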

4. 📚 Citation

@misc{liu2025migrationbenchrepositorylevelcodemigration,
      title={MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8},
      author={Linbo Liu and Xinle Liu and Qiang Zhou and Lin Chen and Yihan Liu and Hoan Nguyen and Behrooz Omidvar-Tehrani and Xi Shen and Jun Huan and Omer Tripp and Anoop Deoras},
      year={2025},
      eprint={2505.09569},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2505.09569},
}
