# PROBE: Function-centric evaluation of protein representation models
PROBE evaluates fixed-length protein embeddings for how well they capture functional biological knowledge. It provides four complementary benchmarks:
Benchmark | Biological Question | Key Metric(s) | Dataset & Split |
---|---|---|---|
Semantic Similarity Inference | Do embeddings place functionally similar proteins closer together? | Spearman ρ (embedding distance vs Resnik GO similarity) | 500, 200, Sparse protein sets |
GO-Term Function Prediction | Can GO terms be predicted from embeddings? | F-max, AUROC, AUPR (5-fold CV) | Swiss-Prot Human; splits by size (High / Middle / Low); MF, BP, CC |
Drug-Target Family Classification | Can embeddings classify proteins into drug-target families? | MCC, Accuracy, Macro-F1 (10-fold CV) | Random and identity-based splits (nc, uc50, uc30, mm15) |
Protein–Protein Binding Affinity | Can embeddings predict ΔΔG of interface mutants? | MSE, MAE, Pearson R (10-fold CV) | SKEMPI v1 (3,047 mutants) |
Introduced in Unsal et al. (2022), DOI: 10.1038/s42256-022-00457-9.
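To make the semantic similarity metric concrete: the core computation is a rank correlation between pairwise embedding distances and pairwise GO-based similarities. The following is a minimal toy sketch, where random vectors and random similarity scores stand in for real embeddings and PROBE's precomputed Resnik matrices:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_proteins, dim = 50, 128
embeddings = rng.normal(size=(n_proteins, dim))  # toy embedding vectors
go_similarity = rng.uniform(size=n_proteins * (n_proteins - 1) // 2)  # toy pairwise Resnik scores

# Condensed vector of pairwise cosine distances (same pair ordering as go_similarity).
distances = pdist(embeddings, metric="cosine")

# Good embeddings place functionally similar proteins close together, so distances
# should be anti-correlated with GO similarity; negate to report a positive rho.
rho, _ = spearmanr(-distances, go_similarity)
print(f"Spearman rho: {rho:.3f}")
```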
## Web platform

Run benchmarks directly in the browser: upload your embedding vectors, select tasks, and get results. Interactive leaderboards and plots are available. The platform is described in Çevrim et al. (2025), DOI: 10.1101/2025.04.10.648084.
## Requirements

- OS: Ubuntu 20.04 or compatible
- Python: ≥ 3.9
- RAM: 8 GB minimum (16 GB+ recommended)
## Installation

```bash
# Clone the repository
git clone https://github.com/HUBioDataLab/PROBE.git
cd PROBE

# (Optional) Set up a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
## Download benchmark datasets

```bash
mkdir -p data
# Quote the URL: an unquoted '&' would background the command.
curl -L -o datasets.zip \
  "https://drive.google.com/uc?export=download&id=1elGfjI4jwzcjOBT6LoMnz-7DtdwPFV6i"
unzip datasets.zip -d data
```
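Google Drive sometimes serves a confirmation page instead of the file itself for large downloads. If `curl` returns an HTML page rather than the archive, the `gdown` package is a common workaround (an assumption here, not part of PROBE's requirements; install with `pip install gdown`):

```python
import gdown

# Fetch the archive by its Google Drive file ID (same ID as in the curl command above).
gdown.download(id="1elGfjI4jwzcjOBT6LoMnz-7DtdwPFV6i", output="datasets.zip", quiet=False)
```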
This creates:

```
data/auxilary_input/
data/preprocess/
```
## Prepare embeddings

Choose one:

- Use preprocessed embeddings:
  - HUMAN embeddings – for the similarity, function, and family benchmarks
  - SKEMPI embeddings – for the affinity benchmark
- Or prepare your own embeddings – see "Preparing new embedding files" below.

Place all embedding files under `data/representation_vectors/`.
## Configuration

Edit `probe_config.yaml`:

```yaml
representation_name: MyModel   # prefix for output files
benchmark: all                 # similarity | function | family | affinity | all
representation_file_human: ../data/representation_vectors/mymodel_uniprot_human.csv
representation_file_affinity: ../data/representation_vectors/mymodel_skempi.csv
similarity_tasks: ["Sparse", "200", "500"]
function_prediction_aspect: All_Aspects
function_prediction_dataset: All_Data_Sets
family_prediction_dataset: ["nc", "uc50", "uc30", "mm15"]
detailed_output: False
```

Set `detailed_output: True` to additionally save pickled models, raw predictions, confusion matrices, and per-fold score arrays.
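Since a full run can take a while, it can help to verify that the configured embedding files exist before launching. The snippet below is an optional convenience sketch (not part of PROBE), using PyYAML and assuming it is run from the directory containing `probe_config.yaml`, since the configured paths are relative:

```python
from pathlib import Path

import yaml  # PyYAML

with open("probe_config.yaml") as fh:
    cfg = yaml.safe_load(fh)

# Check each configured representation file before a long benchmark run.
for key in ("representation_file_human", "representation_file_affinity"):
    path = Path(cfg[key])
    print(f"{key}: {path} -> {'OK' if path.is_file() else 'MISSING'}")
```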
## Run

```bash
cd bin
python PROBE.py
```

Results are saved in the `results/` folder.
## Preparing new embedding files

1. Generate embeddings for the relevant protein sets (human proteins and/or SKEMPI entries, depending on the benchmarks you plan to run).
2. Format them as a CSV file:
   - Column 0 header: `Entry` (UniProt accession or SKEMPI ID)
   - Remaining column headers: integer feature indices (`0,1,2,…`)
   - One row per sequence; all rows must have the same vector length
3. Save the file to `data/representation_vectors/`.
4. Reference the file paths in `probe_config.yaml`.

A concrete example of this layout is sketched below.
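The following sketch writes a syntactically valid embedding file with pandas; the accessions, dimensionality, and random vectors are placeholders for your model's real output:

```python
import numpy as np
import pandas as pd

accessions = ["P69905", "P68871", "P01308"]  # placeholder UniProt accessions
dim = 1024                                   # your model's embedding dimensionality

rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(accessions), dim))  # stand-in for real embeddings

# Column 0 is named 'Entry'; the remaining headers are integer indices 0..dim-1.
df = pd.DataFrame(vectors, columns=range(dim))
df.insert(0, "Entry", accessions)

# Run from the repository root so the relative path resolves.
df.to_csv("data/representation_vectors/mymodel_uniprot_human.csv", index=False)
```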
## Output files

Benchmark | Filename Pattern | Contents |
---|---|---|
Semantic Similarity | `Semantic_sim_inference_<matrix>_<rep>.csv` | Spearman correlations for the 500, 200, and Sparse sets |
Function Prediction | `Ontology_based_function_prediction_5cv_mean_<rep>.tsv`, `..._std_<rep>.tsv` | Mean ± SD of F-max, AUROC, AUPR |
Family Classification | `Drug_target_protein_family_classification_mean_results_<split>_<rep>.csv`, `..._class_based_results_...csv` | Overall and per-family metrics |
Binding Affinity | `Affinit_prediction_skempiv1_<rep>.csv`, `..._detail.csv` | MSE, MAE, Pearson R (per fold) |
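All result files are plain CSV/TSV and can be inspected with standard tools; a quick illustrative example with pandas (the file name is hypothetical, substitute the `<matrix>` and `<rep>` tokens from your own run):

```python
import pandas as pd

# Illustrative only: replace 'MF' and 'MyModel' with the <matrix> and <rep>
# values produced by your run (see the filename patterns above).
scores = pd.read_csv("results/Semantic_sim_inference_MF_MyModel.csv")
print(scores.head())
```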
## Contributors

- Benchmark: Serbulent Unsal, Heval Atas, Muammer Albayrak, Kemal Turhan, Aybar Cem Acar, Tunca Doğan
- Web platform: Elif Çevrim, Melih Gökay Yiğit, Erva Ulusoy, Ardan Yılmaz, Tunca Doğan
## License

Released under the GNU General Public License v3.0.
## References

- Unsal, S., Atas, H., Albayrak, M., Turhan, K., Acar, A. C., & Doğan, T. (2022). Learning functional properties of proteins with language models. Nature Machine Intelligence, 4, 227–245. https://doi.org/10.1038/s42256-022-00457-9
- Çevrim, E., Yiğit, M. G., Ulusoy, E., Yılmaz, A., & Doğan, T. (2025). A Benchmarking Platform for Assessing Protein Language Models on Function-related Prediction Tasks. In Protein Function Prediction: Methods and Protocols (pp. 241–268). New York, NY: Springer US.