# GRAPE: Graph Retriever Analysis and Performance Evaluation
GRAPE is a framework for benchmarking how well LLM agents query knowledge graphs via MCP-compatible servers.
- `evaluation-dataset-generation/` – Uses LLMs to generate questions and answers from real Neo4j databases
- `mcp-server-evaluations/` – Evaluates MCP server implementations against the generated dataset using an LLM judge (sketched below)
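The judge step can be pictured with a minimal sketch. It assumes an OpenAI-style chat API; the prompt text, the `judge` helper, and the `gpt-4o` model name are illustrative assumptions, not the repository's actual configuration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading a graph question-answering system.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Reply with exactly one word: CORRECT or INCORRECT."""

def judge(question: str, reference: str, candidate: str) -> bool:
    """Ask the judge model whether the candidate matches the reference."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, reference=reference, candidate=candidate
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper() == "CORRECT"
```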
GRAPE supports multiple domains, real-world graphs from demo.neo4jlabs.com, and a consistent evaluation pipeline.
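The demo graphs can be queried directly with the official `neo4j` Python driver. A minimal sketch (the `movies` database is one of the demo graphs; on the demo server, the username and password match the database name):

```python
from neo4j import GraphDatabase

# demo.neo4jlabs.com exposes read-only demo databases; the username and
# password equal the database name (here the "movies" graph).
URI = "neo4j+s://demo.neo4jlabs.com"
AUTH = ("movies", "movies")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    records, _, _ = driver.execute_query(
        "MATCH (m:Movie) RETURN m.title AS title LIMIT 5",
        database_="movies",
    )
    for record in records:
        print(record["title"])
```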
- **Use the existing dataset.** The repository includes a pre-generated `generated_dataset.json`; re-running `dataset_generation.ipynb` is optional (see the loading sketch after this list).
- **Run evaluation.** Go to a folder in `mcp-server-evaluations/` and run the evaluation notebook with the dataset.
- **Add more evaluations.** New MCP server implementations can be evaluated against the same dataset and pipeline.
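For the first step, the dataset is plain JSON and can be loaded directly. A minimal sketch, assuming the file sits in `evaluation-dataset-generation/` and holds a list of records with `question` and `answer` fields (both the path and the schema are assumptions here):

```python
import json
from pathlib import Path

# Assumed location; adjust to wherever generated_dataset.json lives in your checkout.
dataset_path = Path("evaluation-dataset-generation/generated_dataset.json")

with dataset_path.open() as f:
    dataset = json.load(f)

# Assumed schema: a list of {"question": ..., "answer": ...} records.
for item in dataset[:3]:
    print(item.get("question"), "->", item.get("answer"))
```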