
The meteor metric seems inconsistent with the official version #115

@jianguda

Description

Describe the bug

The computed meteor score looks off: the value differs substantially from the scores produced by other tools. As a reference I use the meteor score computed by NLGeval, which reuses the official jar file for the computation.

Steps to reproduce the bug

from datasets import load_metric
from nlgeval import compute_individual_metrics

meteor = load_metric('meteor')
predictions = ["It is a guide to action which ensures that the military always obeys the commands of the party"]
references = ["It is a guide to action that ensures that the military will forever heed Party commands"]

# actual result (datasets implementation)
results = meteor.compute(predictions=predictions, references=references)
print(round(results["meteor"], 4))

# expected result (NLGeval, which takes a list of references and a single hypothesis string)
metrics_dict = compute_individual_metrics(references, predictions[0])
print(round(metrics_dict["METEOR"], 4))

By the way, you need to install the nlg-eval library first. Please check the installation guide here, thanks!
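For what it's worth, here is a minimal cross-check against NLTK. My assumption is that the meteor metric in datasets is built on nltk.translate.meteor_score rather than the official Java implementation, which would explain the gap; this is only a sketch, and recent NLTK releases expect pre-tokenized input.

from nltk import word_tokenize
from nltk.translate.meteor_score import meteor_score

# NLTK data such as 'punkt' and 'wordnet' may need to be downloaded first.
prediction = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference = "It is a guide to action that ensures that the military will forever heed Party commands"

# meteor_score takes a list of (tokenized) references and one (tokenized) hypothesis
score = meteor_score([word_tokenize(reference)], word_tokenize(prediction))
print(round(score, 4))  # if the assumption holds, this should land near 0.7398, not 0.4474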

Expected results

0.4474 (METEOR from NLGeval, i.e. the official jar)

Actual results

0.7398 (meteor from datasets)

Environment info

  • datasets version: 1.10.2
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.8.5
  • PyArrow version: 4.0.1
