Prompts for adjunct_island subset of blimp dataset #736
Conversation
Thanks for the PR!
- A bunch of prompts just hardcode the target correct answer as, e.g., Sentence 1, even when Sentence 2 may be the correct one. In general it's bad to hardcode it like this; you should use the "random" filter in Jinja. Something like this:
{% set shuffled_order = [0, 1] | random %}
Which one of the following sentences is grammatical? Please answer A or B.
{% if shuffled_order == 0 %}
A: {{ sentence_good }}
B: {{ sentence_bad }}
{% else %}
A: {{ sentence_bad }}
B: {{ sentence_good }}
{% endif %}
|||
{% if shuffled_order == 0 %}
A
{% else %}
B
{% endif %}
- Prompts like "adjunct_bad_first" expect models to answer "Sentence 1". If so, you need to tell models to "answer with Sentence 1 or Sentence 2". Or better yet, just ask models to reply with a single token, "1"/"2" or "A"/"B" (a minimal sketch follows this list). Be sure to update the answer choices accordingly.
- Why are the "A/B", "B/A" prompts not marked as original task?
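For illustration, here's a minimal sketch of the single-token style, adapting the shuffled-order pattern above; the question wording is just a placeholder, and the template's answer choices would need to list the same tokens (e.g. 1 ||| 2) to match:
{# pick whether the grammatical sentence is shown first or second #}
{% set shuffled_order = [0, 1] | random %}
Which of the following two sentences is grammatical? Reply with a single token: 1 or 2.
{% if shuffled_order == 0 %}
1: {{ sentence_good }}
2: {{ sentence_bad }}
{% else %}
1: {{ sentence_bad }}
2: {{ sentence_good }}
{% endif %}
|||
{# the target must agree with the order chosen above #}
{% if shuffled_order == 0 %}
1
{% else %}
2
{% endif %}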
Thank you! There are some duplicate ones:
Also there is an extra space in:
{% if shuffled_order == 0 %}
Sentence A: {{ sentence_good }}
Extra space after "Sentence A:".
Good catch, fixed! Thanks!
I think the choice ordering is nontrivial (speaking from experience; I don't have a source to cite). I've been running local tests with T0-3B on anaphor_number_agreement, and here are some results, accuracy (%) over 1000 examples: A-B order (spacing fixed) = 55.7%. This is what motivates the minimally different prompts, but if you feel strongly about removing the order-swapped versions I can remove them.
Oh wow, great to know about these order differences! I trust that you will pay extra attention when you analyze the variance of these minimal prompts and when we plot scatters like Fig 4 in the T0 paper, as we discussed before.
Well, I guess some part of me still thinks it's a bit irresponsible to ignore this variance, so here's a happy medium: randomizing the presentation order of in-prompt options within a single prompt :)
…der swapped prompts.
Well done! Thanks Najoung and Urmish! I didn't check all 50-something subsets of Blimp, but I trust that you programmatically copied the identical prompts?
Yep used
Note - Still WIP, but would like a quick review to see if the choices being made are correct.
Choices -