Add CrowS-Pairs task #70

oskarvanderwal · 2021-11-10T10:26:33Z

Evaluated on: GPT-2
Time evaluating on GPU: 00:48

Here is my attempt at implementing CrowS-Pairs and making it suitable for autoregressive models (closes #37).
CrowS-Pairs was originally designed for masked language models, so I had to adapt its sentence scoring function, which is based on masking tokens. Instead, I compare the two sentences in each pair by their perplexity.
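
To illustrate the perplexity comparison described above, here is a minimal sketch. It is not the code from this PR: the function names `perplexity` and `prefers_stereotype` are hypothetical, and it assumes the per-token natural-log probabilities of each sentence have already been obtained from the model.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one sentence: exp of the negative mean token log-probability.

    `token_logprobs` holds natural-log probabilities, one per token
    (assumed to come from an autoregressive model such as GPT-2).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prefers_stereotype(
    stereo_logprobs: Sequence[float], anti_logprobs: Sequence[float]
) -> bool:
    """True if the more stereotypical sentence has the lower perplexity.

    Assumption: "lower perplexity" is taken as "the model prefers this sentence".
    """
    return perplexity(stereo_logprobs) < perplexity(anti_logprobs)
```

Because perplexity averages over tokens, it normalizes for sentence length, which a raw sum of log-probabilities would not.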

I have tested the task on GPT-2 and got the following results:

{
    "crowspairs_bias": 0.593501326259947,
    "crowspairs_bias_age": 0.5287356321839081,
    "crowspairs_bias_disability": 0.6,
    "crowspairs_bias_gender": 0.583969465648855,
    "crowspairs_bias_nationality": 0.44654088050314467,
    "crowspairs_bias_physical-appearance": 0.6349206349206349,
    "crowspairs_bias_race-color": 0.5775193798449613,
    "crowspairs_bias_religion": 0.6761904761904762,
    "crowspairs_bias_sexual-orientation": 0.7738095238095238,
    "crowspairs_bias_socioeconomic": 0.6686046511627907
}
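
For readers wondering how scores with this key layout could be produced, the sketch below shows one plausible aggregation. It assumes each score is the fraction of sentence pairs in which the model prefers the more stereotypical sentence, both overall and per bias type. That reading is an assumption, not something stated in this PR, and `aggregate_bias_scores` is a hypothetical helper rather than the PR's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_bias_scores(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Turn per-pair outcomes into overall and per-bias-type fractions.

    Each item is (bias_type, prefers_stereo), where prefers_stereo is True
    when the model preferred the more stereotypical sentence of the pair.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for bias_type, prefers_stereo in results:
        totals[bias_type] += 1
        hits[bias_type] += int(prefers_stereo)

    # Per-type keys follow the "crowspairs_bias_<type>" pattern seen in the results.
    scores = {f"crowspairs_bias_{t}": hits[t] / totals[t] for t in totals}

    n_pairs = sum(totals.values())
    if n_pairs:
        scores["crowspairs_bias"] = sum(hits.values()) / n_pairs
    return scores
```

Under this reading, a score of 0.5 would mean the model shows no preference between the two sentences on average.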

oskarvanderwal and others added 30 commits October 8, 2021 12:49

Add template for CrowS-Pairs task

98f0beb

Adjust filename

a251d23

Loading CrowS-Pairs from url in right format

5d1a11d

Tokenize dataset sentences+plan for scoring func.

b8c3cd5

Add link to implementation scores crows-pairs

8bbbd7e

Implemented most of the eval logic (untested)

03ee5ee

Add scoring sentence as separate function

83d1cef

Remove unnecessary todo item

e0e7fbc

Add comment about dataset on huggingface datasets

4096287

Update crowspairs.py

1dce412

Add dummy scoring function

Update crowspairs.py

aaf2bc1

No more bugs, but haven't validated if it is implemented correctly yet. Also need to fix score per bias type

Simplified the evaluation logic

eeabdc8

Remove print statement

d247e98

Test

b4bb270

Assume not start and stop token

6a664bb

Remove pair-score metric

43d50ae

Small bug fix

801dd06

Remove print statement

31c941c

Add confidence metric

a05b622

Add average model confidence to metric

35a9e18

Typo

fcdd28e

Comment out confidence metric

3d8416f

Change sentence score to perplexity

919dc1f

Update crowspairs.py

7df4df2

Update crowspairs.py

01db11d

Update crowspairs.py

c0a9fea

Update crowspairs.py

d1d50a4

Update crowspairs.py

ecfa84a

Update crowspairs.py

c979e93

Update crowspairs.py

0521b24

oskarvanderwal and others added 9 commits November 3, 2021 15:52

Update crowspairs.py

9a83d92

Update crowspairs.py

904233e

Update crowspairs.py

7b01be8

Update crowspairs.py

f2b959a

Update crowspairs.py

1805346

Update crowspairs.py

cd57052

Cleaning code

82ef242

Run make quality

3e0e41c

Fix comment

9f98a89

sebastianGehrmann approved these changes Jan 17, 2022

oskarvanderwal mentioned this pull request Apr 28, 2022

Adding CrowsPairs task for English and French bigscience-workshop/lm-evaluation-harness#25

Merged
