Instructions for creating the ConConCor dataset

All code is made available under an Apache license.

Scripts can be found in the parent folder.

Initial sampling of approximately 200 OCR articles plus metadata per content/data query combination for 'contentious words', 'alternative words' and 'additional words' (refer to the datasheets).

```shell
cd sample_1
python3 PIPELINE.py
```

see sample_1/PIPELINE.py for details.
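The per-query sampling step can be sketched as follows. This is a minimal illustration only: the function name, record structure, and matching logic are assumptions, not the actual implementation in sample_1/PIPELINE.py.

```python
import random

def sample_articles(records, query_words, per_query=200, seed=0):
    """For each query word, sample up to `per_query` matching OCR articles.

    `records` is assumed to be a list of dicts with a "text" field;
    the real pipeline also carries article metadata alongside.
    """
    rng = random.Random(seed)
    out = {}
    for q in query_words:
        matches = [r for r in records if q in r["text"]]
        out[q] = rng.sample(matches, min(per_query, len(matches)))
    return out
```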

Scoring of P(sentence) based on bigram probabilities, applied to the output of sample_1/PIPELINE.py.
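Bigram sentence scoring of this kind can be sketched as below. The add-one smoothing and function names are assumptions for illustration, not necessarily what the pipeline script uses.

```python
import math
from collections import Counter

def train_bigrams(corpus_tokens):
    """Count unigram and bigram frequencies over a token list."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def sentence_logprob(tokens, unigrams, bigrams):
    """Log P(sentence) under an add-one-smoothed bigram model."""
    vocab = len(unigrams)
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        logp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return logp
```

A sentence containing bigrams seen in the corpus scores higher than one built from unseen bigrams.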

Sample 5-sentence extracts centred on contentious, alternative and additional target words according to the ratios 20:20:5.

```shell
cd sample_2/
python3 PIPELINE.py
```

see sample_2/PIPELINE.py for details.
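The windowing and ratio sampling described above can be sketched as follows; the function names and pool structure are hypothetical, and sample_2/PIPELINE.py may differ in detail.

```python
import random

def centred_extract(sentences, target_idx, window=5):
    """Return up to `window` sentences centred on the sentence at `target_idx`."""
    half = window // 2
    start = max(0, target_idx - half)
    return sentences[start:start + window]

def sample_by_ratio(contentious, alternative, additional, ratios=(20, 20, 5), seed=0):
    """Sample from the three pools according to the 20:20:5 ratio,
    capped at each pool's size."""
    rng = random.Random(seed)
    pools = (contentious, alternative, additional)
    return [rng.sample(pool, min(k, len(pool))) for pool, k in zip(pools, ratios)]
```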

Build blocks (batches) of 50 sequential annotations, sampled without replacement from the previous sampling step.

Requires:

  • control.csv, a CSV of URL, query word, and text for each control sample

Run:

```shell
python3 make_blocks.py
```
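Block building amounts to drawing items without replacement into fixed-size batches, which can be sketched as follows. The function name and shuffle-then-slice approach are assumptions; make_blocks.py additionally handles the control samples.

```python
import random

def make_blocks(samples, block_size=50, seed=0):
    """Partition samples into blocks of `block_size`, drawn without replacement.

    A single shuffle followed by slicing is equivalent to repeatedly
    sampling without replacement: each item appears in exactly one block.
    """
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)
    return [pool[i:i + block_size] for i in range(0, len(pool), block_size)]
```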

Assemble the Google Forms from the batches

see here for details

Web interface redirecting Prolific users to the assembled Google Forms

Code for generating the Flask web interface can be found here

Retrieve the annotations

see here for details

Build the datasets

Refer to and run create_datasets.py.
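The final assembly step writes the collected annotations out as a dataset file. A minimal sketch, assuming a simple per-annotation row layout (the column names here are hypothetical, not those used by create_datasets.py):

```python
import csv

def build_dataset(annotation_rows, out_path="dataset.csv"):
    """Write collected annotations to a CSV, one row per annotation.

    `annotation_rows` is assumed to be a list of dicts keyed by the
    hypothetical column names below.
    """
    fieldnames = ["url", "target_word", "extract", "annotation"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(annotation_rows)
```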