Description of the different versions of the dataset

This repository contains the data for the paper 'Moderation in the Wild: Investigating User-Driven Moderation in Online Discussions'. The guidelines of the annotation study are available in the directory guidelines.

The data is available in two versions: UMOD_non_aggregated.csv and UMOD_aggregated.csv in the directory dataset.

UMOD_non_aggregated.csv

This version contains one row per instance and annotator; the annotations are not aggregated. A minimal loading sketch follows the field list below.

  • id: [string]
  • preceding_comment: [string] the comment that expresses the original opinion or argument
  • reply: [string] a reply to the preceding comment, to be rated on whether it expresses some form of moderation
  • moderation: [yes/no] whether the reply is a form of moderation
  • subjectivity: [1-5] how opinionated the comment is
  • agressiveness: [1-5] aggressive comments are defined as comments that are insulting, threatening, sarcastic, or hostile
  • constructiveness: [1-5] constructive comments can be defined as high-quality comments that make a contribution to the conversation
  • sentiment: [positive / negative / neutral]
  • annotator: [string] a unique identifier for each annotator (anonymized)
  • role: [user / moderator / none]
  • race: [black / white / asian / mixed / other]
  • sex: [male / female]
  • age: [int]
  • none: [binary] no function is suitable for the comment or the comment does not express any function
  • social: [binary] welcoming / greeting / encouraging others to participate
  • improveQuality: [binary] feedback on the quality of the comment / argument (e.g. asking for clarification, asking for evidence, asking for sources)
  • organizing: [binary] directs the comment to another comment or user
  • content: [binary] correcting / adding information
  • broadening: [binary] bringing in new perspectives or encouraging others to do so
  • policing: [binary] encouraging civil and respectful behavior
  • siteIssues: [binary] helps to resolve issues with the site or to explain rules of the site
  • offTopic: [binary] indicates irrelevant content or off-point statements
  • agreeWithOpinion: [yes / no / opinion not clear]
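For orientation, here is a minimal sketch of loading the per-annotator file with pandas. The path follows the directory layout described above; adjust it if your local layout differs.

```python
import pandas as pd

# Load the per-annotator file (one row = one annotator's judgement of one instance).
# The path "dataset/UMOD_non_aggregated.csv" is assumed from the description above.
df = pd.read_csv("dataset/UMOD_non_aggregated.csv")

# How many annotators judged each instance?
print(df.groupby("id")["annotator"].nunique().describe())

# Overall distribution of the binary moderation judgement.
print(df["moderation"].value_counts())
```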

UMOD_aggregated.csv

This version contains the aggregated annotations. Each instance occurs exactly once and was annotated by between 7 and 10 annotators. Each instance has a reply comment, a preceding comment, and a comment id. The annotations are aggregated according to the following rules (a sketch relating the raw soft label to the per-annotator file follows this list):

  • softlabel_mace: [float between 0 and 1] the probability that the comment is a user moderation comment. Can be used as a soft label for user moderation. It has been aggregated using MACE (Hovy et al., 2013).
  • softlabel_raw: [float between 0 and 1] the probability that the comment is a user moderation comment. Can be used as a soft label for user moderation. It has been aggregated directly from the raw annotations.
  • subjectivity: [float between 1 and 5] mean
  • agressiveness: [float between 1 and 5] mean
  • constructiveness: [float between 1 and 5] mean
  • sentiment: [positive / negative / neutral] majority sentiment
  • functions: [binary] a function is considered a function of the comment if at least half of all annotators selected it
  • agreeWithOpinion: [yes / no / opinion not clear] the number of annotators that selected the corresponding answer for that instance
  • entropy_moderation: [float between 0 and 1.5] the normalized entropy of the raw annotations for that item; higher values indicate more disagreement
  • relamountRole: [float between 0 and 1] the proportion of annotators that selected the moderator role for that instance; the higher, the more annotators selected that role
  • countModFunctions: [int] the number of different moderator functions that were selected for that instance.
  • annot_similarity_pre: [float between 0 and 1] the cosine similarity between the initial definitions of user moderation given by the annotators of that instance; higher values mean the annotators had a more similar understanding of user moderation
  • annot_similarity_post: [float between 0 and 1] the cosine similarity between the final (post-study) definitions of user moderation given by the annotators of that instance; higher values mean the annotators had a more similar understanding of user moderation
  • numAnnotators: [int] the number of annotators that annotated that instance.
  • seedmodel: [string] the seed model that was used for selecting this candidate item. Either (chat)GPT, expert, or negative.
  • flagged: [binary] whether any annotator flagged the comment as disturbing
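As an illustration of the aggregation rules above, softlabel_raw can be recovered from the per-annotator file as the fraction of "yes" votes per instance. This is only a sketch: it assumes the moderation column holds yes/no strings, and the MACE-based softlabel_mace cannot be reproduced this simply.

```python
import pandas as pd

# Paths assumed from the directory layout described above.
raw = pd.read_csv("dataset/UMOD_non_aggregated.csv")
agg = pd.read_csv("dataset/UMOD_aggregated.csv")

# Fraction of annotators per instance who answered "yes" to the moderation question.
recomputed = (
    raw.assign(is_yes=raw["moderation"].eq("yes"))
       .groupby("id")["is_yes"]
       .mean()
       .rename("softlabel_raw_recomputed")
       .reset_index()
)

# Compare against the released soft label column.
merged = agg.merge(recomputed, on="id", how="left")
print(merged[["id", "softlabel_raw", "softlabel_raw_recomputed"]].head())
```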

UMOD_reply_wiht_linguistic_and_tan2016_features.csv

This file contains the reply comments, their ids, and the additional features used in the paper for the analysis of constructiveness. A merge sketch follows.
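A minimal sketch for joining this file with the aggregated annotations on the comment id, e.g. to relate the features to constructiveness. The file path and the presence of a matching id column are assumptions based on the description above.

```python
import pandas as pd

# Paths are assumptions; adjust if the features file is stored elsewhere in the repository.
features = pd.read_csv("dataset/UMOD_reply_wiht_linguistic_and_tan2016_features.csv")
agg = pd.read_csv("dataset/UMOD_aggregated.csv")

# Join reply-level features with the aggregated annotations on the comment id.
merged = features.merge(agg[["id", "constructiveness", "softlabel_raw"]], on="id", how="inner")
print(merged.shape)
```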

Licence and intended use

licence: CC BY 4.0 DEED

If you use this work, please cite the following:

Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In Proceedings of the 25th International Conference on World Wide Web, pages 613–624.