dices

  • Description:

The Diversity in Conversational AI Evaluation for Safety (DICES) dataset

Machine learning approaches are often trained and evaluated on datasets that require a clear separation between positive and negative examples. This simplification glosses over the natural subjectivity present in many tasks and content items, and obscures the inherent diversity in human perceptions and opinions. Annotation efforts that do preserve the variance in content and the diversity among raters tend to be expensive and laborious. To fill this gap and facilitate more in-depth analyses of model performance, we propose the DICES dataset: a unique dataset of diverse perspectives on the safety of AI-generated conversations, focused on the task of safety evaluation for conversational AI systems. The DICES dataset contains detailed demographic information about each rater, an exceptionally high number of replicated ratings per conversation (to ensure the statistical significance of downstream analyses), and encodes rater votes as distributions across demographic groups, enabling in-depth exploration of different rating aggregation strategies.

This dataset is well suited for observing and measuring variance, ambiguity, and diversity in the context of conversational AI safety. It is accompanied by a paper describing a set of metrics that show how rater diversity influences the safety perception of raters from different geographic regions, ethnic groups, age groups, and genders. The goal of the DICES dataset is to serve as a shared benchmark for the safety evaluation of conversational AI systems.

CONTENT WARNING: This dataset contains adversarial examples of conversations that may be offensive.

@article{aroyo2024dices,
  title={ {DICES} dataset: Diversity in conversational {AI} evaluation for safety},
  author={Aroyo, Lora and Taylor, Alex and Diaz, Mark and Homan, Christopher and Parrish, Alicia and Serapio-Garc{\'\i}a, Gregory and Prabhakaran, Vinodkumar and Wang, Ding},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
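
Both configs load through the standard TFDS API. A minimal sketch (assuming `tensorflow-datasets` is installed; the exact class-label strings are best read from the dataset's own metadata rather than hard-coded):

```python
import tensorflow_datasets as tfds

# Load the default config (dices/350) together with its metadata.
ds, info = tfds.load('dices/350', split='train', with_info=True)

# ClassLabel features expose their human-readable label names.
print(info.features['Q_overall'].names)

# Each row is one rater's full set of ratings for one conversation.
for example in ds.take(1):
    print(example['item_id'].numpy(), example['rater_id'].numpy())
    print(example['context'].numpy().decode('utf-8'))
```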

dices/350 (default config)

  • Config description: Dataset 350 contains 350 conversations rated by a diverse pool of 123 unique raters. Each conversation is rated on five top-level safety categories and one overall conversation-comprehension question. Raters were recruited so that the pool was balanced by gender (man or woman) and race/ethnicity (White, Black, Latine, Asian, Multiracial), and every rater rated every conversation, so each conversation has 123 unique ratings. The total number of rows in this dataset is 43,050.

  • Download size: 29.70 MiB

  • Dataset size: 74.43 MiB

  • Auto-cached (documentation): Yes

  • Splits:

| Split     | Examples |
| :-------- | -------: |
| `'train'` |   43,050 |

  • Feature structure:
FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_dangerous_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_derogation_of_boat': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_insensitive': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_medical_legal_financial_or_relationship_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscene_and_profane': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_sexually_suggestive_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_beliefs': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_gender_sexual_orientation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_inherited_attributes': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_status': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q5_political_affiliation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_making_endorsement': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_polarizing_topics': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'answer_time_ms': Scalar(shape=(), dtype=int64, description=Amount of time spent by each rater on each safety annotation question.),
    'answer_timestamp': Scalar(shape=(), dtype=int64, description=Time when each conversation was rated by each rater.),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each row, representing all ratings by a single rater to a single conversation.),
    'item_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each conversation.),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each rater.),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
    'safety_gold': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'safety_gold_reason': Text(shape=(), dtype=string),
})
  • Feature documentation:
| Feature | Class | Shape | Dtype | Description |
| :------ | :---- | :---- | :---- | :---------- |
| | FeaturesDict | | | |
| Q1_whole_conversation_evaluation | ClassLabel | | int64 | Rating about the understandability of a conversation. |
| Q2_harmful_content_dangerous_content | ClassLabel | | int64 | |
| Q2_harmful_content_derogation_of_boat | ClassLabel | | int64 | |
| Q2_harmful_content_insensitive | ClassLabel | | int64 | |
| Q2_harmful_content_medical_legal_financial_or_relationship_advice | ClassLabel | | int64 | |
| Q2_harmful_content_obscene_and_profane | ClassLabel | | int64 | |
| Q2_harmful_content_other | ClassLabel | | int64 | |
| Q2_harmful_content_overall | ClassLabel | | int64 | |
| Q2_harmful_content_personal_information | ClassLabel | | int64 | |
| Q2_harmful_content_regulated_goods | ClassLabel | | int64 | |
| Q2_harmful_content_sexually_suggestive_content | ClassLabel | | int64 | |
| Q3_bias_other | ClassLabel | | int64 | |
| Q3_bias_overall | ClassLabel | | int64 | |
| Q3_bias_targeting_beliefs | ClassLabel | | int64 | |
| Q3_bias_targeting_gender_sexual_orientation | ClassLabel | | int64 | |
| Q3_bias_targeting_inherited_attributes | ClassLabel | | int64 | |
| Q3_bias_targeting_status | ClassLabel | | int64 | |
| Q4_misinformation | ClassLabel | | int64 | |
| Q5_political_affiliation | ClassLabel | | int64 | |
| Q6_policy_guidelines_making_endorsement | ClassLabel | | int64 | |
| Q6_policy_guidelines_other | ClassLabel | | int64 | |
| Q6_policy_guidelines_overall | ClassLabel | | int64 | |
| Q6_policy_guidelines_polarizing_topics | ClassLabel | | int64 | |
| Q_overall | ClassLabel | | int64 | |
| answer_time_ms | Scalar | | int64 | Amount of time spent by each rater on each safety annotation question. |
| answer_timestamp | Scalar | | int64 | Time when each conversation was rated by each rater. |
| context | Text | | string | The conversation turns before the final chatbot response. |
| degree_of_harm | ClassLabel | | int64 | Hand-annotated rating of severity of safety risk. |
| harm_type | Sequence(ClassLabel) | (None,) | int64 | Hand-annotated harm topic(s) of conversation. |
| id | Scalar | | int64 | Numerical identifier for each row, representing all ratings by a single rater to a single conversation. |
| item_id | Scalar | | int64 | Numerical identifier for each conversation. |
| phase | ClassLabel | | int64 | One of three distinct time periods. |
| rater_age | ClassLabel | | int64 | The age group of the rater. |
| rater_education | ClassLabel | | int64 | The education of the rater. |
| rater_gender | ClassLabel | | int64 | The gender of the rater. |
| rater_id | Scalar | | int64 | Numerical identifier for each rater. |
| rater_race | ClassLabel | | int64 | The race/ethnicity of the rater. |
| rater_raw_race | Text | | string | The self-reported raw race/ethnicity of the rater, before simplification to five categories. |
| response | Text | | string | The final chatbot response in the conversation. |
| safety_gold | ClassLabel | | int64 | The gold standard safety label provided by experts. |
| safety_gold_reason | Text | | string | The reason(s) (if given) for the gold safety label provided by experts. |
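
Because every one of the 123 raters labels every conversation, per-conversation rating distributions can be computed directly. A minimal sketch of one aggregation strategy, grouping the overall safety vote by rater demographics (pandas is assumed to be installed; column names follow the feature structure above):

```python
import tensorflow_datasets as tfds

ds, info = tfds.load('dices/350', split='train', with_info=True)
df = tfds.as_dataframe(ds, info)

# Map integer class labels back to their string names for readability.
df['rater_gender'] = df['rater_gender'].map(info.features['rater_gender'].int2str)
df['Q_overall'] = df['Q_overall'].map(info.features['Q_overall'].int2str)

# Per-conversation distribution of the overall safety vote, split by
# rater gender -- one way to encode rater votes as distributions.
dist = (
    df.groupby(['item_id', 'rater_gender'])['Q_overall']
      .value_counts(normalize=True)
      .rename('fraction')
      .reset_index()
)
print(dist.head())
```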

dices/990

  • Config description: Dataset 990 contains 990 conversations rated by a diverse pool of 173 unique raters. Each conversation is rated on three top-level safety categories and one overall conversation-comprehension question. Raters were recruited so that the number of raters per conversation was balanced by gender (man or woman) and locale (US, India). Each rater rated only a sample of the conversations, and each conversation has 60-70 unique ratings. The total number of rows in this dataset is 72,103.

  • Download size: 48.06 MiB

  • Dataset size: 150.38 MiB

  • Auto-cached (documentation): Only when shuffle_files=False (train)

  • Splits:

| Split     | Examples |
| :-------- | -------: |
| `'train'` |   72,103 |

  • Feature structure:
FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_explicitly_sexual': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_financial_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_harm_to_people_or_animals': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_health_and_safety_claims': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_how_tos_about_harmful_activities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscenities_or_profanities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_violence': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_violent_or_gory': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_incites_hatred': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_physically_or_mentally_inferior': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_subhuman_comparisons': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_superiority_or_subjugation_of_a_group': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_unfair_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_conspiracy_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_contradicts_expert_consensus': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_false_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_political': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each row, representing all ratings by a single rater to a single conversation.),
    'item_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each conversation.),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each rater.),
    'rater_locale': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
})
  • Feature documentation:
| Feature | Class | Shape | Dtype | Description |
| :------ | :---- | :---- | :---- | :---------- |
| | FeaturesDict | | | |
| Q1_whole_conversation_evaluation | ClassLabel | | int64 | Rating about the understandability of a conversation. |
| Q2_harmful_content_explicitly_sexual | ClassLabel | | int64 | |
| Q2_harmful_content_financial_advice | ClassLabel | | int64 | |
| Q2_harmful_content_harm_to_people_or_animals | ClassLabel | | int64 | |
| Q2_harmful_content_health_and_safety_claims | ClassLabel | | int64 | |
| Q2_harmful_content_how_tos_about_harmful_activities | ClassLabel | | int64 | |
| Q2_harmful_content_obscenities_or_profanities | ClassLabel | | int64 | |
| Q2_harmful_content_other | ClassLabel | | int64 | |
| Q2_harmful_content_overall | ClassLabel | | int64 | |
| Q2_harmful_content_personal_information | ClassLabel | | int64 | |
| Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods | ClassLabel | | int64 | |
| Q2_harmful_content_promotes_or_condones_violence | ClassLabel | | int64 | |
| Q2_harmful_content_violent_or_gory | ClassLabel | | int64 | |
| Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above | ClassLabel | | int64 | |
| Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious | ClassLabel | | int64 | |
| Q3_bias_incites_hatred | ClassLabel | | int64 | |
| Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction | ClassLabel | | int64 | |
| Q3_bias_other | ClassLabel | | int64 | |
| Q3_bias_physically_or_mentally_inferior | ClassLabel | | int64 | |
| Q3_bias_subhuman_comparisons | ClassLabel | | int64 | |
| Q3_bias_superiority_or_subjugation_of_a_group | ClassLabel | | int64 | |
| Q3_unfair_bias_overall | ClassLabel | | int64 | |
| Q4_misinformation_conspiracy_theories | ClassLabel | | int64 | |
| Q4_misinformation_contradicts_expert_consensus | ClassLabel | | int64 | |
| Q4_misinformation_false_theories | ClassLabel | | int64 | |
| Q4_misinformation_other | ClassLabel | | int64 | |
| Q4_misinformation_overall | ClassLabel | | int64 | |
| Q4_misinformation_political | ClassLabel | | int64 | |
| Q_overall | ClassLabel | | int64 | |
| context | Text | | string | The conversation turns before the final chatbot response. |
| degree_of_harm | ClassLabel | | int64 | Hand-annotated rating of severity of safety risk. |
| harm_type | Sequence(ClassLabel) | (None,) | int64 | Hand-annotated harm topic(s) of conversation. |
| id | Scalar | | int64 | Numerical identifier for each row, representing all ratings by a single rater to a single conversation. |
| item_id | Scalar | | int64 | Numerical identifier for each conversation. |
| phase | ClassLabel | | int64 | One of three distinct time periods. |
| rater_age | ClassLabel | | int64 | The age group of the rater. |
| rater_education | ClassLabel | | int64 | The education of the rater. |
| rater_gender | ClassLabel | | int64 | The gender of the rater. |
| rater_id | Scalar | | int64 | Numerical identifier for each rater. |
| rater_locale | ClassLabel | | int64 | The locale of the rater. |
| rater_race | ClassLabel | | int64 | The race/ethnicity of the rater. |
| rater_raw_race | Text | | string | The self-reported raw race/ethnicity of the rater, before simplification to five categories. |
| response | Text | | string | The final chatbot response in the conversation. |
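
Since dices/990 adds a rater_locale feature, a natural first analysis is comparing rating distributions across the two locales. A minimal sketch, assuming pandas is installed (as before, label strings are taken from the dataset metadata rather than hard-coded):

```python
import pandas as pd
import tensorflow_datasets as tfds

ds, info = tfds.load('dices/990', split='train', with_info=True)
df = tfds.as_dataframe(ds, info)

# Cross-tabulate the overall safety vote against rater locale,
# normalising within each locale to compare rating distributions.
locale = df['rater_locale'].map(info.features['rater_locale'].int2str)
overall = df['Q_overall'].map(info.features['Q_overall'].int2str)
print(pd.crosstab(locale, overall, normalize='index'))
```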