Large Language Models (LLMs) such as GPT-3 are well-suited for text prediction tasks, which can help and delight users during text composition. Yet LLMs are known to generate ethically inappropriate predictions even for seemingly innocuous contexts. Toxicity detection followed by filtering is a common strategy for mitigating the harm from such predictions. However, as we shall argue in this paper, in the context of text prediction, it is not sufficient to detect and filter toxic content. One also needs to ensure factual correctness and group-level fairness of the predictions; failing to do so can make the system ineffective and nonsensical at best, and unfair and detrimental to the users at worst. We discuss the gaps and challenges of toxicity detection approaches -- from blocklist-based approaches to sophisticated state-of-the-art neural classifiers -- by evaluating them on the text prediction task for English against a manually crafted CheckList of harms targeted at different groups and at different levels of severity.
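To make the "detect then filter" strategy discussed above concrete, here is a minimal sketch (not the paper's implementation; the blocklist entries, function name, and threshold logic are illustrative assumptions) of how a blocklist-based filter and a classifier-based filter differ when applied to a candidate multi-word prediction:

```python
# Minimal illustrative sketch of detect-then-filter for text prediction.
# BLOCKLIST entries and all names here are hypothetical placeholders.
from typing import Optional

BLOCKLIST = {"badword1", "badword2"}  # real blocklists are far larger

def blocklist_filter(prediction: str) -> Optional[str]:
    """Suppress the prediction if any of its tokens is blocklisted."""
    if any(tok in BLOCKLIST for tok in prediction.lower().split()):
        return None  # suppressed: token-level match
    return prediction

def classifier_filter(context: str, prediction: str,
                      toxicity_score, threshold: float = 0.5) -> Optional[str]:
    """Suppress the prediction if a neural toxicity classifier scores the
    full (context + prediction) string above a threshold. `toxicity_score`
    stands in for any model that maps text to a score in [0, 1]."""
    if toxicity_score(context + " " + prediction) > threshold:
        return None  # suppressed: contextual toxicity detected
    return prediction
```

Note the structural difference the paper's argument turns on: the blocklist inspects tokens in isolation, while the classifier can (in principle) score the prediction in context; neither, by itself, checks factual correctness or group-level fairness.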
@inproceedings{vashishtha-etal-2023-performance,
    title = "Performance and Risk Trade-offs for Multi-word Text Prediction at Scale",
    author = "Vashishtha, Aniket and
      Prasad, S Sai and
      Bajaj, Payal and
      Chaudhary, Vishrav and
      Cook, Kate and
      Dandapat, Sandipan and
      Sitaram, Sunayana and
      Choudhury, Monojit",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.167",
    pages = "2226--2242",
}
We curated and evaluated datasets comprising templates along four dimensions: (1) Religion, Race, Ethnicity (RRE); (2) Nationality, Regionality (NReg); (3) Sexual Orientation and Gender Identity (SOGI); and (4) Offensive to an individual (Off). We also defined classes in terms of the severity of harm: Toxic - clearly and almost in all cases toxic/offensive; Strongly sensitive - can be sensitive or offensive in many contexts; and Weakly sensitive - unlikely but possible to be interpreted as sensitive in some special contexts. We refer to our datasets as In House Checklists 1 and 2 (IHCL-1 and IHCL-2). We simulated the Text Predictor on the sentences generated from all the templates in MaTo21, Bhatt21, IHCL-1, and IHCL-2.
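The evaluation above relies on CheckList-style template expansion: each template carries slots that are filled from per-dimension lexicons, and the resulting sentences are fed to the text predictor under test. The sketch below illustrates that generation step under stated assumptions; the template string, slot name, and filler values are hypothetical placeholders, not entries from IHCL-1 or IHCL-2.

```python
from itertools import product

# Hypothetical template with a {group} slot; real checklist templates
# and identity-term lexicons are not reproduced here.
TEMPLATES = ["I think all {group} people are"]
FILLERS = {"group": ["<identity-term-1>", "<identity-term-2>"]}

def expand(template: str, fillers: dict) -> list:
    """Fill every slot in `template` with each combination of filler values."""
    slots = [name for name in fillers if "{" + name + "}" in template]
    combos = product(*(fillers[s] for s in slots))
    return [template.format(**dict(zip(slots, values))) for values in combos]

for template in TEMPLATES:
    for sentence in expand(template, FILLERS):
        # Each generated sentence becomes a prefix passed to the
        # simulated text predictor, whose completions are then scored.
        print(sentence)
```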
Contact us: Please contact Monojit Choudhury (monojitc@microsoft.com) for the checklists and their fine-grained annotations created during this work.