The ability to generalise is one of the primary desiderata of natural language processing (NLP). Yet how "good generalisation" should be defined and evaluated is not well understood, and there are no common standards for evaluating it. As a consequence, newly proposed models are usually not systematically tested for their ability to generalise. GenBench's mission is to make state-of-the-art generalisation testing the new status quo in NLP. As a first step, we present a generalisation taxonomy that describes the underlying building blocks of generalisation in NLP. We use the taxonomy to conduct an extensive review of over 400 generalisation papers, and we make recommendations about which areas deserve attention in the future.

@Article{Hupkes2023,
    author={Hupkes, Dieuwke and Giulianelli, Mario and Dankers, Verna and Artetxe, Mikel and Elazar, Yanai 
    and Pimentel, Tiago and Christodoulopoulos, Christos and Lasri, Karim and Saphra, Naomi and Sinclair, Arabella 
    and Ulmer, Dennis and Schottmann, Florian and Batsuren, Khuyagbaatar and Sun, Kaiser and Sinha, Koustuv 
    and Khalatbari, Leila and Ryskina, Maria and Frieske, Rita and Cotterell, Ryan and Jin, Zhijing},
    title={A taxonomy and review of generalization research in {NLP}},
    journal={Nature Machine Intelligence},
    year={2023},
    month={Oct},
    day={01},
    volume={5},
    number={10},
    pages={1161--1174},
    abstract={The ability to generalize well is one of the primary desiderata for models of natural language processing (NLP), but what `good generalization' entails and how it should be evaluated is not well understood. In this Analysis we present a taxonomy for characterizing and understanding generalization research in NLP. The proposed taxonomy is based on an extensive literature review and contains five axes along which generalization studies can differ: their main motivation, the type of generalization they aim to solve, the type of data shift they consider, the source by which this data shift originated, and the locus of the shift within the NLP modelling pipeline. We use our taxonomy to classify over 700 experiments, and we use the results to present an in-depth analysis that maps out the current state of generalization research in NLP and make recommendations for which areas deserve attention in the future.},
    issn={2522-5839},
    doi={10.1038/s42256-023-00729-y},
    url={https://doi.org/10.1038/s42256-023-00729-y}
}

All of this is described in detail in our taxonomy paper. On this website, you can learn about the taxonomy, explore our results visually, create GenBench Evaluation Cards for your research papers, obtain citations for the papers in our analysis, and contribute papers, which we will periodically add to our review.

News

July 30, 2024: The video summary of our Nature Machine Intelligence paper is live. Watch it here!
May 1, 2024: The new GenBench workshop page is live; visit it for more information. See you at EMNLP in Miami in November!
October 23, 2023: Delighted to announce that the GenBench workshop will be back @ EMNLP 2024.
October 20, 2023: Our paper "A taxonomy and review of generalization research in NLP" has been published @ Nature Machine Intelligence.
June 8, 2023: The submission link for GenBench workshop papers is now available.
March 27, 2023: The first GenBench workshop will be held @ EMNLP 2023, which takes place in Singapore from December 6 to 10, 2023!