This page presents the results of our review, which contains taxonomy annotations for more than 400 *ACL papers. If you would like to contribute annotations for your own paper, or for (new) papers you think are missing, please do. The entries considered for our review are listed on our references page. If you are curious about the conclusions we draw from these results, have a look at the results section of our paper!

In need of visualisations for your paper?

Are you writing a paper about generalisation in NLP and would you like to motivate your work with some graphics? We would happily provide you with visualisations through the interactive graphs listed on this page. If you hover over the graphs, you can click on the 📸 (camera) or 💾 (save) icons to download them in .png format, or you can contact us for custom adaptations of these graphs (email us at genbench@googlegroups.com).

  title = {State-of-the-art generalisation research in {NLP}: a taxonomy and review},
  author = {Dieuwke Hupkes and Mario Giulianelli and Verna Dankers and Mikel Artetxe and Yanai Elazar and Tiago Pimentel and Christos Christodoulopoulos and Karim Lasri and Naomi Saphra and 
    Arabella Sinclair and Dennis Ulmer and Florian Schottmann and Khuyagbaatar Batsuren and Kaiser Sun and Koustuv Sinha and Leila Khalatbari and Maria Ryskina and Rita Frieske and Ryan Cotterell and Zhijing Jin},
  year = {2022},
  journal = {CoRR},
  url = {https://arxiv.org/abs/2210.03050},


Explore visualisations

Taxonomy overview

The two interactive plots below give an overview of all five axes at the same time.

Sankey diagram

The taxonomy characterises research along five axes. The Sankey diagram below shows the main relations between the axes that are most closely related, along with the frequency of the labels within each axis. You can see, for instance, that:

  • the most frequent motivation for generalisation research is ‘practical’, but practically motivated studies can still take on any type;
  • generalisation research introducing a covariate data shift mostly focuses on the difference between train (or fine-tune) and test data;
  • structural generalisation experiments are mostly motivated from a cognitive perspective, and are never about fairness;
  • fully generated data is often used to create shifts between the train and test data.

NB: this plot renders best on relatively wide screens.

Chord diagram

If you would like to investigate the relations between all axes, take a look at the chord diagram below. For readability, we only show connections that occurred more than 50 times. Each axis has its own colour scheme, and the node labels indicate both the axis's name and the label within that axis. If you hover over or click on a node, the diagram highlights its connections to other nodes.
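If you would like to reproduce this kind of filtering on your own annotations, the sketch below shows one way to count how often two labels from different axes occur together and to keep only the connections above a threshold. It is a minimal illustration, not the code behind the diagram: the record format, the function names `cooccurrence_counts` and `filter_connections`, and the toy annotations are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Toy annotations, invented for illustration only (NOT the review's data).
# Each record maps an axis name to the label a paper received on that axis.
annotations = [
    {"motivation": "practical", "generalisation type": "structural", "shift type": "covariate"},
    {"motivation": "practical", "generalisation type": "structural", "shift type": "covariate"},
    {"motivation": "cognitive", "generalisation type": "structural", "shift type": "covariate"},
]


def cooccurrence_counts(records):
    """Count how often each pair of (axis, label) nodes appears in the same record."""
    counts = Counter()
    for record in records:
        # Sorting gives every pair a canonical order, so (a, b) and (b, a) are merged.
        nodes = sorted(record.items())
        for node_a, node_b in combinations(nodes, 2):
            counts[(node_a, node_b)] += 1
    return counts


def filter_connections(counts, min_count):
    """Keep only connections that occurred strictly more than min_count times."""
    return {pair: n for pair, n in counts.items() if n > min_count}


counts = cooccurrence_counts(annotations)
# The diagram uses a threshold of 50; the toy data is tiny, so we use 2 here.
kept = filter_connections(counts, min_count=2)
for (node_a, node_b), n in kept.items():
    print(node_a, "<->", node_b, ":", n)
```

Because every record assigns exactly one label per axis, two nodes in a pair always belong to different axes, which matches what a chord between axes represents.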

Individual axes over time

Visualise how the distribution over an axis's values changes over time. Use the radio buttons to select the axis you would like to view and to choose how to normalise your plot.

Individual axes over tasks

Here, you can visualise how the values of an axis are distributed over different tasks. Use the radio buttons to select the taxonomy axis you would like to view, and click on tasks to remove them from the plot.

Relations between axes in a heatmap

Generate heatmaps of the relations between the different axes. Use the radio buttons to indicate which axis you would like on the x- and y-axis, and choose how to normalise your plot.
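To make the effect of normalisation concrete, the sketch below builds the table of counts behind such a heatmap and normalises it in three ways. It is only an illustration under stated assumptions: the normalisation modes (`"total"`, `"row"`, `"column"`), the function names `contingency` and `normalise`, and the toy annotations (including the placeholder label "other") are hypothetical and may not match the options offered by the interactive plot.

```python
from collections import Counter

# Toy annotations, invented for illustration only (NOT the review's data).
records = [
    {"motivation": "practical", "generalisation type": "structural"},
    {"motivation": "practical", "generalisation type": "structural"},
    {"motivation": "practical", "generalisation type": "other"},
    {"motivation": "cognitive", "generalisation type": "structural"},
]


def contingency(records, x_axis, y_axis):
    """Count records per (y label, x label) cell, i.e. per heatmap cell."""
    return Counter((r[y_axis], r[x_axis]) for r in records)


def normalise(table, mode="total"):
    """Turn counts into fractions of the whole table, of each row, or of each column."""
    if mode == "total":
        total = sum(table.values())
        return {cell: n / total for cell, n in table.items()}
    if mode not in ("row", "column"):
        raise ValueError(f"unknown normalisation mode: {mode}")
    index = 0 if mode == "row" else 1  # position of the label to group by
    sums = Counter()
    for cell, n in table.items():
        sums[cell[index]] += n
    return {cell: n / sums[cell[index]] for cell, n in table.items()}


table = contingency(records, x_axis="generalisation type", y_axis="motivation")
by_row = normalise(table, mode="row")        # each motivation sums to 1
by_column = normalise(table, mode="column")  # each generalisation type sums to 1
by_total = normalise(table, mode="total")    # the whole table sums to 1
```

Row normalisation answers "given this motivation, which generalisation types are studied?", whereas column normalisation answers the reverse question, so the same counts can look quite different depending on the choice.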

Looking for a different kind of plot? Let us know!