# GenBench: The first workshop on generalisation (benchmarking) in NLP

## Workshop description

The ability to generalise well is often mentioned as one of the primary desiderata for models of natural language processing. It is crucial to ensure that models behave robustly, reliably and fairly when making predictions about data that is different from the data that they were trained on. Generalisation is also important when NLP models are considered from a cognitive perspective, as models of human language. Yet, there are still many open questions related to what it means for an NLP model to generalise well, and how generalisation should be evaluated. The first GenBench workshop aims to serve as a cornerstone to catalyse research on generalisation in the NLP community. In particular, the workshop aims to:

- Bring together different expert communities to discuss challenging questions relating to generalisation in NLP;
- Crowd-source a collaborative generalisation benchmark, hosted on a platform for democratic state-of-the-art (SOTA) generalisation testing in NLP.

The first GenBench workshop on generalisation (benchmarking) in NLP will be co-located with EMNLP 2023.

## Submission types

We call for two types of submissions: regular workshop submissions and collaborative benchmarking task submissions. The latter will consist of a data/task artefact and a companion paper motivating and evaluating the submission. In both cases, we accept archival papers and extended abstracts.

### Regular workshop submissions

Regular workshop submissions present papers on the topic of generalisation (see examples listed below), but are not intended to be included on the GenBench evaluation platform. Regular workshop papers may be submitted as an archival paper, when they report on completed, original and unpublished research; or as a shorter extended abstract. More details on this category can be found below.

Topics of interest include, but are not limited to:

- Opinion or position papers about generalisation and how it should be evaluated;
- Analyses of how existing or new models generalise;
- Empirical studies that propose new paradigms to evaluate generalisation;
- Meta-analyses that compare results from different generalisation studies;
- Meta-analyses that study how different types of generalisation are related;
- Papers that discuss how generalisation of LLMs can be evaluated without access to training data;
- Papers that discuss why generalisation is (not) important in the era of LLMs;
- Studies on the relationship between generalisation and fairness or robustness.

If you are unsure whether a specific topic is well-suited for submission, feel free to reach out to the organisers of the workshop at genbench@googlegroups.com.

### Collaborative Benchmarking Task submissions

Collaborative benchmarking task submissions consist of a data/task artefact and a paper describing and motivating the submission and showcasing it on a select number of models. We accept submissions that introduce new datasets, resplits of existing datasets along particular dimensions, or in-context learning tasks, with the goal of measuring generalisation of NLP models. We especially encourage submissions that focus on:

- Generalisation in the context of fairness and inclusivity
- Multilingual generalisation
- Generalisation in LLMs, where we have no control over the training data

Each submission should contain information about the data (URIs, format, preprocessing), model preparation (finetuning loss, ICL prompt templates), and evaluation metrics. These will be defined either in a configuration file or in code. More details about the collaborative benchmark submissions and example submissions can be found on our website: genbench.org/cbt.

Participants proposing previously unpublished datasets or splits may choose to submit an archival paper or an extended abstract.
Generalisation evaluation datasets that have already been published elsewhere (or will be published at EMNLP 2023) can also be submitted to the platform, but only through an extended abstract that cites the original publication. We allow dual submissions with EMNLP; for more information, see below. If you are in doubt whether a particular type of dataset is suitable for submission, please consult the information page on our website, or reach out to the organisers of the workshop at genbench@googlegroups.com.

## Archival vs extended abstract

Archival papers are up to 8 pages excluding references and report on completed, original and unpublished research. They follow the requirements of regular EMNLP 2023 submissions. Accepted papers will be published in the workshop proceedings and are expected to be presented at the workshop. The papers will undergo double-blind peer review and should thus be anonymised.

Extended abstracts can be up to 2 pages excluding references, and may report on work in progress or be cross-submissions of work that has already appeared in another venue. Abstract titles will be posted on the workshop website, but will not be included in the proceedings.

## Submission instructions

For both archival papers and extended abstracts, we refer to the EMNLP 2023 website for paper templates. Additional requirements for both regular workshop papers and collaborative benchmarking task submissions can be found on our website.

All submissions can be submitted through OpenReview: https://openreview.net/group?id=GenBench.org/2023/Workshop

We also accept regular workshop submissions (papers of category 1) through the ACL Rolling Review system. Authors who have their ARR reviews ready may submit their papers and reviews for consideration to the workshop up to two weeks before our notification deadline.

## Important dates

- August 1, 2023 – Sample data submission deadline
- September 1, 2023 – Paper submission deadline
- September 15, 2023 – ARR submission deadline
- October 6, 2023 – Notification deadline
- October 18, 2023 – Camera-ready deadline
- December 6, 2023 – Workshop

Note: all deadlines are 11:59 PM UTC-12:00.

## Dual submissions

We allow dual submissions with EMNLP, and we encourage relevant papers that were dual-submitted and accepted at EMNLP to redirect to a non-archival extended abstract submission. We furthermore welcome submissions of extended abstracts that describe work already presented at an earlier venue, both in the collaborative benchmarking and in the regular submission tracks.

## Preprints

We do not have an anonymity deadline. Preprints are allowed, both before and after the submission deadline.

## Contact

- Email address: genbench@googlegroups.com
- Website: genbench.org/workshop