Talk title: Rethinking Evaluation for the Future of Language Models
Abstract: Evaluation plays a central, but often underestimated, role in how large language models are developed and understood. This talk critically examines current evaluation practices, highlighting how they shape perceptions of model progress while often overlooking key challenges in robustness, multilingual performance, and real-world reliability. By reflecting on these gaps, I make the case for rethinking evaluation as a guiding force, and not just a final checkpoint, in building more capable, inclusive, and trustworthy LLMs.
Bio: Marzieh Fadaee is a staff research scientist at Cohere Labs (formerly Cohere For AI) whose work centers on multilingual language models, data-efficient learning, and robust evaluation methods. Her research aims to improve language technologies for diverse and low-resource languages, with a focus on understanding how data selection, representation, and model design influence generalization and fairness. She has published extensively on these topics and has contributed to the development of widely used multilingual benchmarks and datasets. In addition to her research, she is involved in mentoring early-career researchers and actively engages in community-driven science initiatives.
Talk title: A blueprint for adaptive and tokenizer-free foundation models
Abstract: Foundation models (FMs) process information as a sequence of internal representations; however, the length of this sequence is fixed and entirely determined by tokenization. This essentially decouples representation granularity from information content, which exacerbates the deployment costs of FMs and narrows their “horizons” over long sequences. What if, instead, we could free FMs from tokenizers by modelling bytes directly, while making them faster than current tokenizer-bound FMs? To achieve this goal, I will show how to: 1) learn tokenization end-to-end, by dynamically pooling representations in internal layers and progressively learning abstractions from raw data; 2) compress the KV cache (memory) of Transformers adaptively during generation without loss of performance; 3) predict multiple bytes per time step in an efficient yet expressive way; 4) retrofit existing tokenizer-bound FMs into byte-level FMs through cross-tokenizer distillation. By blending these ingredients, we may soon witness the emergence of new, more efficient architectures for foundation models.
Bio: Edoardo M. Ponti is an assistant professor in Natural Language Processing at the University of Edinburgh and a visiting professor at NVIDIA. Previously, he was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow at Mila Montreal. In 2021, he obtained a PhD in computational linguistics from the University of Cambridge, St John’s College. His main research foci are efficient architectures for foundation models (tokenizer-free and adaptive memory), modular deep learning, and grounded typology. His research earned him a Google Research Faculty Award and 3 best paper awards. He is a board member and co-founder of SIGTYP and a scholar of the European Lab for Learning and Intelligent Systems (ELLIS). He is also a (terrible) violinist and an aspiring practitioner of heroic viticulture.
Talk title: Semantic and perceptual knowledge in Large Language Models' embedding space: Insights, challenges and perspectives
Abstract: Text-based Large Language Models (LLMs) encode rich linguistic and world knowledge but have often been criticized for lack of grounding. In this talk, I will first present approaches that can serve to identify different types of semantic knowledge in LLMs' embedding space, much of which is implicit (not explicitly stated) in language. I will then go on to address the question of grounding, specifically exploring the implicit traces of perceptual and sensorimotor knowledge in LLMs. Drawing on insights from cognitive science and psycholinguistics about the indirect grounding of concepts, I will show that existing semantic analysis methods can also serve to detect traces of different perceptual modalities in embeddings and to achieve some sort of grounding through language. The data used in these experiments are synthetic (LLM-generated) as well as norms of sensorimotor strength initially developed to test embodied theories of cognition.
Bio: Marianna Apidianaki is a Senior Researcher in the Department of Computer and Information Science (CIS) of the School of Engineering and Applied Science at the University of Pennsylvania. She is currently on leave from the French National Research Center (CNRS) where she holds a tenured researcher position. From 2019 to 2021, she worked in the Language Technology lab at the University of Helsinki. Her research interests focus on computational linguistics and natural language processing, specifically in the area of semantics. She is interested in both theoretical and application-oriented aspects of natural language semantics and commonsense reasoning, and in cognitive aspects of language learning. Marianna Apidianaki is the President of SIGLEX (the ACL Special Interest Group on the Lexicon). She was Program Chair for the COLING 2025 and *SEM 2020 conferences, and co-chair of SemEval 2017, 2018 and 2019. She has organised several workshops, serves as Senior Area Chair for *CL/NLP conferences and as Senior Action Editor for ARR. She was one of the editors of the Special Issue of the Computational Linguistics journal on Language Learning, Representation, and Processing in Humans and Machines.
Talk title: With great power... My journey in Responsible AI
Abstract: As LLMs become part of products used daily by millions of people, it is increasingly urgent to ensure that these models are developed and operate responsibly. In this talk, I will share my journey in Responsible AI, from my early work in automated fact verification to the development of the Amazon Nova family of foundational models and my new role in the Information Commissioner's Office (ICO), the UK's data protection regulator. I will share some examples of RAI research that I and my former team have done, including a linguistically-inspired text-to-text LLM watermarking method. I will also talk about the role regulatory agencies like the ICO play in the extremely fast-paced world of post-ChatGPT AI. Finally, I am going to discuss how RAI is practiced in an industry setting and how it is influenced by and sometimes informs AI regulations.
Bio: Dr Christos Christodoulopoulos is a Principal Technology Adviser for the Information Commissioner's Office, the UK's data protection regulator, working with the AI Policy and AI Compliance teams. Before that, he was a Senior Applied Scientist at Amazon, working on Responsible AI for Alexa and the Amazon Nova family of Foundational Models. He has a PhD in Computational Linguistics from the University of Edinburgh, and he was a postdoctoral researcher at the University of Illinois. He is the Program Chair for EMNLP 2025, and was the General Chair for the 2021 Truth and Trust Online conference.
Talk title: Worth the effort? Large Language Models for claim verification, syntactic analysis and taxonomy creation
Abstract: In this talk, I describe recent experiments in applying Large Language Models (LLMs) to three quite different Natural Language Processing tasks. The first is automatic claim verification, in which the task is to verify a claim given a context and provide a justification for the decision. This is a type of reading comprehension task, and it cannot be done reliably well with only a superficial understanding of the claim and its context. The second is part-of-speech tagging of Irish language social media text. To do this task reliably well requires knowledge of contemporary Irish syntax and morphology. The third is taxonomy completion/generation. Given a seed set of concepts in a particular domain (e.g. food), the task is to place the concepts in an existing taxonomy or to generate a taxonomy from scratch. Domain expertise is required to do this task reliably well. The experiments across the three tasks use recent open-source and closed-source LLMs in various prompting and fine-tuning scenarios, and compare to older technologies that do not rely on LLMs. The results are a mixed bag - LLMs can sometimes outperform older techniques but not always, and it remains an open question how much effort is required to move from reasonable to reliably good performance with these so-called “everything machines”.
Bio: I am an experienced research leader in the field of Natural Language Processing. I have authored 100+ peer-reviewed publications on Natural Language Processing topics including language parsing and generation, interpretability, sentiment analysis, question answering, grammar checking and Irish language technology, and I have supervised thirteen PhD students to completion. I am a regular programme committee member for the main conferences in my field, and from 2016 to 2019, I served on the executive board of the Association for Computational Linguistics. I teach Natural Language Processing to students on the B.Sc. in Data Science and M.Sc in Computing in Dublin City University (DCU), and I am the Associate Dean for Teaching and Learning in the Faculty of Engineering and Computing in DCU.
Talk title: Challenges in Curating Data for Building a Multilingual LLM
Abstract: The performance of foundation language models largely depends on the data they are trained on. This applies to all the training stages, and the selection of data is especially crucial for the pre-training when the capacities of the base model are being laid down. In this talk, I will present the efforts we make at the Barcelona Supercomputing Centre to create a comprehensive dataset for our highly multilingual open-source models of the Salamandra family and our contribution to the data-related tasks of the OpenEuroLLM project. I will discuss our approaches to data gathering, processing, quality scoring and train mixture compilation, focusing on specific measures to mitigate over-reliance on high-resource languages and enhance the performance in mid- and low-resource languages.
Bio: Dr Alexander Shvets is a Senior Researcher at the Language Technologies Lab at the Barcelona Supercomputing Center (BSC). He leads the effort in the curation of data for the creation of foundational multilingual language models. He contributes to developing models and conducts research on deep learning techniques for applied tasks in text analysis. Alexander earned his PhD in Computer Science from the Russian Academy of Sciences ten years ago. Before joining BSC last year, he worked as a lead researcher at the natural language processing group at Pompeu Fabra University (UPF), in a number of large-scale international EU-funded R&D and knowledge transfer projects. He was the Virtual Conference Chair at COLING 2025. His research interests are focused on deep learning approaches for text understanding, including multilingual entity extraction, detection of multilingual lexical collocations, distributed dialogue assistance, aspect-oriented sentiment analysis, hate speech analysis, and text generation.
Talk title (interdisciplinary NLP session): Applying Corpus Linguistics and NLP in a Minoritised Language Context
Abstract: he creation of language technology resources in a minoritised language context poses interesting challenges but also presents opportunities that are not always available to developers of such resources for larger languages. In this presentation, I will demonstrate how scrutiny of the unique context of a specific minoritised language, combined with meaningful collaboration with potential user groups, can determine the design and construction of language resources. This presentation showcases some recent interdisciplinary and cross-institutional projects involving applied/corpus linguists and colleagues in NLP. It will include a short overview of CorCenCC (the National Corpus of Contemporary Welsh), a range of satellite projects including Thesawrws and FreeTxt (a bilingual toolkit that supports the analysis and visualisation of free text data), and a summary of the recently launched GDC-WDG website (an online collection of freely available digital resources designed to support the exploration, analysis, learning, and referencing of the Welsh language). Ongoing work on readability and CEFR prediction will also be profiled. The creation of these resources involved the development of important new tools and processes, including, in the case of CorCenCC, a unique user-driven corpus design in which language data was collected and validated through crowdsourcing, and an in-built pedagogic toolkit (Y Tiwtiadur), developed in consultation with representatives of all anticipated academic and community user groups. The approaches used to construct the resources mentioned in this talk provide an invaluable template for those researching other minoritised or minority languages. The specifics of how this template might inform corpus construction (and related projects) in such languages will be discussed in more depth during my presentation.
Bio: Dawn Knight is a Professor of English Language and Applied Linguistics at Cardiff University, Wales. Her research interests lie in the areas of corpus linguistics, multimodality and discourse analysis. Dawn has expertise in conceptualising, theorising and applying innovative interdisciplinary approaches/methodologies for extracting and predicting language patterns within/across social and linguistic contexts. Her pioneering work on Welsh language resource development (including CorCenCC and FreeTxt), supported by major AHRC, ESRC and Welsh Government grants, is helping to change the landscape of minoritised language research and the potential real-world applications of corpora/corpus-based enquiry.
Talk title (interdisciplinary NLP session): Interdisciplinary Research between NLP and Political Science: The Triggerpoint Project as a Use Case
Abstract: This talk takes a look at interdisciplinary research projects on the basis of a joint research project. The paper examines how certain politically charged words—such as Brexit, Rwanda, and Feminism—serve as “trigger points” in online discussions, triggering sudden spikes in engagement and emotional intensity. Using over 100 million Reddit comments, the study applies NLP models to measure reactions such as anger, hate speech, and controversiality after a trigger word appears. A causal difference-in-differences design shows these words reliably escalate debates. Theoretical framing highlights how these moments challenge social norms and contracts, distinct from dog whistles. The project links linguistic cues to affective polarisation and sets the stage for mapping triggers across cultures and platforms.
Bio: Christian Arnold's work lies at the intersection between social science, computer science and statistics. Over the past decades, digitalisation has profoundly transformed the world, presenting both opportunities and challenges. The accumulation and analysis of data have the potential to improve lives, but also pose significant threats to privacy and democratic governance. For instance, open data initiatives can benefit the population but also challenge data privacy. Similarly, social media can organise political movements but may undermine democratic societies. It requires interdisciplinary collaboration, particularly between the social sciences, computer science, and statistics, to manage such complexities and opportunities of a data-driven culture. Before joining the University of Birmingham, he was a Senior Lecturer in the School of Law and Politics at Cardiff University and a Departmental Lecturer at Oxford University. He also worked as a Data Scientist in the industry. His PhD in Political Science is from the Graduate School of the University of Mannheim.
Talk title (interdisciplinary NLP session): Multi-agency safeguarding, evaluation research and NLP
Abstract: Keeping children safe is everyone’s responsibility. Children come in contact with professionals from different organisations at different times for different reasons. Yet it is difficult for any one agency to make appropriate decisions about potential safeguarding risks as they often only hold a piece of the information. This means that agencies such as social care, police, education and health, need to share information and work together to safeguard children effectively. Information sharing, collaboration and decision-making can be challenging for a number of reasons. Serious case reviews of abused children continue to highlight missed opportunities to keep children safe. Evidence based interventions to improve multi-agency working are urgently required to improve our ability to keep children safe. This presentation will explore the potential for NLP to support and improve effective child safeguarding, and introduce SCALE a new research centre for social care and artificial intelligence at Cardiff University.
Bio: Dr Bennett is mixed-methods researcher with 10 years’ experience in the evaluation of safeguarding interventions in child health and social care. She has a particular interest in the use of advanced computing for injury interpretation and identifying signs of physical child abuse. She is keen to explore ways that NLP technology can help translate and communicate complex and specialist knowledge to help agencies in contact with children make better decisions about when and how to request social care involvement.