Horacio Saggion (Pompeu Fabra University)
Talk title: Natural Language Processing for Accessible Information: Simplifying Words in Context
Abstract: Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Therefore, suggesting simpler alternatives for complex words without compromising meaning would help convey the information to a broader audience. In this talk I will cover current efforts from our laboratory to address multilingual lexical simplification. I will describe a new approach we are investigating to make lexical simplification more “controllable” and report our latest findings. Also related to accessible information, I would like to take the opportunity to briefly motivate our current work in a still uncharted, challenging area for us: Sign Language Translation.
Bio: Horacio Saggion is chair in Computer Science and Artificial Intelligence at the Department of Information & Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona, Spain. He was appointed by UPF in 2010 as a Ramón y Cajal research fellow from the Spanish Ministry of Science, passing his final evaluation in 2014 and becoming associate professor in 2015 (opposition for permanent position in 2016). He was promoted to full professor in 2021. Horacio is currently director of the Large Scale Text Understanding Systems Lab of the TALN Natural Language Processing Group, where he works in several areas of Natural Language Processing (NLP): automatic text summarization, text simplification, NLP for Sign Languages, information extraction, figurative language, sentiment analysis and related topics. His work combines symbolic and machine learning techniques. Since his arrival at UPF he has obtained funding to carry out his research from different national and international organizations: Ministerio de Industria, Turismo y Comercio (Proyecto Simplext 2010-2013), Ministerio de Economía y Competitividad (Proyecto SKATER-UPF-TALN 2013-2015, TUNER-UPF 2015-2017, individual project ConMuTeS 2020-2023), European Commission (SignON 2021-2023, Dr. Inventor 2014-2016, Able-to-Include 2014-2016, MultiSensor 2013-2015), and projects in NLP in the context of the two María de Maeztu excellence programs awarded to DTIC/UPF (MDM-2015-0502 and CEX2021-001195-M). Horacio also collaborates with industry through contracts, projects and doctoral PhDs.
Danushka Bollegala (Amazon / University of Liverpool)
Talk title: Back to the Future — Time Travel with Large Language Models
Abstract: Large Language Models (LLMs) act as the backbone of modern-day NLP systems. They are often trained once, on a fixed snapshot of a massive corpus, and then fine-tuned for various downstream tasks or used with prompts without any fine-tuning. Both pre-training and fine-tuning LLMs can be expensive and time-consuming. Because of this, LLMs are often unaware of the latest information that did not exist in the training snapshot of the corpus. In this talk, I will first describe a lightweight approach to adapt an LLM from one time stamp to another using prompts (to appear in ACL 2023). I will then explain the difficulty of automatically finding good prompts (EACL 2023). Finally, I will explain unsupervised methods to detect whether a word has changed its meaning over time (to appear in ACL 2023), so that we can train LLMs on demand for those words, thereby potentially reducing retraining costs.
Bio: Danushka Bollegala is a Professor at the Department of Computer Science, University of Liverpool and a Scholar at Amazon Search. He obtained his PhD in 2009 from the University of Tokyo, where he subsequently worked as a Lecturer before moving to the UK. He has worked on different topics in NLP such as summarisation, information extraction, lexical/compositional semantics, and social bias mitigation.
Viviana Patti (University of Turin)
Talk title: Abusive Language Detection on Social Media: Are We Far from the Shallow Now?
Abstract: In recent years, abusive language, and in particular the phenomenon of online hatred against marginalized and vulnerable groups, has been increasing exponentially on social media platforms, becoming a relevant social problem that needs to be monitored. Computational linguistics techniques have been applied to monitoring and counteracting toxic speech online, with particular emphasis on the development of linguistic resources and automatic tools for analyzing, detecting, and countering various forms of abusive language, ranging from anti-immigrant discourse to misogynistic behaviors. The research field encompasses several challenges that will be discussed. The targeted nature of online hatred and the multilingual environment pose challenges related to the development of robust approaches for abusive language detection in multidomain and multilingual settings. Additionally, continuous monitoring requires evaluating the temporal robustness of models. Pragmatic aspects associated with the use of profanity must also be addressed. Moreover, abusive language is often expressed, and must be interpreted, in the context of widespread social phenomena of stereotyping and gender/ethnic discrimination, and we are interested in recognizing implicit forms of abusive language. Two key lines of reflection will be proposed. The first line addresses the challenges related to the need to apply semantic grids of finer-grained analysis, encompassing the examination of intersectional hatred expressions and the identification of underlying phenomena, such as prejudices, unintended biases, and subtle forms of abuse like microaggressions. In this context, the importance of inclusive design in corpus development will be discussed, aiming to incorporate the perspectives of marginalized groups, in accordance with the new perspectivist data manifesto.
The second line delves into the potential of leveraging computational linguistics methods to actively facilitate interventions against online hate speech. This involves the creation of positive counter-narratives, dedicated to raising awareness about toxic discourse online and empowering individuals facing discrimination to express themselves in both virtual and real-world settings.
Bio: Viviana Patti is Associate Professor of Computer Science at the University of Turin and part of the scientific board and executive committee of the Center for Logic, Language, and Cognition. Her main research interests are in the areas of NLP, Computational Linguistics and Affective Computing, and include sentiment analysis, emotion recognition and irony detection, with a focus on social media texts and the relation between language and social structure. She is applying her research in the field of hate speech monitoring, with a special interest in hate speech against migrants, populist rhetoric and automatic misogyny identification. She has been and is involved in funded projects on fighting hate speech and stereotypes against immigrants. She leads the development of Twitter corpora and models for multilingual hate speech detection and sentiment analysis exploited in shared tasks and international evaluation campaigns for different languages. She coordinated the pilot project "EVALITA4ELG: EVALITA Italian language reference resources, NLP services and tools for the ELG platform" funded by the European Language Grid H2020 project. She has (co-)authored 100+ peer-reviewed publications, and serves regularly on boards and committees in NLP/AI journals and conferences. Since 2016 she has been a member of the board of directors of the Italian Association of Computational Linguistics (AILC), and she serves as vice-president on the new board for 2022-25.
Nils Reimers (Cohere)
Talk title: Multilingual Semantic Search
Abstract: Connecting Large Language Models with embeddings and semantic search on your own data has become widely popular. But how does this work in other languages and across languages? Join me for this talk on why multilingual semantic search is amazing, how the respective models are trained, and the new use cases this unlocks.
Bio: Nils Reimers did his Ph.D. and Post-Doc at the TU Darmstadt, where he created the foundation on how to use transformer networks for semantic search. After his post-doc, he joined Hugging Face to work on self-supervised domain adaptation for semantic search. Last year, Nils joined Cohere.com as director of machine learning to work on large language models for text understanding, including search, classification and text aggregation.
Carolina Scarton (University of Sheffield)
Talk title: One size does not fit all: building NLP models for real-world applications
Abstract: Despite the recent and impressive advances in the Natural Language Processing and Machine Learning areas, most models are still trained on general / large-scale data, assuming a "one-size-fits-all" approach. However, for most real-world applications, personalised, task-oriented models are more suitable and can bring significant improvements as well as reduce unwanted biases. In this talk, I will present my and my team's work on NLP models for applications that require personalisation and / or are task oriented. I will present our methods and results for classification-based (e.g. disinformation classification) and generation-based (e.g. machine translation) tasks, discussing the challenges of researching personalised, task-oriented NLP.
Bio: Carolina Scarton is a Senior Lecturer in Natural Language Processing at the Department of Computer Science, University of Sheffield, UK. She is a member of the Natural Language Processing group and part of the GATE team. Previously, she worked as an Academic Fellow at the University of Sheffield and as a Research Associate for the WeVerify and SIMPATICO European projects. In 2017, she was awarded a PhD degree in Computer Science from the University of Sheffield, funded by the EXPERT project (a Marie Curie ITN network). She also has an MSc and a BSc degree from the University of São Paulo, Brazil. Dr Scarton is particularly interested in online misinformation detection and verification, text adaptation, machine translation, evaluation of NLP task outputs, and multilingual and multimodal NLP models. She is the Sheffield PI of two research projects in the area of disinformation analysis (EDMO Ireland, funded by the European Commission, and HORIZON EU VIGILANT, funded by Innovate UK) and acts as a Co-I for one project in disinformation analysis (HORIZON EU vera.ai, funded by Innovate UK) and one project in idiomaticity processing (funded by EPSRC).
Tom Kocmi (Microsoft)
Talk title: The Evolution of Automatic Metrics and Open Challenges in LLM era
Abstract: For years, progress in modeling has outpaced evaluation in NLP, where we relied predominantly on string-based matching metrics. In this talk, we will outline the differences among, and the benefits of, the three classes of metrics: n-gram matching (such as ChrF or BLEU), pretrained models (COMET, BLEURT), and the emerging group of black-box LLMs (GEMBA). We will primarily focus on the last group and how it may shift the approach to automatic evaluation, highlighting open questions and challenges anticipated in the new LLM era.
Bio: Tom Kocmi is a researcher at Microsoft Translator focussing on human and automatic evaluation of machine translation. He coordinates the annual WMT General MT shared task where researchers from both academia and industry compete to build the best performing MT systems.
Maria Liakata (Queen Mary University of London / Turing Institute)
Talk title: Personalised Longitudinal Natural Language Processing
Abstract: In most of the tasks and models that we have made great progress with in NLP in recent years, there isn't a notion of time. However, many of these tasks are sensitive to changes and temporality in real-world data, especially when pertaining to individuals, their behaviour and their evolution over time. I will introduce our programme of work on personalised longitudinal natural language processing. This consists in developing natural language processing methods to: (1) represent individuals over time from their language and other heterogeneous and multi-modal content; (2) capture changes in individuals' behaviour over time; (3) generate and evaluate synthetic data from individuals' content over time; (4) summarise the progress of an individual over time, incorporating information about changes. I will discuss progress and challenges thus far as well as the implications of this programme of work for downstream tasks such as mental health monitoring.
Bio: Maria Liakata is Professor in Natural Language Processing (NLP) at the School of Electronic Engineering and Computer Science, Queen Mary University of London and Honorary Professor at the Department of Computer Science, University of Warwick. She holds a UKRI/EPSRC Turing AI fellowship (2019-2025) on Creating time sensitive sensors from user-generated language and heterogeneous content. The research in this fellowship involves developing new methods for NLP and multi-modal data to allow the creation of longitudinal personalized language monitoring. She is also the PI of projects on language sensing for dementia monitoring & diagnosis, opinion summarisation and rumour verification from social media. At the Alan Turing Institute she founded and co-leads the NLP and data science for mental health special interest groups. She has published over 150 papers on topics including sentiment analysis, semantics, summarisation, rumour verification, resources and evaluation and biomedical NLP.
Georgi Karadzhov (University of Cambridge)
Talk title: DEliBots - Deliberation Enhancing Bots
Abstract: Group deliberation occurs in a variety of contexts, such as hiring panels, study groups, and scientific project meetings. What these applications have in common is that individuals in the group incorporate various deliberation strategies to communicate their ideas and ultimately to reach the best decision. This process of open dialogue creates a platform for ideas to be exchanged, debated, and evaluated, which allows for a wide array of perspectives to be presented and considered. Deliberation offers a framework that both allows for the propagation of ideas within a group and facilitates the refinement of arguments and the introduction of new ideas. In this talk, Georgi will talk about DEliBots - Deliberation Enhancing Bots. The goal of a DEliBot is to improve group collaboration by asking probing questions and facilitating good discussion and healthy dialogue dynamics. More concretely, in this talk, we will discuss the first publicly available collaborative dataset (DeliData) that provides the foundation for dialogue systems research. Then, the talk will discuss how methods for changepoint detection can be incorporated to detect a change of mind within a conversation. Finally, Georgi will present the ongoing work on intervention timing and DEliBot text generation.
Bio: Georgi Karadzhov is a final-year PhD student at the NLIP group at the University of Cambridge. He was part of the organisation of conferences such as Truth and Trust Online and the Deliberation4Good workshop. Further, he organised seminars and meetups such as Cambridge NLIP seminars and PyData Sofia. His most recent work is focused on the DEliBot project - a cross-disciplinary, multi-institution project. In his previous research, Georgi worked on projects related to online trust, fact-checking, and offensive language identification. Before starting his PhD, Georgi worked as an NLP Engineer at SiteGround Webhosting, working primarily on dialogue systems for customer support.