DiBiMT is the first fully manually curated benchmark for understanding and measuring Disambiguation Biases in Neural Machine Translation. Recipient of the Best Resource Paper award at ACL 2022.
ExtEnD reformulates Entity Disambiguation, the task of linking a mention in context with its most suitable entity in a reference knowledge base, as a text extraction problem.
Probing LMs for Predicate Argument Structures
This study investigates the ability of current pretrained language models to capture predicate-argument structures, providing insights into how we can take advantage of such knowledge to improve Semantic Role Labeling systems.
Nibbling at the hard core of WSD
An in-depth study on modern approaches and evaluations for Word Sense Disambiguation. As a result, we outline what is currently missing and put forward a novel benchmark to measure future progress in WSD.
SRL4E is a unified evaluation framework focused on Semantic Role Labeling for Emotions, which unifies several datasets tagged with emotions and their semantic roles by using a common labeling scheme.
BMR is a new language-independent formalism that abstracts away from language-specific constraints thanks to two multilingual semantic resources, BabelNet and VerbAtlas. To put our formalism into practice, we also created BMR 1.0, the first dataset labeled according to BMR.
Visual Definition Modeling
Visual Definition Modeling is a multimodal task, where, given a context represented by an image and an object (either a visual patch of the image or a word that represents the concept shown by the image), a model has to generate a textual definition.
The task aims at investigating whether modern multimodal architectures have a deep comprehension of words and objects in a visual context.
STEPS is the first sequence-to-sequence approach tackling the task of event process typing, which aims at understanding the overall goal of a protagonist in terms of an action and an object, given a sequence of events.
BabelNet Meaning Representations
BabelNet Meaning Representation is a cross-lingual Semantic Parsing formalism that provides a unified text representation across languages. BMR aims to be easy to understand for humans and to process for machines but also abstracts away from individual languages thanks to BabelNet synsets and VerbAtlas frames.
GeneSis is the first generative approach to English lexical substitution. With this novel approach, we reach state-of-the-art results; moreover, we can effortlessly generate silver data for the task. We assess the quality of the generated resources both qualitatively and quantitatively, showing that the released datasets can help supervised models improve.
Currently available systems for Entity Linking often require pretraining on massive amounts of data in order to achieve state-of-the-art results. In this paper, we address this issue and present several ways to exploit Named Entity Recognition to narrow the gap between Entity Linking systems trained on large and small datasets.
Sense-enhanced Information Retrieval (SIR) brings Word Sense Disambiguation and Information Retrieval closer and provides additional semantic information for the query via sense definitions.
This semantic information leads to improvements in multiple languages over a baseline that does not access semantics.
Exemplification Modeling: Can You Give Me an Example, Please?
Starting from (word, definition) pairs, we present a neural architecture capable of automatically generating usage examples for the word according to the requested semantics. This makes it possible to create high-quality sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Using the generated data as a training corpus for Word Sense Disambiguation makes it possible to outperform the current state of the art.
The first automatically-built, large-scale resource for lexical substitution. Through an automated approach, we are finally able to extract substitutes for words in context, building a large-scale dataset that allows simple models to be finetuned on lexical substitution, achieving results that compete with complex state-of-the-art models.
SGL: Speaking the Graph Languages of Semantic Parsing via Multilingual Translation
A state-of-the-art approach to cross-framework and cross-lingual semantic parsing, where we frame the task as multilingual NMT. This improves overall performance thanks to transfer learning and also enables the use of a single shared model.
A cross-lingual large-scale evaluation benchmark for the WSD task featuring sense-annotated development and test sets in 18 languages from six different linguistic families, together with language-specific silver training data.
A neural seq2seq model which contextualizes a target expression in a sentence by generating an ad hoc definition. The work is a unified approach to computational lexical-semantic tasks, encompassing state-of-the-art Word Sense Disambiguation, Definition Modeling and Word-in-Context.
A Survey on Multilingual Sense-Annotated Corpora for Word Sense Disambiguation
A survey outlining the main challenges in the field of multilingual Word Sense Disambiguation and highlighting the most important efforts in mitigating the knowledge-acquisition bottleneck problem.
A multilingual Word Sense Disambiguation system powered by SyntagNet. Designed to be fast, reliable, and easily accessible, SyntagRank allows the automatic labeling of concepts and named entities within the input sentence by exploiting the syntagmatic relations between them.
A manually-curated large-scale lexical-semantic combination database which associates pairs of concepts with pairs of co-occurring words, hence capturing sense distinctions evoked by syntagmatic relations. The database currently covers 78,000 noun-verb and noun-noun lexical combinations, with 88,019 semantic combinations linking 20,626 WordNet 3.0 unique synsets with a relation edge.
Sense Distribution Learning: EnDI and DaD
Two knowledge-based approaches for learning sense distributions from raw text data. Both approaches proved to attain state-of-the-art results in predicting the Most Frequent Sense of a word and to effectively scale to different languages.
A multilingual sense-annotated resource, automatically built via the joint disambiguation of the Europarl parallel corpus in 21 languages, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities, drawn from the multilingual sense inventory of BabelNet.
AMuSE-WSD provides an easy way to disambiguate text in 40 languages thanks to its state-of-the-art multilingual neural model and its intuitive API. Take advantage of AMuSE-WSD to integrate sense knowledge into multilingual downstream applications!
UniteD-SRL is a new benchmark for multilingual and cross-lingual Semantic Role Labeling. Unlike previous efforts, UniteD-SRL provides parallel gold-standard development and test sets annotated with a single inventory, VerbAtlas. UniteD-SRL is available in 4 languages: English, Chinese, French and Spanish.
SPRING Online Services provide a Web interface and RESTful APIs for our state-of-the-art AMR parsing and generation system, SPRING (Symmetric PaRsIng aNd Generation).
The Web interface has been developed to be easily used by the Natural Language Processing community, as well as by the general public.
It provides, among other things, a highly interactive visualization platform and a feedback mechanism to obtain user suggestions for further improvements of the system’s output.
MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation
A sense projection approach for multilingual WSD. Based upon a novel neural model for word alignment, MultiMirror automatically generates sense-annotated datasets in multiple languages that lead a simple mBERT-powered classifier to surpass the previous state of the art on standard benchmarks in Multilingual WSD.
Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling
A state-of-the-art approach for end-to-end Semantic Role Labeling based on joint generation of senses and roles that rivals the long-standing best-performing sequence labeling approaches.
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources
A state-of-the-art approach to exploit heterogeneous data to perform cross-lingual Semantic Role Labeling over 100 languages with 6 different inventories.
Recipient of the NAACL 2021 outstanding paper award.
A simple, seq2seq symmetric Text-to-AMR parsing / AMR-to-Text generation approach which exploits a pre-trained encoder-decoder and achieves state-of-the-art performance in both tasks.
A cross-lingual AMR parser that exploits the existing training data in English to transfer semantic representations across languages. The achieved results shed light on the applicability of AMR as an interlingua and set the state of the art in Chinese, German, Italian and Spanish cross-lingual AMR parsing. Furthermore, a detailed qualitative analysis shows that the proposed parser can overcome common translation divergences among languages.
A methodology to create effective multimodal sense embeddings starting from BabelPic. We prove such embeddings to improve the performance of a Word Sense Disambiguation architecture over a strong unimodal baseline.
CluBERT is a multilingual approach to inducing the distributions of word senses from a corpus of raw sentences by clustering words' occurrences according to their contextual representation.
A novel coarse-grained sense inventory of 45 labels shared across lemmas and parts-of-speech. CSI labels are highly descriptive, allowing humans to easily annotate data. Moreover, when used as a sense inventory for WSD, CSI leads a supervised model to achieve strong performance without making the disambiguation task trivial.
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
A model which exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings.
A large-scale high-quality corpus of disambiguated definitions in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory.
InVeRo-XL is the first prepackaged end-to-end system for cross-lingual Semantic Role Labeling. You can use InVeRo-XL to get predicate sense and semantic role annotations in more than 40 languages and 7 predicate-argument structure inventories!
Integrating Personalized PageRank into Neural Word Sense Disambiguation
We improve EWISER (Bevilacqua and Navigli, 2020) by incorporating an online neural approximated PageRank. Our method exploits the global graph structure while keeping space requirements linear in the number of edges. We obtain strong improvements, matching the current state of the art.
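As a rough illustration of the underlying idea, and not of the neural approximation described above, Personalized PageRank can be computed by power iteration directly over an edge list, so that memory stays linear in the number of edges. The function name `personalized_pagerank`, the toy graph, the sense-like node labels and the damping value are all assumptions made for this sketch.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.85, iterations=50):
    """Power-iteration Personalized PageRank over a directed edge list.

    edges: list of (source, target) pairs.
    seeds: distinct nodes that the random walk restarts from.
    Only the edge list and one score per node are kept in memory.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out_degree = defaultdict(int)
    for source, _ in edges:
        out_degree[source] += 1

    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        spread = {n: 0.0 for n in nodes}
        for source, target in edges:
            spread[target] += scores[source] / out_degree[source]
        # Mass held by nodes without outgoing edges returns to the seeds.
        dangling = sum(scores[n] for n in nodes if out_degree[n] == 0)
        scores = {
            n: alpha * (spread[n] + dangling * restart[n])
            + (1 - alpha) * restart[n]
            for n in nodes
        }
    return scores


# Hypothetical toy graph with two disconnected components.
edges = [
    ("bank.n.01", "money.n.01"),
    ("money.n.01", "bank.n.01"),
    ("bank.n.02", "river.n.01"),
    ("river.n.01", "bank.n.02"),
]
scores = personalized_pagerank(edges, seeds=["money.n.01"])
```

In this toy example all of the score mass stays in the component that contains the seed, so the node connected to the seed outranks the node in the other component.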
Ten Years of BabelNet: A Survey
BabelNet is now ten years old. In this timeframe it has been functioning as a repository of knowledge in hundreds of different languages. In this survey we document several applications enabled by BabelNet as well as discuss the most fruitful future development directions for the NLP and AI communities.
ESC: Redesigning WSD with Extractive Sense Comprehension
A redesigned approach to Word Sense Disambiguation through Extractive Reading Comprehension. Our system, ESCHER, achieves unprecedented performance on a number of different benchmarks and settings.
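To make the extractive framing concrete, the sketch below builds one training instance: the context with the target word marked, followed by the candidate definitions, plus the character span of the correct definition that a span-extraction model would be asked to predict. The `<t>` markers, the separator, the function name and the two glosses are illustrative assumptions, not the input format used by ESCHER.

```python
def build_extractive_example(context, target, definitions, gold_index):
    """Return (text, start, end) for an extractive WSD instance.

    text: marked context followed by all candidate definitions.
    start, end: character offsets of the gold definition inside text.
    """
    # Mark the first occurrence of the target word (hypothetical markers).
    text = context.replace(target, f"<t> {target} </t>", 1)
    spans = []
    for definition in definitions:
        text += " "  # plain space used as separator in this sketch
        start = len(text)
        text += definition
        spans.append((start, len(text)))
    start, end = spans[gold_index]
    return text, start, end


# Invented glosses for two senses of "bank".
definitions = [
    "a financial institution that accepts deposits",
    "sloping land beside a body of water",
]
text, start, end = build_extractive_example(
    "He sat on the bank of the river.", "bank", definitions, gold_index=1
)
```

Here `text[start:end]` recovers the gold definition, which is the span the model is trained to extract.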
Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach
A fully language-agnostic SRL model that does away with morphological and syntactic features to achieve robustness across languages. Our approach outperforms the current state of the art in 6 languages, especially when training data is scarce.
A semi-supervised approach for producing contextualized multilingual sense representations for all the concepts in a language vocabulary. ARES’ embeddings achieve state-of-the-art results on the English and multilingual Word Sense Disambiguation task, and competitive results in the Word-in-Context task.
MuLaN (Multilingual Label propagatioN) is a label propagation technique tailored to Word Sense Disambiguation and capable of automatically producing sense-tagged training datasets in multiple languages, jointly leveraging contextualized word embeddings and the multilingual information enclosed in knowledge bases.
A neural supervised Word Sense Disambiguation system that is able to incorporate both synset embeddings and the WordNet graph. State-of-the-art results, going beyond the 80% mark for the first time, on English, French, German, Italian and Spanish benchmarks!
A platform created with the aim of making Semantic Role Labeling more accessible to a wider audience: with InVeRo, users can easily annotate sentences with intelligible verbs and roles.
A language-independent method for automatically producing multilingual sense-annotated datasets on a large scale by leveraging Wikipedia's inner structure.
A Transformer-based architecture for contextualized embeddings which makes use of a co-attentive layer to produce more deeply bidirectional representations that are better suited to the WSD task. As a result, a WSD system trained with QBERT beats the state of the art.
Neural Sequence Learning Models for Word Sense Disambiguation
An in-depth study on end-to-end neural architectures tailored to the WSD task, from bidirectional Long Short-Term Memory to encoder-decoder models.
CONtinuous SEnse Comprehension (ConSeC) is a novel approach to Word Sense Disambiguation: leveraging a re-framing of this task as a text extraction problem, we introduce a feedback loop strategy that allows the disambiguation of a target word to be conditioned not only on its context but also on the explicit senses assigned to nearby words. Using this novel approach, ConSeC sets a new state of the art both in English and Multilingual WSD!
WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER
We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER in multiple languages.
Recent Trends in Word Sense Disambiguation: A Survey
An up-to-date survey of modern Word Sense Disambiguation, covering training and test data, as well as automatic classification systems with a focus on the recent trend of hybridization between supervised and knowledge-based algorithms.
Framing Word Sense Disambiguation as a Multi-Label Problem for Model-Agnostic Knowledge Integration
A simple approach to take into account multiple sense annotations for a target word in context. Framing Word Sense Disambiguation as a multi-label classification problem also provides an effective way to seamlessly integrate relational knowledge into a model.
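One common way to realize a multi-label framing, shown here as a minimal sketch rather than as the paper's exact training objective, is to score each candidate sense with an independent sigmoid and train against a multi-hot target using binary cross-entropy. The sense labels, logit values and function names below are made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_bce(logits, gold_senses):
    """Mean binary cross-entropy over the candidate senses.

    logits: dict mapping each candidate sense to a raw model score.
    gold_senses: set of ALL senses annotated as correct for the target.
    """
    total = 0.0
    for sense, logit in logits.items():
        p = sigmoid(logit)
        target = 1.0 if sense in gold_senses else 0.0
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / len(logits)


def predict(logits, threshold=0.5):
    """Return every sense whose probability reaches the threshold."""
    return {s for s, logit in logits.items() if sigmoid(logit) >= threshold}


# Hypothetical scores for three candidate senses of one target word.
logits = {"sense_a": 2.0, "sense_b": 1.5, "sense_c": -3.0}
loss_multi = multi_label_bce(logits, {"sense_a", "sense_b"})
loss_single = multi_label_bce(logits, {"sense_a"})
predicted = predict(logits)
```

With these scores the model is penalized less when both annotated senses count as correct than when only one of them does, which is the behavior a single-label objective cannot express.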
Language-independent vector representations of concepts which place multilinguality at their core while retaining explicit relationships between concepts. Conception achieves state-of-the-art performance in multilingual and cross-lingual word similarity and English Word Sense Disambiguation.
A large multilingual benchmark for the Word in Context task, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability. XL-WiC opens room for evaluating the lexical-semantic capabilities of neural models on different scenarios such as zero-shot and cross-lingual transfer.
The first multimodal dataset with a focus on non-concrete nominal and verbal concepts which is also linked to WordNet and BabelNet. BabelPic is enhanced with a methodology for the automatic extension of its coverage to any BabelNet synsets.
A novel large-scale manually-crafted semantic resource for wide-coverage, intelligible and scalable Semantic Role Labeling. Its goal is to manually cluster WordNet synsets that share similar semantics into a set of semantically-coherent frames.
A knowledge-based approach for producing sense embeddings in multiple languages that lie in a space comparable with that of BERT contextualized word representations.
LSTMEmbed: Learning Word and Sense Representations from a Large Semantically Annotated Corpus with Long Short-Term Memories
A study on the capabilities of bidirectional LSTM models to learn representations of word senses from semantically annotated corpora.
A knowledge-based approach for producing large amounts of sense-annotated corpora in virtually more than 200 languages. Train-O-Matic paved the way to supervised Word Sense Disambiguation in languages other than English where manually-annotated data are not available.