Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 3. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article | Citations
Speech acts in the Dutch COVID-19 Press Conferences | 75
Commonsense based text mining on urban policy | 47
A survey on geocoding: algorithms and datasets for toponym resolution | 46
From LIMA to DeepLIMA: following a new path of interoperability | 35
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 25
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 25
Hope speech detection in Spanish | 24
Investigating the role of swear words in abusive language detection tasks | 24
Spelling errors made by people with dyslexia | 24
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 23
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 21
IIT Delhi Dialogue Corpus: a quantitative analysis of a spoken corpus of Hindi | 20
Prompting encoder models for zero-shot classification: a cross-domain study in Italian | 19
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 18
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 18
Construction of Amharic information retrieval resources and corpora | 13
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 13
A new evaluation method: evaluation data and metrics for Chinese grammatical error correction | 12
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 12
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 12
A study on methods for revising dependency treebanks: in search of gold | 11
Spontaneous, controlled acts of reference between friends and strangers | 11
A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance | 10
Human–machine interaction in building an English reference dataset for natural language processing tasks | 10
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 10
adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 10
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 10
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype | 9
Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models | 9
Automatic readability assessment for sentences: neural, hybrid and large language models | 9
Ma’aks: manually-curated parallel dataset for Arabic text sentiment swap | 8
Conversion of the Spanish WordNet databases into a Prolog-readable format | 8
Perspectivist approaches to natural language processing: a survey | 8
Chinese-DiMLex: a lexicon of Chinese discourse connectives | 8
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 8
UHated: hate speech detection in Urdu language using transfer learning | 8
DoSLex: automatic generation of all domain semantically rich sentiment lexicon | 8
Book Review: The Routledge handbook of discourse and disinformation | 7
An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content | 7
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 7
Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain | 7
Uzbek news corpus for named entity recognition | 7
The Sanskrit Sembank | 7
Developing and mining an underage modern Greek chat corpus: Do students show signs of bullying behavior while working on a project? | 7
Managing, storing, and sharing long-form recordings and their annotations | 7
Slovenian parliamentary corpus siParl | 7
Studying word meaning evolution through incremental semantic shift detection | 6
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 6
VeLeSpa: An inflected verbal lexicon of Peninsular Spanish and a quantitative analysis of paradigmatic predictability | 6
Language resources for clinical linguistics: introduction to the special issue | 6
Multi-task learning for multi-dialect Arabic sentiment classification and sarcasm detection | 6
Benchmarking Hindi-to-English direct speech-to-speech translation with synthetic data | 6
Open source platform for Estonian speech transcription | 5
The ParlaMint corpora of parliamentary proceedings | 5
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 5
Developing and testing syllabification systems for South African Sesotho | 5
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 5
Correction to: Two sepedi‑english code‑switched speech corpora | 5
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 5
KurdiSent: a corpus for kurdish sentiment analysis | 5
The WASABI song corpus and knowledge graph for music lyrics analysis | 5
Constructing a cross-document event coreference corpus for Dutch | 5
A corpus of English learners with Arabic and Hebrew backgrounds | 5
ArgRewrite V.2: an annotated argumentative revisions corpus | 4
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 4
The limitations of irony detection in Dutch social media | 4
DILLo: an Italian lexical database for speech-language pathologists | 4
Using BERT models for breast cancer diagnosis from Turkish radiology reports | 4
Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 4
Design and construction of Guayaquil radio speech corpus (CHARG) | 4
Finnish parliament ASR corpus | 4
Creation of a gold standard Dutch corpus of clinical notes for adverse drug event detection: the Dutch ADE corpus | 4
kidsNARRATE: a versatile corpus for studying Chinese-english bilingual L2 narrative skills in preschoolers | 4
Part of speech (POS) tagging in Roman Urdu: datasets and models | 4
FullStop: punctuation and segmentation prediction for Dutch with transformers | 4
The Hmong Medical Corpus: a biomedical corpus for a minority language | 4
Automating translation checks of financial documents using large language models | 3
Correction to: Resources for Turkish natural language processing: A critical survey | 3
Examining inferred author and textual correlates of harmful language annotation | 3
OMCD: Offensive Moroccan Comments Dataset | 3
Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish | 3
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 3
HASTIKA: hate speech and target identification in Kannada-English code-mixed text | 3
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 3
Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 3
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 3
MulCogBench: a multi-modal cognitive benchmark dataset for evaluating Chinese and English computational language models | 3
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines | 3
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 3
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 3
Correction: COLLIE: a broad-coverage ontology and lexicon of verbs in English | 3
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 3