Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 3. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-08-01 to 2025-08-01.)
| Article | Citations |
| --- | --- |
| Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 61 |
| Spelling errors made by people with dyslexia | 38 |
| Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 37 |
| Commonsense based text mining on urban policy | 28 |
| From LIMA to DeepLIMA: following a new path of interoperability | 26 |
| Speech acts in the Dutch COVID-19 Press Conferences | 23 |
| A survey on geocoding: algorithms and datasets for toponym resolution | 23 |
| Hope speech detection in Spanish | 19 |
| Investigating the role of swear words in abusive language detection tasks | 18 |
| AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 17 |
| The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 16 |
| The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 15 |
| Brazilian Portuguese corpora for teaching and translation: the CoMET project | 14 |
| Prompting encoder models for zero-shot classification: a cross-domain study in Italian | 14 |
| Spontaneous, controlled acts of reference between friends and strangers | 13 |
| A new evaluation method: evaluation data and metrics for Chinese grammatical error correction | 11 |
| Understanding conversational interaction in multiparty conversations: the EVA Corpus | 11 |
| Construction of Amharic information retrieval resources and corpora | 11 |
| Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 10 |
| Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 10 |
| A study on methods for revising dependency treebanks: in search of gold | 10 |
| Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 9 |
| Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype | 8 |
| LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 8 |
| adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 8 |
| CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 8 |
| Human–machine interaction in building an English reference dataset for natural language processing tasks | 8 |
| UHated: hate speech detection in Urdu language using transfer learning | 7 |
| Automatic readability assessment for sentences: neural, hybrid and large language models | 7 |
| A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance | 7 |
| Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models | 7 |
| Perspectivist approaches to natural language processing: a survey | 7 |
| TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 6 |
| The Sanskrit Sembank | 6 |
| Chinese-DiMLex: a lexicon of Chinese discourse connectives | 6 |
| An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content | 6 |
| Slovenian parliamentary corpus siParl | 6 |
| Conversion of the Spanish WordNet databases into a Prolog-readable format | 6 |
| Uzbek news corpus for named entity recognition | 6 |
| DoSLex: automatic generation of all domain semantically rich sentiment lexicon | 6 |
| Language resources for clinical linguistics: introduction to the special issue | 5 |
| Managing, storing, and sharing long-form recordings and their annotations | 5 |
| Open source platform for Estonian speech transcription | 5 |
| The WASABI song corpus and knowledge graph for music lyrics analysis | 5 |
| Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain | 5 |
| Benchmarking Hindi-to-English direct speech-to-speech translation with synthetic data | 5 |
| Identifying communicative functions in discourse with content types | 5 |
| VeLeSpa: An inflected verbal lexicon of Peninsular Spanish and a quantitative analysis of paradigmatic predictability | 5 |
| Studying word meaning evolution through incremental semantic shift detection | 4 |
| Correction to: Two sepedi‑english code‑switched speech corpora | 4 |
| Developing and testing syllabification systems for South African Sesotho | 4 |
| A corpus of English learners with Arabic and Hebrew backgrounds | 4 |
| Multi-task learning for multi-dialect Arabic sentiment classification and sarcasm detection | 4 |
| PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 4 |
| Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 4 |
| Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 4 |
| kidsNARRATE: a versatile corpus for studying Chinese-english bilingual L2 narrative skills in preschoolers | 4 |
| Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 4 |
| The ParlaMint corpora of parliamentary proceedings | 4 |
| KurdiSent: a corpus for kurdish sentiment analysis | 4 |
| ArgRewrite V.2: an annotated argumentative revisions corpus | 4 |
| Constructing a cross-document event coreference corpus for Dutch | 4 |
| Design and construction of Guayaquil radio speech corpus (CHARG) | 3 |
| Using BERT models for breast cancer diagnosis from Turkish radiology reports | 3 |
| DILLo: an Italian lexical database for speech-language pathologists | 3 |
| Correction: COLLIE: a broad-coverage ontology and lexicon of verbs in English | 3 |
| Benchmark of public intent recognition services | 3 |
| Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 3 |
| Creation of a gold standard Dutch corpus of clinical notes for adverse drug event detection: the Dutch ADE corpus | 3 |
| FullStop: punctuation and segmentation prediction for Dutch with transformers | 3 |
| OMCD: Offensive Moroccan Comments Dataset | 3 |
| Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 3 |
| Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 3 |
| The limitations of irony detection in Dutch social media | 3 |
| Part of speech (POS) tagging in Roman Urdu: datasets and models | 3 |
| The Hmong Medical Corpus: a biomedical corpus for a minority language | 3 |
| Correction to: Resources for Turkish natural language processing: A critical survey | 3 |
| Finnish parliament ASR corpus | 3 |