Language Resources and Evaluation

Papers
(The median citation count for Language Resources and Evaluation is 1. The table below lists papers whose CrossRef citation count is at or above this median (max. 250 papers). It covers publications from the past four years, i.e., from 2021-10-01 to 2025-10-01.)
Article | Citations
Spelling errors made by people with dyslexia | 67
Hope speech detection in Spanish | 41
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 41
Investigating the role of swear words in abusive language detection tasks | 28
Speech acts in the Dutch COVID-19 Press Conferences | 28
Commonsense based text mining on urban policy | 24
A survey on geocoding: algorithms and datasets for toponym resolution | 23
From LIMA to DeepLIMA: following a new path of interoperability | 22
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 19
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 19
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 18
IIT Delhi Dialogue Corpus: a quantitative analysis of a spoken corpus of Hindi | 18
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 17
Prompting encoder models for zero-shot classification: a cross-domain study in Italian | 14
Spontaneous, controlled acts of reference between friends and strangers | 14
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 14
Construction of Amharic information retrieval resources and corpora | 11
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 11
A new evaluation method: evaluation data and metrics for Chinese grammatical error correction | 10
A study on methods for revising dependency treebanks: in search of gold | 10
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 10
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype | 9
Perspectivist approaches to natural language processing: a survey | 9
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 9
Automatic readability assessment for sentences: neural, hybrid and large language models | 9
Human–machine interaction in building an English reference dataset for natural language processing tasks | 8
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 8
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 8
UHated: hate speech detection in Urdu language using transfer learning | 8
adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 8
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 8
A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance | 8
Chinese-DiMLex: a lexicon of Chinese discourse connectives | 7
An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content | 7
DoSLex: automatic generation of all domain semantically rich sentiment lexicon | 7
Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models | 7
Conversion of the Spanish WordNet databases into a Prolog-readable format | 7
Uzbek news corpus for named entity recognition | 6
Managing, storing, and sharing long-form recordings and their annotations | 6
Slovenian parliamentary corpus siParl | 6
Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain | 6
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 6
The Sanskrit Sembank | 6
Book Review: The Routledge handbook of discourse and disinformation | 6
VeLeSpa: An inflected verbal lexicon of Peninsular Spanish and a quantitative analysis of paradigmatic predictability | 5
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 5
Language resources for clinical linguistics: introduction to the special issue | 5
KurdiSent: a corpus for kurdish sentiment analysis | 5
Open source platform for Estonian speech transcription | 5
Multi-task learning for multi-dialect Arabic sentiment classification and sarcasm detection | 5
Benchmarking Hindi-to-English direct speech-to-speech translation with synthetic data | 5
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 5
The WASABI song corpus and knowledge graph for music lyrics analysis | 5
Studying word meaning evolution through incremental semantic shift detection | 5
The ParlaMint corpora of parliamentary proceedings | 5
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 4
kidsNARRATE: a versatile corpus for studying Chinese-english bilingual L2 narrative skills in preschoolers | 4
ArgRewrite V.2: an annotated argumentative revisions corpus | 4
Design and construction of Guayaquil radio speech corpus (CHARG) | 4
Correction to: Two sepedi-english code-switched speech corpora | 4
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 4
Developing and testing syllabification systems for South African Sesotho | 4
Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 4
Part of speech (POS) tagging in Roman Urdu: datasets and models | 4
Constructing a cross-document event coreference corpus for Dutch | 4
A corpus of English learners with Arabic and Hebrew backgrounds | 4
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 4
Using BERT models for breast cancer diagnosis from Turkish radiology reports | 4
The limitations of irony detection in Dutch social media | 3
DILLo: an Italian lexical database for speech-language pathologists | 3
Creation of a gold standard Dutch corpus of clinical notes for adverse drug event detection: the Dutch ADE corpus | 3
Correction to: Resources for Turkish natural language processing: A critical survey | 3
Finnish parliament ASR corpus | 3
FullStop: punctuation and segmentation prediction for Dutch with transformers | 3
OMCD: Offensive Moroccan Comments Dataset | 3
Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 3
Examining inferred author and textual correlates of harmful language annotation | 3
Benchmark of public intent recognition services | 3
The Hmong Medical Corpus: a biomedical corpus for a minority language | 3
Correction: COLLIE: a broad-coverage ontology and lexicon of verbs in English | 3
HASTIKA: hate speech and target identification in Kannada-English code-mixed text | 3
A sentiment corpus for the cryptocurrency financial domain: the CryptoLin corpus | 2
SOLD: Sinhala offensive language dataset | 2
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 2
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 2
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 2
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 2
Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish | 2
Detecting explicit lyrics: a case study in Italian music | 2
“You’ll be a nurse, my son!” Automatically assessing gender biases in autoregressive language models in French and Italian | 2
A comprehensive evaluation of semantic relation knowledge of pretrained language models and humans | 2
Predicting lexical complexity in English texts: the Complex 2.0 dataset | 2
A Chinese natural speech complex emotion dataset based on emotion vector annotation method | 2
Multi-domain adaptation for named entity recognition with multi-aspect relevance learning | 2
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 2
MulCogBench: a multi-modal cognitive benchmark dataset for evaluating Chinese and English computational language models | 2
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 2
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 2
Automatic genre identification: a survey | 2
The Najdi Arabic Corpus: a new corpus for an underrepresented Arabic dialect | 2
DiscoNaija: a discourse-annotated parallel Nigerian Pidgin-English corpus | 2
Aspect-based multimodal sentiment analysis via employing visual-to-emotional-caption translation network using visual-caption pairs | 2
Detection of political hate speech in Korean language | 2
COLLIE: a broad-coverage ontology and lexicon of verbs in English | 2
RUN-AS: a novel approach to annotate news reliability for disinformation detection | 2
Text complexity of open educational resources in Portuguese: mixing written and spoken registers in a multi-task approach | 2
Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis | 2
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 2
Automating translation checks of financial documents using large language models | 2
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines | 2
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 2
Normalized dataset for Sanskrit word segmentation and morphological parsing | 2
Data-driven weakly supervised emotion classification with consistency regularization: Mandarin Chinese as a case | 2
Beyond plain toxic: building datasets for detection of flammable topics and inappropriate statements | 2
Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text | 1
VeLeRo: an inflected verbal lexicon of standard Romanian and a quantitative analysis of morphological predictability | 1
Rei Miyata: controlled document authoring in a machine translation age | 1
Evaluation of the Brazilian Portuguese version of linguistic inquiry and word count 2015 (BP-LIWC2015) | 1
Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations | 1
The corpus of aggressive language in Polish parliamentary debates | 1
POS tagging of low-resource Pashto language: annotated corpus and BERT-based model | 1
Human–robot dialogue annotation for multi-modal common ground | 1
A new methodology for automatic creation of concept maps of Turkish texts | 1
Czech news dataset for semantic textual similarity | 1
Fake news article detection datasets for Hindi language | 1
A survey and study impact of tweet sentiment analysis via transfer learning in low resource scenarios | 1
A new corpus of geolocated ASR transcripts from Germany | 1
An aligned corpus of Spanish bibles | 1
CsFEVER and CTKFacts: acquiring Czech data for fact verification | 1
Building a specialised Hebrew textual corpus on construction, planning and architecture | 1
Detoxifying language model outputs: combining multi-agent debates and reinforcement learning for improved summarization | 1
NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese | 1
A corpus of Persian literary text | 1
Human-inspired computational models for European Portuguese: a review | 1
Historical Portuguese corpora: a survey | 1
Spoken Spanish PoS tagging: gold standard dataset | 1
The Mandarin Chinese speech database: a corpus of 18,820 auditory neutral nonsense sentences | 1
Error annotation: a review and faceted taxonomy | 1
Dataset on sentiment-based cryptocurrency-related news and tweets in English and Malay language | 1
Infectious risk events and their novelty in event-based surveillance: new definitions and annotated corpus | 1
Attention and LoRA-based multimodal emotion detection system | 1
Disfluency annotated corpora for Indian English in technical domains | 1
Multilingual prediction of semantic norms with language models: a study on English and Chinese | 1
RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese | 1
From extended chunking to dependency parsing using traditional Arabic grammar | 1
Editorial: LRE updates | 1
The robotic-surgery propositional bank | 1
A rich task-oriented dialogue corpus in Vietnamese | 1
Evaluation of end-to-end continuous spanish lipreading in different data conditions | 1
Building a relevance feedback corpus for legal information retrieval in the real-case scenario of the Brazilian Chamber of Deputies | 1
Building the VisSE Corpus of Spanish SignWriting | 1
Speech emotion recognition for the Urdu language | 1
Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | 1
Regionalized models for Spanish language variations based on Twitter | 1
Disfluency processing for cascaded speech translation involving English and Indian languages | 1
CINWA (database of terminology for cultivated plants in indigenous languages of northwestern South America): introducing a resource for research in ethnobiology, anthropology, historical linguistics, … | 1
Semi-automation of gesture annotation by machine learning and human collaboration | 1
Umplc: the first longitudinal learner corpus of Portuguese | 1
DepreSym: A Depression Symptom Annotated Corpus and the Role of Large Language Models as Assessors of Psychological Markers | 1
Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing | 1
Two sepedi-english code-switched speech corpora | 1
A flexible tool for a qualia-enriched FrameNet: the FrameNet Brasil WebTool | 1
Content-free speech activity records: interviews with people with schizophrenia | 1
Parallel Trees: a novel resource with aligned dependency and constituency syntactic representations | 1