Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 58
Spelling errors made by people with dyslexia | 32
A survey on geocoding: algorithms and datasets for toponym resolution | 32
From LIMA to DeepLIMA: following a new path of interoperability | 26
Speech acts in the Dutch COVID-19 Press Conferences | 25
Commonsense based text mining on urban policy | 22
Investigating the role of swear words in abusive language detection tasks | 21
Hope speech detection in Spanish | 20
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 17
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 15
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 15
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 14
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 13
Spontaneous, controlled acts of reference between friends and strangers | 12
Construction of Amharic information retrieval resources and corpora | 12
A new evaluation method: evaluation data and metrics for Chinese grammatical error correction | 11
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 11
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 10
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 10
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 9
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 9
A study on methods for revising dependency treebanks: in search of gold | 9
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 9
A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance | 8
Automatic readability assessment for sentences: neural, hybrid and large language models | 8
UHated: hate speech detection in Urdu language using transfer learning | 7
Perspectivist approaches to natural language processing: a survey | 7
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype | 7
adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 7
Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models | 7
Uzbek news corpus for named entity recognition | 6
Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports | 6
Chinese-DiMLex: a lexicon of Chinese discourse connectives | 5
DoSLex: automatic generation of all domain semantically rich sentiment lexicon | 5
Conversion of the Spanish WordNet databases into a Prolog-readable format | 5
An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content | 5
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 5
Slovenian parliamentary corpus siParl | 5
Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain | 5
VeLeSpa: An inflected verbal lexicon of Peninsular Spanish and a quantitative analysis of paradigmatic predictability | 4
Manfred Stede and Jodi Schneider: Argumentation mining. Synthesis lectures on human language technologies, edited by Graeme Hirst | 4
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 4
Benchmarking Hindi-to-English direct speech-to-speech translation with synthetic data | 4
The ParlaMint corpora of parliamentary proceedings | 4
Managing, storing, and sharing long-form recordings and their annotations | 4
Open source platform for Estonian speech transcription | 4
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 4
Multi-task learning for multi-dialect Arabic sentiment classification and sarcasm detection | 4
KurdiSent: a corpus for kurdish sentiment analysis | 4
Language resources for clinical linguistics: introduction to the special issue | 4
Identifying communicative functions in discourse with content types | 4
Studying word meaning evolution through incremental semantic shift detection | 4
The WASABI song corpus and knowledge graph for music lyrics analysis | 4
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 3
Constructing a cross-document event coreference corpus for Dutch | 3
Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 3
Design and construction of Guayaquil radio speech corpus (CHARG) | 3
Correction to: Two sepedi‑english code‑switched speech corpora | 3
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 3
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 3
The limitations of irony detection in Dutch social media | 3
Developing and testing syllabification systems for South African Sesotho | 3
ArgRewrite V.2: an annotated argumentative revisions corpus | 3
Low resource language specific pre-processing and features for sentiment analysis task | 3
A corpus of English learners with Arabic and Hebrew backgrounds | 3
RUN-AS: a novel approach to annotate news reliability for disinformation detection | 2
Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 2
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 2
Correction: COLLIE: a broad-coverage ontology and lexicon of verbs in English | 2
PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects | 2
Using BERT models for breast cancer diagnosis from Turkish radiology reports | 2
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 2
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 2
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 2
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 2
DILLo: an Italian lexical database for speech-language pathologists | 2
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 2
OMCD: Offensive Moroccan Comments Dataset | 2
FullStop: punctuation and segmentation prediction for Dutch with transformers | 2
Correction to: Resources for Turkish natural language processing: A critical survey | 2
Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis | 2
Normalized dataset for Sanskrit word segmentation and morphological parsing | 2
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 2
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 2
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines | 2
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 2
The Hmong Medical Corpus: a biomedical corpus for a minority language | 2
Finnish parliament ASR corpus | 2
Benchmark of public intent recognition services | 2
Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish | 2
A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases | 2