Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
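The selection rule described in the note can be sketched as a simple filter. This is only an illustration under stated assumptions, not the site's actual implementation: the record layout, the function name `select_papers`, and the sample entries are hypothetical. The date window is assumed to be inclusive, and papers with exactly the threshold count are assumed to be included, as the rows with 2 citations in the table suggest.

```python
from datetime import date

# Hypothetical records: (title, citation count, publication date).
# None of these entries come from the table; they only exercise the rule.
papers = [
    ("Paper A", 12, date(2021, 6, 1)),
    ("Paper B", 2, date(2023, 3, 15)),
    ("Paper C", 40, date(2019, 5, 1)),   # outside the date window
    ("Paper D", 1, date(2022, 2, 1)),    # below the threshold
]

def select_papers(records, threshold=2,
                  start=date(2020, 11, 1), end=date(2024, 11, 1),
                  limit=250):
    """Keep papers inside the window with at least `threshold` citations,
    sorted by citation count (descending) and capped at `limit` entries."""
    kept = [r for r in records
            if start <= r[2] <= end and r[1] >= threshold]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:limit]

selected = select_papers(papers)
for title, citations, _ in selected:
    print(f"{title} | {citations}")
```

With the sample records above, only the two papers that fall inside the window and meet the threshold remain, ordered by citation count as in the table.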
Article | Citations
Machine translation systems and quality assessment: a systematic review | 59
DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text | 46
A comparative evaluation and analysis of three generations of Distributional Semantic Models | 24
Current limitations in cyberbullying detection: On evaluation criteria, reproducibility, and data scarcity | 18
The ParlaMint corpora of parliamentary proceedings | 18
A large English–Thai parallel corpus from the web and machine-generated text | 16
Automatic genre identification: a survey | 16
SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news | 15
Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale | 15
Low resource language specific pre-processing and features for sentiment analysis task | 15
Machine translation in society: insights from UK users | 14
Roman Urdu toxic comment classification | 13
AI2D-RST: a multimodal corpus of 1000 primary school science diagrams | 13
The impact of preprocessing on word embedding quality: a comparative study | 10
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 9
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 8
Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool | 8
The Electronic Corpus of 17th- and 18th-century Polish Texts | 8
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese | 8
LDC-IL: The Indian repository of resources for language technology | 7
Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology | 7
Linguistic resources for paraphrase generation in Portuguese: a lexicon-grammar approach | 7
Investigating the role of swear words in abusive language detection tasks | 6
A large and evolving cognate database | 6
The robotic-surgery propositional bank | 6
Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents | 6
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 6
Commonsense based text mining on urban policy | 6
Resources for Turkish natural language processing: A critical survey | 6
SetembroBR: a social media corpus for depression and anxiety disorder prediction | 6
Towards alignment strategies in human-agent interactions based on measures of lexical repetitions | 5
Label modification and bootstrapping for zero-shot cross-lingual hate speech detection | 5
Content-free speech activity records: interviews with people with schizophrenia | 4
Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha | 4
Nonverbal communication with emojis in social media: dissociating hedonic intensity from frequency | 4
Detecting explicit lyrics: a case study in Italian music | 4
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 4
NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese | 4
Finnish parliament ASR corpus | 4
Predicting lexical complexity in English texts: the Complex 2.0 dataset | 4
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 4
Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system | 4
Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports | 4
Representing variation in a spoken corpus of an endangered dialect: the case of Torlak | 4
A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus | 4
PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects | 3
Unparalleled sarcasm: a framework of parallel deep LSTMs with cross activation functions towards detection and generation of sarcastic statements | 3
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 3
A semantics-aware approach for multilingual natural language inference | 3
DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels | 3
Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations | 3
TuLeD (Tupían lexical database): introducing a database of a South American language family | 3
Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges | 3
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 3
Semi-automation of gesture annotation by machine learning and human collaboration | 3
Register identification from the unrestricted open Web using the Corpus of Online Registers of English | 3
Annotating affective dimensions in user-generated content | 3
LexO: an open-source system for managing OntoLex-Lemon resources | 3
The WASABI song corpus and knowledge graph for music lyrics analysis | 3
Making the most of comparable corpora in Neural Machine Translation: a case study | 3
Sentence boundary detection of various forms of Tunisian Arabic | 3
LanguageCrawl: a generic tool for building language models upon Common Crawl | 2
Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary | 2
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 2
ArgRewrite V.2: an annotated argumentative revisions corpus | 2
Automatic generation of creative text in Portuguese: an overview | 2
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 2
Jira: a Central Kurdish speech recognition system, designing and building speech corpus and pronunciation lexicon | 2
Two Sepedi-English code-switched speech corpora | 2
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 2
Broad coverage emotion annotation | 2
Redundancy and coverage aware enriched dragonfly-FL single document summarization | 2
Sense representations for Portuguese: experiments with sense embeddings and deep neural language models | 2
Multi-domain adaptation for named entity recognition with multi-aspect relevance learning | 2
Text complexity of open educational resources in Portuguese: mixing written and spoken registers in a multi-task approach | 2
Towards the benchmarking of question generation: introducing the Monserrate corpus | 2
Speech acts in the Dutch COVID-19 Press Conferences | 2
Spelling errors made by people with dyslexia | 2
Manipuri–English comparable corpus for cross-lingual studies | 2
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 2
Corpora compilation for prosody-informed speech processing | 2
UHated: hate speech detection in Urdu language using transfer learning | 2
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 2
Live blog summarization | 2
The LRE Map: what does it tell us about the last decade of our field? | 2
A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals | 2