Language Resources and Evaluation

Papers
(The median citation count of Language Resources and Evaluation is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article	Citations
Resources and benchmark corpora for hate speech detection: a systematic review	111
Machine translation systems and quality assessment: a systematic review	37
DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text	27
A multi-platform dataset for detecting cyberbullying in social media	25
Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition	17
Current limitations in cyberbullying detection: On evaluation criteria, reproducibility, and data scarcity	16
A comparative evaluation and analysis of three generations of Distributional Semantic Models	16
The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions	15
A large English–Thai parallel corpus from the web and machine-generated text	13
The ParlaMint corpora of parliamentary proceedings	13
SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news	12
Automatic genre identification: a survey	11
AI2D-RST: a multimodal corpus of 1000 primary school science diagrams	11
Low resource language specific pre-processing and features for sentiment analysis task	11
C2SI corpus: a database of speech disorder productions to assess intelligibility and quality of life in head and neck cancers	10
Roman Urdu toxic comment classification	9
Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale	9
Machine translation in society: insights from UK users	9
The Electronic Corpus of 17th- and 18th-century Polish Texts	8
Developing computational infrastructure for the CorCenCC corpus: The National Corpus of Contemporary Welsh	8
Language resources for Maghrebi Arabic dialects’ NLP: a survey	7
Mapping languages: the Corpus of Global Language Use	7
Fake opinion detection: how similar are crowdsourced datasets to real data?	7
Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents	6
ViMs: a high-quality Vietnamese dataset for abstractive multi-document summarization	6
A large and evolving cognate database	6
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks	6
Comparing web-crawled and traditional corpora	6
MEmoFC: introducing the Multilingual Emotional Football Corpus	6
The impact of preprocessing on word embedding quality: a comparative study	5
LDC-IL: The Indian repository of resources for language technology	5
Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology	5
Commonsense based text mining on urban policy	5
Improvement of sentiment analysis via re-evaluation of objective words in SenticNet for hotel reviews	5
The KAS corpus of Slovenian academic writing	5
Detecting explicit lyrics: a case study in Italian music	4
Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool	4
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese	4
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian	4
Finnish parliament ASR corpus	4
Representing variation in a spoken corpus of an endangered dialect: the case of Torlak	4
Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports	4
Making the most of comparable corpora in Neural Machine Translation: a case study	3
PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects	3
Arabic real time entity resolution using inverted indexing	3
A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus	3
DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels	3
Register identification from the unrestricted open Web using the Corpus of Online Registers of English	3
Annotating affective dimensions in user-generated content	3
LexO: an open-source system for managing OntoLex-Lemon resources	3
LanguageCrawl: a generic tool for building language models upon common Crawl	3
Sentence boundary detection of various forms of Tunisian Arabic	3
Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha	3
TuLeD (Tupían lexical database): introducing a database of a South American language family	3
SetembroBR: a social media corpus for depression and anxiety disorder prediction	3
Nonverbal communication with emojis in social media: dissociating hedonic intensity from frequency	2
Towards the benchmarking of question generation: introducing the Monserrate corpus	2
Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach	2
Investigating the role of swear words in abusive language detection tasks	2
A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals	2
Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system	2
Broad coverage emotion annotation	2
Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations	2
Sense representations for Portuguese: experiments with sense embeddings and deep neural language models	2
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre	2
Live blog summarization	2
Development and evaluation of an Urdu treebank (CLE-UTB) and a statistical parser	2
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection	2
The robotic-surgery propositional bank	2
Writer’s uncertainty identification in scientific biomedical articles: a tool for automatic if-clause tagging	2
Semi-automation of gesture annotation by machine learning and human collaboration	2
Redundancy and coverage aware enriched dragonfly-FL single document summarization	2
Corpora compilation for prosody-informed speech processing	2
Unparalleled sarcasm: a framework of parallel deep LSTMs with cross activation functions towards detection and generation of sarcastic statements	2
ArgRewrite V.2: an annotated argumentative revisions corpus	2
Predicting lexical complexity in English texts: the Complex 2.0 dataset	2
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus	2
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation	2
A Spanish dataset for reproducible benchmarked offline handwriting recognition	2
Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary	2
Semantics-aware typographical choices via affective associations	2
Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges	2
Towards alignment strategies in human-agent interactions based on measures of lexical repetitions	2
Text complexity of open educational resources in Portuguese: mixing written and spoken registers in a multi-task approach	1
Jira: a Central Kurdish speech recognition system, designing and building speech corpus and pronunciation lexicon	1
DiaBLa: a corpus of bilingual spontaneous written dialogues for machine translation	1
Content-free speech activity records: interviews with people with schizophrenia	1
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian	1
FullStop: punctuation and segmentation prediction for Dutch with transformers	1
A semantics-aware approach for multilingual natural language inference	1
NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese	1
Correction to: The LRE Map: what does it tell us about the last decade of our field?	1
Linguistic annotation of Byzantine book epigrams	1
Speech emotion recognition for the Urdu language	1
An eye-tracking-with-EEG coregistration corpus of narrative sentences	1
Understanding conversational interaction in multiparty conversations: the EVA Corpus	1
Universal Dependencies for Mandarin Chinese	1
Spelling errors made by people with dyslexia	1
JWSAN: Japanese word similarity and association norm	1
Manipuri–English comparable corpus for cross-lingual studies	1
The WASABI song corpus and knowledge graph for music lyrics analysis	1
OMCD: Offensive Moroccan Comments Dataset	1
EventDNA: a dataset for Dutch news event extraction as a basis for news diversification	1
Resources for Turkish natural language processing: A critical survey	1
Evaluating cross-lingual textual similarity on dictionary alignment problem	1
adaptNMT: an open-source, language-agnostic development environment for neural machine translation	1
ChoCo: a multimodal corpus of the Choctaw language	1
Multi-domain adaptation for named entity recognition with multi-aspect relevance learning	1
The LRE Map: what does it tell us about the last decade of our field?	1
Constructing a cross-document event coreference corpus for Dutch	1
Speech acts in the Dutch COVID-19 Press Conferences	1
RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese	1
A benchmark dataset and evaluation methodology for Chinese zero pronoun translation	1
TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus	1
Develop corpora and methods for cross-lingual text reuse detection for English Urdu language pair at lexical, syntactical, and phrasal levels	1
Benchmark of public intent recognition services	1
Identifying communicative functions in discourse with content types	1
A rich task-oriented dialogue corpus in Vietnamese	1
Rant or rave: variation over time in the language of online reviews	1
The B-Subtle framework: tailoring subtitles to your needs	1
Determinants of grader agreement: an analysis of multiple short answer corpora	1