Language Resources and Evaluation

Papers
(The median citation count of Language Resources and Evaluation is 0. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
Machine translation systems and quality assessment: a systematic review | 59
DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text | 46
A comparative evaluation and analysis of three generations of Distributional Semantic Models | 24
Current limitations in cyberbullying detection: On evaluation criteria, reproducibility, and data scarcity | 18
The ParlaMint corpora of parliamentary proceedings | 18
Automatic genre identification: a survey | 16
A large English–Thai parallel corpus from the web and machine-generated text | 16
Low resource language specific pre-processing and features for sentiment analysis task | 15
SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news | 15
Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale | 15
Machine translation in society: insights from UK users | 14
AI2D-RST: a multimodal corpus of 1000 primary school science diagrams | 13
Roman Urdu toxic comment classification | 13
The impact of preprocessing on word embedding quality: a comparative study | 10
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 9
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese | 8
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 8
Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool | 8
The Electronic Corpus of 17th- and 18th-century Polish Texts | 8
Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach | 7
LDC-IL: The Indian repository of resources for language technology | 7
Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology | 7
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 6
Commonsense based text mining on urban policy | 6
Resources for Turkish natural language processing: A critical survey | 6
SetembroBR: a social media corpus for depression and anxiety disorder prediction | 6
Investigating the role of swear words in abusive language detection tasks | 6
A large and evolving cognate database | 6
The robotic-surgery propositional bank | 6
Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents | 6
Label modification and bootstrapping for zero-shot cross-lingual hate speech detection | 5
Towards alignment strategies in human-agent interactions based on measures of lexical repetitions | 5
Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system | 4
Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports | 4
Representing variation in a spoken corpus of an endangered dialect: the case of Torlak | 4
A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus | 4
Content-free speech activity records: interviews with people with schizophrenia | 4
Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha | 4
Nonverbal communication with emojis in social media: dissociating hedonic intensity from frequency | 4
Detecting explicit lyrics: a case study in Italian music | 4
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 4
NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese | 4
Finnish parliament ASR corpus | 4
Predicting lexical complexity in English texts: the Complex 2.0 dataset | 4
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 4
Semi-automation of gesture annotation by machine learning and human collaboration | 3
Register identification from the unrestricted open Web using the Corpus of Online Registers of English | 3
Annotating affective dimensions in user-generated content | 3
LexO: an open-source system for managing OntoLex-Lemon resources | 3
The WASABI song corpus and knowledge graph for music lyrics analysis | 3
Making the most of comparable corpora in Neural Machine Translation: a case study | 3
Sentence boundary detection of various forms of Tunisian Arabic | 3
PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects | 3
Unparalleled sarcasm: a framework of parallel deep LSTMs with cross activation functions towards detection and generation of sarcastic statements | 3
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 3
A semantics-aware approach for multilingual natural language inference | 3
DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels | 3
Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations | 3
TuLeD (Tupían lexical database): introducing a database of a South American language family | 3
Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges | 3
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 3
Manipuri–English comparable corpus for cross-lingual studies | 2
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 2
Corpora compilation for prosody-informed speech processing | 2
UHated: hate speech detection in Urdu language using transfer learning | 2
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 2
Live blog summarization | 2
The LRE Map: what does it tell us about the last decade of our field? | 2
A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals | 2
LanguageCrawl: a generic tool for building language models upon common Crawl | 2
Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary | 2
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 2
ArgRewrite V.2: an annotated argumentative revisions corpus | 2
Automatic generation of creative text in Portuguese: an overview | 2
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 2
Jira: a Central Kurdish speech recognition system, designing and building speech corpus and pronunciation lexicon | 2
Two sepedi-english code-switched speech corpora | 2
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 2
Broad coverage emotion annotation | 2
Redundancy and coverage aware enriched dragonfly-FL single document summarization | 2
Sense representations for Portuguese: experiments with sense embeddings and deep neural language models | 2
Multi-domain adaptation for named entity recognition with multi-aspect relevance learning | 2
Text complexity of open educational resources in Portuguese: mixing written and spoken registers in a multi-task approach | 2
Towards the benchmarking of question generation: introducing the Monserrate corpus | 2
Speech acts in the Dutch COVID-19 Press Conferences | 2
Spelling errors made by people with dyslexia | 2
Determinants of grader agreement: an analysis of multiple short answer corpora | 1
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 1
The language of discrimination: assessing attention discrimination by Hungarian local governments | 1
An eye-tracking-with-EEG coregistration corpus of narrative sentences | 1
JWSAN: Japanese word similarity and association norm | 1
Automatic language identification: a case study of Pahari languages | 1
Infectious risk events and their novelty in event-based surveillance: new definitions and annotated corpus | 1
Universal Dependencies for Mandarin Chinese | 1
FullStop: punctuation and segmentation prediction for Dutch with transformers | 1
A morphologically annotated longitudinal corpus of spoken Czech child–adult interactions | 1
OMCD: Offensive Moroccan Comments Dataset | 1
A corpus of Schlieren photography of speech production: potential methodology to study aerodynamics of labial, nasal and vocalic processes | 1
Benchmark of public intent recognition services | 1
A rich task-oriented dialogue corpus in Vietnamese | 1
Using BERT models for breast cancer diagnosis from Turkish radiology reports | 1
adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 1
Orthographic features for emotion classification in Chinese in informal short texts | 1
When MIPVU goes to no man’s land: a new language resource for hybrid, morpheme-based metaphor identification in Hungarian | 1
The semantically annotated corpus of Polish quantificational expressions | 1
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 1
RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese | 1
NewsCom-TOX: a corpus of comments on news articles annotated for toxicity in Spanish | 1
TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus | 1
Depression symptoms modelling from social media text: an LLM driven semi-supervised learning approach | 1
Correction to: The LRE Map: what does it tell us about the last decade of our field? | 1
DiaBLa: a corpus of bilingual spontaneous written dialogues for machine translation | 1
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond | 1
Develop corpora and methods for cross-lingual text reuse detection for English Urdu language pair at lexical, syntactical, and phrasal levels | 1
Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text | 1
Features in extractive supervised single-document summarization: case of Persian news | 1
Mining culture from professional discourse: a lexicon-based hybrid method | 1
Speech emotion recognition for the Urdu language | 1
Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing | 1
Building the VisSE Corpus of Spanish SignWriting | 1
CsFEVER and CTKFacts: acquiring Czech data for fact verification | 1
A benchmark dataset and evaluation methodology for Chinese zero pronoun translation | 1
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 1
Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 1
Constructing a cross-document event coreference corpus for Dutch | 1
POMET: a corpus for poetic meter classification | 1
Identifying communicative functions in discourse with content types | 1
The Hmong Medical Corpus: a biomedical corpus for a minority language | 1
EventDNA: a dataset for Dutch news event extraction as a basis for news diversification | 1
Rant or rave: variation over time in the language of online reviews | 1
The limitations of irony detection in Dutch social media | 1
Linguistic annotation of Byzantine book epigrams | 1
Introducing the 3MT_French dataset to investigate the timing of public speaking judgements | 0
A corpus of English learners with Arabic and Hebrew backgrounds | 0
Data-driven dependency parsing of Vedic Sanskrit | 0
Manfred Stede and Jodi Schneider: Argumentation mining. Synthesis lectures on human language technologies, edited by Graeme Hirst | 0
Human-inspired computational models for European Portuguese: a review | 0
Computational approaches to Portuguese: introduction to the special issue | 0
Developing and testing syllabification systems for South African Sesotho | 0
Regionalized models for Spanish language variations based on Twitter | 0
A comparative evaluation for question answering over Greek texts by using machine translation and BERT | 0
Dataset on sentiment-based cryptocurrency-related news and tweets in English and Malay language | 0
The DELAD initiative for sharing language resources on speech disorders | 0
Pragmatic evaluations of automated linguistic creativity | 0
Preservation of sentiment in machine translation of low-resource languages: a case study on Slovak movie subtitles | 0
A semi-supervised method to generate a persian dataset for suggestion classification | 0
DoSLex: automatic generation of all domain semantically rich sentiment lexicon | 0
A flexible tool for a qualia-enriched FrameNet: the FrameNet Brasil WebTool | 0
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 0
Conversion of the Spanish WordNet databases into a Prolog-readable format | 0
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 0
Semantic search as extractive paraphrase span detection | 0
Historical Portuguese corpora: a survey | 0
TIARA 2.0: an interactive tool for annotating discourse structure and text improvement | 0
Spoken Spanish PoS tagging: gold standard dataset | 0
The development of a labelled te reo Māori–English bilingual database for language technology | 0
Umigon-lexicon: rule-based model for interpretable sentiment analysis and factuality categorization | 0
The Reading Everyday Emotion Database (REED): a set of audio-visual recordings of emotions in music and language | 0
A new corpus of geolocated ASR transcripts from Germany | 0
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 0
"Approaches to sentiment analysis of Hungarian political news at the sentence level" | 0
A multilingual, multimodal dataset of aggression and bias: the ComMA dataset | 0
Beyond plain toxic: building datasets for detection of flammable topics and inappropriate statements | 0
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 0
CINWA (database of terminology for cultivated plants in indigenous languages of northwestern South America): introducing a resource for research in ethnobiology, anthropology, historical linguistics, | 0
Šolar, the developmental corpus of Slovene | 0
KurdiSent: a corpus for kurdish sentiment analysis | 0
The link between translation difficulty and the quality of machine translation: a literature review and empirical investigation | 0
A new methodology for automatic creation of concept maps of Turkish texts | 0
Blackfoot Words: a database of Blackfoot lexical forms | 0
Correction to: Universal Dependencies for Mandarin Chinese | 0
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 0
A longitudinal multi-modal dataset for dementia monitoring and diagnosis | 0
Perspectivist approaches to natural language processing: a survey | 0
The Najdi Arabic Corpus: a new corpus for an underrepresented Arabic dialect | 0
Not all arguments are processed equally: a distributional model of argument complexity | 0
Chinese-DiMLex: a lexicon of Chinese discourse connectives | 0
From extended chunking to dependency parsing using traditional Arabic grammar | 0
Normalized dataset for Sanskrit word segmentation and morphological parsing | 0
Map Task Corpus of Heritage BCMS spoken by second-generation speakers in Switzerland | 0
Syntactic annotation for Portuguese corpora: standards, parsers, and search interfaces | 0
Correction to: Development and evaluation of an Urdu treebank (CLE-UTB) and a statistical parser | 0
Democratizing neural machine translation with OPUS-MT | 0
Slovenian parliamentary corpus siParl | 0
Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models | 0
RUN-AS: a novel approach to annotate news reliability for disinformation detection | 0
PESTS: Persian_English cross lingual corpus for semantic textual similarity | 0
Construction of Amharic information retrieval resources and corpora | 0
VeLeRo: an inflected verbal lexicon of standard Romanian and a quantitative analysis of morphological predictability | 0
DILLo: an Italian lexical database for speech-language pathologists | 0
The C-ORAL-ESQ project: a corpus for the study of spontaneous speech of individuals with schizophrenia | 0
“You’ll be a nurse, my son!” Automatically assessing gender biases in autoregressive language models in French and Italian | 0
Which words are important?: an empirical study of Assamese sentiment analysis | 0
Between welcome culture and border fence | 0
Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | 0
Evaluation of the Brazilian Portuguese version of linguistic inquiry and word count 2015 (BP-LIWC2015) | 0
Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis | 0
Resources building for sentiment analysis of content disseminated by Tunisian medias in social networks | 0
NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links | 0
From greatest simplicity to full power | 0
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 0
SOLD: Sinhala offensive language dataset | 0
Language resources for clinical linguistics: introduction to the special issue | 0
Research on translation quality self-evaluation by expert translators: an empirical study | 0
Book Review: the Routledge Handbook of Translation and Ethics | 0
COLLIE: a broad-coverage ontology and lexicon of verbs in English | 0
How different is different? Systematically identifying distribution shifts and their impacts in NER datasets | 0
Correction to: Morphological analysis and disambiguation for Breton | 0
Large scale annotated dataset for code-mix abusive short noisy text | 0
Introduction to the Special Issue: Selected papers from LREC 2018 | 0
A sentiment corpus for the cryptocurrency financial domain: the CryptoLin corpus | 0
From LIMA to DeepLIMA: following a new path of interoperability | 0
Morphological analysis and disambiguation for Breton | 0
Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations | 0
Exploratory Analysis of Rinconada Bikol Language-Nabua Text Corpus | 0
Hope speech detection in Spanish | 0
Annotation and evaluation of a dialectal Arabic sentiment corpus against benchmark datasets using transformers | 0
Lexical modeling for the development of Amharic automatic speech recognition systems | 0
Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 0
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 0
Building a relevance feedback corpus for legal information retrieval in the real-case scenario of the Brazilian Chamber of Deputies | 0
Correction to: Two sepedi‑english code‑switched speech corpora | 0
Data augmentation strategies to improve text classification: a use case in smart cities | 0
Rei Miyata: controlled document authoring in a machine translation age | 0
EmoTwiCS: a corpus for modelling emotion trajectories in Dutch customer service dialogues on Twitter | 0
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 0
The UAN Colombian co-speech gesture corpus | 0
CachacaNER: a dataset for named entity recognition in texts about the cachaça beverage | 0
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 0
Spontaneous, controlled acts of reference between friends and strangers | 0
Analyzing learner language: the case of the Hebrew Learner Essay Corpus | 0
Design and construction of Guayaquil radio speech corpus (CHARG) | 0
Cantonese natural language processing in the transformers era: a survey and current challenges | 0
Usage disambiguation of Turkish discourse connectives | 0
BRISE-plandok: a German legal corpus of building regulations | 0
A study on methods for revising dependency treebanks: in search of gold | 0
Open source platform for Estonian speech transcription | 0
The Italian Roots in Australian Soil (IRIAS) multilingual speech corpus. Speech variation in two generations of Italo-Australians | 0
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 0
A corpus of Persian literary text | 0
Studying word meaning evolution through incremental semantic shift detection | 0
Fine-tuning language models to recognize semantic relations | 0
SCTB-V2: the 2nd version of the Chinese treebank in the scientific domain | 0
A survey on geocoding: algorithms and datasets for toponym resolution | 0
Training and evaluation of vector models for Galician | 0
ArEntail: manually-curated Arabic natural language inference dataset from news headlines | 0
Automatic construction of direction-aware sentiment lexicon using direction-dependent words | 0
Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 0
Editorial: LRE updates | 0
Correction: The DELAD initiative for sharing language resources on speech disorders | 0