(The H4-Index of Language Resources and Evaluation is 13. The table below lists the papers that are above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-07-01 to 2024-07-01.)
Title | Citations
Resources and benchmark corpora for hate speech detection: a systematic review | 132
Machine translation systems and quality assessment: a systematic review | 51
DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text | 39
Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition | 22
A comparative evaluation and analysis of three generations of Distributional Semantic Models | 21
The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions | 21
The ParlaMint corpora of parliamentary proceedings | 17
A large English–Thai parallel corpus from the web and machine-generated text | 16
Current limitations in cyberbullying detection: On evaluation criteria, reproducibility, and data scarcity | 16
Low resource language specific pre-processing and features for sentiment analysis task | 15
Automatic genre identification: a survey | 15
SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news | 15
Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale | 14
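
The index can be checked against the citation counts listed above. The following is a minimal sketch, assuming the H4-Index is the standard h-index (the largest h such that h papers each have at least h citations) restricted to papers published in the four-year window. The function name `h_index` and the variable `listed_counts` are illustrative and do not come from the source. Papers not shown in the list are assumed to have too few citations to raise the value.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Citation counts copied from the list above (papers published 2020-07-01 to 2024-07-01).
listed_counts = [132, 51, 39, 22, 21, 21, 17, 16, 16, 15, 15, 15, 14]

print(h_index(listed_counts))  # prints 13
```

The thirteenth-ranked paper has 14 citations, which is at least 13, so the listed papers alone reproduce the stated index of 13.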