Transactions of the Association for Computational Linguistics

Papers
(The TQCC of Transactions of the Association for Computational Linguistics is 11. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
SpanBERT: Improving Pre-training by Representing and Predicting Spans | 641
A Primer in BERTology: What We Know About How BERT Works | 441
Multilingual Denoising Pre-training for Neural Machine Translation | 348
Topic Modeling in Embedding Spaces | 326
How Can We Know What Language Models Know? | 313
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation | 212
Efficient Content-Based Sparse Attention with Routing Transformers | 169
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks | 131
What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models | 126
SummEval: Re-evaluating Summarization Evaluation | 112
Sparse, Dense, and Attentional Representations for Text Retrieval | 108
A Survey on Automated Fact-Checking | 97
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages | 88
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation | 81
Nested Named Entity Recognition via Second-best Sequence Learning and Decoding | 61
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT | 61
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models | 59
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages | 51
Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations | 50
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond | 49
BLiMP: The Benchmark of Linguistic Minimal Pairs for English | 47
oLMpics-On What Language Model Pre-training Captures | 46
Lost in the Middle: How Language Models Use Long Contexts | 42
Measuring and Improving Consistency in Pretrained Language Models | 42
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | 42
Machine Learning–Driven Language Assessment | 41
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation | 41
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs | 40
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers | 39
How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering | 38
SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization | 37
The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation | 36
Gender Bias in Machine Translation | 35
Benchmarking Large Language Models for News Summarization | 32
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals | 32
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets | 32
Theoretical Limitations of Self-Attention in Neural Sequence Models | 32
CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset | 32
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension | 32
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP | 31
In-Context Retrieval-Augmented Language Models | 31
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them | 30
MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering | 30
Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? | 29
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | 29
Break It Down: A Question Understanding Benchmark | 28
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering | 28
Soloist: Building Task Bots at Scale with Transfer Learning and Machine Teaching | 28
Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation | 27
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models | 27
Extractive Opinion Summarization in Quantized Transformer Spaces | 26
Relevance-guided Supervision for OpenQA with ColBERT | 26
Syntax-Guided Controlled Generation of Paraphrases | 25
AMR-To-Text Generation with Graph Transformer | 24
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs | 24
Time-Aware Language Models as Temporal Knowledge Bases | 23
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension | 23
Data-to-text Generation with Macro Planning | 23
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision | 22
MasakhaNER: Named Entity Recognition for African Languages | 22
Quantifying Social Biases in NLP: A Generalization and Empirical Comparison of Extrinsic Fairness Metrics | 22
Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times? | 21
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains | 21
A Survey on Cross-Lingual Summarization | 21
Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks | 20
Planning with Learned Entity Prompts for Abstractive Summarization | 19
Efficient Methods for Natural Language Processing: A Survey | 19
TopiOCQA: Open-domain Conversational Question Answering with Topic Switching | 19
Explanation-Based Human Debugging of NLP Models: A Survey | 19
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval | 19
Best-First Beam Search | 19
Multilingual Autoregressive Entity Linking | 17
Data Weighted Training Strategies for Grammatical Error Correction | 17
Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies | 16
Unsupervised Quality Estimation for Neural Machine Translation | 16
FeTaQA: Free-form Table Question Answering | 16
Aligning Faithful Interpretations with their Social Attribution | 16
Visual Spatial Reasoning | 15
Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition | 15
Target-Guided Structured Attention Network for Target-Dependent Sentiment Analysis | 15
Transformers for Tabular Data Representation: A Survey of Models and Applications | 15
EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints | 15
Sentence Similarity Based on Contexts | 14
PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models | 14
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups | 14
Hallucinations in Large Multilingual Translation Models | 14
Pre-train, Prompt, and Recommendation: A Comprehensive Survey of Language Modeling Paradigm Adaptations in Recommender Systems | 13
Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings | 13
ABNIRML: Analyzing the Behavior of Neural IR Models | 13
Generative Spoken Dialogue Language Modeling | 13
A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing | 12
Task-Oriented Dialogue as Dataflow Synthesis | 12
Locally Typical Sampling | 12
Reducing Conversational Agents’ Overconfidence Through Linguistic Calibration | 12
QED: A Framework and Dataset for Explanations in Question Answering | 12
Sketch-Driven Regular Expression Generation from Natural Language and Examples | 12
Infusing Finetuning with Semantic Dependencies | 11
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary | 11
Better Document-Level Machine Translation with Bayes’ Rule | 11
Adaptive Semiparametric Language Models | 11
How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN | 11
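A minimal sketch of how a TQCC-style threshold like the one used above could be derived from raw CrossRef citation counts. The exact definition behind the site's TQCC is not given here, so the top-quartile-boundary interpretation, the helper name `tqcc`, and the sample counts (taken from the top of the table) are illustrative assumptions:

```python
def tqcc(citation_counts):
    """Return the citation count at the top-quartile boundary
    (an assumed reading of 'TQCC'), i.e. the count of the
    least-cited paper still inside the most-cited quarter."""
    if not citation_counts:
        raise ValueError("no citation counts given")
    ranked = sorted(citation_counts, reverse=True)
    # Size of the top quartile, with at least one paper in it.
    cutoff = max(1, len(ranked) // 4)
    return ranked[cutoff - 1]

# Sample: the eight most-cited counts from the table above.
counts = [641, 441, 348, 326, 313, 212, 169, 131]
threshold = tqcc(counts)  # top quartile here is the 2 most-cited papers
```

Under this reading, a paper is listed whenever its citation count meets or exceeds the threshold, which for the full TACL set works out to 11.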