OOIR: Observatory of International Research

Papers

(The median citation count of papers in Transactions of the Association for Computational Linguistics is 3. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
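The selection rule in the note above can be sketched as a small filter. This is a hypothetical illustration, not OOIR's actual code: the `Paper` record, the `select_papers` helper, the half-open date window, and the use of `>=` (inferred from the fact that papers with exactly 3 citations appear in the table) are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Paper:
    """Hypothetical record: one journal paper and its citation count."""
    title: str
    published: date
    citations: int  # e.g. a CrossRef citation count


def select_papers(papers, start, end, cap=250):
    """Sketch of the listing rule (assumed, not OOIR's code).

    Keeps papers published in the half-open window [start, end), takes the
    median citation count over that window as the threshold, and returns
    (threshold, up to `cap` papers at or above it, most-cited first).
    """
    window = [p for p in papers if start <= p.published < end]
    if not window:
        return None, []
    threshold = median(p.citations for p in window)
    kept = [p for p in window if p.citations >= threshold]
    # sort() is stable, so papers with equal counts keep their input order.
    kept.sort(key=lambda p: p.citations, reverse=True)
    return threshold, kept[:cap]


# Toy usage with made-up papers (not data from the table).
papers = [
    Paper("paper-a", date(2023, 1, 5), 10),
    Paper("paper-b", date(2024, 3, 1), 3),
    Paper("paper-c", date(2022, 7, 1), 1),
    Paper("paper-d", date(2021, 1, 1), 50),  # before the window, ignored
]
threshold, listed = select_papers(papers, date(2022, 6, 1), date(2026, 6, 1))
print(threshold, [p.title for p in listed])  # 3 ['paper-a', 'paper-b']
```

The comparison operator matters: with a strict `>`, every paper with exactly the median count (here, 3 citations) would drop out of the list.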

Article	Citations
Persona-Aware Alignment Framework for Personalized Dialogue Generation	868
Overcoming Source Object Grounding for Semantic Image Editing	317
From Robustness to Improved Generalization and Calibration in Pre-trained Language Models	299
Cross-functional Analysis of Generalization in Behavioral Learning	268
Segmentation-Free Streaming Machine Translation	152
The Ethics of Automating Legal Actors	113
MoNaCo: More Natural and Complex Question	111
How to Select Datapoints for Efficient Human Evaluation of NLG Models?	95
KEFT: Knowledge-Enhanced Fine-Tuning for Large Language Models in Domain-Specific Question Answering	93
State of What Art? A Call for Multi-Prompt LLM Evaluation	84
DARE: Diverse Visual Question Answering with Robustness Evaluation	82
Transformers for Tabular Data Representation: A Survey of Models and Applications	79
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection	76
Erasure of Unaligned Attributes from Neural Representations	70
Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval	68
T²-NER: A Two-Stage Span-Based Framework for Unified Named Entity Recognition with Templates	68
Revisiting Meta-evaluation for Grammatical Error Correction	66
Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing	63
Context-Aware Machine Translation with Source Coreference Explanation	59
Benchmarking the Generation of Fact Checking Explanations	54
Investigating Adversarial Trigger Transfer in Large Language Models	47
Do Multi-Document Summarization Models Synthesize?	47
Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval	45
Federated Learning for Exploiting Annotators’ Disagreements in Natural Language Processing	44
mtRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems	43
Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation	40
Learning More from Mixed Emotions: A Label Refinement Method for Emotion Recognition in Conversations	40
DEAR: Disentangled Event-Agnostic Representation Learning for Early Fake News Detection	38
Few-Shot Multilingual Open-Domain QA from Five Examples	36
Adversarial Defense without Adversarial Defense: Enhancing Language Model Robustness via Instance-level Principal Component Removal	35
Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation	32
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art	31
PsyMem: Fine-grained Psychological Alignment and Explicit Memory Control for Advanced Role-Playing LLMs	29
Accelerating Language Model Workflows with Prompt Choreography	29
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking	28
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation	27
To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation	27
Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off	26
Are Triggers Needed for Document-Level Event Extraction?	23
Aligned Probing: Relating Toxic Behavior and Model Internals	21
Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?	21
Questions Are All You Need to Train a Dense Passage Retriever	19
Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models	19
A Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese	19
An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation	19
Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis	18
InSCIt: Information-Seeking Conversations with Mixed-Initiative Interactions	18
Conformal Prediction for Natural Language Processing: A Survey	18
CorefInst: Leveraging LLMs for Multilingual Coreference Resolution	18
Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation	18
Prompt Contrastive Transformation: An Enhanced Strategy for Efficient Prompt Transfer in Natural Language Processing	18
Navigating the Landscape of Hint Generation Research: From the Past to the Future	18
Retrieve What You Need: A Mutual Learning Framework for Open-domain Question Answering	16
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts	16
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models	16
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation	15
Efficient Long-Text Understanding with Short-Text Models	14
OpenFact: Factuality Enhanced Open Knowledge Extraction	14
Localizing Factual Inconsistencies in Attributable Text Generation	14
Interactive Machine Teaching by Labeling Rules and Instances	14
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models	14
Objectifying the Subjective: Cognitive Biases in Topic Interpretations	13
Sense-specific Historical Word Usage Generation	13
Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?	12
Addressing the Binning Problem in Calibration Assessment through Scalar Annotations	12
BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context	12
A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction	12
MENLI: Robust Evaluation Metrics from Natural Language Inference	12
Pre-train, Prompt, and Recommendation: A Comprehensive Survey of Language Modeling Paradigm Adaptations in Recommender Systems	12
NLP Security and Ethics, in the Wild	11
TaxoPro: A Plug-In LoRA-based Cross-Domain Method for Low-Resource Taxonomy Completion	11
Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation	11
Modeling Emotion Dynamics in Song Lyrics with State Space Models	10
Investigating Critical Period Effects in Language Acquisition through Neural Language Models	10
Human Choice Prediction in Language-based Persuasion Games: Simulation-based Off-Policy Evaluation	10
How “Real” is Your Real-Time Simultaneous Speech-to-Text Translation System?	10
Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs	9
PaniniQA: Enhancing Patient Education Through Interactive Question Answering	9
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs	9
Self-Rationalization in the Wild: A Large-scale Out-of-Distribution Evaluation on NLI-related tasks	9
TANQ: An Open Domain Dataset of Table Answered Questions	9
Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation	9
Time-and-Space-Efficient Weighted Deduction	9
Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization	9
Towards More Realistic Extraction Attacks: An Adversarial Perspective	9
Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models	8
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection	8
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding	8
Dissecting GraphRAG: A Modular Analysis of Knowledge Structuring for Factoid Question Answering	8
Visual Spatial Reasoning	8
Data-driven Parsing Evaluation for Child-Parent Interactions	8
Large Language Models Enable Few-Shot Clustering	8
Sub-Character Tokenization for Chinese Pretrained Language Models	8
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation	7
Know Your Limits: A Survey of Abstention in Large Language Models	7
Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement	7
Evaluating Transformer Models and Human Behaviors on Chinese Character Naming	7
Step-by-Step Unmasking for Parameter-Efficient Fine-Tuning of Large Language Models	7
Benchmarking Large Language Models for News Summarization	7
The Causal Influence of Grammatical Gender on Distributional Semantics	7
How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure	6
Abstractive Meeting Summarization: A Survey	6
QE4PE: Word-level Quality Estimation for Human Post-Editing	6
Visually Grounded Speech Models Have a Mutual Exclusivity Bias	6
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision	6
QAmeleon: Multilingual QA with Only 5 Examples	6
Scope Ambiguities in Large Language Models	6
A Cross-Linguistic Pressure for Uniform Information Density in Word Order	6
Can Authorship Representation Learning Capture Stylistic Features?	6
On the Effect of Instruction Tuning Loss on Generalization	6
CreoleVal: Multilingual Multitask Benchmarks for Creoles	6
Hallucinations in Large Multilingual Translation Models	6
Direct Speech Translation for Automatic Subtitling	6
Chinese Idiom Paraphrasing	6
Collective Human Opinions in Semantic Textual Similarity	5
Conformalizing Machine Translation Evaluation	5
Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences	5
STPar: A Structure-Aware Triaffine Parser for Screenplay Character Coreference Resolution	5
A Unifying Scheme for Extractive Content Selection Tasks	5
Comparing Humans and Large Language Models on an Experimental Protocol Inventory for Theory of Mind Evaluation (EPITOME)	5
Can Large Language Models Generalize Analogy Solving Like Children Can?	5
The Parallelism Tradeoff: Limitations of Log-Precision Transformers	5
Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide	5
Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023	5
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR	5
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends, and Metrics Analysis	5
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering	5
Cultural Adaptation of Recipes	5
Expectations over Unspoken Alternatives Predict Pragmatic Inferences	5
A Comparative Approach for Auditing Multilingual Phonetic Transcript Archives	5
mGPT: Few-Shot Learners Go Multilingual	5
Meta-Learning a Cross-lingual Manifold for Semantic Parsing	5
Hierarchical Indexing for Retrieval-Augmented Opinion Summarization	5
Self-Consistency Falls Short! The Adverse Effects of Positional Bias on Long-Context Problems	5
Lost in the Middle: How Language Models Use Long Contexts	4
Cross-layer Attention Sharing for Pre-trained Large Language Models	4
KoBBQ: Korean Bias Benchmark for Question Answering	4
Hate Speech Classifiers Learn Normative Social Stereotypes	4
FoVer: First-Order Logic Verification for Natural Language Reasoning	4
Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing	4
Shared Lexical Items as Triggers of Code Switching	4
Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions	4
Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery	4
ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation	4
Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?	4
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing	4
Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?	4
How Much Semantic Information is Available in Large Language Model Tokens?	4
MAKE: Memory-Associated Knowledge Editing	4
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns	4
Naturalistic Causal Probing for Morpho-Syntax	4
Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions	4
Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension	4
Self-supervised Topic Taxonomy Discovery in the Box Embedding Space	3
Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?	3
Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework	3
PASTA: A Dataset for Modeling PArticipant STAtes in Narratives	3
Decision-Oriented Dialogue for Human-AI Collaboration	3
CRVQ: Channel-Relaxed Vector Quantization for Extreme Compression of LLMs	3
Explicitly Representing Syntax Improves Sentence-to-Layout Prediction of Unexpected Situations	3
PiKGL: Leveraging Pruned Knowledge Graphs for Explainable Stance Detection	3
Holmes ⌕ A Benchmark to Assess the Linguistic Competence of Language Models	3
An Efficient Self-Supervised Cross-View Training For Sentence Embedding	3
How Often Are Errors in Natural Language Reasoning Due to Paraphrastic Variability?	3
Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies	3
FINCH: Prompt-guided Key-Value Cache Compression for Large Language Models	3
Reasoning over Public and Private Data in Retrieval-Based Systems	3
A Survey on Model Compression for Large Language Models	3
Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues	3
What Can String Probability Tell Us About Grammaticality?	3
Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design	3
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations	3
Tracking Brand-Associated Polarity-Bearing Topics in User Reviews	3