IEEE/ACM Transactions on Audio, Speech, and Language Processing

Papers
(The H4-Index of IEEE/ACM Transactions on Audio, Speech, and Language Processing is 36. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2020-11-01 to 2024-11-01.)
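For reference, the H4-Index is the h-index restricted to a four-year publication window: the largest number h such that h of the journal's papers from that window have at least h citations each. A minimal sketch of that computation in Python, using a made-up citation list rather than the journal's actual data:

def h_index(citations):
    # Largest h such that h papers have at least h citations each.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# Illustrative input, not real journal data: the h-index here is 3,
# since three papers have >= 3 citations but no four have >= 4.
print(h_index([10, 8, 5, 3, 1]))  # -> 3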
Article | Citations
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 891
Pre-Training With Whole Word Masking for Chinese BERT | 589
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning | 166
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | 155
FSD50K: An Open Dataset of Human-Labeled Sound Events | 147
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation | 145
CTNet: Conversational Transformer Network for Emotion Recognition | 129
SoundStream: An End-to-End Neural Audio Codec | 119
Wavesplit: End-to-End Speech Separation by Speaker Clustering | 107
Dense CNN With Self-Attention for Time-Domain Speech Enhancement | 106
Two Heads Are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement | 97
AudioLM: A Language Modeling Approach to Audio Generation | 91
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation | 73
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network | 66
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | 66
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks | 58
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition | 57
The Detection of Parkinson's Disease From Speech Using Voice Source Information | 55
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild | 55
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network | 54
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation | 50
Towards Model Compression for Deep Learning Based Speech Enhancement | 50
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning | 49
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models | 48
Multi-Microphone Complex Spectral Mapping for Utterance-Wise and Continuous Speech Separation | 46
Neural Spectrospatial Filtering | 41
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | 40
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC | 40
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks | 40
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog | 39
High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times | 39
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations | 38
Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning | 38
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition | 37
Audio-Visual Deep Neural Network for Robust Person Verification | 37
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT | 36
Recent Progress in the CUHK Dysarthric Speech Recognition System | 36
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis | 36