IEEE/ACM Transactions on Audio, Speech, and Language Processing

Papers
(The H4-Index of IEEE/ACM Transactions on Audio, Speech, and Language Processing is 29. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. Only publications from the past four years are covered, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 559
Pre-Training With Whole Word Masking for Chinese BERT | 428
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | 122
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning | 120
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation | 111
CTNet: Conversational Transformer Network for Emotion Recognition | 101
FSD50K: An Open Dataset of Human-Labeled Sound Events | 94
Dense CNN With Self-Attention for Time-Domain Speech Enhancement | 91
Wavesplit: End-to-End Speech Separation by Speaker Clustering | 88
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement | 72
SoundStream: An End-to-End Neural Audio Codec | 56
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network | 53
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation | 50
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks | 49
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition | 48
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | 45
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network | 44
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning | 42
Towards Model Compression for Deep Learning Based Speech Enhancement | 39
The Detection of Parkinson's Disease From Speech Using Voice Source Information | 39
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation | 38
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC | 33
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks | 33
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | 32
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog | 31
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data | 30
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis | 30
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations | 29
Steering Study of Linear Differential Microphone Arrays | 29
Audio-Visual Deep Neural Network for Robust Person Verification | 29