IEEE-ACM Transactions on Audio Speech and Language Processing

Papers
(The TQCC of IEEE-ACM Transactions on Audio Speech and Language Processing is 5. The table below lists the papers that are at or above that threshold based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions | 167
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models | 139
Interrelate Training and Clustering for Online Speaker Diarization | 103
One General Teacher for Multi-Data Multi-Task: A New Knowledge Distillation Framework for Discourse Relation Analysis | 55
Verification on Head-Related Transfer Functions of a Snowman Model Simulated Using the Finite-Difference Time-Domain Method | 50
Exploring Interactive and Contrastive Relations for Nested Named Entity Recognition | 48
Cross Domain Optimization for Speech Enhancement: Parallel or Cascade? | 47
DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog | 41
Harmonic Detection From Noisy Speech With Auditory Frame Gain for Intelligibility Enhancement | 40
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition | 38
SBSim: A Sentence-BERT Similarity-Based Evaluation Metric for Indian Language Neural Machine Translation Systems | 35
Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy | 33
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning | 32
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation | 32
FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures | 32
Rethinking Textual Adversarial Defense for Pre-Trained Language Models | 31
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning | 30
Dynamic Convolutional Neural Networks as Efficient Pre-Trained Audio Models | 28
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model | 28
DetTrans: A Lightweight Framework to Detect and Translate Noisy Inputs Simultaneously | 27
Operation-Augmented Numerical Reasoning for Question Answering | 27
Measuring the Structural Complexity of Music: From Structural Segmentations to the Automatic Evaluation of Models for Music Generation | 27
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations | 27
Exploring Multi-Stage Information Interactions for Multi-Source Neural Machine Translation | 26
List of Reviewers | 25
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech | 25
Disentanglement in a GAN for Unconditional Speech Synthesis | 25
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification | 24
Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale | 24
Zero-Shot Normalization Driven Multi-Speaker Text to Speech Synthesis | 24
Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario | 23
Online Phase Reconstruction via DNN-Based Phase Differences Estimation | 22
Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition | 21
Amplitude Matching for Multizone Sound Field Control | 21
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition | 21
Transferable Latent of CNN-Based Selective Fixed-Filter Active Noise Control | 21
Empathetic Response Generation Based on Plug-and-Play Mechanism With Empathy Perturbation | 20
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization | 20
Task-Adaptive Feature Fusion for Generalized Few-Shot Relation Classification in an Open World Environment | 20
General Robust Subband Adaptive Filtering: Algorithms and Applications | 20
Decomposed Meta-Learning for Few-Shot Sequence Labeling | 20
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging | 19
CircularE: A Complex Space Circular Correlation Relational Model for Link Prediction in Knowledge Graph Embedding | 19
Rotor Noise-Aware Noise Covariance Matrix Estimation for Unmanned Aerial Vehicle Audition | 19
Cross-Domain Aspect-Based Sentiment Classification With Tripartite Graph Modeling | 19
Interpretable Multimodal Capsule Fusion | 19
Low-Latency Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks | 19
Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech | 18
Adaptive Pre-Training and Collaborative Fine-Tuning: A Win-Win Strategy to Improve Review Analysis Tasks | 18
Acoustic Imaging With Circular Microphone Array: A New Approach for Sound Field Analysis | 18
Enhanced Acoustic Howling Suppression via Hybrid Kalman Filter and Deep Learning Models | 18
Multi-Channel to Multi-Channel Noise Reduction and Reverberant Speech Preservation in Time-Varying Acoustic Scenes for Binaural Reproduction | 17
Improvement of Accent Classification Models Through Grad-Transfer From Spectrograms and Gradient-Weighted Class Activation Mapping | 17
Towards Generating Diverse Audio Captions via Adversarial Training | 17
CL-XABSA: Contrastive Learning for Cross-Lingual Aspect-Based Sentiment Analysis | 17
Three-Dimensional Room Transfer Function Parameterization Based on Multiple Concentric Planar Circular Arrays | 17
Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation | 16
A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition | 16
Exploit Feature and Relation Hierarchy for Relation Extraction | 16
Low-Latency Active Noise Control Using Attentive Recurrent Network | 16
Enhanced Speaker-Aware Multi-Party Multi-Turn Dialogue Comprehension | 15
Multi-Level Interaction Based Knowledge Graph Completion | 15
DBSA-Net: Dual Branch Self-Attention Network for Underwater Acoustic Signal Denoising | 15
Gradformer: A Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment | 14
Specialized Mathematical Solving by a Step-By-Step Expression Chain Generation | 14
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds | 14
The VoxCeleb Speaker Recognition Challenge: A Retrospective | 14
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems | 14
Towards Maximizing a Perceptual Sweet Spot for Spatial Sound With Loudspeakers | 14
Towards Comprehensive Subgroup Performance Analysis in Speech Models | 14
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture | 14
Block-Based Perceptually Adaptive Sound Zones With Reproduction Error Constraints | 14
Multilingual Customized Keyword Spotting Using Similar-Pair Contrastive Learning | 14
Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications | 13
RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition | 13
Large-Scale Unsupervised Audio Pre-Training for Video-to-Speech Synthesis | 13
Handover QG: Question Generation by Decoder Fusion and Reinforcement Learning | 13
Adjustable Coherent-to-Diffuse Power Estimator for Binaural Speech Enhancement in Multi-Talker Environments | 13
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information | 13
An AST Structure Enhanced Decoder for Code Generation | 13
A Two-Stage Audio-Visual Fusion Piano Transcription Model Based on the Attention Mechanism | 13
Attention-Based Speech Enhancement Using Human Quality Perception Modeling | 13
JMS-QA: A Joint Hierarchical Architecture for Mental Health Question Answering | 12
A Novel Unsupervised Approach for Cross-Lingual Word Alignment in Low Isomorphic Embedding Spaces | 12
RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems | 12
Enhancing Semantic Relation Classification With Shortest Dependency Path Reasoning | 12
Envelope-Based Multichannel Noise Reduction for Cochlear Implant Applications | 12
NoiER: An Approach for Training More Reliable Fine-Tuned Downstream Task Models | 12
A User-Centric Approach for Deep Residual-Echo Suppression in Double-Talk | 12
Spatially Selective Speaker Separation Using a DNN With a Location Dependent Feature Extraction | 12
Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network | 12
Review of Methods for Automatic Speaker Verification | 12
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement | 12
Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation | 12
STN4DST: A Scalable Dialogue State Tracking Based on Slot Tagging Navigation | 12
Triple Alliance Prototype Orthotist Network for Long-Tailed Multi-Label Text Classification | 12
Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement | 12
Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service | 12
Bayesian Estimation of PLDA in the Presence of Noisy Training Labels, With Applications to Speaker Verification | 11
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space | 11
Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture | 11
When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends | 11
Audio-Visual Based Online Multi-Source Separation | 11
Sparsity-Promoting Affine Projection Algorithm With Periodically-Updated Gain Matrix and Its Performance Analysis | 11
Generating Rational Commonsense Knowledge-Aware Dialogue Responses With Channel-Aware Knowledge Fusing Network | 11
Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments | 11
Neural Fusion for Voice Cloning | 10
Decorrelation in Feedback Delay Networks | 10
Dynamic Prompt-Driven Zero-Shot Relation Extraction | 10
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | 10
Exploring the Role of Language Families for Building Indic Speech Synthesisers | 10
En-HACN: Enhancing Hybrid Architecture With Fast Attention and Capsule Network for End-to-end Speech Recognition | 10
Retrieve-and-Edit Domain Adaptation for End2End Aspect Based Sentiment Analysis | 10
Dual Microphone Speech Enhancement Based on Statistical Modeling of Interchannel Phase Difference | 10
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks | 10
Lightweight Speaker Verification Using Transformation Module With Feature Partition and Fusion | 10
A Universal Filter Approximation of Edge Diffraction for Geometrical Acoustics | 10
Improved Transformer With Multi-Head Dense Collaboration | 10
Bilateral Cochlear Implant Processing of Coding Strategies With CCi-MOBILE, an Open-Source Research Platform | 10
Multi-Level Time-Frequency Bins Selection for Direction of Arrival Estimation Using a Single Acoustic Vector Sensor | 10
DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding | 10
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification | 9
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria | 9
Iterative Semantic Transformer by Greedy Distillation for Community Question Answering | 9
Speaker Anonymization Using Orthogonal Householder Neural Network | 9
SANet: A Compressed Speech Encoder and Steganography Algorithm Independent Steganalysis Deep Neural Network | 9
End-to-End Lip-Reading Without Large-Scale Data | 9
From LSAT: The Progress and Challenges of Complex Reasoning | 9
SIFTER: A Framework for Robust Rumor Detection | 9
Zero-Note Samba: Self-Supervised Beat Tracking | 9
Improving Seq2Seq TTS Frontends With Transcribed Speech Audio | 9
Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification | 9
Analysis of the Frequency Interference in the Narrowband Active Noise Control System | 9
Prompt-Based Prototypical Framework for Continual Relation Extraction | 9
Drone Audition: Sound Source Localization Using On-Board Microphones | 9
Selective Listening by Synchronizing Speech With Lips | 9
Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity | 9
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors | 9
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | 9
The Harmonic Shift Algorithm for Efficient Multi-Pitch Detection | 9
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition | 9
Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding | 9
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks | 9
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis | 9
Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis | 8
A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition | 8
Document-Level Relation Extraction With Context Guided Mention Integration and Inter-Pair Reasoning | 8
WDEA: The Structure and Semantic Fusion With Wasserstein Distance for Low-Resource Language Entity Alignment | 8
Low-Rank Room Impulse Response Estimation | 8
Spatial Analysis and Synthesis Methods: Subjective and Objective Evaluations Using Various Microphone Arrays in the Auralization of a Critical Listening Room | 8
Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors | 8
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks | 8
Affine-Projection-Like Maximum Correntropy Criteria Algorithm for Robust Active Noise Control | 8
A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem | 8
Inference Skipping for More Efficient Real-Time Speech Enhancement With Parallel RNNs | 8
AudioLM: A Language Modeling Approach to Audio Generation | 8
Wavelet Multiresolution Analysis Based Speech Emotion Recognition System Using 1D CNN LSTM Networks | 8
MVT: Chinese NER Using Multi-View Transformer | 8
Audio-Only Phonetic Segment Classification Using Embeddings Learned From Audio and Ultrasound Tongue Imaging Data | 8
Occlusion Effect Cancellation in Headphones and Hearing Devices—The Sister of Active Noise Cancellation | 8
Hyperbolic Pre-Trained Language Model | 8
Learning Speech Emotion Representations in the Quaternion Domain | 8
EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations | 8
Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck | 8
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features | 8
Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation | 8
$\mathcal{P}$owMix: A Versatile Regularizer for Multimodal Sentiment Analysis | 8
Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph | 8
Abstractive Financial News Summarization via Transformer-BiLSTM Encoder and Graph Attention-Based Decoder | 8
Uncertainty-Driven Knowledge Distillation for Language Model Compression | 8
Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning | 8
A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech | 8
Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation | 8
On Ambisonic Source Separation With Spatially Informed Non-Negative Tensor Factorization | 8
Learning Discriminative Representations and Decision Boundaries for Open Intent Detection | 8
Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN | 8
Differentiable Artificial Reverberation | 7
Implicit Self-Supervised Language Representation for Spoken Language Diarization | 7
SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System | 7
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems | 7
Automatic Detection of Speech Sound Disorder in Cantonese-Speaking Pre-School Children | 7
E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications | 7
FTDKD: Frequency-Time Domain Knowledge Distillation for Low-Quality Compressed Audio Deepfake Detection | 7
Efficient Lightweight Speaker Verification With Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps | 7
Cacophony: An Improved Contrastive Audio-Text Model | 7
The Impact of Silence on Speech Anti-Spoofing | 7
Improving Mispronunciation Detection Using Speech Reconstruction | 7
Learning Label-Adaptive Representation for Large-Scale Multi-Label Text Classification | 7
An Efficient Algorithm for Segmenting Quasi-Periodic Digital Signals Into Pseudo Cycles: Application in Lossy Audio Compression | 7
Syntax-Augmented Hierarchical Interactive Encoder for Zero-Shot Cross-Lingual Information Extraction | 7
Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation | 7
Proper Error Estimation and Calibration for Attention-Based Encoder-Decoder Models | 7
Spherically Steerable Vector Differential Microphone Arrays | 7
Sound Field Estimation Based on Physics-Constrained Kernel Interpolation Adapted to Environment | 7
How to Train Your Ears: Auditory-Model Emulation for Large-Dynamic-Range Inputs and Mild-to-Severe Hearing Losses | 7
KGAgent: Learning a Deep Reinforced Agent for Keyphrase Generation | 7
Neural Coupled Sequence Labeling for Heterogeneous Annotation Conversion | 7
MO-Transformer: Extract High-Level Relationship Between Words for Neural Machine Translation | 7
FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System | 7
CLAPSep: Leveraging Contrastive Pre-Trained Model for Multi-Modal Query-Conditioned Target Sound Extraction | 7
An Interpretable Deep Mutual Information Curriculum Metric for a Robust and Generalized Speech Emotion Recognition System | 7
Interpreting Intermediate Convolutional Layers of Generative CNNs Trained on Waveforms | 6
Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers | 6
Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks | 6
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild | 6
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification | 6
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR | 6
Constant-Beamwidth Beamforming With Nonuniform Concentric Ring Arrays | 6
IEEE Signal Processing Society Information | 6
Weighted Loudspeaker Placement Method for Sound Field Reproduction | 6
Leveraging Diverse Modeling Contexts With Collaborating Learning for Neural Machine Translation | 6
Domain-Slot Relationship Modeling Using a Pre-Trained Language Encoder for Multi-Domain Dialogue State Tracking | 6
Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems | 6
Harmonic Attention for Monaural Speech Enhancement | 6
Music Source Separation With Band-Split RNN | 6
Multi-Task Attentive Residual Networks for Argument Mining | 6
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition | 6
Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses | 6
Towards Unified Multi-Domain Machine Translation With Mixture of Domain Experts | 6
EC-ANC: Edge Case-Enhanced Active Noise Cancellation for True Wireless Stereo Earbuds | 6
Blind Identification of Binaural Room Impulse Responses From Smart Glasses | 6
Alleviating Exposure Bias for Neural Machine Translation via Contextual Augmentation and Self Distillation | 6
Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations | 6
Enhancing Low-Resource NLP by Consistency Training With Data and Model Perturbations | 6
Word-Region Alignment-Guided Multimodal Neural Machine Translation | 6
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre | 6
StarSum: A Star Architecture Based Model for Extractive Summarization | 6
Predicting Level-Dependent Changes in Concurrent Vowel Scores Using the 2D-CNN Models | 6
Complex Question Enhanced Transfer Learning for Zero-Shot Joint Information Extraction | 6
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition | 5
Mixture Representation Learning for Deep Speaker Embedding | 5
ISNet: Individual Standardization Network for Speech Emotion Recognition | 5
Joint Dual Learning With Mutual Information Maximization for Natural Language Understanding and Generation in Dialogues | 5
Emotion Prediction Oriented Method With Multiple Supervisions for Emotion-Cause Pair Extraction | 5
Modal Contrastive Learning Based End-to-End Text Image Machine Translation | 5
Differential Beamforming From a Geometric Perspective | 5
EfficientTDNN: Efficient Architecture Search for Speaker Recognition | 5
Modularized Pre-Training for End-to-End Task-Oriented Dialogue | 5
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition | 5
Distributed Microphone Array Localization Problem via SDP-SOCP Method | 5
Data-Centric Methods for Environmental Sound Classification With Limited Labels | 5
Fundamental Approaches to Robust Differential Beamforming With High Directivity Factors | 5
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation | 5
EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion | 5
Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition | 5
Learning With an Open Horizon in Ever-Changing Dialogue Circumstances | 5
Integrated Syntactic and Semantic Tree for Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network | 5
Sparse DNN Model for Frequency Expanding of Higher Order Ambisonics Encoding Process | 5
Design of 2D and 3D Differential Microphone Arrays With a Multistage Framework | 5
Selective-Memory Meta-Learning With Environment Representations for Sound Event Localization and Detection | 5