IEEE-ACM Transactions on Audio Speech and Language Processing

Papers
(The TQCC of IEEE-ACM Transactions on Audio Speech and Language Processing is 7. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-04-01 to 2024-04-01.)
Article | Citations
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 577
Pre-Training With Whole Word Masking for Chinese BERT | 441
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | 125
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning | 124
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation | 112
CTNet: Conversational Transformer Network for Emotion Recognition | 107
FSD50K: An Open Dataset of Human-Labeled Sound Events | 100
Dense CNN With Self-Attention for Time-Domain Speech Enhancement | 93
Wavesplit: End-to-End Speech Separation by Speaker Clustering | 89
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement | 77
SoundStream: An End-to-End Neural Audio Codec | 57
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network | 55
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation | 53
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks | 49
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition | 48
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | 45
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network | 44
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning | 43
The Detection of Parkinson's Disease From Speech Using Voice Source Information | 42
Towards Model Compression for Deep Learning Based Speech Enhancement | 41
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation | 39
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks | 34
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC | 33
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | 32
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog | 31
Audio-Visual Deep Neural Network for Robust Person Verification | 30
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data | 30
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations | 30
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis | 30
Steering Study of Linear Differential Microphone Arrays | 29
Expressive TTS Training With Frame and Style Reconstruction Loss | 29
A Unified Target-Oriented Sequence-to-Sequence Model for Emotion-Cause Pair Extraction | 28
Multi-Task Sequence Tagging for Emotion-Cause Pair Extraction Via Tag Distribution Refinement | 28
High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times | 28
Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection | 28
Nearest Kronecker Product Decomposition Based Linear-in-The-Parameters Nonlinear Filters | 28
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT | 27
Recent Progress in the CUHK Dysarthric Speech Recognition System | 27
AudioLM: A Language Modeling Approach to Audio Generation | 26
Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence | 26
Modified Magnitude-Phase Spectrum Information for Spoofing Detection | 26
Neural Spectrospatial Filtering | 25
Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition | 25
Zero-Shot Audio Classification Via Semantic Embeddings | 24
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition | 24
Towards Duration Robust Weakly Supervised Sound Event Detection | 24
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling | 24
Pretraining Techniques for Sequence-to-Sequence Voice Conversion | 24
Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning | 23
Multi-View Speech Emotion Recognition Via Collective Relation Construction | 23
Towards Robust Speech Super-Resolution | 22
DUMA: Reading Comprehension With Transposition Thinking | 22
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization | 22
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition | 22
Encoder-Decoder Based Attractors for End-to-End Neural Diarization | 21
Optimal Output-Constrained Active Noise Control Based on Inverse Adaptive Modeling Leak Factor Estimate | 21
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild | 21
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features | 21
Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation | 21
DBT-Net: Dual-Branch Federative Magnitude and Phase Estimation With Attention-in-Attention Transformer for Monaural Speech Enhancement | 21
Deep Learning Based Real-Time Speech Enhancement for Dual-Microphone Mobile Phones | 20
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter | 20
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation | 20
Multimodal Emotion Recognition With Temporal and Semantic Consistency | 20
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training | 20
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | 20
Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model | 19
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition | 19
High-Order Pair-Wise Aspect and Opinion Terms Extraction With Edge-Enhanced Syntactic Graph Convolution | 19
Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis | 19
S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder | 19
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation | 18
Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain | 18
A Wave Digital Newton-Raphson Method for Virtual Analog Modeling of Audio Circuits with Multiple One-Port Nonlinearities | 18
Drone Audition: Sound Source Localization Using On-Board Microphones | 18
ISNet: Individual Standardization Network for Speech Emotion Recognition | 18
Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity | 18
Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks | 17
Exploiting Temporal Context in CNN Based Multisource DOA Estimation | 17
Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering | 17
PhaseDCN: A Phase-Enhanced Dual-Path Dilated Convolutional Network for Single-Channel Speech Enhancement | 17
Unsupervised Speech Enhancement Using Dynamical Variational Autoencoders | 17
Group Communication With Context Codec for Lightweight Source Separation | 17
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection | 16
Many-to-Many Voice Transformer Network | 16
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models | 16
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation | 16
Robust Q-Gradient Subband Adaptive Filter for Nonlinear Active Noise Control | 15
On the Design of Differential Kronecker Product Beamformers | 15
Fundamental Approaches to Robust Differential Beamforming With High Directivity Factors | 15
Desynchronization Attacks Resilient Watermarking Method Based on Frequency Singular Value Coefficient Modification | 15
Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph | 15
Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System | 15
Beamforming with Cube Microphone Arrays Via Kronecker Product Decompositions | 15
On the Robustness of the Superdirective Beamformer | 15
The Weighted Cross-Modal Attention Mechanism With Sentiment Prediction Auxiliary Task for Multimodal Sentiment Analysis | 15
Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points | 15
Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis | 15
Reinforcement Learning-Based Dialogue Guided Event Extraction to Exploit Argument Relations | 15
Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks | 15
Affine Projection Algorithm Over Acoustic Sensor Networks for Active Noise Control | 15
LSBert: Lexical Simplification Based on BERT | 14
Contrastive Information Extraction With Generative Transformer | 14
Room Acoustical Parameter Estimation From Room Impulse Responses Using Deep Neural Networks | 14
Deep Selective Memory Network With Selective Attention and Inter-Aspect Modeling for Aspect Level Sentiment Classification | 14
Speech Emotion Recognition Using Sequential Capsule Networks | 14
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech | 14
Cascaded Random Fourier Filter for Robust Nonlinear Active Noise Control | 14
Binaural Reproduction Based on Bilateral Ambisonics and Ear-Aligned HRTFs | 14
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network | 14
Double-Cross-Correlation Processing for Blind Sampling-Rate and Time-Offset Estimation | 13
Generating Images From Spoken Descriptions | 13
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement | 13
Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis | 13
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation | 13
Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning | 13
Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification | 13
Identification of Room Acoustic Impulse Responses via Kronecker Product Decompositions | 13
Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain | 13
Knowing Where to Leverage: Context-Aware Graph Convolutional Network With an Adaptive Fusion Layer for Contextual Spoken Language Understanding | 13
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection | 13
On Improved Training of CNN for Acoustic Source Localisation | 13
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition | 13
Robust Voice Feature Selection Using Interval Type-2 Fuzzy AHP for Automated Diagnosis of Parkinson's Disease | 12
Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation | 12
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization | 12
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement | 12
A Novel Approach for Improved Noise Reduction Performance in Feed-Forward Active Noise Control Systems With (Loudspeaker) Saturation Non-Linearity in the Secondary Path | 12
Diverse Distractor Generation for Constructing High-Quality Multiple Choice Questions | 12
A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement | 12
Spatial Active Noise Control Based on Kernel Interpolation of Sound Field | 12
SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement | 12
Mixed Source Sound Field Translation for Virtual Binaural Application With Perceptual Validation | 12
Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning | 12
Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation | 12
Deformable Self-Attention for Text Classification | 12
Deep Normalization for Speaker Vectors | 12
Proximal Normalized Subband Adaptive Filtering for Acoustic Echo Cancellation | 12
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis | 12
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression | 12
Enhancement of Noisy Reverberant Speech Using Polynomial Matrix Eigenvalue Decomposition | 12
Affine-Projection-Like Maximum Correntropy Criteria Algorithm for Robust Active Noise Control | 12
Detection of Multiple Steganography Methods in Compressed Speech Based on Code Element Embedding, Bi-LSTM and CNN With Attention Mechanisms | 12
Efficient Combinatorial Optimization for Word-Level Adversarial Textual Attack | 12
Robust Subband Adaptive Filter Algorithms-Based Mixture Correntropy and Application to Acoustic Echo Cancellation | 12
Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method | 12
Improved Lite Audio-Visual Speech Enhancement | 12
End-to-End Speech Recognition: A Survey | 12
Improving Skip-Gram Embeddings Using BERT | 11
TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition | 11
Multiple Acoustic Source Localization in Microphone Array Networks | 11
Hierarchical Neighbor Propagation With Bidirectional Graph Attention Network for Relation Prediction | 11
Modeling Future Cost for Neural Machine Translation | 11
Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech | 11
A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy | 11
Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory | 11
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning | 11
Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization | 11
Distributed Combined Acoustic Echo Cancellation and Noise Reduction in Wireless Acoustic Sensor and Actuator Networks | 11
Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs | 11
Differentiable Artificial Reverberation | 11
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture | 11
Sparsity-Based Audio Declipping Methods: Selected Overview, New Algorithms, and Large-Scale Evaluation | 11
Sarcasm Detection with Commonsense Knowledge | 11
TDOA-Based Robust Sound Source Localization With Sparse Regularization in Wireless Acoustic Sensor Networks | 10
A Time-Frequency Attention Module for Neural Speech Enhancement | 10
Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment | 10
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation | 10
End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy | 10
Inference Skipping for More Efficient Real-Time Speech Enhancement With Parallel RNNs | 10
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models | 10
Wave Digital Modeling and Implementation of Nonlinear Audio Circuits With Nullors | 10
Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers | 10
Unsupervised Speech Segmentation and Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding | 10
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation | 10
Cognitive Load Estimation From Speech Commands to Simulated Aircraft | 10
Extracting and Predicting Word-Level Style Variations for Speech Synthesis | 10
Selective Listening by Synchronizing Speech With Lips | 10
Meta-AF: Meta-Learning for Adaptive Filters | 10
Low Latency Speech Enhancement for Hearing Aids Using Deep Filtering | 10
Scalable and Efficient Neural Speech Coding: A Hybrid Design | 10
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network | 10
Controlling Elevation and Azimuth Beamwidths With Concentric Circular Microphone Arrays | 10
PROTOTYPE-TO-STYLE: Dialogue Generation With Style-Aware Editing on Retrieval Memory | 10
Conditioned Source Separation for Musical Instrument Performances | 9
Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction | 9
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training | 9
Improved Speech Enhancement Considering Speech PSD Uncertainty | 9
Relation Extraction in Dialogues: A Deep Learning Model Based on the Generality and Specialty of Dialogue Text | 9
Mixture Representation Learning for Deep Speaker Embedding | 9
SBSim: A Sentence-BERT Similarity-Based Evaluation Metric for Indian Language Neural Machine Translation Systems | 9
A Joint Model for Named Entity Recognition With Sentence-Level Entity Type Attentions | 9
A Time-Domain Real-Valued Generalized Wiener Filter for Multi-Channel Neural Separation Systems | 9
Bayesian Learning for Deep Neural Network Adaptation | 9
Computation of Spherical Harmonic Representations of Source Directivity Based on the Finite-Distance Signature | 9
On the Design of Sparse Arrays With Frequency-Invariant Beam Pattern | 9
From LSAT: The Progress and Challenges of Complex Reasoning | 9
Reconfigurable Nonuniform Filter Bank for Hearing Aid Systems | 9
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting | 9
Domain-Shift Conditioning Using Adaptable Filtering Via Hierarchical Embeddings for Robust Chinese Spell Check | 9
Music Source Separation With Band-Split RNN | 9
USEV: Universal Speaker Extraction With Visual Cue | 9
Flexibly Focusing on Supporting Facts, Using Bridge Links, and Jointly Training Specialized Modules for Multi-Hop Question Answering | 8
Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation | 8
Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism | 8
Adaptive Convolution for Semantic Role Labeling | 8
Converting Foreign Accent Speech Without a Reference | 8
Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture | 8
Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition | 8
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition | 8
Multi-Turn Dialogue Reading Comprehension With Pivot Turns and Knowledge | 8
Exemplar-Based Emotive Speech Synthesis | 8
Review and Arrange: Curriculum Learning for Natural Language Understanding | 8
Squared Sine Adaptive Algorithm and Its Performance Analysis | 8
Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization | 8
Directly Comparing the Listening Strategies of Humans and Machines | 8
Counterfactually Fair Automatic Speech Recognition | 8
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization | 8
Towards Energy-Preserving Natural Language Understanding With Spiking Neural Networks | 8
M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER | 8
U-Shaped Transformer With Frequency-Band Aware Attention for Speech Enhancement | 8
Localization-Driven Speech Enhancement in Noisy Multi-Speaker Hospital Environments Using Deep Learning and Meta Learning | 8
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency | 8
A Room Impulse Response Measurement Method Robust Towards Nonlinearities Based on Orthogonal Periodic Sequences | 8
ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding | 8
Nonlinear Spatial Filtering in Multichannel Speech Enhancement | 8
Regularized Phrase-Based Topic Model for Automatic Question Classification With Domain-Agnostic Class Labels | 8
Spatial Active Noise Control in Rooms Using Higher Order Sources | 8
Chinese Lexical Simplification | 8
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition | 8
DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays | 8
Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms | 7
Deep Learning Approaches in Topics of Singing Information Processing | 7
EfficientTDNN: Efficient Architecture Search for Speaker Recognition | 7
Bayesian Neural Network Language Modeling for Speech Recognition | 7
MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer With One Transformer VAE | 7
SIFTER: A Framework for Robust Rumor Detection | 7
Preordering Encoding on Transformer for Translation | 7
Decoupled Multiple Speaker Direction-of-Arrival Estimator Under Reverberant Environments | 7
The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants | 7
Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet | 7
Word-Region Alignment-Guided Multimodal Neural Machine Translation | 7
Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition | 7
Adaptive Adapters: An Efficient Way to Incorporate BERT Into Neural Machine Translation | 7
Reference Knowledgeable Network for Machine Reading Comprehension | 7
Generalized Hyperbolic Tangent Based Random Fourier Conjugate Gradient Filter for Nonlinear Active Noise Control | 7
Monaural Speech Separation Using Speaker Embedding From Preliminary Separation | 7
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition | 7
A Graph-to-Sequence Learning Framework for Summarizing Opinionated Texts | 7
General Robust Subband Adaptive Filtering: Algorithms and Applications | 7