IEEE-ACM Transactions on Audio Speech and Language Processing

Papers
(The median citation count of IEEE-ACM Transactions on Audio Speech and Language Processing is 2. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 559
Pre-Training With Whole Word Masking for Chinese BERT | 428
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | 122
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning | 120
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation | 111
CTNet: Conversational Transformer Network for Emotion Recognition | 101
FSD50K: An Open Dataset of Human-Labeled Sound Events | 94
Dense CNN With Self-Attention for Time-Domain Speech Enhancement | 91
Wavesplit: End-to-End Speech Separation by Speaker Clustering | 88
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement | 72
SoundStream: An End-to-End Neural Audio Codec | 56
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network | 53
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation | 50
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks | 49
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition | 48
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 | 45
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network | 44
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning | 42
The Detection of Parkinson's Disease From Speech Using Voice Source Information | 39
Towards Model Compression for Deep Learning Based Speech Enhancement | 39
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation | 38
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks | 33
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC | 33
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement | 32
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog | 31
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data | 30
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis | 30
Audio-Visual Deep Neural Network for Robust Person Verification | 29
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations | 29
Steering Study of Linear Differential Microphone Arrays | 29
High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times | 28
A Unified Target-Oriented Sequence-to-Sequence Model for Emotion-Cause Pair Extraction | 28
Expressive TTS Training With Frame and Style Reconstruction Loss | 28
Multi-Task Sequence Tagging for Emotion-Cause Pair Extraction Via Tag Distribution Refinement | 27
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT | 27
Nearest Kronecker Product Decomposition Based Linear-in-The-Parameters Nonlinear Filters | 27
Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection | 26
Modified Magnitude-Phase Spectrum Information for Spoofing Detection | 26
Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence | 26
AudioLM: A Language Modeling Approach to Audio Generation | 25
Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition | 25
Neural Spectrospatial Filtering | 25
Towards Duration Robust Weakly Supervised Sound Event Detection | 24
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling | 24
Pretraining Techniques for Sequence-to-Sequence Voice Conversion | 24
Recent Progress in the CUHK Dysarthric Speech Recognition System | 24
Zero-Shot Audio Classification Via Semantic Embeddings | 23
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition | 23
Multi-View Speech Emotion Recognition Via Collective Relation Construction | 23
Towards Robust Speech Super-Resolution | 22
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition | 22
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization | 21
Encoder-Decoder Based Attractors for End-to-End Neural Diarization | 21
DUMA: Reading Comprehension With Transposition Thinking | 21
Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning | 20
Optimal Output-Constrained Active Noise Control Based on Inverse Adaptive Modeling Leak Factor Estimate | 20
DBT-Net: Dual-Branch Federative Magnitude and Phase Estimation With Attention-in-Attention Transformer for Monaural Speech Enhancement | 20
S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder | 19
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition | 19
Deep Learning Based Real-Time Speech Enhancement for Dual-Microphone Mobile Phones | 19
Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis | 19
Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model | 19
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation | 19
Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation | 19
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features | 18
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter | 18
Drone Audition: Sound Source Localization Using On-Board Microphones | 18
A Wave Digital Newton-Raphson Method for Virtual Analog Modeling of Audio Circuits with Multiple One-Port Nonlinearities | 18
Multimodal Emotion Recognition With Temporal and Semantic Consistency | 18
Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain | 18
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training | 18
High-Order Pair-Wise Aspect and Opinion Terms Extraction With Edge-Enhanced Syntactic Graph Convolution | 18
Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity | 18
Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks | 17
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | 17
Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering | 17
Group Communication With Context Codec for Lightweight Source Separation | 17
Unsupervised Speech Enhancement Using Dynamical Variational Autoencoders | 17
Exploiting Temporal Context in CNN Based Multisource DOA Estimation | 16
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation | 16
ISNet: Individual Standardization Network for Speech Emotion Recognition | 16
PhaseDCN: A Phase-Enhanced Dual-Path Dilated Convolutional Network for Single-Channel Speech Enhancement | 16
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection | 16
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild | 16
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation | 16
Many-to-Many Voice Transformer Network | 16
Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis | 15
Reinforcement Learning-Based Dialogue Guided Event Extraction to Exploit Argument Relations | 15
Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points | 15
Fundamental Approaches to Robust Differential Beamforming With High Directivity Factors | 15
Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System | 15
Robust Q-Gradient Subband Adaptive Filter for Nonlinear Active Noise Control | 15
On the Robustness of the Superdirective Beamformer | 15
On the Design of Differential Kronecker Product Beamformers | 15
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models | 14
Affine Projection Algorithm Over Acoustic Sensor Networks for Active Noise Control | 14
Binaural Reproduction Based on Bilateral Ambisonics and Ear-Aligned HRTFs | 14
Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph | 14
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network | 14
Deep Selective Memory Network With Selective Attention and Inter-Aspect Modeling for Aspect Level Sentiment Classification | 14
Desynchronization Attacks Resilient Watermarking Method Based on Frequency Singular Value Coefficient Modification | 14
Beamforming with Cube Microphone Arrays Via Kronecker Product Decompositions | 14
Room Acoustical Parameter Estimation From Room Impulse Responses Using Deep Neural Networks | 14
The Weighted Cross-Modal Attention Mechanism With Sentiment Prediction Auxiliary Task for Multimodal Sentiment Analysis | 14
Speech Emotion Recognition Using Sequential Capsule Networks | 14
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech | 14
Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks | 14
LSBert: Lexical Simplification Based on BERT | 13
Identification of Room Acoustic Impulse Responses via Kronecker Product Decompositions | 13
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition | 13
Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning | 13
Cascaded Random Fourier Filter for Robust Nonlinear Active Noise Control | 13
On Improved Training of CNN for Acoustic Source Localisation | 13
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation | 13
Knowing Where to Leverage: Context-Aware Graph Convolutional Network With an Adaptive Fusion Layer for Contextual Spoken Language Understanding | 13
Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification | 13
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement | 13
Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain | 13
Enhancement of Noisy Reverberant Speech Using Polynomial Matrix Eigenvalue Decomposition | 12
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization | 12
A Novel Approach for Improved Noise Reduction Performance in Feed-Forward Active Noise Control Systems With (Loudspeaker) Saturation Non-Linearity in the Secondary Path | 12
Contrastive Information Extraction With Generative Transformer | 12
Robust Voice Feature Selection Using Interval Type-2 Fuzzy AHP for Automated Diagnosis of Parkinson's Disease | 12
Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation | 12
Double-Cross-Correlation Processing for Blind Sampling-Rate and Time-Offset Estimation | 12
Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation | 12
Deep Normalization for Speaker Vectors | 12
Improved Lite Audio-Visual Speech Enhancement | 12
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression | 12
Mixed Source Sound Field Translation for Virtual Binaural Application With Perceptual Validation | 12
Detection of Multiple Steganography Methods in Compressed Speech Based on Code Element Embedding, Bi-LSTM and CNN With Attention Mechanisms | 12
Generating Images From Spoken Descriptions | 12
Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method | 12
A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement | 12
Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis | 12
Efficient Combinatorial Optimization for Word-Level Adversarial Textual Attack | 11
TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition | 11
Diverse Distractor Generation for Constructing High-Quality Multiple Choice Questions | 11
Multiple Acoustic Source Localization in Microphone Array Networks | 11
SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement | 11
Sparsity-Based Audio Declipping Methods: Selected Overview, New Algorithms, and Large-Scale Evaluation | 11
Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning | 11
Deformable Self-Attention for Text Classification | 11
Robust Subband Adaptive Filter Algorithms-Based Mixture Correntropy and Application to Acoustic Echo Cancellation | 11
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection | 11
Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory | 11
Differentiable Artificial Reverberation | 11
Hierarchical Neighbor Propagation With Bidirectional Graph Attention Network for Relation Prediction | 11
Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech | 11
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement | 11
A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy | 11
Improving Skip-Gram Embeddings Using BERT | 11
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis | 11
Spatial Active Noise Control Based on Kernel Interpolation of Sound Field | 11
Modeling Future Cost for Neural Machine Translation | 11
Distributed Combined Acoustic Echo Cancellation and Noise Reduction in Wireless Acoustic Sensor and Actuator Networks | 11
Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs | 10
A Time-Frequency Attention Module for Neural Speech Enhancement | 10
Proximal Normalized Subband Adaptive Filtering for Acoustic Echo Cancellation | 10
Scalable and Efficient Neural Speech Coding: A Hybrid Design | 10
Cognitive Load Estimation From Speech Commands to Simulated Aircraft | 10
Affine-Projection-Like Maximum Correntropy Criteria Algorithm for Robust Active Noise Control | 10
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models | 10
Sarcasm Detection with Commonsense Knowledge | 10
Meta-AF: Meta-Learning for Adaptive Filters | 10
Unsupervised Speech Segmentation and Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding | 10
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation | 10
End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy | 10
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture | 10
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning | 10
Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization | 10
Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers | 10
TDOA-Based Robust Sound Source Localization With Sparse Regularization in Wireless Acoustic Sensor Networks | 10
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation | 10
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network | 10
End-to-End Speech Recognition: A Survey | 10
Inference Skipping for More Efficient Real-Time Speech Enhancement With Parallel RNNs | 10
Extracting and Predicting Word-Level Style Variations for Speech Synthesis | 10
PROTOTYPE-TO-STYLE: Dialogue Generation With Style-Aware Editing on Retrieval Memory | 10
USEV: Universal Speaker Extraction With Visual Cue | 9
On the Design of Sparse Arrays With Frequency-Invariant Beam Pattern | 9
Conditioned Source Separation for Musical Instrument Performances | 9
SBSim: A Sentence-BERT Similarity-Based Evaluation Metric for Indian Language Neural Machine Translation Systems | 9
Wave Digital Modeling and Implementation of Nonlinear Audio Circuits With Nullors | 9
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training | 9
Bayesian Learning for Deep Neural Network Adaptation | 9
Relation Extraction in Dialogues: A Deep Learning Model Based on the Generality and Specialty of Dialogue Text | 9
Computation of Spherical Harmonic Representations of Source Directivity Based on the Finite-Distance Signature | 9
Controlling Elevation and Azimuth Beamwidths With Concentric Circular Microphone Arrays | 9
A Joint Model for Named Entity Recognition With Sentence-Level Entity Type Attentions | 9
Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment | 9
Improved Speech Enhancement Considering Speech PSD Uncertainty | 9
Mixture Representation Learning for Deep Speaker Embedding | 9
Selective Listening by Synchronizing Speech With Lips | 9
From LSAT: The Progress and Challenges of Complex Reasoning | 9
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting | 9
A Time-Domain Real-Valued Generalized Wiener Filter for Multi-Channel Neural Separation Systems | 9
Spatial Active Noise Control in Rooms Using Higher Order Sources | 8
Chinese Lexical Simplification | 8
Low Latency Speech Enhancement for Hearing Aids Using Deep Filtering | 8
Review and Arrange: Curriculum Learning for Natural Language Understanding | 8
Domain-Shift Conditioning Using Adaptable Filtering Via Hierarchical Embeddings for Robust Chinese Spell Check | 8
Directly Comparing the Listening Strategies of Humans and Machines | 8
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization | 8
Reconfigurable Nonuniform Filter Bank for Hearing Aid Systems | 8
Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction | 8
Exemplar-Based Emotive Speech Synthesis | 8
Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation | 8
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency | 8
Regularized Phrase-Based Topic Model for Automatic Question Classification With Domain-Agnostic Class Labels | 8
Counterfactually Fair Automatic Speech Recognition | 8
Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition | 8
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition | 8
DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays | 8
Squared Sine Adaptive Algorithm and Its Performance Analysis | 8
A Room Impulse Response Measurement Method Robust Towards Nonlinearities Based on Orthogonal Periodic Sequences | 8
Adaptive Convolution for Semantic Role Labeling | 8
Converting Foreign Accent Speech Without a Reference | 8
Nonlinear Spatial Filtering in Multichannel Speech Enhancement | 8
General Robust Subband Adaptive Filtering: Algorithms and Applications | 7
Privacy and Utility of X-Vector Based Speaker Anonymization | 7
Hybrid Speech and Text Analysis Methods for Speaker Change Detection | 7
A Digital Twin Architecture for Wireless Networked Adaptive Active Noise Control | 7
Deep Learning Approaches in Topics of Singing Information Processing | 7
Decoupled Multiple Speaker Direction-of-Arrival Estimator Under Reverberant Environments | 7
Preordering Encoding on Transformer for Translation | 7
The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants | 7
Music Source Separation With Band-Split RNN | 7
ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding | 7
Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition | 7
Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture | 7
RARS: Recognition of Audio Recording Source Based on Residual Neural Network | 7
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition | 7
Multi-Turn Dialogue Reading Comprehension With Pivot Turns and Knowledge | 7
Monaural Speech Separation Using Speaker Embedding From Preliminary Separation | 7
Localization-Driven Speech Enhancement in Noisy Multi-Speaker Hospital Environments Using Deep Learning and Meta Learning | 7
Flexibly Focusing on Supporting Facts, Using Bridge Links, and Jointly Training Specialized Modules for Multi-Hop Question Answering | 7
Bayesian Neural Network Language Modeling for Speech Recognition | 7
Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization | 7
Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism | 7
SIFTER: A Framework for Robust Rumor Detection | 7
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing | 7
Parametric Ambisonic Encoding of Arbitrary Microphone Arrays | 7
Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization | 7
On the Design of 3D Steerable Beamformers With Uniform Concentric Circular Microphone Arrays | 7
Reference Knowledgeable Network for Machine Reading Comprehension | 7
Generalized Hyperbolic Tangent Based Random Fourier Conjugate Gradient Filter for Nonlinear Active Noise Control | 7
Word-Region Alignment-Guided Multimodal Neural Machine Translation | 7
MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer With One Transformer VAE | 7
A Graph-to-Sequence Learning Framework for Summarizing Opinionated Texts | 7