EURASIP Journal on Audio Speech and Music Processing

Papers
(The median citation count of EURASIP Journal on Audio Speech and Music Processing is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts (at most 250 papers). It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
A review of infant cry analysis and classification | 50
Accent modification for speech recognition of non-native speakers using neural style transfer | 21
Automated audio captioning: an overview of recent progress and new challenges | 18
End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network | 18
Dynamically localizing multiple speakers based on the time-frequency domain | 18
Progressive loss functions for speech enhancement with deep neural networks | 17
Performance vs. hardware requirements in state-of-the-art automatic speech recognition | 16
An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction | 16
Auxiliary function-based algorithm for blind extraction of a moving speaker | 15
dEchorate: a calibrated room impulse response dataset for echo-aware signal processing | 13
Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition | 12
Towards cross-modal pre-training and learning tempo-spatial characteristics for audio recognition with convolutional and recurrent neural networks | 11
Acoustic DOA estimation using space alternating sparse Bayesian learning | 11
Components loss for neural networks in mask-based speech enhancement | 10
MetaMGC: a music generation framework for concerts in metaverse | 10
Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices | 9
Geometry calibration in wireless acoustic sensor networks utilizing DoA and distance information | 9
Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music | 9
Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation | 9
MYRiAD: a multi-array room acoustic database | 8
Steerable differential beamformers with planar microphone arrays | 8
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech | 8
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling | 8
Comparison of semi-supervised deep learning algorithms for audio classification | 7
AUC optimization for deep learning-based voice activity detection | 7
Time–frequency scattering accurately models auditory similarities between instrumental playing techniques | 7
Deep neural networks for automatic speech processing: a survey from large corpora to limited data | 7
Beyond the Big Five personality traits for music recommendation systems | 6
Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms | 6
DOANet: a deep dilated convolutional neural network approach for search and rescue with drone-embedded sound source localization | 6
Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit | 5
Single-channel speech enhancement based on joint constrained dictionary learning | 5
NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain | 5
A simulation study on optimal scores for speaker recognition | 5
Review of methods for coding of speech signals | 5
Estimation of playable piano fingering by pitch-difference fingering match model | 5
Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system | 5
Depression-level assessment from multi-lingual conversational speech data using acoustic and text features | 5
Speech emotion recognition based on emotion perception | 5
RPCA-DRNN technique for monaural singing voice separation | 5
An online algorithm for echo cancellation, dereverberation and noise reduction based on a Kalman-EM Method | 4
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks | 4
Audio source separation by activity probability detection with maximum correlation and simplex geometry | 4
A survey of technologies for automatic Dysarthric speech recognition | 4
Stripe-Transformer: deep stripe feature learning for music source separation | 4
A CNN-based approach to identification of degradations in speech signals | 4
Comparative evaluation of interpolation methods for the directivity of musical instruments | 4
Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy | 4
Trainable windows for SincNet architecture | 3
Points2Sound: from mono to binaural audio using 3D point cloud scenes | 3
U2-VC: one-shot voice conversion using two-level nested U-structure | 3
Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling | 3
Sparse pursuit and dictionary learning for blind source separation in polyphonic music recordings | 3
Attention mechanism combined with residual recurrent neural network for sound event detection and localization | 3
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement | 3
Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios | 3
A large TV dataset for speech and music activity detection | 3
Time-domain adaptive attention network for single-channel speech separation | 3
Dynamic out-of-vocabulary word registration to language model for speech recognition | 3
Deep semantic learning for acoustic scene classification | 2
Pronunciation augmentation for Mandarin-English code-switching speech recognition | 2
Correction: Trainable windows for SincNet architecture | 2
A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices | 2
Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes | 2
Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources | 2
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection | 2
Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization | 2
Masked multi-center angular margin loss for language recognition | 2
An integrated MVDR beamformer for speech enhancement using a local microphone array and external microphones | 2
Data-based spatial audio processing | 2
Deep learning-based wave digital modeling of rate-dependent hysteretic nonlinearities for virtual analog applications | 2
Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation | 2
Cross-corpus speech emotion recognition using subspace learning and domain adaption | 2
Speech emotion recognition based on Graph-LSTM neural network | 2
Feature compensation based on independent noise estimation for robust speech recognition | 2
Black-box adversarial attacks through speech distortion for speech emotion recognition | 2
A recursive expectation-maximization algorithm for speaker tracking and separation | 2
Frequency-dependent auto-pooling function for weakly supervised sound event detection | 2
On the selection of the number of beamformers in beamforming-based binaural reproduction | 2
Automatic detection of attachment style in married couples through conversation analysis | 2
Channel and temporal-frequency attention UNet for monaural speech enhancement | 2
Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition | 2
Sound field reconstruction using neural processes with dynamic kernels | 2
Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices | 2
Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization | 2
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders | 2
A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions | 2
Musical note onset detection based on a spectral sparsity measure | 2
Multi-source localization by using offset residual weight | 2
PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio | 1
Vulnerability issues in Automatic Speaker Verification (ASV) systems | 1
Residual feedback suppression with extended model-based postfilters | 1
Cascade algorithms for combined acoustic feedback cancelation and noise reduction | 1
Language agnostic missing subtitle detection | 1
GLFER-Net: a polyphonic sound source localization and detection network based on global-local feature extraction and recalibration | 1
A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence | 1
Multichannel speaker interference reduction using frequency domain adaptive filtering | 1
Paralinguistic and spectral feature extraction for speech emotion classification using machine learning techniques | 1
Sound event triage: detecting sound events considering priority of classes | 1
Improving sign-algorithm convergence rate using natural gradient for lossless audio compression | 1
Explicit-memory multiresolution adaptive framework for speech and music separation | 1
A latent rhythm complexity model for attribute-controlled drum pattern generation | 1
Heterogeneous separation consistency training for adaptation of unsupervised speech separation | 1
W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision | 1
Learning-based robust speaker counting and separation with the aid of spatial coherence | 1
Nonlinear residual echo suppression based on dual-stream DPRNN | 1
Automatic discrimination between front and back ensemble locations in HRTF-convolved binaural recordings of music | 1
Forward-backward recursive expectation-maximization for concurrent speaker tracking | 1
Interaural time difference individualization in HRTF by scaling through anthropometric parameters | 1
Parallel processing of distributed beamforming and multichannel linear prediction for speech denoising and dereverberation in wireless acoustic sensor networks | 1
Towards multidimensional attentive voice tracking—estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling | 1
Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy | 1
Three-stage training and orthogonality regularization for spoken language recognition | 1
A multichannel learning-based approach for sound source separation in reverberant environments | 1
Correction to: An integrated MVDR beamformer for speech enhancement using a local microphone array and external microphones | 1
Neural network-based non-intrusive speech quality assessment using attention pooling function | 1
Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments | 1
Analysis of transition cost and model parameters in speaker diarization for meetings | 1
An MMSE graph spectral magnitude estimator for speech signals residing on an undirected multiple graph | 1
Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity | 1
Dual input neural networks for positional sound source localization | 1
The whole is greater than the sum of its parts: improving music source separation by bridging networks | 1
Fake speech detection using VGGish with attention block | 1
Multi-rate modulation encoding via unsupervised learning for audio event detection | 1
Automatic dysarthria detection and severity level assessment using CWT-layered CNN model | 1
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting | 1
Quantifying headphone listening experience in virtual sound environments using distraction | 1