EURASIP Journal on Audio Speech and Music Processing

Papers
(The median citation count of EURASIP Journal on Audio Speech and Music Processing is 1. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. The table covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article	Citations
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning	24
MIRACLE—a microphone array impulse response dataset for acoustic learning	22
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks	21
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement	18
Generating chord progression from melody with flexible harmonic rhythm and controllable harmonic density	18
A simplified and controllable model of mode coupling for addressing nonlinear phenomena in sound synthesis processes	17
Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification	17
Compression of room impulse responses for compact storage and fast low-latency convolution	16
End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network	15
Parameter-efficient adaptation with multi-channel adversarial training for far-field speech recognition	13
Estimation of playable piano fingering by pitch-difference fingering match model	12
Sound recurrence analysis for acoustic scene classification	12
Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization	12
Attention mechanism combined with residual recurrent neural network for sound event detection and localization	11
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech	11
Sound field reconstruction using neural processes with dynamic kernels	10
Three-stage training and orthogonality regularization for spoken language recognition	10
Feature compensation based on independent noise estimation for robust speech recognition	10
Enhancing Speaker Recognition with CRET Model: a fusion of CONV2D, RESNET and ECAPA-TDNN	10
Silent speech recognition using visual cascading fusion of tongue-lip movements based on pre-trained and fine-tuned model	9
Multi-rate modulation encoding via unsupervised learning for audio event detection	9
Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control	8
The whole is greater than the sum of its parts: improving music source separation by bridging networks	7
Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit	7
Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity	7
Vulnerability issues in Automatic Speaker Verification (ASV) systems	7
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape	7
Parallel processing of distributed beamforming and multichannel linear prediction for speech denoising and dereverberation in wireless acoustic sensor networks	7
Pronunciation augmentation for Mandarin-English code-switching speech recognition	6
Residual feedback suppression with extended model-based postfilters	6
Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos	6
Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system	6
W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision	6
Correction: N-dimensional N-microphone sound source localization	5
Auxiliary function-based algorithm for blind extraction of a moving speaker	5
Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy	5
An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction	5
dEchorate: a calibrated room impulse response dataset for echo-aware signal processing	5
Automatic detection of attachment style in married couples through conversation analysis	5
Neural network-based non-intrusive speech quality assessment using attention pooling function	5
Data-based spatial audio processing	4
Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation	4
Automatic dysarthria detection and severity level assessment using CWT-layered CNN model	4
Data-driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines	4
Single-microphone speaker separation and voice activity detection in noisy and reverberant environments	4
Robust and early howling detection based on a sparsity measure	4
Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization	4
DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations	4
Multi-scale Information Aggregation for Spoofing Detection	4
Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models	4
Recognition of target domain Japanese speech using language model replacement	4
Masked multi-center angular margin loss for language recognition	4
Training audio transformers for cover song identification	4
Performance evaluation of perceptible impulsive noise detection methods based on auditory models	4
Optimal sensor placement for the spatial reconstruction of sound fields	4
Fake speech detection using VGGish with attention block	4
Blind extraction of guitar effects through blind system inversion and neural guitar effect modeling	3
Language agnostic missing subtitle detection	3
An MMSE graph spectral magnitude estimator for speech signals residing on an undirected multiple graph	3
On the selection of the number of beamformers in beamforming-based binaural reproduction	3
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling	3
Guest editorial: AI for computational audition—sound and music processing	3
Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array	3
YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation	3
Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling	3
Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios	3
Significance of relative phase features for shouted and normal speech classification	3
A survey of technologies for automatic Dysarthric speech recognition	3
Multi-encoder attention-based architectures for sound recognition with partial visual assistance	3
Deep learning-based wave digital modeling of rate-dependent hysteretic nonlinearities for virtual analog applications	3
PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio	3
Optimizing tiny colorless feedback delay networks	3
AI-based Chinese-style music generation from video content: a study on cross-modal analysis and generation methods	3
Automated audio captioning: an overview of recent progress and new challenges	2
Steered Response Power for Sound Source Localization: a tutorial review	2
SVQ-MAE: an efficient speech pre-training framework with constrained computational resources	2
Exploration of Whisper fine-tuning strategies for low-resource ASR	2
Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy	2
Points2Sound: from mono to binaural audio using 3D point cloud scenes	2
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?	2
Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning	2
Comparative evaluation of interpolation methods for the directivity of musical instruments	2
Black-box adversarial attacks through speech distortion for speech emotion recognition	2
Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes	2
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization	2
Microphone utility estimation in acoustic sensor networks using single-channel signal features	2
Explicit-memory multiresolution adaptive framework for speech and music separation	2
Multi-source localization by using offset residual weight	2
Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks	2
Singing to speech conversion with generative flow	2
Multi-pitch estimation with polyphony per instrument information for Western classical and electronic music	2
Efficient binaural rendering of spherical microphone array data by linear filtering	2
A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions	2
Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music	1
Lightweight target speaker separation network based on joint training	1
Improved capsule routing for weakly labeled sound event detection	1
Music time signature detection using ResNet18	1
An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment	1
Adaptive multi-task learning for speech to text translation	1
Speech emotion recognition based on emotion perception	1
Improving sign-algorithm convergence rate using natural gradient for lossless audio compression	1
Beyond the Big Five personality traits for music recommendation systems	1
A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models	1
Quantifying headphone listening experience in virtual sound environments using distraction	1
Heterogeneous separation consistency training for adaptation of unsupervised speech separation	1
Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge	1
Interaural time difference individualization in HRTF by scaling through anthropometric parameters	1
MUSIB: musical score inpainting benchmark	1
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders	1
Correction: Trainable windows for SincNet architecture	1
A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices	1
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting	1
Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices	1
Spherical harmonic covariance and magnitude function encodings for beamformer design	1
Voice activity detection in the presence of transient based on graph	1
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study	1
Deep semantic learning for acoustic scene classification	1
Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition	1
Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics	1
Exploring the power of pure attention mechanisms in blind room parameter estimation	1
Automatic classification of the physical surface in sound uroflowmetry using machine learning methods	1
A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence	1
Cross-corpus speech emotion recognition using subspace learning and domain adaption	1
Single-channel speech enhancement based on joint constrained dictionary learning	1
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection	1
A speech recognition method with enhanced transformer decoder	1