EURASIP Journal on Audio Speech and Music Processing

Papers
(The median citation count of EURASIP Journal on Audio Speech and Music Processing is 1. The table below lists papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy | 50
Automated audio captioning: an overview of recent progress and new challenges | 21
Analysis of transition cost and model parameters in speaker diarization for meetings | 20
Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system | 18
Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling | 18
Data-based spatial audio processing | 18
Multi-source localization by using offset residual weight | 16
PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio | 15
Gated recurrent unit predictor model-based adaptive differential pulse code modulation speech decoder | 13
Residual feedback suppression with extended model-based postfilters | 12
Learning-based robust speaker counting and separation with the aid of spatial coherence | 11
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning | 11
UTran-DSR: a novel transformer-based model using feature enhancement for dysarthric speech recognition | 10
Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model | 10
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study | 10
Automatic detection of attachment style in married couples through conversation analysis | 9
Auxiliary function-based algorithm for blind extraction of a moving speaker | 9
Neural network-based non-intrusive speech quality assessment using attention pooling function | 9
Stripe-Transformer: deep stripe feature learning for music source separation | 8
MIRACLE—a microphone array impulse response dataset for acoustic learning | 8
Heterogeneous separation consistency training for adaptation of unsupervised speech separation | 8
dEchorate: a calibrated room impulse response dataset for echo-aware signal processing | 7
NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain | 7
Dynamically localizing multiple speakers based on the time-frequency domain | 6
On the selection of the number of beamformers in beamforming-based binaural reproduction | 6
A lightweight approach to real-time speaker diarization: from audio toward audio-visual data streams | 5
Modelling note’s pitch and duration in trained professional singers | 5
AUC optimization for deep learning-based voice activity detection | 5
Correction: N-dimensional N-microphone sound source localization | 5
Deep neural networks for automatic speech processing: a survey from large corpora to limited data | 5
Continuous lipreading based on acoustic temporal alignments | 5
Correction to: An integrated MVDR beamformer for speech enhancement using a local microphone array and external microphones | 5
An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction | 5
Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis | 4
Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks | 4
Microphone utility estimation in acoustic sensor networks using single-channel signal features | 4
Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios | 4
Correction: Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios | 4
Generating chord progression from melody with flexible harmonic rhythm and controllable harmonic density | 4
Correction: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection | 4
A latent rhythm complexity model for attribute-controlled drum pattern generation | 4
Quantifying headphone listening experience in virtual sound environments using distraction | 3
Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks | 3
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks | 3
Frequency-dependent auto-pooling function for weakly supervised sound event detection | 3
Masked multi-center angular margin loss for language recognition | 3
A multichannel learning-based approach for sound source separation in reverberant environments | 3
A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions | 3
Musical note onset detection based on a spectral sparsity measure | 3
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance? | 3
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement | 3
End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network | 2
A recursive expectation-maximization algorithm for speaker tracking and separation | 2
Compression of room impulse responses for compact storage and fast low-latency convolution | 2
Efficient binaural rendering of spherical microphone array data by linear filtering | 2
SVQ-MAE: an efficient speech pre-training framework with constrained computational resources | 2
Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms | 2
Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization | 2
Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification | 2
Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning | 2
End-to-end training of acoustic scene classification using distributed sound-to-light conversion devices: verification through simulation experiments | 2
U2-VC: one-shot voice conversion using two-level nested U-structure | 2
RPCA-DRNN technique for monaural singing voice separation | 2
Voice activity detection in the presence of transient based on graph | 2
Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach | 2
An integrated MVDR beamformer for speech enhancement using a local microphone array and external microphones | 2
Recognition of target domain Japanese speech using language model replacement | 2
A review of infant cry analysis and classification | 2
Online distributed waveform-synchronization for acoustic sensor networks with dynamic topology | 2
Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization | 2
A simplified and controllable model of mode coupling for addressing nonlinear phenomena in sound synthesis processes | 2
Comparative evaluation of interpolation methods for the directivity of musical instruments | 2
Steered Response Power for Sound Source Localization: a tutorial review | 2
Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics | 2
Spherical harmonic covariance and magnitude function encodings for beamformer design | 2
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting | 2
Points2Sound: from mono to binaural audio using 3D point cloud scenes | 2
Explicit-memory multiresolution adaptive framework for speech and music separation | 2
Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction | 2
A framework for the acoustic simulation of passing vehicles using variable length delay lines | 1
Beyond the Big Five personality traits for music recommendation systems | 1
Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices | 1
Sound field reconstruction using neural processes with dynamic kernels | 1
Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models | 1
Accent modification for speech recognition of non-native speakers using neural style transfer | 1
Attention mechanism combined with residual recurrent neural network for sound event detection and localization | 1
Comparison of semi-supervised deep learning algorithms for audio classification | 1
Black-box adversarial attacks through speech distortion for speech emotion recognition | 1
Sound event triage: detecting sound events considering priority of classes | 1
Sampling the user controls in neural modeling of audio devices | 1
Predominant audio source separation in polyphonic music | 1
Estimation of playable piano fingering by pitch-difference fingering match model | 1
An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment | 1
A large TV dataset for speech and music activity detection | 1
Exploration of Whisper fine-tuning strategies for low-resource ASR | 1
Significance of relative phase features for shouted and normal speech classification | 1
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech | 1
Paralinguistic and spectral feature extraction for speech emotion classification using machine learning techniques | 1
Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation | 1
Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition | 1
Training audio transformers for cover song identification | 1
Performance evaluation of perceptible impulsive noise detection methods based on auditory models | 1
Three-stage training and orthogonality regularization for spoken language recognition | 1
Time-domain adaptive attention network for single-channel speech separation | 1
DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations | 1
Sound recurrence analysis for acoustic scene classification | 1
Fake speech detection using VGGish with attention block | 1
Improving sign-algorithm convergence rate using natural gradient for lossless audio compression | 1
Multi-rate modulation encoding via unsupervised learning for audio event detection | 1
Multi-channel neural audio decorrelation using generative adversarial networks | 1
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization | 1
Feature compensation based on independent noise estimation for robust speech recognition | 1
Point neuron learning: a new physics-informed neural network architecture | 1
Speech emotion recognition based on emotion perception | 1
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection | 1
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders | 1