OOIR: Observatory of International Research

Papers

(The median citation count of EURASIP Journal on Audio, Speech, and Music Processing is 3. The table below lists the papers whose CrossRef citation counts are at or above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)

Article	Citations
MIRACLE—a microphone array impulse response dataset for acoustic learning	49
Hybrid lightweight temporal-frequency analysis network for multi-channel speech enhancement	44
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning	42
Generating chord progression from melody with flexible harmonic rhythm and controllable harmonic density	30
Advancing guitar emotion recognition through audio data augmentation to enhance smart musical instruments	30
Domain-weighted transfer learning and discriminative embeddings for low-resource speaker verification	29
A simplified and controllable model of mode coupling for addressing nonlinear phenomena in sound synthesis processes	29
Speech-dependent data augmentation for own voice reconstruction with hearable microphones in noisy environments	25
Compression of room impulse responses for compact storage and fast low-latency convolution	23
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks	21
Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement	20
Parameter-efficient adaptation with multi-channel adversarial training for far-field speech recognition	19
Attention mechanism combined with residual recurrent neural network for sound event detection and localization	17
Parameter optimisation for a physical model of the vocal system	17
Three-stage training and orthogonality regularization for spoken language recognition	16
Enhancing Speaker Recognition with CRET Model: a fusion of CONV2D, RESNET and ECAPA-TDNN	15
Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech	15
Sound recurrence analysis for acoustic scene classification	15
Investigations on higher-order spherical harmonic input features for deep learning-based multiple speaker detection and localization	15
Multi-rate modulation encoding via unsupervised learning for audio event detection	14
Sound field reconstruction using neural processes with dynamic kernels	14
Silent speech recognition using visual cascading fusion of tongue-lip movements based on pre-trained and fine-tuned model	14
Parallel processing of distributed beamforming and multichannel linear prediction for speech denoising and dereverberation in wireless acoustic sensor networks	13
Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control	13
Real-time playing technique recognition embedded in a smart acoustic guitar	12
Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity	12
Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos	11
The whole is greater than the sum of its parts: improving music source separation by bridging networks	11
AudioSet-tools: a Python framework for taxonomy-aware AudioSet curation and reproducible audio research	11
Vulnerability issues in Automatic Speaker Verification (ASV) systems	11
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape	11
W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision	10
Correction: N-dimensional N-microphone sound source localization	9
Masked multi-center angular margin loss for language recognition	9
Automatic detection of attachment style in married couples through conversation analysis	9
Performance evaluation of perceptible impulsive noise detection methods based on auditory models	9
Data-based spatial audio processing	9
Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization	9
Training audio transformers for cover song identification	8
Robust and early howling detection based on a sparsity measure	8
DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations	8
Recognition of target domain Japanese speech using language model replacement	8
Guest editorial: AI for computational audition—sound and music processing	7
Optimal sensor placement for the spatial reconstruction of sound fields	7
Single-microphone speaker separation and voice activity detection in noisy and reverberant environments	7
Comparative study of state-based neural networks for virtual analog audio effects modeling	7
Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models	7
Multi-scale Information Aggregation for Spoofing Detection	6
AI-based Chinese-style music generation from video content: a study on cross-modal analysis and generation methods	6
A survey of technologies for automatic Dysarthric speech recognition	6
Significance of relative phase features for shouted and normal speech classification	6
Multilingual speech-to-vocal tract visualization using deep learning for pronunciation training	6
Fake speech detection using VGGish with attention block	6
Automatic dysarthria detection and severity level assessment using CWT-layered CNN model	6
Data-driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines	6
Coded speech enhancement using auxiliary utterance-level information	6
Blind extraction of guitar effects through blind system inversion and neural guitar effect modeling	5
YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation	5
Deep learning-based wave digital modeling of rate-dependent hysteretic nonlinearities for virtual analog applications	5
Deep room impulse response completion	5
Explicit-memory multiresolution adaptive framework for speech and music separation	5
Multi-encoder attention-based architectures for sound recognition with partial visual assistance	5
Parametric virtual microphone techniques for sound field reconstruction with early reflection modeling	5
Automated audio captioning: an overview of recent progress and new challenges	5
Optimizing tiny colorless feedback delay networks	5
PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio	5
Acoustic scene classification using inter- and intra-subarray spatial features in distributed microphone array	5
Language agnostic missing subtitle detection	5
An MMSE graph spectral magnitude estimator for speech signals residing on an undirected multiple graph	5
Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios	5
Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling	5
Black-box adversarial attacks through speech distortion for speech emotion recognition	4
Exploration of Whisper fine-tuning strategies for low-resource ASR	4
Lightweight target speaker separation network based on joint training	4
Synthesis of soundfields through irregular loudspeaker arrays based on convolutional neural networks	4
Steered Response Power for Sound Source Localization: a tutorial review	4
Multi-pitch estimation with polyphony per instrument information for Western classical and electronic music	4
Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy	4
Enhancing drone audition with rotor-conditioned deep models	4
ICRCycleGAN-VC: a robust one-to-one voice conversion method based on CycleGAN and inception-ResNet blocks	4
Points2Sound: from mono to binaural audio using 3D point cloud scenes	4
Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning	4
Singing to speech conversion with generative flow	4
SVQ-MAE: an efficient speech pre-training framework with constrained computational resources	4
Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes	4
MUSIB: musical score inpainting benchmark	4
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?	4
Microphone utility estimation in acoustic sensor networks using single-channel signal features	4
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization	4
Analysis of spatial filtering in neural spatiospectral filters and its dependence on training target characteristics	3
Exploring the power of pure attention mechanisms in blind room parameter estimation	3
Polygraph and audio synchronization applied to apnea event analysis based on non-negative matrix factorization	3
A speech recognition method with enhanced transformer decoder	3
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders	3
Beyond the Big Five personality traits for music recommendation systems	3
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting	3
Deep semantic learning for acoustic scene classification	3
Music time signature detection using ResNet18	3
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study	3
Voice activity detection in the presence of transient based on graph	3
MIDI music plagiarism detection method based on feature similarity learning	3
Head information bottleneck (HIB): leveraging information bottleneck for efficient transformer head attribution and pruning	3
Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge	3
Heterogeneous separation consistency training for adaptation of unsupervised speech separation	3
Quantifying headphone listening experience in virtual sound environments using distraction	3
An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment	3
DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection	3
Speaker embedding loss for end-to-end speaker diarization without external embedding networks	3