OOIR: Observatory of International Research

Papers

(The median citation count of Speech Communication is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
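The selection rule described in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OOIR's actual implementation: the function name `select_papers`, the tuple layout, and the sample records are hypothetical. The sketch also assumes that the median is taken over the papers in the date window, and that the threshold is inclusive, since the table contains entries with exactly 2 citations.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, max_rows=250):
    """Return papers in [start, end] whose citation count reaches the median.

    `papers` is a list of (title, citations, published) tuples.
    This layout is an assumption made for the sketch.
    """
    # Keep only papers published inside the date window.
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return []
    # Median citation count of the windowed papers (assumed definition).
    threshold = median(p[1] for p in in_window)
    # Inclusive comparison, assumed because the table lists papers at the median.
    kept = [p for p in in_window if p[1] >= threshold]
    # Most-cited first, capped at max_rows.
    kept.sort(key=lambda p: p[1], reverse=True)
    return kept[:max_rows]


# Hypothetical records, not taken from the table below.
sample = [
    ("Paper A", 10, date(2023, 1, 1)),
    ("Paper B", 2, date(2024, 5, 1)),
    ("Paper C", 0, date(2025, 3, 1)),
    ("Paper D", 50, date(2020, 1, 1)),  # outside the window
]
result = select_papers(sample, date(2022, 6, 1), date(2026, 6, 1))
```

With these sample records, "Paper D" falls outside the window and "Paper C" falls below the median of the remaining three, so only "Paper A" and "Paper B" are kept.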

Article	Citations
A comprehensive study on supervised single-channel noisy speech separation with multi-task learning	69
Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners	63
Editorial Board	58
Subband fusion of complex spectrogram for fake speech detection	58
Automatic classification of vocal intensity categories from amplitude-normalized speech signals by comparing acoustic features and classifier models	52
Editorial Board	48
Data augmentation for speech separation	38
A corpus of audio-visual recordings of linguistically balanced, Danish sentences for speech-in-noise experiments	36
An introduction to pluricentric languages in speech science and technology	32
Self-Supervised Learning for Speaker Recognition: A study and review	32
Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition	31
A robust temporal map of speech monitoring from planning to articulation	30
A novel distortion-tolerant speech encryption scheme for secure voice communication	29
Vocal emotion perception in Mandarin-speaking older adults with hearing loss	28
Editorial Board	27
Editorial Board	25
The prosody of theme, rheme and focus in Egyptian Arabic: A quantitative investigation of tunes, configurations and speaker variability	23
Topological data analysis of human vowels: Persistent homologies across representation spaces	22
Blood pressure monitoring from naturally recorded speech sounds: advancements and future prospects	22
Editorial Board	21
HC-APNet: Harmonic Compensation Auditory Perception Network for low-complexity speech enhancement	20
Frequent-words analysis for forensic speaker comparison	18
Investigating prosodic entrainment from global conversations to local turns and tones in Mandarin conversations	18
Editorial Board	18
Expectation of speech style improves audio-visual perception of English vowels	17
"I said simPle, not symBol!" Is clear speech tailored to the listener's feedback	17
Two-stage UNet with channel and temporal-frequency attention for multi-channel speech enhancement	17
Efficient acoustic feature transformation in mismatched environments using a Guided-GAN	17
Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language	16
Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency	15
Evaluating the effects of continuous pitch and speech tempo modifications on perceptual speaker verification performance by familiar and unfamiliar listeners	15
Editorial Board	14
Real-time intelligibility affects the realization of French word-final schwa	14
A study of correlation between physiological process of articulation and emotions on Mandarin Chinese	14
Paradigm fusion learning from overt and silent chinese speech based on pseudo-siamese multiscale capsule neural network	14
Towards unsupervised speech recognition without pronunciation models	13
Influence of speech-in-noise perception, gender, and age on lipreading ability for monosyllabic words	13
Vocal characteristics of accuracy in eyewitness testimony	13
Learning and controlling the source-filter representation of speech with a variational autoencoder	13
Hand gesture realisation of contrastive focus in real-time whisper-to-speech synthesis: Investigating the transfer from implicit to explicit control of intonation	13
Sequential perception of tone and focus in parallel–A computational simulation	12
A new universal camouflage attack algorithm for intelligent speech system	12
The effect of fluency strategy training on interpreter trainees’ speech fluency: Does content familiarity matter?	12
Prosody in narratives: An exploratory study with children with sex chromosomes trisomies	12
Dynamic graph learning with gated convolutions for single-channel speech separation	12
Effects of voice onset time and place of articulation on perception of dichotic Turkish syllables	12
Disordered speech recognition considering low resources and abnormal articulation	11
Speechformer-CTC: Sequential modeling of depression detection with speech temporal classification	11
Adaptive weighting in a transformer framework for multimodal emotion recognition	11
Editorial Board	11
Multi-modal co-learning for silent speech recognition based on ultrasound tongue images	11
Exploiting Locality Sensitive Hashing - Clustering and gloss feature for sign language production	11
Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean	11
Coarse-to-fine speech separation method in the time-frequency domain	11
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions	10
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition	10
Nasal coarticulation in Lombard speech	10
Perceptual effects of interpolated Austrian and German standard varieties	10
Tone-syllable synchrony in Mandarin: New evidence and implications	10
Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment	10
Progressive channel fusion for more efficient TDNN on speaker verification	10
Modulation spectral features for speech emotion recognition using deep neural networks	10
FinnAffect: An affective speech corpus for spontaneous Finnish	10
Pathological voice classification using MEEL features and SVM-TabNet model	9
Editorial Board	9
Role of language familiarity in understanding speech in noise under various acoustic environments	9
Efficient time-domain speech separation using short encoded sequence network	9
Arabic Automatic Speech Recognition: Challenges and Progress	9
Enhancing bone-conducted speech with spectrum similarity metric in adversarial learning	9
Automatic speech recognition technology to evaluate an audiometric word recognition test: A preliminary investigation	9
Speakers’ vocal expression of sexual orientation depends on experimenter gender	9
Combined approach to dysarthric speaker verification using data augmentation and feature fusion	9
A cross-modal attention model with contextual enhancements for speech emotion recognition	9
Differential constant-beamwidth beamforming with cube arrays	8
Cross-modal information fusion for voice spoofing detection	8
MC-Mamba: Cross-modal target speaker extraction model based on multiple consistency	8
Editorial Board	8
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech	8
One-shot emotional voice conversion based on feature separation	8
Accurate synthesis of dysarthric Speech for ASR data augmentation	8
Editorial Board	8
Assessing Cancer-Related Cognitive Impairment for breast cancer survivors with speech analysis	7
Learning transfer from singing to speech: Insights from vowel analyses in aging amateur singers and non-singers	7
CSLNSpeech: Solving the extended speech separation problem with the help of Chinese sign language	7
Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus	7
Editorial Board	7
Exploring LoRA variants to adapt whisper models for robust recognition of children’s speech	7
Controllable speech synthesis by learning discrete phoneme-level prosodic representations	7
Addressing the semi-open set dialect recognition problem under resource-efficient considerations	7
The perception of intonational peaks and valleys: The effects of plateaux, declination and experimental task	7
Categorization of patients affected with neurogenerative dysarthria among Hindi-speaking population and analyzing factors causing reduced speech intelligibility at the human-machine interface	7
Prosodic characteristics of deceptive picture descriptions in Finnish: Acoustics, beliefs, self-evaluations, and deception theories	7
Robust prosody modeling for synthetic speech detection	6
Effect of individual characteristics on impressions of one’s own recorded voice	6
Domain adaptation using non-parallel target domain corpus for self-supervised learning-based automatic speech recognition	6
MaTSE: A hybrid Mamba-Transformer model for monaural Speech Enhancement	6
Editorial Board	6
An adaptive autoregressive pre-whitener for speech and acoustic signals based on parametric NMF	6
The role of visual cues indicating onset times of target speech syllables in release from informational or energetic masking	6
Chinese speech intelligibility and speech intelligibility index for the elderly	6
Robustness of emotion recognition in dialogue systems: A study on third-party API integrations and black-box attacks	6
Editorial Board	6
Recursive Feature Diversity Network for audio super-resolution	6
The Role of Auditory and Visual Cues in the Perception of Mandarin Emotional Speech in Male Drug Addicts	6
Speech emotion recognition using energy based adaptive mode selection	5
Effects of hearing loss and audio-visual cues on children's speech processing speed	5
Coalescence of fractionally derived and statistically tuned correlated vocal tract and excitation source gaussian mixture super-vectors for improved modelling of speech emotion	5
Exploring one-formant vowel perception in real speech corpus through typicality assessment	5
Leveraging Kolmogorov-Arnold networks for voice liveness detection in anti-spoofing systems	5
CFAD: A Chinese dataset for fake audio detection	5
Enhanced cross-modal parallel training for improving end-to-end accented speech recognition	5
PLDE: A lightweight pooling layer for spoken language recognition	5
Multimodal Arabic emotion recognition using deep learning	5
The role of prosody and hand gestures in the perception of boundaries in speech	5
MFFN: Multi-level Feature Fusion Network for monaural speech separation	4
Deep learning based stage-wise two-dimensional speaker localization with large ad-hoc microphone arrays	4
The discriminative capacity of English segments in forensic speaker comparison	4
Editorial Board	4
Object detection for cross-linguistic vowel analysis: A novel language-agnostic method for forensic speech processing	4
Development of a hybrid word recognition system and dataset for the Azerbaijani Sign Language dactyl alphabet	4
Understanding perception and production in loan adaptation: Cases of English loans in Mandarin	4
Some properties of mental speech preparation as revealed by self-monitoring	4
On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement	4
Lateral channel dynamics and F3 modulation: Quantifying para-sagittal articulation in Australian English /l/	4
Spatio-temporal masked autoencoder-based phonetic segments classification from ultrasound	4
TranSTYLer: Multimodal behavioural style transfer for facial and body gestures generation	4
The effect of clear speech to foreign-sounding interlocutors on native listeners’ perception of intelligibility	4
One-class network leveraging spectro-temporal features for generalized synthetic speech detection	4
Editorial Board	4
The dependence of accommodation processes on conversational experience	4
APIN: Amplitude- and phase-aware interaction network for speech emotion recognition	4
Zero-shot voice conversion based on feature disentanglement	4
Editorial Board	4
Editorial Board	4
Editorial Board	4
Robust beamforming guided by prior directional information for speech enhancement in ship engine rooms	4
The role of probability and duration in perception of speech sounds	4
Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting	4
Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion	4
Corrigendum to “FinnAffect: An affective speech corpus for spontaneous Finnish” [Speech Communication 175 (2025) 103327]	3
Model predictive PESQ-ANFIS/FUZZY C-MEANS for image-based speech signal evaluation	3
Editorial Board	3
Towards robust heart failure detection in digital telephony environments by utilizing transformer-based codec inversion	3
Source and filter characteristics based transfer learning for dysarthria severity classification in amyotrophic lateral sclerosis	3
Analysis of forced aligner performance on L2 English speech	3
LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild	3
Phonology-guided speech-to-speech translation for African languages	3
Editorial Board	3
Comparing Levenshtein distance and dynamic time warping in predicting listeners’ judgments of accent distance	3
LORT: Locally refined convolution and Taylor transformer for monaural speech enhancement	3
Self-supervised speech denoising using only noisy audio signals	3
Trading accuracy for fluency? An investigation of word retrieval difficulties in connected speech	3
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale	3
Comparative analysis of various feature extraction techniques for classification of speech disfluencies	3
CLESSR-VC: Contrastive learning enhanced self-supervised representations for one-shot voice conversion	3
Towards automating the Frenchay dysarthria assessment: Can neural phoneme posteriorgrams inform the analysis of dysarthric speech?	3
Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration	3
Discriminative speaker embedding with serialized multi-layer multi-head attention	3
The Ohio Child Speech Corpus	3
Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition	3
An analysis of prosodic boundaries across speaking styles in two varieties of German	3
Leveraging audible and inaudible signals for pronunciation training by sensing articulation through a smartphone	3
Analysis-by-synthesis based training target extraction of the DNN for noise masking	3
Speech emotion recognition approaches: A systematic review	3
Choosing only the best voice imitators: Top-K many-to-many voice conversion with StarGAN	3
Do all features matter? Layer-wise feature probing of self-supervised speech models for dysarthria severity classification	3
The cross-linguistics perception of liquids: Motivation for the superclass	2
Fusion-based speech emotion classification using two-stage feature selection	2
Optimization-based planning of speech articulation using general Tau Theory	2
Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated	2
TSIP-Net: No-reference speech intelligibility prediction in the presence of competing speech	2
Editorial Board	2
Non-native disadvantage in spoken word recognition is due to lexical knowledge and not type/level of noise	2
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement	2
Non-native (Czech and Russian L1) auditor assessments of some English suprasegmental features: Prominence and pitch accents	2
Cross-corpus speech emotion recognition using semi-supervised domain adaptation network	2
Symmetric and asymmetric Gaussian weighted linear prediction for voice inverse filtering	2
A bimodal network based on Audio–Text-Interactional-Attention with ArcFace loss for speech emotion recognition	2
Deep temporal clustering features for speech emotion recognition	2
Language fusion via adapters for low-resource speech recognition	2
Phonological level wav2vec2-based Mispronunciation Detection and Diagnosis method	2
The effects of informational and energetic/modulation masking on the efficiency and ease of speech communication across the lifespan	2
Speech intelligibility prediction using generalized ESTOI with fine-tuned parameters	2
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones	2
Dysarthria severity classification using multi-head attention and multi-task learning	2
Individual differences in language acquisition: The impact of study abroad on native English speakers learning Spanish	2
The combined effects of bilingualism and musicianship on listeners’ perception of non-native lexical tones	2
A study on the layer-wise transferability of self-supervised learning features for children’s speech processing tasks	2
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions	2
Emotions recognition in audio signals using an extension of the latent block model	2
Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review	2
Predicting speech intelligibility in older adults for speech enhancement using the Gammachirp Envelope Similarity Index, GESI	2
Single-channel speech separation using soft-minimum permutation invariant training	2
Enhancement of formant regions in magnitude spectra to develop children’s KWS system in zero resource scenario	2
Phonetic and phonological sound changes in an agent-based model	2
Comparing neural network architectures for non-intrusive speech quality prediction	2
Speakers of different L1 dialects with acoustically proximal vowel systems present with similar nonnative speech perception abilities: Data from Greek listeners of Dutch	2
Diagnosis-aware multitask fine-tuning of Whisper for dysarthric speech recognition	2
DNN controlled adaptive front-end for replay attack detection systems	2
Editorial Board	2
Editorial Board	2