Speech Communication

Papers
(The median citation count of Speech Communication is 1. The table below lists papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Progress of machine learning based automatic phoneme recognition and its prospect | 81
A comprehensive study on supervised single-channel noisy speech separation with multi-task learning | 68
Phase unwrapping based packet loss concealment using deep neural networks | 61
Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners | 46
Facemask occlusion's impact on L2 listening comprehension | 45
Subband fusion of complex spectrogram for fake speech detection | 44
Editorial Board | 44
Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition | 40
Editorial Board | 34
Editorial Board | 27
Data augmentation for speech separation | 27
Articulation rates’ inter-correlations and discriminating powers in an English speech corpus | 26
Editorial Board | 23
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt | 23
A corpus of audio-visual recordings of linguistically balanced, Danish sentences for speech-in-noise experiments | 22
A novel distortion-tolerant speech encryption scheme for secure voice communication | 20
A robust temporal map of speech monitoring from planning to articulation | 20
An introduction to pluricentric languages in speech science and technology | 20
NHSS: A speech and singing parallel database | 20
Perceptual asymmetry between pitch peaks and valleys | 19
Editorial Board | 17
Vocal emotion perception in Mandarin-speaking older adults with hearing loss | 17
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces | 17
Editorial Board | 17
The Relationship Between Turn-taking, Vocal Pitch Synchrony, and Rapport in Creative Problem-Solving Communication | 15
Investigating a neural all pass warp in modern TTS applications | 14
Neural speech-rate conversion with multispeaker WaveNet vocoder | 14
The prosody of theme, rheme and focus in Egyptian Arabic: A quantitative investigation of tunes, configurations and speaker variability | 14
Speech intelligibility deterioration for normal hearing and hearing impaired patients with different types of tinnitus | 14
Editorial Board | 14
Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language | 13
The influence of task engagement on phonetic convergence | 13
Efficient acoustic feature transformation in mismatched environments using a Guided-GAN | 13
An automated integrated speech and face image analysis system for the identification of human emotions | 13
Two-stage UNet with channel and temporal-frequency attention for multi-channel speech enhancement | 13
Editorial Board | 13
HC-APNet: Harmonic Compensation Auditory Perception Network for low-complexity speech enhancement | 12
Unsupervised Automatic Speech Recognition: A review | 12
Learning and controlling the source-filter representation of speech with a variational autoencoder | 11
Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency | 11
Investigating prosodic entrainment from global conversations to local turns and tones in Mandarin conversations | 11
Frequent-words analysis for forensic speaker comparison | 11
Editorial Board | 10
Evaluating the effects of continuous pitch and speech tempo modifications on perceptual speaker verification performance by familiar and unfamiliar listeners | 10
The interplay of prosodic cues in the L2: How intonation, rhythm, and speech rate in speech by Spanish learners of Dutch contribute to L1 Dutch perceptions of accentedness and comprehensibility | 10
Blind Speech Separation and Dereverberation using neural beamforming | 10
Real-time intelligibility affects the realization of French word-final schwa | 10
A study of correlation between physiological process of articulation and emotions on Mandarin Chinese | 9
Multilingual speech recognition for GlobalPhone languages | 9
Vocal characteristics of accuracy in eyewitness testimony | 9
Effects of urgent speech and congruent/incongruent text on speech intelligibility for older adults in the presence of noise and reverberation | 9
A formant modification method for improved ASR of children’s speech | 9
Speechformer-CTC: Sequential modeling of depression detection with speech temporal classification | 8
Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers | 8
Differences between listeners with early and late immersion age in spatial release from masking in various acoustic environments | 8
Editorial Board | 8
Effects of voice onset time and place of articulation on perception of dichotic Turkish syllables | 8
The effect of fluency strategy training on interpreter trainees’ speech fluency: Does content familiarity matter? | 8
A new universal camouflage attack algorithm for intelligent speech system | 8
Prosody in narratives: An exploratory study with children with sex chromosomes trisomies | 8
Sequential perception of tone and focus in parallel–A computational simulation | 8
Multi-modal co-learning for silent speech recognition based on ultrasound tongue images | 8
The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners | 8
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations | 8
Perceptual effects of interpolated Austrian and German standard varieties | 7
Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment | 7
Oral configurations during vowel nasalization in English | 7
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition | 7
Tone-syllable synchrony in Mandarin: New evidence and implications | 7
Editorial Board | 7
Recognition of vocoded speech in English by Mandarin-speaking English-learners | 7
Coarse-to-fine speech separation method in the time-frequency domain | 7
Disordered speech recognition considering low resources and abnormal articulation | 7
Bangladeshi Bangla speech corpus for automatic speech recognition research | 7
Yanbian Korean speakers tend to merge /e/ and /ɛ/ when exposed to Seoul Korean | 7
Progressive channel fusion for more efficient TDNN on speaker verification | 7
One-shot emotional voice conversion based on feature separation | 6
Speech pause distribution as an early marker for Alzheimer’s disease | 6
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions | 6
Nasal coarticulation in Lombard speech | 6
Pathological voice classification using MEEL features and SVM-TabNet model | 6
Arabic Automatic Speech Recognition: Challenges and Progress | 6
Differential constant-beamwidth beamforming with cube arrays | 6
Modulation spectral features for speech emotion recognition using deep neural networks | 6
Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation | 6
Enhancing bone-conducted speech with spectrum similarity metric in adversarial learning | 6
Efficient time-domain speech separation using short encoded sequence network | 6
Controllable speech synthesis by learning discrete phoneme-level prosodic representations | 6
Cross-modal information fusion for voice spoofing detection | 6
The Second-Language Productivity of Two Mandarin Tone Sandhi Patterns | 6
Combined approach to dysarthric speaker verification using data augmentation and feature fusion | 6
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation | 6
Role of language familiarity in understanding speech in noise under various acoustic environments | 6
Speakers’ vocal expression of sexual orientation depends on experimenter gender | 6
Editorial Board | 6
Editorial Board | 5
CSLNSpeech: Solving the extended speech separation problem with the help of Chinese sign language | 5
Comparing the nativeness vs. intelligibility approach in prosody instruction for developing speaking skills by interpreter trainees: An experimental study | 5
Editorial Board | 5
Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function | 5
The role of visual cues indicating onset times of target speech syllables in release from informational or energetic masking | 5
Editorial Board | 5
Listener's ratings and acoustic analyses of voice qualities associated with English and Korean sarcastic utterances | 5
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech | 5
Accurate synthesis of dysarthric Speech for ASR data augmentation | 5
An adaptive autoregressive pre-whitener for speech and acoustic signals based on parametric NMF | 5
Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems | 5
Recursive Feature Diversity Network for audio super-resolution | 5
Learning transfer from singing to speech: Insights from vowel analyses in aging amateur singers and non-singers | 5
Addressing the semi-open set dialect recognition problem under resource-efficient considerations | 5
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition | 5
Editorial Board | 5
Consonant gemination in Italian: The affricate and fricative case | 5
Coalescence of fractionally derived and statistically tuned correlated vocal tract and excitation source gaussian mixture super-vectors for improved modelling of speech emotion | 4
The effect of clear speech to foreign-sounding interlocutors on native listeners’ perception of intelligibility | 4
On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement | 4
Zero-shot voice conversion based on feature disentanglement | 4
Editorial Board | 4
Effects of hearing loss and audio-visual cues on children's speech processing speed | 4
Multimodal Arabic emotion recognition using deep learning | 4
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler | 4
Enhanced cross-modal parallel training for improving end-to-end accented speech recognition | 4
Artificial bandwidth extension using H | 4
Editorial Board | 4
Dialect contact in real interactions and in an agent-based model | 4
Editorial Board | 4
The Role of Auditory and Visual Cues in the Perception of Mandarin Emotional Speech in Male Drug Addicts | 4
The role of prosody and hand gestures in the perception of boundaries in speech | 4
PLDE: A lightweight pooling layer for spoken language recognition | 4
CFAD: A Chinese dataset for fake audio detection | 4
Development of a hybrid word recognition system and dataset for the Azerbaijani Sign Language dactyl alphabet | 4
Spatio-temporal masked autoencoder-based phonetic segments classification from ultrasound | 4
Effect of prior exposure on the perception of Japanese vowel length contrast in reverberation for nonnative listeners | 4
Chinese speech intelligibility and speech intelligibility index for the elderly | 4
One-class network leveraging spectro-temporal features for generalized synthetic speech detection | 3
Some properties of mental speech preparation as revealed by self-monitoring | 3
Factorized WaveNet for voice conversion with limited data | 3
Selective transfer subspace learning for small-footprint end-to-end cross-domain keyword spotting | 3
Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition | 3
Discriminative speaker embedding with serialized multi-layer multi-head attention | 3
Comparative analysis of various feature extraction techniques for classification of speech disfluencies | 3
Uncertainty assessment for detection of spoofing attacks to speaker verification systems using a Bayesian approach | 3
Comparing Levenshtein distance and dynamic time warping in predicting listeners’ judgments of accent distance | 3
Self-supervised speech denoising using only noisy audio signals | 3
The dependence of accommodation processes on conversational experience | 3
An analysis of prosodic boundaries across speaking styles in two varieties of German | 3
Editorial Board | 3
Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion | 3
Foreign accent strength and intelligibility at the segmental level | 3
The Ohio Child Speech Corpus | 3
Analysis of forced aligner performance on L2 English speech | 3
Editorial Board | 3
Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration | 3
APIN: Amplitude- and phase-aware interaction network for speech emotion recognition | 3
The role of probability and duration in perception of speech sounds | 3
Multi-level self-attentive TDNN: A general and efficient approach to summarize speech into discriminative utterance-level representations | 3
The effect of sampling variability on systems and individual speakers in likelihood ratio-based forensic voice comparison | 3
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis | 3
Leveraging audible and inaudible signals for pronunciation training by sensing articulation through a smartphone | 3
Speech emotion recognition approaches: A systematic review | 3
Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift | 3
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale | 3
Model predictive PESQ-ANFIS/FUZZY C-MEANS for image-based speech signal evaluation | 3
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones | 2
Enhancement of formant regions in magnitude spectra to develop children’s KWS system in zero resource scenario | 2
Automatic audiovisual synchronisation for ultrasound tongue imaging | 2
A novel multi-speakers Urdu singing voices synthesizer using Wasserstein Generative Adversarial Network | 2
Symmetric and asymmetric Gaussian weighted linear prediction for voice inverse filtering | 2
Automatic speaker verification from affective speech using Gaussian mixture model based estimation of neutral speech characteristics | 2
Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated | 2
Speakers of different L1 dialects with acoustically proximal vowel systems present with similar nonnative speech perception abilities: Data from Greek listeners of Dutch | 2
Non-native disadvantage in spoken word recognition is due to lexical knowledge and not type/level of noise | 2
The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners | 2
A bimodal network based on Audio–Text-Interactional-Attention with ArcFace loss for speech emotion recognition | 2
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement | 2
Phonetic and phonological sound changes in an agent-based model | 2
CLESSR-VC: Contrastive learning enhanced self-supervised representations for one-shot voice conversion | 2
The combined effects of bilingualism and musicianship on listeners’ perception of non-native lexical tones | 2
Choosing only the best voice imitators: Top-K many-to-many voice conversion with StarGAN | 2
Analysis-by-synthesis based training target extraction of the DNN for noise masking | 2
Speech intelligibility prediction using generalized ESTOI with fine-tuned parameters | 2
Comparing neural network architectures for non-intrusive speech quality prediction | 2
Editorial Board | 2
Optimization-based planning of speech articulation using general Tau Theory | 2
LPIPS-AttnWav2Lip: Generic audio-driven lip synchronization for talking head generation in the wild | 2
Single-channel speech separation using soft-minimum permutation invariant training | 2
Dysarthria severity classification using multi-head attention and multi-task learning | 2
The effects of informational and energetic/modulation masking on the efficiency and ease of speech communication across the lifespan | 2
Fusion-based speech emotion classification using two-stage feature selection | 1
Multistage approach for steerable differential beamforming with rectangular arrays | 1
Incorporating group update for speech enhancement based on convolutional gated recurrent network | 1
Acoustic properties of non-native clear speech: Korean speakers of English | 1
Fractional feature-based speech enhancement with deep neural network | 1
Advancing speaker embedding learning: Wespeaker toolkit for research and production | 1
SDTF-Net: Static and dynamic time–frequency network for Speech Emotion Recognition | 1
Editorial Board | 1
The impact of first and second formant variations on vowel identification among elderly Japanese listeners | 1
Editorial Board | 1
Editorial Board | 1
Computer-assisted pronunciation training—Speech synthesis is almost all you need | 1
Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review | 1
Automatic assessment of oral readings of young pupils | 1
An adaptive transmission line cochlear model based front-end for replay attack detection | 1
Who converges? Variation reveals individual speaker adaptability | 1
RETRACTED: Multi-channel adaptive loudness compensation algorithm based on noise tracking in digital hearing aids | 1
Editorial Board | 1
Strengthening speech content authentication against tampering | 1
AMGCN: An adaptive multi-graph convolutional network for speech emotion recognition | 1
Multimodal attention for lip synthesis using conditional generative adversarial networks | 1
Editorial Board | 1
Transfer knowledge for punctuation prediction via adversarial training | 1
Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language | 1
The impact of non-native English speakers’ phonological and prosodic features on automatic speech recognition accuracy | 1
Improved AED with multi-stage feature extraction and fusion based on RFAConv and PSA | 1
Editorial Board | 1
LLM-based speaker diarization correction: A generalizable approach | 1
Glottal flow characteristics in vowels produced by speakers with heart failure | 1
Emotions recognition in audio signals using an extension of the latent block model | 1
Cross-corpus speech emotion recognition using semi-supervised domain adaptation network | 1
Consonant gemination in Italian: The nasal and liquid case | 1
End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations | 1
Editorial Board | 1
An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer | 1
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions | 1
On intrusive speech quality measures and a global SNR based metric | 1
Review of analysis methods for speech applications | 1
Comparison and analysis of new curriculum criteria for end-to-end ASR | 1
Deep temporal clustering features for speech emotion recognition | 1
Editorial Board | 1
Language fusion via adapters for low-resource speech recognition | 1
Emotional voice conversion: Theory, databases and ESD | 1
DNN controlled adaptive front-end for replay attack detection systems | 1
The cross-linguistics perception of liquids: Motivation for the superclass | 1
Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions | 1