Speech Communication

Papers
(The median citation count of Speech Communication is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN | 89
Learning deep multimodal affective features for spontaneous speech emotion recognition | 51
Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features | 47
Masked multi-head self-attention for causal speech enhancement | 43
CN-Celeb: Multi-genre speaker recognition | 38
Emotional voice conversion: Theory, databases and ESD | 35
Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion | 33
The Hearing-Aid Speech Perception Index (HASPI) Version 2 | 30
Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM | 24
Fusion of deep learning features with mixture of brain emotional learning for audio-visual emotion recognition | 20
A review of multi-objective deep learning speech denoising methods | 20
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson’s Disease and Cleft Lip and Palate | 19
Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework | 18
Speech enhancement using a DNN-augmented colored-noise Kalman filter | 17
An Iterative Graph Spectral Subtraction Method for Speech Enhancement | 17
Automatic accent identification as an analytical tool for accent robust automatic speech recognition | 17
CyTex: Transforming speech to textured images for speech emotion recognition | 17
Unsupervised Automatic Speech Recognition: A review | 17
A time–frequency smoothing neural network for speech enhancement | 17
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition | 17
Improving generative adversarial networks for speech enhancement through regularization of latent representations | 17
B&Anet: Combining bidirectional LSTM and self-attention for end-to-end learning of task-oriented dialogue system | 16
Speech pause distribution as an early marker for Alzheimer’s disease | 16
Text-conditioned Transformer for automatic pronunciation error detection | 15
DeepConversion: Voice conversion with limited parallel training data | 14
A supervised non-negative matrix factorization model for speech emotion recognition | 14
Computer-assisted pronunciation training—Speech synthesis is almost all you need | 14
Amplitude and Frequency Modulation-based features for detection of replay Spoof Speech | 14
Analysis and classification of phonation types in speech and singing voice | 13
Automatic quality control and enhancement for voice-based remote Parkinson’s disease detection | 13
Perceptual realization of Greek consonants by Russian monolingual speakers | 13
Automatic speaker profiling from short duration speech data | 13
Automatic classification of infant vocalization sequences with convolutional neural networks | 12
PACDNN: A phase-aware composite deep neural network for speech enhancement | 12
Improving phoneme recognition of throat microphone speech recordings using transfer learning | 12
Phonetic accommodation to natural and synthetic voices: Behavior of groups and individuals in speech shadowing | 12
Analytic phase features for dysarthric speech detection and intelligibility assessment | 12
A study on data augmentation in voice anti-spoofing | 12
Modulation spectral features for speech emotion recognition using deep neural networks | 12
Speech signal processing on graphs: The graph frequency analysis and an improved graph Wiener filtering method | 11
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition | 11
Model architectures to extrapolate emotional expressions in DNN-based text-to-speech | 11
A formant modification method for improved ASR of children’s speech | 11
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt | 11
The interplay of prosodic cues in the L2: How intonation, rhythm, and speech rate in speech by Spanish learners of Dutch contribute to L1 Dutch perceptions of accentedness and comprehensibility | 10
Uneven success: automatic speech recognition and ethnicity-related dialects | 10
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones | 9
A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement | 9
A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations | 9
An investigation of domain adaptation in speaker embedding space for speaker recognition | 9
Discriminative neural network pruning in a multiclass environment: A case study in spoken emotion recognition | 8
NHSS: A speech and singing parallel database | 8
Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers | 8
Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems | 8
Affective synthesis and animation of arm gestures from speech prosody | 8
Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features | 8
Accuracy, recording interference, and articulatory quality of headsets for ultrasound recordings | 8
Cross-modal information fusion for voice spoofing detection | 7
Acoustic and temporal representations in convolutional neural network models of prosodic events | 7
Multistage approach for steerable differential beamforming with rectangular arrays | 7
Sinusoidal model-based hypernasality detection in cleft palate speech using CVCV sequence | 7
RPCA-based real-time speech and music separation method | 7
Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language | 7
A cross-linguistic analysis of the temporal dynamics of turn-taking cues using machine learning as a descriptive tool | 7
Computer-assisted assessment of phonetic fluency in a second language: a longitudinal study of Japanese learners of French | 7
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler | 6
Analysis of trade-offs between magnitude and phase estimation in loss functions for speech denoising and dereverberation | 6
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition | 6
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors | 6
Foreign accent strength and intelligibility at the segmental level | 6
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation | 6
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces | 6
Cosine metric learning based speaker verification | 6
Analysis of glottal inverse filtering in the presence of source-filter interaction | 6
Multilingual speech recognition for GlobalPhone languages | 6
An automated integrated speech and face imageanalysis system for the identification of human emotions | 6
Learning transfer from singing to speech: Insights from vowel analyses in aging amateur singers and non-singers | 6
Discriminative speaker embedding with serialized multi-layer multi-head attention | 6
Bangladeshi Bangla speech corpus for automatic speech recognition research | 6
Multi-level self-attentive TDNN: A general and efficient approach to summarize speech into discriminative utterance-level representations | 6
Improving speaker de-identification with functional data analysis of f0 trajectories | 6
A bimodal network based on Audio–Text-Interactional-Attention with ArcFace loss for speech emotion recognition | 6
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech | 6
Comparing the nativeness vs. intelligibility approach in prosody instruction for developing speaking skills by interpreter trainees: An experimental study | 6
A unified system for multilingual speech recognition and language identification | 6
Automatic intelligibility assessment of dysarthric speech using glottal parameters | 6
Acoustic differences in emotional speech of people with dysarthria | 6
Adaptive and hybrid Kronecker product beamforming for far-field speech signals | 5
End-to-end acoustic modelling for phone recognition of young readers | 5
Phonetic imitation of multidimensional acoustic variation of the nasal split short-a system | 5
The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners | 5
An empirical study of the effect of acoustic-prosodic entrainment on the perceived trustworthiness of conversational avatars | 5
The effect of sampling variability on systems and individual speakers in likelihood ratio-based forensic voice comparison | 5
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations | 5
Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers | 5
Modeling concurrent vowel identification for shorter durations | 5
Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift | 5
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis | 5
Consonant gemination in Italian: The nasal and liquid case | 4
Who converges? Variation reveals individual speaker adaptability | 4
Integrating lexical and prosodic features for automatic paragraph segmentation | 4
Consonant gemination in Italian: The affricate and fricative case | 4
Seamless equal accuracy ratio for inclusive CTC speech recognition | 4
Vowels and tones as acoustic cues in Chinese subregional dialect identification | 4
Automatic speaker verification from affective speech using Gaussian mixture model based estimation of neutral speech characteristics | 4
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis | 4
Factorized WaveNet for voice conversion with limited data | 4
Multilingual and multimode phone recognition system for Indian languages | 4
Differences between listeners with early and late immersion age in spatial release from masking in various acoustic environments | 4
Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function | 4
Fusing features of speech for depression classification based on higher-order spectral analysis | 4
Dysarthria severity classification using multi-head attention and multi-task learning | 4
Enhancement of cleft palate speech using temporal and spectral processing | 4
HiLAM-state discriminative multi-task deep neural network in dynamic time warping framework for text-dependent speaker verification | 4
Wh-question or wh-declarative? Prosody makes the difference | 4
Glottal flow characteristics in vowels produced by speakers with heart failure | 4
Native language identification for Indian-speakers by an ensemble of phoneme-specific, and text-independent convolutions | 4
Talker adjustment to perceived communication errors | 4
A study on the perception of prosodic cues to focus by Egyptian listeners: Some make use of them, but most of them don't | 4
Investigating a neural all pass warp in modern TTS applications | 4
An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer | 3
The Relationship Between Turn-taking, Vocal Pitch Synchrony, and Rapport in Creative Problem-Solving Communication | 3
Curriculum Learning based approaches for robust end-to-end far-field speech recognition | 3
Phonetic correlates of laryngeal and place contrasts of Burushaski | 3
Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features | 3
Dialogic ItAlian: the creation of a corpus of Italian spontaneous speech | 3
Learning and controlling the source-filter representation of speech with a variational autoencoder | 3
The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners | 3
Speakers of different L1 dialects with acoustically proximal vowel systems present with similar nonnative speech perception abilities: Data from Greek listeners of Dutch | 3
Laughter entrainment in dyadic interactions: Temporal distribution and form | 3
The N400 reveals implicit accent-induced prejudice | 3
Uncertainty assessment for detection of spoofing attacks to speaker verification systems using a Bayesian approach | 3
On quantifying the quality of acoustic models in hybrid DNN-HMM ASR | 3
Development and structure of the VariaNTS corpus: A spoken Dutch corpus containing talker and linguistic variability | 3
Computational modelling of segmental and prosodic levels of analysis for capturing variation across Arabic dialects | 3
Recognition of vocoded speech in English by Mandarin-speaking English-learners | 3
Optimum step-size control for a variable step-size stereo acoustic echo canceller in the frequency domain | 3
Listener's ratings and acoustic analyses of voice qualities associated with English and Korean sarcastic utterances | 3
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement | 3
Depression assessment in people with Parkinson’s disease: The combination of acoustic features and natural language processing | 3
Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration | 3
Speech rhythm convergence in a dyadic reading task | 3
POLEMAD–A database for the multimodal analysis of Polish pronunciation | 3
Self-supervised speech denoising using only noisy audio signals | 3
Exploring the effects of restraining the use of gestures on narrative speech | 3
Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language | 3
Multimodal perception of prominence in spontaneous speech: A methodological proposal using mixed models and AIC | 3
Single-channel speech enhancement with correlated spectral components: Limits-potential | 3
Non-native disadvantage in spoken word recognition is due to lexical knowledge and not type/level of noise | 3
CASE-Net: Integrating local and non-local attention operations for speech enhancement | 3
Understanding acceptability of disordered speech through Audience Response Systems-based evaluation | 3
Speech intelligibility deterioration for normal hearing and hearing impaired patients with different types of tinnitus | 3
Differential constant-beamwidth beamforming with cube arrays | 2
Articulation rates’ inter-correlations and discriminating powers in an English speech corpus | 2
Musical noise suppression using a low-rank and sparse matrix decomposition approach | 2
On the deficiency of intelligibility metrics as proxies for subjective intelligibility | 2
wUnet: A new network used for ultrasonic tongue contour extraction | 2
Progress of machine learning based automatic phoneme recognition and its prospect | 2
A two-level Item Response Theory model to evaluate speech synthesis and recognition | 2
Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated | 2
Neural speech-rate conversion with multispeaker WaveNet vocoder | 2
Arm motion symmetry in conversation | 2
Incorporating group update for speech enhancement based on convolutional gated recurrent network | 2
Perceptual asymmetry between pitch peaks and valleys | 2
A study of continuous space word and sentence representations applied to ASR error detection | 2
Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment | 2
On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement | 2
Single and multiple frame coding of LSF parameters using deep neural network and pyramid vector quantizer | 2
A comparative study of fundamental frequency stability between speech and singing | 2
Keyword spotting in continuous speech using convolutional neural network | 2
Blind Speech Separation and Dereverberation using neural beamforming | 2
Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning | 2
The Fharvard corpus: A phonemically-balanced French sentence resource for audiology and intelligibility research | 2
Acoustic features correlated to perceived urgency in evacuation announcements | 2
Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation | 2
Perceptual effects of interpolated Austrian and German standard varieties | 2
Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners | 2
Effects of hearing loss and audio-visual cues on children's speech processing speed | 2
Oral configurations during vowel nasalization in English | 2
The effect of fluency strategy training on interpreter trainees’ speech fluency: Does content familiarity matter? | 2
Facemask occlusion's impact on L2 listening comprehension | 2
A study of correlation between physiological process of articulation and emotions on Mandarin Chinese | 2
Automatic audiovisual synchronisation for ultrasound tongue imaging | 2
Low-resource automatic speech recognition and error analyses of oral cancer speech | 2
Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels | 2
SDTF-Net: Static and dynamic time–frequency network for Speech Emotion Recognition | 2