Speech Communication

Papers
(The median citation count of Speech Communication is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
CN-Celeb: Multi-genre speaker recognition | 61
Learning deep multimodal affective features for spontaneous speech emotion recognition | 57
Emotional voice conversion: Theory, databases and ESD | 57
Masked multi-head self-attention for causal speech enhancement | 54
Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion | 46
The Hearing-Aid Speech Perception Index (HASPI) Version 2 | 36
Unsupervised Automatic Speech Recognition: A review | 33
Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM | 31
Fusion of deep learning features with mixture of brain emotional learning for audio-visual emotion recognition | 30
Modulation spectral features for speech emotion recognition using deep neural networks | 27
Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework | 25
GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition | 24
CyTex: Transforming speech to textured images for speech emotion recognition | 22
Uneven success: automatic speech recognition and ethnicity-related dialects | 20
A formant modification method for improved ASR of children’s speech | 19
A study on data augmentation in voice anti-spoofing | 19
Computer-assisted pronunciation training—Speech synthesis is almost all you need | 19
A time–frequency smoothing neural network for speech enhancement | 18
Speech enhancement using a DNN-augmented colored-noise Kalman filter | 18
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition | 18
B&Anet: Combining bidirectional LSTM and self-attention for end-to-end learning of task-oriented dialogue system | 17
Speech pause distribution as an early marker for Alzheimer’s disease | 17
Phonetic accommodation to natural and synthetic voices: Behavior of groups and individuals in speech shadowing | 16
Text-conditioned Transformer for automatic pronunciation error detection | 16
A supervised non-negative matrix factorization model for speech emotion recognition | 16
Amplitude and Frequency Modulation-based features for detection of replay Spoof Speech | 16
A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations | 16
PACDNN: A phase-aware composite deep neural network for speech enhancement | 16
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt | 15
Automatic quality control and enhancement for voice-based remote Parkinson’s disease detection | 14
Perceptual realization of Greek consonants by Russian monolingual speakers | 14
Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems | 13
Speech signal processing on graphs: The graph frequency analysis and an improved graph Wiener filtering method | 13
Improving phoneme recognition of throat microphone speech recordings using transfer learning | 12
Cross-modal information fusion for voice spoofing detection | 12
The interplay of prosodic cues in the L2: How intonation, rhythm, and speech rate in speech by Spanish learners of Dutch contribute to L1 Dutch perceptions of accentedness and comprehensibility | 12
Model architectures to extrapolate emotional expressions in DNN-based text-to-speech | 12
A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement | 11
Improving speaker de-identification with functional data analysis of f0 trajectories | 11
NHSS: A speech and singing parallel database | 11
Seeing lexical tone: Head and face motion in production and perception of Cantonese lexical tones | 11
A cross-linguistic analysis of the temporal dynamics of turn-taking cues using machine learning as a descriptive tool | 11
CASE-Net: Integrating local and non-local attention operations for speech enhancement | 10
Sinusoidal model-based hypernasality detection in cleft palate speech using CVCV sequence | 10
An investigation of domain adaptation in speaker embedding space for speaker recognition | 10
Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers | 10
Bangladeshi Bangla speech corpus for automatic speech recognition research | 10
Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features | 9
Multilingual speech recognition for GlobalPhone languages | 9
Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language | 9
Analysis of trade-offs between magnitude and phase estimation in loss functions for speech denoising and dereverberation | 9
A bimodal network based on Audio–Text-Interactional-Attention with ArcFace loss for speech emotion recognition | 9
A unified system for multilingual speech recognition and language identification | 9
Fusing features of speech for depression classification based on higher-order spectral analysis | 9
Dysarthria severity classification using multi-head attention and multi-task learning | 9
Phonetic imitation of multidimensional acoustic variation of the nasal split short-a system | 8
Multi-level self-attentive TDNN: A general and efficient approach to summarize speech into discriminative utterance-level representations | 8
Multistage approach for steerable differential beamforming with rectangular arrays | 8
The effect of sampling variability on systems and individual speakers in likelihood ratio-based forensic voice comparison | 8
Acoustic and temporal representations in convolutional neural network models of prosodic events | 8
Computer-assisted assessment of phonetic fluency in a second language: a longitudinal study of Japanese learners of French | 8
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors | 8
Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition | 8
Acoustic differences in emotional speech of people with dysarthria | 8
RPCA-based real-time speech and music separation method | 8
Glottal flow characteristics in vowels produced by speakers with heart failure | 7
Differential constant-beamwidth beamforming with cube arrays | 7
Compact deep neural networks for real-time speech enhancement on resource-limited devices | 7
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis | 7
Learning and controlling the source-filter representation of speech with a variational autoencoder | 7
An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer | 7
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces | 7
The N400 reveals implicit accent-induced prejudice | 7
Discriminative speaker embedding with serialized multi-layer multi-head attention | 7
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations | 7
End-to-end acoustic modelling for phone recognition of young readers | 6
An automated integrated speech and face image analysis system for the identification of human emotions | 6
Foreign accent strength and intelligibility at the segmental level | 6
Speech emotion recognition approaches: A systematic review | 6
Self-supervised speech denoising using only noisy audio signals | 6
Keyword spotting in continuous speech using convolutional neural network | 6
Comparing the nativeness vs. intelligibility approach in prosody instruction for developing speaking skills by interpreter trainees: An experimental study | 6
Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift | 6
The Relationship Between Turn-taking, Vocal Pitch Synchrony, and Rapport in Creative Problem-Solving Communication | 6
On the deficiency of intelligibility metrics as proxies for subjective intelligibility | 6
Fusion-based speech emotion classification using two-stage feature selection | 6
Optimum step-size control for a variable step-size stereo acoustic echo canceller in the frequency domain | 6
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition | 6
Speakers of different L1 dialects with acoustically proximal vowel systems present with similar nonnative speech perception abilities: Data from Greek listeners of Dutch | 6
An empirical study of the effect of acoustic-prosodic entrainment on the perceived trustworthiness of conversational avatars | 6
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation | 6
Chirplet transform based time frequency analysis of speech signal for automated speech emotion recognition | 6
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler | 6
The Lombard intelligibility benefit of native and non-native speech for native and non-native listeners | 5
Native language identification for Indian-speakers by an ensemble of phoneme-specific, and text-independent convolutions | 5
Automatic speaker verification from affective speech using Gaussian mixture model based estimation of neutral speech characteristics | 5
A study on the perception of prosodic cues to focus by Egyptian listeners: Some make use of them, but most of them don't | 5
POLEMAD–A database for the multimodal analysis of Polish pronunciation | 5
Who converges? Variation reveals individual speaker adaptability | 5
Dialogic ItAlian: the creation of a corpus of Italian spontaneous speech | 5
Differences between listeners with early and late immersion age in spatial release from masking in various acoustic environments | 5
Talker adjustment to perceived communication errors | 5
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement | 5
Laughter entrainment in dyadic interactions: Temporal distribution and form | 5
Exploring the effects of restraining the use of gestures on narrative speech | 5
Modeling concurrent vowel identification for shorter durations | 5
Automatic audiovisual synchronisation for ultrasound tongue imaging | 5
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring | 5
Depression assessment in people with Parkinson’s disease: The combination of acoustic features and natural language processing | 5
Automatic classification of neurological voice disorders using wavelet scattering features | 5
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis | 5
Neural speech-rate conversion with multispeaker WaveNet vocoder | 4
Investigating a neural all pass warp in modern TTS applications | 4
A novel distortion-tolerant speech encryption scheme for secure voice communication | 4
Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features | 4
Effects of hearing loss and audio-visual cues on children's speech processing speed | 4
Listener's ratings and acoustic analyses of voice qualities associated with English and Korean sarcastic utterances | 4
Understanding acceptability of disordered speech through Audience Response Systems-based evaluation | 4
Speech rhythm convergence in a dyadic reading task | 4
Consonant gemination in Italian: The nasal and liquid case | 4
Seamless equal accuracy ratio for inclusive CTC speech recognition | 4
Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function | 4
Comparative analysis of various feature extraction techniques for classification of speech disfluencies | 4
Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions | 4
Recognition of vocoded speech in English by Mandarin-speaking English-learners | 4
wUnet: A new network used for ultrasonic tongue contour extraction | 4
Consonant gemination in Italian: The affricate and fricative case | 4
Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels | 3
Acoustic features correlated to perceived urgency in evacuation announcements | 3
Single-channel speech enhancement using improved progressive deep neural network and masking-based harmonic regeneration | 3
Modeling trajectories of human speech articulators using general Tau theory | 3
Perceptual clustering of high-pitched vowels in Chinese Yue Opera | 3
Development and structure of the VariaNTS corpus: A spoken Dutch corpus containing talker and linguistic variability | 3
Computational modelling of segmental and prosodic levels of analysis for capturing variation across Arabic dialects | 3
The effect of clear speech to foreign-sounding interlocutors on native listeners’ perception of intelligibility | 3
Multiple voice disorders in the same individual: Investigating handcrafted features, multi-label classification algorithms, and base-learners | 3
A study of correlation between physiological process of articulation and emotions on Mandarin Chinese | 3
A comparative study of fundamental frequency stability between speech and singing | 3
The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners | 3
Low-resource automatic speech recognition and error analyses of oral cancer speech | 3
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale | 3
SDTF-Net: Static and dynamic time–frequency network for Speech Emotion Recognition | 3
Uncertainty assessment for detection of spoofing attacks to speaker verification systems using a Bayesian approach | 3
Factorized WaveNet for voice conversion with limited data | 3
Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion | 3
Curriculum Learning based approaches for robust end-to-end far-field speech recognition | 3
Subband fusion of complex spectrogram for fake speech detection | 3
The effect of fluency strategy training on interpreter trainees’ speech fluency: Does content familiarity matter? | 3
Facemask occlusion's impact on L2 listening comprehension | 3
Mandarin lexical tone duration: Impact of speech style, word length, syllable position and prosodic position | 3
On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement | 3
Detecting Wilson's disease from unstructured connected speech: An embedding-based approach augmented by attention and bi-directional dependency | 3
Deletion and insertion tampering detection for speech authentication based on fluctuating super vector of electrical network frequency | 3
Speech intelligibility deterioration for normal hearing and hearing impaired patients with different types of tinnitus | 3
One-shot emotional voice conversion based on feature separation | 3
Incorporating group update for speech enhancement based on convolutional gated recurrent network | 3
Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation | 3
Phonetic correlates of laryngeal and place contrasts of Burushaski | 3
Musical noise suppression using a low-rank and sparse matrix decomposition approach | 3
Multimodal perception of prominence in spontaneous speech: A methodological proposal using mixed models and AIC | 3
Simulating vocal learning of spoken language: Beyond imitation | 3
Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language | 3
Blind Speech Separation and Dereverberation using neural beamforming | 3
Non-native disadvantage in spoken word recognition is due to lexical knowledge and not type/level of noise | 3
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network | 2
Correction of whitespace and word segmentation in noisy Pashto text using CRF | 2
Self-supervised learning based domain regularization for mask-wearing speaker verification | 2
Perceptual asymmetry between pitch peaks and valleys | 2
Controllable speech synthesis by learning discrete phoneme-level prosodic representations | 2
The dependence of accommodation processes on conversational experience | 2
The influence of task engagement on phonetic convergence | 2
Review of analysis methods for speech applications | 2
Development of a speech emotion recognizer for large-scale child-centered audio recordings from a hospital environment | 2
Psychoacoustic features explain creakiness classifications made by naive and non-naive listeners | 2
Acoustic properties of non-native clear speech: Korean speakers of English | 2
Dual-model self-regularization and fusion for domain adaptation of robust speaker verification | 2
Artificial bandwidth extension using H | 2
DNN controlled adaptive front-end for replay attack detection systems | 2
Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated | 2
Articulation rates’ inter-correlations and discriminating powers in an English speech corpus | 2
Frequent-words analysis for forensic speaker comparison | 2
Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment | 2
Perceptual learning of phonetic convergence | 2
Effect of prior exposure on the perception of Japanese vowel length contrast in reverberation for nonnative listeners | 2
Multimodal attention for lip synthesis using conditional generative adversarial networks | 2
Oral configurations during vowel nasalization in English | 2
End-to-end integration of speech separation and voice activity detection for low-latency diarization of telephone conversations | 2
Application of virtual human sign language translation based on speech recognition | 2
A two-level Item Response Theory model to evaluate speech synthesis and recognition | 2
Pre-trained models for detection and severity level classification of dysarthria from speech | 2
The acquisition of L2 voiced stops by English learners of Spanish and Spanish learners of English | 2
Perceptual effects of interpolated Austrian and German standard varieties | 2
The Fharvard corpus: A phonemically-balanced French sentence resource for audiology and intelligibility research | 2
RETRACTED: Multi-channel adaptive loudness compensation algorithm based on noise tracking in digital hearing aids | 2
Arm motion symmetry in conversation | 2
Disordered speech recognition considering low resources and abnormal articulation | 2
The amalgamation of wavelet packet information gain entropy tuned source and system parameters for improved speech emotion recognition | 2
Use of affect context in dyadic interactions for continuous emotion recognition | 2