Computer Speech and Language

Papers
(The median citation count of Computer Speech and Language is 3. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
Article | Citations
Corpus and unsupervised benchmark: Towards Tagalog grammatical error correction | 84
A language-agnostic model of child language acquisition | 84
Towards privacy-preserving conversation analysis in everyday life: Exploring the privacy-utility trade-off | 78
Optimization of modular multi-speaker distant conversational speech recognition | 77
Stochastic Data-to-Text Generation Using Syntactic Dependency Information | 68
Automatic detection of behavioural codes in team interactions | 68
KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System | 43
Editorial Board | 40
Seq2Seq dynamic planning network for progressive text generation | 39
Speech enhancement approach for body-conducted unvoiced speech based on Taylor–Boltzmann machines trained DNN | 38
Room impulse response reshaping-based expectation–maximization in an underdetermined reverberant environment | 36
Monotonic Gaussian regularization of attention for robust automatic speech recognition | 32
Unsupervised question-retrieval approach based on topic keywords filtering and multi-task learning | 32
Misogynistic attitude detection in YouTube comments and replies: A high-quality dataset and algorithmic models | 31
Identifying offensive memes in low-resource languages: A multi-modal multi-task approach using valence and arousal | 30
Editorial Board | 29
A method of phonemic annotation for Chinese dialects based on a deep learning model with adaptive temporal attention and a feature disentangling structure | 29
Multi-branch feature aggregation based on multiple weighting for speaker verification | 29
PaSCoNT - Parallel Speech Corpus of Northern-central Thai for automatic speech recognition | 24
Complementary regional energy features for spoofed speech detection | 24
Contextual emotion detection using ensemble deep learning | 23
Vishing: Detecting social engineering in spoken communication — A first survey & urgent roadmap to address an emerging societal challenge | 23
Editorial Board | 21
Exploring accidental triggers of smart speakers | 20
A hybrid approach to Natural Language Inference for the SICK dataset | 20
Augmentative and alternative speech communication (AASC) aid for people with dysarthria | 20
Maximal activation weighted memory for aspect based sentiment analysis | 20
Predicting accentedness and comprehensibility through ASR scores and acoustic features | 19
Editorial Board | 19
Improving self-supervised learning model for audio spoofing detection with layer-conditioned embedding fusion | 19
Enhancing analysis of diadochokinetic speech using deep neural networks | 19
A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages | 19
Combining replay and LoRA for continual learning in natural language understanding | 18
Combined generative and predictive modeling for speech super-resolution | 16
Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source | 16
Representation learning strategies to model pathological speech: Effect of multiple spectral resolutions | 16
A novel channel estimate for noise robust speech recognition | 15
The use of Active Learning systems for stimulus selection and response modelling in perception experiments | 15
A lightweight approach based on prompt for few-shot relation extraction | 15
A mobile application using automatic speech analysis for classifying Alzheimer's disease and mild cognitive impairment | 15
Loanword identification based on web resources: A case study on wikipedia | 15
English–Assamese neural machine translation using prior alignment and pre-trained language model | 15
Under the hood: Phonemic Restoration in transformer-based automatic speech recognition | 15
Privacy-preserving feature extractor using adversarial pruning for TBI assessment from speech | 14
Unsupervised induction of inflectional families | 14
Meta adversarial learning improves low-resource speech recognition | 14
Editorial Board | 14
Editorial Board | 13
Evidence and Axial Attention Guided Document-level Relation Extraction | 13
Conversations in the wild: Data collection, automatic generation and evaluation | 13
A bias evaluation solution for multiple sensitive attribute speech recognition | 13
Compress, Align, and Transfer: A new method for transferring pre-trained language models knowledge to CTC-based speech recognition | 13
Editorial Board | 13
Effects of cross-cultural language differences on social cognition during human-agent interaction in cooperative game environments | 13
A multi-label emoji classification method using balanced pointwise mutual information-based feature selection | 12
SecNLP: An NLP classification model watermarking framework based on multi-task learning | 12
A novel graph kernel algorithm for improving the effect of text classification | 12
Named entity recognition using neural language model and CRF for Hindi language | 12
Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection | 12
Raw acoustic-articulatory multimodal dysarthric speech recognition | 12
Performance assessment of voice conversion models using speech production-based parameters | 12
MPSA-DenseNet: A novel deep learning model for English accent classification | 12
Modality fusion using auxiliary tasks for dementia detection | 12
A flexible BERT model enabling width- and depth-dynamic inference | 11
Towards decoupling frontend enhancement and backend recognition in monaural robust ASR | 11
Editorial Board | 11
Conversation Initiation of Mothers, Fathers, and Toddlers in their Natural Home Environment | 11
Offensive language detection in Tamil YouTube comments by adapters and cross-domain knowledge transfer | 11
Tailored design of Audio–Visual Speech Recognition models using Branchformers | 11
Effective infant cry signal analysis and reasoning using IARO based leaky Bi-LSTM model | 11
Towards inclusive automatic speech recognition | 11
A tag-based methodology for the detection of user repair strategies in task-oriented conversational agents | 11
Towards detecting the level of trust in the skills of a virtual assistant from the user’s speech | 11
Addressing subjectivity in paralinguistic data labeling for improved classification performance: A case study with Spanish-speaking Mexican children using data balancing and semi-supervised learning | 11
Improved relation extraction through key phrase identification using community detection on dependency trees | 11
Neural multi-task learning for end-to-end Arabic aspect-based sentiment analysis | 11
Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement | 11
Prototypical networks relation classification model based on entity convolution | 11
A computational analysis of transcribed speech of people living with dementia: The Anchise 2022 Corpus | 11
FinD: Fine-grained discrepancy-based fake news detection enhanced by event abstract generation | 11
Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge | 10
Multiple time-instances features based approach for reference-free speech quality measurement | 10
Improving BERT with local context comprehension for multi-turn response selection in retrieval-based dialogue systems | 10
Multi-task unified model for Chinese aspect-based sentiment analysis | 10
Continual End-to-End Speech-to-Text translation using augmented bi-sampler | 10
A closer look at reinforcement learning-based automatic speech recognition | 10
GenCeption: Evaluate vision LLMs with unlabeled unimodal data | 10
Towards lifelong human assisted speaker diarization | 10
A neural network approach for speech enhancement and noise-robust bandwidth extension | 9
Adaptive feature extraction for entity relation extraction | 9
DiffATSM: High quality adaptive time-scale modification using diffusion-based post-processing | 9
Corrigendum to <UGR-MINDVOICE: A multimodal EEG-audio dataset for overt and covert Iberian Spanish speech production> | 9
End-to-End Speech-to-Text Translation: A Survey | 9
Deep feature representations and fusion strategies for speech emotion recognition from acoustic and linguistic modalities: A systematic review | 9
Hate speech and offensive language detection in Dravidian languages using deep ensemble framework | 9
Refining the evaluation of speech synthesis: A summary of the Blizzard Challenge 2023 | 9
A physical exertion inspired multi-task learning framework for detecting out-of-breath speech | 9
Test-retest reliability of acoustic and linguistic measures of speech tasks | 9
Measuring and implementing lexical alignment: A systematic literature review | 8
A novel approach to cross-linguistic transfer learning for hope speech detection in Tamil and Malayalam | 8
Universal constituency treebanking and parsing: A pilot study | 8
Minerva 2 for speech and language tasks | 8
Morse wavelet transform-based features for voice liveness detection | 8
An automated quality evaluation framework of psychotherapy conversations with local quality estimates | 8
Classification of stuttering – The ComParE challenge and beyond | 8
Two in One: A multi-task framework for politeness turn identification and phrase extraction in goal-oriented conversations | 8
Speech self-supervised representations benchmarking: A case for larger probing heads | 7
SEBGM: Sentence Embedding Based on Generation Model with multi-task learning | 7
Three-stage modular speaker diarization collaborating with front-end techniques in the CHiME-8 NOTSOFAR-1 challenge | 7
Channel and channel subband selection for speaker diarization | 7
A cross-attention augmented model for event-triggered context-aware story generation | 7
Significance of chirp MFCC as a feature in speech and audio applications | 7
Improving named entity correctness of abstractive summarization by generative negative sampling | 7
Goal-oriented conditional variational autoencoders for proactive and knowledge-aware conversational recommender system | 7
A knowledge-augmented heterogeneous graph convolutional network for aspect-level multimodal sentiment analysis | 6
Using Knowledge Induction strategies: LLMs can do better in knowledge-driven dialogue tasks | 6
Spoofing countermeasure for fake speech detection using brute force features | 6
Discovering phonetic inventories with crosslingual automatic speech recognition | 6
GTSO: Gradient tangent search optimization enabled voice transformer with speech intelligibility for aphasia | 6
Preserving speaker information in direct Speech-to-Speech Translation with non-autoregressive generation and pre-training | 6
Talking-heads attention-based knowledge representation for link prediction | 6
Multilingual non-intrusive binaural intelligibility prediction based on phone classification | 6
Automatic screening of mild cognitive impairment and Alzheimer’s disease by means of posterior-thresholding hesitation representation | 6
Building a text retrieval system for the Sanskrit language: Exploring indexing, stemming, and searching issues | 6
Direct enhancement of pre-trained speech embeddings for speech processing in noisy conditions | 6
Scale-aware dual-branch complex convolutional recurrent network for monaural speech enhancement | 5
Towards better Chinese-centric neural machine translation for low-resource languages | 5
A new speech corpus of super-elderly Japanese for acoustic modeling | 5
Two evaluations on Ontology-style relation annotations | 5
An optimal approach for text feature selection | 5
Cross-lingual multi-speaker speech synthesis with limited bilingual training data | 5
Speaking to remember: Model-based adaptive vocabulary learning using automatic speech recognition | 5
UniKDD: A Unified Generative model for Knowledge-driven Dialogue | 5
Accurate speaker counting, diarization and separation for advanced recognition of multichannel multispeaker conversations | 5
Real-time audio enhancement framework for vocal performances based on LSTM and time-frequency masking algorithm | 5
Assessing language models’ task and language transfer capabilities for sentiment analysis in dialog data | 5
Optimizing pipeline task-oriented dialogue systems using post-processing networks | 5
Rep-MCA-former: An efficient multi-scale convolution attention encoder for text-independent speaker verification | 5
LRetUNet: A U-Net-based retentive network for single-channel speech enhancement | 5
A novel word sense disambiguation approach using WordNet knowledge graph | 5
C-KGE: Curriculum learning-based Knowledge Graph Embedding | 5
FE-CFNER: Feature Enhancement-based approach for Chinese Few-shot Named Entity Recognition | 5
A potential relation trigger method for entity-relation quintuple extraction in text with excessive entities | 5
Editorial Board | 5
Uncertainty-aware non-autoregressive neural machine translation | 5
On significance of constant-Q transform for pop noise detection | 5
M-Sim: Multi-level Semantic Inference Model for Chinese short answer scoring in low-resource scenarios | 4
Editorial Board | 4
An experimental review of speaker diarization methods with application to two-speaker conversational telephone speech recordings | 4
Multi-task learning neural framework for categorizing sexism | 4
Spectral–temporal saliency masks and modulation tensorgrams for generalizable COVID-19 detection | 4
Speaker anonymization by modifying fundamental frequency and x-vector singular value | 4
EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization | 4
COMPASS: A creative support system that alerts novelists to the unnoticed missing contents | 4
A semi-supervised high-quality pseudo labels algorithm based on multi-constraint optimization for speech deception detection | 4
Editorial Board | 4
Enhancing Turkish Coreference Resolution: Insights from deep learning, dropped pronouns, and multilingual transfer learning | 4
Editorial Board | 4
Simultaneous speech and background sound recognition in diverse acoustic environments with branched neural networks | 4
Time–Frequency Causal Hidden Markov Model for speech-based Alzheimer’s disease longitudinal detection | 4
Predicting children’s perceived reading proficiency with prosody modeling | 4
How to make embeddings suitable for PLDA | 4
Neural referential form selection: Generalisability and interpretability | 4
What’s so complex about conversational speech? A comparison of HMM-based and transformer-based ASR architectures | 4
Character expression for spoken dialogue systems with semi-supervised learning using Variational Auto-Encoder | 4
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition | 4
Knowledge-grounded dialogue modelling with dialogue-state tracking, domain tracking, and entity extraction | 4
Demystifying large language models in second language development research | 4
A code-mixed task-oriented dialog dataset for medical domain | 4
TadaStride: Using time adaptive strides in audio data for effective downsampling | 4
Copiously Quote Classics: Improving Chinese Poetry Generation with historical allusion knowledge | 4
An analysis of machine learning models for sentiment analysis of Tamil code-mixed data | 4
LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech | 3
A generalized decoding method for neural text generation | 3
Modelling child comprehension: A case of suffixal passive construction in Korean | 3
HOTTEST: Hate and Offensive content identification in Tamil using Transformers and Enhanced STemming | 3
Editorial Board | 3
Discriminating speech traits of Alzheimer's disease assessed through a corpus of reading task for Spanish language | 3
Gnowsis: Multimodal multitask learning for oral proficiency assessments | 3
A speech prediction model based on codec modeling and transformer decoding | 3
Self-feeding training method for semi-supervised grammatical error correction | 3
New research on monaural speech segregation based on quality assessment | 3
Exploring the ability of LLMs to classify written proficiency levels | 3
Automatic offline annotation of turn-taking transitions in task-oriented dialogue | 3
Incorporating external knowledge for text matching model | 3
Multi-level context features extraction for named entity recognition | 3
Editorial Board | 3
Sentiment analysis for live video comments with variational residual representations | 3
Supervised speech separation combined with adaptive beamforming | 3
A multimodal perspective on adaptive communication: Extending the hyper- and hypo-articulation theory | 3
Editorial Board | 3
Editorial Board | 3
Taking relations as known conditions: A tagging based method for relational triple extraction | 3
Multimodal laryngoscopic video analysis for assisted diagnosis of vocal fold paralysis | 3
Towards explainable spoofed speech attribution and detection: A probabilistic approach for characterizing speech synthesizer components | 3
Train from scratch: Single-stage joint training of speech separation and recognition | 3
Knowledge-enhanced meta-prompt for few-shot relation extraction | 3
AraFastQA: a transformer model for question-answering for Arabic language using few-shot learning | 3
Single-channel speech enhancement using colored spectrograms | 3
Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems | 3
Analysis of Instantaneous Frequency Components of Speech Signals for Epoch Extraction | 3