Multimedia Systems

Papers
(The TQCC of Multimedia Systems is 4. The table below lists the papers whose CrossRef citation counts are above that threshold, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2021-12-01 to 2025-12-01.)
Article | Citations
A visual question answering model based on image captioning | 127
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks | 92
Pseudo-global strategy-based visual comfort assessment considering attention mechanism | 88
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy | 83
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network | 76
Real emotion seeker: recalibrating annotation for facial expression recognition | 71
A comparative study of color quantization methods using various image quality assessment indices | 70
BENet: bi-directional enhanced network for image captioning | 67
Correction: STASiamRPN: visual tracking based on spatiotemporal and attention | 58
Towards domain adaptation underwater image enhancement and restoration | 49
Dual-branch spectral–spatial feature extraction network for multispectral image compression | 47
Face and voice cross-modal association with learning convex feature embedding | 43
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath | 42
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network | 41
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image | 41
360° video quality assessment based on saliency-guided viewport extraction | 40
SFRA: spatial fusion regression augmentation network for facial landmark detection | 39
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization | 38
Model-based portrait video compression with spatial constraint and adaptive pose processing | 35
Improving text-image cross-modal retrieval with contrastive loss | 34
Segmentation-aware image super-resolution with generative adversarial networks | 30
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement | 30
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery | 29
Dual convolutional neural network with attention for image blind denoising | 29
SFFN-YOLO for small object detection in aerial images | 28
CHCoT-MSLU: a coupled hierarchical chain-of-thought prompt learning model for multi-intent spoken language understanding | 28
DiffRA: universal restorative adversarial attack based on diffusion model | 27
The segmented UEC Food-100 dataset with benchmark experiment on food detection | 26
GVA: guided visual attention approach for automatic image caption generation | 26
On-line monitoring of structural performance of scraper conveyor driven by digital twin | 26
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption | 24
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach | 23
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model | 23
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet | 23
User authentication method based on keystroke dynamics and mouse dynamics using HDA | 22
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer | 22
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors | 21
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation | 21
Optimizing codebook training through control chart analysis | 20
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation | 20
Spatial interpolation of head-related transfer functions using a physics-informed autoencoder | 20
Weakly supervised anomaly detection with multi-level contextual modeling | 18
DMFTNet: dense multimodal fusion transfer network for free-space detection | 18
Fast bilateral filter with spatial subsampling | 18
SoftBinReduce: data reduction for color quantization through soft binning | 18
Overcoming the practical restrictions in H.266/VVC-based video communication systems by a PI bit rate controller | 18
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction | 18
Design and evaluation of a serious game in virtual reality to increase empathy towards students with phonological dyslexia | 17
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining | 17
Enhanced target recognition and localization using binocular vision and infrared thermal imaging | 17
Multi-view region proposal network predictive learning for tracking | 17
Big-LITTLE-Net: a dual-branch network for small UAV detection | 17
An automatic music generation method based on RSCLN_Transformer network | 17
A verifiable variable threshold visual image secret sharing scheme | 16
A multi-label classification method combined with texture enhancement for deepfake face detection | 16
Similarity-guided contrastive learning for deep multi-view clustering | 16
Inter-class distance enhanced prototypical network for few-shot text classification | 16
Exploiting local detail in single image super-resolution via hypergraph convolution | 16
Efficient and self-adaptive rationale knowledge base for visual commonsense reasoning | 16
A survey of multimodal federated learning: background, applications, and perspectives | 16
RefinerHash: a new hashing-based re-ranking technique for image retrieval | 15
GL-MambaNet: Mamba-based global and local feature fusion for image dehazing | 15
CR-DM: A novel craniofacial reconstruction framework based on diffusion model | 15
DAFMixerSR: a lightweight fusion-enhanced adaptive perception network for image super-resolution | 15
Workpiece tracking based on improved SiamFC++ and virtual dataset | 15
Speech-driven talking face video generation | 15
Game and reference: efficient policy making for epidemic prevention and control | 15
CCM-Net: image splicing localization network based on context-aware and cross-domain multi-scale fusion | 15
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach | 15
Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping | 15
A deep learning-based framework for detecting COVID-19 patients using chest X-rays | 15
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies | 15
Double-scale similarity with rich features for cross-modal retrieval | 14
Cross-modality geometry-guided historical momentum learning for coupled noisy visible-infrared re-identification | 14
EDCM-EA: event prediction based on event development context mining considering event arguments | 14
Graph contrastive learning for recommendation with generative data augmentation | 14
A plug-and-play image enhancement model for end-to-end object detection in low-light condition | 14
Unsupervised cross-database micro-expression recognition based on distribution adaptation | 13
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution | 13
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism | 13
Wireless multipath video transmission: when IoT video applications meet networking—a survey | 13
PDSRN: a progressive distillation network for generalizable single image super-resolution | 13
3D human pose estimation method based on multi-constrained dilated convolutions | 13
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory | 13
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency | 12
RMVAE: one-class classification via divergence regularization and maximization mutual information | 12
Occluded scene text detection via context-awareness from sketch-level image representations | 12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network | 12
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation | 12
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer | 12
LPR: learning point-level temporal action localization through re-training | 12
MCLSC-Fusion: a multi-scale cross-modality long-short connection fusion network for infrared and visible images | 12
Depth alignment interaction network for camouflaged object detection | 12
Multimodal-enhanced hierarchical attention network for video captioning | 12
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography | 12
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network | 12
Object detection of mural images based on improved YOLOv8 | 12
Gicnet: global information capture network for visual place recognition | 11
Developing novel video coding model using modified dual-tree wavelet-based multi-resolution technique | 11
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks | 11
A comprehensive survey on human pose estimation approaches | 11
HandO: a hybrid 3D hand–object reconstruction model for unknown objects | 11
Tb-mmrd: transformer-based multi-modal election rumor detection with agreement-aware gating and semantic fusion | 11
Scd-yolo: a novel object detection method for efficient road crack detection | 11
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering | 11
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning | 11
Synthetic shadows: the interplay of forensic detection and anti-forensic techniques in GAN-generated images | 11
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence | 11
Learning shared features from specific and ambiguous descriptions for text-based person search | 11
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform | 11
3D model watermarking using surface integrals of generated random vector fields | 10
Facial action unit detection with emotion consistency: a cross-modal learning approach | 10
Polarity-aware attention network for image sentiment analysis | 10
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray | 10
Text-centered cross-sample fusion network for multimodal sentiment analysis | 10
Deepfake detection of occluded images using a patch-based approach | 10
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation | 10
Multi-level fine-grained center calibration network for unsupervised person re-identification | 10
Non-convex fractional-order TV model for image inpainting | 10
Remote sensing image cloud removal based on multi-scale spatial information perception | 10
You watch once more: a more effective CNN architecture for video spatio-temporal action localization | 10
Detecting offensive language on instagram with a combined approach of the Gray Wolf algorithm and deep learning networks | 10
Asymmetric exponential loss function for crack segmentation | 10
Overcomplete-to-sparse representation learning for few-shot class-incremental learning | 10
Bag of states: a non-sequential approach to video-based engagement measurement | 10
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering | 10
Unsupervised single-image dehazing via self-guided inverse-retinex GAN | 9
Dual-stream progressive neural network based on cross fusion in image manipulation localization | 9
User quality of experience estimation using social network analysis | 9
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model | 9
ST-GRU: spatiotemporal gated recurrent unit for video prediction | 9
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion | 9
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector | 9
3D human pose estimation with multi-hypotheses gated transformer | 9
Compact twice fusion network for edge detection | 9
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video | 9
CBLC-SOOD: contrastive background and label correction for semi-supervised oriented object detection | 9
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment | 9
Same-clothes person re-identification with dual-stream network | 9
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition | 9
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model | 9
GloFP-MSF: monocular scene flow estimation with global feature perception | 9
Hfffap-net: unsupervised fundus image enhancement with high-frequency feature fusion and artifact processing | 9
A Three-stage multimodal emotion recognition network based on text low-rank fusion | 9
Face attribute recognition via end-to-end weakly supervised regional location | 9
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution | 9
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping | 9
DA²Net: a dual attention-aware network for robust crowd counting | 9
Dual-visual collaborative enhanced transformer for image captioning | 9
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering | 9
Local discriminative graph convolutional networks for text classification | 9
Adversarial training in logit space against tiny perturbations | 8
Indirect visual–semantic alignment for generalized zero-shot recognition | 8
Multimodal large language model enhancement network for multimodal sentiment analysis | 8
Dual attention transformer with adaptive frequency enhancement for real-world Chinese–English scene text image super-resolution | 8
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment | 8
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation | 8
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models | 8
Generating generalized zero-shot learning based on dual-path feature enhancement | 8
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction | 8
Teaching authentic sign language through multiple representation learning | 8
Fine-tuning CLIP for difference-guided composed image retrieval | 8
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation | 8
EfficientFace: an efficient deep network with feature enhancement for accurate face detection | 8
Estimating visibility via differential regression network | 8
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments | 8
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering | 8
Selecting generated synthetic features using clustering algorithm for generalized zero-shot learning | 8
Hybrid embedding for multimodal few-frame action recognition | 8
MGSAN: multimodal graph self-attention network for skeleton-based action recognition | 8
Special issue on data-driven personalisation of television content | 8
Music genre classification based on auditory image, spectral and acoustic features | 7
Link prediction in social networks using hyper-motif representation on hypergraph | 7
Role of deep learning models and analytics in industrial multimedia environment | 7
CAFIN: cross-attention based face image repair network | 7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module | 7
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS | 7
Lightweight dual-path octave generative adversarial networks for few-shot image generation | 7
Multi-granular dynamic interaction network for multimodal sarcasm detection | 7
Recognition of miner action and violation behavior based on the ANODE-GCN model | 7
Propagating prior information with transformer for robust visual object tracking | 7
Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes | 7
Map modeling for full body gesture using flex sensor and machine learning algorithms | 7
Exploring multi-dimensional interests for session-based recommendation | 7
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming | 7
Channel modulus normalization for CNN image classification | 7
YOLO-ERF: lightweight object detector for UAV aerial images | 7
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN | 7
Prior-based bi-encoder transformer for underwater image enhancement | 7
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition | 7
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning | 7
Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles | 7
CAPTCHA farm detection and user authentication via mouse-trajectory similarity measurement | 7
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution | 7
KECAN: knowledge-enhanced cross-modal alignment network for ophthalmic report generation | 7
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention | 7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection | 7
LCFormer: linear complexity transformer for efficient image super-resolution | 6
A multi-level feature weight fusion model for salient object detection | 6
Locally controllable network based on visual–linguistic relation alignment for text-to-image generation | 6
Mmy-net: a multimodal network exploiting image and patient metadata for simultaneous segmentation and diagnosis | 6
Collaborative point cloud geometry compression for both human vision and machine vision | 6
Personalized time-sync comment generation based on a multimodal transformer | 6
Editorial note for few-shot learning for intelligent multimedia systems | 6
LET-Net: locally enhanced transformer network for medical image segmentation | 6
Adp-clf: adaptive dual-perception contrastive learning for gastrointestinal endoscopic image classification | 6
PS-YOLO: a small object detector based on efficient convolution and multi-scale feature fusion | 6
SR-DAYOLOv8: cross-domain adaptive object detection based on super-resolution domain classifier | 6
Mpv-pcqa: multimodal no-reference point cloud quality assessment via point cloud and captured dynamic video | 6
IOPCNet: inner and outer point classification based low overlap rate local-to-global point cloud registration | 6
Full reference image quality assessment based on dual-space multi-feature fusion | 6
An efficient federated learning method based on enhanced classification-GAN for medical image classification | 6
A MADDPG-based multi-agent antagonistic algorithm for sea battlefield confrontation | 6
Learning effective embedding for automated COVID-19 prediction from chest X-ray images | 6
GCIF: graph based cross-modal information fusion for conversational emotion recognition | 6
Prior tissue knowledge-driven contrastive learning for brain CT report generation | 6
A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution | 6
Rescue decision via Earthquake Disaster Knowledge Graph reasoning | 6
Spatial attention-guided deformable fusion network for salient object detection | 6
A weakly supervised pavement crack segmentation based on adversarial learning and transformers | 6
Image and audio caps: automated captioning of background sounds and images using deep learning | 6
TrafficTrack: rethinking the motion and appearance cue for multi-vehicle tracking in traffic monitoring | 6
Msfusenet: a multi-stage information fusion network for multi-modal skin lesion diagnosis | 6
Irregular feature enhancer for low-dose CT denoising | 6
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition | 6
From coarse to fine: a two-stage common semantic space construction for unpaired cross modal retrieval | 6
A multi-scale feature fusion spatial–channel attention model for background subtraction | 6
A two-stage forgery detection and localization framework based on feature classification and similarity metric | 6
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter | 6
A cross-view geo-localization method guided by relation-aware global attention | 6
A multi-scale no-reference video quality assessment method based on transformer | 6
Multiscale geometric window transformer for orthodontic teeth point cloud registration | 6
ITrans: generative image inpainting with transformers | 6
Layer-wise enhanced transformer with multi-modal fusion for image caption | 6
TSGFormer: temporal-aware network and spatial encoding GCN for three-dimensional human pose estimation | 6
Wavelet guided real time detection transformer with sparse attention | 6
Composite makeup transfer model based on generative adversarial networks | 6
A prompt-based dual-layer cross-modal distillation learning method for aspect-based sentiment analysis | 6
Collaborative multi-knowledge distillation under the influence of softmax regression representation | 5
PillarVTP: vehicle trajectory prediction method based on local point cloud aggregation and receptive field expansion | 5
Breast density measurement methods on mammograms: a review | 5
Weighted sparse gradient reconstruction model with a robust fidelity for edge-aware image smoothing | 5
Identification of haploid and diploid maize seeds using hybrid transformer model | 5
DHRA-UNet: a lightweight SLM powder-spreading defect image segmentation algorithm | 5
Edge-preserving image denoising using noise-enhanced patch-based non-local means | 5
Adaptive region assisted GAN for image steganography | 5
Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method | 5
AFEV-INet: adaptive feature extraction variational interactive network for remote sensing image denoising | 5
View adjustment: helping users improve photographic composition | 5
IS-DGM: an improved steganography method based on a deep generative model and hyper logistic map encryption via social media networks | 5
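
The selection rule stated in the note above the table (citation count above the TQCC threshold, publication date inside the four-year window, at most 250 rows) can be sketched as a simple filter. This is only an illustration under stated assumptions, not the site's actual procedure: the `Paper` record, the `select_papers` function, and the sample entries are all hypothetical. "Above" is read as strictly greater than, the window endpoints are assumed to be inclusive, and the ordering by citation count is inferred from the table. The TQCC value is taken as a given input; how it is computed is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one paper; not a CrossRef data structure."""
    title: str
    citations: int
    published: date


def select_papers(papers, threshold, start, end, limit=250):
    """Keep papers with more than `threshold` citations published in
    [start, end] (endpoints assumed inclusive), most cited first,
    truncated to `limit` rows."""
    eligible = [
        p for p in papers
        if p.citations > threshold and start <= p.published <= end
    ]
    # sorted() is stable, so papers with equal counts keep their input order.
    ranked = sorted(eligible, key=lambda p: p.citations, reverse=True)
    return ranked[:limit]


# Made-up sample entries for illustration only; they are not rows of the table.
sample = [
    Paper("Example A", 12, date(2023, 5, 1)),
    Paper("Example B", 4, date(2024, 1, 15)),   # not above the threshold
    Paper("Example C", 30, date(2020, 6, 30)),  # outside the date window
    Paper("Example D", 5, date(2025, 2, 10)),
]

selected = select_papers(
    sample,
    threshold=4,                 # TQCC value quoted in the note
    start=date(2021, 12, 1),
    end=date(2025, 12, 1),
)

for paper in selected:
    print(f"{paper.title} | {paper.citations}")
```

Under these assumptions only "Example A" and "Example D" are kept. Because the table holds exactly 250 rows, the cap may have cut off further papers with 5 citations.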