Multimedia Systems

Papers
(The median citation count of Multimedia Systems is 1. The table below lists the papers whose CrossRef citation counts exceed that threshold [max. 250 papers]. Only papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01, are included.)
Article | Citations
Pseudo-global strategy-based visual comfort assessment considering attention mechanism | 171
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy | 123
Face and voice cross-modal association with learning convex feature embedding | 93
DiffRA: universal restorative adversarial attack based on diffusion model | 84
SFFN-YOLO for small object detection in aerial images | 74
Improving text-image cross-modal retrieval with contrastive loss | 64
TreeSegNet: multi-scale query-based instance segmentation with frequency-aware and gated feature enhancement | 63
GVA: guided visual attention approach for automatic image caption generation | 55
Dual-branch spectral–spatial feature extraction network for multispectral image compression | 54
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network | 52
FedMAB: adaptive multimodal federated learning with multi-armed bandits | 51
A visual question answering model based on image captioning | 47
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement | 45
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet | 42
User authentication method based on keystroke dynamics and mouse dynamics using HDA | 41
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks | 40
The segmented UEC Food-100 dataset with benchmark experiment on food detection | 38
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer | 35
Model-based portrait video compression with spatial constraint and adaptive pose processing | 32
Segmentation-aware image super-resolution with generative adversarial networks | 31
JAMD-Net: image splicing forgery detection based on JPEG compression artifacts and multi-dilated channel refinement fusion | 31
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach | 30
CHCoT-MSLU: a coupled hierarchical chain-of-thought prompt learning model for multi-intent spoken language understanding | 30
Real emotion seeker: recalibrating annotation for facial expression recognition | 30
Towards domain adaptation underwater image enhancement and restoration | 29
SFRA: spatial fusion regression augmentation network for facial landmark detection | 27
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization | 27
Atacr-net: adaptive temporal alignment and contrastive refinement network for skeleton-based action recognition | 26
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath | 26
Fast latent-feature augmentation for cross-domain face forgery detection | 26
360° video quality assessment based on saliency-guided viewport extraction | 26
Hierarchical feature multi-contrastive learning for skin cancer classification | 25
Saliency guided deep unfolding network for compressive sensing | 25
Sketch-guided neural style transfer | 25
Semi-supervised adversarial training via disentangled contrastive learning | 25
A comparative study of color quantization methods using various image quality assessment indices | 24
A variational causal inference-based method for recognizing object state changes in videos | 23
On-line monitoring of structural performance of scraper conveyor driven by digital twin | 23
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image | 23
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors | 23
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network | 23
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption | 22
Dual convolutional neural network with attention for image blind denoising | 22
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model | 22
Mamba-driven context-aware tracking with dual prompts | 22
LEA-depth: a lightweight self-supervised monocular depth estimation with attention fusion and edge-aware distillation | 21
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery | 21
GCGV: a dual-branch hybrid network integrating graph attention, CNNs, and vision transformers for enhanced hyperspectral image classification | 21
BENet: bi-directional enhanced network for image captioning | 21
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation | 20
Optimizing codebook training through control chart analysis | 20
SoftBinReduce: data reduction for color quantization through soft binning | 19
A verifiable variable threshold visual image secret sharing scheme | 19
Game and reference: efficient policy making for epidemic prevention and control | 19
Fast bilateral filter with spatial subsampling | 19
Spatial interpolation of head-related transfer functions using a physics-informed autoencoder | 19
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation | 19
Exploiting local detail in single image super-resolution via hypergraph convolution | 18
Inter-class distance enhanced prototypical network for few-shot text classification | 18
Multi-view region proposal network predictive learning for tracking | 18
Enhanced target recognition and localization using binocular vision and infrared thermal imaging | 17
Quantifying Factual Divergence in Generative Models: SHAP-LIME Based Hallucination Score for LLMs | 17
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining | 17
GL-MambaNet: Mamba-based global and local feature fusion for image dehazing | 17
RefinerHash: a new hashing-based re-ranking technique for image retrieval | 17
Big-LITTLE-Net: a dual-branch network for small UAV detection | 17
DAFMixerSR: a lightweight fusion-enhanced adaptive perception network for image super-resolution | 17
Badinterpreter: Backdoor attack on LLM-based interpretable recommendation | 16
Incrementaldreamer: scene-level 3D generation with incremental optimization | 16
Transferable diffusion transformer for low-light image enhancement | 16
An automatic music generation method based on RSCLN_Transformer network | 16
VLM-driven fine-grained semantic regularization for low-light image enhancement | 16
A multi-label classification method combined with texture enhancement for deepfake face detection | 15
Speech-driven talking face video generation | 15
Weakly supervised anomaly detection with multi-level contextual modeling | 15
Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping | 15
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies | 15
DMFTNet: dense multimodal fusion transfer network for free-space detection | 15
Similarity-guided contrastive learning for deep multi-view clustering | 15
Design and evaluation of a serious game in virtual reality to increase empathy towards students with phonological dyslexia | 14
Fgef-net: frequency-guided and enhanced fusion dehazing network for visibility enhancement in maritime traffic surveillance | 14
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction | 14
CR-DM: A novel craniofacial reconstruction framework based on diffusion model | 14
A survey of multimodal federated learning: background, applications, and perspectives | 14
Vulnerability Positioner (VulP): enhancing code vulnerability localization with CodeBERT | 14
CCM-Net: image splicing localization network based on context-aware and cross-domain multi-scale fusion | 14
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach | 13
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography | 13
Object detection of mural images based on improved YOLOv8 | 13
Workpiece tracking based on improved SiamFC++ and virtual dataset | 13
Semantic segmentation network for remote sensing images based on category-aware cross-fusion | 13
PDSRN: a progressive distillation network for generalizable single image super-resolution | 13
EDCM-EA: event prediction based on event development context mining considering event arguments | 13
Cross-modality geometry-guided historical momentum learning for coupled noisy visible-infrared re-identification | 13
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism | 12
Occluded scene text detection via context-awareness from sketch-level image representations | 12
LPR: learning point-level temporal action localization through re-training | 12
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks | 12
Rethinking RGB-D salient object detection | 12
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network | 12
MCLSC-Fusion: a multi-scale cross-modality long-short connection fusion network for infrared and visible images | 12
Multimodal-enhanced hierarchical attention network for video captioning | 12
Learning shared features from specific and ambiguous descriptions for text-based person search | 12
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence | 12
A plug-and-play image enhancement model for end-to-end object detection in low-light condition | 12
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory | 12
Graph contrastive learning for recommendation with generative data augmentation | 12
3D human pose estimation method based on multi-constrained dilated convolutions | 12
Joint α-β-divergences reconstruction and non-convex sparse regularization for image clustering | 12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network | 12
A comprehensive survey on human pose estimation approaches | 11
AI-driven Braille character recognition using partitioned spatial modeling and sequential learning | 11
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency | 11
Multi-level fine-grained center calibration network for unsupervised person re-identification | 11
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform | 11
Scd-yolo: a novel object detection method for efficient road crack detection | 11
Depth alignment interaction network for camouflaged object detection | 11
Overcomplete-to-sparse representation learning for few-shot class-incremental learning | 11
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation | 11
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation | 11
Multi-domain feature enhanced adaptive fusion network for multi-modal fake news detection | 11
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution | 11
Gicnet: global information capture network for visual place recognition | 11
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer | 11
You watch once more: a more effective CNN architecture for video spatio-temporal action localization | 10
Pointlgfn: local–global fusion network for point cloud classification | 10
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning | 10
Detecting offensive language on instagram with a combined approach of the Gray Wolf algorithm and deep learning networks | 10
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering | 10
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering | 10
CloudCap3D: enhancing 3D in-scene descriptions via point cloud integration and efficient text filtering | 10
FedVC-ADDiM: a federated learning framework for diagnosis of alzheimer disease using deep learning | 10
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray | 10
Deepfake detection of occluded images using a patch-based approach | 10
DFGAnet: a dual-branch multimodal fusion network based on graph and attention for emotion recognition in conversation | 10
Multi-object tracking in the low-light with two-stage association and denoising based on image feature enhancement | 10
Bag of states: a non-sequential approach to video-based engagement measurement | 10
Swiftavatar: real-time human reconstruction via semantic graph deformation and surface awareness | 10
Polarity-aware attention network for image sentiment analysis | 10
Tb-mmrd: transformer-based multi-modal election rumor detection with agreement-aware gating and semantic fusion | 10
Facial action unit detection with emotion consistency: a cross-modal learning approach | 10
Diff-mednet: differential convolution and median-enhanced attention multiscale fusion for infrared small target detection | 10
3D model watermarking using surface integrals of generated random vector fields | 10
Synthetic shadows: the interplay of forensic detection and anti-forensic techniques in GAN-generated images | 10
A hybrid spatial and spectral mamba network for hyperspectral image super-resolution | 10
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model | 9
Dual-stream progressive neural network based on cross fusion in image manipulation localization | 9
Unsupervised single-image dehazing via self-guided inverse-retinex GAN | 9
MobileViNeXt: a lightweight fusion model for ship-radiated noise recognition | 9
Same-clothes person re-identification with dual-stream network | 9
Text-centered cross-sample fusion network for multimodal sentiment analysis | 9
GloFP-MSF: monocular scene flow estimation with global feature perception | 9
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector | 9
A robust federated aggregation algorithm for multimodal data in smart grid scenarios | 9
Fine-grained behavior interaction-aware network for efficient multi-person motion forecasting | 9
3D human pose estimation with multi-hypotheses gated transformer | 9
MGSAN: multimodal graph self-attention network for skeleton-based action recognition | 9
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping | 9
Remote sensing image cloud removal based on multi-scale spatial information perception | 9
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution | 9
CBLC-SOOD: contrastive background and label correction for semi-supervised oriented object detection | 9
Dual-visual collaborative enhanced transformer for image captioning | 9
Task-adaptive parameter optimization for medical image classification transfer learning | 9
Compact twice fusion network for edge detection | 9
ST-GRU: spatiotemporal gated recurrent unit for video prediction | 9
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment | 9
Non-convex fractional-order TV model for image inpainting | 9
EfficientFace: an efficient deep network with feature enhancement for accurate face detection | 8
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video | 8
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation | 8
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction | 8
Deep unfolding low-rank network for image denoising | 8
Multi-granular dynamic interaction network for multimodal sarcasm detection | 8
Teaching authentic sign language through multiple representation learning | 8
Hybrid embedding for multimodal few-frame action recognition | 8
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention | 8
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models | 8
A Three-stage multimodal emotion recognition network based on text low-rank fusion | 8
Local discriminative graph convolutional networks for text classification | 8
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering | 8
Hfffap-net: unsupervised fundus image enhancement with high-frequency feature fusion and artifact processing | 8
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments | 8
Indirect visual–semantic alignment for generalized zero-shot recognition | 8
Lightweight dual-path octave generative adversarial networks for few-shot image generation | 8
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN | 8
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution | 8
Selecting generated synthetic features using clustering algorithm for generalized zero-shot learning | 8
KECAN: knowledge-enhanced cross-modal alignment network for ophthalmic report generation | 8
Face attribute recognition via end-to-end weakly supervised regional location | 8
Special issue on data-driven personalisation of television content | 8
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion | 8
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition | 8
Multi-document localization method based on bottom-up architecture | 8
Generating generalized zero-shot learning based on dual-path feature enhancement | 8
Multimodal large language model enhancement network for multimodal sentiment analysis | 8
Blind super-resolution based on matrix-variable optimization for video images | 8
Prior-based bi-encoder transformer for underwater image enhancement | 8
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming | 8
Dual attention transformer with adaptive frequency enhancement for real-world Chinese–English scene text image super-resolution | 8
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering | 8
Adversarial training in logit space against tiny perturbations | 8
Scene text image super-resolution algorithm based on directional feature modeling | 8
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model | 8
Recognition of miner action and violation behavior based on the ANODE-GCN model | 7
Hierarchical segmentation-guided diffusion framework for high-fidelity sonar image generation | 7
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning | 7
Layer-wise enhanced transformer with multi-modal fusion for image caption | 7
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition | 7
A prompt-based dual-layer cross-modal distillation learning method for aspect-based sentiment analysis | 7
X2Fashion: temporally consistent fashion video generation guided by image, pose and text | 7
USGA: unified intra- and cross-scale features with global–local aggregation for long-term tracking | 7
TSGFormer: temporal-aware network and spatial encoding GCN for three-dimensional human pose estimation | 7
YOLO-ERF: lightweight object detector for UAV aerial images | 7
CAPTCHA farm detection and user authentication via mouse-trajectory similarity measurement | 7
Advanced techniques in digital media processing for special effects enhancement in film and television post-production | 7
Estimating visibility via differential regression network | 7
Opfusion: a deep blind image super resolution network using generative diffusion models and neural operator learning | 7
Composite makeup transfer model based on generative adversarial networks | 7
Personalized time-sync comment generation based on a multimodal transformer | 7
An efficient federated learning method based on enhanced classification-GAN for medical image classification | 7
Context-aware feature complementary screening network for mass segmentation in whole mammograms | 7
CAFIN: cross-attention based face image repair network | 7
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation | 7
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition | 7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module | 7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection | 7
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS | 7
Map modeling for full body gesture using flex sensor and machine learning algorithms | 7
Role of deep learning models and analytics in industrial multimedia environment | 7
Rescue decision via Earthquake Disaster Knowledge Graph reasoning | 7
Propagating prior information with transformer for robust visual object tracking | 7
TIPDF-DWSF: a task-oriented two-stage optimization framework for diffusion model LoRA fine-tuning | 7
LET-Net: locally enhanced transformer network for medical image segmentation | 7
Fine-tuning CLIP for difference-guided composed image retrieval | 7
Link prediction in social networks using hyper-motif representation on hypergraph | 7
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment | 7
A self-supervised enhancement method for real world low-light images using Retinex and camera response function | 6
A multi-scale feature fusion spatial–channel attention model for background subtraction | 6
Food nutrition estimation with RGB-D fusion module and bidirectional feature pyramid network | 6
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter | 6
Prometheus: an efficient federated collaborative learning framework for coevolution of edge-cloud heterogeneous models | 6
Dynamical semantic enhancement network for continuous sign language recognition | 6
CLDE-Net: crowd localization and density estimation based on CNN and transformer network | 6
A multi-scale no-reference video quality assessment method based on transformer | 6
Adp-clf: adaptive dual-perception contrastive learning for gastrointestinal endoscopic image classification | 6
Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus | 6
Prior tissue knowledge-driven contrastive learning for brain CT report generation | 6
A sub-grouping-based resource allocation method for layered video’s multicast broadcast service (MBS) over the 5G cellular network | 6
Learning effective embedding for automated COVID-19 prediction from chest X-ray images | 6
Personalized music recommendation algorithm based on machine learning | 6