Multimedia Systems

Papers
(The TQCC of Multimedia Systems is 4. The table below lists the papers whose CrossRef citation counts are at or above that threshold [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network | 94
360° video quality assessment based on saliency-guided viewport extraction | 91
Model-based portrait video compression with spatial constraint and adaptive pose processing | 80
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks | 79
Pseudo-global strategy-based visual comfort assessment considering attention mechanism | 78
A visual question answering model based on image captioning | 75
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network | 75
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy | 72
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery | 57
Point cloud inpainting with normal-based feature matching | 53
Correction: STASiamRPN: visual tracking based on spatiotemporal and attention | 49
Improving text-image cross-modal retrieval with contrastive loss | 48
Real emotion seeker: recalibrating annotation for facial expression recognition | 47
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement | 39
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet | 39
Dual convolutional neural network with attention for image blind denoising | 38
User authentication method based on keystroke dynamics and mouse dynamics using HDA | 38
Towards domain adaptation underwater image enhancement and restoration | 38
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model | 37
GVA: guided visual attention approach for automatic image caption generation | 36
SFRA: spatial fusion regression augmentation network for facial landmark detection | 33
Recent advancement in haze removal approaches | 33
The segmented UEC Food-100 dataset with benchmark experiment on food detection | 32
A comparative study of color quantization methods using various image quality assessment indices | 32
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach | 32
Segmentation-aware image super-resolution with generative adversarial networks | 29
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization | 25
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors | 25
Dual-branch spectral–spatial feature extraction network for multispectral image compression | 23
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image | 23
BENet: bi-directional enhanced network for image captioning | 22
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer | 22
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation | 21
Game and reference: efficient policy making for epidemic prevention and control | 21
Special issue on low complexity methods for multimedia security | 21
Weakly supervised anomaly detection with multi-level contextual modeling | 20
A deep learning-based framework for detecting COVID-19 patients using chest X-rays | 20
Inter-class distance enhanced prototypical network for few-shot text classification | 20
SoftBinReduce: data reduction for color quantization through soft binning | 19
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation | 19
An automatic music generation method based on RSCLN_Transformer network | 18
Fast bilateral filter with spatial subsampling | 18
Optimizing codebook training through control chart analysis | 18
Scale-aware attention-based multi-resolution representation for multi-person pose estimation | 18
Improved SSD using deep multi-scale attention spatial–temporal features for action recognition | 17
Exploiting local detail in single image super-resolution via hypergraph convolution | 17
CR-DM: A novel craniofacial reconstruction framework based on diffusion model | 17
RefinerHash: a new hashing-based re-ranking technique for image retrieval | 16
Efficient and self-adaptive rationale knowledge base for visual commonsense reasoning | 16
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining | 16
Multi-view region proposal network predictive learning for tracking | 16
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction | 16
DMFTNet: dense multimodal fusion transfer network for free-space detection | 16
Overcoming the practical restrictions in H.266/VVC-based video communication systems by a PI bit rate controller | 16
Double-scale similarity with rich features for cross-modal retrieval | 15
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies | 15
A survey of multimodal federated learning: background, applications, and perspectives | 15
Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus | 15
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach | 14
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence | 14
Unsupervised cross-database micro-expression recognition based on distribution adaptation | 14
3D human pose estimation method based on multi-constrained dilated convolutions | 14
Graph contrastive learning for recommendation with generative data augmentation | 14
Depth alignment interaction network for camouflaged object detection | 13
A plug-and-play image enhancement model for end-to-end object detection in low-light condition | 13
Learning shared features from specific and ambiguous descriptions for text-based person search | 13
Occluded scene text detection via context-awareness from sketch-level image representations | 13
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network | 12
EDCM-EA: event prediction based on event development context mining considering event arguments | 12
Workpiece tracking based on improved SiamFC++ and virtual dataset | 12
Object detection of mural images based on improved YOLOv8 | 12
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution | 12
Attention based video captioning framework for Hindi | 12
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency | 12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network | 12
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory | 11
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation | 11
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform | 11
LPR: learning point-level temporal action localization through re-training | 11
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism | 11
Multimodal-enhanced hierarchical attention network for video captioning | 11
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography | 11
Scd-yolo: a novel object detection method for efficient road crack detection | 11
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer | 11
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation | 10
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks | 10
Bag of states: a non-sequential approach to video-based engagement measurement | 10
A CNN-based scheme for COVID-19 detection with emergency services provisions using an optimal path planning | 10
Overcomplete-to-sparse representation learning for few-shot class-incremental learning | 10
A comprehensive survey on human pose estimation approaches | 10
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering | 10
HandO: a hybrid 3D hand–object reconstruction model for unknown objects | 10
RMVAE: one-class classification via divergence regularization and maximization mutual information | 10
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering | 10
Wireless multipath video transmission: when IoT video applications meet networking—a survey | 10
Developing novel video coding model using modified dual-tree wavelet-based multi-resolution technique | 10
3D model watermarking using surface integrals of generated random vector fields | 9
Deepfake detection of occluded images using a patch-based approach | 9
Polarity-aware attention network for image sentiment analysis | 9
Gicnet: global information capture network for visual place recognition | 9
Facial action unit detection with emotion consistency: a cross-modal learning approach | 9
Asymmetric exponential loss function for crack segmentation | 9
Text-centered cross-sample fusion network for multimodal sentiment analysis | 9
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning | 9
Multi-level fine-grained center calibration network for unsupervised person re-identification | 9
Practical 3D human skeleton tracking based on multi-view and multi-Kinect fusion | 9
You watch once more: a more effective CNN architecture for video spatio-temporal action localization | 9
Remote sensing image cloud removal based on multi-scale spatial information perception | 9
Non-convex fractional-order TV model for image inpainting | 9
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution | 8
Image quality measurement-based comparative analysis of illumination compensation methods for face image normalization | 8
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector | 8
EfficientFace: an efficient deep network with feature enhancement for accurate face detection | 8
Compact twice fusion network for edge detection | 8
DwiMark: a multiscale robust deep watermarking framework for diffusion-weighted imaging images | 8
GloFP-MSF: monocular scene flow estimation with global feature perception | 8
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model | 8
DA²Net: a dual attention-aware network for robust crowd counting | 8
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation | 8
Local discriminative graph convolutional networks for text classification | 8
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment | 8
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model | 8
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray | 8
User quality of experience estimation using social network analysis | 8
Same-clothes person re-identification with dual-stream network | 8
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping | 8
MGSAN: multimodal graph self-attention network for skeleton-based action recognition | 8
Estimating visibility via differential regression network | 7
Adversarial training in logit space against tiny perturbations | 7
Recognition of miner action and violation behavior based on the ANODE-GCN model | 7
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition | 7
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering | 7
A Three-stage multimodal emotion recognition network based on text low-rank fusion | 7
Face attribute recognition via end-to-end weakly supervised regional location | 7
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention | 7
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction | 7
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN | 7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection | 7
Dual-visual collaborative enhanced transformer for image captioning | 7
Unsupervised single-image dehazing via self-guided inverse-retinex GAN | 7
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion | 7
Special issue on data-driven personalisation of television content | 7
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models | 7
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming | 7
Generating generalized zero-shot learning based on dual-path feature enhancement | 7
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video | 7
3D human pose estimation with multi-hypotheses gated transformer | 7
Image lossless encoding and encryption method of EBCOT Tier1 based on 4D hyperchaos | 7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module | 7
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering | 6
Automatic segmentation of melanoma skin cancer using transfer learning and fine-tuning | 6
Link prediction in social networks using hyper-motif representation on hypergraph | 6
CAFIN: cross-attention based face image repair network | 6
Irregular feature enhancer for low-dose CT denoising | 6
Personalized time-sync comment generation based on a multimodal transformer | 6
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation | 6
Lightweight dual-path octave generative adversarial networks for few-shot image generation | 6
Indirect visual–semantic alignment for generalized zero-shot recognition | 6
YOLO-ERF: lightweight object detector for UAV aerial images | 6
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment | 6
A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution | 6
A cross-view geo-localization method guided by relation-aware global attention | 6
Propagating prior information with transformer for robust visual object tracking | 6
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition | 6
Map modeling for full body gesture using flex sensor and machine learning algorithms | 6
Hybrid embedding for multimodal few-frame action recognition | 6
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments | 6
An improved algorithm of video quality assessment by danmaku analysis | 6
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS | 6
Music genre classification based on auditory image, spectral and acoustic features | 6
Multi-granular dynamic interaction network for multimodal sarcasm detection | 6
Channel modulus normalization for CNN image classification | 6
Multiscale geometric window transformer for orthodontic teeth point cloud registration | 6
Exploring multi-dimensional interests for session-based recommendation | 6
TrafficTrack: rethinking the motion and appearance cue for multi-vehicle tracking in traffic monitoring | 6
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition | 6
Unsupervised adversarial image retrieval | 6
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution | 6
A MADDPG-based multi-agent antagonistic algorithm for sea battlefield confrontation | 5
ITrans: generative image inpainting with transformers | 5
Spatial attention-guided deformable fusion network for salient object detection | 5
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning | 5
A novel SPLIT-SIM approach for efficient image retrieval | 5
PS-YOLO: a small object detector based on efficient convolution and multi-scale feature fusion | 5
Sat-DehazeGAN: an efficient dehazing model in water-sky background for river-sea transport | 5
Editorial note for few-shot learning for intelligent multimedia systems | 5
Full reference image quality assessment based on dual-space multi-feature fusion | 5
Image and audio caps: automated captioning of background sounds and images using deep learning | 5
LET-Net: locally enhanced transformer network for medical image segmentation | 5
SR-DAYOLOv8: cross-domain adaptive object detection based on super-resolution domain classifier | 5
An efficient federated learning method based on enhanced classification-GAN for medical image classification | 5
Layer-wise enhanced transformer with multi-modal fusion for image caption | 5
Rescue decision via Earthquake Disaster Knowledge Graph reasoning | 5
Exemplar-guided low-light image enhancement | 5
Exploring granularity-associated invariance features for text-to-image person re-identification | 5
LCFormer: linear complexity transformer for efficient image super-resolution | 5
Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes | 5
Composite makeup transfer model based on generative adversarial networks | 5
Role of deep learning models and analytics in industrial multimedia environment | 5
Code generation from a graphical user interface via attention-based encoder–decoder model | 5
Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles | 5
Mmy-net: a multimodal network exploiting image and patient metadata for simultaneous segmentation and diagnosis | 5
BCRA: bidirectional cross-modal implicit relation reasoning and aligning for text-to-image person retrieval | 5
IOPCNet: inner and outer point classification based low overlap rate local-to-global point cloud registration | 5
Topic-guided multi-domain fake news detection | 4
Pinyin-to-Chinese conversion on sentence-level for domain-specific applications using self-attention model | 4
Hybrid features and semantic reinforcement network for image forgery detection | 4
Triple-level relationship enhanced transformer for image captioning | 4
A multi-level feature weight fusion model for salient object detection | 4
Infant head and brain segmentation from magnetic resonance images using fusion-based deep learning strategies | 4
Prior tissue knowledge-driven contrastive learning for brain CT report generation | 4
A novel exponent–sine–cosine chaos map-based multiple-image encryption technique | 4
Facial expression recognition via joint loss constraining attention-modulated contextual spatial information network | 4
Adaptive region assisted GAN for image steganography | 4
Breast density measurement methods on mammograms: a review | 4
Radlora: a smart low-rank adaptive approach for radiological image classification | 4
Image compression and encryption algorithm based on 2D compressive sensing and hyperchaotic system | 4
A two-stage forgery detection and localization framework based on feature classification and similarity metric | 4
Multi-branch feature fusion and refinement network for salient object detection | 4
PillarVTP: vehicle trajectory prediction method based on local point cloud aggregation and receptive field expansion | 4
Personalized music recommendation algorithm based on machine learning | 4
Collaborative multi-knowledge distillation under the influence of softmax regression representation | 4
A multi-scale feature fusion spatial–channel attention model for background subtraction | 4
Identification of haploid and diploid maize seeds using hybrid transformer model | 4
Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus | 4
A multi-scale no-reference video quality assessment method based on transformer | 4
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter | 4
Modeling the non-uniform retinal perception for viewport-dependent streaming of immersive video | 4
ED-YOLO: an object detection algorithm for drone imagery focusing on edge information and small object features | 4
Meta-relationship for course recommendation in MOOCs | 4
IS-DGM: an improved steganography method based on a deep generative model and hyper logistic map encryption via social media networks | 4
Food nutrition estimation with RGB-D fusion module and bidirectional feature pyramid network | 4
From coarse to fine: multi-level feature fusion network for fine-grained image retrieval | 4
Comprehensive systematic review on virtual reality for cultural heritage practices: coherent taxonomy and motivations | 4
Deep learning in multimedia healthcare applications: a review | 4
Weighted sparse gradient reconstruction model with a robust fidelity for edge-aware image smoothing | 4
Segmentation and recognition of filed sweet pepper based on improved self-attention convolutional neural networks | 4
PointSGLN: a novel point cloud classification network based on sampling grouping and local point normalization | 4
Locally controllable network based on visual–linguistic relation alignment for text-to-image generation | 4
MT-ASM: a multi-task attention strengthening model for fine-grained object recognition | 4
Hierarchical bi-directional conceptual interaction for text-video retrieval | 4
CLDE-Net: crowd localization and density estimation based on CNN and transformer network | 4
Dynamical semantic enhancement network for continuous sign language recognition | 4
Deep learning and evolutionary intelligence with fusion-based feature extraction for detection of COVID-19 from chest X-ray images | 4
Edge-preserving image denoising using noise-enhanced patch-based non-local means | 4
Design and implementation of a real-time face recognition system based on artificial intelligence techniques | 4
A comprehensive survey of image and video forgery techniques: variants, challenges, and future directions | 4
From coarse to fine: a two-stage common semantic space construction for unpaired cross modal retrieval | 4
Refinecurvelane: lane detection with B-spline curve in a layer-by-layer refinement manner | 4
Learning effective embedding for automated COVID-19 prediction from chest X-ray images | 4
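
The selection rule stated in the note at the top of this list (a citation threshold, a four-year publication window, and a cap of 250 entries) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the function name `select_papers`, and the sample entries are hypothetical placeholders, and the sketch does not show how the TQCC value itself is derived.

```python
from datetime import date

# Hypothetical record layout: (title, citation_count, publication_date).
def select_papers(papers, threshold, start, end, limit=250):
    """Keep papers published within [start, end] that have at least
    `threshold` citations, most cited first, capped at `limit` entries."""
    kept = [p for p in papers if start <= p[2] <= end and p[1] >= threshold]
    kept.sort(key=lambda p: p[1], reverse=True)  # stable sort, descending by count
    return kept[:limit]

# Made-up sample data, not taken from the list above.
sample = [
    ("Paper A", 94, date(2022, 3, 1)),
    ("Paper B", 4, date(2023, 7, 15)),
    ("Paper C", 3, date(2024, 1, 10)),    # below the threshold
    ("Paper D", 50, date(2020, 12, 31)),  # outside the date window
]

selected = select_papers(
    sample, threshold=4, start=date(2021, 5, 1), end=date(2025, 5, 1)
)
for title, count, _ in selected:
    print(f"{title} | {count}")
```

With the threshold of 4 and the window given in the note, only the two in-window entries at or above the threshold survive the filter.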