IEEE Transactions on Multimedia

Papers
(The H4-Index of IEEE Transactions on Multimedia is 76. The table below lists the papers whose CrossRef citation counts meet or exceed that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-12-01 to 2025-12-01.)
Article | Citations
Disaggregation Distillation for Person Search | 732
Adaptive Weight Generator for Multi-Task Image Recognition by Task Grouping Prompt | 405
Semi-Supervised Domain Adaptation via Joint Transductive and Inductive Subspace Learning | 348
Improving Vision Anomaly Detection With the Guidance of Language Modality | 272
Focusing on Subtle Differences: A Feature Disentanglement Model for Series Photo Selection | 256
SGG-Nets: Generic Rotation-Invariant Plugin Networks for Point Cloud Analysis | 255
Mix-Based Training Strategies for Learning Implicit Neural Representations | 248
Weakly-Supervised 3D Visual Grounding Based on Visual Language Alignment | 209
Self-Guided Discriminative Locality Preserving Projections | 208
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | 198
Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames | 169
Robust Multi-Stage Tracking via Multi-Scale and Multi-Level Representation Learning | 164
SkyML: A MLaaS Federation Design for Multicloud-Based Multimedia Analytics | 164
Exploring Kernel Transformations for Implicit Neural Representations | 162
Self-Mining the Confident Prototypes for Source-Free Unsupervised Domain Adaptation in Image Segmentation | 153
Online Low-Light Sand-Dust Video Enhancement Using Adaptive Dynamic Brightness Correction and a Rolling Guidance Filter | 147
Feature First: Advancing Image-Text Retrieval Through Improved Visual Features | 146
SCSP: An Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception | 143
ICE: Interactive 3D Game Character Facial Editing via Dialogue | 142
Multi-Level Transitional Contrast Learning for Personalized Image Aesthetics Assessment | 141
Vulnerability of Feature Extractors in 2D Image-Based 3D Object Retrieval | 141
Watch Where You Move: Region-Aware Dynamic Aggregation and Excitation for Gait Recognition | 141
Quality Assessment for DIBR-Synthesized Views Based on Wavelet Transform and Gradient Magnitude Similarity | 135
Semantic-Aware Triplet Loss for Image Classification | 132
BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning | 130
Pixel Bleach Network for Detecting Face Forgery Under Compression | 127
Rethinking Affine Transform for Efficient Image Enhancement: A Color Space Perspective | 125
One-Shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing | 119
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection | 118
Few-Shot Generative Model Adaptation via Style-Guided Prompt | 117
Asymptotics-Aware Multi-View Subspace Clustering | 117
Bias-Correction Feature Learner for Semi-Supervised Instance Segmentation | 115
FoodSAM: Any Food Segmentation | 114
Structured Attention Network for Referring Image Segmentation | 112
Optimal Transport-Based Patch Matching for Image Style Transfer | 111
Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes | 110
Towards Fast and Robust Real Image Denoising With Attentive Neural Network and PID Controller | 107
Semi-Supervised Domain Adaptation for Major Depressive Disorder Detection | 106
Ensemble Prototype Networks for Unsupervised Cross-Modal Hashing With Cross-Task Consistency | 105
BASNet: Boundary Assisted Network for Image Splicing Forgery Detection | 103
Adaptive HEVC Video Steganography With High Performance Based on Attention-Net and PU Partition Modes | 103
Anomaly-Led Prompting Learning Caption Generating Model and Benchmark | 101
Semantic Dual-Adversarial Network for Blended-Target Domain Adaptation | 100
Skeleton-Based Action Recognition With Select-Assemble-Normalize Graph Convolutional Networks | 99
Hierarchical Equalization Loss for Long-Tailed Instance Segmentation | 98
Annealing Genetic GAN for Imbalanced Web Data Learning | 98
Bidirectional Translation Between UHD-HDR and HD-SDR Videos | 97
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments | 97
Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention | 96
Scale Up Composed Image Retrieval Learning via Modification Text Generation | 95
MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion | 95
Improving Pre-Trained Model-Based Speech Emotion Recognition From a Low-Level Speech Feature Perspective | 93
Siamese Alignment Network for Weakly Supervised Video Moment Retrieval | 93
Dual-Task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding | 93
Semi-Supervised Contrastive Learning With Similarity Co-Calibration | 89
Neighborhood Contrastive Transformer for Change Captioning | 88
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations | 86
Progressive Local Filter Pruning for Image Retrieval Acceleration | 86
A Total Variation With Joint Norms For Infrared and Visible Image Fusion | 86
Unsupervised Learning-Based Framework for Deepfake Video Detection | 85
Guided Image-to-Image Translation by Discriminator-Generator Communication | 84
PhotoHelper: Portrait Photographing Guidance Via Deep Feature Retrieval and Fusion | 84
AMS-Net: Adaptive Multi-Scale Network for Image Compressive Sensing | 84
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework | 84
Deep Semantic-Consistent Penalizing Hashing for Cross-Modal Retrieval | 83
Interpretable Graph Convolutional Network for Multi-View Semi-Supervised Learning | 82
Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization | 82
Dynamic Contrastive Distillation for Image-Text Retrieval | 82
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability | 79
SLCGC: A lightweight Self-supervised Low-Pass Contrastive Graph Clustering Network for Hyperspectral Images | 79
A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition | 79
JPEG AI Compressed Domain Face Detection: a Multi-scale Bridging Perspective | 78
Spatial-Temporal Saliency Guided Unbiased Contrastive Learning for Video Scene Graph Generation | 77
Primary Code Guided Targeted Attack against Cross-modal Hashing Retrieval | 76
Universal Infrared Image Nonuniformity Correction via Stripe-Aware Attention Network | 76
Towards Neural Codec-Empowered 360° Video Streaming: A Saliency-Aided Synergistic Approach | 76
Exploring Local and Global Consistent Correlation on Hypergraph for Rotation Invariant Point Cloud Analysis | 76
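The selection rule in the note above can be illustrated with a short sketch. It assumes the H4-Index is the standard h-index computed over the four-year window: the largest h such that at least h papers have at least h citations each. The listing then keeps every paper with h or more citations, capped at 250 entries. The function names `h_index` and `at_or_above` are hypothetical and are not taken from the source site.

```python
from typing import Dict, List


def h_index(citations: List[int]) -> int:
    """Return the largest h such that at least h items have >= h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


def at_or_above(papers: Dict[str, int], threshold: int, limit: int = 250) -> List[str]:
    """Return titles with citations >= threshold, most cited first, capped at `limit`.

    The default cap of 250 mirrors the "[max. 250 papers]" note; it is an assumption
    about how the listing is truncated.
    """
    hits = [title for title, count in papers.items() if count >= threshold]
    hits.sort(key=lambda title: papers[title], reverse=True)
    return hits[:limit]
```

As a check, applying `h_index` to the 77 citation counts in the table returns 76. The 76th-ranked paper has 76 citations, but the 77th has only 76, fewer than 77. That matches the stated H4-Index. Papers below 76 citations cannot change this result, which is why the table alone is enough to confirm it.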