IEEE Transactions on Multimedia

Papers
(The H4-Index of IEEE Transactions on Multimedia is 68. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article | Citations
Focusing on Subtle Differences: A Feature Disentanglement Model for Series Photo Selection | 639
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations | 361
Optimal Transport-Based Patch Matching for Image Style Transfer | 310
Adaptive Weight Generator for Multi-Task Image Recognition by Task Grouping Prompt | 234
Semi-Supervised Domain Adaptation via Joint Transductive and Inductive Subspace Learning | 229
Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention | 223
Disaggregation Distillation for Person Search | 204
Multi-Level Transitional Contrast Learning for Personalized Image Aesthetics Assessment | 201
Semantic-Aware Triplet Loss for Image Classification | 193
Robust Multi-stage Tracking via Multi-scale and Multi-level Representation Learning | 177
Improving Vision Anomaly Detection With the Guidance of Language Modality | 155
Towards Fast and Robust Real Image Denoising With Attentive Neural Network and PID Controller | 150
Self-Mining the Confident Prototypes for Source-Free Unsupervised Domain Adaptation in Image Segmentation | 147
Improving Pre-Trained Model-Based Speech Emotion Recognition From a Low-Level Speech Feature Perspective | 140
One-Shot Human Motion Transfer via Occlusion-Robust Flow Prediction and Neural Texturing | 135
Quality Assessment for DIBR-Synthesized Views Based on Wavelet Transform and Gradient Magnitude Similarity | 131
Few-Shot Generative Model Adaptation via Style-Guided Prompt | 129
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection | 124
Pixel Bleach Network for Detecting Face Forgery Under Compression | 123
Rethinking Affine Transform for Efficient Image Enhancement: A Color Space Perspective | 122
Bias-Correction Feature Learner for Semi-Supervised Instance Segmentation | 118
Asymptotics-Aware Multi-View Subspace Clustering | 117
Ensemble Prototype Networks for Unsupervised Cross-Modal Hashing With Cross-Task Consistency | 115
BMB: Balanced Memory Bank for Long-Tailed Semi-Supervised Learning | 114
MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion | 111
Semi-Supervised Domain Adaptation for Major Depressive Disorder Detection | 111
Annealing Genetic GAN for Imbalanced Web Data Learning | 110
Adaptive HEVC Video Steganography With High Performance Based on Attention-Net and PU Partition Modes | 110
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments | 110
Feature First: Advancing Image-Text Retrieval Through Improved Visual Features | 108
Deep Semantic-Consistent Penalizing Hashing for Cross-Modal Retrieval | 107
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework | 107
Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding | 103
Exploring Kernel Transformations for Implicit Neural Representations | 100
SkyML: A MLaaS Federation Design for Multicloud-Based Multimedia Analytics | 98
ICE: Interactive 3D Game Character Facial Editing via Dialogue | 96
Total Generate: Cycle in Cycle Generative Adversarial Networks for Generating Human Faces, Hands, Bodies, and Natural Scenes | 96
Online Low-Light Sand-Dust Video Enhancement Using Adaptive Dynamic Brightness Correction and a Rolling Guidance Filter | 94
Unsupervised Learning-Based Framework for Deepfake Video Detection | 92
Semi-Supervised Contrastive Learning With Similarity Co-Calibration | 91
Scale Up Composed Image Retrieval Learning via Modification Text Generation | 89
Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment | 88
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | 87
Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames | 87
Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization | 85
SCSP: An Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception | 84
Siamese Alignment Network for Weakly Supervised Video Moment Retrieval | 84
Vulnerability of Feature Extractors in 2D Image-Based 3D Object Retrieval | 84
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation With Interpretability | 83
Interpretable Graph Convolutional Network for Multi-View Semi-Supervised Learning | 83
AMS-Net: Adaptive Multi-Scale Network for Image Compressive Sensing | 83
Bidirectional Translation Between UHD-HDR and HD-SDR Videos | 82
SGG-Nets: Generic Rotation-Invariant Plugin Networks for Point Cloud Analysis | 81
Neighborhood Contrastive Transformer for Change Captioning | 81
Structured Attention Network for Referring Image Segmentation | 80
A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition | 80
Progressive Local Filter Pruning for Image Retrieval Acceleration | 78
FoodSAM: Any Food Segmentation | 76
Hierarchical Equalization Loss for Long-Tailed Instance Segmentation | 75
Guided Image-to-Image Translation by Discriminator-Generator Communication | 75
A Total Variation With Joint Norms For Infrared and Visible Image Fusion | 75
Skeleton-Based Action Recognition With Select-Assemble-Normalize Graph Convolutional Networks | 75
PhotoHelper: Portrait Photographing Guidance Via Deep Feature Retrieval and Fusion | 74
Dynamic Contrastive Distillation for Image-Text Retrieval | 74
Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene Text Spotter | 73
SLCGC: A lightweight Self-supervised Low-pass Contrastive Graph Clustering Network for Hyperspectral Images | 73
DREAMT: Diversity Enlarged Mutual Teaching for Unsupervised Domain Adaptive Person Re-Identification | 73
Unsupervised Image and Text Fusion for Travel Information Enhancement | 70
Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training | 68
Benchmark Dataset and Pair-Wise Ranking Method for Quality Evaluation of Night-Time Image Enhancement | 68
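The H4-Index of 68 can be checked against the citation counts in the table above. The sketch below assumes the site uses the standard h-index definition (the largest h such that h papers each have at least h citations), restricted to papers from the four-year window; the function name `h_index` is illustrative, not taken from the source.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


# Citation counts copied from the table above, in table order.
CITATIONS = [
    639, 361, 310, 234, 229, 223, 204, 201, 193, 177,
    155, 150, 147, 140, 135, 131, 129, 124, 123, 122,
    118, 117, 115, 114, 111, 111, 110, 110, 110, 108,
    107, 107, 103, 100, 98, 96, 96, 94, 92, 91,
    89, 88, 87, 87, 85, 84, 84, 84, 83, 83,
    83, 82, 81, 81, 80, 80, 78, 76, 75, 75,
    75, 75, 74, 74, 73, 73, 73, 70, 68, 68,
]

if __name__ == "__main__":
    print(len(CITATIONS), h_index(CITATIONS))
```

Running it prints `70 68`: 68 papers have at least 70 citations, but only 68 papers have at least 69, so h stops at 68. Because every paper outside the list has fewer than 68 citations (and the 70 listed papers are well under the 250-paper cap), computing the index on the listed subset gives the same value as computing it over all papers in the window.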