IEEE Transactions on Multimedia

Papers
(The TQCC of IEEE Transactions on Multimedia is 9. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
Editorial | 378
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | 221
Hybrid Motion Representation Learning for Prediction From Raw Sensor Data | 179
Conditional Consistency Regularization for Semi-Supervised Multi-Label Image Classification | 161
Disaggregation Distillation for Person Search | 158
Cross-Modal Cognitive Consensus Guided Audio–Visual Segmentation | 131
PRA-Det: Anchor-Free Oriented Object Detection With Polar Radius Representation | 127
Content-Aware Tunable Selective Encryption for HEVC Using Sine-Modular Chaotification Model | 119
Multi-Perspective Pseudo-Label Generation and Confidence-Weighted Training for Semi-Supervised Semantic Segmentation | 110
Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding | 100
HNR-ISC: Hybrid Neural Representation for Image Set Compression | 96
JPEG Image Encryption With DC Rotation and Undivided RSV-Based AC Group Permutation | 92
Video Instance Segmentation Without Using Mask and Identity Supervision | 91
PointAttention: Rethinking Feature Representation and Propagation in Point Cloud | 88
Late Fusion Multiple Kernel Clustering With Local Kernel Alignment Maximization | 86
PointMCD: Boosting Deep Point Cloud Encoders Via Multi-View Cross-Modal Distillation for 3D Shape Recognition | 86
CarveNet: Carving Point-Block for Complex 3D Shape Completion | 85
Gait Recognition With Multi-Level Skeleton-Guided Refinement | 79
TANet: Target Attention Network for Video Bit-Depth Enhancement | 79
Semi-Supervised Medical Report Generation via Graph-Guided Hybrid Feature Consistency | 77
LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation | 75
CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition | 75
Generative Essential Graph Convolutional Network for Multi-View Semi-Supervised Classification | 75
Coherent Image Animation Using Spatial-Temporal Correspondence | 72
Domain Adaptive Transformer Tracking Under Occlusions | 70
PPM-SEM: A Privacy-Preserving Mechanism for Sharing Electronic Patient Records and Medical Images in Telemedicine | 69
Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality Token | 69
Boosting Robust Learning Via Leveraging Reusable Samples in Noisy Web Data | 67
Weakly Supervised Instance Segmentation by Exploring Entire Object Regions | 66
Temporal Attention-Pyramid Pooling for Temporal Action Detection | 66
Semi-Supervised Contrastive Learning With Similarity Co-Calibration | 64
Iterative Network for Image Super-Resolution | 64
Lightweight Video-Based Respiration Rate Detection Algorithm: An Application Case on Intensive Care | 64
Semantic Image Segmentation by Dynamic Discriminative Prototypes | 61
Multi-Level Second-Order Few-Shot Learning | 60
ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation | 59
Multi-Dimensional Attention With Similarity Constraint for Weakly-Supervised Temporal Action Localization | 57
Action Coherence Network for Weakly-Supervised Temporal Action Localization | 56
Unified Low-Rank Tensor Learning and Spectral Embedding for Multi-View Subspace Clustering | 56
Context-Aware 3D Point Cloud Semantic Segmentation With Plane Guidance | 54
ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning | 54
Annealing Genetic GAN for Imbalanced Web Data Learning | 54
Fine-Grained Attention and Feature-Sharing Generative Adversarial Networks for Single Image Super-Resolution | 52
A Semi-Fragile Reversible Watermarking for Authenticating 3D Models Based on Virtual Polygon Projection and Double Modulation Strategy | 51
Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning | 50
Graph Contrastive Partial Multi-View Clustering | 50
TSFNet: Triple-Steam Image Captioning | 50
Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks | 49
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations | 49
3D Holoscopic Image Compression Based on Gaussian Mixture Model | 49
Neighborhood Contrastive Transformer for Change Captioning | 48
Weakly Supervised Temporal Adjacent Network for Language Grounding | 47
Provably Secure Robust Image Steganography | 47
Dynamic Residual Filtering With Laplacian Pyramid for Instance Segmentation | 46
Comment-Context Dual Collaborative Masked Transformer Network for Fake News Detection | 45
DBiased-P: Dual-Biased Predicate Predictor for Unbiased Scene Graph Generation | 45
Effective End-to-End Vision Language Pretraining With Semantic Visual Loss | 44
Self-Supervised Fine-Grained Cycle-Separation Network (FSCN) for Visual-Audio Separation | 44
Semi-Supervised Learning of Perceptual Video Quality by Generating Consistent Pairwise Pseudo-Ranks | 44
Adversarial Learning Guided Task Relatedness Refinement for Multi-Task Deep Learning | 44
Learning Representations by Contrastive Spatio-Temporal Clustering for Skeleton-Based Action Recognition | 43
Seek Common Ground While Reserving Differences: A Model-Agnostic Module for Noisy Domain Adaptation | 43
Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis | 43
Double-Domain Adaptation Semantics for Retrieval-Based Long-Term Visual Localization | 43
TFRNet: Semantic Segmentation Network with Token Filtration and Refinement Method | 42
Inter- and Intra-Domain Potential User Preferences for Cross-Domain Recommendation | 42
Incomplete Multi-View Clustering via Correntropy and Complement Consensus Learning | 42
Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis | 41
Exploiting Temporal Correlations for 3D Human Pose Estimation | 41
LHNetV2: A Balanced Low-Cost Hybrid Network for Single Image Dehazing | 41
Global Representation Guided Adaptive Fusion Network for Stable Video Crowd Counting | 41
Leveraging the Video-Level Semantic Consistency of Event for Audio-Visual Event Localization | 40
TIF: Threshold Interception and Fusion for Compact and Fine-Grained Visual Attribution | 40
SmartSit: Sitting Posture Recognition Through Acoustic Sensing on Smartphones | 39
Automatic Generation of Interactive Nonlinear Video for Online Apparel Shopping Navigation | 39
Context-Patch Representation Learning With Adaptive Neighbor Embedding for Robust Face Image Super-Resolution | 39
GPS2Vec: Pre-Trained Semantic Embeddings for Worldwide GPS Coordinates | 39
Exploring Zero-Shot Emotion Recognition in Speech Using Semantic-Embedding Prototypes | 38
No-Reference Light Field Image Quality Assessment Using Four-Dimensional Sparse Transform | 38
Towards Comprehensive Monocular Depth Estimation: Multiple Heads are Better Than One | 38
Intra- and Inter-Class Induced Discriminative Deep Dictionary Learning for Visual Recognition | 38
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes With Semantic Consistency and Attention Mechanism | 38
Grouping by Center: Predicting Centripetal Offsets for the Bottom-up Human Pose Estimation | 38
DDAug: Differentiable Data Augmentation for Weakly Supervised Semantic Segmentation | 37
Cross-Modality Spatial-Temporal Transformer for Video-Based Visible-Infrared Person Re-Identification | 37
Pixel Bleach Network for Detecting Face Forgery Under Compression | 36
From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios | 36
Clicking Matters: Towards Interactive Human Parsing | 36
A Robust Local Texture Descriptor in the Parametric Space of the Weibull Distribution | 36
Depth-Induced Gap-Reducing Network for RGB-D Salient Object Detection: An Interaction, Guidance and Refinement Approach | 36
Quality Assessment for DIBR-Synthesized Views Based on Wavelet Transform and Gradient Magnitude Similarity | 36
Attention Map Guided Transformer Pruning for Occluded Person Re-Identification on Edge Device | 36
Robust Saliency-Aware Distillation for Few-Shot Fine-Grained Visual Recognition | 35
SRDRL: A Blind Super-Resolution Framework With Degradation Reconstruction Loss | 35
From Front to Rear: 3D Semantic Scene Completion Through Planar Convolution and Attention-Based Network | 35
Region Separable Stereo Matching | 35
Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization | 34
M$^{3}$ANet: Multi-Modal and Multi-Attention Fusion Network for Ship License Plate Recognition | 34
A Commonality Modeling Framework for Enhanced Video Coding Leveraging on the Cuboidal Partitioning Based Representation of Frames | 34
Learning Fashion Compatibility With Context Conditioning Embedding | 34
Refining Uncertain Features With Self-Distillation for Face Recognition and Person Re-Identification | 34
Inter-Modal Masked Autoencoder for Self-Supervised Learning on Point Clouds | 34
InDecGAN: Learning to Generate Complex Images From Captions via Independent Object-Level Decomposition and Enhancement | 33
Camera Topology Graph Guided Vehicle Re-Identification | 33
Fast Monocular Depth Estimation via Side Prediction Aggregation with Continuous Spatial Refinement | 33
Contrastive JS: A Novel Scheme for Enhancing the Accuracy and Robustness of Deep Models | 33
A Novel Video Stabilization Model With Motion Morphological Component Priors | 33
Guided Image-to-Image Translation by Discriminator-Generator Communication | 33
Multi-Source Style Transfer via Style Disentanglement Network | 33
Deep Enhanced Weakly-Supervised Hashing With Iterative Tag Refinement | 33
Reinforcement Learning for Logic Recipe Generation: Bridging Gaps From Images to Plans | 32
List-Wise Rank Learning for Stereoscopic Image Retargeting Quality Assessment | 32
Feature First: Advancing Image-Text Retrieval Through Improved Visual Features | 32
CMAT: Integrating Convolution Mixer and Self-Attention for Visual Tracking | 32
DASI: Learning Domain Adaptive Shape Impression for 3D Object Reconstruction | 31
Multi-Label Continual Learning Using Augmented Graph Convolutional Network | 31
High-Quality Reconstruction of Depth Maps From Graph-Based Non-Uniform Sampling | 31
CrossNet: Cross-scene Background Subtraction Network via 3D Optical Flow | 31
List of Reviewers | 31
STAT: Multi-Object Tracking Based on Spatio-Temporal Topological Constraints | 31
Transformer-Based High-Fidelity Facial Displacement Completion for Detailed 3D Face Reconstruction | 30
Live 360° Video Streaming to Heterogeneous Clients in 5G Networks | 30
Disguised Heterogeneous Face Generation With Iterative-Adversarial Style Unification | 30
Integration of Global and Local Knowledge for Foreground Enhancing in Weakly Supervised Temporal Action Localization | 30
Federated Adversarial Domain Hallucination for Privacy-Preserving Domain Generalization | 30
Non-Orthogonal Multiple Access Enhanced Scalable 360-Degree Video Multicast | 30
Multi-Level Transitional Contrast Learning for Personalized Image Aesthetics Assessment | 30
Towards Real-Time Video Caching at Edge Servers: A Cost-Aware Deep Q-Learning Solution | 29
Deconfounding Causal Inference for Zero-Shot Action Recognition | 29
Spatial-Temporal Action Localization With Hierarchical Self-Attention | 29
AdaCrowd: Unlabeled Scene Adaptation for Crowd Counting | 29
Adaptive HEVC Video Steganography With High Performance Based on Attention-Net and PU Partition Modes | 29
Multi-Sentence Complementarily Generation for Text-to-Image Synthesis | 29
Estimating the Secret Key of Spread Spectrum Watermarking Based on Equivalent Keys | 29
PH-GCN: Person Retrieval With Part-Based Hierarchical Graph Convolutional Network | 28
IEEE Transactions on Multimedia Publication Information | 28
Enhancing Cross-task Transferability of Adversarial Examples via Spatial and Channel Attention | 28
Robust Geometry-Dependent Attack for 3D Point Clouds | 28
Towards Adaptive Multi-Scale Intermediate Domain via Progressive Training for Unsupervised Domain Adaptation | 28
Decoder-Side Cross Resolution Synthesis for Video Compression Enhancement | 28
An Efficient Ungrouped Mask Method with Two Learnable Parameters for 3D Object Detection | 28
Combining Retargeting Quality and Depth Perception Measures for Quality Evaluation of Retargeted Stereopairs | 28
RFMask: A Simple Baseline for Human Silhouette Segmentation With Radio Signals | 27
Few-Shot Generative Model Adaptation via Style-Guided Prompt | 27
A Benchmark for Controllable Text-Image-to-Video Generation | 27
RFGAN: RF-Based Human Synthesis | 27
Optimal Transport-Based Patch Matching for Image Style Transfer | 27
Anchor Graph-Based Feature Selection for One-Step Multi-View Clustering | 27
Deep Hashing Network With Hybrid Attention and Adaptive Weighting for Image Retrieval | 27
Cycle-Free Weakly Referring Expression Grounding With Self-Paced Learning | 27
Region-Aware Arbitrary-Shaped Text Detection With Progressive Fusion | 27
SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and Audio-Visual Attention | 27
CrowdCaption++: Collective-Guided Crowd Scenes Captioning | 27
Gait Recognition With Drones: A Benchmark | 26
Domain Adaptive LiDAR Point Cloud Segmentation With 3D Spatial Consistency | 26
Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation | 26
Learning With Imbalanced Noisy Data by Preventing Bias in Sample Selection | 26
Flexible Alignment Super-Resolution Network for Multi-Contrast Magnetic Resonance Imaging | 26
Learning Stage-Wise GANs for Whistle Extraction in Time-Frequency Spectrograms | 26
Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic Distribution | 26
Positive Unlabeled Fake News Detection via Multi-Modal Masked Transformer Network | 26
CRADA: Cross Domain Object Detection With Cyclic Reconstruction and Decoupling Adaptation | 26
Negative-Sensitive Framework With Semantic Enhancement for Composed Image Retrieval | 25
Self-Mining the Confident Prototypes for Source-Free Unsupervised Domain Adaptation in Image Segmentation | 25
Meta Noise Adaption Framework for Multimodal Sentiment Analysis With Feature Noise | 25
A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning | 25
A New Data Augmentation Method Based on Mixup and Dempster-Shafer Theory | 25
DCRP: Class-Aware Feature Diffusion Constraint and Reliable Pseudo-Labeling for Imbalanced Semi-Supervised Learning | 25
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection | 25
DMH-CL: Dynamic Model Hardness Based Curriculum Learning for Complex Pose Estimation | 24
Bio-Inspired Multi-Scale Contourlet Attention Networks | 24
Tensor Product and Tensor-Singular Value Decomposition Based Multi-Exposure Fusion of Images | 24
Audio-Visual Contrastive and Consistency Learning for Semi-Supervised Action Recognition | 24
Gated SwitchGAN for Multi-Domain Facial Image Translation | 24
Multimodal Boosting: Addressing Noisy Modalities and Identifying Modality Contribution | 24
Semantic-Aware Triplet Loss for Image Classification | 24
The Model May Fit You: User-Generalized Cross-Modal Retrieval | 24
Exploiting Web Images for Fine-Grained Visual Recognition via Dynamic Loss Correction and Global Sample Selection | 24
Progressive Motion Boosting for Video Frame Interpolation | 24
Exploiting Low-Rank Latent Gaussian Graphical Model Estimation for Visual Sentiment Distributions | 24
Neural Logic Vision Language Explainer | 24
Post-Distillation via Neural Resuscitation | 24
Hierarchical Equalization Loss for Long-Tailed Instance Segmentation | 24
Dual Noise Elimination and Dynamic Label Correlation Guided Partial Multi-Label Learning | 24
Active Gradual Domain Adaptation: Dataset and Approach | 23
Rethinking Affine Transform for Efficient Image Enhancement: A Color Space Perspective | 23
Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution | 23
Learning Label Semantics for Weakly Supervised Group Activity Recognition | 23
Focusing on Subtle Differences: A Feature Disentanglement Model for Series Photo Selection | 23
Bilateral Fast Low-Rank Representation With Equivalent Transformation for Subspace Clustering | 23
NIR-Assisted Image Denoising: A Selective Fusion Approach and A Real-World Benchmark Dataset | 23
Progressive Knowledge Distillation from Different Levels of Teachers for Online Action Detection | 23
RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of Tone Mapped Images | 23
Decoupled Representation Learning for Character Glyph Synthesis | 23
No-Reference Point Cloud Quality Assessment via Graph Convolutional Network | 23
Alignment-Guided Self-Supervised Learning for Diagram Question Answering | 23
BASICS: Broad Quality Assessment of Static Point Clouds in a Compression Scenario | 23
Image Aesthetics Assessment Based on Hypernetwork of Emotion Fusion | 23
DFR-Net: Density Feature Refinement Network for Image Dehazing Utilizing Haze Density Difference | 23
Leveraging Enriched Skeleton Representation with Multi-relational Metrics for Few-shot Action Recognition | 22
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | 22
MDANet: Modality-Aware Domain Alignment Network for Visible-Infrared Person Re-Identification | 22
HFGlobalFormer: When High-Frequency Recovery Meets Global Context Modeling for Compressed Image Deraindrop | 22
Discriminative Anchor Learning for Efficient Multi-view Clustering | 22
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser | 22
StyleAM: Perception-Oriented Unsupervised Domain Adaption for No-reference Image Quality Assessment | 22
MGKsite: Multi-Modal Knowledge-Driven Site Selection via Intra and Inter-Modal Graph Fusion | 22
Relation Inference Enhancement Network for Visual Commonsense Reasoning | 22
Generalizable Prompt Learning via Gradient Constrained Sharpness-aware Minimization | 22
Enhancing Distributed Source Coding with Encoder-Centric Frequency Adaptation and Spatial Transformation | 22
SkyML: A MLaaS Federation Design for Multicloud-based Multimedia Analytics | 22
MulDeF: A Model-Agnostic Debiasing Framework for Robust Multimodal Sentiment Analysis | 22
Viscoelastic Cluster-constrained PBD-based Soft Tissue Behavior and Interactive Media Applications for Surgical Simulation | 22
Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network | 22
Masked Attribute Description Embedding for Cloth-Changing Person Re-identification | 22
Towards Fast and Robust Real Image Denoising With Attentive Neural Network and PID Controller | 21
SCSP: An Unsupervised Image-to-Image Translation Network Based on Semantic Cooperative Shape Perception | 21
IRVR: A General Image Restoration Framework for Visual Recognition | 21
Beyond Triplet Loss: Meta Prototypical N-Tuple Loss for Person Re-identification | 21
Multi-granularity Context Perception Network for Open Set Recognition of Camouflaged Objects | 21
End-to-End Rain Removal Network Based on Progressive Residual Detail Supplement | 21
A Boundary-Aware Network for Shadow Removal | 21
Relation-Aware Compositional Zero-Shot Learning for Attribute-Object Pair Recognition | 21
Bi-RSTU: Bidirectional Recurrent Upsampling Network for Space-Time Video Super-Resolution | 21
Multi-Source Multi-Label Learning for User Profiling in Online Games | 21
UniMF: A Unified Multimodal Framework for Multimodal Sentiment Analysis in Missing Modalities and Unaligned Multimodal Sequences | 21
Personalized Representation With Contrastive Loss for Recommendation Systems | 21
Caching in Dynamic Environments: A Near-Optimal Online Learning Approach | 21
Character-Aware Sampling and Rectification for Scene Text Recognition | 20
Zero-Shot Predicate Prediction for Scene Graph Parsing | 20
E-Commerce Storytelling Recommendation Using Attentional Domain-Transfer Network and Adversarial Pre-Training | 20
Bias-Correction Feature Learner for Semi-Supervised Instance Segmentation | 20
Vulnerability of Feature Extractors in 2D Image-Based 3D Object Retrieval | 20
GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning | 20
A Graph-Based Discriminator Architecture for Multi-Attribute Facial Image Editing | 20
Intra-Class Adaptive Augmentation With Neighbor Correction for Deep Metric Learning | 20
FP-AGL: Filter Pruning With Adaptive Gradient Learning for Accelerating Deep Convolutional Neural Networks | 20
Bal-R$^2$CNN: High Quality Recurrent Object Detection With Balance Optimization | 20
Bridging the Gap Between Semantic Segmentation and Instance Segmentation | 20
Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval | 20
Progressive Bidirectional Feature Extraction and Enhancement Network for Quality Evaluation of Night-Time Images | 20
CPG3D: Cross-Modal Priors Guided 3D Object Reconstruction | 20
Self-Supervised Monocular Depth Estimation With Frequency-Based Recurrent Refinement | 20
MIGN: Multiscale Image Generation Network for Remote Sensing Image Semantic Segmentation | 20
Location-Free Camouflage Generation Network | 20
High Capacity Reversible Data Hiding in Encrypted Image Based on Adaptive MSB Prediction | 20
Causal Interventional Training for Image Recognition | 20
DOC: Text Recognition via Dual Adaptation and Clustering | 20
Multi-Vehicle Multi-Camera Tracking With Graph-Based Tracklet Features | 20
Delving Into Important Samples of Semi-Supervised Old Photo Restoration: A New Dataset and Method | 19