IEEE Transactions on Multimedia

Papers
(The median citation count of IEEE Transactions on Multimedia is 5. The table below lists the papers above that threshold, based on CrossRef citation counts, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
StrongSORT: Make DeepSORT Great Again | 297
Human Memory Update Strategy: A Multi-Layer Template Update Mechanism for Remote Visual Monitoring | 223
Low-Light Image Enhancement With Semi-Decoupled Decomposition | 209
AttentionFGAN: Infrared and Visible Image Fusion Using Attention-Based Generative Adversarial Networks | 208
Extended Feature Pyramid Network for Small Object Detection | 205
MFDNet: Collaborative Poses Perception and Matrix Fisher Distribution for Head Pose Estimation | 174
Coarse-to-Fine CNN for Image Super-Resolution | 152
DSLR: Deep Stacked Laplacian Restorer for Low-Light Image Enhancement | 147
Image-to-Image Translation: Methods and Applications | 145
3D Room Layout Estimation From a Single RGB Image | 137
Consensus Graph Learning for Multi-View Clustering | 137
EAPT: Efficient Attention Pyramid Transformer for Image Processing | 131
Parameter Sharing Exploration and Hetero-Center Triplet Loss for Visible-Thermal Person Re-Identification | 128
Beyond Triplet Loss: Person Re-Identification With Fine-Grained Difference-Aware Pairwise Loss | 127
SPA-GAN: Spatial Attention GAN for Image-to-Image Translation | 125
Geometric Back-Projection Network for Point Cloud Classification | 119
TBEFN: A Two-Branch Exposure-Fusion Network for Low-Light Image Enhancement | 119
Adaptive Graph Completion Based Incomplete Multi-View Clustering | 119
Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking | 116
Spatio-Temporal Attention Networks for Action Recognition and Detection | 111
Spatial-Temporal Cascade Autoencoder for Video Anomaly Detection in Crowded Scenes | 107
VehicleNet: Learning Robust Visual Representation for Vehicle Re-Identification | 104
Predicting the Perceptual Quality of Point Cloud: A 3D-to-2D Projection-Based Exploration | 100
Exploiting Temporal Contexts With Strided Transformer for 3D Human Pose Estimation | 100
Image-Text Multimodal Emotion Classification via Multi-View Attentional Network | 99
CCAFNet: Crossflow and Cross-Scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images | 97
YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer | 96
Deep Multi-View Subspace Clustering With Unified and Discriminative Learning | 93
Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification | 91
Stacked U-Shape Network With Channel-Wise Attention for Salient Object Detection | 90
Deep-IRTarget: An Automatic Target Detector in Infrared Imagery Using Dual-Domain Feature Extraction and Allocation | 88
Multi-View Multi-Label Learning With Sparse Feature Selection for Image Annotation | 87
SiamCorners: Siamese Corner Networks for Visual Tracking | 86
Real-Time and Accurate UAV Pedestrian Detection for Social Distancing Monitoring in COVID-19 Pandemic | 86
Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval | 85
STNReID: Deep Convolutional Networks With Pairwise Spatial Transformer Networks for Partial Person Re-Identification | 85
Kernelized Multiview Subspace Analysis By Self-Weighted Learning | 84
Anti-Forensics for Face Swapping Videos via Adversarial Training | 83
Learning Disentangled Representation Implicitly Via Transformer for Occluded Person Re-Identification | 82
Multi-Channel Deep Networks for Block-Based Image Compressive Sensing | 81
An Automated and Robust Image Watermarking Scheme Based on Deep Neural Networks | 79
A Serial Image Copy-Move Forgery Localization Scheme With Source/Target Distinguishment | 78
Luminance-Aware Pyramid Network for Low-Light Image Enhancement | 77
A Recursive Reversible Data Hiding in Encrypted Images Method With a Very High Payload | 76
Attribute Restoration Framework for Anomaly Detection | 75
3D Face Reconstruction From A Single Image Assisted by 2D Face Images in the Wild | 75
Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval | 74
MFFENet: Multiscale Feature Fusion and Enhancement Network For RGB–Thermal Urban Road Scene Parsing | 72
Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation | 72
Fast Intra Mode Decision Algorithm for Versatile Video Coding | 72
EHPE: Skeleton Cues-Based Gaussian Coordinate Encoding for Efficient Human Pose Estimation | 72
Deep Fusion Feature Representation Learning With Hard Mining Center-Triplet Loss for Person Re-Identification | 71
BVI-DVC: A Training Database for Deep Video Compression | 70
Driver Yawning Detection Based on Subtle Facial Action Recognition | 70
Uncertainty-Aware Unsupervised Domain Adaptation in Object Detection | 69
VPFNet: Improving 3D Object Detection With Virtual Point Based LiDAR and Stereo Data Fusion | 67
Joint Contrast Enhancement and Exposure Fusion for Real-World Image Dehazing | 66
cmSalGAN: RGB-D Salient Object Detection With Cross-View Generative Adversarial Networks | 65
A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition | 64
RelationTrack: Relation-Aware Multiple Object Tracking With Decoupled Representation | 64
Illumination-Adaptive Person Re-Identification | 63
RGBT Salient Object Detection: A Large-Scale Dataset and Benchmark | 62
Cross View Capture for Stereo Image Super-Resolution | 62
Temporal Cross-Layer Correlation Mining for Action Recognition | 61
DeepDance: Music-to-Dance Motion Choreography With Adversarial Learning | 61
Salient Object Detection in Stereoscopic 3D Images Using a Deep Convolutional Residual Autoencoder | 60
Person Re-Identification in Aerial Imagery | 60
Robust Coding of Encrypted Images via 2D Compressed Sensing | 59
COLA-Net: Collaborative Attention Network for Image Restoration | 59
Temporal Context Mining for Learned Video Compression | 58
End-to-End Audiovisual Speech Recognition System With Multitask Learning | 58
Interactive Video Retrieval in the Age of Deep Learning – Detailed Evaluation of VBS 2019 | 58
Part-aware Progressive Unsupervised Domain Adaptation for Person Re-Identification | 58
Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-Modal Retrieval | 58
Self-Supervised Graph Convolutional Network for Multi-View Clustering | 57
Fine-Grained Image Captioning With Global-Local Discriminative Objective | 56
Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation | 55
Learning Dual-Level Deep Representation for Thermal Infrared Tracking | 55
Supervised Pixel-Wise GAN for Face Super-Resolution | 55
Predictive Adaptive Streaming to Enable Mobile 360-Degree and VR Experiences | 55
Pose-Guided Tracking-by-Detection: Robust Multi-Person Pose Tracking | 54
Multimodal Sentiment Analysis With Image-Text Interaction Network | 54
PhotoHelper: Portrait Photographing Guidance Via Deep Feature Retrieval and Fusion | 54
Image Compression Based on Compressive Sensing: End-to-End Comparison With JPEG | 52
Adversarial Network With Multiple Classifiers for Open Set Domain Adaptation | 52
Edge-Cloud Collaboration Enabled Video Service Enhancement: A Hybrid Human-Artificial Intelligence Scheme | 52
Focal Inverse Distance Transform Maps for Crowd Localization | 51
Optimal Volumetric Video Streaming With Hybrid Saliency Based Tiling | 51
Dual-Awareness Attention for Few-Shot Object Detection | 50
A Physiology-Based QoE Comparison of Interactive Augmented Reality, Virtual Reality and Tablet-Based Applications | 50
FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation | 50
Design and Analysis of MEC- and Proactive Caching-Based 360° Mobile VR Video Streaming | 49
A Cuboid CNN Model With an Attention Mechanism for Skeleton-Based Action Recognition | 49
Multi-Focus Image Fusion Based on Multi-Scale Gradients and Image Matting | 49
Transformer Encoder With Multi-Modal Multi-Head Attention for Continuous Affect Recognition | 49
Underwater Image Enhancement With Lightweight Cascaded Network | 49
Density-Aware Multi-Task Learning for Crowd Counting | 49
V-Eye: A Vision-Based Navigation System for the Visually Impaired | 48
Employing Bilinear Fusion and Saliency Prior Information for RGB-D Salient Object Detection | 48
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution | 48
Point Cloud Rendering After Coding: Impacts on Subjective and Objective Quality | 48
Hybrid Contrastive Learning for Unsupervised Person Re-Identification | 48
Fast Multi-Type Tree Partitioning for Versatile Video Coding Using a Lightweight Neural Network | 48
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition | 47
Self-Adaptive Neural Module Transformer for Visual Question Answering | 47
High Capacity Reversible Data Hiding in Encrypted Image Based on Intra-Block Lossless Compression | 47
Semantic-Supervised Infrared and Visible Image Fusion Via a Dual-Discriminator Generative Adversarial Network | 47
Partition-Aware Adaptive Switching Neural Networks for Post-Processing in HEVC | 46
Semantic Context Encoding for Accurate 3D Point Cloud Segmentation | 45
C-GCN: Correlation Based Graph Convolutional Network for Audio-Video Emotion Recognition | 45
Towards Coding for Human and Machine Vision: Scalable Face Image Coding | 45
Single Shot Video Object Detector | 45
Attribute-Aware Pedestrian Detection in a Crowd | 45
A Multi-Stream Graph Convolutional Networks-Hidden Conditional Random Field Model for Skeleton-Based Action Recognition | 44
Co-Saliency Detection Guided by Group Weakly Supervised Learning | 44
Salient Object Detection by Fusing Local and Global Contexts | 44
DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction | 44
ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction | 44
Attribute-Guided Feature Learning for Few-Shot Image Recognition | 44
C²DFNet: Criss-Cross Dynamic Filter Network for RGB-D Salient Object Detection | 43
USID-Net: Unsupervised Single Image Dehazing Network via Disentangled Representations | 43
GraphIQA: Learning Distortion Graph Representations for Blind Image Quality Assessment | 43
Style Normalization and Restitution for Domain Generalization and Adaptation | 43
Hierarchical User Intent Graph Network for Multimedia Recommendation | 43
Joint-Bone Fusion Graph Convolutional Network for Semi-Supervised Skeleton Action Recognition | 43
Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval | 42
High Capacity Reversible Data Hiding in Encrypted Image Based on Adaptive MSB Prediction | 42
Learning Compact Multifeature Codes for Palmprint Recognition From a Single Training Image per Palm | 42
DENet: A Universal Network for Counting Crowd With Varying Densities and Scales | 42
MaD-DLS: Mean and Deviation of Deep and Local Similarity for Image Quality Assessment | 41
HAPGN: Hierarchical Attentive Pooling Graph Network for Point Cloud Segmentation | 41
Multi-Modal Meta Multi-Task Learning for Social Media Rumor Detection | 41
Deep-PCAC: An End-to-End Deep Lossy Compression Framework for Point Cloud Attributes | 41
Multimodal Cross-Layer Bilinear Pooling for RGBT Tracking | 41
Video Frame Interpolation via Generalized Deformable Convolution | 41
Robust Visual Tracking via Constrained Multi-Kernel Correlation Filters | 41
Integrating Part of Speech Guidance for Image Captioning | 41
An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients | 40
Joint Input and Output Space Learning for Multi-Label Image Classification | 40
Fine-Grained Attention and Feature-Sharing Generative Adversarial Networks for Single Image Super-Resolution | 40
CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification | 40
Model-Based Joint Bit Allocation Between Geometry and Color for Video-Based 3D Point Cloud Compression | 39
Object-Aware Multimodal Named Entity Recognition in Social Media Posts With Adversarial Learning | 39
Unsupervised Adversarial Instance-Level Image Retrieval | 39
SAL: Selection and Attention Losses for Weakly Supervised Semantic Segmentation | 39
Towards Adaptive Consensus Graph: Multi-View Clustering via Graph Collaboration | 39
Optimal Wireless Streaming of Multi-Quality 360 VR Video By Exploiting Natural, Relative Smoothness-Enabled, and Transcoding-Enabled Multicast Opportunities | 39
Beyond Vision: A Multimodal Recurrent Attention Convolutional Neural Network for Unified Image Aesthetic Prediction Tasks | 39
Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution | 38
Quality-Aware Part Models for Occluded Person Re-Identification | 38
Subjective Evaluation of Visual Quality and Simulator Sickness of Short 360° Videos: ITU-T Rec. P.919 | 38
Self-Supervised Learning for Multimedia Recommendation | 38
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering | 38
BR²Net: Defocus Blur Detection Via a Bidirectional Channel Attention Residual Refining Network | 38
Semi-Reference Sonar Image Quality Assessment Based on Task and Visual Perception | 38
Learning and Fusing Multiple User Interest Representations for Micro-Video and Movie Recommendations | 37
Understanding More About Human and Machine Attention in Deep Neural Networks | 37
Interaction Relational Network for Mutual Action Recognition | 37
Dress With Style: Learning Style From Joint Deep Embedding of Clothing Styles and Body Shapes | 37
Weakly Supervised Emotion Intensity Prediction for Recognition of Emotions in Images | 37
Image Compressed Sensing Using Non-Local Neural Network | 37
An Attention-Based Unsupervised Adversarial Model for Movie Review Spam Detection | 37
TERA: Screen-to-Camera Image Code With Transparency, Efficiency, Robustness and Adaptability | 37
Viewport-Dependent Saliency Prediction in 360° Video | 37
Dual Convolutional LSTM Network for Referring Image Segmentation | 36
Textual Context-Aware Dense Captioning With Diverse Words | 36
Efficient Projected Frame Padding for Video-Based Point Cloud Compression | 36
TransIFC: Invariant Cues-aware Feature Concentration Learning for Efficient Fine-grained Bird Image Classification | 36
Referring Expression Comprehension: A Survey of Methods and Datasets | 36
Spectrum Characteristics Preserved Visible and Near-Infrared Image Fusion Algorithm | 36
Person Retrieval in Surveillance Videos Via Deep Attribute Mining and Reasoning | 36
Intermittent Contextual Learning for Keyfilter-Aware UAV Object Tracking Using Deep Convolutional Feature | 36
Region-Based Dehazing via Dual-Supervised Triple-Convolutional Network | 36
Heterogeneous Hierarchical Feature Aggregation Network for Personalized Micro-Video Recommendation | 36
Adaptive Partial Multi-View Hashing for Efficient Social Image Retrieval | 36
Graph Embedding Multi-Kernel Metric Learning for Image Set Classification With Grassmannian Manifold-Valued Features | 36
AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation | 36
Recurrent Exposure Generation for Low-Light Face Detection | 36
Learning Crisp Boundaries Using Deep Refinement Network and Adaptive Weighting Loss | 35
Spatiotemporal Dilated Convolution With Uncertain Matching for Video-Based Crowd Estimation | 35
Learning Non-Locally Regularized Compressed Sensing Network With Half-Quadratic Splitting | 35
Recognition-Oriented Image Compressive Sensing With Deep Learning | 35
Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC | 35
Dual Attention on Pyramid Feature Maps for Image Captioning | 35
Long Dialogue Emotion Detection Based on Commonsense Knowledge Graph Guidance | 35
Hierarchical Soft Quantization for Skeleton-Based Human Action Recognition | 34
CaptionNet: A Tailor-made Recurrent Neural Network for Generating Image Descriptions | 34
Unsupervised Image-to-Image Translation via Pre-Trained StyleGAN2 Network | 34
Graph-Based Multimodal Sequential Embedding for Sign Language Translation | 34
Multi-Scale Sparse Graph Convolutional Network For the Assessment of Parkinsonian Gait | 34
LAG-Net: Multi-Granularity Network for Person Re-Identification via Local Attention System | 34
Align and Tell: Boosting Text-Video Retrieval With Local Alignment and Fine-Grained Supervision | 34
Hybrid Refinement-Correction Heatmaps for Human Pose Estimation | 34
Deep Reinforcement Polishing Network for Video Captioning | 34
Weakly-Supervised Facial Expression Recognition in the Wild With Noisy Data | 34
A Unified Transformer Framework for Group-Based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection | 34
Exploring Global and Local Linguistic Representations for Text-to-Image Synthesis | 34
Adaptive Deep Metric Learning for Affective Image Retrieval and Classification | 34
Does Thermal Really Always Matter for RGB-T Salient Object Detection? | 34
Viewport-Aware Deep Reinforcement Learning Approach for 360° Video Caching | 33
Entity-Oriented Multi-Modal Alignment and Fusion Network for Fake News Detection | 33
Dual Transformer for Point Cloud Analysis | 33
M-GCN: Multi-Branch Graph Convolution Network for 2D Image-based on 3D Model Retrieval | 33
Deep Multi-Patch Matching Network for Visible Thermal Person Re-Identification | 33
Semantically Meaningful Class Prototype Learning for One-Shot Image Segmentation | 33
Hierarchical Consensus Hashing for Cross-Modal Retrieval | 33
Voxel Structure-Based Mesh Reconstruction From a 3D Point Cloud | 33
Speech Driven Talking Face Generation From a Single Image and an Emotion Condition | 33
Frame-Wise Cross-Modal Matching for Video Moment Retrieval | 33
LD-MAN: Layout-Driven Multimodal Attention Network for Online News Sentiment Recognition | 33
Spatial-Channel Enhanced Transformer for Visible-Infrared Person Re-Identification | 33
Discriminative Invariant Alignment for Unsupervised Domain Adaptation | 33
Hierarchical Context Features Embedding for Object Detection | 33
Building and Using Personal Knowledge Graph to Improve Suicidal Ideation Detection on Social Media | 33
R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection | 33
Arbitrarily-Oriented Text Detection in Low Light Natural Scene Images | 33
Dynamic Objectives Learning for Facial Expression Recognition | 32
Graph Signal Processing for Geometric Data and Beyond: Theory and Applications | 32
Context-Dependent Propagating-Based Video Recommendation in Multimodal Heterogeneous Information Networks | 32
Multimodal Marketing Intent Analysis for Effective Targeted Advertising | 32
Multi-Scale Fine-Grained Alignments for Image and Sentence Matching | 32
A Two-Stage Triplet Network Training Framework for Image Retrieval | 32
Mask Cross-Modal Hashing Networks | 32
Alleviating Modality Bias Training for Infrared-Visible Person Re-Identification | 32
Contrastive Attention for Video Anomaly Detection | 32
Perceptual Image Hashing With Texture and Invariant Vector Distance for Copy Detection | 32
DeepFacade: A Deep Learning Approach to Facade Parsing With Symmetric Loss | 31
Learning Dual-Pooling Graph Neural Networks for Few-Shot Video Classification | 31
Theme Transformer: Symbolic Music Generation With Theme-Conditioned Transformer | 31
Robust Visual Object Tracking Via Adaptive Attribute-Aware Discriminative Correlation Filters | 31
Deep Co-Image-Label Hashing for Multi-Label Image Retrieval | 31
Staged Sketch-to-Image Synthesis via Semi-supervised Generative Adversarial Networks | 31
Mobile Streaming of Live 360-Degree Videos | 31
Global-Local Label Correlation for Partial Multi-Label Learning | 31
Joint Deep Learning of Facial Expression Synthesis and Recognition | 31
Objective Quality Assessment of Lenslet Light Field Image Based on Focus Stack | 31
Intra-Inter View Interaction Network for Light Field Image Super-Resolution | 31
Crowd Counting Via Perspective-Guided Fractional-Dilation Convolution | 30
Cross-Modal Dynamic Networks for Video Moment Retrieval With Text Query | 30
Pixel-Level Non-local Image Smoothing With Objective Evaluation | 30
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries | 30
Anti-UAV: A Large-Scale Benchmark for Vision-Based UAV Tracking | 30
Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation | 30
Adversarial-Metric Learning for Audio-Visual Cross-Modal Matching | 30
Projective Multiple Kernel Subspace Clustering | 30
Deep Collaborative Discrete Hashing With Semantic-Invariant Structure Construction | 30
Low-Light Image Enhancement via Self-Reinforced Retinex Projection Model | 30
DualGNN: Dual Graph Neural Network for Multimedia Recommendation | 30
Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching | 30
Universal Chosen-Ciphertext Attack for a Family of Image Encryption Schemes | 30
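
The selection rule stated in the note at the top (keep papers from the date window whose citation count is above the median, sort by citations, cap at 250) can be sketched roughly as follows. This is only an illustration: the function name `select_papers`, the tuple layout, and the sample records are hypothetical, and it assumes the median is taken over the papers inside the date window, which the note does not specify.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, limit=250):
    """Return (title, citations, published) tuples above the median.

    Assumption: the median is computed over papers published within
    [start, end]; the source page does not say how it is derived.
    """
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return []
    threshold = median(c for _, c, _ in in_window)
    above = [p for p in in_window if p[1] > threshold]
    # Highest citation count first; ties keep their input order.
    above.sort(key=lambda p: p[1], reverse=True)
    return above[:limit]


# Hypothetical records, not taken from the table above.
sample = [
    ("paper-a", 10, date(2021, 1, 1)),
    ("paper-b", 2, date(2022, 1, 1)),
    ("paper-c", 5, date(2023, 1, 1)),
    ("paper-d", 50, date(2019, 1, 1)),  # outside the window
    ("paper-e", 7, date(2024, 1, 1)),
]
selected = select_papers(sample, date(2020, 11, 1), date(2024, 11, 1))
```

With these sample records, the median inside the window is 6, so only `paper-a` and `paper-e` are kept.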