IEEE Transactions on Multimedia

Papers
(The median citation count of IEEE Transactions on Multimedia is 4. The table below lists the papers above that threshold, ranked by CrossRef citation count [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-04-01 to 2024-04-01.)
Article | Citations
A Strong Baseline and Batch Normalization Neck for Deep Person Re-Identification | 328
Human Memory Update Strategy: A Multi-Layer Template Update Mechanism for Remote Visual Monitoring | 216
Low-Light Image Enhancement With Semi-Decoupled Decomposition | 178
Extended Feature Pyramid Network for Small Object Detection | 152
AttentionFGAN: Infrared and Visible Image Fusion Using Attention-Based Generative Adversarial Networks | 150
StrongSORT: Make DeepSORT Great Again | 143
MFDNet: Collaborative Poses Perception and Matrix Fisher Distribution for Head Pose Estimation | 142
Reversible Data Hiding in Encrypted Images Based on Multi-MSB Prediction and Huffman Coding | 141
3D Room Layout Estimation From a Single RGB Image | 135
Coarse-to-Fine CNN for Image Super-Resolution | 126
DSLR: Deep Stacked Laplacian Restorer for Low-Light Image Enhancement | 117
Automated Colorization of a Grayscale Image With Seed Points Propagation | 116
Beyond Triplet Loss: Person Re-Identification With Fine-Grained Difference-Aware Pairwise Loss | 109
Image-to-Image Translation: Methods and Applications | 106
Parameter Sharing Exploration and Hetero-Center Triplet Loss for Visible-Thermal Person Re-Identification | 101
Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking | 101
SPA-GAN: Spatial Attention GAN for Image-to-Image Translation | 101
Spatio-Temporal Attention Networks for Action Recognition and Detection | 97
A Dilated Inception Network for Visual Saliency Prediction | 94
Geometric Back-Projection Network for Point Cloud Classification | 93
Consensus Graph Learning for Multi-View Clustering | 90
Spatial-Temporal Cascade Autoencoder for Video Anomaly Detection in Crowded Scenes | 86
Adaptive Graph Completion Based Incomplete Multi-View Clustering | 86
TBEFN: A Two-Branch Exposure-Fusion Network for Low-Light Image Enhancement | 84
VehicleNet: Learning Robust Visual Representation for Vehicle Re-Identification | 81
CCAFNet: Crossflow and Cross-Scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images | 80
STNReID: Deep Convolutional Networks With Pairwise Spatial Transformer Networks for Partial Person Re-Identification | 80
Jointly Learning Kernel Representation Tensor and Affinity Matrix for Multi-View Clustering | 77
Kernelized Multiview Subspace Analysis By Self-Weighted Learning | 77
Image-Text Multimodal Emotion Classification via Multi-View Attentional Network | 76
Food Recommendation: Framework, Existing Solutions, and Challenges | 76
Real-Time and Accurate UAV Pedestrian Detection for Social Distancing Monitoring in COVID-19 Pandemic | 74
Deep Multi-View Subspace Clustering With Unified and Discriminative Learning | 74
Stacked U-Shape Network With Channel-Wise Attention for Salient Object Detection | 73
An Improved Reversible Data Hiding in Encrypted Images Using Parametric Binary Tree Labeling | 73
ATMFN: Adaptive-Threshold-Based Multi-Model Fusion Network for Compressed Face Hallucination | 71
Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning | 71
Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification | 71
SiamCorners: Siamese Corner Networks for Visual Tracking | 71
EHPE: Skeleton Cues-based Gaussian Coordinate Encoding for Efficient Human Pose Estimation | 70
Multi-Channel Deep Networks for Block-Based Image Compressive Sensing | 69
Multi-View Multi-Label Learning With Sparse Feature Selection for Image Annotation | 69
Predicting the Perceptual Quality of Point Cloud: A 3D-to-2D Projection-Based Exploration | 66
Deep-IRTarget: An Automatic Target Detector in Infrared Imagery Using Dual-Domain Feature Extraction and Allocation | 66
A Flexible Deep CNN Framework for Image Restoration | 66
Deep Fusion Feature Representation Learning With Hard Mining Center-Triplet Loss for Person Re-Identification | 65
Exploiting Temporal Contexts With Strided Transformer for 3D Human Pose Estimation | 65
PointHop: An Explainable Machine Learning Method for Point Cloud Classification | 65
Anti-Forensics for Face Swapping Videos via Adversarial Training | 65
Luminance-Aware Pyramid Network for Low-Light Image Enhancement | 65
Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval | 65
PixelRL: Fully Convolutional Network With Reinforcement Learning for Image Processing | 64
Interact as You Intend: Intention-Driven Human-Object Interaction Detection | 64
Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval | 63
A Recursive Reversible Data Hiding in Encrypted Images Method With a Very High Payload | 63
WSCNet: Weakly Supervised Coupled Networks for Visual Sentiment Classification and Detection | 62
Illumination-Adaptive Person Re-Identification | 62
An Automated and Robust Image Watermarking Scheme Based on Deep Neural Networks | 62
YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer | 61
Learning Disentangled Representation Implicitly Via Transformer for Occluded Person Re-Identification | 61
MFFENet: Multiscale Feature Fusion and Enhancement Network For RGB–Thermal Urban Road Scene Parsing | 60
Hierarchical Attention Network for Visually-Aware Food Recommendation | 60
3D Face Reconstruction From A Single Image Assisted by 2D Face Images in the Wild | 60
MRFN: Multi-Receptive-Field Network for Fast and Accurate Single Image Super-Resolution | 59
Fast Intra Mode Decision Algorithm for Versatile Video Coding | 59
Attribute Restoration Framework for Anomaly Detection | 58
2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling | 57
A Serial Image Copy-Move Forgery Localization Scheme With Source/Target Distinguishment | 57
cmSalGAN: RGB-D Salient Object Detection With Cross-View Generative Adversarial Networks | 57
Interactive Video Retrieval in the Age of Deep Learning – Detailed Evaluation of VBS 2019 | 56
Uncertainty-Aware Unsupervised Domain Adaptation in Object Detection | 55
Salient Object Detection in Stereoscopic 3D Images Using a Deep Convolutional Residual Autoencoder | 55
Driver Yawning Detection Based on Subtle Facial Action Recognition | 54
CKD: Cross-Task Knowledge Distillation for Text-to-Image Synthesis | 54
Part-aware Progressive Unsupervised Domain Adaptation for Person Re-Identification | 54
2-D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs | 53
The Prediction of Saliency Map for Head and Eye Movements in 360 Degree Images | 52
Learning Normal Patterns via Adversarial Attention-Based Autoencoder for Abnormal Event Detection in Videos | 52
Temporal Cross-Layer Correlation Mining for Action Recognition | 51
Person Re-Identification in Aerial Imagery | 51
Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-Modal Retrieval | 50
End-to-End Audiovisual Speech Recognition System With Multitask Learning | 50
Weighted and Class-Specific Maximum Mean Discrepancy for Unsupervised Domain Adaptation | 50
Cross View Capture for Stereo Image Super-Resolution | 50
Multi-View Saliency Guided Deep Neural Network for 3-D Object Retrieval and Classification | 50
VPFNet: Improving 3D Object Detection With Virtual Point Based LiDAR and Stereo Data Fusion | 48
Efficient Supervised Discrete Multi-View Hashing for Large-Scale Multimedia Search | 48
Fine-Grained Image Captioning With Global-Local Discriminative Objective | 48
RGBT Salient Object Detection: A Large-Scale Dataset and Benchmark | 48
Pose-Guided Tracking-by-Detection: Robust Multi-Person Pose Tracking | 48
Energy Compaction-Based Image Compression Using Convolutional AutoEncoder | 47
BVI-DVC: A Training Database for Deep Video Compression | 46
Multimedia Intelligence: When Multimedia Meets Artificial Intelligence | 46
Content-Based Light Field Image Compression Method With Gaussian Process Regression | 45
Joint Contrast Enhancement and Exposure Fusion for Real-World Image Dehazing | 45
Robust Coding of Encrypted Images via 2D Compressed Sensing | 45
A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition | 44
Density-Aware Multi-Task Learning for Crowd Counting | 44
Edge-Cloud Collaboration Enabled Video Service Enhancement: A Hybrid Human-Artificial Intelligence Scheme | 43
iWave: CNN-Based Wavelet-Like Transform for Image Compression | 43
Adversarial Network With Multiple Classifiers for Open Set Domain Adaptation | 43
Image Compression Based on Compressive Sensing: End-to-End Comparison With JPEG | 43
Sensor-Augmented Neural Adaptive Bitrate Video Streaming on UAVs | 43
A Physiology-Based QoE Comparison of Interactive Augmented Reality, Virtual Reality and Tablet-Based Applications | 43
RelationTrack: Relation-Aware Multiple Object Tracking With Decoupled Representation | 43
Predictive Adaptive Streaming to Enable Mobile 360-Degree and VR Experiences | 43
A Cuboid CNN Model With an Attention Mechanism for Skeleton-Based Action Recognition | 43
DENet: A Universal Network for Counting Crowd With Varying Densities and Scales | 41
Robust Visual Tracking via Constrained Multi-Kernel Correlation Filters | 41
Relation Attention for Temporal Action Localization | 41
A Multi-Stream Graph Convolutional Networks-Hidden Conditional Random Field Model for Skeleton-Based Action Recognition | 41
Design and Analysis of MEC- and Proactive Caching-Based 360° Mobile VR Video Streaming | 41
Supervised Pixel-Wise GAN for Face Super-Resolution | 41
An Artificial Intelligence-Based System to Assess Nutrient Intake for Hospitalised Patients | 40
DeepDance: Music-to-Dance Motion Choreography With Adversarial Learning | 40
Referring Image Segmentation by Generative Adversarial Learning | 40
Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation | 40
COLA-Net: Collaborative Attention Network for Image Restoration | 40
Single Shot Video Object Detector | 40
Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation | 39
Bidirectional Attention-Recognition Model for Fine-Grained Object Classification | 39
Fast Multi-Type Tree Partitioning for Versatile Video Coding Using a Lightweight Neural Network | 39
Saliency Detection via a Multiple Self-Weighted Graph-Based Manifold Ranking | 39
EAPT: Efficient Attention Pyramid Transformer for Image Processing | 38
Self-Supervised Graph Convolutional Network for Multi-View Clustering | 38
Attribute-Guided Feature Learning for Few-Shot Image Recognition | 38
Hybrid Contrastive Learning for Unsupervised Person Re-Identification | 38
C-GCN: Correlation Based Graph Convolutional Network for Audio-Video Emotion Recognition | 38
High Capacity Reversible Data Hiding in Encrypted Image Based on Intra-Block Lossless Compression | 38
Employing Bilinear Fusion and Saliency Prior Information for RGB-D Salient Object Detection | 38
Self-Adaptive Neural Module Transformer for Visual Question Answering | 38
Frame Augmented Alternating Attention Network for Video Question Answering | 38
Learning Dual-Level Deep Representation for Thermal Infrared Tracking | 38
Transformer Encoder With Multi-Modal Multi-Head Attention for Continuous Affect Recognition | 38
Attribute-Aware Pedestrian Detection in a Crowd | 37
Partition-Aware Adaptive Switching Neural Networks for Post-Processing in HEVC | 37
Point Cloud Rendering After Coding: Impacts on Subjective and Objective Quality | 37
Towards Coding for Human and Machine Vision: Scalable Face Image Coding | 37
Accurate and Robust Video Saliency Detection via Self-Paced Diffusion | 37
Learning Compact Multifeature Codes for Palmprint Recognition From a Single Training Image per Palm | 37
MaD-DLS: Mean and Deviation of Deep and Local Similarity for Image Quality Assessment | 37
Focal Inverse Distance Transform Maps for Crowd Localization | 37
DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction | 37
Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval | 37
Salient Object Detection by Fusing Local and Global Contexts | 37
CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification | 36
Co-Saliency Detection Guided by Group Weakly Supervised Learning | 36
V-Eye: A Vision-Based Navigation System for the Visually Impaired | 35
BR²Net: Defocus Blur Detection Via a Bidirectional Channel Attention Residual Refining Network | 35
Multi-Focus Image Fusion Based on Multi-Scale Gradients and Image Matting | 34
Semantic Context Encoding for Accurate 3D Point Cloud Segmentation | 34
Video Frame Interpolation via Generalized Deformable Convolution | 34
Learning the Traditional Art of Chinese Calligraphy via Three-Dimensional Reconstruction and Assessment | 34
Interaction Relational Network for Mutual Action Recognition | 34
Spectrum Characteristics Preserved Visible and Near-Infrared Image Fusion Algorithm | 33
Learning and Fusing Multiple User Interest Representations for Micro-Video and Movie Recommendations | 33
Show, Tell, and Polish: Ruminant Decoding for Image Captioning | 33
An Attention-Based Unsupervised Adversarial Model for Movie Review Spam Detection | 33
GraphIQA: Learning Distortion Graph Representations for Blind Image Quality Assessment | 33
Multimodal Cross-Layer Bilinear Pooling for RGBT Tracking | 33
Intermittent Contextual Learning for Keyfilter-Aware UAV Object Tracking Using Deep Convolutional Feature | 33
Adversarial Attribute-Text Embedding for Person Search With Natural Language Query | 33
Optimal Wireless Streaming of Multi-Quality 360 VR Video By Exploiting Natural, Relative Smoothness-Enabled, and Transcoding-Enabled Multicast Opportunities | 33
Dual Convolutional LSTM Network for Referring Image Segmentation | 32
Reduced Reference Stereoscopic Image Quality Assessment Using Sparse Representation and Natural Scene Statistics | 32
Fine-Grained Attention and Feature-Sharing Generative Adversarial Networks for Single Image Super-Resolution | 32
ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction | 32
Multi-Modal Meta Multi-Task Learning for Social Media Rumor Detection | 32
Unsupervised Adversarial Instance-Level Image Retrieval | 32
Deep Reinforcement Learning for Image Hashing | 32
Referring Expression Comprehension: A Survey of Methods and Datasets | 32
Semi-Reference Sonar Image Quality Assessment Based on Task and Visual Perception | 32
Deep Multi-Scale Context Aware Feature Aggregation for Curved Scene Text Detection | 31
Semantic-Supervised Infrared and Visible Image Fusion Via a Dual-Discriminator Generative Adversarial Network | 31
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution | 31
Graph Embedding Multi-Kernel Metric Learning for Image Set Classification With Grassmannian Manifold-Valued Features | 31
Deep-PCAC: An End-to-End Deep Lossy Compression Framework for Point Cloud Attributes | 31
Subjective Evaluation of Visual Quality and Simulator Sickness of Short 360° Videos: ITU-T Rec. P.919 | 31
Hierarchical Soft Quantization for Skeleton-Based Human Action Recognition | 31
Joint Input and Output Space Learning for Multi-Label Image Classification | 31
Unsupervised Video Summarization With Cycle-Consistent Adversarial LSTM Networks | 30
Adaptive Single Image Dehazing Using Joint Local-Global Illumination Adjustment | 30
Hierarchical Context Features Embedding for Object Detection | 30
Integrating Part of Speech Guidance for Image Captioning | 30
R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection | 30
Long Dialogue Emotion Detection Based on Commonsense Knowledge Graph Guidance | 30
Efficient Projected Frame Padding for Video-Based Point Cloud Compression | 30
Dual-Awareness Attention for Few-Shot Object Detection | 30
Beyond Vision: A Multimodal Recurrent Attention Convolutional Neural Network for Unified Image Aesthetic Prediction Tasks | 30
Underwater Image Enhancement With Lightweight Cascaded Network | 30
Dress With Style: Learning Style From Joint Deep Embedding of Clothing Styles and Body Shapes | 30
Exploring Global and Local Linguistic Representations for Text-to-Image Synthesis | 30
Adaptive Deep Metric Learning for Affective Image Retrieval and Classification | 30
Concentrated Local Part Discovery With Fine-Grained Part Representation for Person Re-Identification | 30
Cycle-IR: Deep Cyclic Image Retargeting | 29
HAPGN: Hierarchical Attentive Pooling Graph Network for Point Cloud Segmentation | 29
Object-Aware Multimodal Named Entity Recognition in Social Media Posts With Adversarial Learning | 29
A Two-Stage Triplet Network Training Framework for Image Retrieval | 29
High Capacity Reversible Data Hiding in Encrypted Image Based on Adaptive MSB Prediction | 29
Learning Non-Locally Regularized Compressed Sensing Network With Half-Quadratic Splitting | 29
Viewport-Dependent Saliency Prediction in 360° Video | 29
Multi-Scale Fine-Grained Alignments for Image and Sentence Matching | 29
Person Retrieval in Surveillance Videos Via Deep Attribute Mining and Reasoning | 29
Align and Tell: Boosting Text-Video Retrieval With Local Alignment and Fine-Grained Supervision | 29
Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC | 29
Deep Reinforcement Polishing Network for Video Captioning | 29
Leveraging Virtual and Real Person for Unsupervised Person Re-Identification | 29
Staged Sketch-to-Image Synthesis via Semi-supervised Generative Adversarial Networks | 28
Region-Based Dehazing via Dual-Supervised Triple-Convolutional Network | 28
Viewport-Aware Deep Reinforcement Learning Approach for 360° Video Caching | 28
Deep Collaborative Discrete Hashing With Semantic-Invariant Structure Construction | 28
Dynamic Objectives Learning for Facial Expression Recognition | 28
Spatiotemporal Dilated Convolution With Uncertain Matching for Video-Based Crowd Estimation | 28
M-GCN: Multi-Branch Graph Convolution Network for 2D Image-based on 3D Model Retrieval | 28
Model-Based Joint Bit Allocation Between Geometry and Color for Video-Based 3D Point Cloud Compression | 27
Projective Multiple Kernel Subspace Clustering | 27
Joint Learning in the Spatio-Temporal and Frequency Domains for Skeleton-Based Action Recognition | 27
Adaptive Partial Multi-View Hashing for Efficient Social Image Retrieval | 27
DeepFacade: A Deep Learning Approach to Facade Parsing With Symmetric Loss | 27
Joint Deep Learning of Facial Expression Synthesis and Recognition | 27
Deep Co-Image-Label Hashing for Multi-Label Image Retrieval | 27
Hierarchical User Intent Graph Network for Multimedia Recommendation | 27
Building and Using Personal Knowledge Graph to Improve Suicidal Ideation Detection on Social Media | 27
Recurrent Generative Adversarial Network for Face Completion | 27
A Coarse-to-Fine Facial Landmark Detection Method Based on Self-attention Mechanism | 27
Deep Multi-Patch Matching Network for Visible Thermal Person Re-Identification | 27
CaptionNet: A Tailor-made Recurrent Neural Network for Generating Image Descriptions | 27
Recurrent Exposure Generation for Low-Light Face Detection | 27
Hybrid Refinement-Correction Heatmaps for Human Pose Estimation | 27
Robust Visual Object Tracking Via Adaptive Attribute-Aware Discriminative Correlation Filters | 27
Mask Cross-Modal Hashing Networks | 27
Guide to Match: Multi-Layer Feature Matching With a Hybrid Gaussian Mixture Model | 27
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering | 26
Dual Attention on Pyramid Feature Maps for Image Captioning | 26
TERA: Screen-to-Camera Image Code With Transparency, Efficiency, Robustness and Adaptability | 26
Semantic Example Guided Image-to-Image Translation | 26
Multimodal Sentiment Analysis With Image-Text Interaction Network | 26
Multi-Scale Sparse Graph Convolutional Network For the Assessment of Parkinsonian Gait | 26
Unsupervised Variational Video Hashing With 1D-CNN-LSTM Networks | 26
Asymmetric Joint GANs for Normalizing Face Illumination From a Single Image | 26
Understanding More About Human and Machine Attention in Deep Neural Networks | 26
Accurate Scene Text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint | 26
Learning Crisp Boundaries Using Deep Refinement Network and Adaptive Weighting Loss | 26
LAG-Net: Multi-Granularity Network for Person Re-Identification via Local Attention System | 26
Weakly Supervised Emotion Intensity Prediction for Recognition of Emotions in Images | 26
SAL: Selection and Attention Losses for Weakly Supervised Semantic Segmentation | 26
Context-Dependent Propagating-Based Video Recommendation in Multimodal Heterogeneous Information Networks | 26
Alleviating Modality Bias Training for Infrared-Visible Person Re-Identification | 26
Weakly Supervised Temporal Adjacent Network for Language Grounding | 25
Adversarial Disentanglement Spectrum Variations and Cross-Modality Attention Networks for NIR-VIS Face Recognition | 25
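
The selection rule stated in the note at the top of this list (papers within a date window, with citation counts above the journal median, capped at 250 entries) can be sketched roughly as follows. This is only an illustrative sketch, not the site's actual procedure: the function name `select_papers`, the tuple layout, the inclusive date bounds, and the choice to compute the median over the papers inside the window are all assumptions.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, limit=250):
    """Return papers above the median citation count, most cited first.

    `papers` is a list of (title, published, citations) tuples.
    Assumptions (not confirmed by the source): the date window is
    inclusive on both ends, and the median is taken over the papers
    that fall inside the window.
    """
    # Keep only papers published inside the date window.
    window = [p for p in papers if start <= p[1] <= end]
    if not window:
        return []

    # The threshold is the median citation count of the windowed papers.
    threshold = median([p[2] for p in window])

    # "Above that threshold" is read here as strictly greater than.
    above = [p for p in window if p[2] > threshold]

    # Rank by citation count, descending, and cap the list length.
    above.sort(key=lambda p: p[2], reverse=True)
    return above[:limit]
```

With the window given in the note, a call would look like `select_papers(papers, date(2020, 4, 1), date(2024, 4, 1))`, where `papers` holds records built from whatever citation data is available.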