OOIR: Observatory of International Research

Papers

(The median citation count of ACM Transactions on Multimedia Computing Communications and Applications is 3. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)

Article	Citations
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation	438
Hypercube Pooling for Visual Semantic Embedding	161
Backdoor Two-Stream Video Models on Federated Learning	158
Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis	117
Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image	104
ForgeFinder: Perceptive Multimodal Deepfake Detection via Multi-grained Forgery Localization	85
Fine-Grained Text-to-Video Temporal Grounding from Coarse Boundary	84
Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization	84
Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation	82
Semi-supervised Learning for Mars Imagery Classification and Segmentation	78
AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation	67
HTTP Adaptive Streaming: A Review on Current Advances and Future Challenges	63
Enhanced Video Super-Resolution Network towards Compressed Data	62
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos	61
Towards Intelligent Attack Detection Using DNA Computing	59
Establishing Trust and Security in Decentralized Metaverse: A Web 3.0 Approach	57
CVLP-NaVD: Contrastive Visual-language Pre-training Models for Non-annotated Visual Description	55
SEADUNet: A Multilingual Ancient Document Image Binarization using EMCAM Attention Mechanism and SCP	53
QuickCSGModeling: Quick CSG Operations Based on Fusing Signed Distance Fields for VR Modeling	52
Upsampling Algorithm for V-PCC-Coded 3D Point Clouds	52
A Siamese Inverted Residuals Network Image Steganalysis Scheme based on Deep Learning	51
Tensorial Evolutionary Optimization for Natural Image Matting	49
Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric	48
Exploring Talking Head Models with Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation	47
Infrared and Visible Image Fusion via Text-Prior Guided Frequency-Domain Decomposition	47
JDAN: Joint Detection and Association Network for Real-Time Online Multi-Object Tracking	46
Quantum Fourier Convolutional Network	46
High Feature Distinguishability for Adaptive Image-text Matching with Dual-stream Transformers	46
Towards Generalizable Deepfake Detection by Primary Region Regularization	45
Image Cropping with Content and Composition Attribute-aware Global Relation Reasoning	45
BiC-Net: Learning Efficient Spatio-temporal Relation for Text-Video Retrieval	44
Reconstruction-Free Image Compression for Machine Vision via Knowledge Transfer	44
Rank-in-Rank Loss for Person Re-identification	43
A Comprehensive Survey on Methods for Image Integrity	42
Joint Mixing Data Augmentation for Skeleton-Based Action Recognition	42
New Metrics and Dataset for Biological Development Video Generation	42
Attentional Composition Networks for Long-Tailed Human Action Recognition	42
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection	41
Immersive Multimedia Service Caching in Edge Cloud with Renewable Energy	41
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model	38
Visual-linguistic-stylistic Triple Reward for Cross-lingual Image Captioning	38
GMS-3DQA: Projection-Based Grid Mini-patch Sampling for 3D Model Quality Assessment	38
Multi-spectral Class Center Network for Face Manipulation Localization	37
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding	36
Robust Video Stabilization based on Motion Decomposition	36
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition	36
Expanding-Window Zigzag Decodable Fountain Codes for Scalable Multimedia Transmission	36
VISCOUNTH: A Large-scale Multilingual Visual Question Answering Dataset for Cultural Heritage	35
CLOUD-CODEC: A New Way of Storing Traffic Camera Footage at Scale	35
GANonymization: A GAN-Based Face Anonymization Framework for Preserving Emotional Expressions	34
Domain-Aware Semantic Alignment Hashing for Large-Scale Zero-Shot Image Retrieval	33
A Self-Defense Copyright Protection Scheme for NFT Image Art Based on Information Embedding	32
Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval	32
The Price of Unlearning: Identifying Unlearning Risk in Edge Computing	32
Using Four Hypothesis Probability Estimators for CABAC in Versatile Video Coding	32
A Multi-Task Adversarial Attack against Face Authentication	31
Learned Image Compression with Frequency Feature Interaction and Non-local Cross-similarity Prior	30
Boosting Transferability of Adversarial Examples with Spatio-Temporal Context	30
DALD-PCAC: Density-Adaptive Learning Descriptor for Point Cloud Lossless Attribute Compression	30
Light Field Reconstruction Using Multi-orientation Epipolar Plane Images	29
Image Defogging Based on Regional Gradient Constrained Prior	29
Detection of Moving Object Using Superpixel Fusion Network	29
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval	29
Boundary Attention-Guided Sparse Feature Learning for Underwater Object Tracking in Edge Computing	29
Decoupling Deep Learning for Enhanced Image Recognition Interpretability	29
ViCoFace: Learning Disentangled Latent Motion Representations for Visual-Consistent Face Reenactment	28
Universal Relocalizer for Weakly Supervised Referring Expression Grounding	28
ER-Depth: Enhancing the Robustness of Self-Supervised Monocular Depth Estimation in Challenging Scenes	27
(Compress and Restore)^N: A Robust Defense Against Adversarial Attacks on Image Classification	26
A Quality of Experience and Visual Attention Evaluation for 360° Videos with Non-spatial and Spatial Audio	25
EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System	25
Source Information-Assisted UV-Space Transformation Network for Person Image Generation	25
DTSD: A Dual Teacher–Student-Based Discrimination Model for Anomaly Detection	25
Principal Component Approximation Network for Image Compression	24
One-Bit Supervision for Image Classification: Problem, Solution, and Beyond	24
TEVL: Trilinear Encoder for Video-language Representation Learning	24
Melody Generation from Lyrics with Local Interpretability	23
Enhancing Embedding Diversity and Robustness for Image-Text Retrieval in Remote Sensing	23
An Efficient and Accurate GPU-based Deep Learning Model for Multimedia Recommendation	23
Multi-Grained Point Cloud Geometry Compression via Dual-Model Prediction with Extended Octree	23
Hyperbolic Active Learning for Label-Efficient Action Segmentation	23
LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis	23
Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition	23
QoE Evaluation for VR with Vibrotactile Feedback Based on Inter-user Brain Spatial Information	23
Zero-shot Scene Graph Generation via Triplet Calibration and Reduction	23
Dual Alignment-enhanced Fashion Vision-Language Pre-training	23
Similarity Regulation and Calibration Alignment for Weakly Supervised Text-Based Person Re-Identification	22
ATMNet: Adaptive Texture Migration Network for Guided Depth Super-Resolution	22
Visual Security Index Combining CNN and Filter for Perceptually Encrypted Light Field Images	22
DATRA-MIV: Decoder-Adaptive Tiling and Rate Allocation for MPEG Immersive Video	22
Toward Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing	22
Adversarial Sample Synthesis for Visual Question Answering	22
Cyclic Self-attention for Point Cloud Recognition	22
Spotting the Fakes: A Deep Dive into GAN-Generated Face Detection	22
Gloss-driven Conditional Diffusion Models for Sign Language Production	21
InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-Modal Sarcasm Detection	21
Human Selective Matting	21
Cross-modal Semantically Augmented Network for Image-text Matching	21
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation	21
Counterfactual Scenario-relevant Knowledge-enriched Multi-modal Emotion Reasoning	21
Boosting Targeted Adversarial Transferability with Feature Contrastive Optimization	21
Gleaning Wisdom from the Past: Towards Label Incremental Learning for Online Hashing with a Plug-and-Play Framework	21
THMM-CLIP: Task-Guided Hierarchical Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning	21
Motion-Aware Self-Supervised RGBT Tracking with Multi-Modality Hierarchical Transformers	21
DISA: Disentangled Dual-Branch Framework for Affordance-Aware Human Insertion	20
MDRA: A Motion-guided Dual-stream Recurrent Attention Framework for Dynamic Hand Gesture Recognition	20
SkiTrack: An Aerial Skiing Benchmark for Human-Centric Object Tracking	20
Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation	20
Deep Chroma Compression of Tone-Mapped Images	20
Temporal and Semantic Correlation Network for Weakly-Supervised Temporal Action Localization	20
Reversible Data Hiding in Shared JPEG Images	20
ReFID: Reciprocal Frequency-aware Generalizable Person Re-identification via Decomposition and Filtering	19
Multiply Complementary Priors for Image Compressive Sensing Reconstruction in Impulsive Noise	19
Hyperbolic-Based Cross-Modal Semantic Remodeling Network for Zero-Shot Sketch-Based Image Retrieval	19
PTHUMAN3D: 3D Gaussian Human Avatar Modeling with the Poincaré Ball and the Triplane Representation	19
Diversity-Representativeness Replay and Knowledge Alignment for Lifelong Vehicle Re-identification	19
Maximizing Long-Term Task Completion Ratio of UAV-Enabled Wirelessly Powered MEC Systems	19
Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning	19
Deep Modular Co-Attention Shifting Network for Multimodal Sentiment Analysis	18
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach	18
User-Generated Content and Editors in Games: A Comprehensive Survey	18
Multi-Task Driven Adapter-Based Foundation Model for Locomotion Prediction in Virtual Reality	18
Structure-aware Video Style Transfer with Map Art	18
PADVG: A Simple Baseline of Active Protection for Audio-Driven Video Generation	18
Triplet Contrastive Representation Learning for Unsupervised Vehicle Re-Identification	18
Text-Guided Synthesis of Masked Face Images	18
Generative Image Steganography Based on Guidance Feature Distribution	18
Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation	18
CVAF: A CLIP-Based View-Consistent Alignment Framework for Aerial-Ground Person Re-Identification	17
Dynamic Transfer Exemplar based Facial Emotion Recognition Model Toward Online Video	17
Cross-Modality Relation and Uncertainty Exploration for Text-Based Person Search	17
Potential Features Fusion Network for Multimodal Fake News Detection	17
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition	17
Attack-Defending Contrastive Learning for Volumetric Medical Image Zero-Watermarking	17
GLPose: Global-Local Representation Learning for Human Pose Estimation	17
SeGDP: Source-free Cross-domain Few-shot Learning via Semantic Guided Diversity Prompting	17
CLIP-GS: CLIP-Informed Gaussian Splatting for View-Consistent 3D Indoor Semantic Understanding	17
Shot Boundary Detection Using Color Clustering and Attention Mechanism	16
Learning Nighttime Semantic Segmentation the Hard Way	16
NSDIE: Noise Suppressing Dark Image Enhancement Using Multiscale Retinex and Low-Rank Minimization	16
Skeleton-Aware Graph-Based Adversarial Networks for Human Pose Estimation from Sparse IMUs	16
SSAT: Active Authorization Control and User’s Fingerprint Tracking Framework for DNN IP Protection	16
Multigranularity Feature Aggregation and Cross-level Boundary Modeling for Temporal Action Detection	16
A Comprehensive Study of Deep Learning-based Covert Communication	16
Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting	16
Dual Dynamic Threshold Adjustment Strategy	16
Semantic Completion and Filtration for Image–Text Retrieval	16
PrivaMod: Uncertainty-Aware Multimedia Fusion with Privacy Guarantees for NFT Visual and Transaction Analysis	16
Mutually-Guided Hierarchical Multi-Modal Feature Learning for Referring Image Segmentation	16
3D Facial Shape Similarity with Deep Perceptual Representations	16
Multi-view Shape Generation for a 3D Human-like Body	16
A Normalized Slicing-assigned Virtualization Method for 6G-based Wireless Communication Systems	16
Quality Enhancement of Compressed 360-Degree Videos Using Viewport-based Deep Neural Networks	16
Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection	16
Content-Aware Selective Encryption for H.265/HEVC Using Deep Hashing Network and Steganography	15
3DMambaComplete: Structured State Space Model for High-Efficiency Point Cloud Completion	15
Generation and Editing of Mandrill Faces: Application to Sex Editing and Assessment	15
ProposalVLAD with Proposal-Intra Exploring for Temporal Action Proposal Generation	15
Generating Robust Adversarial Examples against Online Social Networks (OSNs)	15
Cascaded Adaptive Graph Representation Learning for Image Copy-Move Forgery Detection	15
Semantics and Non-fungible Tokens for Copyright Management on the Metaverse and Beyond	15
Offloading-based Power-Efficient Mobile VTuber Live Streaming	15
Multi-Modal Driven Pose-Controllable Talking Head Generation	15
Robust and Secure Hashing Towards Pirated Neural Network Model Detection	15
PingTactics: A Multimodal Dataset for Table Tennis Action Recognition and Tactical Analysis	15
A Collaborative Hierarchical Aggregation Network for Weakly Supervised Temporal Action Localization	14
Arbitrary Virtual Try-on Network: Characteristics Preservation and Tradeoff between Body and Clothing	14
Privacy-preserving Multi-source Cross-domain Recommendation Based on Knowledge Graph	14
Action-aware Linguistic Skeleton Optimization Network for Non-autoregressive Video Captioning	14
Self-supervised Multi-view Learning via Auto-encoding 3D Transformations	14
Trans-Convo-Former Net for Hierarchical Prediction of Household Images	14
VRVul-Discovery: BiLSTM-based Vulnerability Discovery for Virtual Reality Devices in Metaverse	14
Learning the User’s Deeper Preferences for Multi-modal Recommendation Systems	14
Unsupervised Domain Adaptation by Causal Learning for Biometric Signal-based HCI	14
Quality Assessment in the Era of Large Models: A Survey	14
GAN-Assisted Road Segmentation from Satellite Imagery	14
Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection	14
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications	14
Autoregressive GAN for Semantic Unconditional Head Motion Generation	14
Portrait Video Compression with Semantic-guided Animation Model and Background Incremental Coding	14
Progressive Transformer Machine for Natural Character Reenactment	14
Toward High-quality Face-Mask Occluded Restoration	14
Robust Image Hashing via CP Decomposition and DCT for Copy Detection	14
Transformer-Based Visual Grounding with Cross-Modality Interaction	14
Self-supervised Calorie-aware Heterogeneous Graph Networks for Food Recommendation	14
Robust Long-Term Tracking via Localizing Occluders	14
Joint Structure-Texture Scan-Order for Point Cloud Attribute Compression Using Affine Transformation	14
Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors	13
Low-Latency Multimedia Delivery via Collaborative Cloud–Edge Caching in Edge Computing Networks	13
EIN: Exposure-Induced Network for Single-Image HDR Reconstruction	13
Language-guided Residual Graph Attention Network and Data Augmentation for Visual Grounding	13
Enhancing Pose-Guided Human Image Generation with Comprehensive and Adjustable 3D Control	13
GJFusion: A Channel-Level Correlation Construction Method for Multimodal Physiological Signal Fusion	13
EVASR: Edge-Based Salience-Aware Super-Resolution for Enhanced Video Quality and Power Efficiency	13
Noise-Resistance Learning via Multi-Granularity Consistency for Unsupervised Domain Adaptive Person Re-Identification	13
Dynamic Weighted Gradient Reversal Network for Visible-infrared Person Re-identification	13
Multiscale Feature Importance-Based Bit Allocation for End-to-End Feature Coding for Machines	13
ALOHA: Adapting Local Spatio-Temporal Context to Enhance the Audio-Visual Semantic Segmentation	13
Language-guided Bias Generation Contrastive Strategy for Visual Question Answering	13
Geometry-Insensitive RPN Prototypes for Domain Adaptive 3D Object Detection	13
Balanced and Accurate Pseudo-Labels for Semi-Supervised Image Classification	13
Boosting Few-shot Object Detection with Discriminative Representation and Class Margin	13
FAST: Flexibly Controllable Arbitrary Style Transfer via Latent Diffusion Models	13
A Simple Switchable Framework for Open-Vocabulary Video Instance Segmentation	13
Dual Scene Graph Convolutional Network for Motivation Prediction	13
Language-guided Visual Tracking: Comprehensive and Effective Multimodal Information Fusion	13
Generating and Evaluating Data of Daily Activities with an Autonomous Agent in a Virtual Smart Home	13
Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details	12
Smart City Construction and Management by Digital Twins and BIM Big Data in COVID-19 Scenario	12
CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric	12
T2C: Text-guided 4D Cloth Generation	12
Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing	12
How to Understand Named Entities: Using Commonsense for News Captioning	12
Early Traffic Accident Anticipation via Feature Consistency Representation and Soft Label Regression	12
Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing	12
Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective	12
A Real-Time Medical Image Encryption Algorithm Leveraging a Novel Hypersensitive Chaotic Map	12
Efficient Privacy-Preserving Video Analytics via Share Transforming in Distributed Clouds	12
iDAM: Iteratively Trained Deep In-loop Filter with Adaptive Model Selection	12
SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection	12
PMAL: A Proxy Model Active Learning Approach for Vision Based Industrial Applications	12
Meetor: A Human-Centered Automatic Video Editing System for Meeting Recordings	12
InteractNet: Social Interaction Recognition for Semantic-rich Videos	12
A Multimodal Hierarchical Attentional Ordering Network	12
Learning Semantic Representation on Visual Attribute Graph for Person Re-identification and Beyond	12
Hierarchical and Progressive Image Matting	12
Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval	12
Deep Differential Lifelong Cross-modal Hashing for Stream Medical Data Retrieval	12
DPDFormer: A Coarse-to-Fine Model for Monocular Depth Estimation	12
Random Dense Knowledge Distillation for Continual Learning	11
A Review of Player Engagement Estimation in Video Games: Challenges and Opportunities	11
Video Streaming Over QUIC: A Comprehensive Study	11
LFIZW-GRHFMR: Robust Zero-Watermarking with GRHFMR for Light Field Image	11
Multimodal Cascaded Framework with Multimodal Latent Loss Functions Robust to Missing Modalities	11
A Hierarchically Discriminative Loss with Group Regularization for Fine-Grained Image Classification	11
Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection	11
R-HMF: A Relation-enhanced Hierarchical Multimodal Framework for Few-shot Knowledge Graph Completion	11
Dual-Modality-Shared Learning and Label Refinement for Unsupervised Visible-Infrared Person ReID	11
Self-supervised Image-based 3D Model Retrieval	11
Optimized Deep-Neural Network for Content-based Medical Image Retrieval in a Brownfield IoMT Network	11
Instance-level Adversarial Source-free Domain Adaptive Person Re-identification	11
Multi-Scale and Multi-Layer Lattice Transformer for Underwater Image Enhancement	11
MLIC⁺⁺: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression	11
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning	11
Boolean-based Two-in-One Secret Image Sharing by Adaptive Pixel Grouping	11
FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification	11
Context-Based Novel Histogram Bin Stretching Algorithm for Automatic Contrast Enhancement	11
Beyond the Parts: Learning Coarse-to-Fine Adaptive Alignment Representation for Person Search	11
Learning to Discern Fine-Grained Cues across Domains: Generalizing ReID via Multi-Level Feature Propagation	11
Cryptanalysis and Improvement of a Video Cryptosystem via Chaos and S-Box	11