OOIR: Observatory of International Research

Papers

(The median citation count of International Journal of Computer Vision is 4. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
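The selection rule described in the note above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OOIR's actual implementation: the function name `select_papers`, the `(title, citations, published)` tuple layout, the inclusive date bounds, and the choice to compute the median over the papers inside the date window are all hypothetical.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, max_papers=250):
    """Return papers cited more often than the median, most cited first.

    `papers` is a list of (title, citations, published) tuples; this layout
    is an assumption made for the sketch, not a documented OOIR format.
    """
    # Keep only papers published inside the window (bounds assumed inclusive).
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return []

    # Assumption: the threshold is the median citation count of that window.
    threshold = median(c for _, c, _ in in_window)

    # Keep papers strictly above the threshold, sorted by citations descending.
    above = [p for p in in_window if p[1] > threshold]
    above.sort(key=lambda p: p[1], reverse=True)

    # Cap the list at the stated maximum of 250 papers.
    return above[:max_papers]


# Made-up sample data, purely for illustration.
sample = [
    ("A", 1, date(2023, 1, 1)),
    ("B", 2, date(2023, 2, 1)),
    ("C", 4, date(2024, 1, 1)),
    ("D", 10, date(2024, 5, 1)),
    ("E", 50, date(2025, 1, 1)),
    ("F", 999, date(2020, 1, 1)),  # outside the window, so ignored
]
selected = select_papers(sample, date(2022, 6, 1), date(2026, 6, 1))
print([title for title, _, _ in selected])
```

With this sample the median of the in-window counts is 4, so only the two papers with more than 4 citations are kept.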

Article	Citations
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective	2646
Guest Editorial: Special Issue on Open-World Visual Recognition	879
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement	424
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration	409
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence	365
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation	348
Learning Discriminative Features for Visual Tracking via Scenario Decoupling	329
MoDA: Modeling Deformable 3D Objects from Casual Videos	225
Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting	205
Robust Averaging using Adaptive Annealing	198
Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs	185
UniAttack: Unified Physical-Digital Face Attack Detection	180
AutoIT: Automated Image Tagging with Random Perturbation	172
Correction: Multi-source-free Domain Adaptive Object Detection	168
Image-based Morphological Characterization of Filamentous Biological Structures with Non-constant Curvature Shape Feature	161
Large-Scale Pre-Trained Models Empowering Phrase Generalization in Temporal Sentence Localization	157
Weakly Supervised Salient Object Detection with Text Supervision	144
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression	143
Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution	139
Delving Deeper into Anti-Aliasing in ConvNets	136
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels	133
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements	130
EAN: Event Adaptive Network for Enhanced Action Recognition	128
Image Synthesis Under Limited Data: A Survey and Taxonomy	125
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting	124
Conditional Temporal Variational AutoEncoder for Action Video Prediction	123
Invert Your Prompt: Editing-Aware Diffusion Inversion	123
Learning with Enriched Inductive Biases for Vision-Language Models	114
Are Vision Transformers Robust to Spurious Correlations?	113
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision	113
Learning Text-to-Video Retrieval from Image Captioning	109
Correction: Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	101
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion	101
A Minimal Solution for Image-Based Sphere Estimation	96
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks	95
Deep Image Deblurring: A Survey	94
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention	92
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition	88
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates	83
Instance-dependent Label Distribution Estimation for Learning with Label Noise	83
Guest Editorial: Special Issue on the British Machine Vision Conference 2022	78
Feature Hallucination for Self-supervised Action Recognition	76
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild	76
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization	75
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data	74
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow	72
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions	68
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation	68
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models	67
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation	67
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation	66
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning	64
VideoQA in the Era of LLMs: An Empirical Study	63
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond	62
Learning Cooperative Neural Modules for Stylized Image Captioning	61
Learning Latent Part-Whole Hierarchies for Point Clouds	59
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer	58
Deep Learning-Based Object Pose Estimation: A Comprehensive Survey	58
Noise-Resistant Multimodal Transformer for Emotion Recognition	56
AI killed the Video Star. Audio-Driven Diffusion Model for Expressive Talking Head Generation	55
Correction: Consistent Prompt Tuning for Generalized Category Discovery	54
Sample-efficient Audio-Visual Learning of Scene Acoustics	54
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking	53
Bi-calibration Networks for Weakly-Supervised Video Representation Learning	53
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging	52
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization	51
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose	50
Semantic-Based Implicit Feature Transform for Few-Shot Classification	49
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset	48
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking	48
A Realism Metric for Generated LiDAR Point Clouds	47
CAS-AIR-3D: A Large-scale Low-quality Multi-modal Face Database	47
OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation	46
UIL-AQA: Uncertainty-Aware Clip-Level Interpretable Action Quality Assessment	44
Lightweight and Progressively-Scalable Networks for Semantic Segmentation	44
Symmetria: A Synthetic Dataset for Learning in Point Clouds	43
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks	42
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition	42
Dynamic Knowledge Transfer for Mitigating Spurious Correlations in Deep Learning	42
A Motion-Based Compression and Tracking System for Video Camera Trap-Based Insect Behaviour Studies	42
Diagram Perception Networks for Textbook Question Answering via Joint Optimization	42
SFNet: Faster and Accurate Semantic Segmentation Via Semantic Flow	42
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution	41
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification	40
MedSegFM: A Generative Perspective for Lesion Segmentation via Flow Matching	40
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network	40
Globally Correlation-Aware Hard Negative Generation	40
Learning to Prompt for Vision-Language Models	40
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion	39
Understanding Synonymous Referring Expressions via Contrastive Features	39
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces	39
Robust Partial-to-Partial Point Cloud Registration with Overlapping Mask Learning	38
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining	38
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method	38
Relaxed Knowledge Distillation	38
Improving Domain Adaptation Through Class Aware Frequency Transformation	38
GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates	37
Modeling Scattering Effect for Under-Display Camera Image Restoration	37
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset	37
A Generalized Contour Vibration Model for Building Extraction	37
Paragraph-to-Image Generation with Information-Enriched Diffusion Model	37
Focal Modulation for Image Restoration	36
Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation	36
Generative Adversarial Network Applications in Industry 4.0: A Review	36
Image Matting and 3D Reconstruction in One Loop	36
Beyond Learned Metadata-Based Raw Image Reconstruction	36
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models	36
Robust Unpaired Image Dehazing via Density and Depth Decomposition	35
Advances in 3D Neural Stylization: A Survey	35
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking	35
Text2Scenes: Language-Guided Synthesis of Complex Indoor Scenes	34
From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave	34
A CNN Based Approach for the Point-Light Photometric Stereo Problem	34
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model	34
Correction: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos	33
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification	32
Guest Editorial: Special Issue on Computer Vision from 2D to 3D	32
Control Color: Multimodal Diffusion-Based Interactive Image Colorization	32
Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos	32
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing	32
Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era	32
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation	32
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance	32
PartCom: Part Composition Learning for 3D Open-Set Recognition	31
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection	31
Predictive Display for Teleoperation Based on Vector Fields Using Lidar-Camera Fusion	31
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing	31
Beyond Image Prior: Embedding Noise Prior into Latent Space of Conditional Denoising Transformer	31
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching	30
Shuffled Linear Regression with Outliers in Both Covariates and Responses	30
Investigating Self-Supervised Methods for Label-Efficient Learning	30
SHARP: Shape-Aware Reconstruction of People in Loose Clothing	30
An Optimal Transport View of Class-Imbalanced Visual Recognition	29
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation	29
High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion	29
Structured Binary Neural Networks for Image Recognition	29
HACG: Leveraging Hierarchical Alignment and Caption Generation for Text-Video Retrieval	29
Blur Invariants for Image Recognition	29
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition	28
Object-Scene-Camera Decomposition and Recomposition for Data Efficient Monocular 3D Object Detection	28
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection	28
A Region-Based Randers Geodesic Approach for Image Segmentation	28
Active Perception for Visual-Language Navigation	28
Uncertainty-Aware and Decoupled Distillation for Semantic Segmentation	28
WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift	28
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation	28
Source-Free Domain Adaptation via Target Prediction Distribution Searching	28
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling	27
TokenPacker: Efficient Visual Projector for Multimodal LLM	27
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models	27
LEO: Generative Latent Image Animator for Human Video Synthesis	27
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight	27
Point-In-Context: Understanding Point Cloud via In-Context Learning	27
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring	27
SMPL-IKS: A Mixed Analytical-Neural Inverse Kinematics Solver for 3D Human Mesh Recovery	26
Evidence Conflict Sampling for Open-set Active Learning	26
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection	26
Guest Editorial: Special Issue on Visual Datasets	26
Out-of-Distribution Detection with Virtual Outlier Smoothing	26
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning	26
Subspace Training Mitigates Gradient Noise Vulnerability	26
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models	25
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering	25
Uniformity Preserving Transfer for Visual Prompt Tuning under Long-tailed Distribution	25
Editor’s Note: Special Issue on ACCV 2024	25
Robust Image Restoration with an Adaptive Huber Function Based Fidelity	25
An Interactive Conversational 3D Virtual Human	25
Neural Architecture Search for Dense Prediction Tasks in Computer Vision	24
Correction: Continual Face Forgery Detection via Historical Distribution Preserving	24
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue	24
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer	24
Anti-Bandit for Neural Architecture Search	24
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook	24
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing	24
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer	24
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification	23
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification	23
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation	23
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing	23
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding	23
Zero-Shot Learning on 3D Point Cloud Objects and Beyond	23
Reconstructing a Sphere and the Camera Focal Length from a Single View by Fitting Planes	23
Knowledge Distillation Meets Open-Set Semi-supervised Learning	23
Supervised Neural Style Transfer as an Augmentation Technique for Facial Landmark Detection	23
A Novel Dataset and Lightweight Distillation Baseline for Highlight Transparent Object Detection	23
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast	22
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection	22
A Deeper Analysis of Volumetric Relightable Faces	22
Dynamic MAsk-Pruning Strategy for Source-Free Model Intellectual Property Protection	22
SLNMapping: Super Lightweight Neural Mapping in Large-Scale Scenes	22
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021)	22
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation	22
FourierMIL: Fourier Filtering-based Multiple Instance Learning for Whole Slide Image Analysis	22
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos	22
Correction: Variational Rectification Inference for Learning with Noisy Labels	22
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation	21
Structure-from-motion in micro-image domain for uncalibrated plenoptic 2.0 cameras	21
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review	21
UMCL: Unimodal-generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection	21
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks	21
Single-View View Synthesis with Self-rectified Pseudo-Stereo	21
Generalized Relative Pose and Scale from Affine Correspondences	21
Unifying Viewgraph Sparsification and Disambiguation of Repeated Structures in Structure-from-Motion	21
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection	20
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement	20
AnyPattern: Towards In-context Image Copy Detection	20
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning	20
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection	20
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts	20
CamoVid60K: A Large-Scale Video Dataset for Moving Camouflaged Animals Understanding	20
Segment Anything in 3D with Radiance Fields	20
Defending Against Adversarial Examples Via Modeling Adversarial Noise	20
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing	19
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification	19
Thread Counting in Plain Weave for Old Paintings Using Regression Deep Learning Models	19
Incremental Model Enhancement via Memory-based Contrastive Learning	19
Sentimental Visual Captioning using Multimodal Transformer	19
Correction: Scene Prior Filtering for Depth Super-Resolution	19
Deep Learning-Based Point Cloud Registration: A Comprehensive Survey and Taxonomy	19
GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation	19
Visual Object Tracking in First Person Vision	19
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection	19
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection	19
Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy	19
BayesAdapter: Enhanced Uncertainty Estimation in CLIP Few-Shot Adaptation	19
Image-Based Virtual Try-On: A Survey	19
Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization	18
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models	18
A Survey of Multimodal Hallucination Evaluation and Detection	18
CompViT: Real-Time Compressed Video Action Recognition with Asymmetric Transformer Networks	18
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding	18
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark	18
Task Bias in Contrastive Vision-Language Models	18
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention	18
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	18
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation	18
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective	18
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation	18
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization	18
Towards Scene-Aware Video-to-Spatial Audio Generation	18
Vision-Language Efficient Tuning for Mitigating Catastrophic Forgetting in Multi-Modal Learning	18
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey	17
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs	17
Semantic Contrastive Embedding for Generalized Zero-Shot Learning	17
Guest Editorial: Special Issue on Biometrics Security and Privacy	17
Attribute-Centric Compositional Text-to-Image Generation	17
Universal Prototype Transport for Zero-Shot Action Recognition and Localization	17
Adapting Vision-Language Models from Iconic to Inclusive for Multi-label Recognition Without Labels	17
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation	17
Correction: Open-Vocabulary Text-Driven Human Image Generation	17