OOIR: Observatory of International Research

Papers

(The H4-Index of International Journal of Computer Vision is 58. The table below lists the papers whose CrossRef citation counts meet or exceed that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
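
As a rough illustration of how such a threshold can be derived, the sketch below computes a standard h-index from a list of citation counts and then keeps the papers at or above it. It assumes the H4-Index is an ordinary h-index restricted to the four-year window; OOIR's exact procedure is not described here. The function names and the toy data are hypothetical and are not taken from the table.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank          # the top `rank` papers all have >= rank citations
        else:
            break             # counts only decrease from here, so stop
    return h


def papers_at_threshold(papers, h):
    """Keep (title, citations) pairs whose citation count is at least h."""
    return [(title, count) for title, count in papers if count >= h]


# Hypothetical toy data, not taken from the table.
toy = [("A", 10), ("B", 8), ("C", 5), ("D", 4), ("E", 3)]
h = h_index([count for _, count in toy])
print(h)                            # 4
print(papers_at_threshold(toy, h))  # papers A to D
```

Under this assumption, an index of 58 means that 58 papers from the window have at least 58 citations each, which matches the number of rows in the table.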

Article	Citations
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective	2646
Guest Editorial: Special Issue on Open-World Visual Recognition	879
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement	424
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration	409
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence	365
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation	348
Learning Discriminative Features for Visual Tracking via Scenario Decoupling	329
MoDA: Modeling Deformable 3D Objects from Casual Videos	225
Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting	205
Robust Averaging using Adaptive Annealing	198
Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs	185
UniAttack: Unified Physical-Digital Face Attack Detection	180
AutoIT: Automated Image Tagging with Random Perturbation	172
Correction: Multi-source-free Domain Adaptive Object Detection	168
Image-based Morphological Characterization of Filamentous Biological Structures with Non-constant Curvature Shape Feature	161
Large-Scale Pre-Trained Models Empowering Phrase Generalization in Temporal Sentence Localization	157
Weakly Supervised Salient Object Detection with Text Supervision	144
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression	143
Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution	139
Delving Deeper into Anti-Aliasing in ConvNets	136
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels	133
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements	130
EAN: Event Adaptive Network for Enhanced Action Recognition	128
Image Synthesis Under Limited Data: A Survey and Taxonomy	125
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting	124
Invert Your Prompt: Editing-Aware Diffusion Inversion	123
Conditional Temporal Variational AutoEncoder for Action Video Prediction	123
Learning with Enriched Inductive Biases for Vision-Language Models	114
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision	113
Are Vision Transformers Robust to Spurious Correlations?	113
Learning Text-to-Video Retrieval from Image Captioning	109
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion	101
Correction: Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	101
A Minimal Solution for Image-Based Sphere Estimation	96
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks	95
Deep Image Deblurring: A Survey	94
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention	92
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition	88
Instance-dependent Label Distribution Estimation for Learning with Label Noise	83
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates	83
Guest Editorial: Special Issue on the British Machine Vision Conference 2022	78
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild	76
Feature Hallucination for Self-supervised Action Recognition	76
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization	75
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data	74
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow	72
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation	68
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions	68
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation	67
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models	67
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation	66
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning	64
VideoQA in the Era of LLMs: An Empirical Study	63
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond	62
Learning Cooperative Neural Modules for Stylized Image Captioning	61
Learning Latent Part-Whole Hierarchies for Point Clouds	59
Deep Learning-Based Object Pose Estimation: A Comprehensive Survey	58
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer	58