International Journal of Computer Vision

Papers
(The H4-Index of International Journal of Computer Vision is 51. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. Only papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01, are included.)
Article | Citations
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 1196
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 1046
A Minimal Solution for Image-Based Sphere Estimation | 1027
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 375
Are Vision Transformers Robust to Spurious Correlations? | 333
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 324
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 229
Guest Editorial: Special Issue on Open-World Visual Recognition | 209
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 207
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 201
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 182
Instance-Aware Scene Layout Forecasting | 174
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 148
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 148
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 134
Image Synthesis Under Limited Data: A Survey and Taxonomy | 124
Learning with Enriched Inductive Biases for Vision-Language Models | 123
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 122
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 105
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 104
Correction: Multi-source-free Domain Adaptive Object Detection | 102
Learning Text-to-Video Retrieval from Image Captioning | 102
MoDA: Modeling Deformable 3D Objects from Casual Videos | 101
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 101
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 100
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 94
EAN: Event Adaptive Network for Enhanced Action Recognition | 93
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 91
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 88
Delving Deeper into Anti-Aliasing in ConvNets | 87
Deep Image Deblurring: A Survey | 85
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 83
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 81
Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation | 81
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 81
Noise-Resistant Multimodal Transformer for Emotion Recognition | 79
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 75
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 75
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 74
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 67
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 65
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 64
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 63
A Realism Metric for Generated LiDAR Point Clouds | 62
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 61
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 58
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 58
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 58
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 57
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 56
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 55
Learning Cooperative Neural Modules for Stylized Image Captioning | 51
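
The stated H4-Index can be checked against the citation counts listed above. The sketch below assumes that the H4-Index is an ordinary h-index restricted to the four-year window (the largest h such that h papers have at least h citations each); the page itself does not spell out the definition. The function name `h_index` and the variable `counts` are illustrative, and the check is only meaningful if the list above is complete for papers at or above the threshold.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so no larger h exists
    return h


# Citation counts copied from the list above, in the order shown (52 entries).
counts = [
    1196, 1046, 1027, 375, 333, 324, 229, 209, 207, 201,
    182, 174, 148, 148, 134, 124, 123, 122, 105, 104,
    102, 102, 101, 101, 100, 94, 93, 91, 88, 87,
    85, 83, 81, 81, 81, 79, 75, 75, 74, 67,
    65, 64, 63, 62, 61, 58, 58, 58, 57, 56,
    55, 51,
]

print(h_index(counts))
```

Under that assumed definition, the 51st-ranked paper has 55 citations and the 52nd has 51, so the computed value agrees with the stated index of 51.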