International Journal of Computer Vision

Papers
(The median citation count of International Journal of Computer Vision is 1. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 1603
Spectral Shape Recovery and Analysis Via Data-driven Connections | 900
Learning with Enriched Inductive Biases for Vision-Language Models | 891
Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation | 806
Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective | 290
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | 247
Contextual Object Detection with Multimodal Large Language Models | 196
Toward Accurate and Robust Pedestrian Detection via Variational Inference | 179
Image Synthesis Under Limited Data: A Survey and Taxonomy | 168
Dual-Space Video Person Re-identification | 164
A Survey on Adaptive Cameras | 160
PL$_{1}$P: Point-Line Minimal Problems under Partial Visibility in Three Views | 146
WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection | 142
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 142
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 132
Regional Adversarial Training for Better Robust Generalization | 127
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 97
Efficient High-Quality Vectorized Modeling of Large-Scale Scenes | 93
RMS-FlowNet++: Efficient and Robust Multi-scale Scene Flow Estimation for Large-Scale Point Clouds | 88
An Empirical Study on Multi-domain Robust Semantic Segmentation | 86
Re-ID-leak: Membership Inference Attacks Against Person Re-identification | 85
SegViT v2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers | 84
DOVE: Learning Deformable 3D Objects by Watching Videos | 83
Learning Dynamic Prototypes for Visual Pattern Debiasing | 80
Deep Physics-Guided Unrolling Generalization for Compressed Sensing | 80
Editor’s Note: Special Issue on BMVC 2021 | 77
Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior | 74
Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation | 73
EAN: Event Adaptive Network for Enhanced Action Recognition | 72
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 69
Editor’s Note: Special Issue on Computer Vision and Cultural Heritage Preservation | 66
Interpreting Face Inference Models Using Hierarchical Network Dissection | 65
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 61
Renormalization for Initialization of Rolling Shutter Visual-Inertial Odometry | 59
Hierarchical Curriculum Learning for No-Reference Image Quality Assessment | 58
Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks | 58
Context-Enhanced Representation Learning for Single Image Deraining | 56
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch | 55
From Individual to Whole: Reducing Intra-class Variance by Feature Aggregation | 55
Self-supervised Secondary Landmark Detection via 3D Representation Learning | 55
RIConv++: Effective Rotation Invariant Convolutions for 3D Point Clouds Deep Learning | 52
Development and Validation of an Unsupervised Feature Learning System for Leukocyte Characterization and Classification: A Multi-Hospital Study | 51
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 50
The Isowarp: The Template-Based Visual Geometry of Isometric Surfaces | 49
An Exploration of Embodied Visual Exploration | 48
Breaking the Limits of Reliable Prediction via Generated Data | 46
Visual Object Tracking in First Person Vision | 46
Lidar Panoptic Segmentation in an Open World | 46
Robust Deep Object Tracking against Adversarial Attacks | 45
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 44
Are Vision Transformers Robust to Spurious Correlations? | 44
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 43
Task Bias in Contrastive Vision-Language Models | 42
4D Temporally Coherent Multi-Person Semantic Reconstruction and Segmentation | 42
An Efficient Model for a Camera Behind a Parallel Refractive Slab | 40
Automatic Modelling for Interactive Action Assessment | 40
Underwater Camera: Improving Visual Perception Via Adaptive Dark Pixel Prior and Color Correction | 38
CMSNet: Deep Color and Monochrome Stereo | 38
Deep Corner | 38
Deep Unfolding for Snapshot Compressive Imaging | 38
Guest Editorial: Special Issue on Performance Evaluation in Computer Vision | 37
Computer Vision and Pattern Recognition 2020 | 36
RELAX: Representation Learning Explainability | 36
LAMP-HQ: A Large-Scale Multi-pose High-Quality Database and Benchmark for NIR-VIS Face Recognition | 34
Mitigating Demographic Bias in Facial Datasets with Style-Based Multi-attribute Transfer | 34
Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation | 34
Importance First: Generating Scene Graph of Human Interest | 33
A Minimal Solution for Image-Based Sphere Estimation | 33
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking | 33
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 32
Disentangling Geometric Deformation Spaces in Generative Latent Shape Models | 32
Local Compressed Video Stream Learning for Generic Event Boundary Detection | 32
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 32
Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network | 31
Adapting Across Domains via Target-Oriented Transferable Semantic Augmentation Under Prototype Constraint | 31
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 29
Visual Interestingness Prediction: A Benchmark Framework and Literature Review | 27
Deep CockTail Networks | 27
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 27
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 27
MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation | 26
Semantic-Aware Visual Decomposition for Image Coding | 26
Building 3D Generative Models from Minimal Data | 26
Combating Label Noise with a General Surrogate Model for Sample Selection | 26
Inferring Attention Shifts for Salient Instance Ranking | 26
Intra-Camera Supervised Person Re-Identification | 25
Dynamical Deep Generative Latent Modeling of 3D Skeletal Motion | 25
Recurrent Graph Neural Networks for Video Instance Segmentation | 25
End-to-End Video Text Spotting with Transformer | 25
Enhanced 3D Human Pose Estimation from Videos by Using Attention-Based Neural Network with Dilated Convolutions | 24
Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations | 24
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval | 24
Focus for Free in Density-Based Counting | 23
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 23
Dual-Attention-Guided Network for Ghost-Free High Dynamic Range Imaging | 23
FD-GAN: Generalizable and Robust Forgery Detection via Generative Adversarial Networks | 22
MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis | 22
ToTem NRSfM: Object-Wise Non-rigid Structure-from-Motion with a Topological Template | 22
Learning to Detect Novel Species with SAM in the Wild | 22
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering | 22
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition | 22
On Finite Difference Jacobian Computation in Deformable Image Registration | 22
APPTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking | 22
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-Wise Pseudo Labeling | 21
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 21
CG-FAS: Cross-label Generative Augmentation for Face Anti-Spoofing | 21
CSDG-FAS: Closed-Space Domain Generalization for Face Anti-spoofing | 21
A Survey of Methods for Automated Quality Control Based on Images | 20
Fast and Accurate 3D Registration from Line Intersection Constraints | 20
Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search | 20
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 20
On the Generalization and Causal Explanation in Self-Supervised Learning | 20
Learning Text-to-Video Retrieval from Image Captioning | 20
A Cutting-Plane Method for Sublabel-Accurate Relaxation of Problems with Product Label Spaces | 20
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 20
Eliminating Temporal Illumination Variations in Whisk-broom Hyperspectral Imaging | 19
End-to-End Alternating Optimization for Real-World Blind Super Resolution | 19
Delving Deeper into Anti-Aliasing in ConvNets | 19
Network Adjustment: Channel and Block Search Guided by Resource Utilization Ratio | 18
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction | 18
Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation | 17
LDTrack: Dynamic People Tracking by Service Robots Using Diffusion Models | 17
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 17
Deep Image Deblurring: A Survey | 17
Correction: Multi-source-free Domain Adaptive Object Detection | 17
Incremental Model Enhancement via Memory-based Contrastive Learning | 17
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 17
Blind Image Quality Assessment: Exploring Content Fidelity Perceptibility via Quality Adversarial Learning | 17
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 17
Mimetics: Towards Understanding Human Actions Out of Context | 17
HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning | 16
CAE-GReaT: Convolutional-Auxiliary Efficient Graph Reasoning Transformer for Dense Image Predictions | 16
Invertible Rescaling Network and Its Extensions | 16
Correction: Instant3D: Instant Text-to-3D Generation | 16
Cross-Domain Gated Learning for Domain Generalization | 16
Super Vision Transformer | 15
UrbanEvolver: Function-Aware Urban Layout Regeneration | 15
Curriculum Learning: A Survey | 15
Delving into Inter-Image Invariance for Unsupervised Visual Representations | 15
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 15
Polysemy Deciphering Network for Robust Human–Object Interaction Detection | 15
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 15
M-RRFS: A Memory-Based Robust Region Feature Synthesizer for Zero-Shot Object Detection | 15
Artificial Intelligence for Dunhuang Cultural Heritage Protection: The Project and the Dataset | 15
Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning | 15
Pyramid Attention Network for Image Restoration | 15
Language-Guided Hierarchical Fine-Grained Image Forgery Detection and Localization | 15
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 14
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 14
Correction: Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences Using Transformer Networks | 14
Unsupervised Scale-Consistent Depth Learning from Video | 14
Guest Editorial: Special Issue on Multimodal Learning | 14
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 14
Learning to Adapt to Light | 13
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 13
Learning Robust Facial Representation From the View of Diversity and Closeness | 13
Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation | 13
Correction to: Deep Unpaired Blind Image Super-Resolution Using Self-supervised Learning and Exemplar Distillation | 13
PIDray: A Large-Scale X-ray Benchmark for Real-World Prohibited Item Detection | 13
Guest Editorial: Special Issue on Open-World Visual Recognition | 13
SegMix: Co-occurrence Driven Mixup for Semantic Segmentation and Adversarial Robustness | 13
Guest Editorial: Special Issue on Deep Learning for Video Analysis and Compression | 13
Instance-Aware Scene Layout Forecasting | 13
InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation | 13
MoDA: Modeling Deformable 3D Objects from Casual Videos | 13
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 13
Editor’s Note: Special Issue on 3D Computer Vision | 13
Deep Attention Learning for Pre-operative Lymph Node Metastasis Prediction in Pancreatic Cancer via Multi-object Relationship Modeling | 13
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 13
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 12
Augmenting the Softmax with Additional Confidence Scores for Improved Selective Classification with Out-of-Distribution Data | 12
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 12
CRetinex: A Progressive Color-Shift Aware Retinex Model for Low-Light Image Enhancement | 12
A Realism Metric for Generated LiDAR Point Clouds | 12
Error-Aware Conversion from ANN to SNN via Post-training Parameter Calibration | 12
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 12
Descriptor Distillation: A Teacher-Student-Regularized Framework for Learning Local Descriptors | 12
Guest Editorial: Special Issue on Traditional Computer Vision in the Age of Deep Learning | 11
Open-Set Adversarial Defense with Clean-Adversarial Mutual Learning | 11
Action2video: Generating Videos of Human 3D Actions | 11
Depth Descent Synchronization in $\mathrm{SO}(D)$ | 11
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 11
Learning to Detect Instance-Level Salient Objects Using Complementary Image Labels | 11
Bridging Composite and Real: Towards End-to-End Deep Image Matting | 11
Distribution-Sensitive Information Retention for Accurate Binary Neural Network | 11
Matching Compound Prototypes for Few-Shot Action Recognition | 11
L3AM: Linear Adaptive Additive Angular Margin Loss for Video-Based Hand Gesture Authentication | 11
Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks | 11
Guest Editorial: Special Issue on Advances in Computer Vision and Applications (ACCV 2020) | 10
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 10
Symmetry-aware Neural Architecture for Embodied Visual Navigation | 10
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 10
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 10
Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset | 10
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 10
Distance Based Image Classification: A solution to generative classification’s conundrum? | 10
ReliTalk: Relightable Talking Portrait Generation from a Single Video | 10
Learning Cooperative Neural Modules for Stylized Image Captioning | 10
Adaptive Multi-Source Predictor for Zero-Shot Video Object Segmentation | 10
Attribute-Image Person Re-identification via Modal-Consistent Metric Learning | 10
Toward Practical Weakly Supervised Semantic Segmentation via Point-Level Supervision | 10
SportsCap: Monocular 3D Human Motion Capture and Fine-Grained Understanding in Challenging Sports Videos | 10
Exploring Vision-Language Models for Imbalanced Learning | 9
Correction to: Long-Short Temporal–Spatial Clues Excited Network for Robust Person Re-identification | 9
Learning Sequence Representations by Non-local Recurrent Neural Memory | 9
Does Confusion Really Hurt Novel Class Discovery? | 9
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 9
Self-supervised Scalable Deep Compressed Sensing | 9
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 9
Overcoming the Domain Gap in Neural Action Representations | 9
Semantic Contrastive Bootstrapping for Single-Positive Multi-label Recognition | 9
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 9
Robust Heterogeneous Model Fitting for Multi-source Image Correspondences | 9
Correction: Open-Vocabulary Text-Driven Human Image Generation | 9
FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding | 9
Efficient Person Search: An Anchor-Free Approach | 9
Snowvision: Segmenting, Identifying, and Discovering Stamped Curve Patterns from Fragments of Pottery | 9
Cascaded Split-and-Aggregate Learning with Feature Recombination for Pedestrian Attribute Recognition | 9
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 9
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 9
AROID: Improving Adversarial Robustness Through Online Instance-Wise Data Augmentation | 9
DeMoCap: Low-Cost Marker-Based Motion Capture | 9
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 8
Learning by Asking Questions for Knowledge-Based Novel Object Recognition | 8
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 8
Bridging the Source-to-Target Gap for Cross-Domain Person Re-identification with Intermediate Domains | 8
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 8
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation | 8
R$^{2}$S100K: Road-Region Segmentation Dataset for Semi-supervised Autonomous Driving in the Wild | 8
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 8
EfficientPS: Efficient Panoptic Segmentation | 8
Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation | 8
Data Augmentation for Low-Level Vision: CutBlur and Mixture-of-Augmentation | 8
Compressed Event Sensing (CES) Volumes for Event Cameras | 8
Position-Guided Point Cloud Panoptic Segmentation Transformer | 8
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation | 8
UPR-Net: A Unified Pyramid Recurrent Network for Video Frame Interpolation | 8
Infrared Adversarial Patches with Learnable Shapes and Locations in the Physical World | 8
Deep Unsupervised 3D Human Body Reconstruction from a Sparse set of Landmarks | 8
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 8
SyDog-Video: A Synthetic Dog Video Dataset for Temporal Pose Estimation | 8
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 8
Learning to Detect Semantic Boundaries with Image-Level Class Labels | 8
FastTrack: A Highly Efficient and Generic GPU-Based Multi-object Tracking Method with Parallel Kalman Filter | 8
Joint Learning of Audio–Visual Saliency Prediction and Sound Source Localization on Multi-face Videos | 8
What Limits the Performance of Local Self-attention? | 8
Guided Attention in CNNs for Occluded Pedestrian Detection and Re-identification | 7
Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection | 7
3D Shape Analysis Through a Quantum Lens: the Average Mixing Kernel Signature | 7
Towards Ultra High-Speed Hyperspectral Imaging by Integrating Compressive and Neuromorphic Sampling | 7