International Journal of Computer Vision

Papers
(The TQCC of International Journal of Computer Vision is 13. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. The publications cover those that have been published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
| Article | Citations |
|---|---:|
| Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 2507 |
| Guest Editorial: Special Issue on Open-World Visual Recognition | 822 |
| Correction: Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | 407 |
| OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 382 |
| MoDA: Modeling Deformable 3D Objects from Casual Videos | 357 |
| Correction: Multi-source-free Domain Adaptive Object Detection | 339 |
| Learning with Enriched Inductive Biases for Vision-Language Models | 305 |
| Conditional Temporal Variational AutoEncoder for Action Video Prediction | 217 |
| Instance-dependent Label Distribution Estimation for Learning with Label Noise | 201 |
| View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 193 |
| Invert Your Prompt: Editing-Aware Diffusion Inversion | 179 |
| BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 172 |
| FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 162 |
| From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 157 |
| PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 153 |
| Image Synthesis Under Limited Data: A Survey and Taxonomy | 153 |
| Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 150 |
| Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 146 |
| Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 139 |
| Image-based Morphological Characterization of Filamentous Biological Structures with Non-constant Curvature Shape Feature | 137 |
| Large-Scale Pre-Trained Models Empowering Phrase Generalization in Temporal Sentence Localization | 128 |
| Weakly Supervised Salient Object Detection with Text Supervision | 127 |
| GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 126 |
| Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 126 |
| Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 126 |
| RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion | 123 |
| EAN: Event Adaptive Network for Enhanced Action Recognition | 118 |
| Robust Averaging using Adaptive Annealing | 117 |
| Exocentric-to-Egocentric Adaptation for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs | 110 |
| AutoIT: Automated Image Tagging with Random Perturbation | 107 |
| Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting | 107 |
| UniAttack: Unified Physical-Digital Face Attack Detection | 106 |
| Learning Extensible Series-Parallel Lookup Tables for Efficient Image Super-Resolution | 100 |
| Are Vision Transformers Robust to Spurious Correlations? | 100 |
| Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 99 |
| A Minimal Solution for Image-Based Sphere Estimation | 98 |
| Delving Deeper into Anti-Aliasing in ConvNets | 90 |
| SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 87 |
| Learning Text-to-Video Retrieval from Image Captioning | 84 |
| Deep Image Deblurring: A Survey | 84 |
| Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 80 |
| Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 78 |
| H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 76 |
| Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 76 |
| NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 75 |
| Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 74 |
| Noise-Resistant Multimodal Transformer for Emotion Recognition | 74 |
| UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 72 |
| Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 68 |
| Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 68 |
| UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 67 |
| Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 66 |
| Learning Latent Part-Whole Hierarchies for Point Clouds | 65 |
| Learning Cooperative Neural Modules for Stylized Image Captioning | 64 |
| FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 63 |
| Feature Hallucination for Self-supervised Action Recognition | 62 |
| Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 61 |
| Correction: Consistent Prompt Tuning for Generalized Category Discovery | 58 |
| UIL-AQA: Uncertainty-Aware Clip-Level Interpretable Action Quality Assessment | 57 |
| OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | 56 |
| CAS-AIR-3D: A Large-scale Low-quality Multi-modal Face Database | 56 |
| Symmetria: A Synthetic Dataset for Learning in Point Clouds | 55 |
| A Realism Metric for Generated LiDAR Point Clouds | 55 |
| Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 55 |
| Dynamic Knowledge Transfer for Mitigating Spurious Correlations in Deep Learning | 53 |
| Sample-efficient Audio-Visual Learning of Scene Acoustics | 52 |
| A Motion-Based Compression and Tracking System for Video Camera Trap-Based Insect Behaviour Studies | 51 |
| SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 50 |
| AI killed the Video Star. Audio-Driven Diffusion Model for Expressive Talking Head Generation | 49 |
| Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 49 |
| Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 49 |
| Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 48 |
| Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 48 |
| SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 48 |
| Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 47 |
| Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 46 |
| Semantic-Based Implicit Feature Transform for Few-Shot Classification | 44 |
| Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 43 |
| Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 42 |
| Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 42 |
| Learning to Prompt for Vision-Language Models | 41 |
| ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 41 |
| A CNN Based Approach for the Point-Light Photometric Stereo Problem | 40 |
| VideoQA in the Era of LLMs: An Empirical Study | 40 |
| A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 40 |
| Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | 40 |
| Globally Correlation-Aware Hard Negative Generation | 40 |
| In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 40 |
| Robust Partial-to-Partial Point Cloud Registration with Overlapping Mask Learning | 39 |
| Control Color: Multimodal Diffusion-Based Interactive Image Colorization | 39 |
| Focal Modulation for Image Restoration | 39 |
| Relaxed Knowledge Distillation | 39 |
| Beyond Learned Metadata-Based Raw Image Reconstruction | 39 |
| GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates | 38 |
| Understanding Synonymous Referring Expressions via Contrastive Features | 38 |
| Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 38 |
| A Generalized Contour Vibration Model for Building Extraction | 38 |
| Modeling Scattering Effect for Under-Display Camera Image Restoration | 38 |
| Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 38 |
| Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 37 |
| Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 37 |
| EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 37 |
| Improving Domain Adaptation Through Class Aware Frequency Transformation | 37 |
| Correction: BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | 36 |
| Paragraph-to-Image Generation with Information-Enriched Diffusion Model | 36 |
| Image Matting and 3D Reconstruction in One Loop | 35 |
| Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 35 |
| Advances in 3D Neural Stylization: A Survey | 35 |
| Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 35 |
| Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 35 |
| Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era | 35 |
| From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | 35 |
| WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 35 |
| Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation | 34 |
| T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models | 34 |
| Generative Adversarial Network Applications in Industry 4.0: A Review | 34 |
| Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos | 34 |
| I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 33 |
| IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 33 |
| Robust Unpaired Image Dehazing via Density and Depth Decomposition | 33 |
| Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 32 |
| Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 32 |
| A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 32 |
| Predictive Display for Teleoperation Based on Vector Fields Using Lidar-Camera Fusion | 31 |
| PartCom: Part Composition Learning for 3D Open-Set Recognition | 31 |
| A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 31 |
| Investigating Self-Supervised Methods for Label-Efficient Learning | 31 |
| Beyond Image Prior: Embedding Noise Prior into Latent Space of Conditional Denoising Transformer | 31 |
| Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 31 |
| Shuffled Linear Regression with Outliers in Both Covariates and Responses | 30 |
| Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 30 |
| InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 30 |
| Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 30 |
| Structured Binary Neural Networks for Image Recognition | 30 |
| SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 30 |
| An Optimal Transport View of Class-Imbalanced Visual Recognition | 29 |
| Uncertainty-Aware and Decoupled Distillation for Semantic Segmentation | 29 |
| WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 29 |
| Blur Invariants for Image Recognition | 29 |
| HACG: Leveraging Hierarchical Alignment and Caption Generation for Text-Video Retrieval | 29 |
| Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 29 |
| High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | 29 |
| Point-In-Context: Understanding Point Cloud via In-Context Learning | 28 |
| TokenPacker: Efficient Visual Projector for Multimodal LLM | 28 |
| Object-Scene-Camera Decomposition and Recomposition for Data Efficient Monocular 3D Object Detection | 28 |
| Source-Free Domain Adaptation via Target Prediction Distribution Searching | 28 |
| WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift | 28 |
| Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 27 |
| Active Perception for Visual-Language Navigation | 27 |
| A Region-Based Randers Geodesic Approach for Image Segmentation | 27 |
| CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 27 |
| LEO: Generative Latent Image Animator for Human Video Synthesis | 27 |
| Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 27 |
| Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 26 |
| SMPL-IKS: A Mixed Analytical-Neural Inverse Kinematics Solver for 3D Human Mesh Recovery | 26 |
| Subspace Training Mitigates Gradient Noise Vulnerability | 26 |
| Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 26 |
| Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 26 |
| Evidence Conflict Sampling for Open-set Active Learning | 26 |
| Anti-Bandit for Neural Architecture Search | 26 |
| Guest Editorial: Special Issue on Visual Datasets | 26 |
| Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 26 |
| Out-of-Distribution Detection with Virtual Outlier Smoothing | 25 |
| A Novel Dataset and Lightweight Distillation Baseline for Highlight Transparent Object Detection | 25 |
| Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 25 |
| An Interactive Conversational 3D Virtual Human | 25 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 25 |
| Supervised Neural Style Transfer as an Augmentation Technique for Facial Landmark Detection | 25 |
| Uniformity Preserving Transfer for Visual Prompt Tuning under Long-tailed Distribution | 25 |
| Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 24 |
| Reconstructing a Sphere and the Camera Focal Length from a Single View by Fitting Planes | 24 |
| Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 24 |
| Editor’s Note: Special Issue on ACCV 2024 | 24 |
| AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 24 |
| Knowledge Distillation Meets Open-Set Semi-supervised Learning | 23 |
| Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 23 |
| Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 23 |
| Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 23 |
| CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 23 |
| Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 23 |
| LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 23 |
| Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 23 |
| On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 23 |
| Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 23 |
| Correction: Variational Rectification Inference for Learning with Noisy Labels | 22 |
| Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 22 |
| Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 22 |
| General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 22 |
| GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | 22 |
| A Deeper Analysis of Volumetric Relightable Faces | 22 |
| Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 22 |
| SLNMapping: Super Lightweight Neural Mapping in Large-Scale Scenes | 21 |
| Structure-from-motion in micro-image domain for uncalibrated plenoptic 2.0 cameras | 21 |
| Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 21 |
| UMCL: Unimodal-generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection | 21 |
| FourierMIL: Fourier Filtering-based Multiple Instance Learning for Whole Slide Image Analysis | 21 |
| Defending Against Adversarial Examples Via Modeling Adversarial Noise | 21 |
| IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 21 |
| Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 21 |
| Single-View View Synthesis with Self-rectified Pseudo-Stereo | 21 |
| AnyPattern: Towards In-context Image Copy Detection | 21 |
| LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 21 |
| Dynamic MAsk-Pruning Strategy for Source-Free Model Intellectual Property Protection | 21 |
| Leveraging Blur Information for Plenoptic Camera Calibration | 21 |
| DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 21 |
| Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 21 |
| Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review | 20 |
| A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 20 |
| Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 20 |
| Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 20 |
| Generalized Relative Pose and Scale from Affine Correspondences | 20 |
| HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts | 20 |
| Unifying Viewgraph Sparsification and Disambiguation of Repeated Structures in Structure-from-Motion | 19 |
| ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 19 |
| Sentimental Visual Captioning using Multimodal Transformer | 19 |
| Thread Counting in Plain Weave for Old Paintings Using Regression Deep Learning Models | 19 |
| GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation | 19 |
| BayesAdapter: Enhanced Uncertainty Estimation in CLIP Few-Shot Adaptation | 19 |
| CamoVid60K: A Large-Scale Video Dataset for Moving Camouflaged Animals Understanding | 19 |
| Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 19 |
| Segment Anything in 3D with Radiance Fields | 19 |
| DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 19 |
| Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 19 |
| Image-Based Virtual Try-On: A Survey | 19 |
| Deep Learning-Based Point Cloud Registration: A Comprehensive Survey and Taxonomy | 19 |
| Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 19 |
| Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy | 19 |
| Evidential Robust Feature Learning for Generalized Few-Shot Segmentation | 19 |
| Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 18 |
| Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 18 |
| Incremental Model Enhancement via Memory-based Contrastive Learning | 18 |
| A Survey of Multimodal Hallucination Evaluation and Detection | 18 |
| Task Bias in Contrastive Vision-Language Models | 18 |
| Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 18 |
| Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports | 18 |
| RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 18 |
| Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 18 |
| Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization | 18 |
| GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting | 17 |
| CompViT: Real-Time Compressed Video Action Recognition with Asymmetric Transformer Networks | 17 |
| Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 17 |
| Visual Object Tracking in First Person Vision | 17 |
| A Generative Victim Model for Segmentation | 17 |
| Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | 17 |
| Neural Discrimination-Prompted Transformers for Efficient UHD Image Restoration and Enhancement | 17 |
| Vision-Language Efficient Tuning for Mitigating Catastrophic Forgetting in Multi-Modal Learning | 17 |
| NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 17 |
| Towards Scene-Aware Video-to-Spatial Audio Generation | 17 |
| Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 17 |
| Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 17 |