International Journal of Computer Vision

Papers
(The TQCC of International Journal of Computer Vision is 10. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Conditional Temporal Variational AutoEncoder for Action Video Prediction | 1196
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression | 1046
A Minimal Solution for Image-Based Sphere Estimation | 1027
GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence | 375
Are Vision Transformers Robust to Spurious Correlations? | 333
BioDrone: A Bionic Drone-Based Single Object Tracking Benchmark for Robust Vision | 324
Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks | 229
Guest Editorial: Special Issue on Open-World Visual Recognition | 209
Guest Editorial: Special Issue on Large-Scale Generative Models for Content Creation and Manipulation | 207
Bootstrapping Vision-Language Models for Frequency-Centric Self-Supervised Remote Physiological Measurement | 201
View Birdification in the Crowd: Ground-Plane Localization from Perceived Movements | 182
Instance-Aware Scene Layout Forecasting | 174
Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics | 148
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting | 148
Exploring the Semi-Supervised Video Object Segmentation Problem from a Cyclic Perspective | 134
Image Synthesis Under Limited Data: A Survey and Taxonomy | 124
Learning with Enriched Inductive Biases for Vision-Language Models | 123
OpenMonkeyChallenge: Dataset and Benchmark Challenges for Pose Estimation of Non-human Primates | 122
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | 105
Common Pole–Polar Properties of Central Catadioptric Sphere and Line Images Used for Camera Calibration | 104
Learning Text-to-Video Retrieval from Image Captioning | 102
Correction: Multi-source-free Domain Adaptive Object Detection | 102
MoDA: Modeling Deformable 3D Objects from Casual Videos | 101
Learning Discriminative Features for Visual Tracking via Scenario Decoupling | 101
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention | 100
Instance-dependent Label Distribution Estimation for Learning with Label Noise | 94
EAN: Event Adaptive Network for Enhanced Action Recognition | 93
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 91
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | 88
Delving Deeper into Anti-Aliasing in ConvNets | 87
Deep Image Deblurring: A Survey | 85
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | 83
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models | 81
Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation | 81
Semantic-Based Implicit Feature Transform for Few-Shot Classification | 81
Noise-Resistant Multimodal Transformer for Emotion Recognition | 79
Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization | 75
Correction: SOTVerse: A User-Defined Task Space of Single Object Tracking | 75
Lightweight and Progressively-Scalable Networks for Semantic Segmentation | 74
Project to Adapt: Domain Adaptation for Depth Completion from Noisy and Sparse Sensor Data | 67
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization | 65
ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer | 64
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation | 63
A Realism Metric for Generated LiDAR Point Clouds | 62
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks | 61
SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution | 58
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond | 58
Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking | 58
Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow | 57
Exploiting Inter-Sample Affinity for Knowability-Aware Universal Domain Adaptation | 56
Free-view Face Relighting Using a Hybrid Parametric Neural Model on a SMALL-OLAT Dataset | 55
Learning Cooperative Neural Modules for Stylized Image Captioning | 51
Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose | 50
Guest Editorial: Special Issue on the British Machine Vision Conference 2022 | 49
UniCanvas: Affordance-Aware Unified Real Image Editing via Customized Text-to-Image Generation | 49
UMSCS: A Novel Unpaired Multimodal Image Segmentation Method Via Cross-Modality Generative and Semi-supervised Learning | 49
Bi-calibration Networks for Weakly-Supervised Video Representation Learning | 48
Learning Accurate Low-bit Quantization towards Efficient Computational Imaging | 48
Learning to Generalize Heterogeneous Representation for Cross-Modality Image Synthesis via Multiple Domain Interventions | 47
SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition | 45
H-SegMed: A Hybrid Method for Prostate Segmentation in TRUS Images via Improved Closed Principal Curve and Improved Enhanced Machine Learning | 45
VideoQA in the Era of LLMs: An Empirical Study | 45
Sfnet: Faster and Accurate Semantic Segmentation Via Semantic Flow | 44
Diagram Perception Networks for Textbook Question Answering via Joint Optimization | 40
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation | 40
Learning to Prompt for Vision-Language Models | 40
Correction to: On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited | 40
Deep Maximum a Posterior Estimator for Video Denoising | 39
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces | 38
Globally Correlation-Aware Hard Negative Generation | 36
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion | 36
Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-identification | 36
Advances in 3D Neural Stylization: A Survey | 35
Generative Adversarial Network Applications in Industry 4.0: A Review | 35
Improving Domain Adaptation Through Class Aware Frequency Transformation | 35
Unsupervised Domain Adaptation with Background Shift Mitigating for Person Re-Identification | 35
Image Matting and 3D Reconstruction in One Loop | 35
Towards Fine-Grained Optimal 3D Face Dense Registration: An Iterative Dividing and Diffusing Method | 35
Beyond Learned Metadata-Based Raw Image Reconstruction | 34
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification | 34
Understanding Synonymous Referring Expressions via Contrastive Features | 34
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking | 33
Robust Unpaired Image Dehazing via Density and Depth Decomposition | 33
A Nonlinear, Regularized, and Data-independent Modulation for Continuously Interactive Image Processing Network | 33
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining | 33
Skeletonizing Caenorhabditis elegans Based on U-Net Architectures Trained with a Multi-worm Low-Resolution Synthetic Dataset | 33
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild | 31
RePCD-Net: Feature-Aware Recurrent Point Cloud Denoising Network | 31
Feature Matching via Motion-Consistency Driven Probabilistic Graphical Model | 31
Guest Editorial: Special Issue on Computer Vision from 2D to 3D | 30
A CNN Based Approach for the Point-Light Photometric Stereo Problem | 30
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection | 30
Deep Learning Geometry Compression Artifacts Removal for Video-Based Point Cloud Compression | 30
Structured Binary Neural Networks for Image Recognition | 30
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing | 29
Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing | 29
Shuffled Linear Regression with Outliers in Both Covariates and Responses | 28
Assignment Flow for Order-Constrained OCT Segmentation | 28
PartCom: Part Composition Learning for 3D Open-Set Recognition | 28
InstaBoost++: Visual Coherence Principles for Unified 2D/3D Instance Level Data Augmentation | 27
Blur Invariants for Image Recognition | 27
Distribution-Aware Margin Calibration for Semantic Segmentation in Images | 26
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance | 26
SHARP: Shape-Aware Reconstruction of People in Loose Clothing | 26
An Optimal Transport View of Class-Imbalanced Visual Recognition | 26
Exemplar-Free Lifelong Person Re-identification via Prompt-Guided Adaptive Knowledge Consolidation | 26
Semantic Edge Detection with Diverse Deep Supervision | 25
LEO: Generative Latent Image Animator for Human Video Synthesis | 25
Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection | 25
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring | 25
Source-Free Domain Adaptation via Target Prediction Distribution Searching | 25
Investigating Self-Supervised Methods for Label-Efficient Learning | 25
WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models | 25
CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition | 25
Active Perception for Visual-Language Navigation | 25
Countering Malicious DeepFakes: Survey, Battleground, and Horizon | 24
A Region-Based Randers Geodesic Approach for Image Segmentation | 24
Neural Architecture Search for Dense Prediction Tasks in Computer Vision | 24
Singularity Analysis for the Perspective-Four and Five-Line Problems | 24
Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification | 24
Few-Shot Referring Video Single- and Multi-Object Segmentation Via Cross-Modal Affinity with Instance Sequence Matching | 24
Editor’s Note: Special Issue on Computer Vision Approach for Animal Tracking and Modeling | 24
CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer | 23
Correction: Continual Face Forgery Detection via Historical Distribution Preserving | 23
Knowledge Distillation Meets Open-Set Semi-supervised Learning | 22
Self-Supervised Monocular Depth and Motion Learning in Dynamic Scenes: Semantic Prior to Rescue | 22
Anti-Bandit for Neural Architecture Search | 22
Out-of-Distribution Detection with Virtual Outlier Smoothing | 22
Robust Image Restoration with an Adaptive Huber Function Based Fidelity | 22
Few-Shot Learning with Complex-Valued Neural Networks and Dependable Learning | 21
Polynomial Implicit Neural Framework for Promoting Shape Awareness in Generative Models | 21
Guest Editorial: Special Issue: Computer Vision and Pattern Recognition (DAGM GCPR 2019) | 21
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook | 21
Nonblind Image Deconvolution via Leveraging Model Uncertainty in An Untrained Deep Neural Network | 21
Semantic Bottlenecks: Quantifying and Improving Inspectability of Deep Representations | 21
Hard-Normal Example-Aware Template Mutual Matching for Industrial Anomaly Detection | 21
Relation-Guided Adversarial Learning for Data-Free Knowledge Transfer | 20
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight | 20
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing | 20
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | 20
A Numerical Framework for Elastic Surface Matching, Comparison, and Interpolation | 20
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification | 19
Zero-Shot Learning on 3D Point Cloud Objects and Beyond | 19
Correction: Variational Rectification Inference for Learning with Noisy Labels | 19
LiDAR-guided Geometric Pretraining for Vision-Centric 3D Object Detection | 19
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | 19
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware Contrast | 19
Correction to: AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach | 19
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation | 19
DustNet++: Deep Learning-Based Visual Regression for Dust Density Estimation | 19
General Class-Balanced Multicentric Dynamic Prototype Pseudo-Labeling for Source-Free Domain Adaptation | 19
Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021) | 18
Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning | 18
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection | 18
A Deeper Analysis of Volumetric Relightable Faces | 18
Learning 3D Semantic Scene Graphs with Instance Embeddings | 18
Single-View View Synthesis with Self-rectified Pseudo-Stereo | 18
IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection | 17
Adversarial Learning Domain-Invariant Conditional Features for Robust Face Anti-spoofing | 17
Image-Based Virtual Try-On: A Survey | 17
Segment Anything in 3D with Radiance Fields | 17
Sentimental Visual Captioning using Multimodal Transformer | 17
Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement | 17
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification | 17
Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos | 17
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection | 17
AutoScale: Learning to Scale for Crowd Counting | 17
Task Bias in Contrastive Vision-Language Models | 16
A Comprehensive Study of the Robustness for LiDAR-Based 3D Object Detectors Against Adversarial Attacks | 16
Rethinking Out-of-Distribution Detection From a Human-Centric Perspective | 16
Not All Pixels are Equal: Learning Pixel Hardness for Semantic Segmentation | 16
Leveraging Blur Information for Plenoptic Camera Calibration | 16
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models | 16
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | 16
Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection | 16
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence | 16
RepSNet: A Nucleus Instance Segmentation Model Based on Boundary Regression and Structural Re-Parameterization | 15
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey | 15
NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention | 15
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation | 15
Correction: Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding | 15
Towards Robust Monocular Depth Estimation: A New Baseline and Benchmark | 15
Attribute-Centric Compositional Text-to-Image Generation | 15
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | 15
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | 15
Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors | 15
Incremental Model Enhancement via Memory-based Contrastive Learning | 15
DCP–NAS: Discrepant Child–Parent Neural Architecture Search for 1-bit CNNs | 15
Visual Object Tracking in First Person Vision | 15
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-modal Manipulation | 14
Self-supervised Scalable Deep Compressed Sensing | 14
Correction: Open-Vocabulary Text-Driven Human Image Generation | 14
Diagnosing Human-Object Interaction Detectors | 14
Multi-Constraint Transferable Generative Adversarial Networks for Cross-Modal Brain Image Synthesis | 14
Learning Sequence Representations by Non-local Recurrent Neural Memory | 14
Audio-Visual Segmentation with Semantics | 14
CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks | 13
Learning Enriched Hop-Aware Correlation for Robust 3D Human Pose Estimation | 13
Action2video: Generating Videos of Human 3D Actions | 13
Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering | 13
Deep Learning-Based Image and Video Inpainting: A Survey | 13
Multi-teacher Universal Distillation Based on Information Hiding for Defense Against Facial Manipulation | 13
Position-Guided Point Cloud Panoptic Segmentation Transformer | 13
FusionBooster: A Unified Image Fusion Boosting Paradigm | 13
A General Paradigm with Detail-Preserving Conditional Invertible Network for Image Fusion | 13
Beyond Dents and Scratches: Logical Constraints in Unsupervised Anomaly Detection and Localization | 13
Fast Ultra High-Definition Video Deblurring via Multi-scale Separable Network | 13
Guest Editorial: Special Issue on Biometrics Security and Privacy | 13
Semantic Contrastive Embedding for Generalized Zero-Shot Learning | 13
Transformer for Object Re-identification: A Survey | 13
Deep Memory-Augmented Proximal Unrolling Network for Compressive Sensing | 13
Multi-Text Guidance Is Important: Multi-Modality Image Fusion via Large Generative Vision-Language Model | 13
Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization | 12
Towards Frame Rate Agnostic Multi-object Tracking | 12
Just Recognizable Distortion for Machine Vision Oriented Image and Video Coding | 12
Warping the Residuals for Image Editing with StyleGAN | 12
A Survey on Long-Tailed Visual Recognition | 12
DLOW: Domain Flow and Applications | 12
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | 12
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training | 12
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes | 12
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors | 12
Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation | 12
Open-Vocabulary Text-Driven Human Image Generation | 12
Predicting Visual Political Bias Using Webly Supervised Data and an Auxiliary Task | 12
Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks | 12
SoftPool++: An Encoder–Decoder Network for Point Cloud Completion | 12
Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups Using a Single Model Across Cages | 12
Compositional Prompting for Anti-Forgetting in Domain Incremental Learning | 12
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition | 12
Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing | 12
Universal Representations: A Unified Look at Multiple Task and Domain Learning | 12
Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging | 12
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints | 12
Dual Graph Networks for Pose Estimation in Crowded Scenes | 11
Learning Regression and Verification Networks for Robust Long-term Tracking | 11
Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search | 11
Unknown Support Prototype Set for Open Set Recognition | 11
Learning Portrait Drawing with Unsupervised Parts | 11
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement | 11
Deep Hierarchical Learning for 3D Semantic Segmentation | 11
Instance-Level Moving Object Segmentation from a Single Image with Events | 11
How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models? | 10
Semi-Supervised Domain Generalization with Stochastic StyleMatch | 10
Multi-frame Motion Segmentation by Combining Two-Frame Results | 10
Deep Unpaired Blind Image Super-Resolution Using Self-supervised Learning and Exemplar Distillation | 10
Data-Driven Restoration of Digital Archaeological Pottery with Point Cloud Analysis | 10
Correction: Towards Automated Ethogramming: Cognitively-Inspired Event Segmentation for Streaming Wildlife Video Monitoring | 10
Instance Segmentation in the Dark | 10
Generalized Out-of-Distribution Detection: A Survey | 10
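
The selection rule stated in the note above the table (a citation threshold of 10, a publication window from 2021-05-01 to 2025-05-01, and a cap of 250 papers) can be illustrated with a minimal Python sketch. This is not the site's actual implementation. The `Paper` record, the `select_papers` function, the inclusive handling of the window boundaries, and the sample records are all assumptions made for illustration; the `>=` comparison is chosen because the table includes entries with exactly 10 citations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    # Hypothetical record type; field names are assumptions.
    title: str
    citations: int
    published: date


def select_papers(papers, threshold=10,
                  start=date(2021, 5, 1), end=date(2025, 5, 1),
                  limit=250):
    """Keep papers inside the window with at least `threshold` citations,
    sorted by citation count (highest first), capped at `limit` entries."""
    eligible = [
        p for p in papers
        # Inclusive window bounds are an assumption.
        if p.citations >= threshold and start <= p.published <= end
    ]
    eligible.sort(key=lambda p: p.citations, reverse=True)
    return eligible[:limit]


# Placeholder records, not taken from the table.
sample = [
    Paper("Paper A", 12, date(2022, 3, 1)),
    Paper("Paper B", 9, date(2023, 6, 15)),    # below the threshold
    Paper("Paper C", 40, date(2020, 1, 10)),   # outside the window
    Paper("Paper D", 10, date(2024, 11, 2)),
]

for paper in select_papers(sample):
    print(f"{paper.title} | {paper.citations}")
```

With the placeholder records, only "Paper A" and "Paper D" pass both filters, in that order.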