Computer Vision and Image Understanding

Papers
(The median citation count of Computer Vision and Image Understanding is 2. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
Deep 3D human pose estimation: A review | 178
Skeleton-based action recognition via spatial and temporal transformer networks | 165
Deep learning for deepfakes creation and detection: A survey | 161
Pros and cons of GAN evaluation measures: New developments | 150
A review of 3D human pose estimation algorithms for markerless motion capture | 102
Fake face detection via adaptive manipulation traces extraction network | 91
A comprehensive review of past and present image inpainting methods | 78
TCLR: Temporal contrastive learning for video representation | 74
CUFD: An encoder–decoder network for visible and infrared image fusion based on common and unique feature decomposition | 66
High-level prior-based loss functions for medical image segmentation: A survey | 53
Single-image deblurring with neural networks: A comparative survey | 52
Knowledge distillation for incremental learning in semantic segmentation | 51
Nighttime image dehazing based on Retinex and dark channel prior using Taylor series expansion | 43
Visual object tracking: A survey | 42
Multi-focus image fusion approach based on CNP systems in NSCT domain | 42
A survey on bias in visual datasets | 42
Human action recognition in drone videos using a few aerial training examples | 38
SSMTL++: Revisiting self-supervised multi-task learning for video anomaly detection | 38
MFMAM: Image inpainting via multi-scale feature module with attention module | 37
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection | 36
Detection of Face Recognition Adversarial Attacks | 35
Learning deep edge prior for image denoising | 34
Curriculum self-paced learning for cross-domain object detection | 33
The synergy of double attention: Combine sentence-level and word-level attention for image captioning | 30
Uncertainty-aware consistency regularization for cross-domain semantic segmentation | 28
ICycleGAN: Single image dehazing based on iterative dehazing model and CycleGAN | 28
Video Deblurring via Spatiotemporal Pyramid Network and Adversarial Gradient Prior | 27
Predicting the future from first person (egocentric) vision: A survey | 27
MTRNet++: One-stage mask-based scene text eraser | 25
Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | 24
Detail preserving image denoising with patch-based structure similarity via sparse representation and SVD | 24
Deep structural information fusion for 3D object detection on LiDAR–camera system | 24
Ghost Removal via Channel Attention in Exposure Fusion | 23
Enhanced discriminative graph convolutional network with adaptive temporal modelling for skeleton-based action recognition | 22
Cross-modal distillation for RGB-depth person re-identification | 21
Multi-scale attention network for image inpainting | 20
Adaptive CNN filter pruning using global importance metric | 20
Person re-identification with part prediction alignment | 20
Real-time and accurate object detection in compressed video by long short-term feature aggregation | 20
Animal pose estimation: A closer look at the state-of-the-art, existing gaps and opportunities | 20
Fully convolutional online tracking | 19
Efficient dual attention SlowFast networks for video action recognition | 19
Automatic detection and localization of thighbone fractures in X-ray based on improved deep learning method | 19
Multimodal attention networks for low-level vision-and-language navigation | 18
SID: Incremental learning for anchor-free object detection via Selective and Inter-related Distillation | 18
Sejong face database: A multi-modal disguise face database | 18
A survey on RGB-D datasets | 18
Attentive deep network for blind motion deblurring on dynamic scenes | 17
MC-Calib: A generic and robust calibration toolbox for multi-camera systems | 17
Casting a BAIT for offline and online source-free domain adaptation | 17
Pruning CNN filters via quantifying the importance of deep visual representations | 17
Evaluate and improve the quality of neural style transfer | 17
A data augmentation framework by mining structured features for fake face image detection | 16
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data | 16
Video action detection by learning graph-based spatio-temporal interactions | 16
Task dependent deep LDA pruning of neural networks | 16
Deep learning-based single image face depth data enhancement | 15
AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction | 15
Robust real-world point cloud registration by inlier detection | 15
Encoder and decoder network with ResNet-50 and global average feature pooling for local change detection | 15
Periocular biometrics and its relevance to partially masked faces: A survey | 15
Embedding group and obstacle information in LSTM networks for human trajectory prediction in crowded scenes | 14
Investigating the significance of adversarial attacks and their relation to interpretability for radar-based human activity recognition systems | 14
Context understanding in computer vision: A survey | 14
Unifying frame rate and temporal dilations for improved remote pulse detection | 14
Spatial location constraint prototype loss for open set recognition | 14
Few-shot action recognition with implicit temporal alignment and pair similarity optimization | 14
Light-weight shadow detection via GCN-based annotation strategy and knowledge distillation | 14
Multi-human Fall Detection and Localization in Videos | 14
BasicTAD: An astounding RGB-Only baseline for temporal action detection | 14
Snow Mask Guided Adaptive Residual Network for Image Snow Removal | 14
Comprehensive comparative evaluation of background subtraction algorithms in open sea environments | 13
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 13
A novel shape matching descriptor for real-time static hand gesture recognition | 13
Frame-level refinement networks for skeleton-based gait recognition | 13
Facial landmarks localization using cascaded neural networks | 12
Multi-modal semantic image segmentation | 12
Lightweight adaptive weighted network for single image super-resolution | 12
MTCD: Cataract detection via near infrared eye images | 11
Physics-based shading reconstruction for intrinsic image decomposition | 11
A multi-view-CNN framework for deep representation learning in image classification | 11
Attention-induced semantic and boundary interaction network for camouflaged object detection | 11
Single image rain removal via multi-module deep grid network | 11
Multiple instance learning on deep features for weakly supervised object detection with extreme domain shifts | 10
Image retrieval with mixed initiative and multimodal feedback | 10
Unsupervised sound localization via iterative contrastive learning | 10
BacklitNet: A dataset and network for backlit image enhancement | 10
Facial landmark points detection using knowledge distillation-based neural networks | 10
Accurate MR image super-resolution via lightweight lateral inhibition network | 10
Target-aware and spatial-spectral discriminant feature joint correlation filters for hyperspectral video object tracking | 10
α-EGAN: | 10
Detecting abnormality with separated foreground and background: Mutual Generative Adversarial Networks for video abnormal event detection | 10
Learning to locate for fine-grained image recognition | 10
LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR | 10
HSGAN: Reducing mode collapse in GANs by the latent code distance of homogeneous samples | 10
MAEDAY: MAE for few- and zero-shot AnomalY-Detection | 9
An efficient framework for few-shot skeleton-based temporal action segmentation | 9
Rolling-Shutter-stereo-aware motion estimation and image correction | 9
Monocular 3D multi-person pose estimation via predicting factorized correction factors | 9
Human skeletons and change detection for efficient violence detection in surveillance videos | 9
Self-knowledge distillation via dropout | 9
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation | 9
A semantically driven self-supervised algorithm for detecting anomalies in image sets | 9
Learning representational invariances for data-efficient action recognition | 8
Zero-shot sketch-based image retrieval with structure-aware asymmetric disentanglement | 8
Adversarial feature distribution alignment for semi-supervised learning | 8
Video frame interpolation via down–up scale generative adversarial networks | 8
Multi-perspective cross-class domain adaptation for open logo detection | 8
Model-image registration of a building’s facade based on dense semantic segmentation | 8
Weakly supervised fine-grained image classification via two-level attention activation model | 8
Learning transformer-based attention region with multiple scales for occluded person re-identification | 8
When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition | 8
Robust detection of dehazed images via dual-stream CNNs with adaptive feature fusion | 8
Anti-jamming heart rate estimation using a spatial–temporal fusion network | 8
Self-attentive 3D human pose and shape estimation from videos | 8
MECCANO: A multimodal egocentric dataset for humans behavior understanding in the industrial-like domain | 8
Diff attention: A novel attention scheme for person re-identification | 8
Weakly supervised instance segmentation using multi-prior fusion | 8
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 8
Low-light image enhancement by deep learning network for improved illumination map | 8
Video scene parsing: An overview of deep learning methods and datasets | 8
Balanced softmax cross-entropy for incremental learning with and without memory | 7
TMF: Temporal Motion and Fusion for action recognition | 7
Graph Convolutional Networks based on manifold learning for semi-supervised image classification | 7
Open cross-domain visual search | 7
Infrared and visible image fusion via mutual information maximization | 7
Anchor pruning for object detection | 7
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 7
A comparison of methods for 3D scene shape retrieval | 7
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking | 7
Infrared and visible image fusion using a guiding network to leverage perceptual similarity | 7
FIFNET: A convolutional neural network for motion-based multiframe super-resolution using fusion of interpolated frames | 7
Adaptive Capsule Network | 7
DSDNet: Toward single image deraining with self-paced curricular dual stimulations | 7
Deducing health cues from biometric data | 7
Video captioning: A comparative review of where we are and which could be the route | 7
Action Capsules: Human skeleton action recognition | 7
Reliable shot identification for complex event detection via visual-semantic embedding | 7
E-ProSRNet: An enhanced progressive single image super-resolution approach | 7
Dissected 3D CNNs: Temporal skip connections for efficient online video processing | 7
SCA-Net: Spatial and channel attention-based network for 3D point clouds | 7
Learning to teach and learn for semi-supervised few-shot image classification | 7
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network | 6
Single image super-resolution via hybrid resolution NSST prediction | 6
MetaVD: A Meta Video Dataset for enhancing human action recognition datasets | 6
Unsupervised video anomaly detection based on multi-timescale trajectory prediction | 6
Camouflaged object detection via Neighbor Connection and Hierarchical Information Transfer | 6
On the exact recovery conditions of 3D human motion from 2D landmark motion with sparse articulated motion | 6
Are 3D convolutional networks inherently biased towards appearance? | 6
An asymmetrical-structure auto-encoder for unsupervised representation learning of skeleton sequences | 6
Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos | 6
M2FINet: Modality-specific and Modality-shared Features Interaction Network for RGB-IR Person Re-Identification | 6
Stacked Capsule Graph Autoencoders for geometry-aware 3D head pose estimation | 6
Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks | 6
DenseNet-CTC: An end-to-end RNN-free architecture for context-free string recognition | 6
SIFNet: Free-form image inpainting using color split-inpaint-fuse approach | 6
Unsupervised face frontalization using disentangled representation-learning CycleGAN | 6
One-class anomaly detection via novelty normalization | 6
SnapshotNet: Self-supervised feature learning for point cloud data segmentation using minimal labeled data | 6
AWDMC-Net: Classification of Adversarial Weather Degraded Multiclass scenes using a Convolution Neural Network | 6
Adaptive feature denoising based deep convolutional network for single image super-resolution | 6
The MSR-Video to Text dataset with clean annotations | 6
Prediction and Description of Near-Future Activities in Video | 6
FRIDA — Generative feature replay for incremental domain adaptation | 6
Pointly-supervised scene parsing with uncertainty mixture | 6
Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario | 6
Pick-Object-Attack: Type-specific adversarial attack for object detection | 6
SAPS: Self-Attentive Pathway Search for weakly-supervised action localization with background-action augmentation | 6
Plug-and-Play video super-resolution using edge-preserving filtering | 6
Fine-grained facial landmark detection exploiting intermediate feature representations | 6
Effective crowd counting using multi-resolution context and image quality assessment-guided training | 5
Siamese self-supervised learning for fine-grained visual classification | 5
Spectrum-irrelevant fine-grained representation for visible–infrared person re-identification | 5
Feature reconstruction and metric based network for few-shot object detection | 5
An anchor-free object detector based on soften optimized bi-directional FPN | 5
Learning the Compositional Domains for Generalized Zero-shot Learning | 5
Unsupervised real image super-resolution via knowledge distillation network | 5
TransRPN: Towards the Transferable Adversarial Perturbations using Region Proposal Networks and Beyond | 5
Co-segmentation inspired attention module for video-based computer vision tasks | 5
Image amodal completion: A survey | 5
Interactive image segmentation based on the appearance model and orientation energy | 5
RFCNet: Enhancing urban segmentation using regularization, fusion, and completion | 5
Learning to combine the modalities of language and video for temporal moment localization | 5
Progressive multi-scale fusion network for RGB-D salient object detection | 5
Diversified text-to-image generation via deep mutual information estimation | 5
NCMS: Towards accurate anchor free object detection through | 5
Deep-STaR: Classification of image time series based on spatio-temporal representations | 5
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection | 5
Semantically accurate super-resolution Generative Adversarial Networks | 5
WMCP-EM: An integrated dehazing framework for visibility restoration in single image | 5
Multi-granularity Pseudo-label Collaboration for unsupervised person re-identification | 5
Teacher or supervisor? Effective online knowledge distillation via guided collaborative learning | 5
Class knowledge overlay to visual feature learning for zero-shot image classification | 5
Weakly supervised action segmentation with effective use of attention and self-attention | 4
A comprehensive survey of procedural video datasets | 4
Meta conditional variational auto-encoder for domain generalization | 4
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild | 4
Image editing with varying intensities of processing | 4
Unpaired sonar image denoising with simultaneous contrastive learning | 4
Tensor based completion meets adversarial learning: A win–win solution for change detection on unseen videos | 4
Subspace reconstruction based correlation filter for object tracking | 4
Indoor Synthetic Data Generation: A Systematic Review | 4
SimpleCut: A simple and strong 2D model for multi-person pose estimation | 4
Superclass-aware network for few-shot learning | 4
Pose invariant age estimation of face images in the wild | 4
GradPaint: Gradient-guided inpainting with diffusion models | 4
Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization | 4
Cross-domain few-shot action recognition with unlabeled videos | 4
Long term spatio-temporal modeling for action detection | 4
Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques | 4
Simultaneous multi-person tracking and activity recognition based on cohesive cluster search | 4
Model-based inexact graph matching on top of DNNs for semantic scene understanding | 4
AFA-Net: Adaptive Feature Attention Network in image deblurring and super-resolution for improving license plate recognition | 4
Instance-level salient object segmentation | 4
Handling new target classes in semantic segmentation with domain adaptation | 4
A novel fast combine-and-conquer object detector based on only one-level feature map | 4
Siamese Graph Attention Networks for robust visual object tracking | 4
Local to non-local: Multi-scale progressive attention network for image restoration | 4
Robust kernel-based feature representation for 3D point cloud analysis via circular convolutional network | 4
Digital image defogging using joint Retinex theory and independent component analysis | 4
Transformer-based image generation from scene graphs | 4
Full-parameter adaptive fuzzy clustering for noise image segmentation based on non-local and local spatial information | 4
Learning geodesic-aware local features from RGB-D images | 4
Grow-push-prune: Aligning deep discriminants for effective structural network compression | 4
SdcNet for object recognition | 3
Fréchet AutoEncoder Distance: A new approach for evaluation of Generative Adversarial Networks | 3
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 3
Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection | 3
MeT: A graph transformer for semantic segmentation of 3D meshes | 3
PGF-BIQA: Blind image quality assessment via probability multi-grained cascade forest | 3
GAFL: Global adaptive filtering layer for computer vision | 3
Controlling biases and diversity in diverse image-to-image translation | 3
Incorporating structural prior for depth regularization in shape from focus | 3
Trimap-guided feature mining and fusion network for natural image matting | 3
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 3
Multi-layered self-attention mechanism for weakly supervised semantic segmentation | 3
Video captioning using Semantically Contextual Generative Adversarial Network | 3
Hallucinating uncertain motion and future for static image action recognition | 3
Memory-efficient multi-scale residual dense network for single image rain removal | 3
A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification | 3
Cross-domain fashion cloth retrieval via novel attention-guided cascade neural network and clothing parsing | 3
Efficient cross-information fusion decoder for semantic segmentation | 3
Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation | 3
LocoGAN — Locally convolutional GAN | 3
Transformer with large convolution kernel decoder network for salient object detection in optical remote sensing images | 3
A global generalized maximum coverage-based solution to the non-model-based view planning problem for object reconstruction | 3
Adaptive semantic transfer network for unsupervised 2D image-based 3D model retrieval | 3
Recurrent context-aware multi-stage network for single image deraining | 3
Cutout with patch-loss augmentation for improving generative adversarial networks against instability | 3
End-to-end weakly-supervised single-stage multiple 3D hand mesh reconstruction from a single RGB image | 3