Computer Vision and Image Understanding

Papers
(The TQCC of Computer Vision and Image Understanding is 6. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
Deep 3D human pose estimation: A review | 178
Skeleton-based action recognition via spatial and temporal transformer networks | 165
Deep learning for deepfakes creation and detection: A survey | 161
Pros and cons of GAN evaluation measures: New developments | 150
A review of 3D human pose estimation algorithms for markerless motion capture | 102
Fake face detection via adaptive manipulation traces extraction network | 91
A comprehensive review of past and present image inpainting methods | 78
TCLR: Temporal contrastive learning for video representation | 74
CUFD: An encoder–decoder network for visible and infrared image fusion based on common and unique feature decomposition | 66
High-level prior-based loss functions for medical image segmentation: A survey | 53
Single-image deblurring with neural networks: A comparative survey | 52
Knowledge distillation for incremental learning in semantic segmentation | 51
Nighttime image dehazing based on Retinex and dark channel prior using Taylor series expansion | 43
Multi-focus image fusion approach based on CNP systems in NSCT domain | 42
A survey on bias in visual datasets | 42
Visual object tracking: A survey | 42
SSMTL++: Revisiting self-supervised multi-task learning for video anomaly detection | 38
Human action recognition in drone videos using a few aerial training examples | 38
MFMAM: Image inpainting via multi-scale feature module with attention module | 37
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection | 36
Detection of Face Recognition Adversarial Attacks | 35
Learning deep edge prior for image denoising | 34
Curriculum self-paced learning for cross-domain object detection | 33
The synergy of double attention: Combine sentence-level and word-level attention for image captioning | 30
ICycleGAN: Single image dehazing based on iterative dehazing model and CycleGAN | 28
Uncertainty-aware consistency regularization for cross-domain semantic segmentation | 28
Predicting the future from first person (egocentric) vision: A survey | 27
Video Deblurring via Spatiotemporal Pyramid Network and Adversarial Gradient Prior | 27
MTRNet++: One-stage mask-based scene text eraser | 25
Deep structural information fusion for 3D object detection on LiDAR–camera system | 24
Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | 24
Detail preserving image denoising with patch-based structure similarity via sparse representation and SVD | 24
Ghost Removal via Channel Attention in Exposure Fusion | 23
Enhanced discriminative graph convolutional network with adaptive temporal modelling for skeleton-based action recognition | 22
Cross-modal distillation for RGB-depth person re-identification | 21
Real-time and accurate object detection in compressed video by long short-term feature aggregation | 20
Animal pose estimation: A closer look at the state-of-the-art, existing gaps and opportunities | 20
Multi-scale attention network for image inpainting | 20
Adaptive CNN filter pruning using global importance metric | 20
Person re-identification with part prediction alignment | 20
Automatic detection and localization of thighbone fractures in X-ray based on improved deep learning method | 19
Fully convolutional online tracking | 19
Efficient dual attention SlowFast networks for video action recognition | 19
A survey on RGB-D datasets | 18
Multimodal attention networks for low-level vision-and-language navigation | 18
SID: Incremental learning for anchor-free object detection via Selective and Inter-related Distillation | 18
Sejong face database: A multi-modal disguise face database | 18
Casting a BAIT for offline and online source-free domain adaptation | 17
Pruning CNN filters via quantifying the importance of deep visual representations | 17
Evaluate and improve the quality of neural style transfer | 17
Attentive deep network for blind motion deblurring on dynamic scenes | 17
MC-Calib: A generic and robust calibration toolbox for multi-camera systems | 17
Task dependent deep LDA pruning of neural networks | 16
A data augmentation framework by mining structured features for fake face image detection | 16
PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data | 16
Video action detection by learning graph-based spatio-temporal interactions | 16
Encoder and decoder network with ResNet-50 and global average feature pooling for local change detection | 15
Periocular biometrics and its relevance to partially masked faces: A survey | 15
Deep learning-based single image face depth data enhancement | 15
AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction | 15
Robust real-world point cloud registration by inlier detection | 15
Multi-human Fall Detection and Localization in Videos | 14
BasicTAD: An astounding RGB-Only baseline for temporal action detection | 14
Snow Mask Guided Adaptive Residual Network for Image Snow Removal | 14
Embedding group and obstacle information in LSTM networks for human trajectory prediction in crowded scenes | 14
Investigating the significance of adversarial attacks and their relation to interpretability for radar-based human activity recognition systems | 14
Context understanding in computer vision: A survey | 14
Unifying frame rate and temporal dilations for improved remote pulse detection | 14
Spatial location constraint prototype loss for open set recognition | 14
Few-shot action recognition with implicit temporal alignment and pair similarity optimization | 14
Light-weight shadow detection via GCN-based annotation strategy and knowledge distillation | 14
A novel shape matching descriptor for real-time static hand gesture recognition | 13
Frame-level refinement networks for skeleton-based gait recognition | 13
Comprehensive comparative evaluation of background subtraction algorithms in open sea environments | 13
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 13
Lightweight adaptive weighted network for single image super-resolution | 12
Facial landmarks localization using cascaded neural networks | 12
Multi-modal semantic image segmentation | 12
A multi-view-CNN framework for deep representation learning in image classification | 11
Attention-induced semantic and boundary interaction network for camouflaged object detection | 11
Single image rain removal via multi-module deep grid network | 11
MTCD: Cataract detection via near infrared eye images | 11
Physics-based shading reconstruction for intrinsic image decomposition | 11
LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR | 10
HSGAN: Reducing mode collapse in GANs by the latent code distance of homogeneous samples | 10
Learning to locate for fine-grained image recognition | 10
Unsupervised sound localization via iterative contrastive learning | 10
BacklitNet: A dataset and network for backlit image enhancement | 10
Multiple instance learning on deep features for weakly supervised object detection with extreme domain shifts | 10
Image retrieval with mixed initiative and multimodal feedback | 10
Facial landmark points detection using knowledge distillation-based neural networks | 10
α-EGAN: | 10
Detecting abnormality with separated foreground and background: Mutual Generative Adversarial Networks for video abnormal event detection | 10
Accurate MR image super-resolution via lightweight lateral inhibition network | 10
Target-aware and spatial-spectral discriminant feature joint correlation filters for hyperspectral video object tracking | 10
Self-knowledge distillation via dropout | 9
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation | 9
A semantically driven self-supervised algorithm for detecting anomalies in image sets | 9
MAEDAY: MAE for few- and zero-shot AnomalY-Detection | 9
An efficient framework for few-shot skeleton-based temporal action segmentation | 9
Rolling-Shutter-stereo-aware motion estimation and image correction | 9
Monocular 3D multi-person pose estimation via predicting factorized correction factors | 9
Human skeletons and change detection for efficient violence detection in surveillance videos | 9
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 8
Low-light image enhancement by deep learning network for improved illumination map | 8
Video scene parsing: An overview of deep learning methods and datasets | 8
Learning representational invariances for data-efficient action recognition | 8
Zero-shot sketch-based image retrieval with structure-aware asymmetric disentanglement | 8
Adversarial feature distribution alignment for semi-supervised learning | 8
Video frame interpolation via down–up scale generative adversarial networks | 8
Multi-perspective cross-class domain adaptation for open logo detection | 8
Model-image registration of a building’s facade based on dense semantic segmentation | 8
Weakly supervised fine-grained image classification via two-level attention activation model | 8
Learning transformer-based attention region with multiple scales for occluded person re-identification | 8
When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition | 8
Robust detection of dehazed images via dual-stream CNNs with adaptive feature fusion | 8
Anti-jamming heart rate estimation using a spatial–temporal fusion network | 8
Self-attentive 3D human pose and shape estimation from videos | 8
MECCANO: A multimodal egocentric dataset for humans behavior understanding in the industrial-like domain | 8
Diff attention: A novel attention scheme for person re-identification | 8
Weakly supervised instance segmentation using multi-prior fusion | 8
Video captioning: A comparative review of where we are and which could be the route | 7
Action Capsules: Human skeleton action recognition | 7
Reliable shot identification for complex event detection via visual-semantic embedding | 7
E-ProSRNet: An enhanced progressive single image super-resolution approach | 7
Dissected 3D CNNs: Temporal skip connections for efficient online video processing | 7
SCA-Net: Spatial and channel attention-based network for 3D point clouds | 7
Learning to teach and learn for semi-supervised few-shot image classification | 7
Balanced softmax cross-entropy for incremental learning with and without memory | 7
TMF: Temporal Motion and Fusion for action recognition | 7
Graph Convolutional Networks based on manifold learning for semi-supervised image classification | 7
Open cross-domain visual search | 7
Infrared and visible image fusion via mutual information maximization | 7
Anchor pruning for object detection | 7
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 7
A comparison of methods for 3D scene shape retrieval | 7
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking | 7
Infrared and visible image fusion using a guiding network to leverage perceptual similarity | 7
FIFNET: A convolutional neural network for motion-based multiframe super-resolution using fusion of interpolated frames | 7
Adaptive Capsule Network | 7
DSDNet: Toward single image deraining with self-paced curricular dual stimulations | 7
Deducing health cues from biometric data | 7
FRIDA — Generative feature replay for incremental domain adaptation | 6
Pointly-supervised scene parsing with uncertainty mixture | 6
Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario | 6
Pick-Object-Attack: Type-specific adversarial attack for object detection | 6
SAPS: Self-Attentive Pathway Search for weakly-supervised action localization with background-action augmentation | 6
Plug-and-Play video super-resolution using edge-preserving filtering | 6
Fine-grained facial landmark detection exploiting intermediate feature representations | 6
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network | 6
Single image super-resolution via hybrid resolution NSST prediction | 6
MetaVD: A Meta Video Dataset for enhancing human action recognition datasets | 6
Unsupervised video anomaly detection based on multi-timescale trajectory prediction | 6
Camouflaged object detection via Neighbor Connection and Hierarchical Information Transfer | 6
On the exact recovery conditions of 3D human motion from 2D landmark motion with sparse articulated motion | 6
Are 3D convolutional networks inherently biased towards appearance? | 6
An asymmetrical-structure auto-encoder for unsupervised representation learning of skeleton sequences | 6
Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos | 6
M2FINet: Modality-specific and Modality-shared Features Interaction Network for RGB-IR Person Re-Identification | 6
Stacked Capsule Graph Autoencoders for geometry-aware 3D head pose estimation | 6
Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks | 6
DenseNet-CTC: An end-to-end RNN-free architecture for context-free string recognition | 6
SIFNet: Free-form image inpainting using color split-inpaint-fuse approach | 6
Unsupervised face frontalization using disentangled representation-learning CycleGAN | 6
One-class anomaly detection via novelty normalization | 6
SnapshotNet: Self-supervised feature learning for point cloud data segmentation using minimal labeled data | 6
AWDMC-Net: Classification of Adversarial Weather Degraded Multiclass scenes using a Convolution Neural Network | 6
Adaptive feature denoising based deep convolutional network for single image super-resolution | 6
The MSR-Video to Text dataset with clean annotations | 6
Prediction and Description of Near-Future Activities in Video | 6