Computer Vision and Image Understanding

Papers
(The TQCC of Computer Vision and Image Understanding is 5. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Editorial Board | 253
Efficient cross-information fusion decoder for semantic segmentation | 227
Editorial Board | 210
Editorial Board | 200
Editorial Board | 131
MATTE: Multi-task multi-scale attention | 96
Modality adaptation via feature difference learning for depth human parsing | 88
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation | 74
Editorial Board | 70
Emerging image generation with flexible control of perceived difficulty | 69
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model | 60
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation | 59
Exploring using jigsaw puzzles for out-of-distribution detection | 57
Feature reconstruction and metric based network for few-shot object detection | 47
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving | 44
Siamese self-supervised learning for fine-grained visual classification | 41
Luminance prior guided Low-Light 4C catenary image enhancement | 34
Extending function mixture network for improved spectral super-resolution | 34
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications | 31
Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | 30
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection | 29
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics | 29
Lightweight feature point detection network with channel enhancement | 28
Deducing health cues from biometric data | 28
Editorial Board | 26
Improved Short-term Dense Bottleneck network for efficient scene analysis | 26
Editorial Board | 26
Sejong face database: A multi-modal disguise face database | 25
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network | 25
View-aligned pixel-level feature aggregation for 3D shape classification | 24
Deep-STaR: Classification of image time series based on spatio-temporal representations | 24
Adaptive CNN filter pruning using global importance metric | 23
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding | 22
Implicit and explicit commonsense for multi-sentence video captioning | 22
Robust detection of dehazed images via dual-stream CNNs with adaptive feature fusion | 22
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt | 22
SIERRA: A robust bilateral feature upsampler for dense prediction | 22
RelFormer: Advancing contextual relations for transformer-based dense captioning | 22
Hi-ROS: Open-source multi-camera sensor fusion for real-time people tracking | 22
Reverse Stable Diffusion: What prompt was used to generate this image? | 21
3D object feature extraction and classification using 3D MF-DFA | 21
Editorial Board | 21
Towards efficient image and video style transfer via distillation and learnable feature transformation | 21
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis | 20
Editorial Board | 20
Online real-time pedestrian tracking from medium altitude aerial footage with camera motion cancellation | 20
Hallucinating uncertain motion and future for static image action recognition | 20
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification | 19
Enhanced discriminative graph convolutional network with adaptive temporal modelling for skeleton-based action recognition | 19
Self-supervised network for low-light traffic image enhancement based on deep noise and artifacts removal | 18
M3 | 18
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction | 18
Continuous fake media detection: Adapting deepfake detectors to new generative techniques | 18
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors | 17
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection | 17
Pseudo initialization based Few-Shot Class Incremental Learning | 17
Unsupervised real image super-resolution via knowledge distillation network | 17
Uncertainty estimation using boundary prediction for medical image super-resolution | 17
Lightning fast video anomaly detection via multi-scale adversarial distillation | 16
When super-resolution meets camouflaged object detection: A comparison study | 16
Subspace reconstruction based correlation filter for object tracking | 15
Dissected 3D CNNs: Temporal skip connections for efficient online video processing | 15
Learning spectral transform for 3D human motion prediction | 15
A multi-view-CNN framework for deep representation learning in image classification | 15
CTM: Cross-time temporal module for fine-grained action recognition | 15
Multi-view cognition with path search for one-shot part labeling | 14
Deep parametric Retinex decomposition model for low-light image enhancement | 14
Scribble-based complementary graph reasoning network for weakly supervised salient object detection | 14
Global key knowledge distillation framework | 14
BasicTAD: An astounding RGB-Only baseline for temporal action detection | 14
Transformed ROIs for capturing visual transformations in videos | 13
Sketch-based 3D shape retrieval via teacher–student learning | 13
Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment | 13
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video | 13
Casting a BAIT for offline and online source-free domain adaptation | 13
Real-time distributed video analytics for privacy-aware person search | 13
Ensemble learning-based method for maritime background subtraction in open sea environments | 13
Multi-dimensional attention-aided transposed ConvBiLSTM network for hyperspectral image super-resolution | 13
TCLR: Temporal contrastive learning for video representation | 12
A robust kinship verification scheme using face age transformation | 12
Attention-induced semantic and boundary interaction network for camouflaged object detection | 12
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC | 12
Hexagonal mesh-based neural rendering for real-time rendering and fast reconstruction | 12
Editorial Board | 12
Extending class activation mapping using Gaussian receptive field | 11
Image style disentangling for instance-level facial attribute transfer | 11
3D Pose Nowcasting: Forecast the future to improve the present | 11
MASK_LOSS guided non-end-to-end image denoising network based on multi-attention module with bias rectified linear unit and absolute pooling unit | 11
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data | 11
The shading isophotes: Model and methods for Lambertian planes and a point light | 11
α-EGAN: | 11
Combinational sign language recognition | 11
Tensor robust PCA with nonconvex and nonlocal regularization | 10
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives | 10
DM-Align: Leveraging the power of natural language instructions to make changes to images | 10
Learning representational invariances for data-efficient action recognition | 10
Rethink arbitrary style transfer with transformer and contrastive learning | 10
Monocular 3D multi-person pose estimation via predicting factorized correction factors | 10
Deep learning-based estimation of whole-body kinematics from multi-view images | 10
Space–time recurrent memory network | 10
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild | 10
LightSOD: Towards lightweight and efficient network for salient object detection | 10
Survey on fast dense video segmentation techniques | 10
GradPaint: Gradient-guided inpainting with diffusion models | 9
Semantic manipulation through the lens of Geometric Algebra | 9
Robust attention ranking architecture with frequency-domain transform to defend against adversarial samples | 9
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection | 9
Accurate depth image generation via overfit training of point cloud registration using local frame sets | 9
Distributed multi-target tracking and active perception with mobile camera networks | 9
FAM: Improving columnar vision transformer with feature attention mechanism | 9
A multi camera unsupervised domain adaptation pipeline for object detection in cultural sites through adversarial learning and self-training | 9
To make yourself invisible with Adversarial Semantic Contours | 9
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements | 9
Dual cross-enhancement network for highly accurate dichotomous image segmentation | 9
Human action recognition in drone videos using a few aerial training examples | 8
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval | 8
Certifiable algorithms for the two-view planar triangulation problem | 8
Bidirectional brain image translation using transfer learning from generic pre-trained models | 8
Dehazing cost volume for deep multi-view stereo in scattering media with airlight and scattering coefficient estimation | 8
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection | 8
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking | 8
Editorial Board | 8
AWADA: Foreground-focused adversarial learning for cross-domain object detection | 8
Adaptive semantic guidance network for video captioning | 8
Editorial Board | 8
Facial landmark points detection using knowledge distillation-based neural networks | 8
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network | 8
Editorial Board | 8
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA | 8
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective | 8
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation | 7
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain | 7
Editorial Board | 7
A closer look at branch classifiers of multi-exit architectures | 7
A distribution independence based method for 3D face shape decomposition | 7
View consistency aware holistic triangulation for 3D human pose estimation | 7
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification | 7
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow | 7
Semantically accurate super-resolution Generative Adversarial Networks | 7
LocoGAN — Locally convolutional GAN | 7
Adaptive feature denoising based deep convolutional network for single image super-resolution | 7
Light-weight shadow detection via GCN-based annotation strategy and knowledge distillation | 7
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution | 7
MetaVD: A Meta Video Dataset for enhancing human action recognition datasets | 7
Discriminative object tracking by domain contrast | 7
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition | 7
Editorial Board | 7
Editorial Board | 7
Periocular biometrics and its relevance to partially masked faces: A survey | 7
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection | 7
On the coherency of quantitative evaluation of visual explanations | 7
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation | 7
Disentangled generation network for enlarged license plate recognition and a unified dataset | 6
Incorporating degradation estimation in light field spatial super-resolution | 6
CMGNet: Collaborative multi-modal graph network for video captioning | 6
Leaf cultivar identification via prototype-enhanced learning | 6
Conditioning diffusion models via attributes and semantic masks for face generation | 6
Sparse graph matching network for temporal language localization in videos | 6
An image denoising method based on the nonlinear Schrödinger equation and spectral subband decomposition | 6
Evaluate and improve the quality of neural style transfer | 6
High frame rate optical flow estimation from event sensors via intensity estimation | 6
Self-supervised vision transformers for semantic segmentation | 6
Lightweight cross-modal transformer for RGB-D salient object detection | 6
Constituent Attention for Vision Transformers | 6
Human skeletons and change detection for efficient violence detection in surveillance videos | 6
Plug-and-Play video super-resolution using edge-preserving filtering | 6
2.5D visual relationship detection | 6
Opti-CAM: Optimizing saliency maps for interpretability | 6
Dual adversarial model: Exploring low-dimensional space features for point clouds generating and completing | 6
Certifiable planar relative pose estimation with gravity prior | 6
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition | 6
MAL-Net: Multiscale Attention Link Network for accurate eye center detection | 6
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming | 6
Multi-person 3D pose estimation from a single image captured by a fisheye camera | 6
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters | 6
Progressive multi-scale fusion network for RGB-D salient object detection | 6
Editorial Board | 6
Stacked Capsule Graph Autoencoders for geometry-aware 3D head pose estimation | 6
MAIN: Multi-Attention Instance Network for video segmentation | 6
Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks | 6
Diversified text-to-image generation via deep mutual information estimation | 5
Improving rare relation inferring for scene graph generation using bipartite graph network | 5
DFNet-Trans: An end-to-end multibranching network for depth estimation for transparent objects | 5
ParticleAugment: Sampling-based data augmentation | 5
Editorial for CVIU_DL for image restoration | 5
CUFD: An encoder–decoder network for visible and infrared image fusion based on common and unique feature decomposition | 5
Discriminative semantic transitive consistency for cross-modal learning | 5
Local to global purification strategy to realize collaborative camouflaged object detection | 5
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation | 5
AC-VRNN: Attentive Conditional-VRNN for multi-future trajectory prediction | 5
Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos | 5
Modality mixer exploiting complementary information for multi-modal action recognition | 5
Bypass network for semantics driven image paragraph captioning | 5
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes | 5
A linear method for camera pair self-calibration | 5
Addressing multiple salient object detection via dual-space long-range dependencies | 5
Editorial Board | 5
Scene-cGAN: A GAN for underwater restoration and scene depth estimation | 5
Spatial constraint for efficient semi-supervised video object segmentation | 5
Editorial Board | 5
Re-scoring using image-language similarity for few-shot object detection | 5
Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution | 5
Visual object tracking: A survey | 5
Learning key lines for multi-object tracking | 5
Continual learning on 3D point clouds with random compressed rehearsal | 5
Invisible backdoor attack with attention and steganography | 5
SAPS: Self-Attentive Pathway Search for weakly-supervised action localization with background-action augmentation | 5
BacklitNet: A dataset and network for backlit image enhancement | 5
A lightweight convolutional neural network-based feature extractor for visible images | 5
RocNet: Recursive octree network for efficient 3D processing | 5
Weakly supervised fine-grained image classification via two-level attention activation model | 5
RSTC: Residual Swin Transformer Cascade to approximate Taylor expansion for image denoising | 5
Editorial Board | 5
Detecting abnormality with separated foreground and background: Mutual Generative Adversarial Networks for video abnormal event detection | 5
Font transformer for few-shot font generation | 5
Few-shot action recognition with implicit temporal alignment and pair similarity optimization | 5
Blur aware metric depth estimation with multi-focus plenoptic cameras | 5
Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity | 5