OOIR: Observatory of International Research

Papers

(The TQCC of Computer Vision and Image Understanding is 7. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)

Article	Citations
Luminance prior guided Low-Light 4C catenary image enhancement	381
Editorial Board	126
Efficient cross-information fusion decoder for semantic segmentation	115
CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation	114
Deducing health cues from biometric data	111
Editorial Board	94
Improving the planarity and sharpness of monocularly estimated depth images using the Phong reflection model	88
Editorial Board	62
Exploring using jigsaw puzzles for out-of-distribution detection	54
Extending function mixture network for improved spectral super-resolution	52
Editorial Board	50
Editorial Board	50
MATTE: Multi-task multi-scale attention	50
Feature reconstruction and metric based network for few-shot object detection	48
Convolutional neural network framework for deepfake detection: A diffusion-based approach	46
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation	44
Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics	42
RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving	41
SNRD-Net: SNR-aware dual enhancement network for low-light images	40
Spatial Sensitive Grad-CAM++: Towards High-Quality Visual Explanations for Object Detectors via Weighted Combination of Gradient Maps	39
Lightweight feature point detection network with channel enhancement	38
Emerging image generation with flexible control of perceived difficulty	38
3D semantic segmentation based on spatial-aware convolution and shape completion for augmented reality applications	37
Modality adaptation via feature difference learning for depth human parsing	36
QB-MOTR: A simple query bootstrapping end-to-end multi-object tracking method with transformer	36
Siamese self-supervised learning for fine-grained visual classification	35
Robust Teacher: Self-correcting pseudo-label-guided semi-supervised learning for object detection	35
REST: A resolution preserving network for photorealistic style transfer via semantic distillation	35
Adaptive CNN filter pruning using global importance metric	34
RelFormer: Advancing contextual relations for transformer-based dense captioning	34
PConvSRGAN: Real-world super-resolution reconstruction with pure convolutional networks	33
Embedding AI ethics into the design and use of computer vision technology for consumer’s behaviour understanding	32
3D object feature extraction and classification using 3D MF-DFA	30
Editorial Board	29
Editorial Board	28
SIERRA: A robust bilateral feature upsampler for dense prediction	27
CCNeXt: An effective self-supervised stereo depth estimation approach	26
View-aligned pixel-level feature aggregation for 3D shape classification	26
Syntactically and semantically enhanced captioning network via hybrid attention and POS tagging prompt	26
A lightweight and robust framework for small object detection in UAV imagery	25
Implicit and explicit commonsense for multi-sentence video captioning	25
Hierarchical contrastive distillation: Bridging multi-level semantics for enhanced knowledge transfer	25
Hi-ROS: Open-source multi-camera sensor fusion for real-time people tracking	25
Feature preserving 3D mesh denoising with a Dense Local Graph Neural Network	24
SDC-Net: A novel selective dilated convolution network for medical images segmentation	23
Towards efficient image and video style transfer via distillation and learnable feature transformation	23
Reverse Stable Diffusion: What prompt was used to generate this image?	23
Attribute-guided Relevance Propagation for interpreting image classifier based on Deep Neural Networks	22
Improved Short-term Dense Bottleneck network for efficient scene analysis	22
GaitBranch: A multi-branch refinement model combined with frame-channel attention mechanism for gait recognition	22
Iterative Caption Generation with Heuristic Guidance for enhancing knowledge-based visual question answering	22
Pseudo initialization based Few-Shot Class Incremental Learning	21
UniMultNet: Action recognition method based on multi-scale feature fusion and video-text constraint guidance	21
Other tokens matter: Exploring global and local features of Vision Transformers for Object Re-Identification	21
When super-resolution meets camouflaged object detection: A comparison study	21
An efficient direct solution of the perspective-three-point problem	20
Editorial Board	20
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors	20
Editorial Board	20
Unsupervised real image super-resolution via knowledge distillation network	20
Learning spectral transform for 3D human motion prediction	20
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction	19
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis	19
Dynamic deep multi-label image data augmentation based on self-paced learning	19
Uncertainty estimation using boundary prediction for medical image super-resolution	19
Self-supervised network for low-light traffic image enhancement based on deep noise and artifacts removal	19
Lightning fast video anomaly detection via multi-scale adversarial distillation	18
BARD: A Basketball Action Recognition Dataset for multi-label classification	18
LARKED: A lightweight and reliable keypoint detection method for feature matching	18
Enhancing feature representation in siamese networks for object tracking with ranking-based loss	18
Extensions in channel and class dimensions for attention-based knowledge distillation	18
Real-time distributed video analytics for privacy-aware person search	17
M3	17
Multi-dimensional attention-aided transposed ConvBiLSTM network for hyperspectral image super-resolution	17
A multi-view-CNN framework for deep representation learning in image classification	17
Ensemble learning-based method for maritime background subtraction in open sea environments	17
SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection	17
Continuous fake media detection: Adapting deepfake detectors to new generative techniques	17
Few-shot Medical Image Segmentation via Boundary-extended Prototypes and Momentum Inference	16
Global key knowledge distillation framework	16
Scribble-based complementary graph reasoning network for weakly supervised salient object detection	15
CTM: Cross-time temporal module for fine-grained action recognition	15
Sketch-based 3D shape retrieval via teacher–student learning	15
Casting a BAIT for offline and online source-free domain adaptation	15
MOSAIC: A multi-view 2.5D organ slice selector with cross-attentional reasoning for anatomically-aware CT localization in medical organ segmentation	15
A dynamic hybrid network with attention and mamba for image captioning	15
Hexagonal mesh-based neural rendering for real-time rendering and fast reconstruction	15
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video	15
Multi-view cognition with path search for one-shot part labeling	15
A robust kinship verification scheme using face age transformation	15
Editorial Board	14
Indoor UAV navigation using event cameras and intermediate frame reconstruction	14
Semantic-driven diffusion for sign language production with gloss-pose latent spaces alignment	14
Deep parametric Retinex decomposition model for low-light image enhancement	14
Transformed ROIs for capturing visual transformations in videos	14
Statistical-driven adaptive data augmentation for single-domain generalized object detection	14
SPSC-Net: Shared parallel space-channel attention mechanism transformer network for cell sequence image segmentation	14
TCLR: Temporal contrastive learning for video representation	14
3D Pose Nowcasting: Forecast the future to improve the present	13
Editorial Board	13
MASK_LOSS guided non-end-to-end image denoising network based on multi-attention module with bias rectified linear unit and absolute pooling unit	13
Extending class activation mapping using Gaussian receptive field	13
Semi-supervised Cycle-GAN for face photo-sketch translation in the wild	13
BasicTAD: An astounding RGB-Only baseline for temporal action detection	13
Attention-induced semantic and boundary interaction network for camouflaged object detection	13
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data	13
α-EGAN:	13
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC	13
The shading isophotes: Model and methods for Lambertian planes and a point light	13
Real-time fusion of stereo vision and hyperspectral imaging for objective decision support during surgery	13
Combinational sign language recognition	13
XLITE-Unet: Extremely Light and Efficient Deep learning architecture with selective atrous and axial depthwise convolution for image segmentation	13
Semantic manipulation through the lens of Geometric Algebra	12
Space–time recurrent memory network	12
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives	12
High-speed autonomous flight and obstacle avoidance for quadrotors in unknown dynamic environments based on imitation learning	12
Accurate depth image generation via overfit training of point cloud registration using local frame sets	12
DM-Align: Leveraging the power of natural language instructions to make changes to images	12
A LLM-guided hybrid Mamba-Transformer architecture for part-to-whole motion synthesis	12
Tensor robust PCA with nonconvex and nonlocal regularization	12
Biometric technology roadmapping for personalized augmentative and alternative communication	12
To make yourself invisible with Adversarial Semantic Contours	12
Multiscale Spatio-Temporal Fusion Network for video dehazing	12
Rethink arbitrary style transfer with transformer and contrastive learning	12
EADA: Efficient adaptive data augmentation	12
Learning representational invariances for data-efficient action recognition	12
Feature-aligned distillation for dense object detection via refined semantic guidance and distribution consistency	11
A multi camera unsupervised domain adaptation pipeline for object detection in cultural sites through adversarial learning and self-training	11
Dual cross-enhancement network for highly accurate dichotomous image segmentation	11
FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition	11
GAN inversion via cross-domain feature fusion and invertibility decomposition	11
Distributed multi-target tracking and active perception with mobile camera networks	11
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements	11
Edge-aware graph reasoning network for image manipulation localization	11
STURE: Spatial–Temporal Mutual Representation Learning for robust data association in online multi-object tracking	11
Survey on fast dense video segmentation techniques	11
An effective CNN and Transformer fusion network for camouflaged object detection	11
HFINet: Hybrid Feature Integration for enhancing collaborative camouflaged object detection	11
Robust attention ranking architecture with frequency-domain transform to defend against adversarial samples	11
Deep learning-based estimation of whole-body kinematics from multi-view images	11
Comprehensive regional guidance for attention map semantics in text-to-image diffusion models	11
Local Consistency Guidance: Personalized Stylization Method of Face Video	11
GSNNet: Group semantic-guided neighbor interaction network for co-salient object detection	11
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation	11
LightSOD: Towards lightweight and efficient network for salient object detection	11
FAM: Improving columnar vision transformer with feature attention mechanism	11
Discriminative object tracking by domain contrast	10
EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution	10
UATST: Towards unpaired arbitrary text-guided style transfer with cross-space modulation	10
Real-world efficient fall detection: Balancing performance and complexity with FDGA workflow	10
Self-supervision & meta-learning for one-shot unsupervised cross-domain detection	10
LocoGAN — Locally convolutional GAN	10
MFCT: Multi-Frequency Cascade Transformers for no-reference SR-IQA	10
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection	10
On the coherency of quantitative evaluation of visual explanations	10
Minimum error adaptive RGB calibration in a context of colorimetric uncertainty for cultural heritage preservation	10
Semantically accurate super-resolution Generative Adversarial Networks	10
Editorial Board	10
EFSCNN: Encoded Feature Sphere Convolution Neural Network for fast non-rigid 3D models classification and retrieval	10
GradPaint: Gradient-guided inpainting with diffusion models	10
Learning rotation equivalent scene representation from instance-level semantics: A novel top-down perspective	10
Bi-granularity balance learning for long-tailed image classification	10
Certifiable algorithms for the two-view planar triangulation problem	10
Editorial Board	10
Certifiable planar relative pose estimation with gravity prior	9
MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming	9
Constituent Attention for Vision Transformers	9
AWADA: Foreground-focused adversarial learning for cross-domain object detection	9
Bidirectional brain image translation using transfer learning from generic pre-trained models	9
Constructing adaptive spatial-frequency interactive network with bi-directional adapter for generalizable face forgery detection	9
Evaluating the effect of image quantity on Gaussian Splatting: A statistical perspective	9
Object re-identification via spatial–temporal fusion networks and causal identity matching	9
Exploring joint embedding predictive architectures for pretraining convolutional neural networks	9
Joint coupled dictionaries-based visible-infrared image fusion method via texture preservation structure in sparse domain	9
Adaptive gradients and weight projection based on quantized neural networks for efficient image classification	9
An image denoising method based on the nonlinear Schrödinger equation and spectral subband decomposition	9
S2DNet: A self-supervised deraining network using monocular videos	9
Exploring black-box adversarial attacks on Interpretable Deep Learning Systems	9
Periocular biometrics and its relevance to partially masked faces: A survey	9
Lifelong visible–infrared person re-identification via replay samples domain-modality-mix reconstruction and cross-domain cognitive network	9
Adversarial Style Mixup and Improved Temporal Alignment for Cross-Domain Few-Shot Action Recognition	9
Adaptive semantic guidance network for video captioning	9
An efficient three-stage network via Multi-Scale Orthogonal Complementary Transformer for low-light image enhancement	9
Incorporating degradation estimation in light field spatial super-resolution	9
Editorial Board	9
Editorial Board	9
Editorial Board	9
Context perturbation: A Consistent alignment approach for Domain Adaptive Semantic Segmentation	9
MDC-Net: Multi-domain constrained kernel estimation network for blind image super resolution	9
Underwater image quality evaluation via deep meta-learning: Dataset and objective method	9
View consistency aware holistic triangulation for 3D human pose estimation	9
BiPG-FER: Bi-intelligence probabilistic graph for facial expression inference drived by action units	9
Once Upon a Goal: Towards orientation-based shot metrics in football	8
Channel-aware feature mining network for Visible–Infrared Person Re-identification	8
Sparse graph matching network for temporal language localization in videos	8
Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity	8
Bypass network for semantics driven image paragraph captioning	8
Lightweight cross-modal transformer for RGB-D salient object detection	8
A closer look at branch classifiers of multi-exit architectures	8
CMGNet: Collaborative multi-modal graph network for video captioning	8
Adaptive feature denoising based deep convolutional network for single image super-resolution	8
Multi-person 3D pose estimation from a single image captured by a fisheye camera	8
Human skeletons and change detection for efficient violence detection in surveillance videos	8
Progressive multi-scale fusion network for RGB-D salient object detection	8
Leaf cultivar identification via prototype-enhanced learning	8
Learning key lines for multi-object tracking	8
Distribution-aware contrastive learning for domain adaptation in 3D LiDAR segmentation	8
Editorial Board	8
Made-In: An immersive human-in-the-loop analytics platform for enhancing creative processes in fashion	8
TEMSA: Text enhanced modal representation learning for multimodal sentiment analysis	8
Self-supervised vision transformers for semantic segmentation	8
SASFNet: Soft-edge awareness and spatial-attention feedback deep network for blind image deblurring	8
Disentangled generation network for enlarged license plate recognition and a unified dataset	8
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition	8
: Localized text prompt refinement for zero-shot referring image segmentation	8
Blur aware metric depth estimation with multi-focus plenoptic cameras	8
Cascading attention enhancement network for RGB-D indoor scene segmentation	8
Dual adversarial model: Exploring low-dimensional space features for point clouds generating and completing	8
MAL-Net: Multiscale Attention Link Network for accurate eye center detection	8
Modality mixer exploiting complementary information for multi-modal action recognition	8
Opti-CAM: Optimizing saliency maps for interpretability	8
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters	8
Multimodal transformer–diffusion framework for large-scale reconstruction of soccer tracking data	8
AnomalySD: One-for-all few-shot anomaly detection via pre-trained diffusion models	8
OVGrasp: Open-Vocabulary Intent Detection for Grasping Assistance using ExoGlove	8
STARS: Semantics-Aware Text-guided Aerial Image Refinement and Synthesis	7
AuxFlow: Anchor-grounded homography estimation through flow-guided auxiliary points for Soccer field registration and player localization	7
MuRE: Multi-Relationship Encoder for 3D human pose estimation	7
Slope-Track: Multiple Object Tracking on Ski Slopes	7
Editorial Board	7
FDPAdapter: Adapting segment anything in challenging vision tasks via frequency-domain priors	7
Time-archival camera virtualization for sports and visual performances	7
Continual learning on 3D point clouds with random compressed rehearsal	7
Style transfer with diffusion models for synthetic-to-real domain adaptation	7
Discriminative semantic transitive consistency for cross-modal learning	7
Local to global purification strategy to realize collaborative camouflaged object detection	7
Improving rare relation inferring for scene graph generation using bipartite graph network	7
RSTC: Residual Swin Transformer Cascade to approximate Taylor expansion for image denoising	7
CLIP-driven fine-grained mining for text-based person search	7
Text-Aided Domain Adaptation for CLIP-like models and application to challenging domain shifts	7
A survey on class-agnostic counting: Advancements from reference-based to open-world text-guided approaches	7
Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes	7
Editorial Board	7
Brain tumor image segmentation based on shuffle transformer-dynamic convolution and inception dilated convolution	7
A vector quantized masked autoencoder for audiovisual speech emotion recognition	7
Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation	7
Spatial constraint for efficient semi-supervised video object segmentation	7
A real-time image super-resolution model based on U-shaped deep feature extraction module	7
Multimodal vs. unimodal approaches to uncertainty in 3D image segmentation under distribution shifts	7
Invisible backdoor attack with attention and steganography	7
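
The selection rule stated at the top of this page (a citation threshold, a four-year publication window, and a cap of 250 papers) can be sketched as a simple filter. This is only an illustration, not OOIR's actual implementation: the function `select_papers`, the record fields, and the sample papers are all hypothetical, and the threshold is assumed to be inclusive because the table lists papers with exactly 7 citations.

```python
from datetime import date


def select_papers(papers, threshold, start, end, limit=250):
    """Return papers published within [start, end] whose citation count
    is at least `threshold`, sorted by citations (highest first) and
    capped at `limit` entries. Ties keep their input order."""
    kept = [
        p for p in papers
        if start <= p["published"] <= end and p["citations"] >= threshold
    ]
    kept.sort(key=lambda p: p["citations"], reverse=True)
    return kept[:limit]


# Hypothetical records, for illustration only (not taken from the table).
papers = [
    {"title": "Paper A", "citations": 12, "published": date(2023, 3, 1)},
    {"title": "Paper B", "citations": 7, "published": date(2024, 6, 15)},
    {"title": "Paper C", "citations": 6, "published": date(2024, 1, 10)},
    {"title": "Paper D", "citations": 40, "published": date(2021, 12, 31)},
]

selected = select_papers(
    papers,
    threshold=7,
    start=date(2022, 5, 1),
    end=date(2026, 5, 1),
)
for p in selected:
    print(f'{p["title"]}\t{p["citations"]}')
```

Here Paper C falls below the threshold and Paper D was published before the window opens, so only Paper A and Paper B are printed.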