OOIR: Observatory of International Research

Papers

(The TQCC of Multimedia Systems is 4. The table below lists the papers whose CrossRef citation counts are above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-12-01 to 2025-12-01.)
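
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OOIR's actual code: the `Paper` type and the `select_papers` function are hypothetical names, "above that threshold" is read as strictly greater than the TQCC, and the most-cited-first ordering is inferred from the table. The publication-date window is left out for brevity.

```python
from typing import List, NamedTuple


class Paper(NamedTuple):
    """Hypothetical record: a title plus its CrossRef citation count."""
    title: str
    citations: int


def select_papers(papers: List[Paper], tqcc: int, max_papers: int = 250) -> List[Paper]:
    """Keep papers cited more than `tqcc` times, most-cited first, capped at `max_papers`."""
    # Assumption: "above that threshold" means strictly greater than the TQCC.
    above = [p for p in papers if p.citations > tqcc]
    # Assumption: rows are ordered by citation count, highest first, as in the table.
    above.sort(key=lambda p: p.citations, reverse=True)
    return above[:max_papers]
```

With a TQCC of 4, a paper with 5 citations would be kept and a paper with exactly 4 would be dropped.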

Article	Citations
A visual question answering model based on image captioning	127
Unsupervised deep metric learning algorithm for crop disease images based on knowledge distillation networks	92
Pseudo-global strategy-based visual comfort assessment considering attention mechanism	88
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy	83
A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network	76
Real emotion seeker: recalibrating annotation for facial expression recognition	71
A comparative study of color quantization methods using various image quality assessment indices	70
BENet: bi-directional enhanced network for image captioning	67
Correction: STASiamRPN: visual tracking based on spatiotemporal and attention	58
Towards domain adaptation underwater image enhancement and restoration	49
Dual-branch spectral–spatial feature extraction network for multispectral image compression	47
Face and voice cross-modal association with learning convex feature embedding	43
ConASD: Contrastive Few Shot Learning for Detecting Autism Spectrum Disorder via Eye Tracking Scanpath	42
LMFE-RDD: a road damage detector with a lightweight multi-feature extraction network	41
Feature fusion and optimization integrated refined deep residual network for diabetic retinopathy severity classification using fundus image	41
360° video quality assessment based on saliency-guided viewport extraction	40
SFRA: spatial fusion regression augmentation network for facial landmark detection	39
SEMNet: a simple and efficient MLP-based network for 3D Face point clouds landmarks localization	38
Model-based portrait video compression with spatial constraint and adaptive pose processing	35
Improving text-image cross-modal retrieval with contrastive loss	34
Segmentation-aware image super-resolution with generative adversarial networks	30
CAPNet: tomato leaf disease detection network based on adaptive feature fusion and convolutional enhancement	30
SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery	29
Dual convolutional neural network with attention for image blind denoising	29
SFFN-YOLO for small object detection in aerial images	28
CHCoT-MSLU: a coupled hierarchical chain-of-thought prompt learning model for multi-intent spoken language understanding	28
DiffRA: universal restorative adversarial attack based on diffusion model	27
The segmented UEC Food-100 dataset with benchmark experiment on food detection	26
GVA: guided visual attention approach for automatic image caption generation	26
On-line monitoring of structural performance of scraper conveyor driven by digital twin	26
Design and realization of pulse-controlled multi-memristor Hopfield neural networks and their applications in information encryption	24
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach	23
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model	23
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet	23
User authentication method based on keystroke dynamics and mouse dynamics using HDA	22
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer	22
Multi-level sentiment-aware clustering for denoising in multimodal sentiment analysis with ASR errors	21
EDB-Diff: a EdgeDevice based diffusion network for brain tumor image segmentation	21
Optimizing codebook training through control chart analysis	20
RGB-Net: transformer-based lightweight low-light image enhancement network via RGB channel separation	20
Spatial interpolation of head-related transfer functions using a physics-informed autoencoder	20
Weakly supervised anomaly detection with multi-level contextual modeling	18
DMFTNet: dense multimodal fusion transfer network for free-space detection	18
Fast bilateral filter with spatial subsampling	18
SoftBinReduce: data reduction for color quantization through soft binning	18
Overcoming the practical restrictions in H.266/VVC-based video communication systems by a PI bit rate controller	18
TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction	18
Design and evaluation of a serious game in virtual reality to increase empathy towards students with phonological dyslexia	17
CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining	17
Enhanced target recognition and localization using binocular vision and infrared thermal imaging	17
Multi-view region proposal network predictive learning for tracking	17
Big-LITTLE-Net: a dual-branch network for small UAV detection	17
An automatic music generation method based on RSCLN_Transformer network	17
A verifiable variable threshold visual image secret sharing scheme	16
A multi-label classification method combined with texture enhancement for deepfake face detection	16
Similarity-guided contrastive learning for deep multi-view clustering	16
Inter-class distance enhanced prototypical network for few-shot text classification	16
Exploiting local detail in single image super-resolution via hypergraph convolution	16
Efficient and self-adaptive rationale knowledge base for visual commonsense reasoning	16
A survey of multimodal federated learning: background, applications, and perspectives	16
RefinerHash: a new hashing-based re-ranking technique for image retrieval	15
GL-MambaNet: Mamba-based global and local feature fusion for image dehazing	15
CR-DM: A novel craniofacial reconstruction framework based on diffusion model	15
DAFMixerSR: a lightweight fusion-enhanced adaptive perception network for image super-resolution	15
Workpiece tracking based on improved SiamFC++ and virtual dataset	15
Speech-driven talking face video generation	15
Game and reference: efficient policy making for epidemic prevention and control	15
CCM-Net: image splicing localization network based on context-aware and cross-domain multi-scale fusion	15
Enhanced 3D reconstruction with all-neighbor-first philosophy and Ricci flow-based mesh smoothing approach	15
Bcgn: BLIP-based cross-modal grasping network for language-conditioned robotic grasping	15
A deep learning-based framework for detecting COVID-19 patients using chest X-rays	15
Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies	15
Double-scale similarity with rich features for cross-modal retrieval	14
Cross-modality geometry-guided historical momentum learning for coupled noisy visible-infrared re-identification	14
EDCM-EA: event prediction based on event development context mining considering event arguments	14
Graph contrastive learning for recommendation with generative data augmentation	14
A plug-and-play image enhancement model for end-to-end object detection in low-light condition	14
Unsupervised cross-database micro-expression recognition based on distribution adaptation	13
A CNN-transformer hybrid network with selective fusion and dual attention for image super-resolution	13
NDAM-YOLOseg: a real-time instance segmentation model based on multi-head attention mechanism	13
Wireless multipath video transmission: when IoT video applications meet networking—a survey	13
PDSRN: a progressive distillation network for generalizable single image super-resolution	13
3D human pose estimation method based on multi-constrained dilated convolutions	13
Skeleton-based human activity recognition with wifi CSI using a hybrid approach combining convolutional neural network and long short term memory	13
LAM-YOLOv11 for UAV transmission line inspection: overcoming environmental challenges with enhanced detection efficiency	12
RMVAE: one-class classification via divergence regularization and maximization mutual information	12
Occluded scene text detection via context-awareness from sketch-level image representations	12
Automated brain tumor malignancy detection via 3D MRI using adaptive-3-D U-Net and heuristic-based deep neural network	12
Enhancing long-tailed classification via multi-strategy weighted experts with hybrid distillation	12
UAPT: an underwater acoustic target recognition method based on pre-trained Transformer	12
LPR: learning point-level temporal action localization through re-training	12
MCLSC-Fusion: a multi-scale cross-modality long-short connection fusion network for infrared and visible images	12
Depth alignment interaction network for camouflaged object detection	12
Multimodal-enhanced hierarchical attention network for video captioning	12
Computer-aided diagnosis for early detection and staging of human pancreatic tumors using an optimized 3D CNN on computed tomography	12
PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network	12
Object detection of mural images based on improved YOLOv8	12
Gicnet: global information capture network for visual place recognition	11
Developing novel video coding model using modified dual-tree wavelet-based multi-resolution technique	11
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks	11
A comprehensive survey on human pose estimation approaches	11
HandO: a hybrid 3D hand–object reconstruction model for unknown objects	11
Tb-mmrd: transformer-based multi-modal election rumor detection with agreement-aware gating and semantic fusion	11
Scd-yolo: a novel object detection method for efficient road crack detection	11
Unsupervised knowledge representation of panoramic dental X-ray images using SVG image-and-object clustering	11
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning	11
Synthetic shadows: the interplay of forensic detection and anti-forensic techniques in GAN-generated images	11
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence	11
Learning shared features from specific and ambiguous descriptions for text-based person search	11
Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform	11
3D model watermarking using surface integrals of generated random vector fields	10
Facial action unit detection with emotion consistency: a cross-modal learning approach	10
Polarity-aware attention network for image sentiment analysis	10
COVID-SegNet: encoder–decoder-based architecture for COVID-19 lesion segmentation in chest X-ray	10
Text-centered cross-sample fusion network for multimodal sentiment analysis	10
Deepfake detection of occluded images using a patch-based approach	10
HSGNet: hierarchically stacked graph network with attention mechanism for 3D human pose estimation	10
Multi-level fine-grained center calibration network for unsupervised person re-identification	10
Non-convex fractional-order TV model for image inpainting	10
Remote sensing image cloud removal based on multi-scale spatial information perception	10
You watch once more: a more effective CNN architecture for video spatio-temporal action localization	10
Detecting offensive language on instagram with a combined approach of the Gray Wolf algorithm and deep learning networks	10
Asymmetric exponential loss function for crack segmentation	10
Overcomplete-to-sparse representation learning for few-shot class-incremental learning	10
Bag of states: a non-sequential approach to video-based engagement measurement	10
SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering	10
Unsupervised single-image dehazing via self-guided inverse-retinex GAN	9
Dual-stream progressive neural network based on cross fusion in image manipulation localization	9
User quality of experience estimation using social network analysis	9
VCounselor: a psychological intervention chat agent based on a knowledge-enhanced large language model	9
ST-GRU: spatiotemporal gated recurrent unit for video prediction	9
Student engagement detection in online environment using computer vision and multi-dimensional feature fusion	9
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector	9
3D human pose estimation with multi-hypotheses gated transformer	9
Compact twice fusion network for edge detection	9
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video	9
CBLC-SOOD: contrastive background and label correction for semi-supervised oriented object detection	9
Gender estimation based on deep learned and handcrafted features in an uncontrolled environment	9
Same-clothes person re-identification with dual-stream network	9
STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition	9
Reducing blind spots in esophagogastroduodenoscopy examinations using a novel deep learning model	9
GloFP-MSF: monocular scene flow estimation with global feature perception	9
Hfffap-net: unsupervised fundus image enhancement with high-frequency feature fusion and artifact processing	9
A Three-stage multimodal emotion recognition network based on text low-rank fusion	9
Face attribute recognition via end-to-end weakly supervised regional location	9
Lightweight super-resolution via multi-group window self-attention and residual blueprint separable convolution	9
ReDiT: re-evaluating large visual question answering model confidence by defining input scenario difficulty and applying temperature mapping	9
DA²Net: a dual attention-aware network for robust crowd counting	9
Dual-visual collaborative enhanced transformer for image captioning	9
Learning unified anchor graph based on affinity relationships with strong consensus for multi-view spectral clustering	9
Local discriminative graph convolutional networks for text classification	9
Adversarial training in logit space against tiny perturbations	8
Indirect visual–semantic alignment for generalized zero-shot recognition	8
Multimodal large language model enhancement network for multimodal sentiment analysis	8
Dual attention transformer with adaptive frequency enhancement for real-world Chinese–English scene text image super-resolution	8
EA-EDNet: encapsulated attention encoder-decoder network for 3D reconstruction in low-light-level environment	8
GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation	8
DS-Diff: a dual-stage network with degradation-aware and semantic-aware for adverse weather removal based on diffusion models	8
Generating generalized zero-shot learning based on dual-path feature enhancement	8
Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction	8
Teaching authentic sign language through multiple representation learning	8
Fine-tuning CLIP for difference-guided composed image retrieval	8
Fast-colorfool: faster and more transferable semantic adversarial attack with complementary colors and cumulative perturbation	8
EfficientFace: an efficient deep network with feature enhancement for accurate face detection	8
Estimating visibility via differential regression network	8
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments	8
Hierarchical segmentation for traditional cultural pattern based on iterative compression and clustering	8
Selecting generated synthetic features using clustering algorithm for generalized zero-shot learning	8
Hybrid embedding for multimodal few-frame action recognition	8
MGSAN: multimodal graph self-attention network for skeleton-based action recognition	8
Special issue on data-driven personalisation of television content	8
Music genre classification based on auditory image, spectral and acoustic features	7
Link prediction in social networks using hyper-motif representation on hypergraph	7
Role of deep learning models and analytics in industrial multimedia environment	7
CAFIN: cross-attention based face image repair network	7
WFIL-NET: image inpainting based on wavelet downsampling and frequency integrated learning module	7
Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS	7
Lightweight dual-path octave generative adversarial networks for few-shot image generation	7
Multi-granular dynamic interaction network for multimodal sarcasm detection	7
Recognition of miner action and violation behavior based on the ANODE-GCN model	7
Propagating prior information with transformer for robust visual object tracking	7
Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes	7
Map modeling for full body gesture using flex sensor and machine learning algorithms	7
Exploring multi-dimensional interests for session-based recommendation	7
DRL-based transmission control for QoE guaranteed transmission efficiency optimization in tile-based panoramic video streaming	7
Channel modulus normalization for CNN image classification	7
YOLO-ERF: lightweight object detector for UAV aerial images	7
Accurate entropy modeling in learned image compression with joint enchanced SwinT and CNN	7
Prior-based bi-encoder transformer for underwater image enhancement	7
Dual-stream network with cross-layer attention and similarity constraint for micro-expression recognition	7
Kronecker-factored Approximate Curvature with adaptive learning rate for optimizing model-agnostic meta-learning	7
Adaptafood: an intelligent system to adapt recipes to specialised diets and healthy lifestyles	7
CAPTCHA farm detection and user authentication via mouse-trajectory similarity measurement	7
ASFESRN: bridging the gap in real-time corn leaf disease detection with image super-resolution	7
KECAN: knowledge-enhanced cross-modal alignment network for ophthalmic report generation	7
PAR-mono: monocular video depth estimation network based on channel separation and dynamic attention	7
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection	7
LCFormer: linear complexity transformer for efficient image super-resolution	6
A multi-level feature weight fusion model for salient object detection	6
Locally controllable network based on visual–linguistic relation alignment for text-to-image generation	6
Mmy-net: a multimodal network exploiting image and patient metadata for simultaneous segmentation and diagnosis	6
Collaborative point cloud geometry compression for both human vision and machine vision	6
Personalized time-sync comment generation based on a multimodal transformer	6
Editorial note for few-shot learning for intelligent multimedia systems	6
LET-Net: locally enhanced transformer network for medical image segmentation	6
Adp-clf: adaptive dual-perception contrastive learning for gastrointestinal endoscopic image classification	6
PS-YOLO: a small object detector based on efficient convolution and multi-scale feature fusion	6
SR-DAYOLOv8: cross-domain adaptive object detection based on super-resolution domain classifier	6
Mpv-pcqa: multimodal no-reference point cloud quality assessment via point cloud and captured dynamic video	6
IOPCNet: inner and outer point classification based low overlap rate local-to-global point cloud registration	6
Full reference image quality assessment based on dual-space multi-feature fusion	6
An efficient federated learning method based on enhanced classification-GAN for medical image classification	6
A MADDPG-based multi-agent antagonistic algorithm for sea battlefield confrontation	6
Learning effective embedding for automated COVID-19 prediction from chest X-ray images	6
GCIF: graph based cross-modal information fusion for conversational emotion recognition	6
Prior tissue knowledge-driven contrastive learning for brain CT report generation	6
A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution	6
Rescue decision via Earthquake Disaster Knowledge Graph reasoning	6
Spatial attention-guided deformable fusion network for salient object detection	6
A weakly supervised pavement crack segmentation based on adversarial learning and transformers	6
Image and audio caps: automated captioning of background sounds and images using deep learning	6
TrafficTrack: rethinking the motion and appearance cue for multi-vehicle tracking in traffic monitoring	6
Msfusenet: a multi-stage information fusion network for multi-modal skin lesion diagnosis	6
Irregular feature enhancer for low-dose CT denoising	6
An adaptive Bagging algorithm based on lightweight transformer for multi-class imbalance recognition	6
From coarse to fine: a two-stage common semantic space construction for unpaired cross modal retrieval	6
A multi-scale feature fusion spatial–channel attention model for background subtraction	6
A two-stage forgery detection and localization framework based on feature classification and similarity metric	6
DATaR: Depth Augmented Target Redetection using Kernelized Correlation Filter	6
A cross-view geo-localization method guided by relation-aware global attention	6
A multi-scale no-reference video quality assessment method based on transformer	6
Multiscale geometric window transformer for orthodontic teeth point cloud registration	6
ITrans: generative image inpainting with transformers	6
Layer-wise enhanced transformer with multi-modal fusion for image caption	6
TSGFormer: temporal-aware network and spatial encoding GCN for three-dimensional human pose estimation	6
Wavelet guided real time detection transformer with sparse attention	6
Composite makeup transfer model based on generative adversarial networks	6
A prompt-based dual-layer cross-modal distillation learning method for aspect-based sentiment analysis	6
Collaborative multi-knowledge distillation under the influence of softmax regression representation	5
PillarVTP: vehicle trajectory prediction method based on local point cloud aggregation and receptive field expansion	5
Breast density measurement methods on mammograms: a review	5
Weighted sparse gradient reconstruction model with a robust fidelity for edge-aware image smoothing	5
Identification of haploid and diploid maize seeds using hybrid transformer model	5
DHRA-UNet: a lightweight SLM powder-spreading defect image segmentation algorithm	5
Edge-preserving image denoising using noise-enhanced patch-based non-local means	5
Adaptive region assisted GAN for image steganography	5
Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method	5
AFEV-INet: adaptive feature extraction variational interactive network for remote sensing image denoising	5
View adjustment: helping users improve photographic composition	5
IS-DGM: an improved steganography method based on a deep generative model and hyper logistic map encryption via social media networks	5