OOIR: Observatory of International Research

Papers

(The median citation count of International Journal of Multimedia Information Retrieval is 3. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-07-01 to 2026-07-01.)
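The selection rule in the note above can be sketched as follows. This is a minimal illustration, not OOIR's actual code. It assumes the median is computed over the papers inside the date window, that the window is inclusive at both ends, and that papers equal to the median are kept (the table includes entries with exactly 3 citations). The function name, the tuple layout, and the sample records are hypothetical.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, max_papers=250):
    """Return (threshold, kept) for a list of (title, citations, published) tuples.

    Hypothetical helper: keeps papers published within [start, end] whose
    citation count is at or above the median of that window, sorted by
    citations in descending order and capped at max_papers.
    """
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return 0, []
    threshold = median(citations for _, citations, _ in in_window)
    kept = [p for p in in_window if p[1] >= threshold]
    kept.sort(key=lambda p: p[1], reverse=True)  # stable sort keeps tie order
    return threshold, kept[:max_papers]


# Invented toy records, not taken from the table.
sample = [
    ("Paper A", 10, date(2023, 1, 1)),
    ("Paper B", 3, date(2024, 5, 1)),
    ("Paper C", 1, date(2025, 2, 1)),
    ("Paper D", 50, date(2020, 1, 1)),  # outside the window, ignored
]
threshold, kept = select_papers(sample, date(2022, 7, 1), date(2026, 7, 1))
print(threshold, [title for title, _, _ in kept])
```

With the toy records, the three in-window counts are 10, 3, and 1, so the median is 3 and Papers A and B are kept.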

Article	Citations
Video anomaly detection with memory-guided multilevel embedding	101
Multiple object tracking under occlusions based on the stage-wise association strategy with weak cues	82
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey	69
Recent trends in recommender systems: a survey	69
VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias	61
Strengthening attention: knowledge distillation via cross-layer feature fusion for image classification	40
Optimized data-cube search for enhanced video summarization via shot boundary detection	34
VPC-VoxelNet: multi-modal fusion 3D object detection networks based on virtual point clouds	32
Enhancing Facial Beauty Prediction via a Dual-Pathway Hybrid Architecture Integrating Vmamba and ViT	32
DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images	31
CSAM: Capsule spatial attention mask network for visual question answering	30
Enhanced YOLOv10 for small object detection with context-aware and adaptive modules	29
Multi-objective reinforcement learning for recommender systems: a comprehensive survey of methods, challenges, and future directions	24
Hierarchical multi-modal fusion with vision transformers for robust action recognition in infrared-visible videos	21
Feature-NeuS: Neural Implicit Surface Reconstruction Using Feature Multi-View Consistency Constraint	21
Prototype local–global alignment network for image–text retrieval	21
MMDL: a multi-modal deep learning for video highlight detection in sports	20
Human behavior recognition based on DualBiNet model	19
Similarity-based face image retrieval using sparsely embedded deep features and binary code learning	19
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis	19
CAMIR: fine-tuning CLIP and multi-head cross-attention mechanism for multimodal image retrieval with sketch and text features	18
Visual and semantic ensemble for scene text recognition with gated dual mutual attention	15
Multimodal music datasets? Challenges and future goals in music processing	15
An emotion-driven, transformer-based network for multimodal fake news detection	14
A Comprehensive Review of Multimodal Visual Representation Learning: Tracing the Evolution from CNNs to Transformers and Beyond	14
State of art and emerging trends on group recommender system: a comprehensive review	14
Generative adversarial networks for 2D-based CNN pose-invariant face recognition	14
DAF-Net: dense attention feature pyramid network for multiscale object detection	14
Cross-domain image retrieval: methods and applications	13
Human action recognition using an optical flow-gated recurrent neural network	13
Multi-scale object detection with feature enhancement for traffic scenes	13
MFAFD: a few-shot learning method for cascading models with parameter free attention and finite discrete space	13
Ultra fast-inference depth completion with linear attention-based cascaded hourglass network	12
Multi-view learning for camouflaged object detection with PVTv2	12
Organ segmentation from computed tomography images using the 3D convolutional neural network: a systematic review	11
Image enhancement with bi-directional normalization and color attention-guided generative adversarial networks	11
Weighted semantic feature based self-supervised deep cross-modal hashing	11
Concept-based and embedding-based models in lifelog retrieval: an empirical comparison of performance	11
Optical music recognition for homophonic scores with neural networks and synthetic music generation	10
FOF: a fine-grained object detection and feature extraction end-to-end network	10
Study of Alzheimer’s disease brain impairment and methods for its early diagnosis: a comprehensive survey	10
A Reproducibility Study of Multimodal Embeddings for Recommender Systems	10
Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval	9
Zero-shot quantization for object detection via scene-aware synthesis and instance-guided alignment	9
A voting-based novel spatio-temporal fusion framework for video saliency using transfer learning mechanism	9
Style-aware adversarial pairwise ranking for image recommendation systems	9
Enhancing multimodal recommendation via contrastive self-supervised modality-preserving learning	8
Stratified Graph Indexing for efficient search in deep descriptor databases	8
An interactive attribute-preserving fashion recommendation with 3D image-based virtual try-on	8
MCDINO: Self-supervised learning of masks based on combination of multi-path channel attention and local feature weighting	8
Improving skeleton-based action recognition with interactive object information	8
FDAM: full-dimension attention module for deep convolutional neural networks	7
ETG: the graph convolutional network was enhanced with an EA-transformer for aspect sentiment triplet extraction	7
TCKGE: Transformers with contrastive learning for knowledge graph embedding	7
A literature review and perspectives in deepfakes: generation, detection, and applications	7
Few-shot and meta-learning methods for image understanding: a survey	7
A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content	7
Optimising few-shot class-incremental learning for fine-grained visual recognition	7
Joint multi-scale information and long-range dependence for video captioning	6
Dual-feature collaborative relation-attention networks for visual question answering	6
Who is gambling? Finding cryptocurrency gamblers using multi-modal retrieval methods	6
A hierarchical multi-modal injection architecture for synergistic music understanding and generation	5
DMFNet: geometric multi-scale pixel-level contrastive learning for video salient object detection	5
Deep multimodal learning for time series analysis in social computing: a survey	5
CoCoOpter: Pre-train, prompt, and fine-tune the vision-language model for few-shot image classification	5
Partial multimodal hashing with multi-level semantics and adversarial learning	4
Special Issue on Open-Domain Image Retrieval in the Wild	4
LG-MLFormer: local and global MLP for image captioning	4
ANROT-HELANet: adverserially and naturally robust attention-based aggregation network via the hellinger distance for few-shot classification	4
HF²-Net: hybrid fine-tuning heterogeneous fusion network for visible-infrared person Re-identification	4
Image forgery classification and localization through vision transformers	4
Ornament image retrieval using few-shot learning	4
Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking	4
Enhancing action recognition via dynamic cross-frame differential modeling	4
Emotion-aware music tower blocks (EmoMTB): an intelligent audiovisual interface for music discovery and recommendation	4
Sentiment analysis using deep learning techniques: a comprehensive review	4
Similar interior coordination image retrieval with multi-view features	4
Gender classification from face images using central difference convolutional networks	4
CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval	3
Global and local label-constrained alignment for image-text matching	3
Multi-aware coreference relation network for visual dialog	3
A survey of multimodal recommender systems: methods, challenges, and future directions	3
Dual-matrix guided reconstruction hashing for unsupervised cross-modal retrieval	3
Enhancing deep learning image classification using data augmentation and genetic algorithm-based optimization	3
A new CNN-based semantic object segmentation for autonomous vehicles in urban traffic scenes	3
Parameter-efficient tuning of cross-modal retrieval for a specific database via trainable textual and visual prompts	3
Cross-modal alignment with synthetic caption for text-based person search	3
A novel method for video shot boundary detection using CNN-LSTM approach	3
Deep multiple aggregation networks for action recognition	3
H-ARN: A holo-attentive relational network for holistic facial beauty prediction via distribution learning	3
Remote Sensing Image Change Captioning: A Comprehensive Review	3
3D skeleton-based human motion prediction using spatial–temporal graph convolutional network	3
Special issue on cross-modal retrieval and analysis	3