International Journal of Multimedia Information Retrieval

Papers
(The median citation count of the International Journal of Multimedia Information Retrieval is 2. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01.)
Article | Citations
Editorial: web of science and scopus impact in IJMIR | 408
Towards a high robust neural network via feature matching | 112
Video anomaly detection with memory-guided multilevel embedding | 56
VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias | 51
Recent trends in recommender systems: a survey | 46
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey | 38
Strengthening attention: knowledge distillation via cross-layer feature fusion for image classification | 37
DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images | 26
VPC-VoxelNet: multi-modal fusion 3D object detection networks based on virtual point clouds | 25
Enhanced YOLOv10 for small object detection with context-aware and adaptive modules | 24
A local representation-enhanced recurrent convolutional network for image captioning | 22
Prototype local–global alignment network for image–text retrieval | 21
Feature-NeuS: Neural Implicit Surface Reconstruction Using Feature Multi-View Consistency Constraint | 18
MMDL: a multi-modal deep learning for video highlight detection in sports | 17
Similarity-based face image retrieval using sparsely embedded deep features and binary code learning | 16
Visual and semantic ensemble for scene text recognition with gated dual mutual attention | 15
CAMIR: fine-tuning CLIP and multi-head cross-attention mechanism for multimodal image retrieval with sketch and text features | 15
How can users’ comments posted on social media videos be a source of effective tags? | 15
DC-GNN: drop channel graph neural network for object classification and part segmentation in the point cloud | 15
Human behavior recognition based on DualBiNet model | 14
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | 13
Few2Decide: towards a robust model via using few neuron connections to decide | 12
Semantic-enhanced discriminative embedding learning for cross-modal retrieval | 12
Multimodal music datasets? Challenges and future goals in music processing | 12
MFAFD: a few-shot learning method for cascading models with parameter free attention and finite discrete space | 11
State of art and emerging trends on group recommender system: a comprehensive review | 11
An emotion-driven, transformer-based network for multimodal fake news detection | 11
Generative adversarial networks for 2D-based CNN pose-invariant face recognition | 11
Human action recognition using an optical flow-gated recurrent neural network | 11
Cross-domain image retrieval: methods and applications | 11
DAF-Net: dense attention feature pyramid network for multiscale object detection | 11
Multi-view learning for camouflaged object detection with PVTv2 | 10
Weighted semantic feature based self-supervised deep cross-modal hashing | 10
InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection | 9
Image enhancement with bi-directional normalization and color attention-guided generative adversarial networks | 9
Concept-based and embedding-based models in lifelog retrieval: an empirical comparison of performance | 9
Organ segmentation from computed tomography images using the 3D convolutional neural network: a systematic review | 9
FOF: a fine-grained object detection and feature extraction end-to-end network | 8
Optical music recognition for homophonic scores with neural networks and synthetic music generation | 8
Multi-sensor human activity recognition using CNN and GRU | 8
Study of Alzheimer’s disease brain impairment and methods for its early diagnosis: a comprehensive survey | 8
A voting-based novel spatio-temporal fusion framework for video saliency using transfer learning mechanism | 7
RGBD deep multi-scale network for background subtraction | 7
Few-shot and meta-learning methods for image understanding: a survey | 6
An interactive attribute-preserving fashion recommendation with 3D image-based virtual try-on | 6
Stratified Graph Indexing for efficient search in deep descriptor databases | 6
ETG: the graph convolutional network was enhanced with an EA-transformer for aspect sentiment triplet extraction | 6
TCKGE: Transformers with contrastive learning for knowledge graph embedding | 6
MCDINO: Self-supervised learning of masks based on combination of multi-path channel attention and local feature weighting | 6
Style-aware adversarial pairwise ranking for image recommendation systems | 6
Correction to: Different techniques for Alzheimer’s disease classification using brain images: a study | 6
Multi-class imbalanced image classification using conditioned GANs | 6
Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval | 6
Improving skeleton-based action recognition with interactive object information | 6
A literature review and perspectives in deepfakes: generation, detection, and applications | 5
Deep multimodal learning for time series analysis in social computing: a survey | 5
Dual-feature collaborative relation-attention networks for visual question answering | 5
A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content | 5
FDAM: full-dimension attention module for deep convolutional neural networks | 5
DMFNet: geometric multi-scale pixel-level contrastive learning for video salient object detection | 4
Who is gambling? Finding cryptocurrency gamblers using multi-modal retrieval methods | 4
Joint multi-scale information and long-range dependence for video captioning | 4
CoCoOpter: Pre-train, prompt, and fine-tune the vision-language model for few-shot image classification | 4
Similar interior coordination image retrieval with multi-view features | 3
Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking | 3
3D skeleton-based human motion prediction using spatial–temporal graph convolutional network | 3
Image forgery classification and localization through vision transformers | 3
Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown | 3
Emotion-aware music tower blocks (EmoMTB): an intelligent audiovisual interface for music discovery and recommendation | 3
Gender classification from face images using central difference convolutional networks | 3
LG-MLFormer: local and global MLP for image captioning | 3
Special Issue on Open-Domain Image Retrieval in the Wild | 3
Enhancing the performance of 3D auto-correlation gradient features in depth action classification | 3
Ornament image retrieval using few-shot learning | 3
Dual-matrix guided reconstruction hashing for unsupervised cross-modal retrieval | 3
Anomaly detection using edge computing in video surveillance system: review | 3
Sentiment analysis using deep learning techniques: a comprehensive review | 3
Deep multiple aggregation networks for action recognition | 2
Cross-modal alignment with synthetic caption for text-based person search | 2
Deep adversarial multi-label cross-modal hashing algorithm | 2
Parameter-efficient tuning of cross-modal retrieval for a specific database via trainable textual and visual prompts | 2
Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges | 2
Enhancing deep learning image classification using data augmentation and genetic algorithm-based optimization | 2
Multi-aware coreference relation network for visual dialog | 2
MHA-WoML: Multi-head attention and Wasserstein-OT for few-shot learning | 2
Opinion convergence-based sentiment prediction of image advertisement | 2
A novel method for video shot boundary detection using CNN-LSTM approach | 2
Text detection, recognition, and script identification in natural scene images: a Review | 2
A fast and robust affine-invariant method for shape registration under partial occlusion | 2
A new CNN-based semantic object segmentation for autonomous vehicles in urban traffic scenes | 2
FCT: fusing CNN and transformer for scene classification | 2
CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval | 2
Music emotion recognition based on segment-level two-stage learning | 2
Special issue on cross-modal retrieval and analysis | 2
Remote Sensing Image Change Captioning: A Comprehensive Review | 2