International Journal of Multimedia Information Retrieval

Papers
(The median citation count of International Journal of Multimedia Information Retrieval is 1. The table below lists the papers that are above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
A voting-based novel spatio-temporal fusion framework for video saliency using transfer learning mechanism | 329
DAABNet: depth-wise asymmetric attention bottleneck for real-time semantic segmentation | 87
How can users’ comments posted on social media videos be a source of effective tags? | 44
Editorial: web of science and scopus impact in IJMIR | 44
Detecting abnormal behavior in megastore for crime prevention using a deep neural architecture | 37
Multimodal music datasets? Challenges and future goals in music processing | 25
VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias | 24
Enhancing the performance of 3D auto-correlation gradient features in depth action classification | 21
Style-aware adversarial pairwise ranking for image recommendation systems | 21
Stratified Graph Indexing for efficient search in deep descriptor databases | 17
Mual: enhancing multimodal sentiment analysis with cross-modal attention and difference loss | 15
End-to-end residual learning-based deep neural network model deployment for human activity recognition | 14
Visual and semantic ensemble for scene text recognition with gated dual mutual attention | 13
Similar interior coordination image retrieval with multi-view features | 12
Towards a high robust neural network via feature matching | 11
Reinforcement learning applied to machine vision: state of the art | 11
Your heart rate betrays you: multimodal learning with spatio-temporal fusion networks for micro-expression recognition | 10
Correction to: Different techniques for Alzheimer’s disease classification using brain images: a study | 10
How does a kernel based on gradients of infinite-width neural networks come to be widely used: a review of the neural tangent kernel | 9
Gender classification from face images using central difference convolutional networks | 9
An interactive attribute-preserving fashion recommendation with 3D image-based virtual try-on | 9
LG-MLFormer: local and global MLP for image captioning | 9
Improving skeleton-based action recognition with interactive object information | 8
Recent trends in recommender systems: a survey | 8
Advancements in machine learning techniques for threat item detection in X-ray images: a comprehensive survey | 8
CAMIR: fine-tuning CLIP and multi-head cross-attention mechanism for multimodal image retrieval with sketch and text features | 8
RGBD deep multi-scale network for background subtraction | 8
Optimized RT-DETR for accurate and efficient video object detection via decoupled feature aggregation | 8
Neural style transfer generative adversarial network (NST-GAN) for facial expression recognition | 8
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey | 8
Video anomaly detection with memory-guided multilevel embedding | 7
A review on deep learning in medical image analysis | 7
Ornament image retrieval using few-shot learning | 7
Caption TLSTMs: combining transformer with LSTMs for image captioning | 7
Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval | 7
Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking | 7
3D skeleton-based human motion prediction using spatial–temporal graph convolutional network | 6
State of art and emerging trends on group recommender system: a comprehensive review | 6
Dual-matrix guided reconstruction hashing for unsupervised cross-modal retrieval | 6
Multiple feedback based adversarial collaborative filtering with aesthetics | 6
Counterfactual attribute-based visual explanations for classification | 5
A novel method for video shot boundary detection using CNN-LSTM approach | 5
DAF-Net: dense attention feature pyramid network for multiscale object detection | 5
Incremental image retrieval method based on feature perception and deep hashing | 5
Multi-class imbalanced image classification using conditioned GANs | 5
LSECA: local semantic enhancement and cross aggregation for video-text retrieval | 5
Generative adversarial networks for 2D-based CNN pose-invariant face recognition | 5
TCKGE: Transformers with contrastive learning for knowledge graph embedding | 5
Cross-domain image retrieval: methods and applications | 5
Few2Decide: towards a robust model via using few neuron connections to decide | 4
Tri-RAT: optimizing the attention scores for image captioning | 4
Multimodal news analytics using measures of cross-modal entity and context consistency | 4
Strengthening attention: knowledge distillation via cross-layer feature fusion for image classification | 4
A literature review and perspectives in deepfakes: generation, detection, and applications | 4
Semantic-enhanced discriminative embedding learning for cross-modal retrieval | 3
A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios | 3
Bridging language to visuals: towards natural language query-to-chart image retrieval | 3
Special issue on cross-modal retrieval and analysis | 3
Few-shot and meta-learning methods for image understanding: a survey | 3
A fast and robust affine-invariant method for shape registration under partial occlusion | 3
A order-based content-based information retrieval system proposal applied in 3D meshes | 3
Parameter-efficient tuning of cross-modal retrieval for a specific database via trainable textual and visual prompts | 3
An emotion-driven, transformer-based network for multimodal fake news detection | 3
Enhancing deep learning image classification using data augmentation and genetic algorithm-based optimization | 3
Who is gambling? Finding cryptocurrency gamblers using multi-modal retrieval methods | 2
Joint multi-scale information and long-range dependence for video captioning | 2
Augmented inputs for surveillance re-identification | 2
DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images | 2
Unsupervised graph reasoning distillation hashing for multimodal hamming space search with vision-language model | 2
Cross-modal retrieval based on shared proxies | 2
Visual feature segmentation with reinforcement learning for continuous sign language recognition | 2
A spatiotemporal bidirectional network for video salient object detection using multiscale transfer learning | 2
Video deblurring and flow-guided feature aggregation for obstacle detection in agricultural videos | 2
An improved customized CNN model for adaptive recognition of cerebral palsy people’s handwritten digits in assessment | 2
Dual-feature collaborative relation-attention networks for visual question answering | 2
FDAM: full-dimension attention module for deep convolutional neural networks | 2
Medical image watermarking: a survey on applications, approach and performance requirement compliance | 1
VPC-VoxelNet: multi-modal fusion 3D object detection networks based on virtual point clouds | 1
DBTSF-VSOD: a decision-based two-stage framework for video salient object detection | 1
SPSD: Similarity-preserving self-distillation for video–text retrieval | 1
Multimodal image and audio music transcription | 1
Human action recognition using an optical flow-gated recurrent neural network | 1
Domain-specific image captioning: a comprehensive review | 1
Optimized MobileNet + SSD: a real-time pedestrian detection on a low-end edge device | 1
Organ segmentation from computed tomography images using the 3D convolutional neural network: a systematic review | 1
Prototype local–global alignment network for image–text retrieval | 1
A comprehensive survey of multimodal fake news detection techniques: advances, challenges, and opportunities | 1
Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges | 1
Modal interaction-enhanced prompt learning by transformer decoder for vision-language models | 1
ConvST-LSTM-Net: convolutional spatiotemporal LSTM networks for skeleton-based human action recognition | 1
CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval | 1
MFAFD: a few-shot learning method for cascading models with parameter free attention and finite discrete space | 1
Different techniques for Alzheimer’s disease classification using brain images: a study | 1
A new CNN-based semantic object segmentation for autonomous vehicles in urban traffic scenes | 1
PDS-Net: A novel point and depth-wise separable convolution for real-time object detection | 1
Deep multiple aggregation networks for action recognition | 1
Editorial for the ICMR 2020 special issue | 1
MemeTector: enforcing deep focus for meme detection | 1
Recognition of student engagement in classroom from affective states | 1
STCA: an action recognition network with spatio-temporal convolution and attention | 1
Cluster-guided temporal modeling for action recognition | 1
A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content | 1