IEEE Transactions on Parallel and Distributed Systems

Papers
(The H4-Index of IEEE Transactions on Parallel and Distributed Systems is 51. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
Article | Citations
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Tsinghua University | 206
Enabling Large Scale Simulations for Particle Accelerators | 196
Design and Implementation of 2D Convolution on x86/x64 Processors | 146
Online Container Caching for IoT Data Processing in Serverless Edge Computing | 142
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach | 136
EdgeTB: A Hybrid Testbed for Distributed Machine Learning at the Edge With High Fidelity | 130
H5Intent: Autotuning HDF5 With User Intent | 125
AWB+-Tree: A Novel Width-Based Index Structure Supporting Hybrid Matching for Large-Scale Content-Based Pub/Sub Systems | 115
fPIM: A Holistic Design to Optimize PIM Data Flow for High Execution Efficiency | 115
Fully Decentralized Data Distribution for Large-Scale HPC Systems | 113
Mapping Large-Scale Spiking Neural Network on Arbitrary Meshed Neuromorphic Hardware | 111
QoS-Aware Scheduling of Remote Rendering for Interactive Multimedia Applications in Edge Computing | 107
STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN Training | 106
Replicated Versioned Data Structures for Wide-Area Distributed Systems | 102
mtGEMM: An Efficient GEMM Library for Modern Multi-Core DSPs | 95
Jdebug: A Fast, Non-Intrusive and Scalable Fault Locating Tool for Ten-Million-Scale Parallel Applications | 94
Large-Scale Neural Network Quantum States Calculation for Quantum Chemistry on a New Sunway Supercomputer | 91
An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays | 88
A Point Cloud Video Recognition Acceleration Framework Based on Tempo-Spatial Information | 88
IRHunter: Universal Detection of Instruction Reordering Vulnerabilities for Enhanced Concurrency in Distributed and Parallel Systems | 87
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations | 85
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds | 84
UniOrch: A Unified Mixed Framework for High-Efficiency LLM Training on Heterogeneous AI Chips | 82
Bal-DGCN: A Hardware Acceleration Framework for Balanced Computational Efficiency in DGCNs | 80
Optimizing Data Locality by Integrating Intermediate Data Partitioning and Reduce Task Scheduling in Spark Framework | 77
Federated Learning With Nesterov Accelerated Gradient | 75
A Memory-Constraint-Aware List Scheduling Algorithm for Memory-Constraint Heterogeneous Muti-Processor System | 71
On the Message Complexity of Fault-Tolerant Computation: Leader Election and Agreement | 71
ComStar: Compression-Aware Stream Query for Heterogeneous Hybrid Architecture | 69
RHINO: An Efficient Serverless Container System for Small-Scale HPC Applications | 69
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV | 67
HarmonyCache: Scalable In-Network Cache With Read-Write Separation | 67
On the Performance of SMASH: A Non-Preemptive Window-Based Scheduler for Multiserver Jobs | 66
PHIDE: A Parallel Hybrid Direct–Iterative Eigensolver for Hermitian Eigenvalue Problems | 65
Simple, Fast and Widely Applicable Concurrent Memory Reclamation via Neutralization | 64
Accelerating Data Delivery of Latency-Sensitive Applications in Container Overlay Network | 64
Securing Fine-Grained Data Sharing and Erasure in Outsourced Storage Systems | 63
Asynchronous Algorithms for Decentralized Resource Allocation Over Directed Networks | 62
Graph-Centric Performance Analysis for Large-Scale Parallel Applications | 62
BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription | 61
Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning | 60
A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration | 60
Tag-Sharer-Fusion Directory: A Scalable Coherence Directory With Flexible Entry Formats | 60
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning | 58
CiMBA: Accelerating Genome Sequencing Through On-Device Basecalling via Compute-in-Memory | 58
Efficient and Automated Deployment Architecture for OpenStack in TianHe SuperComputing Environment | 57
Scalable Hybrid Learning Techniques for Scientific Data Compression | 57
Building Accurate and Interpretable Online Classifiers on Edge Devices | 55
Cannikin: No Lagger of SLO in Concurrent Multiple LoRA LLM Serving | 55
Coordinating Fast Concurrency Adapting With Autoscaling for SLO-Oriented Web Applications | 54
GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads | 53
Improving the Scalability of GPU Synchronization Primitives | 51
High-Level Data Abstraction and Elastic Data Caching for Data-Intensive AI Applications on Cloud-Native Platforms | 51
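
The threshold quoted at the top of this page is an h-index restricted to a four-year window. As a rough illustration of how such a figure can be derived from a list of citation counts, here is a minimal sketch. It assumes the standard h-index definition (the largest h such that h papers have at least h citations each); the function name `h_index` and the sample list are hypothetical, and the page itself does not state how its value was computed.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    Assumes the standard h-index definition; this is not necessarily the
    exact procedure used to produce the figure on this page.
    """
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Illustrative input: the five highest counts from the table above.
sample = [206, 196, 146, 142, 136]
print(h_index(sample))
```

Applied to the full set of citation counts for the four-year window, the same function would give the H4-index under that assumed definition.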