IEEE Transactions on Parallel and Distributed Systems

Papers
(The median citation count of IEEE Transactions on Parallel and Distributed Systems is 5. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
2020 Reviewers List | 300
Enabling Large Scale Simulations for Particle Accelerators | 264
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Tsinghua University | 219
Building High-throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine | 203
Jdebug: A Fast, Non-Intrusive and Scalable Fault Locating Tool for Ten-Million-Scale Parallel Applications | 196
EdgeTB: A Hybrid Testbed for Distributed Machine Learning at the Edge With High Fidelity | 159
Design and Implementation of 2D Convolution on x86/x64 Processors | 154
Replicated Versioned Data Structures for Wide-Area Distributed Systems | 149
STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN Training | 128
A Point Cloud Video Recognition Acceleration Framework Based on Tempo-Spatial Information | 128
An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays | 124
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations | 113
H5Intent: Autotuning HDF5 With User Intent | 105
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach | 102
A Memory-Constraint-Aware List Scheduling Algorithm for Memory-Constraint Heterogeneous Muti-Processor System | 97
On the Message Complexity of Fault-Tolerant Computation: Leader Election and Agreement | 93
QoS-Aware Scheduling of Remote Rendering for Interactive Multimedia Applications in Edge Computing | 91
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds | 87
IPPTS: An Efficient Algorithm for Scientific Workflow Scheduling in Heterogeneous Computing Systems | 85
AW B +-Tree: a Novel Width-based Index Structure Supporting Hybrid Matching for Large-scale Content-based Pub/Sub Systems | 81
Joint Task Scheduling and Containerizing for Efficient Edge Computing | 80
Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization | 80
Coflow Scheduling in Data Centers: Routing and Bandwidth Allocation | 77
Federated Learning With Nesterov Accelerated Gradient | 77
A Case for Pricing Bandwidth: Sharing Datacenter Networks With Cost Dominant Fairness | 76
Multi-Swarm Co-Evolution Based Hybrid Intelligent Optimization for Bi-Objective Multi-Workflow Scheduling in the Cloud | 76
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From University of Washington | 74
Accelerating Data Delivery of Latency-Sensitive Applications in Container Overlay Network | 74
Graph-Centric Performance Analysis for Large-Scale Parallel Applications | 72
Securing Fine-Grained Data Sharing and Erasure in Outsourced Storage Systems | 71
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning | 69
Simple, Fast and Widely Applicable Concurrent Memory Reclamation via Neutralization | 66
LB-Chain: Load-Balanced and Low-Latency Blockchain Sharding via Account Migration | 64
A Pessimistic Fault Diagnosability of Large-Scale Connected Networks via Extra Connectivity | 64
GreenFlow: A Carbon-Efficient Scheduler for Deep Learning Workloads | 62
Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning | 62
Efficient and Automated Deployment Architecture for OpenStack in TianHe SuperComputing Environment | 62
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV | 62
High-Level Data Abstraction and Elastic Data Caching for Data-Intensive AI Applications on Cloud-Native Platforms | 61
A Novel Parallel Algorithm for Sparse Tensor Matrix Chain Multiplication via TCU-Acceleration | 60
Asynchronous Algorithms for Decentralized Resource Allocation Over Directed Networks | 59
Improving the Scalability of GPU Synchronization Primitives | 59
Burst Load Evacuation Based on Dispatching and Scheduling In Distributed Edge Networks | 59
BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription | 58
Tag-Sharer-Fusion Directory: A Scalable Coherence Directory With Flexible Entry Formats | 57
Improved MPC Algorithms for Edit Distance and Ulam Distance | 57
Error-Compensated Sparsification for Communication-Efficient Decentralized Training in Edge Environment | 56
Coordinating Fast Concurrency Adapting With Autoscaling for SLO-Oriented Web Applications | 55
AESM2 Attribute-Based Encrypted Search for Multi-Owner and Multi-User Distributed Systems | 54
Efficient Distributed Approaches to Core Maintenance on Large Dynamic Graphs | 53
Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation | 53
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation | 52
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training | 52
Rings for Privacy: An Architecture for Large Scale Privacy-Preserving Data Mining | 48
Hierarchical Federated Learning With Momentum Acceleration in Multi-Tier Networks | 48
Trusted Model Aggregation With Zero-Knowledge Proofs in Federated Learning | 48
Fine-Grained Performance and Cost Modeling and Optimization for FaaS Applications | 48
Libfork: Portable Continuation-Stealing With Stackless Coroutines | 48
Distributed and Dynamic Service Placement in Pervasive Edge Computing Networks | 47
Analysis of Global and Local Synchronization in Parallel Computing | 46
Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning | 46
Silhouette: Efficient Cloud Configuration Exploration for Large-Scale Analytics | 46
Identifying Degree and Sources of Non-Determinism in MPI Applications Via Graph Kernels | 46
AIDTN: Towards a Real-Time AI Optimized DTN System With NVMeoF | 45
Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets | 45
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication | 45
HashCache: Accelerating Serverless Computing by Skipping Duplicated Function Execution | 45
Scalable, Confidential and Survivable Software Updates | 44
Guest Editorial | 44
SSRAID: A Stripe-Queued and Stripe-Threaded Merging I/O Strategy to Improve Write Performance of Serial Interface SSD RAID | 44
A Runtime and Non-Intrusive Approach to Optimize EDP by Tuning Threads and CPU Frequency for OpenMP Applications | 42
iBalancer: Load-Aware in-Server Flow Scheduling for Sub-Millisecond Tail Latency | 41
Decentralised Data Quality Control in Ground Truth Production for Autonomic Decisions | 41
Congestion Control for Datacenter Networks: A Control-Theoretic Approach | 41
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From the University of Texas at Austin | 41
A Survey of Storage Systems in the RDMA Era | 40
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra | 40
2024 Reviewers List* | 40
Efficient Methods for Mapping Neural Machine Translator on FPGAs | 38
HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems | 37
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap | 37
From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization | 37
FedVeca: Federated Vectorized Averaging on Non-IID Data With Adaptive Bi-Directional Global Objective | 36
HSA-Net: Hidden-State-Aware Networks for High-Precision QoS Prediction | 35
TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism | 34
DePo: Dynamically Offload Expensive Event Processing to the Edge of Cyber-Physical Systems | 34
Joint Coverage-Reliability for Budgeted Edge Application Deployment in Mobile Edge Computing Environment | 33
Optimal Convex Hull Formation on a Grid by Asynchronous Robots With Lights | 33
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor | 32
Liberator: A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs | 32
Bandwidth-Aware Scheduling Repair Techniques in Erasure-Coded Clusters: Design and Analysis | 32
GML: Efficiently Auto-Tuning Flink's Configurations Via Guided Machine Learning | 32
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From Tsinghua University | 32
Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation | 32
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging | 31
HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge | 31
Cost-Efficient Server Configuration and Placement for Mobile Edge Computing | 31
EESaver: Saving Energy Dynamically for Green Multi-Access Edge Computing | 31
An Efficient Algorithm for Hamiltonian Path Embedding of $k$-Ary $n$-Cubes under the Partitioned Edge Fault Model | 30
Blockchain Assisted Decentralized Federated Learning (BLADE-FL): Performance Analysis and Resource Allocation | 30
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems | 30
Deep Reinforcement Learning for Load-Balancing Aware Network Control in IoT Edge Systems | 29
A Framework for Mapping DRL Algorithms With Prioritized Replay Buffer Onto Heterogeneous Platforms | 29
CIA: A Collaborative Integrity Auditing Scheme for Cloud Data With Multi-Replica on Multi-Cloud Storage Providers | 29
Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters | 29
VCSR: An Efficient GPU Memory-Aware Sparse Format | 29
Static Algorithm Allocation with Duplication in Robotic Network Cloud Systems | 29
Accelerated Information Dissemination for Replica Selection in Distributed Key-Value Store Systems | 28
Understanding the Impact of Data Staging for Coupled Scientific Workflows | 28
Design and Implementation of a Criticality- and Heterogeneity-Aware Runtime System for Task-Parallel Applications | 28
Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems | 28
Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints | 28
Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions | 27
Cost-Efficient Workflow Scheduling Algorithm for Applications With Deadline Constraint on Heterogeneous Clouds | 27
Timed Loops for Distributed Storage in Wireless Networks | 27
gIM: GPU Accelerated RIS-Based Influence Maximization Algorithm | 27
Predicting Throughput of Distributed Stochastic Gradient Descent | 26
Revisiting PM-Based B-Tree With Persistent CPU Cache | 26
A Practical Framework for Secure Document Retrieval in Encrypted Cloud File Systems | 26
NetSHa: In-Network Acceleration of LSH-Based Distributed Search | 26
Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs | 26
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Clemson University | 26
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training | 26
Parallel and Distributed Bayesian Network Structure Learning | 25
Distributed Evolution Strategies With Multi-Level Learning for Large-Scale Black-Box Optimization | 25
Efficient Function Queryable and Privacy Preserving Data Aggregation Scheme in Smart Grid | 25
Improving Fairness for SSD Devices through DRAM Over-Provisioning Cache Management | 25
Necessary Feasibility Analysis for Mixed-Criticality Real-Time Embedded Systems | 25
Dynamic GPU Energy Optimization for Machine Learning Training Workloads | 24
A High-Throughput FPGA Accelerator for Short-Read Mapping of the Whole Human Genome | 24
HI-Kyber: A Novel High-Performance Implementation Scheme of Kyber Based on GPU | 24
A GPU Acceleration Framework for Motif and Discord Based Pattern Mining | 24
FedICT: Federated Multi-Task Distillation for Multi-Access Edge Computing | 24
Distributed Adaptive Consensus Tracking Control for Multi-Agent System With Communication Constraints | 24
LOCUS: User-Perceived Delay-Aware Service Placement and User Allocation in MEC Environment | 23
Microservice Deployment in Edge Computing Based on Deep Q Learning | 23
Graphite: Hardware-Aware GNN Reshaping for Acceleration With GPU Tensor Cores | 23
P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs | 23
Deadline and Reliability Aware Multiserver Configuration Optimization for Maximizing Profit | 23
Toward Load-Balanced Redundancy Transitioning for Erasure-Coded Storage | 23
MRCN: Throughput-Oriented Multicast Routing for Customized Network-on-Chips | 22
CERT-DF: A Computing-Efficient and Robust Distributed Deep Forest Framework With Low Communication Overhead | 22
Learning to Schedule Multi-Server Jobs With Fluctuated Processing Speeds | 22
Content Collaborative Caching Strategy in the Edge Maintenance of Communication Network: A Joint Download Delay and Energy Consumption Method | 22
Online Pricing and Trading of Private Data in Correlated Queries | 22
COFFEE: Cross-Layer Optimization for Fast and Efficient Executions of Sinkhorn-Knopp Algorithm on HPC Systems | 22
Subutai: Speeding Up Legacy Parallel Applications Through Data Synchronization | 22
Taking Advantage of the Mistakes: Rethinking Clustered Federated Learning for IoT Anomaly Detection | 22
Monte: SFCs Migration Scheme in the Distributed Programmable Data Plane | 22
UFC2: User-Friendly Collaborative Cloud | 22
A Machine-Learning-Based Framework for Productive Locality Exploitation | 22
ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms | 21
Increasing the Efficiency of Massively Parallel Sparse Matrix-Matrix Multiplication in First-Principles Calculation on the New-Generation Sunway Supercomputer | 21
Propagation Pattern for Moment Representation of the Lattice Boltzmann Method | 21
On the Analysis of Cache Invalidation With LRU Replacement | 21
RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems | 21
Parallel Multi Objective Shortest Path Update Algorithm in Large Dynamic Networks | 21
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning | 21
Estuary: A Low Cross-Shard Blockchain Sharding Protocol Based on State Splitting | 21
Accelerating Bayesian Neural Networks via Algorithmic and Hardware Optimizations | 21
Distributed Approaches to Butterfly Analysis on Large Dynamic Bipartite Graphs | 20
Dap-FL: Federated Learning Flourishes by Adaptive Tuning and Secure Aggregation | 20
Distributed Task Migration Optimization in MEC by Extending Multi-Agent Deep Reinforcement Learning Approach | 20
On Mixing Eventual and Strong Consistency: Acute Cloud Types | 20
Cost-Effective Server Deployment for Multi-Access Edge Networks: A Cooperative Scheme | 20
The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability | 20
YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve | 20
Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion | 20
VQL: Efficient and Verifiable Cloud Query Services for Blockchain Systems | 20
On Model Transmission Strategies in Federated Learning With Lossy Communications | 20
SmartTuning: Selecting Hyper-Parameters of a ConvNet System for Fast Training and Small Working Memory | 19
Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution | 19
Collaboration in Federated Learning With Differential Privacy: A Stackelberg Game Analysis | 19
APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators | 19
Scaling Poisson Solvers on Many Cores via MMEwald | 19
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Peking University | 19
Critique of “A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery” by SCC Team From ShanghaiTech University | 19
LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge | 19
IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing | 19
RLPTO: A Reinforcement Learning-Based Performance-Time Optimized Task and Resource Scheduling Mechanism for Distributed Machine Learning | 18
Gamora: Learning-Based Buffer-Aware Preloading for Adaptive Short Video Streaming | 18
Floating Point Calculation of the Cube Function on FPGAs | 18
Guest Editorial: Special Section on SC22 Student Cluster Competition | 18
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation | 18
Accelerating Content-Defined Chunking for Data Deduplication Based on Speculative Jump | 18
FedTune-SGM: A Stackelberg-Driven Personalized Federated Learning Strategy for Edge Networks | 18
GPABE: GPU-Based Parallelization Framework for Attribute-Based Encryption Schemes | 18
CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs | 18
Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters | 18
Accurate Differentially Private Deep Learning on the Edge | 18
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments | 17
Efficient Virtual Network Embedding of Cloud-Based Data Center Networks into Optical Networks | 17
Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading | 17
SelectiveEC: Towards Balanced Recovery Load on Erasure-Coded Storage Systems | 17
VeriML: Enabling Integrity Assurances and Fair Payments for Machine Learning as a Service | 17
DELICIOUS: Deadline-Aware Approximate Computing in Cache-Conscious Multicore | 17
Adaptive Vertical Federated Learning on Unbalanced Features | 17
CNNPC: End-Edge-Cloud Collaborative CNN Inference With Joint Model Partition and Compression | 17
Auto-GNAS: A Parallel Graph Neural Architecture Search Framework | 17
AdaptChain: Adaptive Scaling Blockchain With Transaction Deduplication | 17
An Unequal Caching Strategy for Shared-Memory Graph Analytics | 16
Redundancy-Free and Load-Balanced TGNN Training With Hierarchical Pipeline Parallelism | 16
Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From ETH Zürich | 16
Shuffle Differential Private Data Aggregation for Random Population | 16
PhaST: Hierarchical Concurrent Log-Free Skip List for Persistent Memory | 16
Reliability-Aware Multi-Objective Memetic Algorithm for Workflow Scheduling Problem in Multi-Cloud System | 16
Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining | 16
Near-Lossless MPI Tracing and Proxy Application Autogeneration | 16
FLUPS - A Flexible and Performant Massively Parallel Fourier Transform Library | 16
Privacy Preserving Task Push in Spatial Crowdsourcing With Unknown Popularity | 16
Mobility-Aware Offloading and Resource Allocation for Distributed Services Collaboration | 16
Online Elastic Resource Provisioning With QoS Guarantee in Container-Based Cloud Computing | 16
Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility | 16
The Doctrine of MEAN: Realizing Deduplication Storage at Unreliable Edge | 15
Synergistically Rebalancing the EDP of Container-Based Parallel Applications | 15
Multi-Tier GPU Virtualization for Deep Learning in Cloud-Edge Systems | 15
Beyond Belady to Attain a Seemingly Unattainable Byte Miss Ratio for Content Delivery Networks | 15
Faster-BNI: Fast Parallel Exact Inference on Bayesian Networks | 15
MoltDB: Accelerating Blockchain via Ancient State Segregation | 15
Frequency-Domain Inference Acceleration for Convolutional Neural Networks Using ReRAMs | 15
Harnessing the Potential of Function-Reuse in Multimedia Cloud Systems | 15
CPLNS: Cooperative Parallel Large Neighborhood Search for Large-Scale Multi-Agent Path Finding | 15
Loci: Federated Continual Learning of Heterogeneous Tasks at Edge | 15
Task Placement and Resource Allocation for Edge Machine Learning: A GNN-Based Multi-Agent Reinforcement Learning Paradigm | 15
Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers | 15
TODG: Distributed Task Offloading With Delay Guarantees for Edge Computing | 15
Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value | 15
FedMDS: An Efficient Model Discrepancy-Aware Semi-Asynchronous Clustered Federated Learning Framework | 15
Accelerating Communication-Efficient Federated Multi-Task Learning With Personalization and Fairness | 15
Toward Materials Genome Big-Data: A Blockchain-Based Secure Storage and Efficient Retrieval Method | 15
SLO-Aware Function Placement for Serverless Workflows With Layer-Wise Memory Sharing | 15
Accelerating Restarted GMRES With Mixed Precision Arithmetic | 14
Guest Editorial | 14
Coordinated Batching and DVFS for DNN Inference on GPU Accelerators | 14
FEUAGame: Fairness-Aware Edge User Allocation for App Vendors | 14
A Distributed Network-Based Runtime Verification of Full Regular Temporal Properties | 14
Detailed Modeling of Heterogeneous and Contention-Constrained Point-to-Point MPI Communication | 14
μBench: An Open-Source Factory of Benchmark Microservice Applications | 14
Transformations of High-Level Synthesis Codes for High-Performance Computing | 14
Practical Cloud-Edge Scheduling for Large-Scale Crowdsourced Live Streaming | 14
Evaluating Data Redistribution in PaRSEC | 14
A Practical and Efficient Bidirectional Access Control Scheme for Cloud-Edge Data Sharing | 14
Real-Time Scheduling of Parallel Task Graphs With Critical Sections Across Different Vertices | 13
Synchronize Only the Immature Parameters: Communication-Efficient Federated Learning By Freezing Parameters Adaptively | 13
e-PoS: Making Proof-of-Stake Decentralized and Fair | 13
Revenue Maximizing Online Service Function Chain Deployment in Multi-Tier Computing Network | 13
Reversible CSP Computations | 13
Highly Accurate Clock Synchronization With Drift Correction for the Controller Area Network | 13
Landlord: Coordinating Dynamic Software Environments to Reduce Container Sprawl | 13
High-Throughput GPU Implementation of Dilithium Post-Quantum Digital Signature | 13