Parallel Computing

Papers
(The median citation count of Parallel Computing is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
Immortal rays: Rethinking random ray neutron transport on GPU architectures | 49
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format | 30
ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight | 24
Porting hypre to heterogeneous computer architectures: Strategies and experiences | 21
Optimal task scheduling for partially heterogeneous systems | 21
Parallel multi-view HEVC for heterogeneously embedded cluster system | 19
Optimizing convolutional neural networks on multi-core vector accelerator | 19
Editorial Board | 18
Compiler-assisted, adaptive runtime system for the support of OpenMP in embedded multicores | 18
Special issue of Selected Papers from EuroMPI/USA 2020 | 18
Task-parallel tiled direct solver for dense symmetric indefinite systems | 17
Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval | 16
Efficient parallel reduction of bandwidth for symmetric matrices | 16
GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 12
Editorial Board | 12
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems | 12
Editorial Board | 12
Parallel and scalable Dunn Index for the validation of big data clusters | 11
Porting WarpX to GPU-accelerated platforms | 10
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance | 10
Program partitioning and deadlock analysis for MPI based on logical clocks | 9
MPI collective communication through a single set of interfaces: A case for orthogonality | 9
Parallel Pattern Compiler for Automatic Global Optimizations | 9
Editorial Board | 9
Multiscale modeling and cinematic visualization of photosynthetic energy conversion processes from electronic to cell scales | 8
Optimizing small channel 3D convolution on GPU with tensor core | 7
Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS | 7
Block red–black MILU(0) preconditioner with relaxation on GPU | 7
Using long vector extensions for MPI reductions | 6
Improving the I/O of large geophysical models using PnetCDF and BeeGFS | 6
Editorial Board | 6
Big data BPMN workflow resource optimization in the cloud | 6
Editorial Board | 6
Integrating FPGA-based hardware acceleration with relational databases | 6
Enabling GPU accelerated computing in the SUNDIALS time integration library | 6
Visualizing the world’s largest turbulence simulation | 6
Editorial Board | 5
Fast calculation of isostatic compensation correction using the GPU-parallel prism method | 5
Federated learning based modulation classification for multipath channels | 5
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters | 5
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU | 5
A parallel non-convex approximation framework for risk parity portfolio design | 5
Editorial Board | 4
Editorial Board | 4
NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers | 4
Evaluating adaptive and predictive power management strategies for optimizing visualization performance on supercomputers | 4
Parallel FFT algorithms for high-order approximations on three-dimensional compact stencils | 4
The BondMachine, a moldable computer architecture | 4
A scalable algorithm for the optimization of neural network architectures | 4
A case study on parallel HDF5 dataset concatenation for high energy physics data analysis | 4
Editorial Board | 4
Measurement and analysis of GPU-accelerated applications with HPCToolkit | 4
Characterizing the performance of node-aware strategies for irregular point-to-point communication on heterogeneous architectures | 4
Parallel Fast Multipole Method accelerated FFT on HPC clusters | 4
A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem | 4
Computational records with aging hardware: Controlling half the output of SHA-256 | 4
On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach | 3
An automated OpenMP mutation testing framework for performance optimization | 3
NVIDIA IndeX accelerated computing for visualizing Cholla's galactic winds | 3
Toward performance-portable PETSc for GPU-based exascale systems | 3
NekRS, a GPU-accelerated spectral element Navier–Stokes solver | 3
A new scalable distributed k-means algorithm based on Cloud micro-services for High-performance computing | 3
Improved probabilistic I/O scheduling for limited-size Burst-Buffers deployed HPC | 3
Multi-GPU 3D k-nearest neighbors computation with application to ICP, point cloud smoothing and normals computation | 3
GPU acceleration of Levenshtein distance computation between long strings | 3
A novel hybrid heuristic-based list scheduling algorithm in heterogeneous cloud computing environment for makespan optimization | 3
PEAB: A pool-based distributed evolutionary algorithm model with buffer | 2
Editorial for parallel computing | 2
Linear solvers for power grid optimization problems: A review of GPU-accelerated linear solvers | 2
WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel | 2
Editorial Board | 2
Editorial Board | 2
Energy-efficient scheduling algorithms based on task clustering in heterogeneous spark clusters | 2
Synthesis and feedback on the distribution and parallelization of FMI-CS-based co-simulations with the DACCOSIM platform | 2
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing | 2
SVM-SMO-SGD: A hybrid-parallel support vector machine algorithm using sequential minimal optimization with stochastic gradient descent | 2
Implementation and evaluation of MPI 4.0 partitioned communication libraries | 2
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach | 2
Uphill resampling for particle filter and its implementation on graphics processing unit | 2
Octopus-DF: Unified DataFrame-based cross-platform data analytic system | 2
Optimization with the OpenACC-to-FPGA framework on the Arria 10 and Stratix 10 FPGAs | 2
Finding inputs that trigger floating-point exceptions in heterogeneous computing via Bayesian optimization | 2
Using heterogeneous GPU nodes with a Cabana-based implementation of MPCD | 2
Spatial- and time- division multiplexing in CNN accelerator | 2
Tree cutting approach for domain partitioning on forest-of-octrees-based block-structured static adaptive mesh refinement with lattice Boltzmann method | 2
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation | 2
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA | 2
Multi-level parallel multi-layer block reproducible summation algorithm | 2
Minimizing development costs for efficient many-core visualization using MCD3 | 2
FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning | 1
Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer | 1
Benchmarking the performance of irregular computations in AutoDock-GPU molecular docking | 1
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning | 1
A method for efficient radio astronomical data gridding on multi-core vector processor | 1
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms | 1
Optimal ATAPE task scheduling on reconfigurable and partitionable hierarchical hypercube networks | 1
Low consumption automatic discovery protocol for DDS-based large-scale distributed parallel computing | 1
Achieving performance portability in Gaussian basis set density functional theory on accelerator based architectures in NWChemEx | 1
OpenMP application experiences: Porting to accelerated nodes | 1
Editorial Board | 1
C-Lop: Accurate contention-based modeling of MPI concurrent communication | 1
Task graph-based performance analysis of parallel-in-time methods | 1
Improving cryptanalytic applications with stochastic runtimes on GPUs and multicores | 1
Towards leveraging collective performance with the support of MPI 4.0 features in MPC | 1
Adaptively parallel runtime verification based on distributed network for temporal properties | 1
Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms | 1
Scalable communication for high-order stencil computations using CUDA-aware MPI | 1
Reconfiguration algorithms for synchronous communication on switch based degradable arrays | 1
Parallelization of network motif discovery using star contraction | 1
Abstractions for C++ code optimizations in parallel high-performance applications | 1
An approach for low-power heterogeneous parallel implementation of ALC-PSO algorithm using OmpSs and CUDA | 1
Editorial on Advances in High Performance Programming | 1
Analyzing the impact of CUDA versions on GPU applications | 1
Accelerating the scheduling of the network resources of the next-generation optical data centers | 1
Distributed software defined network-based fog to fog collaboration scheme | 1
Parallel graph coloring algorithms for distributed GPU environments | 1