Parallel Computing

Papers
(The TQCC of Parallel Computing is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
| Article | Citations |
| --- | --- |
| Immortal rays: Rethinking random ray neutron transport on GPU architectures | 49 |
| Heterogeneous sparse matrix–vector multiplication via compressed sparse row format | 30 |
| ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight | 24 |
| Porting hypre to heterogeneous computer architectures: Strategies and experiences | 21 |
| Optimal task scheduling for partially heterogeneous systems | 21 |
| Parallel multi-view HEVC for heterogeneously embedded cluster system | 19 |
| Optimizing convolutional neural networks on multi-core vector accelerator | 19 |
| Compiler-assisted, adaptive runtime system for the support of OpenMP in embedded multicores | 18 |
| Special issue of Selected Papers from EuroMPI/USA 2020 | 18 |
| Editorial Board | 18 |
| Task-parallel tiled direct solver for dense symmetric indefinite systems | 17 |
| Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval | 16 |
| Efficient parallel reduction of bandwidth for symmetric matrices | 16 |
| GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 12 |
| Editorial Board | 12 |
| Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems | 12 |
| Editorial Board | 12 |
| Parallel and scalable Dunn Index for the validation of big data clusters | 11 |
| Porting WarpX to GPU-accelerated platforms | 10 |
| Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance | 10 |
| Program partitioning and deadlock analysis for MPI based on logical clocks | 9 |
| MPI collective communication through a single set of interfaces: A case for orthogonality | 9 |
| Parallel Pattern Compiler for Automatic Global Optimizations | 9 |
| Editorial Board | 9 |
| Multiscale modeling and cinematic visualization of photosynthetic energy conversion processes from electronic to cell scales | 8 |
| Optimizing small channel 3D convolution on GPU with tensor core | 7 |
| Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS | 7 |
| Block red–black MILU(0) preconditioner with relaxation on GPU | 7 |
| Using long vector extensions for MPI reductions | 6 |
| Improving the I/O of large geophysical models using PnetCDF and BeeGFS | 6 |
| Editorial Board | 6 |
| Big data BPMN workflow resource optimization in the cloud | 6 |
| Editorial Board | 6 |
| Integrating FPGA-based hardware acceleration with relational databases | 6 |
| Enabling GPU accelerated computing in the SUNDIALS time integration library | 6 |
| Visualizing the world’s largest turbulence simulation | 6 |
| Fast calculation of isostatic compensation correction using the GPU-parallel prism method | 5 |
| Federated learning based modulation classification for multipath channels | 5 |
| PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters | 5 |
| QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU | 5 |
| A parallel non-convex approximation framework for risk parity portfolio design | 5 |
| Editorial Board | 5 |
| Editorial Board | 4 |
| NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers | 4 |
| Evaluating adaptive and predictive power management strategies for optimizing visualization performance on supercomputers | 4 |
| Parallel FFT algorithms for high-order approximations on three-dimensional compact stencils | 4 |
| The BondMachine, a moldable computer architecture | 4 |
| A scalable algorithm for the optimization of neural network architectures | 4 |
| A case study on parallel HDF5 dataset concatenation for high energy physics data analysis | 4 |
| Editorial Board | 4 |
| Measurement and analysis of GPU-accelerated applications with HPCToolkit | 4 |
| Characterizing the performance of node-aware strategies for irregular point-to-point communication on heterogeneous architectures | 4 |
| Parallel Fast Multipole Method accelerated FFT on HPC clusters | 4 |
| A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem | 4 |
| Computational records with aging hardware: Controlling half the output of SHA-256 | 4 |
| Editorial Board | 4 |