Parallel Computing

Papers
(The TQCC of Parallel Computing is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
NekRS, a GPU-accelerated spectral element Navier–Stokes solver | 49
Porting WarpX to GPU-accelerated platforms | 30
Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms | 24
OpenMP application experiences: Porting to accelerated nodes | 21
Toward performance-portable PETSc for GPU-based exascale systems | 21
Parallel and scalable Dunn Index for the validation of big data clusters | 19
SVM-SMO-SGD: A hybrid-parallel support vector machine algorithm using sequential minimal optimization with stochastic gradient descent | 18
A novel hybrid heuristic-based list scheduling algorithm in heterogeneous cloud computing environment for makespan optimization | 18
GPU algorithms for Efficient Exascale Discretizations | 17
Linear solvers for power grid optimization problems: A review of GPU-accelerated linear solvers | 17
Benchmarking the performance of irregular computations in AutoDock-GPU molecular docking | 16
A thread-adaptive sparse approximate inverse preconditioning algorithm on multi-GPUs | 16
High performance sparse multifrontal solvers on modern GPUs | 12
Multiscale modeling and cinematic visualization of photosynthetic energy conversion processes from electronic to cell scales | 12
Implementation and evaluation of MPI 4.0 partitioned communication libraries | 12
Porting hypre to heterogeneous computer architectures: Strategies and experiences | 12
Enabling GPU accelerated computing in the SUNDIALS time integration library | 12
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance | 11
Dynamic power management for value-oriented schedulers in power-constrained HPC system | 10
On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach | 10
HBPFP-DC: A parallel frequent itemset mining using Spark | 10
A new scalable distributed k-means algorithm based on Cloud micro-services for High-performance computing | 9
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers | 9
Measurement and analysis of GPU-accelerated applications with HPCToolkit | 9
Achieving performance portability in Gaussian basis set density functional theory on accelerator based architectures in NWChemEx | 9
Ginkgo—A math library designed for platform portability | 8
Towards performance portability in the Spark astrophysical magnetohydrodynamics solver in the Flash-X simulation framework | 7
Exploring GPU acceleration of Deep Neural Networks using Block Circulant Matrices | 7
Callback-based completion notification using MPI Continuations | 7
Scalable communication for high-order stencil computations using CUDA-aware MPI | 7
Low-synch Gram–Schmidt with delayed reorthogonalization for Krylov solvers | 6
Using long vector extensions for MPI reductions | 6
CCF: An efficient SpMV storage format for AVX512 platforms | 6
A computational-graph partitioning method for training memory-constrained DNNs | 6
ImRP: A Predictive Partition Method for Data Skew Alleviation in Spark Streaming Environment | 6
An international survey on MPI users | 6
Optimizing small channel 3D convolution on GPU with tensor core | 6
Graph optimization algorithm for low-latency interconnection networks | 6
Asynchronous parallel stochastic Quasi-Newton methods | 5
NVIDIA IndeX accelerated computing for visualizing Cholla's galactic winds | 5
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems | 5
Collectives in hybrid MPI+MPI code: Design, practice and performance | 5
A case study on parallel HDF5 dataset concatenation for high energy physics data analysis | 5
Optimal task scheduling for partially heterogeneous systems | 5
Towards scaling community detection on distributed-memory heterogeneous systems | 5
Improving the I/O of large geophysical models using PnetCDF and BeeGFS | 5
GPU acceleration of Levenshtein distance computation between long strings | 5
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments | 4
OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA | 4
MPI detach — Towards automatic asynchronous local completion | 4
Parallelization of network motif discovery using star contraction | 4
GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 4
Accelerated molecular dynamics simulation of Silicon Crystals on TaihuLight using OpenACC | 4
Parallel branch and bound algorithm for solving integer linear programming models derived from behavioral synthesis | 4
An optimisation of allreduce communication in message-passing systems | 4
Speedup vs. quality: Asynchronous and cluster-based distributed adaptive genetic algorithms for ordered problems | 4
Minimizing development costs for efficient many-core visualization using MCD3 | 4
Immortal rays: Rethinking random ray neutron transport on GPU architectures | 4
A parallel strategy for density functional theory computations on accelerated nodes | 4
HySet: A hybrid framework for exact set similarity join using a GPU | 4
Performance portability through machine learning guided kernel selection in SYCL libraries | 4
Parallel graph coloring algorithms for distributed GPU environments | 4
Improved probabilistic I/O scheduling for limited-size Burst-Buffers deployed HPC | 4