Parallel Computing

Papers
(The TQCC of Parallel Computing is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
NekRS, a GPU-accelerated spectral element Navier–Stokes solver | 38
Porting WarpX to GPU-accelerated platforms | 25
Programming languages for data-Intensive HPC applications: A systematic mapping study | 18
Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms | 17
Parallel and scalable Dunn Index for the validation of big data clusters | 16
A thread-adaptive sparse approximate inverse preconditioning algorithm on multi-GPUs | 16
SVM-SMO-SGD: A hybrid-parallel support vector machine algorithm using sequential minimal optimization with stochastic gradient descent | 15
OpenMP application experiences: Porting to accelerated nodes | 15
Benchmarking the performance of irregular computations in AutoDock-GPU molecular docking | 14
Toward performance-portable PETSc for GPU-based exascale systems | 14
A novel hybrid heuristic-based list scheduling algorithm in heterogeneous cloud computing environment for makespan optimization | 12
GPU-based parallel multi-objective particle swarm optimization for large swarms and high dimensional problems | 11
LU-Cholesky QR algorithms for thin QR decomposition | 11
GPU algorithms for Efficient Exascale Discretizations | 10
Multiscale modeling and cinematic visualization of photosynthetic energy conversion processes from electronic to cell scales | 10
Implementation and evaluation of MPI 4.0 partitioned communication libraries | 10
Enabling GPU accelerated computing in the SUNDIALS time integration library | 10
Dynamic power management for value-oriented schedulers in power-constrained HPC system | 8
On revisiting energy and performance in microservices applications: A cloud elasticity-driven approach | 8
AIR: Iterative refinement acceleration using arbitrary dynamic precision | 8
Porting hypre to heterogeneous computer architectures: Strategies and experiences | 8
HBPFP-DC: A parallel frequent itemset mining using Spark | 8
Measurement and analysis of GPU-accelerated applications with HPCToolkit | 8
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance | 8
High performance sparse multifrontal solvers on modern GPUs | 7
Linear solvers for power grid optimization problems: A review of GPU-accelerated linear solvers | 7
Graph optimization algorithm for low-latency interconnection networks | 7
A new scalable distributed k-means algorithm based on Cloud micro-services for High-performance computing | 7
AMG based on compatible weighted matching for GPUs | 7
A domain partitioning method using a multi-phase-field model for block-based AMR applications | 6
Towards performance portability in the Spark astrophysical magnetohydrodynamics solver in the Flash-X simulation framework | 6
Achieving performance portability in Gaussian basis set density functional theory on accelerator based architectures in NWChemEx | 6
Callback-based completion notification using MPI Continuations | 6
Scalable communication for high-order stencil computations using CUDA-aware MPI | 6
ImRP: A Predictive Partition Method for Data Skew Alleviation in Spark Streaming Environment | 6
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers | 6
Exploring GPU acceleration of Deep Neural Networks using Block Circulant Matrices | 6
A novel method of grouping target paths for parallel programs | 5
Optimizing small channel 3D convolution on GPU with tensor core | 5
Asynchronous parallel stochastic Quasi-Newton methods | 5
GPU acceleration of Levenshtein distance computation between long strings | 5
Ginkgo—A math library designed for platform portability | 5
Collectives in hybrid MPI+MPI code: Design, practice and performance | 5
Optimal task scheduling for partially heterogeneous systems | 5
An international survey on MPI users | 4
MPI detach — Towards automatic asynchronous local completion | 4
Efficient CGM-based parallel algorithms for the longest common subsequence problem with multiple substring-exclusion constraints | 4
Using long vector extensions for MPI reductions | 4
GPU accelerated parallel reliability-guided digital volume correlation with automatic seed selection based on 3D SIFT | 4
Speedup vs. quality: Asynchronous and cluster-based distributed adaptive genetic algorithms for ordered problems | 4
Low-synch Gram–Schmidt with delayed reorthogonalization for Krylov solvers | 4
An optimisation of allreduce communication in message-passing systems | 4
High performance solution of skew-symmetric eigenvalue problems with applications in solving the Bethe-Salpeter eigenvalue problem | 4
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems | 4
Accelerated molecular dynamics simulation of Silicon Crystals on TaihuLight using OpenACC | 4
A computational-graph partitioning method for training memory-constrained DNNs | 4
Performance portability through machine learning guided kernel selection in SYCL libraries | 4
Minimizing development costs for efficient many-core visualization using MCD3 | 4
WITHDRAWN: Energy-Efficient Routing Technique for Wireless Sensor Networks Using Multiple Mobile Sink Nodes | 4