OOIR: Observatory of International Research

Papers

(The median citation count of Parallel Computing is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)

Article	Citations
Editorial Board	151
Parallel multi-view HEVC for heterogeneously embedded cluster system	45
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format	36
Integrating FPGA-based hardware acceleration with relational databases	29
NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers	24
A parallel non-convex approximation framework for risk parity portfolio design	18
Editorial Board	16
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach	14
LSHDP: Locally sharded heterogeneous data parallel for distributed deep learning	14
C-Lop: Accurate contention-based modeling of MPI concurrent communication	11
EESF: Energy-efficient scheduling framework for deadline-constrained workflows with computation speed estimation method in cloud	11
Evaluating SYCL as a unified programming model for heterogeneous systems	11
Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix	11
Editorial on Advances in High Performance Programming	11
Task graph-based performance analysis of parallel-in-time methods	10
Adaptively parallel runtime verification based on distributed network for temporal properties	9
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning	7
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture	7
New YARN sharing GPU based on graphics memory granularity scheduling	7
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers	7
ShyLU-node: On-node scalable solvers and preconditioners: Recent progress and current performance	7
Editorial Board	7
Efficient parallel reduction of bandwidth for symmetric matrices	6
Special issue of Selected Papers from EuroMPI/USA 2020	6
Optimizing convolutional neural networks on multi-core vector accelerator	6
ParVoro++: A scalable parallel algorithm for constructing 3D Voronoi tessellations based on kd-tree decomposition	6
Editorial Board	5
Multi-level parallelism optimization for two-dimensional convolution vectorization method on multi-core vector accelerator	5
PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters	5
Using Java to create and analyze models of parallel computing systems	5
GPU acceleration of Levenshtein distance computation between long strings	5
Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS	5
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA	5
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing	4
Byzantine-tolerant detection of causality: There is no holy grail	4
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms	4
Spatial- and time- division multiplexing in CNN accelerator	4
Editorial Board	4
Distributed software defined network-based fog to fog collaboration scheme	4
A sleek lock-free hash map in an ERA of safe memory reclamation methods	4
Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters	4
Accelerating the scheduling of the network resources of the next-generation optical data centers	4
Analyzing the impact of CUDA versions on GPU applications	3
Editorial Board	3
Parallel Pattern Compiler for Automatic Global Optimizations	3
Editorial Board	3
Exploring metrics for analyzing dynamic behavior in MPI programs via a coupled-oscillator model	3
Random sketching to enhance the numerical stability of block orthogonalization algorithms for s-step GMRES	3
Editorial Board	3
Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance	3
Editorial Board	3
Optimal ATAPE task scheduling on reconfigurable and partitionable hierarchical hypercube networks	3
A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems	3
Enable cross-iteration parallelism for PIM-based graph processing with vertex-level synchronization	3
Butterfly factorization for vision transformers on multi-IPU systems	3
Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval	3
Cache partitioning for sparse matrix–vector multiplication on the A64FX	2
FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning	2
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU	2
HRPF: A parallel programming framework for recursive algorithms on heterogeneous CPU–GPU systems	2
LSAF: A load-balancing SpGEMM acceleration framework with dynamic package and static partition for multi-core systolic arrays	2
Reconfiguration algorithms for synchronous communication on switch based degradable arrays	2
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation	2
A survey of parallel computing frameworks and optimizations for AI and deep learning	2
NekRS, a GPU-accelerated spectral element Navier–Stokes solver	2
Editorial for parallel computing	2
Analysis of the impact of NUMA node configuration on the performance of offloading computations to GPUs	2
SGPM: A coroutine framework for transaction processing	1
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments	1
Metall: A persistent memory allocator for data-centric analytics	1
Fast calculation of isostatic compensation correction using the GPU-parallel prism method	1
ALBBA: An efficient ALgebraic Bypass BFS Algorithm on long vector architectures	1
An approach for low-power heterogeneous parallel implementation of ALC-PSO algorithm using OmpSs and CUDA	1
Low-synch Gram–Schmidt with delayed reorthogonalization for Krylov solvers	1
OpenACC + Athread collaborative optimization of Silicon-Crystal application on Sunway TaihuLight	1
Accelerating communication for parallel programming models on GPU systems	1
Benchmark of classical disk array and software-defined storage on near-identical hardware	1
Editorial Board	1
Optimizing massively parallel sparse matrix computing on ARM many-core processor	1
WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel	1
A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem	1
Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer	1
Lowering entry barriers to developing custom simulators of distributed applications and platforms with SimGrid	1
An optimal scheduling algorithm considering the transactions worst-case delay for multi-channel hyperledger fabric network	1
Towards scaling community detection on distributed-memory heterogeneous systems	1
Task-parallel tiled direct solver for dense symmetric indefinite systems	1
Big data BPMN workflow resource optimization in the cloud	1
Multi-level parallel multi-layer block reproducible summation algorithm	1
GPU/CUDA-Accelerated gradient growth optimizer for efficient complex numerical global optimization	1
Performance and accuracy predictions of approximation methods for shortest-path algorithms on GPUs	1