ACM Transactions on Architecture and Code Optimization

Papers
(The TQCC of ACM Transactions on Architecture and Code Optimization is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
SMAUG | 31
IR2Vec | 28
Domain-Specific Multi-Level IR Rewriting for GPU | 22
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures | 21
Grus | 18
A Black-box Monitoring Approach to Measure Microservices Runtime Performance | 18
Compiler Support for Sparse Tensor Computations in MLIR | 17
PERI | 17
LLOV | 16
ArmorAll | 16
Dynamic Precision Autotuning with TAFFO | 16
PAVER | 15
SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms | 12
A Case For Intra-rack Resource Disaggregation in HPC | 12
Optimizing the SSD Burst Buffer by Traffic Detection | 11
Inter-kernel Reuse-aware Thread Block Scheduling | 11
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications | 11
Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks | 11
Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF) | 10
Exploiting Parallelism Opportunities with Deep Learning Frameworks | 10
A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels | 10
Dynamic Colocation Policies with Reinforcement Learning | 10
OD-SGD | 10
A Novel, Highly Integrated Simulator for Parallel and Distributed Systems | 9
AsynGraph | 9
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators | 9
An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication | 9
EchoBay | 9
KernelFaRer | 9
Gem5-X | 9
Securing Branch Predictors with Two-Level Encryption | 8
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems | 8
Schedule Synthesis for Halide Pipelines on GPUs | 8
Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor | 8
GEVO | 8
On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond | 8
Performance Evaluation of Intel Optane Memory for Managed Workloads | 8
Bayesian Optimization for Efficient Accelerator Synthesis | 7
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes | 7
Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs | 7
Architecting Optically Controlled Phase Change Memory | 7
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory | 7
PolyDL | 7
GRAM | 7
Informed Prefetching for Indirect Memory Accesses | 6
A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs | 6
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization | 6
Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints | 6
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM | 6
ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer | 6
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration | 5
Scale-out Systolic Arrays | 5
SIMT-X | 5
E-BATCH: Energy-Efficient and High-Throughput RNN Batching | 5
Autotuning Convolutions Is Easier Than You Think | 5
MC-DeF | 5
Energy-efficient In-Memory Address Calculation | 5
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006 | 5
Optimizing Small-Sample Disk Fault Detection Based on LSTM-GAN Model | 5
Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs | 5
Cooperative Software-hardware Acceleration of K-means on a Tightly Coupled CPU-FPGA System | 5
HeapCheck: Low-cost Hardware Support for Memory Safety | 5
Low-precision Logarithmic Number Systems | 5
GraphPEG | 4
Zeroploit | 4
GraphAttack | 4
On Architectural Support for Instruction Set Randomization | 4
Practical Software-Based Shadow Stacks on x86-64 | 4
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory | 4
MemSZ | 4
MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation | 4
Understanding Cache Compression | 4
An FPGA-based Approach to Evaluate Thermal and Resource Management Strategies of Many-core Processors | 4
FastPath_MP | 4
Application-Specific Arithmetic in High-Level Synthesis Tools | 4