ACM Transactions on Architecture and Code Optimization

Papers
(The TQCC of ACM Transactions on Architecture and Code Optimization is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-07-01 to 2024-07-01.)
Article | Citations
SMAUG | 36
IR2Vec | 35
Domain-Specific Multi-Level IR Rewriting for GPU | 25
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures | 23
Grus | 21
PERI | 19
A Black-box Monitoring Approach to Measure Microservices Runtime Performance | 19
Compiler Support for Sparse Tensor Computations in MLIR | 19
LLOV | 18
PAVER | 17
SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms | 17
A Case For Intra-rack Resource Disaggregation in HPC | 16
Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks | 14
Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF) | 13
Inter-kernel Reuse-aware Thread Block Scheduling | 13
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications | 12
KernelFaRer | 12
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators | 12
Securing Branch Predictors with Two-Level Encryption | 11
A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels | 11
OD-SGD | 10
AsynGraph | 10
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM | 10
Exploiting Parallelism Opportunities with Deep Learning Frameworks | 10
Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs | 10
EchoBay | 10
On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond | 10
An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication | 10
PolyDL | 10
Gem5-X | 10
Low-precision Logarithmic Number Systems | 9
GRAM | 9
GEVO | 9
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems | 9
Bayesian Optimization for Efficient Accelerator Synthesis | 8
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory | 8
Schedule Synthesis for Halide Pipelines on GPUs | 8
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes | 8
Architecting Optically Controlled Phase Change Memory | 8
Performance Evaluation of Intel Optane Memory for Managed Workloads | 8
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization | 7
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration | 7
Autotuning Convolutions Is Easier Than You Think | 7
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006 | 6
Scale-out Systolic Arrays | 6
Understanding Cache Compression | 6
Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints | 6
Optimizing Small-Sample Disk Fault Detection Based on LSTM-GAN Model | 6
ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer | 6
Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs | 5
Energy-efficient In-Memory Address Calculation | 5
Practical Software-Based Shadow Stacks on x86-64 | 5
Gretch | 5
Refresh Triggered Computation | 5
MemSZ | 5
E-BATCH: Energy-Efficient and High-Throughput RNN Batching | 5
GraphPEG | 5
FastPath_MP | 5
LargeGraph | 5
Cooperative Software-hardware Acceleration of K-means on a Tightly Coupled CPU-FPGA System | 5
HeapCheck: Low-cost Hardware Support for Memory Safety | 5
MC-DeF | 5
High-performance Deterministic Concurrency Using Lingua Franca | 4
An FPGA-based Approach to Evaluate Thermal and Resource Management Strategies of Many-core Processors | 4
Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache | 4
GraphAttack | 4
On Architectural Support for Instruction Set Randomization | 4
A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks | 4
Zeroploit | 4
Performance and Power Prediction for Concurrent Execution on GPUs | 4
SPX64 | 4
Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators | 4
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory | 4
GPU Domain Specialization via Composable On-Package Architecture | 4
MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation | 4