ACM Transactions on Architecture and Code Optimization

Papers
(The TQCC of ACM Transactions on Architecture and Code Optimization is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2020-04-01 to 2024-04-01.)
Article | Citations
SMAUG | 32
IR2Vec | 30
Domain-Specific Multi-Level IR Rewriting for GPU | 24
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures | 21
Grus | 19
A Black-box Monitoring Approach to Measure Microservices Runtime Performance | 18
ArmorAll | 17
PERI | 17
LLOV | 17
Compiler Support for Sparse Tensor Computations in MLIR | 17
PAVER | 16
Dynamic Precision Autotuning with TAFFO | 16
Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks | 12
A Case For Intra-rack Resource Disaggregation in HPC | 12
SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms | 12
Inter-kernel Reuse-aware Thread Block Scheduling | 12
Securing Branch Predictors with Two-Level Encryption | 11
A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels | 11
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators | 11
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications | 11
OD-SGD | 10
KernelFaRer | 10
Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF) | 10
Exploiting Parallelism Opportunities with Deep Learning Frameworks | 10
GEVO | 9
Gem5-X | 9
EchoBay | 9
An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication | 9
AsynGraph | 9
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM | 9
PolyDL | 8
On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond | 8
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory | 8
Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs | 8
Schedule Synthesis for Halide Pipelines on GPUs | 8
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems | 8
Low-precision Logarithmic Number Systems | 8
Architecting Optically Controlled Phase Change Memory | 8
Performance Evaluation of Intel Optane Memory for Managed Workloads | 8
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization | 7
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes | 7
Bayesian Optimization for Efficient Accelerator Synthesis | 7
GRAM | 7
Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints | 6
ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer | 6
Scale-out Systolic Arrays | 6
A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs | 6
HeapCheck: Low-cost Hardware Support for Memory Safety | 5
GraphPEG | 5
MC-DeF | 5
Refresh Triggered Computation | 5
Cooperative Software-hardware Acceleration of K-means on a Tightly Coupled CPU-FPGA System | 5
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006 | 5
Understanding Cache Compression | 5
Optimizing Small-Sample Disk Fault Detection Based on LSTM-GAN Model | 5
FastPath_MP | 5
Gretch | 5
SIMT-X | 5
E-BATCH: Energy-Efficient and High-Throughput RNN Batching | 5
Autotuning Convolutions Is Easier Than You Think | 5
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration | 5
Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs | 5
Energy-efficient In-Memory Address Calculation | 5
Zeroploit | 4
LargeGraph | 4
MemSZ | 4
An FPGA-based Approach to Evaluate Thermal and Resource Management Strategies of Many-core Processors | 4
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory | 4
GPU Domain Specialization via Composable On-Package Architecture | 4
MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation | 4
Practical Software-Based Shadow Stacks on x86-64 | 4
Performance and Power Prediction for Concurrent Execution on GPUs | 4
GraphAttack | 4
On Architectural Support for Instruction Set Randomization | 4