ACM Transactions on Architecture and Code Optimization

Papers
(The TQCC of ACM Transactions on Architecture and Code Optimization is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
SMAUG | 39
IR2Vec | 39
Domain-Specific Multi-Level IR Rewriting for GPU | 25
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures | 25
Grus | 23
PERI | 21
SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms | 21
A Black-box Monitoring Approach to Measure Microservices Runtime Performance | 20
Compiler Support for Sparse Tensor Computations in MLIR | 20
LLOV | 19
A Case For Intra-rack Resource Disaggregation in HPC | 18
Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks | 18
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications | 18
PAVER | 17
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators | 15
Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF) | 15
PolyDL | 13
Low I/O Intensity-aware Partial GC Scheduling to Reduce Long-tail Latency in SSDs | 13
Gem5-X | 13
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM | 12
KernelFaRer | 12
A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels | 12
An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication | 12
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems | 12
Architecting Optically Controlled Phase Change Memory | 11
Exploiting Parallelism Opportunities with Deep Learning Frameworks | 11
On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond | 10
Scale-out Systolic Arrays | 10
Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration | 9
Performance Evaluation of Intel Optane Memory for Managed Workloads | 9
Low-precision Logarithmic Number Systems | 9
GEVO | 9
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes | 9
Autotuning Convolutions Is Easier Than You Think | 9
GRAM | 9
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization | 8
ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer | 8
Understanding Cache Compression | 8
Bayesian Optimization for Efficient Accelerator Synthesis | 8
Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators | 7
Optimizing Small-Sample Disk Fault Detection Based on LSTM-GAN Model | 7
GraphPEG | 7
A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006 | 6
FastPath_MP | 6
High-performance Deterministic Concurrency Using Lingua Franca | 6
Performance and Power Prediction for Concurrent Execution on GPUs | 6
HeapCheck: Low-cost Hardware Support for Memory Safety | 6
LargeGraph | 6
Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints | 6
Spiking Neural Networks in Spintronic Computational RAM | 5
E-BATCH: Energy-Efficient and High-Throughput RNN Batching | 5
A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks | 5
SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs | 5
Refresh Triggered Computation | 5
MemSZ | 5
MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation | 5
Multi-objective Hardware-aware Neural Architecture Search with Pareto Rank-preserving Surrogate Models | 5
WaFFLe | 5
Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache | 5
Gretch | 5
Energy-efficient In-Memory Address Calculation | 5
Practical Software-Based Shadow Stacks on x86-64 | 5
MC-DeF | 5
GraphAttack | 5
GPU Domain Specialization via Composable On-Package Architecture | 5
GiantVM: A Novel Distributed Hypervisor for Resource Aggregation with DSM-aware Optimizations | 4
An FPGA Overlay for CNN Inference with Fine-grained Flexible Parallelism | 4
ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks | 4
Design and Evaluation of an Ultra Low-power Human-quality Speech Recognition System | 4
YaConv: Convolution with Low Cache Footprint | 4
SPX64 | 4
An FPGA-based Approach to Evaluate Thermal and Resource Management Strategies of Many-core Processors | 4
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory | 4
On Architectural Support for Instruction Set Randomization | 4