IEEE Computer Architecture Letters

(The median citation count of IEEE Computer Architecture Letters is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-04-01 to 2024-04-01.)
Title | Citations
DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator | 95
SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD | 40
RAMBO: Resource Allocation for Microservices Using Bayesian Optimization | 29
GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers | 27
pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning | 23
The Entangling Instruction Prefetcher | 17
Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator | 16
MultiPIM: A Detailed and Configurable Multi-Stack Processing-In-Memory Simulator | 14
A Cross-Stack Approach Towards Defending Against Cryptojacking | 13
Flexion: A Quantitative Metric for Flexibility in DNN Accelerators | 12
Rebasing Instruction Prefetching: An Industry Perspective | 11
HBM3 RAS: Enhancing Resilience at Scale | 10
Cryogenic PIM: Challenges & Opportunities | 9
Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions | 8
TRiM: Tensor Reduction in Memory | 8
STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators | 8
Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles | 8
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 7
A Day In the Life of a Quantum Error | 7
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training | 6
Accelerating Concurrent Priority Scheduling Using Adaptive in-Hardware Task Distribution in Multicores | 6
MCsim: An Extensible DRAM Memory Controller Simulator | 6
Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata | 6
BTB-X: A Storage-Effective BTB Organization | 6
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 5
A Lightweight Memory Access Pattern Obfuscation Framework for NVM | 5
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 5
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 5
LT-PIM: An LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking | 5
Instruction Criticality Based Energy-Efficient Hardware Data Prefetching | 5
Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs | 5
Zero-Copying I/O Stack for Low-Latency SSDs | 4
Hardware Acceleration for GCNs via Bidirectional Fusion | 4
A First-Order Model to Assess Computer Architecture Sustainability | 4
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
Deep Partitioned Training From Near-Storage Computing to DNN Accelerators | 4
Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD | 4
DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches | 4
Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms | 4
Characterizing and Understanding HGNNs on GPUs | 4
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 4
Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure | 4
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing | 3
Managing Prefetchers With Deep Reinforcement Learning | 3
Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture | 3
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs | 3
Infinity Stream: Enabling Transparent and Automated In-Memory Computing | 3
Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching | 3
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures | 3
WPC: Whole-Picture Workload Characterization Across Intermediate Representation, ISA, and Microarchitecture | 3
Characterizing and Understanding Distributed GNN Training on GPUs | 3
Data-Aware Compression of Neural Networks | 3
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs | 3
PIM-GraphSCC: PIM-Based Graph Processing Using Graph’s Community Structures | 2
The Case for Domain-Specialized Branch Predictors for Graph-Processing | 2
DAMARU: A Denial-of-Service Attack on Randomized Last-Level Caches | 2
A Case for Speculative Strength Reduction | 2
Fine-Grained Scheduling in Heterogeneous-ISA Architectures | 2
MQSim-E: An Enterprise SSD Simulator | 2
Scale-Model Simulation | 2
Enabling In-SRAM Pattern Processing With Low-Overhead Reporting Architecture | 2
MPU-Sim: A Simulator for In-DRAM Near-Bank Processing Architectures | 2
Accelerating Graph Processing With Lightweight Learning-Based Data Reordering | 2
The Case for Dynamic Bias in Global Adaptive Routing | 2
Exploring PIM Architecture for High-Performance Graph Pattern Mining | 2
BayesTuner: Leveraging Bayesian Optimization For DNN Inference Configuration Selection | 1
Runtime Support for Accelerating CNN Models on Digital DRAM Processing-in-Memory Hardware | 1
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays | 1
Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics | 1
Multi-Prediction Compression: An Efficient and Scalable Memory Compression Framework for GP-GPU | 1
The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory | 1
LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks | 1
X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands | 1
Aging-Aware Context Switching in Multicore Processors Based on Workload Classification | 1
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models | 1
A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores | 1
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs | 1
Stride Equality Prediction for Value Speculation | 1
FlexScore: Quantifying Flexibility | 1
Energy-Efficient Bayesian Inference Using Bitstream Computing | 1
Modeling DRAM Timing in Parallel Simulators With Immediate-Response Memory Model | 1
Modeling Periodic Energy-Harvesting Computing Systems | 1
SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering | 1
FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack | 1
Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures | 1
A Model for Scalable and Balanced Accelerators for Graph Processing | 1
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations | 1
XLA-NDP: Efficient Scheduling and Code Generation for Deep Learning Model Training on Near-Data Processing Memory | 1
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation | 1
Accelerators & Security: The Socket Approach | 1
Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator | 1
By-Software Branch Prediction in Loops | 1
A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications | 1
Towards Improved Power Management in Cloud GPUs | 1