IEEE Computer Architecture Letters

Papers
(The TQCC of IEEE Computer Architecture Letters is 3. The table below lists papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator | 86
SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD | 39
RAMBO: Resource Allocation for Microservices Using Bayesian Optimization | 29
GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers | 27
pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning | 18
The Entangling Instruction Prefetcher | 16
Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator | 14
MultiPIM: A Detailed and Configurable Multi-Stack Processing-In-Memory Simulator | 13
A Cross-Stack Approach Towards Defending Against Cryptojacking | 12
Flexion: A Quantitative Metric for Flexibility in DNN Accelerators | 11
Rebasing Instruction Prefetching: An Industry Perspective | 10
Cryogenic PIM: Challenges & Opportunities | 9
HBM3 RAS: Enhancing Resilience at Scale | 9
STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators | 8
Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions | 8
A Day In the Life of a Quantum Error | 7
Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles | 7
TRiM: Tensor Reduction in Memory | 7
MCsim: An Extensible DRAM Memory Controller Simulator | 6
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 6
Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata | 6
Accelerating Concurrent Priority Scheduling Using Adaptive in-Hardware Task Distribution in Multicores | 6
Instruction Criticality Based Energy-Efficient Hardware Data Prefetching | 5
BTB-X: A Storage-Effective BTB Organization | 5
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training | 5
Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs | 5
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 5
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 5
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
Characterizing and Understanding HGNNs on GPUs | 4
Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure | 4
Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms | 4
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 4
DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches | 4
A Lightweight Memory Access Pattern Obfuscation Framework for NVM | 4
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 3
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing | 3
Characterizing and Understanding Distributed GNN Training on GPUs | 3
LT-PIM: An LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking | 3
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs | 3
A First-Order Model to Assess Computer Architecture Sustainability | 3
Deep Partitioned Training From Near-Storage Computing to DNN Accelerators | 3
Zero-Copying I/O Stack for Low-Latency SSDs | 3
Hardware Acceleration for GCNs via Bidirectional Fusion | 3
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs | 3
WPC: Whole-Picture Workload Characterization Across Intermediate Representation, ISA, and Microarchitecture | 3
Infinity Stream: Enabling Transparent and Automated In-Memory Computing | 3
Managing Prefetchers With Deep Reinforcement Learning | 3
Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD | 3
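
To make the selection rule concrete, here is a minimal Python sketch of the filter described above the table: keep papers whose citation count is at or above the journal's TQCC, sort them by citations in descending order, and cap the list at 250 entries. The records shown are a small subset copied from the table; the actual CrossRef query that produced this page is not reproduced here, so this illustrates only the threshold filter, not the site's pipeline.

```python
# Sketch of the selection rule: keep papers at or above the TQCC (3),
# most-cited first, capped at 250 entries. Citation counts are assumed
# to come from CrossRef for papers published 2020-03-01 to 2024-03-01.

TQCC = 3  # threshold stated for IEEE Computer Architecture Letters

# Illustrative subset of (title, CrossRef citation count) records from the table.
papers = [
    ("DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator", 86),
    ("SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD", 39),
    ("The Entangling Instruction Prefetcher", 16),
    ("Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD", 3),
]

selected = sorted(
    (p for p in papers if p[1] >= TQCC),  # at or above the threshold
    key=lambda p: p[1],
    reverse=True,
)[:250]  # max. 250 papers

for title, citations in selected:
    print(f"{citations:4d}  {title}")
```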