IEEE Computer Architecture Letters

Papers
(The median citation count of IEEE Computer Architecture Letters is 0. The table below lists papers at or above that threshold, based on CrossRef citation counts, up to a maximum of 250 papers. It covers publications from the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator | 86
SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD | 39
RAMBO: Resource Allocation for Microservices Using Bayesian Optimization | 29
GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers | 27
pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning | 18
The Entangling Instruction Prefetcher | 16
Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator | 14
MultiPIM: A Detailed and Configurable Multi-Stack Processing-In-Memory Simulator | 13
A Cross-Stack Approach Towards Defending Against Cryptojacking | 12
Flexion: A Quantitative Metric for Flexibility in DNN Accelerators | 11
Rebasing Instruction Prefetching: An Industry Perspective | 10
HBM3 RAS: Enhancing Resilience at Scale | 9
Cryogenic PIM: Challenges & Opportunities | 9
STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators | 8
Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions | 8
Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles | 7
TRiM: Tensor Reduction in Memory | 7
A Day In the Life of a Quantum Error | 7
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 6
Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata | 6
Accelerating Concurrent Priority Scheduling Using Adaptive in-Hardware Task Distribution in Multicores | 6
MCsim: An Extensible DRAM Memory Controller Simulator | 6
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training | 5
Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs | 5
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 5
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 5
Instruction Criticality Based Energy-Efficient Hardware Data Prefetching | 5
BTB-X: A Storage-Effective BTB Organization | 5
Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure | 4
Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms | 4
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 4
DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches | 4
A Lightweight Memory Access Pattern Obfuscation Framework for NVM | 4
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
Characterizing and Understanding HGNNs on GPUs | 4
A First-Order Model to Assess Computer Architecture Sustainability | 3
Deep Partitioned Training From Near-Storage Computing to DNN Accelerators | 3
Zero-Copying I/O Stack for Low-Latency SSDs | 3
Hardware Acceleration for GCNs via Bidirectional Fusion | 3
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs | 3
WPC: Whole-Picture Workload Characterization Across Intermediate Representation, ISA, and Microarchitecture | 3
Infinity Stream: Enabling Transparent and Automated In-Memory Computing | 3
Managing Prefetchers With Deep Reinforcement Learning | 3
Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD | 3
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 3
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing | 3
Characterizing and Understanding Distributed GNN Training on GPUs | 3
LT-PIM: An LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking | 3
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs | 3
Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching | 2
Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture | 2
The Case for Domain-Specialized Branch Predictors for Graph-Processing | 2
DAMARU: A Denial-of-Service Attack on Randomized Last-Level Caches | 2
A Case for Speculative Strength Reduction | 2
PIM-GraphSCC: PIM-Based Graph Processing Using Graph’s Community Structures | 2
Enabling In-SRAM Pattern Processing With Low-Overhead Reporting Architecture | 2
Exploring PIM Architecture for High-Performance Graph Pattern Mining | 2
Data-Aware Compression of Neural Networks | 2
Accelerating Graph Processing With Lightweight Learning-Based Data Reordering | 2
The Case for Dynamic Bias in Global Adaptive Routing | 2
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs | 1
Fine-Grained Scheduling in Heterogeneous-ISA Architectures | 1
Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics | 1
Modeling DRAM Timing in Parallel Simulators With Immediate-Response Memory Model | 1
By-Software Branch Prediction in Loops | 1
X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands | 1
Aging-Aware Context Switching in Multicore Processors Based on Workload Classification | 1
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures | 1
A Model for Scalable and Balanced Accelerators for Graph Processing | 1
MPU-Sim: A Simulator for In-DRAM Near-Bank Processing Architectures | 1
Stride Equality Prediction for Value Speculation | 1
Accelerators & Security: The Socket Approach | 1
FlexScore: Quantifying Flexibility | 1
LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks | 1
SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering | 1
A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications | 1
Energy-Efficient Bayesian Inference Using Bitstream Computing | 1
Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures | 1
Runtime Support for Accelerating CNN Models on Digital DRAM Processing-in-Memory Hardware | 1
Multi-Prediction Compression: An Efficient and Scalable Memory Compression Framework for GP-GPU | 1
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation | 1
XLA-NDP: Efficient Scheduling and Code Generation for Deep Learning Model Training on Near-Data Processing Memory | 1
The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory | 1
Modeling Periodic Energy-Harvesting Computing Systems | 1
FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack | 1
Scale-Model Simulation | 1
BayesTuner: Leveraging Bayesian Optimization For DNN Inference Configuration Selection | 1
A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores | 1
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays | 1
On Variable Strength Quantum ECC | 0
Value Locality Based Approximation With ODIN | 0
PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks | 0
Intelligent SSD Firmware for Zero-Overhead Journaling | 0
Exploiting Direct Memory Operands in GPU Instructions | 0
LSim: Fine-Grained Simulation Framework for Large-Scale Performance Evaluation | 0
Characterizing and Understanding Defense Methods for GNNs on GPUs | 0
DRAMA: Commodity DRAM based Content Addressable Memory | 0
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models | 0
Structured Combinators for Efficient Graph Reduction | 0
TURBULENCE: Complexity-effective Out-of-order Execution on GPU with Distance-based ISA | 0
Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory | 0
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations | 0
Revisiting Browser Performance Benchmarking From an Architectural Perspective | 0
Fast Performance Prediction for Efficient Distributed DNN Training | 0
Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators | 0
Speculative Multi-Level Access in LSM Tree-Based KV Store | 0
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems | 0
2021 Index IEEE Computer Architecture Letters Vol. 20 | 0
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity | 0
MQSim-E: An Enterprise SSD Simulator | 0
HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization | 0
Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator | 0
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads | 0
Voltage Noise Mitigation With Barrier Approximation | 0
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains | 0
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models | 0
DNA Pre-Alignment Filter Using Processing Near Racetrack Memory | 0
LMT: Accurate and Resource-Scalable Slowdown Prediction | 0
Enhancing DNN Training Efficiency Via Dynamic Asymmetric Architecture | 0
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator | 0
Adapting In Situ Accelerators for Sparsity with Granular Matrix Reordering | 0
FullPack: Full Vector Utilization for Sub-Byte Quantized Vector-Matrix Multiplication on General Purpose CPUs | 0
A Study of Memory Placement on Hardware-Assisted Tiered Memory Systems | 0
CoreNap: Energy Efficient Core Allocation for Latency-Critical Workloads | 0
Redundant Array of Independent Memory Devices | 0
Ensuring Data Confidentiality in eADR-Based NVM Systems | 0
Design of a High-Performance, High-Endurance Key-Value SSD for Large-Key Workloads | 0
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications | 0
Guard Cache: Creating Noisy Side-Channels | 0
The Jaseci Programming Paradigm and Runtime Stack: Building Scale-Out Production Applications Easy and Fast | 0
Architectural Security Regulation | 0
Guessing Outputs of Dynamically Pruned CNNs Using Memory Access Patterns | 0
DVFaaS: Leveraging DVFS for FaaS Workflows | 0
Improving Energy-efficiency of Capsule Networks on Modern GPUs | 0
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving | 0
Toward Practical 128-Bit General Purpose Microarchitectures | 0
SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors | 0
Probability-Based Address Translation for Flash SSDs | 0
SmartIndex: Learning to Index Caches to Improve Performance | 0
Achieving Forward Progress Guarantee in Small Hardware Transactions | 0
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models | 0
2020 Index IEEE Computer Architecture Letters Vol. 19 | 0
Exploiting Intrinsic Redundancies in Dynamic Graph Neural Networks for Processing Efficiency | 0
Primate: A Framework to Automatically Generate Soft Processors for Network Applications | 0
Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers | 0
Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling | 0
Architectural Implications of GNN Aggregation Programming Abstractions | 0
Hy-Sched: A Simple Hyperthreading-Aware Thread to Core Allocation Strategy | 0
Hardware-Implemented Lightweight Accelerator for Large Integer Polynomial Multiplication | 0
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping | 0
In-Memory Versioning (IMV) | 0
Open-Source Hardware Memory Protection Engine Integrated With NVMM Simulator | 0
Overcoming Memory Capacity Wall of GPUs With Heterogeneous Memory Stack | 0
Containerized In-Storage Processing Model and Hardware Acceleration for Fully-Flexible Computational SSDs | 0
NoHammer: Preventing Row Hammer With Last-Level Cache Management | 0
Exploring the Latency Sensitivity of Cache Replacement Policies | 0
Pulley: An Algorithm/Hardware Co-Optimization for In-Memory Sorting | 0
Tulip: Turn-Free Low-Power Network-on-Chip | 0
An Intermediate Language for General Sparse Format Customization | 0
SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs | 0
Smart Memory: Deep Learning Acceleration In 3D-Stacked Memories | 0
ADT: Aggressive Demotion and Promotion for Tiered Memory | 0
SoCurity: A Design Approach for Enhancing SoC Security | 0
Inter-Temperature Bandwidth Reduction in Cryogenic QAOA Machines | 0
Learned Performance Model for SSD | 0
A Quantum Computer Trusted Execution Environment | 0
Hardware Trojan Threats to Cache Coherence in Modern 2.5D Chiplet Systems | 0
R.i.p. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead | 0
The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging “Attacks” on MIRAGE | 0
RIO: ROB-Centric In-Order Modeling of Out-of-Order Processors | 0
Baobab Merkle Tree for Efficient Secure Memory | 0
Kobold: Simplified Cache Coherence for Cache-Attached Accelerators | 0
Towards Improved Power Management in Cloud GPUs | 0
JANM-IK: Jacobian Argumented Nelder-Mead Algorithm for Inverse Kinematics and Its Hardware Acceleration | 0
UDIR: Towards a Unified Compiler Framework for Reconfigurable Dataflow Architectures | 0
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System | 0
Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists | 0
TokenSmart: Distributed, Scalable Power Management in the Many-Core Era | 0
LINAC: A Spatially Linear Accelerator for Convolutional Neural Networks | 0
Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity | 0
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management | 0
Direct-Coding DNA With Multilevel Parallelism | 0
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs | 0