IEEE Computer Architecture Letters

Papers
(The median citation count of IEEE Computer Architecture Letters is 1. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article | Citations
Old is Gold: Optimizing Single-Threaded Applications With ExGen-Malloc | 60
The Architectural Sustainability Indicator | 33
Toward Practical 128-Bit General Purpose Microarchitectures | 19
Speculative Multi-Level Access in LSM Tree-Based KV Store | 19
A Characterization of Generative Recommendation Models: Study of Hierarchical Sequential Transduction Unit | 17
Characterization and Analysis of Text-to-Image Diffusion Models | 17
Accelerating Programmable Bootstrapping Targeting Contemporary GPU Microarchitecture | 16
SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication | 16
Time Series Machine Learning Models for Precise SSD Access Latency Prediction | 16
Context-Aware Set Dueling for Dynamic Policy Arbitration | 15
MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference | 13
A Quantitative Analysis of Mamba-2-Based Large Language Model: Study of State Space Duality | 13
In-Depth Characterization of Machine Learning on an Optimized Multi-Party Computing Library | 13
Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure | 13
SoCurity: A Design Approach for Enhancing SoC Security | 10
OASIS: Outlier-Aware KV Cache Clustering for Scaling LLM Inference in CXL Memory Systems | 10
Improving Energy-Efficiency of Capsule Networks on Modern GPUs | 10
Straw: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs | 10
2021 Index IEEE Computer Architecture Letters Vol. 20 | 10
AiDE: Attention-FFN Disaggregated Execution for Cost-Effective LLM Decoding on CXL-PNM | 10
Exploring KV Cache Quantization in Multimodal Large Language Model Inference | 9
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System | 9
Exploring the DIMM PIM Architecture for Accelerating Time Series Analysis | 9
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs | 8
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing | 7
REDIT: Redirection-Enabled Memory-Side Directory Architecture for CXL Memory Fabric | 7
StreamDQ: HBM-Integrated On-the-Fly DeQuantization via Memory Load for Large Language Models | 7
In-Memory Versioning (IMV) | 7
Disaggregated Speculative Decoding for Carbon-Efficient LLM Serving | 6
Improving Performance on Tiered Memory With Semantic Data Placement | 6
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping | 6
Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference | 6
Security Helper Chiplets: A New Paradigm for Secure Hardware Monitoring | 6
Thread-Adaptive: High-Throughput Parallel Architectures of SLH-DSA on GPUs | 6
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | 6
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM | 6
NoHammer: Preventing Row Hammer With Last-Level Cache Management | 5
Efficient Deadlock Avoidance by Considering Stalling, Message Dependencies, and Topology | 5
Managing Prefetchers With Deep Reinforcement Learning | 5
pNet-gem5: Full-System Simulation With High-Performance Networking Enabled by Parallel Network Packet Processing | 5
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity | 5
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads | 5
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications | 5
High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network | 5
Memory-Centric MCM-GPU Architecture | 5
RAESC: A Reconfigurable AES Countermeasure Architecture for RISC-V With Enhanced Power Side-Channel Resilience | 5
SparseLeakyNets: Classification Prediction Attack Over Sparsity-Aware Embedded Neural Networks Using Timing Side-Channel Information | 4
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems | 4
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency | 4
Primate: A Framework to Automatically Generate Soft Processors for Network Applications | 4
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs | 4
PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks | 4
A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems | 4
Fast Performance Prediction for Efficient Distributed DNN Training | 4
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains | 4
SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors | 3
Exploring Volatile FPGAs Potential for Accelerating Energy-Harvesting IoT Applications | 3
LeakDiT: Diffusion Transformers for Trace-Augmented Side-Channel Analysis | 3
SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs | 3
Camulator: A Lightweight and Extensible Trace-Driven Cache Simulator for Embedded Multicore SoCs | 3
Understanding the Performance Behaviors of End-to-End Protein Design Pipelines on GPUs | 3
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving | 3
CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS With Intel AMX | 3
Enabling Cost-Efficient LLM Inference on Mid-Tier GPUs With NMP DIMMs | 3
Guard Cache: Creating Noisy Side-Channels | 3
Accelerators & Security: The Socket Approach | 3
Direct-Coding DNA With Multilevel Parallelism | 3
A Quantum Computer Trusted Execution Environment | 3
Minimal Counters, Maximum Insight: Simplifying System Performance With HPC Clusters for Optimized Monitoring | 2
A First-Order Model to Assess Computer Architecture Sustainability | 2
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models | 2
On Internally Tagged Instruction Set Architectures | 2
FullPack: Full Vector Utilization for Sub-Byte Quantized Matrix-Vector Multiplication on General Purpose CPUs | 2
A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores | 2
Accelerating Page Migrations in Operating Systems With Intel DSA | 2
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations | 2
IntervalSim++: Enhanced Interval Simulation for Unbalanced Processor Designs | 2
PINSim: A Processing In- and Near-Sensor Simulator to Model Intelligent Vision Sensors | 2
EgDiff: An Enhanced Global Load Value Predictor | 2
R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead | 2
Cost-Effective Extension of DRAM-PIM for Group-Wise LLM Quantization | 2
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures | 2
Redundant Array of Independent Memory Devices | 2
Approximate Multiplier Design With LFSR-Based Stochastic Sequence Generators for Edge AI | 2
Architectural Implications of GNN Aggregation Programming Abstractions | 2
Overcoming Memory Capacity Wall of GPUs With Heterogeneous Memory Stack | 2
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 2
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation | 2
Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads | 2
Analyzing and Exploiting Memory Hierarchy Parallelism With MLP Stacks | 2
Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture | 2
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 2
Enhancing DNN Training Efficiency Via Dynamic Asymmetric Architecture | 2
Energy-Efficient Bayesian Inference Using Bitstream Computing | 2
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 2
Characterization and Analysis of the 3D Gaussian Splatting Rendering Pipeline | 1
Halis: A Hardware-Software Co-Designed Near-Cache Accelerator for Graph Pattern Mining | 1
An Intermediate Language for General Sparse Format Customization | 1
Exploiting Intel AMX Power Gating | 1
Exploiting Direct Memory Operands in GPU Instructions | 1
Low-Latency PIM Accelerator for Edge LLM Inference | 1
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management | 1
Architectural Security Regulation | 1
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 1
MQSim-E: An Enterprise SSD Simulator | 1
Characterizing and Understanding HGNNs on GPUs | 1
A Partial Tag–Data Decoupled Architecture for Last-Level Cache Optimization | 1
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models | 1
MOST: Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage | 1
Enhancing DCIM Efficiency with Multi-Storage-Row Architecture for Edge AI Workloads | 1
HINT: A Hardware Platform for Intra-Host NIC Traffic and SmartNIC Emulation | 1
On Variable Strength Quantum ECC | 1
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator | 1
TeleVM: A Lightweight Virtual Machine for RISC-V Architecture | 1
Approximate SFQ-Based Computing Architecture Modeling With Device-Level Guidelines | 1
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays | 1
A Multiple-Aspect Optimal CNN Accelerator in Top1 Accuracy, Performance, and Power Efficiency | 1
Pyramid: Accelerating LLM Inference With Cross-Level Processing-in-Memory | 1
MajorK: Majority Based kmer Matching in Commodity DRAM | 1
A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications | 1
Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System | 1
CGR-NPU: A Hybrid CGRA and NPU Architecture for Adaptive Neural Computing Workloads | 1
A Data Prefetcher-Based 1000-Core RISC-V Processor for Efficient Processing of Graph Neural Networks | 1
Cache and Near-Data Co-Design for Chiplets | 1
A Case for Hardware Memoization in Server CPUs | 1
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs | 1
MixDiT: Accelerating Image Diffusion Transformer Inference With Mixed-Precision MX Quantization | 1
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models | 1
LSim: Fine-Grained Simulation Framework for Large-Scale Performance Evaluation | 1
Tulip: Turn-Free Low-Power Network-on-Chip | 1
Electra: Eliminating the Ineffectual Computations on Bitmap Compressed Matrices | 1
X-PPR: Post Package Repair for CXL Memory | 1
Characterizing and Understanding Distributed GNN Training on GPUs | 1
Amethyst: Reducing Data Center Emissions With Dynamic Autotuning and VM Management | 1
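The selection rule stated above (keep papers from the date window whose CrossRef citation count exceeds the journal's median, capped at 250 entries) can be sketched in Python. This is a minimal illustration, not the site's actual pipeline; the `papers` tuple format and the `select_papers` helper are hypothetical.

```python
from datetime import date

def select_papers(papers, start=date(2022, 1, 1), end=date(2026, 1, 1), cap=250):
    """Sketch of the listing rule: `papers` is a hypothetical list of
    (title, published_date, citations) tuples."""
    # Restrict to the publication window.
    in_window = [p for p in papers if start <= p[1] < end]
    counts = sorted(c for _, _, c in in_window)
    n = len(counts)
    # Median citation count (mean of the two middle values when n is even).
    median = counts[n // 2] if n % 2 else (counts[n // 2 - 1] + counts[n // 2]) / 2
    # Keep only papers strictly above the median, most-cited first, capped.
    above = [p for p in in_window if p[2] > median]
    above.sort(key=lambda p: -p[2])
    return above[:cap]
```

With the median at 1, as stated above, this keeps exactly the papers cited 2 or more times.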