IEEE Computer Architecture Letters

Papers
(The TQCC of IEEE Computer Architecture Letters is 3. The table below lists the papers at or above that threshold based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
Article | Citations
Old is Gold: Optimizing Single-Threaded Applications With ExGen-Malloc | 90
The Architectural Sustainability Indicator | 24
Speculative Multi-Level Access in LSM Tree-Based KV Store | 23
Toward Practical 128-Bit General Purpose Microarchitectures | 21
A Characterization of Generative Recommendation Models: Study of Hierarchical Sequential Transduction Unit | 21
Exploration of Algorithm-Hardware Co-Design for Floating-Point Digital Compute-in-Memory | 20
Characterization and Analysis of Text-to-Image Diffusion Models | 20
Accelerating Programmable Bootstrapping Targeting Contemporary GPU Microarchitecture | 19
Time Series Machine Learning Models for Precise SSD Access Latency Prediction | 17
De-Quantization Penalties for Interactive LLM Inference on Prosumer GPUs | 17
SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication | 16
Context-Aware Set Dueling for Dynamic Policy Arbitration | 13
Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure | 13
In-Depth Characterization of Machine Learning on an Optimized Multi-Party Computing Library | 12
A Quantitative Analysis of Mamba-2-Based Large Language Model: Study of State Space Duality | 12
MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference | 12
AiDE: Attention-FFN Disaggregated Execution for Cost-Effective LLM Decoding on CXL-PNM | 11
OASIS: Outlier-Aware KV Cache Clustering for Scaling LLM Inference in CXL Memory Systems | 10
Exploring KV Cache Quantization in Multimodal Large Language Model Inference | 10
Improving Energy-Efficiency of Capsule Networks on Modern GPUs | 10
SoCurity: A Design Approach for Enhancing SoC Security | 10
Straw: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs | 10
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System | 10
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs | 9
In-Memory Versioning (IMV) | 9
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing | 8
REDIT: Redirection-Enabled Memory-Side Directory Architecture for CXL Memory Fabric | 8
StreamDQ: HBM-Integrated On-the-Fly DeQuantization via Memory Load for Large Language Models | 8
Exploring the DIMM PIM Architecture for Accelerating Time Series Analysis | 7
Disaggregated Speculative Decoding for Carbon-Efficient LLM Serving | 7
Enabling Computation and Communication Overlap in PIMs for On-Device LLM Inference | 7
Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference | 6
Improving Performance on Tiered Memory With Semantic Data Placement | 6
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping | 6
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | 6
Security Helper Chiplets: A New Paradigm for Secure Hardware Monitoring | 6
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM | 6
Thread-Adaptive: High-Throughput Parallel Architectures of SLH-DSA on GPUs | 6
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications | 6
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity | 5
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency | 5
Managing Prefetchers With Deep Reinforcement Learning | 5
NoHammer: Preventing Row Hammer With Last-Level Cache Management | 5
Efficient Deadlock Avoidance by Considering Stalling, Message Dependencies, and Topology | 5
SparseLeakyNets: Classification Prediction Attack Over Sparsity-Aware Embedded Neural Networks Using Timing Side-Channel Information | 5
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads | 5
pNet-gem5: Full-System Simulation With High-Performance Networking Enabled by Parallel Network Packet Processing | 5
High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network | 5
Memory-Centric MCM-GPU Architecture | 5
Hisui: Unlocking Tiered Memory Efficiency for FaaS Workloads | 4
PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks | 4
ReplayOpt: Optimizer-State Replay to Resolve Critical-Path Bottlenecks in Offloaded Training | 4
A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems | 4
Primate: A Framework to Automatically Generate Soft Processors for Network Applications | 4
RAESC: A Reconfigurable AES Countermeasure Architecture for RISC-V With Enhanced Power Side-Channel Resilience | 4
H3: Hybrid Architecture Using High Bandwidth Memory | 4
ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs | 4
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
Nighthawk: Zero-Copy Cache Quarantine for Invisible Speculation | 4
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains | 4
Xami: Expert-Aware Adaptive Compression for Mi | 4
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems | 4
Enabling Cost-Efficient LLM Inference on Mid-Tier GPUs With NMP DIMMs | 3
Driving the Core Frontend With LiteBTB | 3
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving | 3
Fast Performance Prediction for Efficient Distributed DNN Training | 3
Exploring Volatile FPGAs Potential for Accelerating Energy-Harvesting IoT Applications | 3
Fast Inter-Enclave Communication Encryption | 3
Guard Cache: Creating Noisy Side-Channels | 3
Direct-Coding DNA With Multilevel Parallelism | 3
Accelerators & Security: The Socket Approach | 3
SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors | 3
Understanding the Performance Behaviors of End-to-End Protein Design Pipelines on GPUs | 3
A Quantum Computer Trusted Execution Environment | 3
LeakDiT: Diffusion Transformers for Trace-Augmented Side-Channel Analysis | 3
Camulator: A Lightweight and Extensible Trace-Driven Cache Simulator for Embedded Multicore SoCs | 3