IEEE Computer Architecture Letters

Papers
(The median citation count of IEEE Computer Architecture Letters is 1. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
Article | Citations
Old is Gold: Optimizing Single-Threaded Applications With ExGen-Malloc | 90
The Architectural Sustainability Indicator | 24
Speculative Multi-Level Access in LSM Tree-Based KV Store | 23
A Characterization of Generative Recommendation Models: Study of Hierarchical Sequential Transduction Unit | 21
Toward Practical 128-Bit General Purpose Microarchitectures | 21
Characterization and Analysis of Text-to-Image Diffusion Models | 20
Exploration of Algorithm-Hardware Co-Design for Floating-Point Digital Compute-in-Memory | 20
Accelerating Programmable Bootstrapping Targeting Contemporary GPU Microarchitecture | 19
Time Series Machine Learning Models for Precise SSD Access Latency Prediction | 17
De-Quantization Penalties for Interactive LLM Inference on Prosumer GPUs | 17
SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication | 16
Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure | 13
Context-Aware Set Dueling for Dynamic Policy Arbitration | 13
A Quantitative Analysis of Mamba-2-Based Large Language Model: Study of State Space Duality | 12
MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference | 12
In-Depth Characterization of Machine Learning on an Optimized Multi-Party Computing Library | 12
AiDE: Attention-FFN Disaggregated Execution for Cost-Effective LLM Decoding on CXL-PNM | 11
Improving Energy-Efficiency of Capsule Networks on Modern GPUs | 10
SoCurity: A Design Approach for Enhancing SoC Security | 10
Straw: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs | 10
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System | 10
OASIS: Outlier-Aware KV Cache Clustering for Scaling LLM Inference in CXL Memory Systems | 10
Exploring KV Cache Quantization in Multimodal Large Language Model Inference | 10
In-Memory Versioning (IMV) | 9
RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs | 9
REDIT: Redirection-Enabled Memory-Side Directory Architecture for CXL Memory Fabric | 8
StreamDQ: HBM-Integrated On-the-Fly DeQuantization via Memory Load for Large Language Models | 8
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing | 8
Disaggregated Speculative Decoding for Carbon-Efficient LLM Serving | 7
Enabling Computation and Communication Overlap in PIMs for On-Device LLM Inference | 7
Exploring the DIMM PIM Architecture for Accelerating Time Series Analysis | 7
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | 6
Security Helper Chiplets: A New Paradigm for Secure Hardware Monitoring | 6
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM | 6
Thread-Adaptive: High-Throughput Parallel Architectures of SLH-DSA on GPUs | 6
Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications | 6
Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference | 6
Improving Performance on Tiered Memory With Semantic Data Placement | 6
Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping | 6
Managing Prefetchers With Deep Reinforcement Learning | 5
NoHammer: Preventing Row Hammer With Last-Level Cache Management | 5
Efficient Deadlock Avoidance by Considering Stalling, Message Dependencies, and Topology | 5
SparseLeakyNets: Classification Prediction Attack Over Sparsity-Aware Embedded Neural Networks Using Timing Side-Channel Information | 5
LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads | 5
pNet-gem5: Full-System Simulation With High-Performance Networking Enabled by Parallel Network Packet Processing | 5
High-Performance Winograd Based Accelerator Architecture for Convolutional Neural Network | 5
Memory-Centric MCM-GPU Architecture | 5
DeMM: A Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity | 5
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency | 5
RAESC: A Reconfigurable AES Countermeasure Architecture for RISC-V With Enhanced Power Side-Channel Resilience | 4
H3: Hybrid Architecture Using High Bandwidth Memory | 4
ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs | 4
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
Nighthawk: Zero-Copy Cache Quarantine for Invisible Speculation | 4
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains | 4
Xami: Expert-Aware Adaptive Compression for Mi | 4
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems | 4
Hisui: Unlocking Tiered Memory Efficiency for FaaS Workloads | 4
PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks | 4
ReplayOpt: Optimizer-State Replay to Resolve Critical-Path Bottlenecks in Offloaded Training | 4
A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems | 4
Primate: A Framework to Automatically Generate Soft Processors for Network Applications | 4
Exploring Volatile FPGAs Potential for Accelerating Energy-Harvesting IoT Applications | 3
Fast Inter-Enclave Communication Encryption | 3
Guard Cache: Creating Noisy Side-Channels | 3
Direct-Coding DNA With Multilevel Parallelism | 3
Accelerators & Security: The Socket Approach | 3
SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors | 3
Understanding the Performance Behaviors of End-to-End Protein Design Pipelines on GPUs | 3
A Quantum Computer Trusted Execution Environment | 3
LeakDiT: Diffusion Transformers for Trace-Augmented Side-Channel Analysis | 3
Camulator: A Lightweight and Extensible Trace-Driven Cache Simulator for Embedded Multicore SoCs | 3
Enabling Cost-Efficient LLM Inference on Mid-Tier GPUs With NMP DIMMs | 3
Driving the Core Frontend With LiteBTB | 3
T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving | 3
Fast Performance Prediction for Efficient Distributed DNN Training | 3
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation | 2
Overcoming Memory Capacity Wall of GPUs With Heterogeneous Memory Stack | 2
Energy-Efficient Bayesian Inference Using Bitstream Computing | 2
Minimal Counters, Maximum Insight: Simplifying System Performance With HPC Clusters for Optimized Monitoring | 2
Approximate Multiplier Design With LFSR-Based Stochastic Sequence Generators for Edge AI | 2
Characterization and Analysis of the 3D Gaussian Splatting Rendering Pipeline | 2
Analyzing and Exploiting Memory Hierarchy Parallelism With MLP Stacks | 2
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 2
FullPack: Full Vector Utilization for Sub-Byte Quantized Matrix-Vector Multiplication on General Purpose CPUs | 2
Redundant Array of Independent Memory Devices | 2
IntervalSim++: Enhanced Interval Simulation for Unbalanced Processor Designs | 2
LWAL: Lightweight Adaptive Learning-Driven Cache Bypassing for GPUs | 2
Architectural Implications of GNN Aggregation Programming Abstractions | 2
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 2
CABANA: Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS With Intel AMX | 2
PINSim: A Processing In- and Near-Sensor Simulator to Model Intelligent Vision Sensors | 2
Enhancing DNN Training Efficiency Via Dynamic Asymmetric Architecture | 2
Capacity-Latency Tradeoffs in CXL Memory Expander at Hyperscale | 2
Enhancing DCIM Efficiency with Multi-Storage-Row Architecture for Edge AI Workloads | 2
Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System | 2
A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key-Value Stores | 2
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures | 2
A First-Order Model to Assess Computer Architecture Sustainability | 2
Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads | 2
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations | 2
EgDiff: An Enhanced Global Load Value Predictor | 2
SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs | 2
On Internally Tagged Instruction Set Architectures | 2
Cost-Effective Extension of DRAM-PIM for Group-Wise LLM Quantization | 2
Halis: A Hardware-Software Co-Designed Near-Cache Accelerator for Graph Pattern Mining | 2
R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead | 2
Accelerating Page Migrations in Operating Systems With Intel DSA | 2
Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture | 2
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 2
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models | 1
Approximate SFQ-Based Computing Architecture Modeling With Device-Level Guidelines | 1
SPGPU: Spatially Programmed GPU | 1
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 1
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs | 1
Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD | 1
A Data Prefetcher-Based 1000-Core RISC-V Processor for Efficient Processing of Graph Neural Networks | 1
Hashing ATD Tags for Low-Overhead Safe Contention Monitoring | 1
Tulip: Turn-Free Low-Power Network-on-Chip | 1
SPAM: Streamlined Prefetcher-Aware Multi-Threaded Cache Covert-Channel Attack | 1
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 1
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays | 1
Characterizing and Understanding HGNNs on GPUs | 1
Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management | 1
An Intermediate Language for General Sparse Format Customization | 1
Amethyst: Reducing Data Center Emissions With Dynamic Autotuning and VM Management | 1
Architectural Security Regulation | 1
Intelligent SSD Firmware for Zero-Overhead Journaling | 1
On Variable Strength Quantum ECC | 1
Structured Combinators for Efficient Graph Reduction | 1
A Partial Tag–Data Decoupled Architecture for Last-Level Cache Optimization | 1
JBOC: Just a Bunch of CXL-enabled SSDs for Resource-Efficient LLM Checkpointing | 1
Exploiting Intel AMX Power Gating | 1
I/O-ETEM: An I/O-Aware Approach for Estimating Execution Time of Machine Learning Workloads | 1
Exploiting Direct Memory Operands in GPU Instructions | 1
MixDiT: Accelerating Image Diffusion Transformer Inference With Mixed-Precision MX Quantization | 1
A Multiple-Aspect Optimal CNN Accelerator in Top1 Accuracy, Performance, and Power Efficiency | 1
TeleVM: A Lightweight Virtual Machine for RISC-V Architecture | 1
MajorK: Majority Based kmer Matching in Commodity DRAM | 1
A Case for Hardware Memoization in Server CPUs | 1
CGR-NPU: A Hybrid CGRA and NPU Architecture for Adaptive Neural Computing Workloads | 1
Efficient MoE Model Fine-tuning on Commodity GPU Server with Offloading | 1
Electra: Eliminating the Ineffectual Computations on Bitmap Compressed Matrices | 1
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models | 1
EONSim: An NPU Simulator for On-Chip Memory and Embedding Vector Operations | 1
GPU-Centric Memory Tiering for LLM Serving With NVIDIA Grace Hopper Superchip | 1
Low-Latency PIM Accelerator for Edge LLM Inference | 1
Fusing Adds and Shifts for Efficient Dot Products | 1
Pyramid: Accelerating LLM Inference With Cross-Level Processing-in-Memory | 1
X-PPR: Post Package Repair for CXL Memory | 1
HINT: A Hardware Platform for Intra-Host NIC Traffic and SmartNIC Emulation | 1
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator | 1
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models | 1
Stardust: Scalable and Transferable Workload Mapping for Large AI on Multi-Chiplet Systems | 1
GEMM the New Gem: The Inevitable Kernel and its Sensitivity to Compiler Optimizations and Libraries | 1
Cache and Near-Data Co-Design for Chiplets | 1
MOST: Memory Oversubscription-Aware Scheduling for Tensor Migration on GPU Unified Storage | 1