IEEE Computer Architecture Letters

Papers
(The TQCC of IEEE Computer Architecture Letters is 3. The table below lists papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
Article | Citations
DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator | 86
SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD | 39
RAMBO: Resource Allocation for Microservices Using Bayesian Optimization | 29
GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers | 27
pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning | 18
The Entangling Instruction Prefetcher | 16
Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator | 14
MultiPIM: A Detailed and Configurable Multi-Stack Processing-In-Memory Simulator | 13
A Cross-Stack Approach Towards Defending Against Cryptojacking | 12
Flexion: A Quantitative Metric for Flexibility in DNN Accelerators | 11
Rebasing Instruction Prefetching: An Industry Perspective | 10
Cryogenic PIM: Challenges & Opportunities | 9
HBM3 RAS: Enhancing Resilience at Scale | 9
STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators | 8
Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions | 8
A Day In the Life of a Quantum Error | 7
Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles | 7
TRiM: Tensor Reduction in Memory | 7
MCsim: An Extensible DRAM Memory Controller Simulator | 6
Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs | 6
Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata | 6
Accelerating Concurrent Priority Scheduling Using Adaptive in-Hardware Task Distribution in Multicores | 6
Instruction Criticality Based Energy-Efficient Hardware Data Prefetching | 5
BTB-X: A Storage-Effective BTB Organization | 5
Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training | 5
Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs | 5
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems | 5
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications | 5
Adaptive Web Browsing on Mobile Heterogeneous Multi-cores | 4
Characterizing and Understanding HGNNs on GPUs | 4
Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure | 4
Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms | 4
GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks | 4
DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches | 4
A Lightweight Memory Access Pattern Obfuscation Framework for NVM | 4
DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching | 3
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing | 3
Characterizing and Understanding Distributed GNN Training on GPUs | 3
LT-PIM: An LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking | 3
Near-Data Processing in Memory Expander for DNN Acceleration on GPUs | 3
A First-Order Model to Assess Computer Architecture Sustainability | 3
Deep Partitioned Training From Near-Storage Computing to DNN Accelerators | 3
Zero-Copying I/O Stack for Low-Latency SSDs | 3
Hardware Acceleration for GCNs via Bidirectional Fusion | 3
OpenMDS: An Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs | 3
WPC: Whole-Picture Workload Characterization Across Intermediate Representation, ISA, and Microarchitecture | 3
Infinity Stream: Enabling Transparent and Automated In-Memory Computing | 3
Managing Prefetchers With Deep Reinforcement Learning | 3
Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD | 3
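
To make the selection rule concrete, here is a minimal Python sketch of the filter described above the table: keep papers whose citation count is at or above the journal's TQCC, sort them by citations in descending order, and cap the list at 250 entries. The records shown are a small subset copied from the table; the actual CrossRef query that produced this page is not reproduced here, so this illustrates only the threshold filter, not the site's pipeline.

```python
# Sketch of the selection rule: keep papers at or above the TQCC (3),
# most-cited first, capped at 250 entries. Citation counts are assumed
# to come from CrossRef for papers published 2020-03-01 to 2024-03-01.

TQCC = 3  # threshold stated for IEEE Computer Architecture Letters

# Illustrative subset of (title, CrossRef citation count) records from the table.
papers = [
    ("DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator", 86),
    ("SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD", 39),
    ("The Entangling Instruction Prefetcher", 16),
    ("Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD", 3),
]

selected = sorted(
    (p for p in papers if p[1] >= TQCC),  # at or above the threshold
    key=lambda p: p[1],
    reverse=True,
)[:250]  # max. 250 papers

for title, citations in selected:
    print(f"{citations:4d}  {title}")
```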