(The TQCC of IEEE Micro is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-04-01 to 2024-04-01.)
Title | Citations
NVIDIA A100 Tensor Core GPU: Performance and Innovation | 121
Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs | 118
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings | 76
FerroElectronics for Edge Intelligence | 46
PEFL: Deep Privacy-Encoding-Based Federated Learning Framework for Smart Agriculture | 45
The Design Process for Google's Training Chips: TPUv2 and TPUv3 | 45
Accelerating Genome Analysis: A Primer on an Ongoing Journey | 42
BlackParrot: An Agile Open-Source RISC-V Multicore for Accelerator SoCs | 37
FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications | 36
Accelerating Chip Design With Machine Learning | 31
PyMTL3: A Python Framework for Open-Source Hardware Modeling, Generation, Simulation, and Verification | 30
MHADBOR: AI-Enabled Administrative-Distance-Based Opportunistic Load Balancing Scheme for an Agriculture Internet of Things Network | 29
Chasing Carbon: The Elusive Environmental Footprint of Computing | 29
A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC | 28
Kunpeng 920: The First 7-nm Chiplet-Based 64-Core ARM SoC for Cloud Services | 28
Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM | 25
Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing | 25
OpenFPGA: An Open-Source Framework for Agile Prototyping Customizable FPGAs | 24
SymbiFlow and VPR: An Open-Source Design Flow for Commercial and Novel FPGAs | 23
Evolution of the Graphics Processing Unit (GPU) | 21
ReLeQ : A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks | 21
The Path to Successful Wafer-Scale Integration: The Cerebras Story | 19
Klessydra-T: Designing Vector Coprocessors for Multithreaded Edge-Computing Cores | 19
Circuits and Architectures for In-Memory Computing-Based Machine Learning Accelerators | 18
Quantum Computers for High-Performance Computing | 18
Extending the Frontier of Quantum Computers With Qutrits | 17
Superconductor Computing for Neural Networks | 17
TSA-NoC: Learning-Based Threat Detection and Mitigation for Secure Network-on-Chip Architecture | 17
Intel Alder Lake CPU Architectures | 16
ML-HW Co-Design of Noise-Robust TinyML Models and Always-On Analog Compute-in-Memory Edge Accelerator | 15
PCI Express 6.0 Specification: A Low-Latency, High-Bandwidth, High-Reliability, and Cost-Effective Interconnect With 64.0 GT/s PAM-4 Signaling | 14
Quantum Computing—From NISQ to PISQ | 14
IBM's POWER10 Processor | 14
Artificial Intelligence Best Practices in Smart Agriculture | 14
FPGA-Accelerated Quantum Computing Emulation and Quantum Key Distillation | 13
Agile Hardware Development and Instrumentation With PyRTL | 13
Generating Systolic Array Accelerators With Reusable Blocks | 13
CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development | 13
Data Centers on Wheels: Emissions From Computing Onboard Autonomous Vehicles | 13
Challenges and Opportunities for Autonomous Micro-UAVs in Precision Agriculture | 13
On-Demand Mobile CPU Cooling With Thin-Film Thermoelectric Array | 12
Accelerator Integration for Open-Source SoC Design | 11
An Open Inter-Chiplet Communication Link: Bunch of Wires (BoW) | 11
Interconnects for DNA, Quantum, In-Memory, and Optical Computing: Insights From a Panel Discussion | 11
Temporal Computing With Superconductors | 11
Co-Design and System for the Supercomputer “Fugaku” | 10
AIDA: Associative In-Memory Deep Learning Accelerator | 10
NVIDIA Hopper H100 GPU: Scaling Performance | 9
Architecting Noisy Intermediate-Scale Quantum Computers: A Real-System Study | 9
Evaluating Sensor Data Quality in Internet of Things Smart Agriculture Applications | 9
OpenPiton at 5: A Nexus for Open and Agile Hardware Design | 9
Compute Substrate for Software 2.0 | 9
Quantum Codesign | 9
Cost-Effective and Flexible Asynchronous Interconnect Technology for GALS Systems | 9
Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer | 8
ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor | 8
A Taxonomy of ML for Systems Problems | 8
Configurable Network Protocol Accelerator (COPA) | 8
A Programmable Approach to Neural Network Compression | 8
The AMD Next-Generation “Zen 3” Core | 8
Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives | 7
Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms | 7
On Double Full-Stack Communication-Enabled Architectures for Multicore Quantum Computers | 7
Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data | 7
A Next-Generation Cryogenic Processor Architecture | 7
Unveiling the Hardware and Software Implications of Microservices in Cloud and Edge Systems | 7
UAV–Assisted Joint Wireless Power Transfer and Data Collection Mechanism for Sustainable Precision Agriculture in 5G | 7
A Case for Accelerating Software RTL Simulation | 7
Rome to Milan, AMD Continues Its Tour of Italy | 6
Accelerating ML Recommendation With Over 1,000 RISC-V/Tensor Processors on Esperanto's ET-SoC-1 Chip | 6
System on a Package Innovations With Universal Chiplet Interconnect Express (UCIe) Interconnect | 6
LiveHD: A Productive Live Hardware Development Flow | 6
Power Side-Channel Attacks in Negative Capacitance Transistor | 6
MicroScope: Enabling Microarchitectural Replay Attacks | 6
A Single-Shot Generalized Device Placement for Large Dataflow Graphs | 6
Neuromorphic Near-Sensor Computing: From Event-Based Sensing to Edge Learning | 6
Bridging Python to Silicon: The SODA Toolchain | 6
High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA | 6
Temperature-Resilient RRAM-Based In-Memory Computing for DNN Inference | 6
Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V | 6
History of IBM Z Mainframe Processors | 5
A Low-Latency and Low-Power Approach for Coherency and Memory Protocols on PCI Express 6.0 PHY at 64.0 GT/s With PAM-4 Signaling | 5
Balancing Specialized Versus Flexible Computation in Brain–Computer Interfaces | 5
AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers | 5
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale | 5
The Vision Behind MLPerf: Understanding AI Inference Performance | 5
Memory Pooling With CXL | 5
Meet the FaM1ly | 5
Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 | 5
Hidden Potential Within Video Game Consoles | 5
ILLIXR: An Open Testbed to Enable Extended Reality Systems Research | 5
Democratizing Data-Driven Agriculture Using Affordable Hardware | 5
The Open Domain-Specific Architecture | 5
uGEMM: Unary Computing for GEMM Applications | 5
Advances in Microprocessor Cache Architectures Over the Last 25 Years | 5
PurpleDrop: A Digital Microfluidics-Based Platform for Hybrid Molecular-Electronics Applications | 5
Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning | 5
Agile and Open-Source Hardware | 4
Compiling for the IBM Matrix Engine for Enterprise Workloads | 4
Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud | 4
Soil Fertility Monitoring With Internet of Underground Things: A Survey | 4
RadioML Meets FINN: Enabling Future RF Applications With FPGA Streaming Architectures | 4
History of Microcontrollers: First 50 Years | 4
The Arm Morello Evaluation Platform—Validating CHERI-Based Security in a High-Performance System | 4
Marvell ThunderX3: Next-Generation Arm-Based Server Processor | 4
Energy-Efficient Video Processing for Virtual Reality | 4
Emerging Technologies for Quantum Computing | 4
ExHero: Execution History-Aware Error-Rate Estimation in Pipelined Designs | 4
Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim | 4
Three-Dimensional Stacked Neural Network Accelerator Architectures for AR/VR Applications | 4
Accelerating Allreduce With In-Network Reduction on Intel PIUMA | 4
Countering Load-to-Use Stalls in the NVIDIA Turing GPU | 3
Enhancing Model Parallelism in Neural Architecture Search for Multidevice System | 3
Failure Tolerant Training With Persistent Memory Disaggregation Over CXL | 3
Compute Express Link (CXL): Enabling Heterogeneous Data-Centric Computing With Heterogeneous Memory Hierarchy | 3
Architectural CO2 Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool | 3
Artificial-Intelligence-Enhanced Ultrasound Flow Imaging at the Edge | 3
Accelerating Genomic Data Analytics With Composable Hardware Acceleration Framework | 3
CXL-Enabled Enhanced Memory Functions | 3
TinyIREE: An ML Execution Environment for Embedded Systems From Compilation to Deployment | 3
LSFQ: A Low-Bit Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration | 3
Efficient Language-Guided Reinforcement Learning for Resource-Constrained Autonomous Systems | 3
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs | 3
Unifying Spatial Accelerator Compilation With Idiomatic and Modular Transformations | 3
FPGA Computing | 3
Tydi: An Open Specification for Complex Data Structures Over Hardware Streams | 3
Shortages of Integrated Circuits | 3
The Apollo Guidance Computer | 3
Universal Graph-Based Scheduling for Quantum Systems | 3
The AMD 400-G Adaptive SmartNIC System on Chip: A Technology Preview | 2
Microprocessor Advances and the Mainframe Legacy | 2
Hardware Specialization: From Cell to Heterogeneous Microprocessors Everywhere | 2
LastLayer: Toward Hardware and Software Continuous Integration | 2
Photonic Network-on-Wafer for Multichiplet GPUs | 2
Machine Learning for Systems | 2
Distributed Deep Learning With GPU-FPGA Heterogeneous Computing | 2
A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation | 2
Virtual Logical Qubits: A Compact Architecture for Fault-Tolerant Quantum Computing | 2
Quantum Computing and the Design of the Ultimate Accelerator | 2
Towards General-Purpose Acceleration: Finding Structure in Irregularity | 2
Creating Foundations for Secure Microarchitectures With Data-Oblivious ISA Extensions | 2
HPVM: Hardware-Agnostic Programming for Heterogeneous Parallel Systems | 2
Overclocking in Immersion-Cooled Datacenters | 2
A Parallel and Updatable Architecture for FPGA-Based Packet Classification With Large-Scale Rule Sets | 2
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part V: References | 2
Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility | 2
SpecHLS: Speculative Accelerator Design Using High-Level Synthesis | 2
The Origin of Intel's Micro-Ops | 2
Kaya for Computer Architects: Toward Sustainable Computer Systems | 2
Accelerating Phylogenetics Using FPGAs in the Cloud | 2
Understanding Acceleration Opportunities at Hyperscale | 2
HALO: A Hardware–Software Co-Designed Processor for Brain–Computer Interfaces | 2
PCs Take a Page From Xbox With Pluton | 2
Pensando Distributed Services Architecture | 2
ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library | 2
Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs | 2
A Binary Translation Framework for Automated Hardware Generation | 2
Compiling for Vector Extensions With Stream-Based Specialization | 2
Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite | 2
Data Movement Accelerator Engines on a Prototype Power10 Processor | 1
BabelFish: Fusing Address Translations for Containers | 1
Increasing Throughput of In-Memory DNN Accelerators by Flexible Layerwise DNN Approximation | 1
Exploring Memory-Oriented Design Optimization of Edge AI Hardware for Extended Reality Applications | 1
A Compressed Spiking Neural Network Onto a Memcapacitive In-Memory Computing Array | 1
Biology and Systems Interactions | 1
A Hardware/Software Co-Design Vision for Deep Learning at the Edge | 1
The 50 Year History of the Microprocessor as Five Technology Eras | 1
IEEE Computer Society: Volunteer Service Awards | 1
Toward Developing High-Performance RISC-V Processors Using Agile Methodology | 1
Characterizing and Mitigating Soft Errors in GPU DRAM | 1
Leaking Secrets Through Compressed Caches | 1
Warehouse-Scale Video Acceleration | 1
Interactions, Impacts, and Coincidences of the First Golden Age of Computer Architecture | 1
The Microarchitecture of DOJO, Tesla’s Exa-Scale Computer | 1
On-Device Tiny Machine Learning for Anomaly Detection Based on the Extreme Values Theory | 1
Navigating the Seismic Shift of Post-Moore Computer Systems Design | 1
DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks | 1
Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency | 1
Reliable and Time-Efficient Virtualized Function Placement | 1
A Brief History of Warehouse-Scale Computing | 1
Special Issue on In-Memory Computing | 1
Retargetable Optimizing Compilers for Quantum Accelerators via a Multilevel Intermediate Representation | 1
Characterizing and Modeling Nonvolatile Memory Systems | 1
Combining Multiple tinyML Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers | 1
Vector Runahead for Indirect Memory Accesses | 1
Early History of Texas Instrument's Digital Signal Processor | 1
SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander | 1
The Economics of Confrontational Conversation | 1
A 10.7-µJ/Frame 88% Accuracy CIFAR-10 Single-Chip Neuromorphic Field-Programmable Gate Array Processor Featuring Various Nonlinear Functions of Dendrites in the Human Cerebrum | 1
Pod-racing: bulk-bitwise to floating-point compute in racetrack memory for machine learning at the edge | 1
The Fox and Shepherd Problem | 1
Online Code Layout Optimizations via OCOLOS | 1
TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology | 1
Special Issue on Artificial Intelligence at the Edge | 1
Varifocal Storage: Dynamic Multiresolution Data Storage | 1
Z80—The 1970s Microprocessor Still Alive | 1
Special Issue on Hot Interconnects | 1
Economic Dependencies in Integrated Circuits | 1
Sustainable AI Processing at the Edge | 1
speedAI240: A 2-Petaflop, 30-Teraflops/W At-Memory Inference Acceleration Device With 1456 RISC-V Cores | 1
I-DVFS: Instantaneous Frequency Switch During Dynamic Voltage and Frequency Scaling | 1
XCRYPT: Accelerating Lattice-Based Cryptography With Memristor Crossbar Arrays | 1
The Xbox Series X System Architecture | 1
Special Issue on Artificial Intelligence, Edge, and Internet of Things for Smart Agriculture | 1