IEEE Micro

Papers
(The TQCC of IEEE Micro is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
NVIDIA A100 Tensor Core GPU: Performance and Innovation | 160
FerroElectronics for Edge Intelligence | 60
The Design Process for Google's Training Chips: TPUv2 and TPUv3 | 57
PEFL: Deep Privacy-Encoding-Based Federated Learning Framework for Smart Agriculture | 52
Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM | 48
FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications | 40
Kunpeng 920: The First 7-nm Chiplet-Based 64-Core ARM SoC for Cloud Services | 38
Chasing Carbon: The Elusive Environmental Footprint of Computing | 36
Evolution of the Graphics Processing Unit (GPU) | 35
Accelerating Chip Design With Machine Learning | 35
A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC | 33
Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing | 31
Quantum Computers for High-Performance Computing | 29
MHADBOR: AI-Enabled Administrative-Distance-Based Opportunistic Load Balancing Scheme for an Agriculture Internet of Things Network | 29
Circuits and Architectures for In-Memory Computing-Based Machine Learning Accelerators | 26
Intel Alder Lake CPU Architectures | 25
The Path to Successful Wafer-Scale Integration: The Cerebras Story | 23
Superconductor Computing for Neural Networks | 23
Klessydra-T: Designing Vector Coprocessors for Multithreaded Edge-Computing Cores | 22
Data Centers on Wheels: Emissions From Computing Onboard Autonomous Vehicles | 22
NVIDIA Hopper H100 GPU: Scaling Performance | 21
Quantum Computing—From NISQ to PISQ | 21
Artificial Intelligence Best Practices in Smart Agriculture | 20
PCI Express 6.0 Specification: A Low-Latency, High-Bandwidth, High-Reliability, and Cost-Effective Interconnect With 64.0 GT/s PAM-4 Signaling | 18
Challenges and Opportunities for Autonomous Micro-UAVs in Precision Agriculture | 18
ML-HW Co-Design of Noise-Robust TinyML Models and Always-On Analog Compute-in-Memory Edge Accelerator | 17
On-Demand Mobile CPU Cooling With Thin-Film Thermoelectric Array | 16
IBM's POWER10 Processor | 16
Interconnects for DNA, Quantum, In-Memory, and Optical Computing: Insights From a Panel Discussion | 15
Memory Pooling With CXL | 15
Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms | 14
The AMD Next-Generation “Zen 3” Core | 14
Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer | 14
FPGA-Accelerated Quantum Computing Emulation and Quantum Key Distillation | 14
Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud | 14
Accelerator Integration for Open-Source SoC Design | 13
Co-Design and System for the Supercomputer “Fugaku” | 13
Quantum Codesign | 12
An Open Inter-Chiplet Communication Link: Bunch of Wires (BoW) | 12
Compute Substrate for Software 2.0 | 12
Bridging Python to Silicon: The SODA Toolchain | 11
Evaluating Sensor Data Quality in Internet of Things Smart Agriculture Applications | 11
A Next-Generation Cryogenic Processor Architecture | 11
UAV–Assisted Joint Wireless Power Transfer and Data Collection Mechanism for Sustainable Precision Agriculture in 5G | 11
AIDA: Associative In-Memory Deep Learning Accelerator | 11
Temporal Computing With Superconductors | 11
Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 | 10
On Double Full-Stack Communication-Enabled Architectures for Multicore Quantum Computers | 10
Cost-Effective and Flexible Asynchronous Interconnect Technology for GALS Systems | 10
Temperature-Resilient RRAM-Based In-Memory Computing for DNN Inference | 10
Compute Express Link (CXL): Enabling Heterogeneous Data-Centric Computing With Heterogeneous Memory Hierarchy | 10
Three-Dimensional Stacked Neural Network Accelerator Architectures for AR/VR Applications | 10
Configurable Network Protocol Accelerator (COPA) | 10
Democratizing Data-Driven Agriculture Using Affordable Hardware | 10
Neuromorphic Near-Sensor Computing: From Event-Based Sensing to Edge Learning | 9
A Low-Latency and Low-Power Approach for Coherency and Memory Protocols on PCI Express 6.0 PHY at 64.0 GT/s With PAM-4 Signaling | 9
Rome to Milan, AMD Continues Its Tour of Italy | 9
System on a Package Innovations With Universal Chiplet Interconnect Express (UCIe) Interconnect | 9
ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor | 9
Accelerating ML Recommendation With Over 1,000 RISC-V/Tensor Processors on Esperanto's ET-SoC-1 Chip | 9
ILLIXR: An Open Testbed to Enable Extended Reality Systems Research | 9
Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives | 9
Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning | 8
Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V | 8
Marvell ThunderX3: Next-Generation Arm-Based Server Processor | 8
ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library | 7
TinyIREE: An ML Execution Environment for Embedded Systems From Compilation to Deployment | 7
High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA | 6
The AMD 400-G Adaptive SmartNIC System on Chip: A Technology Preview | 6
History of IBM Z Mainframe Processors | 6
Advances in Microprocessor Cache Architectures Over the Last 25 Years | 6
Kaya for Computer Architects: Toward Sustainable Computer Systems | 6
The Arm Morello Evaluation Platform—Validating CHERI-Based Security in a High-Performance System | 6
Meet the FaM1ly | 6
Power Side-Channel Attacks in Negative Capacitance Transistor | 6
AI and Memory Wall | 6
History of Microcontrollers: First 50 Years | 6
Overclocking in Immersion-Cooled Datacenters | 6
CXL-Enabled Enhanced Memory Functions | 6
Hidden Potential Within Video Game Consoles | 5
The Vision Behind MLPerf: Understanding AI Inference Performance | 5
Photonic Network-on-Wafer for Multichiplet GPUs | 5
RadioML Meets FINN: Enabling Future RF Applications With FPGA Streaming Architectures | 5
POD-RACING: Bulk-Bitwise to Floating-Point Compute in Racetrack Memory for Machine Learning at the Edge | 5
Emerging Technologies for Quantum Computing | 5
The Open Domain-Specific Architecture | 5
uGEMM: Unary Computing for GEMM Applications | 5
The Microarchitecture of DOJO, Tesla’s Exa-Scale Computer | 5
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale | 5
Accelerating Allreduce With In-Network Reduction on Intel PIUMA | 5
Balancing Specialized Versus Flexible Computation in Brain–Computer Interfaces | 5
SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander | 4
ExHero: Execution History-Aware Error-Rate Estimation in Pipelined Designs | 4
Universal Graph-Based Scheduling for Quantum Systems | 4
Accelerating Genomic Data Analytics With Composable Hardware Acceleration Framework | 4
SpecHLS: Speculative Accelerator Design Using High-Level Synthesis | 4
Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim | 4
Pensando Distributed Services Architecture | 4
Soil Fertility Monitoring With Internet of Underground Things: A Survey | 4
LSFQ: A Low-Bit Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration | 4
Compiling for the IBM Matrix Engine for Enterprise Workloads | 4
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs | 4
Unifying Spatial Accelerator Compilation With Idiomatic and Modular Transformations | 4
FPGA Computing | 3
Accelerating Phylogenetics Using FPGAs in the Cloud | 3
Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite | 3
HALO: A Hardware–Software Co-Designed Processor for Brain–Computer Interfaces | 3
ISOBlue Avena: A Framework for Agricultural Edge Computing and Data Sovereignty | 3
The Apollo Guidance Computer | 3
Monitoring InfiniBand Networks to React Efficiently to Congestion | 3
Novel Composable and Scaleout Architectures Using Compute Express Link | 3
Practical and Scalable ML-Driven Cloud Performance Debugging With Sage | 3
The Intel Programmable and Integrated Unified Memory Architecture Graph Analytics Processor | 3
Efficient Language-Guided Reinforcement Learning for Resource-Constrained Autonomous Systems | 3
Architectural CO2 Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool | 3
Artificial-Intelligence-Enhanced Ultrasound Flow Imaging at the Edge | 3
On-Device Customization of Tiny Deep Learning Models for Keyword Spotting With Few Examples | 3
Failure Tolerant Training With Persistent Memory Disaggregation Over CXL | 3
Characterizing and Modeling Nonvolatile Memory Systems | 3
Sustainable AI Processing at the Edge | 3
Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency | 3
A Parallel and Updatable Architecture for FPGA-Based Packet Classification With Large-Scale Rule Sets | 3
Countering Load-to-Use Stalls in the NVIDIA Turing GPU | 3
Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs | 3
Distributed Deep Learning With GPU-FPGA Heterogeneous Computing | 2
Compiling for Vector Extensions With Stream-Based Specialization | 2
Enterprise-Class Multilevel Cache Design: Low Latency, Huge Capacity, and High Reliability | 2
A Compressed Spiking Neural Network Onto a Memcapacitive In-Memory Computing Array | 2
HPVM: Hardware-Agnostic Programming for Heterogeneous Parallel Systems | 2
Microprocessor Advances and the Mainframe Legacy | 2
Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads | 2
PCs Take a Page From Xbox With Pluton | 2
Retargetable Optimizing Compilers for Quantum Accelerators via a Multilevel Intermediate Representation | 2
A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation | 2
A Binary Translation Framework for Automated Hardware Generation | 2
The Origin of Intel's Micro-Ops | 2
TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology | 2
Characterizing and Mitigating Soft Errors in GPU DRAM | 2
I-DVFS: Instantaneous Frequency Switch During Dynamic Voltage and Frequency Scaling | 2
Hardware Specialization: From Cell to Heterogeneous Microprocessors Everywhere | 2
Remote Work | 2
Combining Multiple Tiny Machine Learning Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers | 2
XCRYPT: Accelerating Lattice-Based Cryptography With Memristor Crossbar Arrays | 2
Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility | 2
Virtual Logical Qubits: A Compact Architecture for Fault-Tolerant Quantum Computing | 2
Quantum Computing and the Design of the Ultimate Accelerator | 2
Shortages of Integrated Circuits | 2
Understanding Acceleration Opportunities at Hyperscale | 2
PACMAN: Attacking ARM Pointer Authentication With Speculative Execution | 2
Exploring Memory-Oriented Design Optimization of Edge AI Hardware for Extended Reality Applications | 2
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part V: References | 2
The 50 Year History of the Microprocessor as Five Technology Eras | 1
On-Device Tiny Machine Learning for Anomaly Detection Based on the Extreme Values Theory | 1
DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks | 1
IEEE Computer Society | 1
Special Issue on Hot Interconnects | 1
BabelFish: Fusing Address Translations for Containers | 1
Understanding and Characterizing Side Channels Exploiting Phase-Change Memories | 1
The Fox and Shepherd Problem | 1
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part III: Claims | 1
Early History of Texas Instrument's Digital Signal Processor | 1
Z80—The 1970s Microprocessor Still Alive | 1
The Xbox Series X System Architecture | 1
Navigating the Seismic Shift of Post-Moore Computer Systems Design | 1
The Breakthrough Memory Solutions for Improved Performance on LLM Inference | 1
Special Issue on Artificial Intelligence, Edge, and Internet of Things for Smart Agriculture | 1
EyeCoD: Eye Tracking System Acceleration via FlatCam-Based Algorithm and Hardware Co-Design | 1
Improving key-value cache performance with heterogeneous memory tiering: A case study of CXL-based memory expansion | 1
Economic Dependencies in Integrated Circuits | 1
Advancing TinyMLOps: Robust Model Updates in the Internet of Intelligent Vehicles | 1
A Golden-Free Approach to Detect Trojans in COTS Multi-PCB Systems | 1
Special Issue on In-Memory Computing | 1
IEEE Computer Society: Volunteer Service Awards | 1
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part IX: Patent Families | 1
Leaking Secrets Through Compressed Caches | 1
Vector Runahead for Indirect Memory Accesses | 1
Hardware–Software Co-Design for Real-Time Latency–Accuracy Navigation in Tiny Machine Learning Applications | 1
Warehouse-Scale Video Acceleration | 1
Online Code Layout Optimizations via OCOLOS | 1
Interactions, Impacts, and Coincidences of the First Golden Age of Computer Architecture | 1
Special Issue on Artificial Intelligence at the Edge | 1
Addressing the Gap Between Training Data and Deployed Environment by On-Device Learning | 1
A 10.7-µJ/Frame 88% Accuracy CIFAR-10 Single-Chip Neuromorphic Field-Programmable Gate Array Processor Featuring Various Nonlinear Functions of Dendrites in the Human Cerebrum | 1
speedAI240: A 2-Petaflop, 30-Teraflops/W At-Memory Inference Acceleration Device With 1456 RISC-V Cores | 1
Data Movement Accelerator Engines on a Prototype Power10 Processor | 1
Increasing Throughput of In-Memory DNN Accelerators by Flexible Layerwise DNN Approximation | 1
A Hardware/Software Co-Design Vision for Deep Learning at the Edge | 1
Reliable and Time-Efficient Virtualized Function Placement | 1
Special Issue on Environmentally Sustainable Computing | 1
Masthead | 1
Fifty Years of the International Symposium on Computer Architecture: A Data-Driven Retrospective | 1
Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration | 1
Making Machine Learning More Energy Efficient by Bringing It Closer to the Sensor | 1
Adversarial Attacks Against Machine Learning-Based Resource Provisioning Systems | 1
Enabling Artificial Intelligence Supercomputers With Domain-Specific Networks | 1
The Economics of Confrontational Conversation | 1
A Brief History of Warehouse-Scale Computing | 1
Toward Developing High-Performance RISC-V Processors Using Agile Methodology | 1
Acceleration of a Classic McEliece Postquantum Cryptosystem With Cache Processing | 1