IEEE Micro

Papers
(The TQCC of IEEE Micro is 1. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-07-01 to 2024-07-01.)
Article | Citations
NVIDIA A100 Tensor Core GPU: Performance and Innovation | 142
Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs | 134
FerroElectronics for Edge Intelligence | 54
The Design Process for Google's Training Chips: TPUv2 and TPUv3 | 52
PEFL: Deep Privacy-Encoding-Based Federated Learning Framework for Smart Agriculture | 49
Accelerating Genome Analysis: A Primer on an Ongoing Journey | 45
BlackParrot: An Agile Open-Source RISC-V Multicore for Accelerator SoCs | 38
FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications | 38
Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM | 37
Chasing Carbon: The Elusive Environmental Footprint of Computing | 33
PyMTL3: A Python Framework for Open-Source Hardware Modeling, Generation, Simulation, and Verification | 32
Kunpeng 920: The First 7-nm Chiplet-Based 64-Core ARM SoC for Cloud Services | 32
Accelerating Chip Design With Machine Learning | 31
OpenFPGA: An Open-Source Framework for Agile Prototyping Customizable FPGAs | 30
A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC | 30
MHADBOR: AI-Enabled Administrative-Distance-Based Opportunistic Load Balancing Scheme for an Agriculture Internet of Things Network | 29
Evolution of the Graphics Processing Unit (GPU) | 28
Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing | 27
SymbiFlow and VPR: An Open-Source Design Flow for Commercial and Novel FPGAs | 25
Quantum Computers for High-Performance Computing | 24
ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks | 24
Intel Alder Lake CPU Architectures | 21
Superconductor Computing for Neural Networks | 21
The Path to Successful Wafer-Scale Integration: The Cerebras Story | 20
Klessydra-T: Designing Vector Coprocessors for Multithreaded Edge-Computing Cores | 20
Circuits and Architectures for In-Memory Computing-Based Machine Learning Accelerators | 20
Quantum Computing—From NISQ to PISQ | 18
TSA-NoC: Learning-Based Threat Detection and Mitigation for Secure Network-on-Chip Architecture | 17
PCI Express 6.0 Specification: A Low-Latency, High-Bandwidth, High-Reliability, and Cost-Effective Interconnect With 64.0 GT/s PAM-4 Signaling | 17
Artificial Intelligence Best Practices in Smart Agriculture | 17
ML-HW Co-Design of Noise-Robust TinyML Models and Always-On Analog Compute-in-Memory Edge Accelerator | 16
IBM's POWER10 Processor | 16
NVIDIA Hopper H100 GPU: Scaling Performance | 16
Data Centers on Wheels: Emissions From Computing Onboard Autonomous Vehicles | 15
Generating Systolic Array Accelerators With Reusable Blocks | 15
CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development | 14
Challenges and Opportunities for Autonomous Micro-UAVs in Precision Agriculture | 14
Agile Hardware Development and Instrumentation With PyRTL | 14
On-Demand Mobile CPU Cooling With Thin-Film Thermoelectric Array | 13
Co-Design and System for the Supercomputer “Fugaku” | 13
FPGA-Accelerated Quantum Computing Emulation and Quantum Key Distillation | 13
Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer | 12
An Open Inter-Chiplet Communication Link: Bunch of Wires (BoW) | 12
Interconnects for DNA, Quantum, In-Memory, and Optical Computing: Insights From a Panel Discussion | 12
Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms | 12
Temporal Computing With Superconductors | 11
The AMD Next-Generation “Zen 3” Core | 11
Evaluating Sensor Data Quality in Internet of Things Smart Agriculture Applications | 11
Accelerator Integration for Open-Source SoC Design | 11
UAV–Assisted Joint Wireless Power Transfer and Data Collection Mechanism for Sustainable Precision Agriculture in 5G | 10
Cost-Effective and Flexible Asynchronous Interconnect Technology for GALS Systems | 10
Quantum Codesign | 10
OpenPiton at 5: A Nexus for Open and Agile Hardware Design | 10
AIDA: Associative In-Memory Deep Learning Accelerator | 10
Bridging Python to Silicon: The SODA Toolchain | 10
Memory Pooling With CXL | 10
Compute Substrate for Software 2.0 | 10
Neuromorphic Near-Sensor Computing: From Event-Based Sensing to Edge Learning | 9
Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud | 9
Configurable Network Protocol Accelerator (COPA) | 9
Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives | 9
A Next-Generation Cryogenic Processor Architecture | 9
ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor | 9
Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 | 8
A Case for Accelerating Software RTL Simulation | 8
On Double Full-Stack Communication-Enabled Architectures for Multicore Quantum Computers | 8
A Taxonomy of ML for Systems Problems | 8
Temperature-Resilient RRAM-Based In-Memory Computing for DNN Inference | 8
Democratizing Data-Driven Agriculture Using Affordable Hardware | 8
A Programmable Approach to Neural Network Compression | 8
Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V | 8
A Low-Latency and Low-Power Approach for Coherency and Memory Protocols on PCI Express 6.0 PHY at 64.0 GT/s With PAM-4 Signaling | 8
ILLIXR: An Open Testbed to Enable Extended Reality Systems Research | 7
Rome to Milan, AMD Continues Its Tour of Italy | 7
Three-Dimensional Stacked Neural Network Accelerator Architectures for AR/VR Applications | 7
System on a Package Innovations With Universal Chiplet Interconnect Express (UCIe) Interconnect | 7
LiveHD: A Productive Live Hardware Development Flow | 7
PurpleDrop: A Digital Microfluidics-Based Platform for Hybrid Molecular-Electronics Applications | 6
TinyIREE: An ML Execution Environment for Embedded Systems From Compilation to Deployment | 6
Compute Express Link (CXL): Enabling Heterogeneous Data-Centric Computing With Heterogeneous Memory Hierarchy | 6
History of IBM Z Mainframe Processors | 6
High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA | 6
Advances in Microprocessor Cache Architectures Over the Last 25 Years | 6
Accelerating ML Recommendation With Over 1,000 RISC-V/Tensor Processors on Esperanto's ET-SoC-1 Chip | 6
Marvell ThunderX3: Next-Generation Arm-Based Server Processor | 6
Power Side-Channel Attacks in Negative Capacitance Transistor | 6
Kaya for Computer Architects: Toward Sustainable Computer Systems | 6
A Single-Shot Generalized Device Placement for Large Dataflow Graphs | 6
Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning | 6
uGEMM: Unary Computing for GEMM Applications | 5
RadioML Meets FINN: Enabling Future RF Applications With FPGA Streaming Architectures | 5
Agile and Open-Source Hardware | 5
History of Microcontrollers: First 50 Years | 5
The Open Domain-Specific Architecture | 5
Meet the FaM1ly | 5
Hidden Potential Within Video Game Consoles | 5
The AMD 400-G Adaptive SmartNIC System on Chip: A Technology Preview | 5
Overclocking in Immersion-Cooled Datacenters | 5
Accelerating Allreduce With In-Network Reduction on Intel PIUMA | 5
Balancing Specialized Versus Flexible Computation in Brain–Computer Interfaces | 5
Emerging Technologies for Quantum Computing | 5
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale | 5
The Vision Behind MLPerf: Understanding AI Inference Performance | 5
CXL-Enabled Enhanced Memory Functions | 4
The Arm Morello Evaluation Platform—Validating CHERI-Based Security in a High-Performance System | 4
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs | 4
Unifying Spatial Accelerator Compilation With Idiomatic and Modular Transformations | 4
SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander | 4
The Microarchitecture of DOJO, Tesla’s Exa-Scale Computer | 4
Compiling for the IBM Matrix Engine for Enterprise Workloads | 4
Photonic Network-on-Wafer for Multichiplet GPUs | 4
SpecHLS: Speculative Accelerator Design Using High-Level Synthesis | 4
ExHero: Execution History-Aware Error-Rate Estimation in Pipelined Designs | 4
Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim | 4
Soil Fertility Monitoring With Internet of Underground Things: A Survey | 4
The Intel Programmable and Integrated Unified Memory Architecture Graph Analytics Processor | 3
Architectural CO2 Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool | 3
ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library | 3
Countering Load-to-Use Stalls in the NVIDIA Turing GPU | 3
Novel Composable and Scaleout Architectures Using Compute Express Link | 3
Accelerating Phylogenetics Using FPGAs in the Cloud | 3
LSFQ: A Low-Bit Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration | 3
HALO: A Hardware–Software Co-Designed Processor for Brain–Computer Interfaces | 3
A Parallel and Updatable Architecture for FPGA-Based Packet Classification With Large-Scale Rule Sets | 3
Universal Graph-Based Scheduling for Quantum Systems | 3
Artificial-Intelligence-Enhanced Ultrasound Flow Imaging at the Edge | 3
Enhancing Model Parallelism in Neural Architecture Search for Multidevice System | 3
Tydi: An Open Specification for Complex Data Structures Over Hardware Streams | 3
Characterizing and Modeling Nonvolatile Memory Systems | 3
Shortages of Integrated Circuits | 3
Efficient Language-Guided Reinforcement Learning for Resource-Constrained Autonomous Systems | 3
The Apollo Guidance Computer | 3
Pensando Distributed Services Architecture | 3
Accelerating Genomic Data Analytics With Composable Hardware Acceleration Framework | 3
FPGA Computing | 3
Failure Tolerant Training With Persistent Memory Disaggregation Over CXL | 3
Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite | 3
Sustainable AI Processing at the Edge | 3
Compiling for Vector Extensions With Stream-Based Specialization | 2
The Origin of Intel's Micro-Ops | 2
Characterizing and Mitigating Soft Errors in GPU DRAM | 2
Understanding Acceleration Opportunities at Hyperscale | 2
Hardware Specialization: From Cell to Heterogeneous Microprocessors Everywhere | 2
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part V: References | 2
Combining Multiple Tiny Machine Learning Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers | 2
Distributed Deep Learning With GPU-FPGA Heterogeneous Computing | 2
Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility | 2
A Binary Translation Framework for Automated Hardware Generation | 2
Quantum Computing and the Design of the Ultimate Accelerator | 2
HPVM: Hardware-Agnostic Programming for Heterogeneous Parallel Systems | 2
Microprocessor Advances and the Mainframe Legacy | 2
PCs Take a Page From Xbox With Pluton | 2
Retargetable Optimizing Compilers for Quantum Accelerators via a Multilevel Intermediate Representation | 2
AI and Memory Wall | 2
Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs | 2
POD-RACING: Bulk-Bitwise to Floating-Point Compute in Racetrack Memory for Machine Learning at the Edge | 2
Virtual Logical Qubits: A Compact Architecture for Fault-Tolerant Quantum Computing | 2
TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology | 2
I-DVFS: Instantaneous Frequency Switch During Dynamic Voltage and Frequency Scaling | 2
Exploring Memory-Oriented Design Optimization of Edge AI Hardware for Extended Reality Applications | 2
LastLayer: Toward Hardware and Software Continuous Integration | 2
Machine Learning for Systems | 2
Monitoring InfiniBand Networks to React Efficiently to Congestion | 2
A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation | 2
Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration | 1
BabelFish: Fusing Address Translations for Containers | 1
Warehouse-Scale Video Acceleration | 1
Navigating the Seismic Shift of Post-Moore Computer Systems Design | 1
Interactions, Impacts, and Coincidences of the First Golden Age of Computer Architecture | 1
Advancing TinyMLOps: Robust Model Updates in the Internet of Intelligent Vehicles | 1
Acceleration of a Classic McEliece Postquantum Cryptosystem With Cache Processing | 1
Increasing Throughput of In-Memory DNN Accelerators by Flexible Layerwise DNN Approximation | 1
Biology and Systems Interactions | 1
Practical and Scalable ML-Driven Cloud Performance Debugging With Sage | 1
Addressing the Gap Between Training Data and Deployed Environment by On-Device Learning | 1
A Hardware/Software Co-Design Vision for Deep Learning at the Edge | 1
speedAI240: A 2-Petaflop, 30-Teraflops/W At-Memory Inference Acceleration Device With 1456 RISC-V Cores | 1
IEEE Computer Society: Volunteer Service Awards | 1
Economic Dependencies in Integrated Circuits | 1
Special Issue on Environmentally Sustainable Computing | 1
Masthead | 1
Early History of Texas Instrument's Digital Signal Processor | 1
Online Code Layout Optimizations via OCOLOS | 1
Adversarial Attacks Against Machine Learning-Based Resource Provisioning Systems | 1
Enabling Artificial Intelligence Supercomputers With Domain-Specific Networks | 1
The Economics of Confrontational Conversation | 1
A 10.7-µJ/Frame 88% Accuracy CIFAR-10 Single-Chip Neuromorphic Field-Programmable Gate Array Processor Featuring Various Nonlinear Functions of Dendrites in the Human Cerebrum | 1
Data Movement Accelerator Engines on a Prototype Power10 Processor | 1
Special Issue on Artificial Intelligence at the Edge | 1
DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks | 1
A Brief History of Warehouse-Scale Computing | 1
A Compressed Spiking Neural Network Onto a Memcapacitive In-Memory Computing Array | 1
Toward Developing High-Performance RISC-V Processors Using Agile Methodology | 1
Special Issue on In-Memory Computing | 1
The 50 Year History of the Microprocessor as Five Technology Eras | 1
Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part IX: Patent Families | 1
Vector Runahead for Indirect Memory Accesses | 1
On-Device Customization of Tiny Deep Learning Models for Keyword Spotting With Few Examples | 1
The Fox and Shepherd Problem | 1
XCRYPT: Accelerating Lattice-Based Cryptography With Memristor Crossbar Arrays | 1
Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency | 1
The Xbox Series X System Architecture | 1
Special Issue on Artificial Intelligence, Edge, and Internet of Things for Smart Agriculture | 1
Enterprise-Class Multilevel Cache Design: Low Latency, Huge Capacity, and High Reliability | 1
On-Device Tiny Machine Learning for Anomaly Detection Based on the Extreme Values Theory | 1
Datacenter-Scale Analysis and Optimization of GPU Machine Learning Workloads | 1
IEEE Computer Society | 1
Z80—The 1970s Microprocessor Still Alive | 1
Reliable and Time-Efficient Virtualized Function Placement | 1
EyeCoD: Eye Tracking System Acceleration via FlatCam-Based Algorithm and Hardware Co-Design | 1
Special Issue on Hot Interconnects | 1
Remote Work | 1
Fifty Years of the International Symposium on Computer Architecture: A Data-Driven Retrospective | 1
Leaking Secrets Through Compressed Caches | 1