IEEE Micro

Papers
(The median citation count of IEEE Micro papers is 0. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-03-01 to 2024-03-01.)
| Article | Citations |
| --- | --- |
| NVIDIA A100 Tensor Core GPU: Performance and Innovation | 117 |
| Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs | 109 |
| Compute Solution for Tesla's Full Self-Driving Computer | 78 |
| MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance | 73 |
| MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings | 73 |
| TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O | 63 |
| RTX on—The NVIDIA Turing GPU | 56 |
| FerroElectronics for Edge Intelligence | 46 |
| The Design Process for Google's Training Chips: TPUv2 and TPUv3 | 43 |
| PEFL: Deep Privacy-Encoding-Based Federated Learning Framework for Smart Agriculture | 43 |
| The AMD “Zen 2” Processor | 43 |
| Accelerating Genome Analysis: A Primer on an Ongoing Journey | 42 |
| The Arm Neoverse N1 Platform: Building Blocks for the Next-Gen Cloud-to-Edge Infrastructure SoC | 37 |
| FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications | 36 |
| BlackParrot: An Agile Open-Source RISC-V Multicore for Accelerator SoCs | 35 |
| PyMTL3: A Python Framework for Open-Source Hardware Modeling, Generation, Simulation, and Verification | 30 |
| Accelerating Chip Design With Machine Learning | 30 |
| MHADBOR: AI-Enabled Administrative-Distance-Based Opportunistic Load Balancing Scheme for an Agriculture Internet of Things Network | 29 |
| Chasing Carbon: The Elusive Environmental Footprint of Computing | 28 |
| A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC | 28 |
| Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM | 25 |
| Habana Labs Purpose-Built AI Inference and Training Processor Architectures: Scaling AI Training Systems Using Standard Ethernet With Gaudi Processor | 25 |
| Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing | 25 |
| Kunpeng 920: The First 7-nm Chiplet-Based 64-Core ARM SoC for Cloud Services | 24 |
| OpenFPGA: An Open-Source Framework for Agile Prototyping Customizable FPGAs | 24 |
| SymbiFlow and VPR: An Open-Source Design Flow for Commercial and Novel FPGAs | 23 |
| ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks | 21 |
| Evolution of the Graphics Processing Unit (GPU) | 19 |
| Klessydra-T: Designing Vector Coprocessors for Multithreaded Edge-Computing Cores | 19 |
| Circuits and Architectures for In-Memory Computing-Based Machine Learning Accelerators | 18 |
| Quantum Computers for High-Performance Computing | 18 |
| Extending the Frontier of Quantum Computers With Qutrits | 17 |
| TSA-NoC: Learning-Based Threat Detection and Mitigation for Secure Network-on-Chip Architecture | 17 |
| The Path to Successful Wafer-Scale Integration: The Cerebras Story | 16 |
| Intel Alder Lake CPU Architectures | 16 |
| Artificial Intelligence Best Practices in Smart Agriculture | 14 |
| Quantum Computing—From NISQ to PISQ | 14 |
| Superconductor Computing for Neural Networks | 14 |
| Data Centers on Wheels: Emissions From Computing Onboard Autonomous Vehicles | 13 |
| Challenges and Opportunities for Autonomous Micro-UAVs in Precision Agriculture | 13 |
| IBM's POWER10 Processor | 13 |
| ML-HW Co-Design of Noise-Robust TinyML Models and Always-On Analog Compute-in-Memory Edge Accelerator | 13 |
| Generating Systolic Array Accelerators With Reusable Blocks | 13 |
| CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development | 13 |
| On-Demand Mobile CPU Cooling With Thin-Film Thermoelectric Array | 12 |
| FPGA-Accelerated Quantum Computing Emulation and Quantum Key Distillation | 12 |
| Agile Hardware Development and Instrumentation With PyRTL | 12 |
| PCI Express 6.0 Specification: A Low-Latency, High-Bandwidth, High-Reliability, and Cost-Effective Interconnect With 64.0 GT/s PAM-4 Signaling | 12 |
| Interconnects for DNA, Quantum, In-Memory, and Optical Computing: Insights From a Panel Discussion | 11 |
| Temporal Computing With Superconductors | 11 |
| Accelerator Integration for Open-Source SoC Design | 11 |
| Co-Design and System for the Supercomputer “Fugaku” | 10 |
| An Open Inter-Chiplet Communication Link: Bunch of Wires (BoW) | 10 |
| AIDA: Associative In-Memory Deep Learning Accelerator | 10 |
| NVIDIA Hopper H100 GPU: Scaling Performance | 9 |
| Architecting Noisy Intermediate-Scale Quantum Computers: A Real-System Study | 9 |
| Evaluating Sensor Data Quality in Internet of Things Smart Agriculture Applications | 9 |
| OpenPiton at 5: A Nexus for Open and Agile Hardware Design | 9 |
| Quantum Codesign | 9 |
| Compute Substrate for Software 2.0 | 9 |
| ECIM: Exponent Computing in Memory for an Energy-Efficient Heterogeneous Floating-Point DNN Training Processor | 8 |
| The AMD Next-Generation “Zen 3” Core | 8 |
| Cost-Effective and Flexible Asynchronous Interconnect Technology for GALS Systems | 8 |
| Configurable Network Protocol Accelerator (COPA) | 8 |
| A Programmable Approach to Neural Network Compression | 7 |
| A Next-Generation Cryogenic Processor Architecture | 7 |
| A Taxonomy of ML for Systems Problems | 7 |
| A Case for Accelerating Software RTL Simulation | 7 |
| Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data | 7 |
| Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms | 7 |
| Rome to Milan, AMD Continues Its Tour of Italy | 6 |
| High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA | 6 |
| Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer | 6 |
| On Double Full-Stack Communication-Enabled Architectures for Multicore Quantum Computers | 6 |
| Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 | 6 |
| UAV–Assisted Joint Wireless Power Transfer and Data Collection Mechanism for Sustainable Precision Agriculture in 5G | 6 |
| Accelerating ML Recommendation With Over 1,000 RISC-V/Tensor Processors on Esperanto's ET-SoC-1 Chip | 6 |
| Neuromorphic Near-Sensor Computing: From Event-Based Sensing to Edge Learning | 6 |
| Unveiling the Hardware and Software Implications of Microservices in Cloud and Edge Systems | 6 |
| Power Side-Channel Attacks in Negative Capacitance Transistor | 6 |
| MicroScope: Enabling Microarchitectural Replay Attacks | 6 |
| Temperature-Resilient RRAM-Based In-Memory Computing for DNN Inference | 6 |
| Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V | 6 |
| uGEMM: Unary Computing for GEMM Applications | 5 |
| Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives | 5 |
| AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers | 5 |
| System on a Package Innovations With Universal Chiplet Interconnect Express (UCIe) Interconnect | 5 |
| ILLIXR: An Open Testbed to Enable Extended Reality Systems Research | 5 |
| Memory Pooling With CXL | 5 |
| LiveHD: A Productive Live Hardware Development Flow | 5 |
| Hidden Potential Within Video Game Consoles | 5 |
| Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale | 5 |
| The Vision Behind MLPerf: Understanding AI Inference Performance | 5 |
| History of IBM Z Mainframe Processors | 5 |
| Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning | 5 |
| Balancing Specialized Versus Flexible Computation in Brain–Computer Interfaces | 5 |
| Advances in Microprocessor Cache Architectures Over the Last 25 Years | 5 |
| PurpleDrop: A Digital Microfluidics-Based Platform for Hybrid Molecular-Electronics Applications | 5 |
| A Single-Shot Generalized Device Placement for Large Dataflow Graphs | 5 |
| Democratizing Data-Driven Agriculture Using Affordable Hardware | 5 |
| A Low-Latency and Low-Power Approach for Coherency and Memory Protocols on PCI Express 6.0 PHY at 64.0 GT/s With PAM-4 Signaling | 5 |
| Emerging Technologies for Quantum Computing | 4 |
| The Arm Morello Evaluation Platform—Validating CHERI-Based Security in a High-Performance System | 4 |
| Marvell ThunderX3: Next-Generation Arm-Based Server Processor | 4 |
| Soil Fertility Monitoring With Internet of Underground Things: A Survey | 4 |
| ExHero: Execution History-Aware Error-Rate Estimation in Pipelined Designs | 4 |
| The Open Domain-Specific Architecture | 4 |
| Energy-Efficient Video Processing for Virtual Reality | 4 |
| RadioML Meets FINN: Enabling Future RF Applications With FPGA Streaming Architectures | 4 |
| Agile and Open-Source Hardware | 4 |
| Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim | 4 |
| Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud | 4 |
| Accelerating Allreduce With In-Network Reduction on Intel PIUMA | 4 |
| Accelerating Genomic Data Analytics With Composable Hardware Acceleration Framework | 3 |
| Enhancing Model Parallelism in Neural Architecture Search for Multidevice System | 3 |
| Failure Tolerant Training With Persistent Memory Disaggregation Over CXL | 3 |
| Efficient Language-Guided Reinforcement Learning for Resource-Constrained Autonomous Systems | 3 |
| Universal Graph-Based Scheduling for Quantum Systems | 3 |
| Three-Dimensional Stacked Neural Network Accelerator Architectures for AR/VR Applications | 3 |
| Countering Load-to-Use Stalls in the NVIDIA Turing GPU | 3 |
| TinyIREE: An ML Execution Environment for Embedded Systems From Compilation to Deployment | 3 |
| History of Microcontrollers: First 50 Years | 3 |
| Compiling for the IBM Matrix Engine for Enterprise Workloads | 3 |
| Unifying Spatial Accelerator Compilation With Idiomatic and Modular Transformations | 3 |
| Meet the FaM1ly | 3 |
| FPGA Computing | 3 |
| Tydi: An Open Specification for Complex Data Structures Over Hardware Streams | 3 |
| Shortages of Integrated Circuits | 3 |
| Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs | 3 |
| Artificial-Intelligence-Enhanced Ultrasound Flow Imaging at the Edge | 3 |
| Creating Foundations for Secure Microarchitectures With Data-Oblivious ISA Extensions | 2 |
| Understanding Acceleration Opportunities at Hyperscale | 2 |
| Overclocking in Immersion-Cooled Datacenters | 2 |
| PCs Take a Page From Xbox With Pluton | 2 |
| The Apollo Guidance Computer | 2 |
| Machine Learning for Systems | 2 |
| Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility | 2 |
| Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs | 2 |
| Virtual Logical Qubits: A Compact Architecture for Fault-Tolerant Quantum Computing | 2 |
| Kaya for Computer Architects: Toward Sustainable Computer Systems | 2 |
| Towards General-Purpose Acceleration: Finding Structure in Irregularity | 2 |
| HPVM: Hardware-Agnostic Programming for Heterogeneous Parallel Systems | 2 |
| Compute Express Link (CXL): Enabling Heterogeneous Data-Centric Computing With Heterogeneous Memory Hierarchy | 2 |
| LastLayer: Toward Hardware and Software Continuous Integration | 2 |
| Hardware Specialization: From Cell to Heterogeneous Microprocessors Everywhere | 2 |
| Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part V: References | 2 |
| ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library | 2 |
| A Mobile DNN Training Processor With Automatic Bit Precision Search and Fine-Grained Sparsity Exploitation | 2 |
| Distributed Deep Learning With GPU-FPGA Heterogeneous Computing | 2 |
| The Origin of Intel's Micro-Ops | 2 |
| The AMD 400-G Adaptive SmartNIC System on Chip: A Technology Preview | 2 |
| Accelerating Phylogenetics Using FPGAs in the Cloud | 2 |
| HALO: A Hardware–Software Co-Designed Processor for Brain–Computer Interfaces | 2 |
| Microprocessor Advances and the Mainframe Legacy | 2 |
| A Parallel and Updatable Architecture for FPGA-Based Packet Classification With Large-Scale Rule Sets | 2 |
| Pensando Distributed Services Architecture | 2 |
| Photonic Network-on-Wafer for Multichiplet GPUs | 2 |
| CXL-Enabled Enhanced Memory Functions | 2 |
| SpecHLS: Speculative Accelerator Design Using High-Level Synthesis | 2 |
| Bridging Python to Silicon: The SODA Toolchain | 2 |
| Quantum Computing and the Design of the Ultimate Accelerator | 2 |
| LSFQ: A Low-Bit Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration | 2 |
| Interactions, Impacts, and Coincidences of the First Golden Age of Computer Architecture | 1 |
| Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency | 1 |
| A 10.7-µJ/Frame 88% Accuracy CIFAR-10 Single-Chip Neuromorphic Field-Programmable Gate Array Processor Featuring Various Nonlinear Functions of Dendrites in the Human Cerebrum | 1 |
| Exploring Memory-Oriented Design Optimization of Edge AI Hardware for Extended Reality Applications | 1 |
| Pod-racing: bulk-bitwise to floating-point compute in racetrack memory for machine learning at the edge | 1 |
| A Brief History of Warehouse-Scale Computing | 1 |
| DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks | 1 |
| Combining Multiple tinyML Models for Multimodal Context-Aware Stress Recognition on Constrained Microcontrollers | 1 |
| Reliable and Time-Efficient Virtualized Function Placement | 1 |
| The 50 Year History of the Microprocessor as Five Technology Eras | 1 |
| Special Issue on In-Memory Computing | 1 |
| IEEE Computer Society: Volunteer Service Awards | 1 |
| Characterizing and Mitigating Soft Errors in GPU DRAM | 1 |
| Leaking Secrets Through Compressed Caches | 1 |
| BabelFish: Fusing Address Translations for Containers | 1 |
| XCRYPT: Accelerating Lattice-Based Cryptography With Memristor Crossbar Arrays | 1 |
| The Economics of Confrontational Conversation | 1 |
| Navigating the Seismic Shift of Post-Moore Computer Systems Design | 1 |
| Data Movement Accelerator Engines on a Prototype Power10 Processor | 1 |
| Special Issue on Artificial Intelligence at the Edge | 1 |
| A Binary Translation Framework for Automated Hardware Generation | 1 |
| Z80—The 1970s Microprocessor Still Alive | 1 |
| TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology | 1 |
| Toward Developing High-Performance RISC-V Processors Using Agile Methodology | 1 |
| Varifocal Storage: Dynamic Multiresolution Data Storage | 1 |
| Retargetable Optimizing Compilers for Quantum Accelerators via a Multilevel Intermediate Representation | 1 |
| Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite | 1 |
| Characterizing and Modeling Nonvolatile Memory Systems | 1 |
| Vector Runahead for Indirect Memory Accesses | 1 |
| Masthead | 1 |
| SMT: Software-Defined Memory Tiering for Heterogeneous Computing Systems With CXL Memory Expander | 1 |
| Online Code Layout Optimizations via OCOLOS | 1 |
| The Xbox Series X System Architecture | 1 |
| Special Issue on Artificial Intelligence, Edge, and Internet of Things for Smart Agriculture | 1 |
| Architectural CO2 Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool | 1 |
| On-Device Tiny Machine Learning for Anomaly Detection Based on the Extreme Values Theory | 1 |
| Biology and Systems Interactions | 1 |
| Increasing Throughput of In-Memory DNN Accelerators by Flexible Layerwise DNN Approximation | 1 |
| A Compressed Spiking Neural Network Onto a Memcapacitive In-Memory Computing Array | 1 |
| Economic Dependencies in Integrated Circuits | 1 |
| speedAI240: A 2-Petaflop, 30-Teraflops/W At-Memory Inference Acceleration Device With 1456 RISC-V Cores | 1 |
| A Hardware/Software Co-Design Vision for Deep Learning at the Edge | 1 |
| Special Issue on Hot Interconnects | 1 |
| Sustainable AI Processing at the Edge | 1 |
| I-DVFS: Instantaneous Frequency Switch During Dynamic Voltage and Frequency Scaling | 1 |
| The Fox and Shepherd Problem | 1 |
| Warehouse-Scale Video Acceleration | 1 |
| Watts S. Humphrey Software Quality Award | 0 |
| HeteroGen: Automatic Synthesis of Heterogeneous Cache Coherence Protocols | 0 |
| Environmentally Sustainable Computing | 0 |
| Call for Papers: IEEE Transactions on Computers | 0 |
| The Humphrey Award Nominations | 0 |
| IEEE Computer Society Has You Covered! | 0 |
| IEEE COMPUTER SOCIETY: Call for Papers | 0 |
| From Mainframes to Microprocessors | 0 |
| Call for Papers: IEEE Transactions on Computers | 0 |
| The Next Security Frontier: Taking the Mystery Out of the Supply Chain | 0 |
| Review of Patents Issued to Computer Architecture Companies in 2021—Part II | 0 |
| Pandemics and the Dismal Technology Economy | 0 |
| Addressing the Gap Between Training Data and Deployed Environment by On-Device Learning | 0 |
| IEEE Computer Society Jobs Board | 0 |
| Front Cover | 0 |
| EyeCoD: Eye Tracking System Acceleration via FlatCam-Based Algorithm and Hardware Co-Design | 0 |
| IEEE Computer Society | 0 |
| Over the Rainbow: 21st Century Security & Privacy Podcast | 0 |
| Remote Work | 0 |
| IEEE Computer Society Jobs Board | 0 |
| IEEE TRANSACTIONS ON BIG DATA | 0 |
| Get Published in the New IEEE Open Journal of the Computer Society | 0 |
| IEEE Computer Society Has You Covered! | 0 |
| IEEE COMPUTER SOCIETY JOBS BOARD | 0 |
| IEEE Computer Society Has You Covered! | 0 |
| COMPUTER: CALL FOR SPECIAL ISSUE PROPOSALS | 0 |
| Masthead | 0 |
| Special Issue on Security and Privacy-Preserving Execution Environments | 0 |
| Table of Contents | 0 |
| Hot Chips 34 and More! | 0 |
| IEEE Computer Society Information | 0 |
| An Architecture to Accelerate Computation on Encrypted Data | 0 |
| Subscribe to CiSE Today! | 0 |
| [Front cover] | 0 |
| IEEE Computer Graphics and Applications | 0 |
| Get Published in the New IEEE Open Journal of the Computer Society | 0 |
| Get Published in the New IEEE Open Journal of the Computer Society | 0 |
| Front Cover | 0 |
| IEEE Computer Society Jobs Board | 0 |
| Reg-TuneV2: A Hardware-Aware and Multiobjective Regression-Based Fine-Tuning Approach for Deep Neural Networks on Embedded Platforms | 0 |
| Masthead | 0 |