In-Depth System Architecture and Hardware Design
Tech Column is a technical writing project focused on system architecture, hardware design, and performance optimization. The goal is to explain complex technical concepts clearly using vivid analogies and real-world cases, helping readers understand not just "what" but "why."
About the Cases: All case scenarios in this column are mock scenarios, written based on industry best practices with all sensitive information removed. All content complies with professional ethics and NDA requirements.
- Vivid Analogies: Understand Cache through libraries, Associativity through parking lots, NoC through city traffic
- Real-World Cases: Practical problems and solutions from 20+ years of industry experience
- Progressive Learning: From beginner to advanced, systematically building knowledge
- Practice-Oriented: Not just theory, but actionable optimization advice and design principles
| Series | Articles | Word Count |
|---|---|---|
| Computer Architecture | 4 | ~39,700 |
| Cache Architecture | 6 | ~20,800 |
| Network-on-Chip | 6 | ~14,100 |
| Storage Architecture | 12 | ~52,000 |
| Embedded RTOS | 8 | ~24,000 |
| Bluetooth & IoT | 21 | ~70,000 |
| Building danieRTOS | 40 | ~170,000 |
| Tech Events | 2 | ~45,000 |
| Tech Reads | 1 | ~21,000 |
Total: 100 articles, ~457,600 words
Understanding CPU performance design and heterogeneous computing from the architect's perspective.
Article 01 - All Roads Lead to IPC: IPC (Instructions Per Cycle), Latency vs Occupancy, Superscalar, Out-of-Order execution, Branch prediction, Cache effects, ROB sizing
Article 02 - Heterogeneous System Architecture: Six performance laws (Amdahl, Gustafson, USL, Roofline, Little's Law, Queuing Theory), Four processor types (CPU/GPU/NPU/DPU), Memory architectures (UMA/CXL/NVLink), Coherence protocols, MLIR, Data-oriented design
Article 03 - Workload-Driven CPU Selection: TMAM (Top-Down Microarchitecture Analysis Method), CPU taxonomy (ARM Cortex-M/R/A/Neoverse vs RISC-V SiFive/XiangShan/Ventana), Five design scenarios (Ultra-Low Power, Real-Time Embedded, Rich Embedded, Mobile Computing, Cloud & AI Infrastructure), Performance laws application (Little's Law, Roofline Model, ILP/MLP analysis), PPA trade-offs
Article 04 - LLM-Driven RISC-V Vector Code Generation and Verification Methodology: IntrinTrans framework, Multi-Agent FSM (Translator/Compilation/Test/Optimizer), VLA (Vector Length Agnosticism), Strip-mining, LMUL register pressure, Liveness Analysis, Architecture-Aware guardrails, Post-silicon verification (Trace Encoder/Funnel), Cache-aware optimization limitations
Note: This series is available in both Traditional Chinese and English (independently written, not translated).
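Several of the performance laws named in this series reduce to one-line formulas. The sketch below is illustrative only and is not taken from the articles; the function names, units, and example numbers are assumptions chosen for the demonstration.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's Law: fixed problem size, n processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)


def gustafson_speedup(parallel_fraction: float, n: int) -> float:
    """Gustafson's Law: problem size scales with n processors."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * n


def little_in_flight(throughput: float, latency: float) -> float:
    """Little's Law: items in flight = throughput x latency."""
    return throughput * latency


def roofline(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline Model: attainable GFLOP/s at a given FLOP/byte intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallel, 10 processors.
    print(amdahl_speedup(0.9, 10))
    print(gustafson_speedup(0.9, 10))
    # Hypothetical memory system: 2 requests/ns at 50 ns latency.
    print(little_in_flight(2.0, 50.0))
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    print(roofline(1000.0, 100.0, 4.0))
```

With the same parallel fraction, Amdahl's Law caps the speedup by the serial part, while Gustafson's Law grows almost linearly because the workload grows with the machine. The roofline call shows a memory-bound case: at 4 FLOP/byte the hypothetical machine is limited by bandwidth, not by peak compute.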
Deep dive into CPU Cache design and optimization, from basics to practice.
Topics: Cache basics, Associativity, Modern cache design (L1-L3), MESI protocol, Performance optimization, False sharing
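As a small taste of the associativity topic, the sketch below shows how a set-associative cache might split an address into tag, set index, and byte offset. It is not taken from the articles: the function name `split_address` and the default geometry (32 KiB, 8-way, 64-byte lines) are assumptions picked for illustration, and all sizes are assumed to be powers of two.

```python
def split_address(addr: int, cache_bytes: int = 32 * 1024,
                  ways: int = 8, line_bytes: int = 64) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a set-associative cache."""
    num_sets = cache_bytes // (ways * line_bytes)
    # The bit tricks below only work for power-of-two geometries.
    assert line_bytes & (line_bytes - 1) == 0, "line size must be a power of two"
    assert num_sets & (num_sets - 1) == 0, "set count must be a power of two"

    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)

    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print(split_address(0x12345))
    # Addresses 4 KiB apart land in the same set with different tags,
    # so they compete for the same 8 ways in this assumed geometry.
    print(split_address(0x0000), split_address(0x1000))
```

In this assumed geometry there are 64 sets, so any two addresses that differ by a multiple of 4 KiB map to the same set. Such conflicts are what higher associativity is meant to absorb.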
Exploring on-chip communication architecture, from Bus to Network evolution.
Topics: NoC introduction, Topology with graph theory, Routing and deadlock, Router microarchitecture, Cache coherency integration, Advanced packaging
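To give the routing topic a concrete shape, here is a minimal sketch of dimension-order (XY) routing on a 2D mesh, a textbook scheme that avoids routing deadlock on a mesh by always finishing X hops before starting Y hops. It is an illustration under stated assumptions, not the method used in the articles: the function name `xy_route` and the coordinate convention are hypothetical.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the list of routers visited from src to dst, X dimension first."""
    x, y = src
    path = [(x, y)]
    # Phase 1: travel along X until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: travel along Y until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route)
    print("hops:", len(route) - 1)
```

The hop count always equals the Manhattan distance between source and destination, which is one reason mesh latency is easy to reason about.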
A complete hardware-to-software perspective on modern storage systems.
Topics: HDD to SSD evolution, SATA/AHCI, PCIe architecture, NVMe protocol, CXL technology, FTL, GC and wear leveling, Error correction, ZNS, Database optimization, AI/ML workloads, Cloud storage
Practice-oriented embedded RTOS development with FreeRTOS + RISC-V.
Topics: RTOS introduction, Scheduler deep dive, Interrupt handling, Memory management, GDB+QEMU debugging, SMP challenges, Context switch assembly, RISC-V privilege modes
BLE protocol stack, wireless communication, and IoT system integration.
Topics: BLE protocol stack (HCI, L2CAP, ATT/GATT, SMP), PHY/RF, WiFi/BT coexistence, Hardware interfaces (SPI, MIPI, I2C/UART/GPIO), Power optimization, Debugging, Certification, Zigbee comparison, Thread/Matter, AIoT, Security
Building a RISC-V RTOS from scratch in 40 complete, narrative-style tutorials.
danieRTOS is an educational minimal RTOS running on RISC-V architecture.
| Version | Alias | Chapters | Core Features |
|---|---|---|---|
| v0.x | Nano | 01-12 | Basic RTOS: Task, Scheduler, Semaphore, Mutex, Queue |
| v1.x | Secure | 13-19 | User Mode: PMP, Syscall, Fault Handling |
| v2.x | MSMP | 20-30 | SMP: Spinlock, IPI, Multi-core Scheduler |
| v3.x | SMP | 31-40 | Integration: SMP + User Mode + Fault Isolation |
In-depth reviews of foundational textbooks and research papers, bridging theory with system design practice.
Article 01 - A First Course in Information Theory: Bridging Shannon and System Architecture: Connect information theory fundamentals (entropy, mutual information, channel capacity) with real-world system design. Topics include: Roofline Model as entropy bounds, Fano's Inequality in branch prediction, typicality in benchmarking methodology, rate-distortion theory in quantization, and information diagrams for understanding memory consistency models.
Note: This series is available in both Traditional Chinese and English (independently written, not translated).
Architecture-aware deep dives on major industry events and product launches, focusing on how system architecture, hardware, and infrastructure evolve.
Article 01 - GTC 2026 Technical Review: How AI Factories Are Reshaping System Architecture: From NVIDIA Vera CPU and NVFP4 numerical formats to NVLink/NVL72 clusters and AI Factory infrastructure, this series looks at GTC 2026 through the lens of performance laws, disaggregated inference, and large-scale system design.
Article 02 - Breaking Compute Anxiety: How Arm AGI CPU Reshapes Agentic AI Infrastructure: From the DGX Spark paradox to Meta's heterogeneous clusters, OpenAI's MCTS reasoning trees, and Cloudflare's edge defense, this article explores how Arm's 136-core AGI CPU tackles three core challenges of Agentic AI: memory bandwidth walls, deterministic latency, and rack-scale economics. Topics: Information entropy in workload characterization, CXL 3.0 zero-copy orchestration, SMT abandonment rationale, P-E-C Triangle optimization, and why ASICs cannot replace CPUs in high-entropy scenarios.
Note: This series is available in both Traditional Chinese and English (independently written, not translated).
This column is suitable for:
- System Software Engineers: Understanding how hardware affects software performance
- Embedded Engineers: RTOS, drivers, firmware development
- Hardware Engineers: CPU, SoC design and verification
- IoT Developers: Bluetooth, wireless communication, IoT development
- Computer Architecture Students: Learning system architecture in real-world contexts
Prerequisites:
- Basic computer organization concepts
- Understanding of CPU, memory, and bus components
- C programming experience (required for some series)
Copyright © 2025 Danny Jiang
All articles are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, including commercial
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made
License: https://creativecommons.org/licenses/by/4.0/
Browse Markdown files directly on GitHub, starting from the first article of each series.
Clone this repository:
git clone https://github.com/djiangtw/tech-column-public.git
cd tech-column-public
Suggested reading paths:
Hardware Architecture Beginners: Cache Architecture → Network-on-Chip → Storage Architecture
Embedded Systems: Embedded RTOS → Building danieRTOS
Wireless Communication: Bluetooth & IoT Series
This is a read-only public repository. The column is developed in a private repository.
Feedback Welcome:
- Open issues for typos, errors, or suggestions
- Discussion and questions are encouraged
Note: Pull requests cannot be accepted because this repository is synced one-way from the private development repository.
Danny Jiang
System software engineer focused on RISC-V architecture, embedded systems, and performance optimization. 20+ years of industry experience, passionate about explaining complex technical concepts through vivid analogies.
Other Works:
- See RISC-V Run: Fundamentals - Complete RISC-V Architecture Guide
- Data Structures in Practice - Hardware-Oriented Data Structures
Contact:
- GitHub: https://github.com/djiangtw/tech-column-public
- Email: djiang.tw@gmail.com
- LinkedIn: linkedin.com/in/danny-jiang-26359644
If you cite this column in research, teaching, or articles:
Danny Jiang. (2025). Tech Column: In-Depth System Architecture and Hardware Design.
Licensed under CC BY 4.0. https://github.com/djiangtw/tech-column-public
Happy Reading! 📖
For any questions or suggestions, feel free to contact me through GitHub Issues.