Releases: djiangtw/tech-column-public
Tech Event 02 - Breaking Compute Anxiety with Arm AGI CPU (Bilingual & Multimedia)
🚀 Release Overview (版本總覽)
This release officially publishes the highly anticipated Tech Event 02: Arm AGI CPU Deep Dive. It includes a comprehensive 15,000-word architecture whitepaper (available in both English and Traditional Chinese) that explores the paradigm shift from Generative AI to Agentic AI, and how the new Arm AGI CPU reshapes the AI infrastructure landscape.
Additionally, this release includes exclusive multimedia resources (Audio Podcasts, Video Overviews, and a Presentation Slide Deck) generated specifically for this article, making the complex theoretical and business insights accessible in multiple formats.
✨ Key Highlights (核心亮點)
- The Agentic Paradigm Shift: Deep analysis of why "brute-force compute" (GPUs) fails in Agentic AI tasks, using the Roofline Model and Information Entropy (Fano's Inequality) to show why memory bandwidth and deterministic latency now matter most.
- Microarchitecture & Rack Economics: Why Arm abandoned SMT in favor of 136 physical cores, and the strategic brilliance behind the 300W TDP limit for large-scale air-cooled and liquid-cooled deployments.
- Three Real-World Scenarios:
  - Meta: Using CPU as a Super Orchestrator and CXL 3.0 for zero-copy memory sharing to bypass Amdahl's Law.
  - OpenAI: Solving the 200TB MCTS memory explosion and Pointer Chasing bottleneck for System 2 Continuous Reasoning.
  - Cloudflare: Overcoming edge computing constraints using the P-E-C Triangle and Kernel-Bypass (DPDK).
- Market Reorganization: Strategic breakdown of Arm's blitzkrieg, NVIDIA's ecosystem moat, and AMD's long-tail strategy.
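The Meta scenario above refers to Amdahl's Law. As a reminder of what that bound states, here is a minimal Python sketch. The function name, the 5% serial fraction, and the worker counts are illustrative assumptions, not figures taken from the whitepaper.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Serial part runs at 1x; the parallel part is divided across the workers.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Hypothetical workload: 5% of the time is serial orchestration.
for n in (8, 64, 1024):
    print(f"{n:5d} workers -> at most {amdahl_speedup(0.05, n):.2f}x")
```

With a 5% serial fraction the bound approaches 1 / 0.05 = 20x no matter how many workers are added, which is why shrinking the serial orchestration and data-copy portion matters more than adding accelerators.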
📦 Assets & Downloads (下載資源)
📄 Documents (文章原始碼)
- 02-arm-agi.md (Traditional Chinese Version)
- 02-arm-agi.en.md (English Version)
🎧/🎥 Multimedia Overviews (多媒體導覽)
- 🎙️ [ZHTW] Audio Overview (Podcast): tcte-02-zhtw-Arm_AGI.m4a - A deep-dive conversation exploring the bandwidth wall and Agentic AI challenges.
- 🎬 [ZHTW] Video Overview: tcte-02-zhtw-Arm_AGI.mp4 - Visual explainer of the heterogeneous computing challenges and CXL 3.0.
- 🎬 [EN] Video Overview: tcte-02-en-Arm_s_AGI_CPU__Reshaping_AI.mp4 - A concise video breakdown of the Agentic AI infrastructure shift and TCO advantages.
- 📊 [EN] Slide Deck: tcte-02-en-Architecting_Agentic_AI.pdf - A 15-20 page professional presentation (light theme) summarizing the entire wh
📜 License (授權聲明)
All documents and attached multimedia files are licensed under CC BY 4.0.
Author: Danny Jiang.
You are free to share and adapt this work, provided you give appropriate credit to the original author and source.
Tech Read 01: A First Course in Information Theory - Bridging Shannon and System Architecture
📖 Overview
This release features a deep-dive technical reading that bridges the abstract mathematical limits of Claude Shannon's Information Theory (guided by Prof. Raymond W. Yeung's classic textbook) with modern system architecture and performance engineering.
Rather than treating information theory purely as a mathematical discipline, this article explores it from a System Designer's perspective. It demonstrates that the mathematical framework built over 70 years ago provides the exact "limit thinking" required to solve today's most complex hardware bottlenecks—from multi-core scalability to AI model quantization.
🔗 Read the Full Articles
- English Version: A First Course in Information Theory: Bridging Shannon and System Architecture
- Traditional Chinese (繁體中文): 從 Shannon 到系統設計:資訊理論的工程師視角
✨ Key Highlights
- Entropy & the Roofline Model: How data compression physically reshapes the Memory Wall, moving systems from bandwidth-bound to compute-bound.
- Fano's Inequality: Understanding the absolute physical ceiling of modern Branch Predictors (e.g., TAGE) and pipeline flushes.
- Typicality & Benchmarking: The engineering meaning of the Law of Large Numbers (AEP) in designing representative benchmarks (e.g., SPEC CPU) and predicting P99 heavy-tailed latency.
- Information Diagrams & the 7-Domain Framework: Visualizing system modularity breakdowns and cross-domain anti-synergies (Negative Mutual Information).
- Information Inequalities: Why multi-core scalability degrades non-linearly after 4 cores, and how ITIP machine proofs align with LLM-based code verification (IntrinTrans).
- Rate-Distortion & Network Coding: The mathematical necessity behind LLM INT8/NF4 quantization, and how In-Network Computing (e.g., NVIDIA SHARP) embodies the butterfly network's XOR logic.
- Group Theory: The algebraic foundation for formally verifying Cache Coherence (MESI) protocols via symmetry reduction.
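The first highlight above, that compression can move a kernel from bandwidth-bound to compute-bound, follows directly from the Roofline bound. The sketch below is a minimal illustration under stated assumptions: the machine numbers (peak 100, bandwidth 1, in arbitrary consistent units), the kernel, and the 4x compression ratio are hypothetical, and decompression cost is ignored.

```python
def attainable_throughput(peak_compute: float, mem_bandwidth: float,
                          flops: float, bytes_moved: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * memory bandwidth)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    intensity = flops / bytes_moved  # FLOPs performed per byte of memory traffic
    return min(peak_compute, intensity * mem_bandwidth)


def regime(peak_compute: float, mem_bandwidth: float,
           flops: float, bytes_moved: float) -> str:
    """Report which roof limits the kernel."""
    if (flops / bytes_moved) * mem_bandwidth < peak_compute:
        return "bandwidth-bound"
    return "compute-bound"


# Hypothetical machine and kernel: ridge point at 100 FLOPs/byte, kernel at 40.
PEAK, BW = 100.0, 1.0
FLOPS, BYTES = 40.0, 1.0
COMPRESSION_RATIO = 4.0  # assumed: data is stored 4x smaller in memory

print(regime(PEAK, BW, FLOPS, BYTES),
      attainable_throughput(PEAK, BW, FLOPS, BYTES))
print(regime(PEAK, BW, FLOPS, BYTES / COMPRESSION_RATIO),
      attainable_throughput(PEAK, BW, FLOPS, BYTES / COMPRESSION_RATIO))
```

In this toy setup the uncompressed kernel sits under the bandwidth roof, while the compressed one moves a quarter of the bytes, crosses the ridge point, and is capped by peak compute instead.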
📦 Attached Media Assets
To make these dense theoretical concepts more accessible, several AI-generated multimedia overviews are included in this release.
English Assets:
- 🎥 Video Overview: tctr-01-en-The_Language_of_Limits.mp4
- 📊 Slide Deck (PDF): tctr-01-en-Shannon_Meets_Silicon.pdf
Traditional Chinese (繁體中文) Assets:
- 🎥 Video Overview: tctr-01-zhtw-A-First-Course-in-Information-Theory.mp4
- 🎧 Audio Overview (Podcast): tctr-01-zhtw-A-First-Course-in-Information-Theory.m4a
(Note: All attached media files include brief author and licensing information.)
📄 License & Attribution
- Author: Danny Jiang
- License: CC BY 4.0
GTC 2026 Deep Dive: How AI Factories Are Reshaping System Architecture & Media Assets
📝 Release Description
This release contains the finalized bilingual (English and Traditional Chinese) column articles for "01-gtc-2026", along with generated multimedia assets designed to help readers better digest the architectural insights.
Rather than just acting as a "hardware spec translator," this article explores how the AI industry is moving from "lab experiments" to true "AI Factories" through the lens of system-level thinking. It outlines three key shifts (from training to inference, from thinking to acting, from digital to physical) and uses four classic performance models (Roofline Model, Little's Law, Universal Scalability Law, and the P-E-C Triangle) to deduce the underlying logic of hardware evolution.
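Two of the four models named above reduce to one-line formulas. The following Python sketch shows Little's Law and the Universal Scalability Law in their standard textbook forms; the function names and all numeric inputs are illustrative assumptions rather than values from the article.

```python
def littles_law_in_flight(arrival_rate: float, latency: float) -> float:
    """Little's Law: average items in the system L = lambda * W."""
    return arrival_rate * latency


def usl_throughput(n: int, single_node_rate: float,
                   contention: float, coherency: float) -> float:
    """Universal Scalability Law: X(N) = lambda*N / (1 + sigma*(N-1) + kappa*N*(N-1))."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return single_node_rate * n / (1.0 + contention * (n - 1) + coherency * n * (n - 1))


# Hypothetical service: 2000 requests/s at 50 ms latency.
print("requests in flight:", littles_law_in_flight(2000.0, 0.05))

# Hypothetical cluster: 5% contention, 0.1% coherency cost.
for nodes in (1, 8, 31, 200):
    print(nodes, "nodes ->", round(usl_throughput(nodes, 1.0, 0.05, 0.001), 2))
```

The coherency term grows with N*(N-1), so under these assumed coefficients throughput peaks near sqrt((1 - 0.05) / 0.001), roughly 31 nodes, and declines beyond that point. This retrograde region is what distinguishes the USL from Amdahl's Law.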
📖 Read the Articles / 閱讀專欄文章
- English Version: 01-gtc-2026.en.md
- Traditional Chinese Version: 01-gtc-2026.md
📦 Attached Media Assets
This release includes the following generated multimedia files. Feel free to download them or use them alongside the article:
- 🎥 ZHTW Whiteboard Video Overview: tcte-01-zhtw-GTC-2026.mp4
- 🎧 ZHTW Deep Dive Audio Overview (Podcast): tcte-01-zhtw-GTC-2026.m4a
- 🎥 EN Whiteboard Video Overview: tcte-01-en-GTC_2026__The_AI_Factory_Era.mp4
- 📊 EN Slide Deck (Summary): tcte-01-en-The_AI_Factory_Blueprint.pdf (light-colored background layout, summarizing the core essence)
📖 Key Topics Explored
- Vera Rubin Platform: Why Agentic AI needs the deterministic execution and bounded latency of the Vera CPU, and how the Rubin GPU hacks the memory wall using HBM4 bandwidth and low-precision numerical formats (NVFP4, 1.58-bit).
- Disaggregated Inference: Splitting Prefill and Decode bottlenecks using Arithmetic Intensity and the Roofline model, and analyzing NVL576 scaling limits and BlueField DPU network offloading via the Universal Scalability Law (USL).
- AI Factory and P-E-C Economics: Starting from a Fermi estimate of a 20MW AI site, we explore the commercial trade-offs between Performance, Energy, and Cost (P-E-C), encompassing power delivery, liquid cooling, and temperature-sensitive CPO networking.
- Physical AI: How the underlying architecture aligns with strict safety-critical (functional safety) requirements when AI moves from the digital cloud into robots and real-world physical control loops.
📄 License
This article and all attached assets are authored by Danny Jiang and released under the CC BY 4.0 (Creative Commons Attribution 4.0 International) license. You are free to share and adapt this work, as long as you give appropriate credit.
Article 04: LLM-Driven RISC-V Vector Code Generation
📚 Computer Architecture Series - Article 04
LLM-Driven RISC-V Vector Code Generation and Verification Methodology
Breaking the architecture barrier: How to leverage LLMs with proper guardrails to migrate legacy SIMD code to RISC-V Vector extensions.
📄 Article Links
- Traditional Chinese (ZH-TW): 04-llm-rvv-methodology.md
- English: 04-llm-rvv-methodology.en.md
🎯 Key Topics
- IntrinTrans Framework: Multi-Agent FSM (Translator → Compilation → Test → Optimizer)
- VLA (Vector Length Agnosticism): Write-once, run-anywhere vector code
- Strip-mining: Dynamic loop adjustment via vsetvl
- LMUL Register Pressure: Avoiding the register spilling trap
- Liveness Analysis: Mathematical guardrails for register optimization
- Architecture-Aware Guardrails: Embedding microarchitectural knowledge into LLM prompts
- Post-silicon Verification: Trace Encoder/Funnel for hardware-level validation
- Cache-aware Optimization Limitations: Understanding the memory hierarchy blind spots
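The VLA and strip-mining topics above can be illustrated without RISC-V hardware. The sketch below is a plain-Python simulation, not RVV intrinsics: `vsetvl` here is a simplified stand-in that returns min(remaining, VLMAX), which is an assumption (the real instruction is allowed some freedom in how it splits the final iterations), and `strip_mine` and `vector_add` are hypothetical helper names.

```python
from typing import Iterator, List, Tuple


def vsetvl(avl: int, vlmax: int) -> int:
    """Simplified model of vsetvl: grant as many elements as fit, up to VLMAX."""
    return min(avl, vlmax)


def strip_mine(n: int, vlmax: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, vl) pairs covering n elements, one pair per loop iteration."""
    i = 0
    while i < n:
        vl = vsetvl(n - i, vlmax)  # the hardware chooses the chunk size each pass
        yield i, vl
        i += vl


def vector_add(a: List[int], b: List[int],
               vlen_bits: int = 128, sew_bits: int = 32, lmul: int = 1) -> List[int]:
    """Element-wise add written once, independent of the vector register width."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    vlmax = vlen_bits // sew_bits * lmul  # elements per vector register group
    out: List[int] = []
    for start, vl in strip_mine(len(a), vlmax):
        out.extend(x + y for x, y in zip(a[start:start + vl], b[start:start + vl]))
    return out


a, b = list(range(10)), [1] * 10
print(list(strip_mine(10, 4)))
print(vector_add(a, b, vlen_bits=128) == vector_add(a, b, vlen_bits=512))
```

The same `vector_add` source produces identical results whether the simulated VLEN is 128 or 512 bits; only the number of loop iterations changes. That is the write-once property the VLA bullet describes.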
📊 Statistics
- Word Count (ZH-TW): ~4,100 words
- Word Count (EN): ~4,100 words
- Series Total: 97 articles, ~391,600 words
🔑 Key Insights
- Legacy Code Tax: RISC-V's biggest barrier is decades of x86/ARM SIMD code
- Single-shot LLM Prompting Fails: Rule-based tools achieve only 38.2% compilation success; single-shot LLMs only 52.9%
- CI/CD Feedback Loop is Critical: Through iterative compilation and testing feedback, LLMs achieve 100% correctness
- Register Spilling is a Performance Killer: LMUL=8 groups the 32 vector registers into just 4 register groups, and the resulting spills cause a 10-20× latency penalty
- Liveness Analysis Provides Guardrails: Mathematical formulas precisely calculate register pressure, avoiding spilling black holes
- AI Can Beat Human Experts: In the h2v1_upsample case, AI achieved a 5.93× speedup through exhaustive optimization space exploration
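The register-pressure insight above (32 registers become 4 groups at LMUL=8) can be turned into a simple guardrail check. This is a deliberately coarse Python sketch, not the article's liveness formula: it assumes every live value uses the same integer LMUL, ignores fractional LMUL and the reservation of v0 for masks, and the function names are hypothetical.

```python
NUM_VECTOR_REGS = 32  # RVV architectural vector registers v0..v31


def register_groups(lmul: int) -> int:
    """Number of usable register groups when each value occupies `lmul` registers."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only models integer LMUL in {1, 2, 4, 8}")
    return NUM_VECTOR_REGS // lmul


def max_lmul_without_spill(live_values: int) -> int:
    """Largest LMUL whose register groups can hold all simultaneously live values."""
    for lmul in (8, 4, 2, 1):
        if live_values <= register_groups(lmul):
            return lmul
    raise ValueError("more live values than registers: spilling is unavoidable in this model")


for live in (4, 5, 9, 17):
    print(f"{live} live vector values -> LMUL <= {max_lmul_without_spill(live)}")
```

A check like this could be fed to an LLM as a hard constraint before it picks an LMUL, so that a kernel with five live vector values is never generated at LMUL=8.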
📎 Attachments
- Slide Deck
- English slides: "tcca-04-en-The Vector Blueprint"
- Traditional Chinese slides: "tcca-04-zhtw-AI Guardrails for RISC-V Vector Migration"
📖 Series Articles
- Article 01 - All Roads Lead to IPC
- Article 02 - Heterogeneous System Architecture
- Article 03 - Workload-Driven CPU Selection
- Article 04 - LLM-Driven RISC-V Vector Code Generation (This Article)
🎓 Learning Path
This article builds upon concepts from previous articles:
- Article 01: Understanding IPC and microarchitectural bottlenecks
- Article 02: Heterogeneous computing and performance laws
- Article 03: Workload-driven architecture selection
- Article 04: AI-driven code migration methodology
📜 License
This work is licensed under CC BY 4.0 International.
Author: Danny Jiang
Source: https://github.com/djiangtw/tech-column-public
preview v0p1
The preview files for articles in this repo.