Releases · djiangtw/tech-column-public

28 Mar 15:51

djiangtw

tcte-v0.2.0

72cc45b

Tech Event 02 - Breaking Compute Anxiety with Arm AGI CPU (Bilingual & Multimedia)

Latest

🚀 Release Overview (版本總覽)

This release officially publishes the highly anticipated Tech Event 02: Arm AGI CPU Deep Dive. It includes a comprehensive 15,000-word architecture whitepaper (available in both English and Traditional Chinese) that explores the paradigm shift from Generative AI to Agentic AI, and how the new Arm AGI CPU reshapes the AI infrastructure landscape.

Additionally, this release includes exclusive multimedia resources (Audio Podcasts, Video Overviews, and a Presentation Slide Deck) generated specifically for this article, making the complex theoretical and business insights accessible in multiple formats.

✨ Key Highlights (核心亮點)

The Agentic Paradigm Shift: A deep analysis of why "brute-force compute" (GPUs) falls short in Agentic AI tasks, using the Roofline Model and information theory (entropy and Fano's Inequality) to show why memory bandwidth and deterministic latency are the new kings.
Microarchitecture & Rack Economics: Why Arm abandoned SMT in favor of 136 physical cores, and the strategic brilliance behind a 300W TDP limit that suits both air-cooled and liquid-cooled deployments at massive scale.
Three Real-World Scenarios:
- Meta: Using the CPU as a Super Orchestrator and CXL 3.0 for zero-copy memory sharing to work around the serial bottleneck described by Amdahl's Law.
- OpenAI: Solving the 200TB MCTS memory explosion and Pointer Chasing bottleneck for System 2 Continuous Reasoning.
- Cloudflare: Overcoming edge computing constraints using the P-E-C Triangle and Kernel-Bypass (DPDK).
Market Reorganization: Strategic breakdown of Arm's blitzkrieg, NVIDIA's ecosystem moat, and AMD's long-tail strategy.
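
The Roofline reasoning mentioned in the first highlight can be illustrated with a minimal sketch. The peak compute and bandwidth figures below are hypothetical placeholders chosen for round numbers; they are not specifications of any chip named in this release, and the function names are illustrative only.

```python
def attainable_flops(peak_flops: float, peak_bandwidth: float, intensity: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_flops, peak_bandwidth * intensity)


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the two limits meet."""
    return peak_flops / peak_bandwidth


# Hypothetical machine: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BW = 2e12

if __name__ == "__main__":
    ridge = ridge_point(PEAK_FLOPS, PEAK_BW)
    for intensity in (1.0, 10.0, 50.0, 200.0):
        perf = attainable_flops(PEAK_FLOPS, PEAK_BW, intensity)
        bound = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"{intensity:6.1f} FLOP/byte -> {perf / 1e12:6.1f} TFLOP/s ({bound})")
```

Workloads whose arithmetic intensity sits left of the ridge point gain nothing from extra peak FLOP/s; only more bandwidth (or fewer bytes moved) raises their ceiling.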

📦 Assets & Downloads (下載資源)

📄 Documents (文章原始碼)

02-arm-agi.md (Traditional Chinese Version)
02-arm-agi.en.md (English Version)

🎧/🎥 Multimedia Overviews (多媒體導覽)

🎙️ [ZHTW] Audio Overview (Podcast): tcte-02-zhtw-Arm_AGI.m4a
- A deep-dive conversation exploring the bandwidth wall and Agentic AI challenges.
🎬 [ZHTW] Video Overview: tcte-02-zhtw-Arm_AGI.mp4
- Visual explainer of the heterogeneous computing challenges and CXL 3.0.
🎬 [EN] Video Overview: tcte-02-en-Arm_s_AGI_CPU__Reshaping_AI.mp4
- A concise video breakdown of the Agentic AI infrastructure shift and TCO advantages.
📊 [EN] Slide Deck: tcte-02-en-Architecting_Agentic_AI.pdf
- *A 15-20 page professional presentation (light theme) summarizing the entire wh

📜 License (授權聲明)
All documents and attached multimedia files are licensed under CC BY 4.0.
Author: Danny Jiang.
You are free to share and adapt this work, provided you give appropriate credit to the original author and source.

Assets 6

27 Mar 05:30

djiangtw

tctr-v0.1.0

10ad870

Tech Read 01: A First Course in Information Theory - Bridging Shannon and System Architecture

📖 Overview

This release features a deep-dive technical reading that bridges the abstract mathematical limits of Claude Shannon's Information Theory (guided by Prof. Raymond W. Yeung's classic textbook) with modern system architecture and performance engineering.

Rather than treating information theory purely as a mathematical discipline, this article explores it from a System Designer's perspective. It demonstrates that the mathematical framework built over 70 years ago provides the exact "limit thinking" required to solve today's most complex hardware bottlenecks—from multi-core scalability to AI model quantization.

🔗 Read the Full Articles

English Version: A First Course in Information Theory: Bridging Shannon and System Architecture
Traditional Chinese (繁體中文): 從 Shannon 到系統設計：資訊理論的工程師視角

✨ Key Highlights

Entropy & the Roofline Model: How data compression physically reshapes the Memory Wall, moving systems from bandwidth-bound to compute-bound.
Fano's Inequality: Understanding the absolute physical ceiling of modern Branch Predictors (e.g., TAGE) and pipeline flushes.
Typicality & Benchmarking: The engineering meaning of the Asymptotic Equipartition Property (AEP), a consequence of the Law of Large Numbers, in designing representative benchmarks (e.g., SPEC CPU) and predicting P99 heavy-tailed latency.
Information Diagrams & the 7-Domain Framework: Visualizing system modularity breakdowns and cross-domain anti-synergies (negative multivariate mutual information).
Information Inequalities: Why multi-core scalability degrades non-linearly after 4 cores, and how ITIP machine proofs align with LLM-based code verification (IntrinTrans).
Rate-Distortion & Network Coding: The mathematical necessity behind LLM INT8/NF4 quantization, and how In-Network Computing (e.g., NVIDIA SHARP) embodies the butterfly network's XOR logic.
Group Theory: The algebraic foundation for formally verifying Cache Coherence (MESI) protocols via symmetry reduction.

📦 Attached Media Assets

To make these dense theoretical concepts more accessible, several AI-generated multimedia overviews are included in this release.

English Assets:

🎥 Video Overview: tctr-01-en-The_Language_of_Limits.mp4
📊 Slide Deck (PDF): tctr-01-en-Shannon_Meets_Silicon.pdf

Traditional Chinese (繁體中文) Assets:

🎥 Video Overview: tctr-01-zhtw-A-First-Course-in-Information-Theory.mp4
🎧 Audio Overview (Podcast): tctr-01-zhtw-A-First-Course-in-Information-Theory.m4a

(Note: All attached media files include brief author and licensing information.)

📄 License & Attribution

Author: Danny Jiang
License: CC BY 4.0

Assets 6

25 Mar 10:18

djiangtw

tcte-v0.1.0

ffbc39d

GTC 2026 Deep Dive: How AI Factories Are Reshaping System Architecture & Media Assets

📝 Release Description

This release contains the finalized bilingual (English and Traditional Chinese) column articles for "01-gtc-2026", along with generated multimedia assets designed to help readers better digest the architectural insights.

Rather than just acting as a "hardware spec translator," this article explores how the AI industry is moving from "lab experiments" to true "AI Factories" through the lens of system-level thinking. It outlines three key shifts (from training to inference, from thinking to acting, from digital to physical) and uses four classic performance models (Roofline Model, Little's Law, Universal Scalability Law, and the P-E-C Triangle) to deduce the underlying logic of hardware evolution.
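
Two of the performance models named above have compact closed forms, sketched below in their standard textbook versions. The parameter values in the usage lines are hypothetical and are not measurements from the article.

```python
import math


def littles_law_concurrency(throughput_per_s: float, latency_s: float) -> float:
    """Little's Law: average items in flight L = lambda * W."""
    return throughput_per_s * latency_s


def usl_speedup(n: int, alpha: float, beta: float) -> float:
    """Universal Scalability Law: relative capacity with n workers.

    alpha models contention (serialization); beta models coherency (crosstalk).
    """
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_workers(alpha: float, beta: float) -> float:
    """Real-valued worker count that maximizes usl_speedup; requires beta > 0."""
    return math.sqrt((1.0 - alpha) / beta)


if __name__ == "__main__":
    # Hypothetical service: 1000 requests/s at 50 ms each.
    print("requests in flight:", littles_law_concurrency(1000.0, 0.05))
    # Hypothetical coefficients: 5% contention, 0.1% coherency cost.
    print("peak near N =", round(usl_peak_workers(0.05, 0.001), 1))
    for n in (1, 8, 31, 64):
        print(n, round(usl_speedup(n, 0.05, 0.001), 2))
```

With a nonzero coherency term, adding workers past the peak reduces total capacity, which is the retrograde behaviour that distinguishes the USL from Amdahl's Law.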

📖 Read the Articles / 閱讀專欄文章

English Version: 01-gtc-2026.en.md
Traditional Chinese Version: 01-gtc-2026.md

📦 Attached Media Assets

This release includes the following generated multimedia files. Feel free to download them or use them alongside the article:

🎥 ZHTW Whiteboard Video Overview: tcte-01-zhtw-GTC-2026.mp4
🎧 ZHTW Deep Dive Audio Overview (Podcast): tcte-01-zhtw-GTC-2026.m4a
🎥 EN Whiteboard Video Overview: tcte-01-en-GTC_2026__The_AI_Factory_Era.mp4
📊 EN Slide Deck (Summary): tcte-01-en-The_AI_Factory_Blueprint.pdf (light theme, summarizing the core ideas)

📖 Key Topics Explored

Vera Rubin Platform: Why Agentic AI needs the deterministic execution and bounded latency of the Vera CPU, and how the Rubin GPU hacks the memory wall using HBM4 bandwidth and low-precision numerical formats (NVFP4, 1.58-bit).
Disaggregated Inference: Splitting Prefill and Decode bottlenecks using Arithmetic Intensity and the Roofline model, and analyzing NVL576 scaling limits and BlueField DPU network offloading via the Universal Scalability Law (USL).
AI Factory and P-E-C Economics: Starting from a Fermi estimate of a 20MW AI site, we explore the commercial trade-offs between Performance, Energy, and Cost (P-E-C), encompassing power delivery, liquid cooling, and temperature-sensitive CPO networking.
Physical AI: How the underlying architecture aligns with strict safety-critical (functional safety) requirements when AI moves from the digital cloud into robots and real-world physical control loops.

📄 License

This article and all attached assets are authored by Danny Jiang and released under the CC BY 4.0 (Creative Commons Attribution 4.0 International) license. You are free to share and adapt this work, as long as you give appropriate credit.

Assets 6

20 Mar 10:23

djiangtw

tcca-v0.4.0

914897d

Article 04: LLM-Driven RISC-V Vector Code Generation

📚 Computer Architecture Series - Article 04

LLM-Driven RISC-V Vector Code Generation and Verification Methodology

Breaking the architecture barrier: How to leverage LLMs with proper guardrails to migrate legacy SIMD code to RISC-V Vector extensions.

📄 Article Links

Traditional Chinese (ZH-TW): 04-llm-rvv-methodology.md
English: 04-llm-rvv-methodology.en.md

🎯 Key Topics

IntrinTrans Framework: Multi-Agent FSM (Translator → Compilation → Test → Optimizer)
VLA (Vector Length Agnosticism): Write-once, run-anywhere vector code
Strip-mining: Dynamic loop adjustment via vsetvl
LMUL Register Pressure: Avoiding the register spilling trap
Liveness Analysis: Mathematical guardrails for register optimization
Architecture-Aware Guardrails: Embedding microarchitectural knowledge into LLM prompts
Post-silicon Verification: Trace Encoder/Funnel for hardware-level validation
Cache-aware Optimization Limitations: Understanding the memory hierarchy blind spots
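
The strip-mining and vector-length-agnostic topics above can be sketched as a small software model. This is an illustrative Python simulation, not RVV intrinsics and not code from the article: `vsetvl_model` is a hypothetical helper that simplifies the real instruction to "grant min(AVL, VLMAX) elements", and the default VLEN of 128 bits is an assumption.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Maximum elements per vector operation: VLMAX = LMUL * VLEN / SEW."""
    return lmul * vlen_bits // sew_bits


def vsetvl_model(avl: int, vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Simplified model of vsetvl: grant min(AVL, VLMAX) elements."""
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


def strip_mined_add(a, b, vlen_bits=128, sew_bits=32, lmul=1):
    """Element-wise add of equal-length lists, processed in hardware-sized strips."""
    n = len(a)
    out = [0] * n
    strips = []  # vector length granted on each loop iteration
    i = 0
    while i < n:
        vl = vsetvl_model(n - i, vlen_bits, sew_bits, lmul)
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        strips.append(vl)
        i += vl
    return out, strips


if __name__ == "__main__":
    a, b = list(range(10)), [1] * 10
    for vlen in (128, 256):
        result, strips = strip_mined_add(a, b, vlen_bits=vlen)
        print(f"VLEN={vlen}: strips={strips}")
```

The loop body never mentions a fixed vector width, so the same code yields the same result on both modeled VLEN values; only the strip sizes change.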

📊 Statistics

Word Count (ZH-TW): ~4,100 words
Word Count (EN): ~4,100 words
Series Total: 97 articles, ~391,600 words

🔑 Key Insights

Legacy Code Tax: RISC-V's biggest barrier is decades of x86/ARM SIMD code
Single-shot LLM Prompting Fails: Rule-based tools achieve only 38.2% compilation success; single-shot LLMs only 52.9%
CI/CD Feedback Loop is Critical: Through iterative compilation and testing feedback, LLMs achieve 100% correctness
Register Spilling is a Performance Killer: LMUL=8 groups the 32 vector registers into just 4 register groups, and the resulting spills cause a 10-20× latency penalty
Liveness Analysis Provides Guardrails: Mathematical formulas precisely calculate register pressure, avoiding spilling black holes
AI Can Beat Human Experts: In the h2v1_upsample case, AI achieved 5.93× speedup through exhaustive optimization space exploration
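
The register-pressure arithmetic behind the LMUL and liveness insights above can be sketched with a simple counting model. This is an assumption-laden illustration, not the formula from the article: it treats every live vector value as occupying one full register group, ignores the mask register and fractional LMUL, and all function names are hypothetical.

```python
NUM_VECTOR_REGISTERS = 32  # RVV architectural vector registers v0..v31


def register_groups(lmul: int) -> int:
    """Number of usable register groups at an integer LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("integer LMUL must be 1, 2, 4, or 8")
    return NUM_VECTOR_REGISTERS // lmul


def max_live_values(intervals):
    """Peak number of simultaneously live values.

    intervals: list of (definition_point, last_use_point) pairs, inclusive.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end + 1, -1))  # value is dead just after its last use
    live = peak = 0
    for _, delta in sorted(events):  # at equal points, deaths sort before births
        live += delta
        peak = max(peak, live)
    return peak


def largest_lmul_without_spill(intervals):
    """Largest integer LMUL whose group count covers the peak live values."""
    peak = max_live_values(intervals)
    for lmul in (8, 4, 2, 1):
        if peak <= register_groups(lmul):
            return lmul
    return None  # spills even at LMUL=1


if __name__ == "__main__":
    # Hypothetical live ranges for five vector values in a loop body.
    ranges = [(0, 5), (1, 3), (2, 6), (4, 7), (5, 8)]
    print("peak live values:", max_live_values(ranges))
    print("largest safe LMUL:", largest_lmul_without_spill(ranges))
```

A guardrail of this kind can reject an LMUL choice before compilation, which is cheaper than discovering the spill from a slow benchmark run.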

📎 Attachments

Slide Deck
- English slides: "tcca-04-en-The Vector Blueprint"
- Traditional Chinese slides: "tcca-04-zhtw-AI Guardrails for RISC-V Vector Migration"

📖 Series Articles

Article 01 - All Roads Lead to IPC
Article 02 - Heterogeneous System Architecture
Article 03 - Workload-Driven CPU Selection
Article 04 - LLM-Driven RISC-V Vector Code Generation (This Article)

🎓 Learning Path

This article builds upon concepts from previous articles:

Article 01: Understanding IPC and microarchitectural bottlenecks
Article 02: Heterogeneous computing and performance laws
Article 03: Workload-driven architecture selection
Article 04: AI-driven code migration methodology

📜 License

This work is licensed under CC BY 4.0 International.

Author: Danny Jiang
Source: https://github.com/djiangtw/tech-column-public

Assets 8

11 Dec 16:15

djiangtw

v0p1

383b815

preview v0p1

Pre-release

The preview files for articles in this repo.

Assets 3
