Unified code generation pipeline for MC data extraction - part 1/3 #294
Draft
mj41 wants to merge 2 commits into Tnze:master from …
…tion)

Introduce unified code generation infrastructure and extraction pipeline for Minecraft 1.21.11. Consolidates 8 separate generators and the mc-extract pipeline into a single coordinated system.

1. **gen_packetid.go** - Packet ID constants
   - Parses protocol.json from MC server jar
   - Generates packetid/ registry with clientbound/serverbound packet mappings
   - Handles protocol phase organization (LOGIN, CONFIGURATION, PLAY)
   - 254 packets for protocol 774
2. **gen_soundid.go** - Sound event registry
   - Extracts from registries.json sound event entries
   - Generates soundid/ registry: 1838 sound events mapped to IDs
   - Includes sound name to ID lookup tables
3. **gen_item.go** - Item registry
   - Parses registries.json item entries
   - Generates item/ registry with all 1505 items
   - Includes item IDs and metadata
4. **gen_blocks.go** - Block types and properties
   - Parses blocks.json from MC server (1166 block types)
   - Generates block/ registry: block types, 29671 block states
   - Block property enums: Direction, Axis, Half, FrontAndTop, etc.
   - Hand-crafted overrides in naming_overrides.json
5. **gen_blockentities.go** - Block entity types
   - Parses BlockEntity (now BlockEntityType) registry
   - Generates blockentities/ registry: 49 block entity types
   - Supports MC 1.21.11+ API changes (ResourceLocation → Identifier)
6. **gen_entity.go** - Entity types
   - Parses registries.json entity types
   - Generates entity/ registry: 157 entity types
   - Includes entity metadata and type mappings
7. **gen_registryid.go** - Network registry indices
   - Extracts all registry types from registries.json
   - Generates registryid/ package with 34 registry files
   - Covers: blocks, items, block entities, entities, biomes, dimensions, enchantments, etc.
8. **gen_componenttypes.go** + **gen_component.go** - Data component infrastructure
   - Component type generation from hand-crafted schema
   - Auto-generates wrapper types for 104 data components
   - Patterns: embed, embed_nbt, eitherholder, empty, delegate, named_int, array, tuple, custom
   - GenComponentSchema.java: runtime schema extraction from DataComponents class
   - Supports component encoding/decoding for inventory items

**extract.go** - Unified extraction orchestrator
- Downloads Minecraft server jar (1.21.11)
- Executes Java extractors via container (eclipse-temurin:21-jdk)
- Produces intermediate JSON files (temp/jsons/1.21.11/)
- Feeds output to all eight generators

**Java Extractors** (7 files in tools/java/)
- **ExtractAll.java** - Main orchestrator (495 lines)
  - Extracts from server jar: registries.json, blocks.json, asset files
  - Invokes individual extractors for specialized data
  - Generates component_schema.json via reflection
- **GenBlockProperties.java** - Block property enums
  - Parses Block enum members via reflection
  - Detects enum types from block properties
  - 29 property enums generated
- **GenComponentSchema.java** - Component schema reflection (751 lines)
  - Introspects DataComponents class
  - Extracts generic type parameters via reflection
  - Classifies types: Record/enum/primitive/Holder/EitherHolder/Unit
  - Probes StreamCodec for VarInt vs Int detection
  - 104 components: 56 auto-matched to schema, 48 requiring overrides
- **GenBiomes.java** - Biome data extraction
- **GenBlockEntities.java** - Block entity type extraction
- **GenComponents.java** - Component type enumeration
- **GenEntities.java** - Entity type enumeration

**tools/hand-crafted/** directory
- **component_schema.json** - Complete 104-component type schema
  - Wire format definitions
  - Type patterns and parameters
  - Overrides for complex types
  - 53 embed patterns, 22 tuple, 6 array, 4 empty, 3 eitherholder, 16 custom
- **component_schema_overrides.json** - Manual corrections
  - 49 entries with pattern-specific wire format overrides
  - 10 comment entries for documentation
- **naming_overrides.json** - Go naming conventions
  - Component name mappings (map_id → MapID)
  - Block trim prefix types
  - Block property naming rules
- **packet_phases.json** - Protocol phase metadata
  - Phase ordering and Go code prefixes
  - Section comments for generated code organization

**helpers.go** (184 lines)
- JSON file reading (readJSON, readHandCrafted)
- File writing with formatting
- Go package/type name validation
- Lazy loading for hand-crafted data

**main.go** (157 lines)
- CLI interface: --extract (full pipeline) vs direct generation
- Version argument handling
- Generator selection and execution
- Error handling and progress reporting

**docs/README.md** - Project overview
- Architecture overview
- Getting started with examples
- Breaking changes from MC upgrades
- Protocol compatibility table

**docs/dev/README.md** - Development guide
- Contributing guidelines
- Testing expectations
- Code organization

**docs/dev/tools.md** - Generator documentation (164 lines)
- Each generator's purpose, inputs, outputs
- Hand-crafted data format specifications
- Component schema format details
- Maintenance procedures

**docs/dev/minecraft-internals.md** - MC protocol reference
- Packet structure overview
- Registry format explanation
- Component encoding scheme

1. **Separation of Concerns**
   - Java extractors run in container with MC runtime
   - Go generators operate on extracted JSON
   - Hand-crafted schema acts as single source of truth for components
2. **Generic Type Handling**
   - Java reflection probes StreamCodec for wire format
   - Template patterns for EitherHolder, Holder, CompositeHolder
   - Auto-detects VarInt vs Int encoding
3. **Manual Override System**
   - 49 components require manual wire format overrides
   - 56 auto-matched via schema pattern recognition
   - 10 comment entries for future maintenance notes
4. **Tool Distribution**
   - Separate tools/go.mod to isolate generator dependencies
   - No tooling deps in main module
   - Single command for full extraction + generation

- Protocol 774
- Java API changes: ResourceLocation → Identifier
- Block entity refactoring: new types and properties
- Chat fixes and improvements
- Item component system expanded to 104 types

- All generators validate Go syntax before write
- go build ./... and go test ./... pass on generated code
- Container-based extraction ensures consistency
- Hand-crafted data independently reviewed

- 33 files changed, 5543 insertions(+), 9 deletions(-)
- Primary additions: 8 generators (2847 lines), Java extractors (2148 lines)
- Documentation: 292 lines
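The commit message states that all generators validate Go syntax before write. One way such a check could look is sketched below using the standard library's go/format package, which fails on source that does not parse. The function names formatGenerated and writeGenerated and the sample source are hypothetical and are not taken from helpers.go.

```go
package main

import (
	"fmt"
	"go/format"
	"os"
)

// formatGenerated gofmt-formats src. format.Source parses the input first,
// so a syntax error in the generated code surfaces here as an error.
// Hypothetical helper; the PR's own helper may differ.
func formatGenerated(src []byte) ([]byte, error) {
	out, err := format.Source(src)
	if err != nil {
		return nil, fmt.Errorf("generated code is not valid Go: %w", err)
	}
	return out, nil
}

// writeGenerated writes the file only after the source has been validated,
// so a failed generation never overwrites an existing file with broken code.
func writeGenerated(path string, src []byte) error {
	out, err := formatGenerated(src)
	if err != nil {
		return err
	}
	return os.WriteFile(path, out, 0o644)
}

func main() {
	// Sample input with irregular spacing; the package name is illustrative.
	src := []byte("package soundid\n\nconst  Example   = 1\n")
	out, err := formatGenerated(src)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(string(out))
}
```

Validating before the write means a template bug shows up as a generator error rather than as a compile failure in the main module later.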
This was referenced Feb 16, 2026
Author
I will add some more bugfixes to
Credits
Co-authored with Claude Opus 4.6 (Anthropic) via GitHub Copilot.
Java reflection extractors use Mojang's unobfuscated server jar
(published by Mojang/Microsoft).
Summary
Unified code generation pipeline for go-mc, replacing the previous scattered
generators with a single coordinated system. Removes all old generators and
introduces tools/, a separate Go module that extracts Minecraft data from
the server jar and generates all Go source files.
This PR adds the infrastructure only — no generated output files are changed.
The generated data update follows in a separate PR.
Commits
- Adds the complete tools/ module with 10 generators, 7 Java extractors,
  templates, hand-crafted config files, and documentation.
- … are superseded by tools/.

What's in tools/

10 Go Generators

| Input | Output |
| --- | --- |
| `packets.json` | `data/packetid/packetid.go` |
| `registries.json` | `data/soundid/soundid.go` |
| `items.json` + `registries.json` | `data/item/item.go` |
| `blocks.json` + `block_properties.json` | `level/block/blocks.go` + `block_states.nbt` + `properties_enum.go` |
| `entities.json` | `data/entity/entity.go` |
| `components.json` + `component_schema.json` | `level/component/components.go` + `*_gen.go` |
| `block_entities.json` | `level/block/blockentity.go` + `blockentities.go` |
| `registries.json` | `data/registryid/*.go` (95 files) |
| `biomes.json` | `level/biome/list.go` |
| `lang/*.json` | `data/lang/<locale>/<locale>.go` (147 languages) |

7 Java Extractors (in tools/java/)

Orchestrated by ExtractAll.java, running in an eclipse-temurin:21-jdk container:
3-Phase Pipeline
Usage
Old Generators Removed
- data/entity/gen_entity.go
- data/item/gen_item.go
- data/lang/gen_lang.go
- data/soundid/gen_soundid.go
- data/packetid/generator/ (GenPacketId.java)
- data/registryid/generator/ (generate.go + registries.json + template)
- level/block/generator/ (GenBlocks.java + 5 Go generators)
- internal/generateutils/utils.go

Diff Stats
58 files changed, 5,889 insertions(+), 19,085 deletions(-)
Net reduction of ~13k lines due to removing the bundled 17,877-line
registries.json from data/registryid/generator/.

Module Structure

tools/ is a separate Go module with a replace directive pointing to the
parent go-mc module. This keeps generation tooling out of the main module's
dependency graph.
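As an illustration of the layout described above, a tools/go.mod with a replace directive might look roughly like the following. This is a sketch only: the tools module path, the Go version, and the placeholder requirement version are assumptions, and the actual file in the PR may differ.

```
// tools/go.mod (illustrative; module path and versions are assumed)
module example.com/go-mc-tools

go 1.22

// Placeholder version; the replace directive below decides what is built.
require github.com/Tnze/go-mc v0.0.0

// Resolve the parent module from the local checkout one directory up,
// rather than from a published release.
replace github.com/Tnze/go-mc => ../
```

Because the dependency points from tools/ to the parent and not the other way round, whatever the generators import is recorded only in tools/go.mod.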
Related