
Unified code generation pipeline for MC data extraction - part 1/3 #294

Draft
mj41 wants to merge 2 commits into Tnze:master from mj41:mj-121-v2a


mj41 commented Feb 16, 2026

Credits

Co-authored with Claude Opus 4.6 (Anthropic) via GitHub Copilot.
Java reflection extractors use Mojang's unobfuscated server jar
(published by Mojang/Microsoft).

Summary

Unified code generation pipeline for go-mc, replacing the previous scattered
generators with a single coordinated system. Removes all old generators and
introduces tools/ — a separate Go module that extracts Minecraft data from
the server jar and generates all Go source files.

This PR adds the infrastructure only — no generated output files are changed.
The generated data update follows in a separate PR.

Commits

  1. tools: unified code generation and MC data extraction (1.21.11 foundation)
    — Adds the complete tools/ module with 10 generators, 7 Java extractors,
    templates, hand-crafted config files, and documentation.
  2. Remove obsolete generators — Deletes 15 old generator/data files that
    are superseded by tools/.

What's in tools/

10 Go Generators

| Generator | Input | Output |
|---|---|---|
| packetid | `packets.json` | `data/packetid/packetid.go` |
| soundid | `registries.json` | `data/soundid/soundid.go` |
| item | `items.json` + `registries.json` | `data/item/item.go` |
| blocks | `blocks.json` + `block_properties.json` | `level/block/blocks.go` + `block_states.nbt` + `properties_enum.go` |
| entity | `entities.json` | `data/entity/entity.go` |
| component | `components.json` + `component_schema.json` | `level/component/components.go` + `*_gen.go` |
| blockentities | `block_entities.json` | `level/block/blockentity.go` + `blockentities.go` |
| registryid | `registries.json` | `data/registryid/*.go` (95 files) |
| biome | `biomes.json` | `level/biome/list.go` |
| lang | `lang/*.json` | `data/lang/<locale>/<locale>.go` (147 languages) |

7 Java Extractors (in tools/java/)

The seventh file, ExtractAll.java, orchestrates the six extractors below inside an
eclipse-temurin:21-jdk container:

  • GenComponentSchema — 104 component wire format schemas via reflection
  • GenBiomes — biome protocol ordering via runtime registry introspection
  • GenBlockEntities — block entity types with valid blocks
  • GenBlockProperties — block state property definitions
  • GenComponents — data component type enumeration
  • GenEntities — entity types with dimensions

3-Phase Pipeline

Phase 1 (Go host):  Downloads server jar + 147 language files
Phase 2 (Container): Extracts data from jar (no internet needed)
Phase 3 (Go host):  Generates Go source from extracted JSON

Usage

cd tools && go run . --version 1.21.11

Old Generators Removed

  • data/entity/gen_entity.go
  • data/item/gen_item.go
  • data/lang/gen_lang.go
  • data/soundid/gen_soundid.go
  • data/packetid/generator/ (GenPacketId.java)
  • data/registryid/generator/ (generate.go + registries.json + template)
  • level/block/generator/ (GenBlocks.java + 5 Go generators)
  • internal/generateutils/utils.go

Diff Stats

58 files changed, 5,889 insertions(+), 19,085 deletions(-)

The net reduction of ~13k lines (19,085 - 5,889 = 13,196) comes mainly from
removing the bundled 17,877-line registries.json from data/registryid/generator/.

Module Structure

tools/ is a separate Go module with a replace directive pointing to the
parent go-mc module. This keeps generation tooling out of the main module's
dependency graph.

Related

mj41 added 2 commits February 16, 2026 11:34
…tion)

Introduce unified code generation infrastructure and extraction pipeline for Minecraft 1.21.11.
Consolidates 8 separate generators and the mc-extract pipeline into a single coordinated system.

1. **gen_packetid.go** - Packet ID constants
   - Parses protocol.json from MC server jar
   - Generates packetid/ registry with clientbound/serverbound packet mappings
   - Handles protocol phase organization (LOGIN, CONFIGURATION, PLAY)
   - 254 packets for protocol 774

2. **gen_soundid.go** - Sound event registry
   - Extracts from registries.json sound event entries
   - Generates soundid/ registry: 1838 sound events mapped to IDs
   - Includes sound name to ID lookup tables

3. **gen_item.go** - Item registry
   - Parses registries.json item entries
   - Generates item/ registry with all 1505 items
   - Includes item IDs and metadata

4. **gen_blocks.go** - Block types and properties
   - Parses blocks.json from MC server (1166 block types)
   - Generates block/ registry: block types, 29671 block states
   - Block property enums: Direction, Axis, Half, FrontAndTop, etc.
   - Hand-crafted overrides in naming_overrides.json

5. **gen_blockentities.go** - Block entity types
   - Parses BlockEntity (now BlockEntityType) registry
   - Generates blockentities/ registry: 49 block entity types
   - Supports MC 1.21.11+ API changes (ResourceLocation → Identifier)

6. **gen_entity.go** - Entity types
   - Parses registries.json entity types
   - Generates entity/ registry: 157 entity types
   - Includes entity metadata and type mappings

7. **gen_registryid.go** - Network registry indices
   - Extracts all registry types from registries.json
   - Generates registryid/ package with 34 registry files
   - Covers: blocks, items, block entities, entities, biomes, dimensions, enchantments, etc.

8. **gen_componenttypes.go** + **gen_component.go** - Data component infrastructure
   - Component type generation from hand-crafted schema
   - Auto-generates wrapper types for 104 data components
   - Patterns: embed, embed_nbt, eitherholder, empty, delegate, named_int, array, tuple, custom
   - GenComponentSchema.java: runtime schema extraction from DataComponents class
   - Supports component encoding/decoding for inventory items
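The block-state count in item 4 follows from the property definitions: a block's number of states is the product of the value counts of its properties, and a block with no properties has one state. The sketch below assumes blocks.json uses the vanilla data-generator layout (a `properties` map and a `states` list per block). The sample IDs are illustrative, and the cross-check is one possible sanity test, not necessarily what gen_blocks.go does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// blockReport assumes: "minecraft:<block>" -> {"properties": {...}, "states": [...]}.
type blockReport map[string]struct {
	Properties map[string][]string `json:"properties"`
	States     []struct {
		ID      int  `json:"id"`
		Default bool `json:"default"`
	} `json:"states"`
}

// stateCount is the size of the Cartesian product of all property values.
func stateCount(props map[string][]string) int {
	n := 1
	for _, values := range props {
		n *= len(values)
	}
	return n
}

func main() {
	// Illustrative sample; IDs are not real protocol IDs.
	data := []byte(`{
	  "minecraft:stone": {"states": [{"id": 1, "default": true}]},
	  "minecraft:oak_log": {
	    "properties": {"axis": ["x", "y", "z"]},
	    "states": [{"id": 10}, {"id": 11, "default": true}, {"id": 12}]
	  }
	}`)
	var blocks blockReport
	if err := json.Unmarshal(data, &blocks); err != nil {
		panic(err)
	}
	names := make([]string, 0, len(blocks))
	for name := range blocks {
		names = append(names, name)
	}
	sort.Strings(names)
	total := 0
	for _, name := range names {
		b := blocks[name]
		n := stateCount(b.Properties)
		if n != len(b.States) { // the product must match the listed states
			panic(fmt.Sprintf("%s: expected %d states, report has %d", name, n, len(b.States)))
		}
		total += n
		fmt.Printf("%s: %d state(s)\n", name, n)
	}
	fmt.Println("total states:", total)
	// minecraft:oak_log: 3 state(s)
	// minecraft:stone: 1 state(s)
	// total states: 4
}
```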

**extract.go** - Unified extraction orchestrator
- Downloads Minecraft server jar (1.21.11)
- Executes Java extractors via container (eclipse-temurin:21-jdk)
- Produces intermediate JSON files (temp/jsons/1.21.11/)
- Feeds output to all eight generators

**Java Extractors** (7 files in tools/java/)
- **ExtractAll.java** - Main orchestrator (495 lines)
  - Extracts from server jar: registries.json, blocks.json, asset files
  - Invokes individual extractors for specialized data
  - Generates component_schema.json via reflection

- **GenBlockProperties.java** - Block property enums
  - Parses Block enum members via reflection
  - Detects enum types from block properties
  - 29 property enums generated

- **GenComponentSchema.java** - Component schema reflection (751 lines)
  - Introspects DataComponents class
  - Extracts generic type parameters via reflection
  - Classifies types: Record/enum/primitive/Holder/EitherHolder/Unit
  - Probes StreamCodec for VarInt vs Int detection
  - 104 components: 56 auto-matched to schema, 48 requiring overrides

- **GenBiomes.java** - Biome data extraction
- **GenBlockEntities.java** - Block entity type extraction
- **GenComponents.java** - Component type enumeration
- **GenEntities.java** - Entity type enumeration

**tools/hand-crafted/** directory
- **component_schema.json** - Complete 104-component type schema
  - Wire format definitions
  - Type patterns and parameters
  - Overrides for complex types
  - 53 embed patterns, 22 tuple, 6 array, 4 empty, 3 eitherholder, 16 custom

- **component_schema_overrides.json** - Manual corrections
  - 49 entries with pattern-specific wire format overrides
  - 10 comment entries for documentation

- **naming_overrides.json** - Go naming conventions
  - Component name mappings (map_id → MapID)
  - Block trim prefix types
  - Block property naming rules

- **packet_phases.json** - Protocol phase metadata
  - Phase ordering and Go code prefixes
  - Section comments for generated code organization
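naming_overrides.json exists because a mechanical snake_case conversion turns `map_id` into `MapId`, while Go convention spells the initialism `MapID`. A minimal sketch of the lookup-then-fallback approach follows. The map literal stands in for the real file, whose structure the PR does not show, and `goIdent` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// namingOverrides is a hypothetical stand-in for naming_overrides.json.
var namingOverrides = map[string]string{
	"map_id": "MapID",
}

// goIdent converts a snake_case Minecraft name to an exported Go
// identifier, consulting the override table before the default rule.
func goIdent(name string) string {
	if o, ok := namingOverrides[name]; ok {
		return o
	}
	var b strings.Builder
	for _, part := range strings.Split(name, "_") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1])) // names are ASCII
		b.WriteString(part[1:])
	}
	return b.String()
}

func main() {
	for _, n := range []string{"map_id", "custom_name", "max_stack_size"} {
		fmt.Printf("%s -> %s\n", n, goIdent(n))
	}
	// map_id -> MapID
	// custom_name -> CustomName
	// max_stack_size -> MaxStackSize
}
```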

**helpers.go** (184 lines)
- JSON file reading (readJSON, readHandCrafted)
- File writing with formatting
- Go package/type name validation
- Lazy loading for hand-crafted data

**main.go** (157 lines)
- CLI interface: --extract (full pipeline) vs direct generation
- Version argument handling
- Generator selection and execution
- Error handling and progress reporting
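main.go is described as choosing between --extract (the full pipeline) and direct generation for a given version. A minimal sketch of that flag handling with the standard `flag` package follows. The flag names come from the PR text, but the help strings and the rule that `--version` is required are assumptions made for this sketch.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the CLI settings sketched for main.go.
type options struct {
	version string
	extract bool
}

// parseArgs parses CLI arguments; in this sketch --version is mandatory.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("tools", flag.ContinueOnError)
	fs.StringVar(&o.version, "version", "", "Minecraft version, e.g. 1.21.11")
	fs.BoolVar(&o.extract, "extract", false, "run download + extraction before generation")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.version == "" {
		return o, fmt.Errorf("--version is required")
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("version=%s extract=%v\n", o.version, o.extract)
}
```

Go's `flag` package accepts both `-version` and `--version`, so in this sketch the `go run . --version 1.21.11` form from the Usage section parses as expected.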

**docs/README.md** - Project overview
- Architecture overview
- Getting started with examples
- Breaking changes from MC upgrades
- Protocol compatibility table

**docs/dev/README.md** - Development guide
- Contributing guidelines
- Testing expectations
- Code organization

**docs/dev/tools.md** - Generator documentation (164 lines)
- Each generator's purpose, inputs, outputs
- Hand-crafted data format specifications
- Component schema format details
- Maintenance procedures

**docs/dev/minecraft-internals.md** - MC protocol reference
- Packet structure overview
- Registry format explanation
- Component encoding scheme

**Design decisions**

1. **Separation of Concerns**
   - Java extractors run in container with MC runtime
   - Go generators operate on extracted JSON
   - Hand-crafted schema acts as single source of truth for components

2. **Generic Type Handling**
   - Java reflection probes StreamCodec for wire format
   - Template patterns for EitherHolder, Holder, CompositeHolder
   - Auto-detects VarInt vs Int encoding

3. **Manual Override System**
   - 49 components require manual wire format overrides
   - 56 auto-matched via schema pattern recognition
   - 10 comment entries for future maintenance notes

4. **Tool Distribution**
   - Separate tools/go.mod to isolate generator dependencies
   - No tooling deps in main module
   - Single command for full extraction + generation

**Minecraft 1.21.11 changes**

- Protocol 774
- Java API changes: ResourceLocation → Identifier
- Block entity refactoring: new types and properties
- Chat fixes and improvements
- Item component system expanded to 104 types

**Validation**

- All generators validate Go syntax before writing
- go build ./... and go test ./... pass on generated code
- Container-based extraction ensures consistency
- Hand-crafted data independently reviewed

**Stats**

- 33 files changed, 5543 insertions(+), 9 deletions(-)
- Primary additions: 8 generators (2847 lines), Java extractors (2148 lines)
- Documentation: 292 lines
mj41 (author) commented Feb 21, 2026

For now, I will add more bug fixes to the mj-121-v2-all-fixes branch:
