Replace unnest with compactColumns for columnar data transformation #171
Open
DZakh wants to merge 23 commits into dz/decode-encode
- Renamed 'unnest' schema field to 'compactColumns' in type definitions
- Removed old unnestSerializer and unnest function
- Implemented new compactColumns function with decoder/encoder architecture inspired by jsonStringDecoder, jsonDecoder, and textEncodeDecoder
- New API: S.compactColumns(S.unknown)->S.to(objectSchema)
- Updated tests to use new API
- Tests are currently failing for reverse operations (encoder)
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Changed compactColumns field type from bool to schema type
- Updated S.d.ts with proper generic types for compactColumns
- Updated .resi files with correct type signatures
- Removed FIXME comment about deleting compactColumns on reverse
- Fixed S_test.ts to not use 'as' cast
- Updated test to verify schema is stored in compactColumns field
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Reverse compactColumns schema on S.reverse() call
- Add decoder test to CompactColumns test
- Add test with S.compactColumns(S.json) and S.bigint field
- Restore expectType check in tests
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
…ays schema
- Add arrayFormat type with CompactColumns variant
- Remove compactColumns field from schema types
- compactColumns function now creates array of arrays schema using additionalItems
- Set arrayFormat = CompactColumns as identifier
- Update type signature to t<'value> => t<array<array<'value>>>
- Update interface files and TypeScript definitions
- Update tests to use %raw for runtime type comparisons
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Add arrayFormat to format type spread instead of separate field
- Remove arrayFormat field from Array variant and untagged types
- Use existing format field for compactColumns identifier
- compactColumnsDecoder falls back to arrayDecoder when no .to
- Use array factory instead of base for inner array creation
- Add TypeScript types: NumberFormat, StringFormat, ArrayFormat, Format
- Update tests to check format field
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Fix toExpression to return inner element type for forward compactColumns schemas
- Fix toExpression to return object expression + "[]" for reversed compactColumns schemas
- Add compactColumns export to S.js for JavaScript/TypeScript users
- Update S.d.ts to use ArrayFormat type for array schemas
- Note: Field-level transformations (nullAsOption, bigint) are not applied in compactColumns; it creates raw objects without field transformations
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Fix S.to to store properties directly on compactColumns schema instead of chaining via .to, which prevents objectSchema decoder from running in the reversed direction
- Fix compactColumns encoding for nullAsOption fields by properly detecting direction using selfSchema.parser presence
- Add CompactColumnsSchema type and S.to overload for proper TypeScript type inference with compactColumns
- Fix test argument order in edge case tests
- Update test expectations for non-object schema error message
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Use default S.to logic instead of special-casing compactColumns
- Add skipTo handling in compactColumnsDecoder for both forward and reverse
- Add check in objectDecoder to skip when .to is compactColumns (reversed chain)
- Fix direction detection using selfSchema.to presence
- Update toExpression to show proper array types for compactColumns
- Fix toExpression for reversed compactColumns to show []
- Keep CompactColumnsSchema type and S.to overload for proper type inference
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Add objectDecoder skip logic for reversed compactColumns chain
- Restore direction detection in compactColumnsDecoder
- Add compactColumns and reverseConvertOrThrow to S.js exports
- Add reverseConvertOrThrow type to S.d.ts
- Update TypeScript tests to use parser and reverseConvertOrThrow
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Add compactColumns export to Pack.res for TypeScript/JS usage
- Update S.d.ts: remove reverseConvertOrThrow (use S.encoder), fix compactColumns type to Output[][]
- Fix toExpression for compactColumns without S.to to show inner schema type (e.g., "string[][]")
- Update S_test.ts to use S.encoder instead of reverseConvertOrThrow
- Add test for compactColumns toExpression without S.to
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Add Alpha.6 section documenting compactColumns feature
- Remove accidentally committed genType generated files
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Fix compactColumnsDecoder direction detection to be based on input type rather than selfSchema.to
- Fix compactColumnsEncoder to pass through when input is already in columnar format (happens when called as parser after reverse)
- Add U.assertCompiledCode checks to all compactColumns test cases
- Use S.untag instead of Obj.magic to access format field in tests
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
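The pass-through behavior described in this commit can be illustrated with a small standalone sketch. It does not use the library: isColumnar and toColumnar are hypothetical names, and checking the runtime value is an assumption made for illustration (the actual fix may inspect the schema's input type at compile time instead).

```typescript
// Hypothetical illustration only; not the library's compactColumnsEncoder.

// Treat the input as columnar when every top-level item is itself an array.
// Note: an empty array satisfies this check and is passed through unchanged.
function isColumnar(input: unknown[]): input is unknown[][] {
  return input.every((item) => Array.isArray(item));
}

// Convert rows to columns, or return the input untouched when it is
// already columnar (the "pass through" case from the commit message).
function toColumnar(input: unknown[], fields: string[]): unknown[][] {
  if (isColumnar(input)) {
    return input;
  }
  const rows = input as Record<string, unknown>[];
  // One inner array per field, in the order given by `fields`.
  return fields.map((field) => rows.map((row) => row[field]));
}
```

Inspecting the data rather than the position in the chain is what lets the same function serve both directions after a reverse.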
- Merge Alpha.6 into Alpha.5 in IDEAS.md
- Support empty objects in compactColumns (parse empty columnar to empty array)
- Improve error message for non-object schemas to: "S.compactColumns supports only object schemas. Use S.compactColumns(S.unknown)->S.to(objectSchema)."
- Update tests to reflect new behavior
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
…nto claude/replace-unnest-compactcolumns-Zs0Za
…lAsOption fields
- Forward direction (parse): Convert null to undefined for fields with null variant
- Reverse direction (encode): Detect undefined->null transformation by checking if field schema's anyOf contains an undefined variant with .to pointing to null
- Update tests to expect null<->undefined transformation in compiled code
- Fix TypeScript test to expect undefined output when parsing nullable fields
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
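A minimal sketch of the null<->undefined mapping described in this commit, applied to a single column. The function names are hypothetical, and the sketch assumes the caller already knows the column belongs to a nullable field; the real change derives that from the field schema.

```typescript
// Hypothetical illustration only; not the library's compiled code.

// Forward direction (parse): null in the column becomes undefined.
function nullToUndefined(column: unknown[]): unknown[] {
  return column.map((value) => (value === null ? undefined : value));
}

// Reverse direction (encode): undefined in the column becomes null.
function undefinedToNull(column: unknown[]): unknown[] {
  return column.map((value) => (value === undefined ? null : value));
}
```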
The compactColumns transformation produces/consumes an array of objects,
so the target schema must be S.array(objectSchema) not just objectSchema.
Changes:
- Update compactColumnsDecoder to extract properties from selfSchema.to.additionalItems
- Update TypeScript tests to use S.array(S.schema({...}))
- Update ReScript tests to use S.array(S.schema(...))
- Update error message to reflect the new API format
- Update compiled code expectations for the array wrapper validation
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
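To make the row/column relationship concrete, here is a standalone sketch of the two shapes involved. It does not call the library: rowsToColumns and columnsToRows are hypothetical helpers, and ordering the columns by an explicit field list is an assumption made for illustration.

```typescript
// Hypothetical illustration only; not the library's implementation.
type Row = Record<string, unknown>;

// Rows -> columns: one inner array per field, in the order given by `fields`.
function rowsToColumns(rows: Row[], fields: string[]): unknown[][] {
  return fields.map((field) => rows.map((row) => row[field]));
}

// Columns -> rows: the i-th row takes the i-th value of every column.
// Empty columnar input yields an empty array of rows.
function columnsToRows(columns: unknown[][], fields: string[]): Row[] {
  const rowCount = columns.length === 0 ? 0 : columns[0].length;
  const rows: Row[] = [];
  for (let i = 0; i < rowCount; i++) {
    const row: Row = {};
    fields.forEach((field, columnIndex) => {
      row[field] = columns[columnIndex][i];
    });
    rows.push(row);
  }
  return rows;
}

const sampleFields = ["id", "name"];
const sampleRows = [
  { id: 1, name: "a" },
  { id: 2, name: "b" },
];
// One column per field: [[1, 2], ["a", "b"]]
const sampleColumns = rowsToColumns(sampleRows, sampleFields);
```

Because the row side is always an array of objects, the target of the chain has to describe an array, which is why the commit moves from objectSchema to S.array(objectSchema).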
…nto claude/replace-unnest-compactcolumns-Zs0Za https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Fix Example_test to use reverseConvertOrThrow instead of reverseConvertToJsonOrThrow (optional fields with undefined can't be converted to JSON)
- Fix S_null_test to expect throw for reverseConvertToJsonStringOrThrow with complex optional nullable fields
- Update S_null_test compiled code snapshot for new validation patterns
- Update S_refine_test compiled code snapshots for new refine compilation
- Fix bug in getShapedSerializerOutput where flattenedOutput.vals could be None
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
- Remove propsToUse logic in toExpression for compactColumns (compactColumns never has its own properties)
- Reuse array expression logic for compactColumns without S.to
- Remove Object with CompactColumns toExpression case
- Remove shouldSkipForCompactColumns logic from objectDecoder (only compactColumns decoder should handle this)
- Revert unrelated flatten fix
- Change mut.serializer to mut.encoder for compactColumns
- Update S_toExpression_test.res to use S.array wrapper for reversed compactColumns test
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE
Summary
This PR replaces the S.unnest function with a new S.compactColumns function that provides better support for columnar data transformations. The new implementation uses a format field instead of a boolean flag and includes improved encoder/decoder logic for bidirectional transformations.
Key Changes
- New compactColumns function: Replaces S.unnest with improved semantics for transforming between row-oriented and column-oriented data formats
  - … .to() chains
- Type system updates:
  - … arrayFormat type with CompactColumns variant
  - … format union to include array formats
  - … format?: arrayFormat instead of unnest?: bool
  - … unnest field from internal schema representation
- Encoder/Decoder implementation:
  - compactColumnsEncoder: Handles forward direction (columnar → rows)
  - compactColumnsDecoder: Handles both forward (rows → columnar) and reverse (columnar → rows) directions
  - … skipTo flag to prevent double-processing in chained transformations
- Expression generation: Enhanced toExpression to properly display columnar data types with column arrays notation
- Code formatting: Normalized S.js to use consistent line endings (removed semicolons)
- Test infrastructure: Added e2e test files for genType integration
Implementation Details
The new compactColumns implementation creates an outer array schema with format = CompactColumns and attaches custom encoder/decoder builders. The decoders intelligently detect whether they're in a forward or reverse transformation chain and adjust behavior accordingly.
This approach provides cleaner semantics than the previous unnest implementation while maintaining full bidirectional support.
https://claude.ai/code/session_01SKZEuGXnrhtTCktzecM4WE