
feat(std.zon): add escape_unicode options to zon.serializer #23596


Open: wants to merge 1 commit into master

@nurulhudaapon nurulhudaapon commented Apr 17, 2025

Currently std.zon.stringify.serialize always escapes Unicode characters, while std.json.stringify by default does not. This change adds an escape_unicode option that matches the JSON serializer's behavior. To maintain backward compatibility, the default value is true, preserving the current behavior of escaping Unicode.

Change

Before

std.zon.stringify.serialize(buff, .{ .whitespace = true }, writer)

test "std.zon.stringify.serialize escape_unicode = true (default)" {
    const buff = .{ .name = "Test", .description = "⚡ Lightning Bolt", .emoji = "⚡" };
    var buff_str = std.ArrayList(u8).init(std.testing.allocator);
    defer buff_str.deinit();
    try std.zon.stringify.serialize(buff, .{ .whitespace = true }, buff_str.writer());
    std.debug.print("\n{s}\n", .{buff_str.items});
}

Output:

.{
    .name = "Test",
    .description = "\xe2\x9a\xa1 Lightning Bolt",
    .emoji = "\xe2\x9a\xa1",
}
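
The escaped output above is the UTF-8 encoding of ⚡ (U+26A1) written out byte by byte. The following is a small sketch of that equivalence, not part of the PR; it assumes only `std.testing` and a UTF-8 encoded source file, which Zig requires.

```zig
const std = @import("std");

test "escaped form is the UTF-8 encoding of the lightning bolt" {
    // "⚡" is U+26A1, which UTF-8 encodes as the three bytes E2 9A A1.
    try std.testing.expectEqualStrings("\xe2\x9a\xa1", "⚡");
    try std.testing.expectEqual(@as(usize, 3), "⚡".len);
    // The \u{...} escape names the same code point.
    try std.testing.expectEqualStrings("\u{26a1}", "⚡");
}
```

Both spellings denote the same bytes once parsed, so the option only changes how readable the serialized file is.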

After

std.zon.stringify.serialize(buff, .{ .escape_unicode = false, .whitespace = true }, writer)

test "std.zon.stringify.serialize escape_unicode = false (added option)" {
    const buff = .{ .name = "Test", .description = "⚡ Lightning Bolt", .emoji = "⚡" };
    var buff_str = std.ArrayList(u8).init(std.testing.allocator);
    defer buff_str.deinit();
    try std.zon.stringify.serialize(buff, .{ .escape_unicode = false, .whitespace = true }, buff_str.writer());
    std.debug.print("\n{s}\n", .{buff_str.items});
}

Output:

.{
    .name = "Test",
    .description = "⚡ Lightning Bolt",
    .emoji = "⚡",
}

Test

const std = @import("std");

test "std.zon.stringify.serialize escape_unicode = false" {
    var buf = std.ArrayList(u8).init(std.testing.allocator);
    defer buf.deinit();

    try std.zon.stringify.serialize(
        .{ .char = "abc⚡" },
        .{ .escape_unicode = false },
        buf.writer(),
    );
    try std.testing.expectEqualStrings(".{ .char = \"abc⚡\" }", buf.items);
    buf.clearRetainingCapacity();
}

Use Case

I was trying to store Unicode data in a ZON file, which I previously did in JSON. When converting from JSON to ZON using the JSON parser and ZON serializer, the Unicode characters were always escaped. This made the ZON file hard to read, which defeats its purpose as a human-readable format.

Currently, std.zon.stringify.serialize always escapes Unicode characters, whereas std.json.stringify does not escape them by default. This adds an escape_unicode option matching the JSON serializer, but with a default of true (the current behaviour) to keep things backward compatible.

```zig
const std = @import("std");

test "std.zon.stringify.serialize escape_unicode = false" {
    var buf = std.ArrayList(u8).init(std.testing.allocator);
    defer buf.deinit();

    try std.zon.stringify.serialize(
        .{ .char = "অ" },
        .{ .escape_unicode = false },
        buf.writer(),
    );
    try std.testing.expectEqualStrings(".{ .char = \"অ\" }", buf.items);
    buf.clearRetainingCapacity();
}
```
@alexrp
Member

alexrp commented Apr 17, 2025

cc @MasonRemaley

@MasonRemaley
Contributor

MasonRemaley commented Apr 17, 2025

Thanks for the PR!

I'll take a look at this and the other Unicode related issue today. In particular, I want to look into whether or not it's necessary to maintain backwards compatibility with the current behavior.

[EDIT] Sorry for the delay. I haven't forgotten about this and will get to it soon!

@nurulhudaapon
Author

> Thanks for the PR!
>
> I'll take a look at this and the other Unicode related issue today. In particular, I want to look into whether or not it's necessary to maintain backwards compatibility with the current behavior.

Yeah, I feel like it doesn't need to be backward compatible and should not escape Unicode by default, since that is the usual behavior in most serializers and zon.serializer has not been widely adopted yet.

@MasonRemaley
Contributor

MasonRemaley commented Apr 24, 2025

Apologies for the delay on this!

Looking it over, there was no good reason for me to escape everything by default. Adding escape_unicode as an option is good, and it should be false by default.

However, there's one important case that needs to be addressed before this can be merged. Unless I'm missing something, the implementation here now doesn't escape \ or ", and escaping both is necessary for correctness.

You can see how std.json handles this here. I think escaping these two characters is sufficient to guarantee that the output is a valid Zig string, but it's worth double checking stringEscape to make sure it's not doing anything else necessary.
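
A minimal sketch of the escaping described above, not the PR's implementation: `writeMinimallyEscaped` is a hypothetical helper that escapes `\` and `"` and passes non-ASCII UTF-8 bytes through unchanged. It also escapes ASCII control characters, on the assumption that a single-line Zig string literal cannot contain them raw; whether `stringEscape` does anything further would still need checking. `std.io.fixedBufferStream` is used only to capture the output in the test.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std.zon: writes `bytes` as the body of a
/// string literal, escaping only what has to be escaped.
fn writeMinimallyEscaped(writer: anytype, bytes: []const u8) !void {
    for (bytes) |byte| {
        switch (byte) {
            '\\' => try writer.writeAll("\\\\"),
            '"' => try writer.writeAll("\\\""),
            '\n' => try writer.writeAll("\\n"),
            '\r' => try writer.writeAll("\\r"),
            '\t' => try writer.writeAll("\\t"),
            // Assumption: remaining ASCII control characters become \xNN.
            0x00...0x08, 0x0b, 0x0c, 0x0e...0x1f, 0x7f => try writer.print("\\x{x:0>2}", .{byte}),
            // Printable ASCII and UTF-8 lead/continuation bytes pass through.
            else => try writer.writeByte(byte),
        }
    }
}

test "quotes and backslashes are escaped, UTF-8 is kept" {
    var storage: [64]u8 = undefined;
    var stream = std.io.fixedBufferStream(&storage);
    try writeMinimallyEscaped(stream.writer(), "a\"b\\c⚡");
    try std.testing.expectEqualStrings("a\\\"b\\\\c⚡", stream.getWritten());
}
```

Working byte by byte keeps the sketch short; it does not validate that the input is well-formed UTF-8, which a real serializer would likely need to decide how to handle.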

@MasonRemaley
Contributor

Linking the issue you filed #23535 here since it's related to this PR in that it's an example of a character that can't really be printed the way you'd expect right now. We probably want to figure out how to address this as well.
