-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix scvals to print in json string #305
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass looks good, just a couple comments
There is a canonical XDR-JSON format that more tooling is using, the Lab uses it, the CLI uses it, the Soroban Rust SDK uses it, and this could use it too. It's implemented in the Rust XDR lib, and that lib is also published as a Wasm powered NPM package. You could take the Wasm and experiment with powering the JSON generation with it too so that you're using the same JSON format. If the JSON format is largely internal, then this doesn't matter, but has much more value if users see it. Rust XDR Lib: https://github.com/stellar/rs-stellar-xdr If this is of interest we can build a Wasm-powered Go lib that does same. |
Ooo it would be nice if there was the Go lib equivalent. Not sure how much interest there would be for it though. As it pertains to this PR and change it is purely for stellar-etl (basically internal). I don't believe it is worth porting (or calling rust or js) for this specific case.
Is there an existing example of this JSON format? I assume it is largely the same because it is just ScVals but doesn't hurt to check. |
Install the The CLI manual is at: https://developers.stellar.org/docs/tools/developer-tools/cli/stellar-cli#stellar-xdr-decode The following commands example encodes and decodes, they work in reverse with the
If you're not sure what a type is, use If there are any ideas for improvement to the Some things worth calling out: The JSON output will include 64-bit integers, so you need to use a custom JSON parser, which is easy, in JavaScript runtimes which only support integers up to 53-bits. This won't impact Go usage, or most non-JS runtimes, that supports 64-bit integers in JSON. Strings are escaped ascii UTF-8 encoded, not simply UTF-8 encoded. These two attributes aren't well documented in the dev docs, but added them now: |
The JSON can also be explored at: |
jsonMessage, err = xdr2json.ConvertBytes(xdr.ScVal{}, raw) | ||
if err != nil { | ||
return nil, nil, err | ||
} | ||
|
||
serializedDataDecoded["value"] = string(jsonMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the xdr2json. It works pretty well and will now output like
{
"type": "Instance",
"value": "{\"contract_instance\":{\"executable\":\"stellar_asset\",\"storage\":[{\"key\":{\"symbol\":\"METADATA\"},\"val\":{\"map\":[{\"key\":{\"symbol\":\"decimal\"},\"val\":{\"u32\":7}},{\"key\":{\"symbol\":\"name\"},\"val\":{\"string\":\"EURMTL:GACKTN5DAZGWXRWB2WLM6OPBDHAMT6SJNGLJZPQMEZBUR4JUGBX2UK7V\"}},{\"key\":{\"symbol\":\"symbol\"},\"val\":{\"string\":\"EURMTL\"}}]}},{\"key\":{\"vec\":[{\"symbol\":\"Admin\"}]},\"val\":{\"address\":\"GACKTN5DAZGWXRWB2WLM6OPBDHAMT6SJNGLJZPQMEZBUR4JUGBX2UK7V\"}},{\"key\":{\"vec\":[{\"symbol\":\"AssetInfo\"}]},\"val\":{\"vec\":[{\"symbol\":\"AlphaNum12\"},{\"map\":[{\"key\":{\"symbol\":\"asset_code\"},\"val\":{\"string\":\"EURMTL\\\\0\\\\0\\\\0\\\\0\\\\0\\\\0\"}},{\"key\":{\"symbol\":\"issuer\"},\"val\":{\"bytes\":\"04a9b7a3064d6bc6c1d596cf39e119c0c9fa4969969cbe0c264348f134306faa\"}}]}]}}]}}"
}
How does this look now @sydneynotthecity @leighmcculloch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i like this better than cli. i don't like that we've copied over xdr2json
and think that that should be published as its own package but that's outside the scope of this work.
do you think there is merit in us using xdr2json to parse operation details in the future? this looks much simpler to parse vs what we have to do today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends. For the nested params (also ScVals) for invoke host function yes this makes sense. For some of the other operations where we do some mafhs or add extra non-xdr labels maybe 🤷♀️
I think we could technically redesign the operations.Details field as a whole too.
Definitely worth a spike though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the value have two layers of JSON? The outer layer doesn't look like the canonical xdr-json format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outer layer is not. There's a few things that need to be reworked (mentioned it should be put back in draft).
I think it's debatable whether or not to remove that outer layer or not.
Pros with the extra Type/Value outer layer
- Gives a fast/easy way for big query to filter by Type (e.g., I'm only looking for ScMaps)
- The value in the outer layer Type is equivalent to
ArmForSwitch(ScVal.Type)
strings rather than the string names from xdr2json (e.g.,Instance
vscontract_instance
). I don't know if there is an existing list of the type names like in the xdr const or ArmForSwitch() - iirc the JSON type in BQ is weird and can't pass the xdr2json json directly to the JSON type column in BQ. This is why we have like a BQ column named
topics_decoded
but the JSON inside also starts withtopics_decoded
meaning to access it you have toselect topics_decoded.topics_decoded.<the json field you want>
Cons
- Technically the JSON by itself will probably be good enough for anything the data team or anyone else wants to do.
- 1 less column means storage/cost savings in BQ
One thing I should do is I think we can save it as a JSON instead of string(json). This will depend on how many schema changes and if it's okay to change to a generic interface{}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the json isn't stable between protocol versions. I mean it won't change unnecessarily, but xdr structure can change in ways that is binary backwards compatible but not structurally backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't pass the xdr2json json directly to the JSON type column in BQ
Would be helpful to get some feedback about the structure of the json and what you'd change if anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the json isn't stable between protocol versions. I mean it won't change unnecessarily, but xdr structure can change in ways that is binary backwards compatible but not structurally backwards compatible.
That's unfortunate but makes sense (aka unified event changes)
Yup I'll revise and update with the proposed structure on Monday for y'all's review. I'll add some examples SQL queries to show what it'll look like in BQ as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking and playing with the ScVal column in BQ I think it makes sense to just make everything a generic interface (use whatever xdr2json spits out). The main reason I liked the outer layer was for easier filtering BUT I think that filtering ScVal types will be rare but still possible in BQ. And if it is a pain this change could be added downstream (dbt model for enriched contract events) keeping the history_contract_events
table at its rawest form.
Another benefit is the human names for the ScVal types from xdr2json are nicer than the ArmForSwitch
. Not sure why they diverged but I would say it'd be easy enough to make a new map/const/enum for the xdr2json names if needed (or I wonder if we can update ArmForSwitch).
Example traversing column in BQ is still the same except with the xdr2json names
select
val_decoded.contract_instance.storage[0].val.map.symbol
from history_contract_events
Example filtering in BQ
select
val_decoded
from history_contract_events
where true
-- filter for only contract_instance isn't that difficult but less straightforward
and json_extract(val_decoded, '$.contract_instance') is not null
Original filtering in BQ
select
val_decoded
where true
-- ArmForSwitch name for Type
and val_decoded.Type in ("Instance")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why they diverged
There's a few things that the Rust xdr lib does that we didn't implement in other xdr libs:
- Arm names are reduced to their unique text, so if they have a long prefix shared among them, that prefix disappears. Unfortunately this doesn't work for single value enums 😅
- Anything representable by a strkey is rendered as the strkey instead of the structure it contains.
- Asset codes are rendered as escaped ascii instead of hex, base64, or hope-for-the-best utf8.
Updated dependencies detected. Learn more about Socket for GitHub ↗︎
|
Ready for review again tl;dr
|
Does this mean there will need to be a schema change or data type change in BigQuery as well? If there are downstream implications we'll need to coordinate with dependent partners so that we don't break them. |
There won't be any downstream data type changes. It is still JSON BQ data type. There will be implications if anything is built on top of the contents of these ScVals which seems unlikely because they were barely parseable before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, i didn't review the xdr2json
lib itself and assume that it matches the original
PR Checklist
PR Structure
otherwise).
Thoroughness
Release planning
semver, and I've changed the name of the BRANCH to major/_ , minor/_ or patch/* .
What
github issue
Fix printing of ScVals for contract_data and history_contract_events to print in a JSON parsable format
Why
The ScVal.String() function uses the %v to print out the ScVal struct which causes some oddities in how nested maps, arrays, and contract instances were converted to where they did not conform to a normal map/json structure. By fixing ScVals to print in a JSON parsable format we can more easily dive into decoded keys/vals using parse_json() in BigQuery
Known limitations
Edit:
interface{}
rather than the originalmap[string]string