-
Notifications
You must be signed in to change notification settings - Fork 49
Deep Data Structures Manipulation Design
Ilya Sher edited this page Oct 18, 2021
·
26 revisions
As data-focused programming language, Next Generation Shell should have deep data structures manipulaton as part of the standard library.
NGS GitHub Issue #437: Design Deep Data Structures Operations
- Minimal syntax changes (implemented
pattern -> action) - Customizable for user-defined types (postponed)
- Should work with tree-like data structure where "pre" and "post" visit are alternatives (processing the node or the children first)
- Should work with flat data structures as well (
Arr,Hash) (implemented) - Integrate well with existing facilities (implemented)
-
=~and patterns - Universal Patten Matching
-
- support transformation (map-like behavior) (implemented)
- support filtering (filter-like behavior) (implemented)
- provide context to
deep_...functions callbacks (path, etc), not only current value (implemented) - support passing arbitrary values from the callback to other callbacks invocations (todo)
- support in-place modifications? (maybe later)
- maybe: support for data which is not present yet - when needed additional data will be lazily fetched (think AWS results pagination, where you maybe find what you are looking for before fetching all pages) (maybe later)
- maybe: support implicit loops (not sure what's the use case) (maybe later)
- maybe: support of pub/sub (maybe somewhat like Go channels and
select) (maybe later)
- Remove all
Hashes that match{'url': ...}which are in an array which resides underextensionkey in anyHashin the data. (implemented:my_data.tr({"url": AnyOf("http://hl7.org/fhir/StructureDefinition/maxValue", "http://hl7.org/fhir/StructureDefinition/minValue")} -> skip(Y))) - Remove all empty arrays/hashes/strings (or strings containing only spaces) from a data structure (implemented:
my_data.tr([] -> skip(Y))) - Parse JSON files into NGS objects (more thought needed regarding schema and serialization/deserialization in general) (maybe later)
- Prepend/append element to an array (implemented:
tr([1,2,[3,4]], Arr -> { ["start"] + descend(B) + ["end"] } )) - "Deep uniq" - remove duplicate primitive values anywhere, remove empty Hashes or Arrays. See https://github.com/ngs-lang/nsd/blob/5c6cd31930cabe28b20bea6fb13e0631e655e2f5/data-manipulation/deep-uniq.ngs (maybe later)
In no particular order:
- XPath
- XSLT
- FHIRPath - note that FHIRPath was created for validation and hence lacks semantics for modification of the data. "... needs for path navigation, selection and formulation of invariants ..."
- GraphQL - not sure if it's a good match for arbitrary nested data such as FHIR
- Neo4j - native graph database platform
- https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#projections-and-lenses (thanks u/tobega)
- JsonPath
- JMESPath
- Raku Grammars
- Libraries
- New type -
TrContext- transformation context (implemented) -
PatternActionobjects are restricted to (Any -> Fun) type in this design, at least for now. Actions are called if patterns match.- Note that this allows defining functions like for example
skip(Any, TrContext)and then using them likeSomething -> skip. (don't know how to solve yet. dangerous because laterskip(?, pattern:Any)can be defined and might potentially take precedence)
- Note that this allows defining functions like for example
- New method
tr(data:Any, transformation:Any)
UPM patterns will be used. UPM patterns need to be updated to return MatchResult (containing both indexed and named matches)
TBD: Whether UPM functionality will be extended with something specific for tr().
UPM should have a way to match on the context.
Action callbacks (in pattern -> action) will be called with the following arguments:
- Current matched data
- Context object with:
-
[N]- Positional matches (up until now) -
.something- Named matches (up until now), can not start with_. -
._path- an array which represents the path of current focus relative to object root.[]would signify the root. (implemented) -
._root- the root object with whichtr()(or=~) was invoked. (implemented) -
._cur- current item being processed (implemented, used bydescend(TrContext), not sure should be "public API") -
._transformation- currently active transformation rules (implemented) -
.del()method to delete currently matching data:bad_data_pattern -> { B.del() }(maybe later) -
.replace()...bad_data_pattern -> { B.replace("(warning, bad data: $A)") }(maybe later) -
.return(v)- returnvas top level value of thetr()operation (not done yet, open issue: how to set a default if.return()is not called) -
.descend()- continue recursion into current focus object. (implemented) TBD: pass rules/mode? Maybe.tr()? -
.skip()- exclude current focus from the new structure (implemented)
-
An action callback can either
- Return a value to be used as replacement in the newly generated data structure or
- Call one of the methods on the context object (to affect original data structure or do other, TBD, operations)
# Removing specific extension from FHIR Questionnaire (implemented)
questionnaire.tr({'url': 'my_extension'} -> skip(Y))
# Removing extension can lead to empty array, which is invalid in FHIR
questionnaire.tr({[] -> skip(Y)})
# Is our extension used?
questionnaire.tr({'url': 'my_extension'} -> {B.return(true)})
# maybe # our_ext_used = questionnaire =~ Deep({'url': 'my_extension'})
- What would
tr(data:Any)mean? (Without any transformations passed). Currently nothing. Maybe: cleanup all empty arrays/hashes/strings. - Think about peg/leg style parsing, where
- The focus is moving inside a string
- Patterns which are defined later are referenced from earlier rules
- Augment UPM with
=~~and!~~for recursive search? Probably not but provideDeep(pattern)facility. - Start deprecating the older
~match? - Base the solution exclusively on Universal Pattern Matching?
- Probably not, sounds like
filter()-like andmap()-like functionality should be in anyway.
- Probably not, sounds like
- A function to get to element at given path? Example:
data.at([0, "a", 10, "b"])- How to behave for missing path? Exception?
- Would it work with patterns for path and handle multiple paths? (Example:
data.at([Int, {"x": 1}]))
- Visit the node first or children first?
- Implement as
visit_node()andvisit_children()? -
visit_node()would be a no-op for non-tree containers such asArr?
- Implement as
- "view" design that would allow modifying or operating on previously selected paths/items. -- supported by specifying the rules that only work with "interesting" objects + default rules to recursively apply rules.
- Context object (to pass to callbacks to each/visit/etc) with
- Current path
- Most recent container, index/key, and value
- Should callbacks be multimethods? (Considering that currently there is no syntax for expressing unnamed multimethods so specifying multimethod inline is not straightforward).
- Naming:
deep_...vs..._deep
- Searches:
nested data structuresdeep data structures-
graph data structures(less applicable)
NGS official website is at https://ngs-lang.org/