
Sync vs Async Worksheet


MOVED TO DISCUSSION:

https://github.com/NavAbility/NavAbilitySDK.jl/discussions/22


TEXT BELOW IS NOW ARCHIVED

Short-Tape-Long-Tape Requirements (Duplication-model)

Illustration of the Duplication Model:

  • (DF) Duplication == Tee

References to related open and closed discussions:

Synchronization Requirements

  • addFactor! only makes sense after the respective variables are in the graph.
    • For any <:AbstractDFG, the user must at minimum first call the required addVariable!s before calling addFactor!, or restrictions must be placed on the DFG API,
      • (DF) think it is okay to use fetch(addFactor!(...)) as mechanism for blocking on async call.
    • Alternate usage (similar to the copyGraph! case): first do all addVariable! calls for the graph, then all addFactor!s,
      • probably benefits batch / mini-batch usage.
      • (SC) I do this already in copyGraph, and Jim's queuing algorithm also bumps a factor to the end of the queue if it can't find all the variables. However, in IIF we don't wait for them to exist first. That will need to happen, or we will have to remove the existence check.
      • (JT) think you only need to know it will exist at some time in the future, i.e. the user tried to add the variables.
  • For cloud to scale, async processing is fundamentally needed:
    • (JT) how to ensure comms through unstable internet still works,
      • (JH) task id returned as acknowledge?
    • (SC) can addVariable* and addFactor* return task ids?
      • Also see the Julia docs on @async
      • (DF) maybe an API mod, addVariableAsync!, addFactorAsync! -- returning task ids rather than DFGVariable/DFGFactor?
        • Legacy equivalence would then be addFactor!(w...;kw...) = fetch(addFactorAsync!(w...;kw...))? (see the sketch after this list)
      • (DF) alternative is to change the entire DFG API to always return Tasks, with fetch(addFactor!(...)) (via a Task)?
      • (SC) Is this a problem though? One that we need to address?
      • (JT) I'm not sure why we would want to wait for the result of addFactor! from the cloud?
  • Maintain symmetry as much as possible across DFG driver implementations which use <:AbstractDFG.
    • (DF) desire for ease of use for the novice user -- (s)he goes, hey let me just hack a graph together in the REPL quickly to see if stuff works...
    • Maintain cross-feeding as an emergent property of good symmetry/commutativity, e.g.
      • copy!(cfg, lfg), copy!(lfg, cfg), or loadDFG!(cfg, filedfg), or merge!(lfg, cfg)
    • NavAbilitySDK.CloudDFG, FileDFG, FuseDFG, ZMQDFG, all share consistent serialization
    • Sam's question on whether generators should be changed for the SDK / async server pipeline?
      • (DF) versatile generators are probably good (and there are internal reasons too), but NavSDK should allow a wide operating regime, since users are likely to build graph generators of their own (on a robot) that look very similar to the current canonical generators.
  • Implement the Tee as a feature downstream from DFG (suggested to do so in SDK/IIF/RoME as a "robotic" short-tape-long-tape feature),
    • This simultaneously allows the nav-engineer to easily leverage features like clique-recycling (fixed-lag-window) on the local short-tape, without having to learn / override 'smart features' that differ between DFG drivers (i.e. loss of symmetry).
  • New graph nodes can have solvable=0 set to avoid concurrency issues with services,
    • A must, since DFG is a highly concurrent object with many things (micro-services / agents) operating on the DB at the same time.
      • It takes time for changes to graphs to show up everywhere they are needed, while services keep flying.
    • A single atomic setSolvable!(newnodeset, 1) once nodes/edges are confirmed to exist on the server, and a similar atomic operation for the backendset semaphore below (see the sketch after this list).
  • The previous slamindb had graph nodes also carry a backendset=1 flag from IIF.solveGraph!,
    • (DF) It was a necessary "semaphore" to simplify graph discovery processes during the solve, possibly missing in the current stack, must check...
  • (JT, pp. DF) pay close attention to how errors are returned from DFG drivers (see the linked discussions above).
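
To make the async proposal above concrete, here is a minimal sketch, assuming the addVariableAsync!/addFactorAsync! names suggested by (DF); these are not an existing DFG API, and simply forward to the current blocking calls via @async.

```julia
using DistributedFactorGraphs

# Hypothetical async variants (names from the discussion above): return a
# Task immediately, the actual add runs concurrently.
addVariableAsync!(dfg::AbstractDFG, w...; kw...) = @async addVariable!(dfg, w...; kw...)
addFactorAsync!(dfg::AbstractDFG, w...; kw...) = @async addFactor!(dfg, w...; kw...)

# Legacy blocking behavior then falls out as the fetch of the async call:
#   addFactor!(w...; kw...) == fetch(addFactorAsync!(w...; kw...))

# Usage sketch (fg, newvariables, newfactor are illustrative) -- variables
# must exist before the factors that reference them:
#   vtasks = [addVariableAsync!(fg, v) for v in newvariables]
#   foreach(wait, vtasks)                  # block until variables are in the graph
#   ftask = addFactorAsync!(fg, newfactor)
#   fetch(ftask)                           # block only where the result is needed
```

Likewise a sketch of the solvable=0 gate, using the existing setSolvable! call; enableNewNodes! and newnodelabels are illustrative, and a loop like this is only atomic-in-effect if the server provides a transactional update:

```julia
# New nodes are added with solvable=0 so concurrent services skip them; once
# the server confirms all nodes/edges exist, enable them in a single pass.
enableNewNodes!(dfg::AbstractDFG, newnodelabels::Vector{Symbol}) =
    foreach(l -> setSolvable!(dfg, l, 1), newnodelabels)
```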

Serialization

  • Ongoing Serialization refactor thread at DFG 590

XRef Related Wikis


Conversations / Questions

SDK Smarts? (17 Aug 2021)

  • (JT/DF) is this a case of SDK = CloudDFG + SDK_smarts, or do the 'smarts' get built in other packages like IIF/RoME/Caesar?

  • (SC) Can someone please clarify smarts?

    • (DF) sorry, for example the Tee feature is a "smart" piece of logic that uses the DFG interface between two fg objects (e.g. a remote and a local graph). There may be other "smart" features that use DFG logic in a different combo, so I suggested the word "smarts" for some of the emergent features that follow from symmetry in DFG. The question for me is whether the 'smarts' should be built in DFG or downstream; in my mind IIF/RoME are candidates for where to put the 'smarts'. Another way: perhaps the SDK has an internal "module" CloudDFG which respects the symmetry requirement, and other features ('smarts') are then part of the SDK. My concern is that 'smarts' end up being baked into CloudDFG somehow, which weakens symmetry with other DFG drivers. All reasonable options, of course.
      • Is this too cowboy: can we do the Tee using something like [SDK/IIF/RoME].duplicateToRemote(tc::TConn, lfg) = merge!(tc.cfg, lfg)? (see the sketch after this conversation)
  • (JT) For symmetry, as long as the API (https://juliarobotics.org/DistributedFactorGraphs.jl/latest/imgs/CoreAPI.png) is used it shall work / be consistent (I think we all feel strongly enough about symmetry that it can be a requirement). We/users should only use "by reference" modification if they know what they are doing, i.e. the pattern is getVariable - modify - updateVariable!. (If you get* and change it locally it won't be synced automatically.)

  • (JT) Different smarts go different places... for example, IIF currently copies a CloudGraphsDFG (Neo4jDFG) to a local subgraph for every clique and solves it locally before updating the CGDFG. My first guess on the Tee is that DFG defines an abstract type and it gets extended by smarts in NavSDK and RoME. If it's not absolutely necessary, NavSDK should not depend on RoME.

  • (JT) My current preference is for CloudDFG (NavSDK) to handle communications without blocking on add*. Previously I could not use GraffSDK practically, as communication was too slow, and I ended up writing my own version for a local network (we don't want that to happen). I still had to use a queue even on the local network. I built this into the Tee, but think it should work in CloudDFG.
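
A deliberately naive sketch of the Tee question above: if every driver honors the same <:AbstractDFG API, duplication to a remote graph reduces to a generic merge. TConn and duplicateToRemote! are the hypothetical names from (DF)'s question, and merge! semantics between two graphs are assumed here rather than an existing DFG method:

```julia
using DistributedFactorGraphs

# Hypothetical Tee connector holding a handle to the remote/cloud graph.
struct TConn{T <: AbstractDFG}
    cfg::T
end

# The Tee as a one-liner, per the question above; it relies purely on driver
# symmetry, so any local/remote <:AbstractDFG pairing should work.
duplicateToRemote!(tc::TConn, lfg::AbstractDFG) = merge!(tc.cfg, lfg)

# JT's safe modification pattern -- by-reference edits are not auto-synced:
#   v = getVariable(fg, :x1)
#   # ...modify v locally...
#   updateVariable!(fg, v)   # explicit write-back
```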

SDK Needs (17 Aug 2021, requested by Sam)

  • Overall design "philosophy" for a CloudDFG - what is the minimal spec for basic operations (e.g. a task ID is returned, what do I do with it)? I'd like to know that we all agree with each method returning a task ID that I then need to check.
    • DF, "symmetry" for all things <:AbstactDFG is my primary concern,
      • don't build too many smarts into CloudDFG,
      • put the addFactor-needs-variables-first sync buffer logic on the server side where possible? (see the queue sketch after this list)
    • DF, and suggest that the Tee should not happen in the CloudDFG driver, but downstream, maybe SDK||IIF||RoME,
    • (SC) At the moment symmetry is maintained because the only thing changing is the return types, which are not hard-typed. So that's ok for now unless someone has an issue.
  • What are the essential behaviors that a user would want (end-user features, calling this MVP) before we can release it?
    • DF, my instinct is to go for the minimum set requiring user upload of batch compute, then build out towards short-tape-long-tape for one robot.
  • Do we modify IIF generators to make them asynchronous-compatible?
    • DF, mostly no, but the internal reason for yes is also valid -- see above.
    • (SC) Then at the moment the generator will not work directly with CloudDFG. Let me know if this should change.
  • How do we pair up a Tee connector to stream data in as we load it from a local graph?
    • DF, see above (do Tee in RoME)
    • (SC) Ok, let's try it there.
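
Below is a minimal sketch of the server-side "addFactor-needs-variables-first" buffer, in the spirit of Jim's queuing algorithm mentioned earlier (a factor whose variables are not all present yet gets bumped to the back of the queue). drainFactorQueue! is illustrative, and the IIF-style addFactor!(dfg, labels, factor) signature is assumed:

```julia
using DistributedFactorGraphs

# Drain a queue of (variablelabels, factordata) pairs, adding each factor only
# once all of its variables exist in the graph; blocked factors are retried.
function drainFactorQueue!(dfg::AbstractDFG, queue::Vector)
    pending = copy(queue)
    progress = true
    while !isempty(pending) && progress
        progress = false
        stillblocked = empty(pending)
        for (labels, factor) in pending
            if all(l -> exists(dfg, l), labels)
                addFactor!(dfg, labels, factor)        # all variables present
                progress = true
            else
                push!(stillblocked, (labels, factor))  # bump to end, retry
            end
        end
        pending = stillblocked
    end
    return pending   # nonempty => some variables never arrived; report upstream
end
```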

Core Package Layout

Rev 17 August 2021

Legend:

  • Light blue is where most of current work is happening.
  • Yellow boxes are packages that are a little behind main branch.
  • Orange boxes are quite far behind main branch.
  • Orange arrows are current plans but work still needs to be done. (AMP 41, Mani 405)
  • Dark blue is suggested location for Duplication-Tee (DF).
  • Grey with Red X are packages that will be deprecated as soon as possible.
