Sync vs Async Worksheet
https://github.com/NavAbility/NavAbilitySDK.jl/discussions/22
Illustration of the Duplication Model:
- (DF) Duplication == Tee

- https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/issues/134
- https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/issues/145
- https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/issues/182
- https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/pull/276
- `addFactor!` only makes sense after the respective variables are in the graph; see the sketch after this list.
  - User must, at minimum for any `<:AbstractDFG`, first call the required `addVariable!`s before calling `addFactor!`, or restrictions apply on the DFG API case.
  - (DF) think it is okay to use `fetch(addFactor!(...))` as a mechanism for blocking on an async call.
  - Alternate usage (similar to the `copyGraph!` case): first do all `addVariable!` calls of the graph, then all `addFactor!`s,
    - probably benefits batch / mini-batch usage.
    - (SC) I do this already in `copyGraph`, and Jim's queuing algorithm also bumps a factor to the end of the queue if it can't find all the variables. However, in IIF we don't wait for them to exist first. That will need to happen, or we will have to remove the existence check.
    - (JT) think you only need to know it will exist at some time in the future, i.e. the user tried to add the variables.
- For cloud to scale, async processing is fundamentally needed (see the sketch after this list):
  - (JT) how to ensure comms through unstable internet still works,
    - (JH) task id returned as acknowledgement?
    - (SC) can `addVariable*` and `addFactor*` return task ids?
    - Also see the Julia Docs on `@async`.
  - (DF) maybe API-mod, `addVariableAsync!`, `addFactorAsync!` -- return task ids rather than returning DFGVariable/DFGFactor?
    - Legacy equivalence would then be `addFactor!(w...;kw...) = fetch(addFactorAsync!(w...;kw...))`?
  - (DF) alternative is to change the entire DFG API to always return tasks, and `fetch(addFactor!(...))` (via a `Task`)?
    - (SC) Is this a problem though? One that we need to address?
    - (JT) I'm not sure why we would want to wait for the result of `addFactor!` from the cloud?
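
A hedged sketch of the task-returning proposal; `addVariableAsync!` and `addFactorAsync!` are the names suggested above, not existing DFG API:

```julia
# Hypothetical async wrappers over the existing blocking DFG calls; the
# returned Task doubles as the "task id" acknowledgement discussed above.
addVariableAsync!(dfg, args...; kw...) = @async addVariable!(dfg, args...; kw...)
addFactorAsync!(dfg, args...; kw...)   = @async addFactor!(dfg, args...; kw...)

# Legacy/blocking equivalence as proposed above:
#   addFactor!(w...; kw...) = fetch(addFactorAsync!(w...; kw...))

# Caller side (fg, :x0, :x1 as in the earlier sketch):
odo = Pose2Pose2(MvNormal([1.0; 0; 0], diagm([0.1; 0.1; 0.01])))
v = fetch(addVariableAsync!(fg, :x2, Pose2))  # blocking style via fetch
t = addFactorAsync!(fg, [:x1, :x2], odo)      # non-blocking; returns a Task
istaskdone(t)                                 # poll without blocking
fac = fetch(t)                                # block only when the result is needed
```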
- Maintain symmetry as much as possible across DFG driver implementations which use `<:AbstractDFG`.
  - (DF) desire for ease of use for the novice user -- (s)he goes, hey let me just hack a graph in the REPL quickly to see if stuff works...
- Maintain cross-feeding as an emergent property of good symmetry/commutativity (see the sketch below), e.g.
  - `copy!(cfg, lfg)`, `copy!(lfg, cfg)`, or `loadDFG!(cfg, filedfg)`, or `merge!(lfg, cfg)`
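
A sketch of the cross-feeding idea; `CloudDFG` is hypothetical here, while `copyGraph!`, `ls`, and `lsf` are existing DFG generics (the in-memory driver name varies by DFG version, e.g. `LightDFG` in older releases):

```julia
using DistributedFactorGraphs

lfg = GraphsDFG{NoSolverParams}()  # local in-memory driver
# cfg = CloudDFG(...)              # hypothetical cloud-backed driver, also <:AbstractDFG

# Because both drivers honor the same AbstractDFG API, the same generic call
# should commute in either direction:
# copyGraph!(cfg, lfg, ls(lfg), lsf(lfg))  # push the local graph to the cloud
# copyGraph!(lfg, cfg, ls(cfg), lsf(cfg))  # pull the cloud graph down locally
```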
- NavAbilitySDK.CloudDFG, FileDFG, FuseDFG, ZMQDFG all share consistent serialization (Contracts Schema, and DFG 590 link below).
- Sam's question on whether generators should be changed for the SDK / async server pipeline?
  - (DF) versatile generators are probably good (and there are internal reasons too), but NavSDK should allow a wide operating regime, since users are likely to build graph generators of their own (on a robot) that look very similar to the current canonical generators.
- Implement the Tee as a feature downstream from DFG (suggested to do so in SDK/IIF/RoME as a "robotic" short-tape-long-tape feature),
  - Simultaneously allows the nav-engineer to easily leverage features like clique-recycling (fixed-lag window) on the local short-tape without having to learn / override "smart features" that differ between DFG drivers (i.e. loss of symmetry).
- New graph nodes can have the value `solvable=0` set to avoid concurrency issues with services (see the sketch after this list),
  - A must, since DFG is a highly concurrent object with many things (micro / services / agents) operating on the DB at the same time.
    - It takes time for changes to graphs to show up everywhere they are needed, while services keep flying.
  - Single atomic `setSolvable!(newnodeset, 1)` once nodes and edges are confirmed to exist on the server, and a similar atomic for the `backendset` semaphore below.
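
A small sketch of the staging pattern, assuming the `solvable` keyword on the `add*` calls and the DFG `setSolvable!` generic; the factor label and confirmation step are illustrative (a production version would flip the whole node set atomically server-side, as noted above):

```julia
# stage new nodes as not-solvable so concurrent services ignore them
# (fg, Pose2, odo as in the earlier sketches)
addVariable!(fg, :x5, Pose2, solvable=0)
addFactor!(fg, [:x2, :x5], odo, solvable=0)

# ... once the server confirms the nodes and edges exist ...
for lbl in [:x5, :x2x5f1]  # :x2x5f1 is the illustrative auto-generated factor label
    setSolvable!(fg, lbl, 1)
end
```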
- Previous slamindb had graph nodes also carry a flag `backendset=1` from `IIF.solveGraph!`,
  - (DF) Was a necessary "semaphore" to simplify graph discovery processes during the solve, possibly missing in the current stack, must check...
- (JT, pp. DF) pay close attention to how errors are returned from DFG drivers (see the linked discussions above).
- Ongoing Serialization refactor thread at DFG 590
- Caesar.jl High Level Requirements wiki
- Draper/Apollo DSKY inspired `verbNoun` API standard
  - Also see the DFG API Quick Reference.
- Old McFlurry (Federated) Payloads Wiki.
- Caesar.jl Docs page to Cloud / SDK Service.
- Stefan Karpinski's talk on "Unreasonable Effectiveness of Multiple Dispatch and Cross Packages".
- Good example to show that C++ is single-dispatch only!
- (JT/DF) is this a case of `SDK = CloudDFG + SDK_smarts`, or do the "smarts" get built in other packages like IIF/RoME/Caesar?
- (SC) Can someone please clarify smarts?
  - (DF) sorry, for example the Tee feature is a "smart" piece of logic that uses the DFG interface between two fg objects (e.g. a remote and a local graph). There may be other "smart" features that use DFG logic in a different combo. So I suggested the word "smarts" for some of the emergent features that follow from symmetry in DFG. The question for me is whether the smarts should be built in DFG or downstream; in my mind IIF/RoME are candidates for where to put them. Another way: perhaps the SDK has an internal "module" CloudDFG which respects the symmetry requirement, and other features (smarts) are then part of the SDK. My concern is that smarts end up being baked into CloudDFG somehow, which weakens symmetry with other DFG drivers. All are reasonable, of course.
    - Is this too cowboy: can we do the Tee using something like `[SDK/IIF/RoME].duplicateToRemote(tc::TConn, lfg) = merge!(tc.cfg, lfg)`? (See the sketch below.)
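
A sketch of the "cowboy" one-liner above; `TConn`, its `cfg` field, and the `merge!` method for graph pairs are names proposed in this discussion, not confirmed API:

```julia
using DistributedFactorGraphs

# hypothetical connection wrapper holding a remote graph handle
struct TConn{T <: AbstractDFG}
    cfg::T
end

# if symmetry holds, the whole Tee reduces to one generic call
duplicateToRemote!(tc::TConn, lfg::AbstractDFG) = merge!(tc.cfg, lfg)
```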
- (JT) For symmetry, as long as the API (https://juliarobotics.org/DistributedFactorGraphs.jl/latest/imgs/CoreAPI.png) is used it shall work / be consistent (I think we all feel strongly enough about symmetry that it can be a requirement). We/users should only use "by reference" modification if they know what they are doing, i.e. the pattern is `getVariable` - modify - `updateVariable!`. (If you `get*` and change it locally, it won't be synced automatically; see the sketch below.)
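
The "by reference" pattern spelled out with the existing DFG generics `getVariable` and `updateVariable!` (the tag change is purely illustrative):

```julia
v = getVariable(fg, :x0)   # fetch a local copy of the variable
push!(v.tags, :LANDMARK)   # modify the copy locally; remote drivers see nothing yet
updateVariable!(fg, v)     # explicitly push the change back to the driver
```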
- (JT) Different smarts go different places... for example, IIF currently copies a CloudGraphsDFG (Neo4jDFG) to a local subgraph for every clique and solves it locally before updating the CGDFG. My first guess on the Tee will be that DFG defines an abstract and it gets extended by smarts in NavSDK and RoME. If it's not absolutely necessary, NavSDK should not depend on RoME.
- (JT) My current preference is for CloudDFG (NavSDK) to handle communications without blocking on `add*`. Previously I could not use GraffSDK practically, as communication was too slow, and I ended up writing my own version for a local network (we don't want that to happen again). I still had to use a queue even on the local network. I built this into the tee, but think it should work in CloudDFG; see the sketch after this list.
  - Overall design "philosophy" for a CloudDFG - what is the minimal spec for basic operations (e.g. a task ID is returned; what do I do with it)? I'd like to know that we all agree with each method returning a task ID that I then need to check.
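
A minimal non-blocking add-queue sketch in plain Julia, illustrating the kind of queue described above; every name here is illustrative, not the actual CloudDFG implementation:

```julia
# jobs are closures over the blocking add* calls
const addqueue = Channel{Function}(Inf)

# a worker task drains the queue; a failed add (e.g. a factor arriving before
# its variables) gets bumped to the back, mirroring Jim's queuing algorithm
worker = @async for job in addqueue
    try
        job()
    catch err
        @warn "add failed, requeueing" err
        sleep(0.1)  # crude backoff; a real driver needs retry limits
        put!(addqueue, job)
    end
end

# callers enqueue and return immediately instead of blocking on comms
put!(addqueue, () -> addVariable!(fg, :x9, Pose2))
put!(addqueue, () -> addFactor!(fg, [:x5, :x9], odo))
```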
- DF, "symmetry" for all things
<:AbstactDFG
is my primary concern,- don't build too many smarts into CloudDFG,
- put addFactor-needs-variables-first sync buffer logic on server side as able?
- DF, and suggest that Tee should not happen in CloudDFG driver, but downstream maybe SDK||IIF||RoME,
- (SC) At the moment symmetry is maintained because the only thing changing is the return types, which are not hard-typed. So that's ok for now unless someone has an issue.
- DF, "symmetry" for all things
- What are the essential behaviors that a user would want (end-user features, call this the MVP) before we can release it?
  - DF, my instinct is to go for the minimum set requiring user upload of batch compute, then build out towards short-tape-long-tape for one robot.
- Do we modify the IIF generators to make them asynchronous-compatible?
  - DF, mostly no, but the internal reason for yes is also valid -- see above.
  - (SC) Then at the moment the generator will not work directly with CloudDFG. Let me know if this should change.
- How do we pair up a Tee connector to stream data in as we load it from a local graph?
  - DF, see above (do the Tee in RoME),
  - (SC) Ok, let's try it there.

Legend:
- Light blue is where most of current work is happening.
- Yellow boxes are packages that are a little behind main branch.
- Orange boxes are quite far behind main branch.
- Orange arrows are current plans but work still needs to be done. (AMP 41, Mani 405)
- Dark blue is suggested location for Duplication-Tee (DF).
- Grey with a red `X` are packages that will be deprecated as soon as possible.
https://gist.github.com/dehann/c62377671cd7d69a901696b8ffb57e2a