This repository was archived by the owner on Jan 29, 2019. It is now read-only.

[WIP] Introduce Firenest.ReplicatedState #21

Open
michalmuskala wants to merge 40 commits into base: master

Conversation

michalmuskala
Contributor

No description provided.

@michalmuskala
Contributor Author

@josevalim I think the local part is done. Now on to figure out the hard part - distributing the state.

I also haven't introduced the callbacks yet.

@michalmuskala michalmuskala changed the title [WIP] Introduce Firenest.PG [WIP] Introduce Firenest.ReplicatedState Jun 12, 2018
@michalmuskala
Contributor Author

For anybody looking at the progress of this: we've changed the abstraction slightly into a ReplicatedState that should allow implementing something like phoenix presence, a process group, or a rate limiter on top.

The code is not there yet, but the documentation is close to how I'd like to see it in the end.


It takes in the local delta value constructed in the `c:local_update/4`
calls and returns a remote delta value that will be replicated to other
servers. The value of the local delta is reset to the initial delta value

I'm pretty sure I'm missing something obvious here 😅 Does it mean that the local_delta argument will be the delta returned by local_update/4?

local state. If keeping track of incremental changes is not convenient
for a particular state type, the value of a delta can be set to be equal
to the current state - this will always cause full state transfers.
"""
Member

I would add a full working example to the docs or maybe even two (maps and rate limiter).

Contributor Author

Should it be in here or should we have an examples directory? And if we have them as examples - would it actually make sense to release them as components?

Member

I would have them inline for now, unless they become too big.
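
For illustration only, here is a rough sketch of what such an inline example might look like. The callback names, arities, and return shapes below are assumptions made for the sketch, not the actual Firenest.ReplicatedState contract; the point is only to show a local delta accumulating between replications versus falling back to full state transfers.

```elixir
# Purely illustrative sketch -- callback names and shapes are assumed,
# not the real Firenest.ReplicatedState behaviour.
defmodule MyApp.Counter do
  # Initial local state paired with an initial (empty) delta.
  def init(_opts), do: {%{count: 0}, 0}

  # A local update bumps the state and accumulates into the local delta.
  def local_update({:incr, n}, delta, state, _extra) do
    {delta + n, %{state | count: state.count + n}}
  end

  # The accumulated local delta becomes the remote delta to replicate,
  # and the local delta resets back to its initial value (0 here).
  def prepare_remote_delta(delta), do: {delta, 0}

  # Applying a remote delta to our copy of another replica's state.
  def handle_remote_delta(remote_delta, remote_state) do
    %{remote_state | count: remote_state.count + remote_delta}
  end
end
```

A map-based example (presence-style metadata) or a rate limiter would follow the same shape with richer delta values; a state type without convenient deltas could return the full state as its delta, which forces full state transfers as the docs above describe.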


* `:name` - name for the process, required;
* `:topology` - name of the supporting topology, required;
* `:partitions` - number of partitions, defaults to 1;
Member

PubSub's default is "one partition for every 4 cores". Contention in PubSub is mostly ETS, so we don't need many partitions. My point is that we have precedent for not defaulting to 1, in case we want to start with many. More partitions do make things chattier, though, so we may want to document that.

Contributor Author

I went with one for now for compatibility with phoenix presence, but we could split it up more. I'm fine with whatever default.

end

# TODO: should we store somewhere we're catching up with the server?
# if so, then we should accumulate the broadcasts we can't handle, until we can.
Member

Yes, we always need to do this, because I believe the topology does not guarantee ordering between sends and broadcasts (or it should not, as that by definition implies using the same connection). So if you ask a node to catch you up and it replies to you and then broadcasts, there is a chance the broadcast will arrive before the reply, and you would have lost it.
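
As a sketch of the buffering being suggested here (message shapes and function names are assumed, purely for illustration): while a catch-up request to a server is in flight, broadcasts from that server are queued and only replayed once the catch-up reply has been applied.

```elixir
# Hypothetical sketch of accumulating broadcasts during catch-up.
defmodule CatchUpBuffer do
  # `state.catching_up` maps each node we asked to catch us up
  # to the list of broadcasts buffered from it (newest first).

  def handle_broadcast(from_node, broadcast, %{catching_up: pending} = state) do
    case pending do
      # We asked `from_node` to catch us up; since sends and broadcasts are
      # not ordered, its broadcasts may arrive before the reply -- buffer them.
      %{^from_node => queued} ->
        {:buffered, put_in(state.catching_up[from_node], [broadcast | queued])}

      # Not catching up with this node; the broadcast can be handled directly.
      _ ->
        {:apply, state}
    end
  end

  def handle_catch_up_reply(from_node, %{catching_up: pending} = state) do
    {queued, pending} = Map.pop(pending, from_node, [])
    # Apply the reply first, then replay the buffered broadcasts in arrival order.
    {Enum.reverse(queued), %{state | catching_up: pending}}
  end
end
```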

{:catch_up, old_clock, state}

# We were caught up with a newer clock than the current, ignore
# TODO: is that even possible?
Member
@josevalim Nov 20, 2018

It is possible if we are getting information from different nodes instead of the node itself. So one node can see A at clock 11 but for the other node A is at clock 9.

Contributor Author

The topology/synced_server tells us where the information is coming from - we rely only on that to know which node is communicating with us. We don't tag the messages ourselves with the node name.

Member
@josevalim Nov 21, 2018

Ok, I think we are agreeing. What I was saying is that, if you can receive the clock for node A from node B, then this can happen. But if you always receive the clock for node A from node A, then this CANNOT happen and I would raise accordingly.
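
To make the distinction concrete, a hypothetical sketch (function and variable names assumed, not the actual implementation) of the policy agreed on here:

```elixir
defmodule ClockPolicy do
  # Called when a catch-up reply carries a clock for `node` that is ahead of
  # the clock we currently track for it.
  def handle_unexpected_clock(node, clock, from_node) do
    if from_node == node do
      # If clock information for node A only ever comes from node A itself,
      # this cannot legitimately happen -- raise, as suggested above.
      raise "unexpected clock #{inspect(clock)} for #{inspect(node)}"
    else
      # Clocks relayed through other nodes can disagree with ours
      # (B may see A at clock 11 while we see A at 9), so tolerate and ignore.
      :ignore
    end
  end
end
```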

@josevalim
Member

Review is done, this looks fantastic! I really like the organization into three modules, it really helps keep things contained!

@@ -269,8 +269,7 @@ defmodule Firenest.PubSub.Supervisor do
   end

   def init({pubsub, topology, options}) do
-    partitions =
-      options[:partitions] || System.schedulers_online() |> Kernel./(4) |> Float.ceil() |> trunc()
+    partitions = options[:partitions] || ceil(System.schedulers_online() / 4)


Just came across this, cool project! It doesn't look like there has been much movement recently, so I figured I'd point out that the tests are failing due to this line: ceil/1 doesn't exist, so perhaps you meant to leave it as Float.ceil/1.
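
For context, Kernel.ceil/1 only ships with Elixir 1.8 and later, so on earlier versions an equivalent expression would be along these lines (a sketch mirroring the removed lines in the diff above):

```elixir
# Float.ceil/1 returns a float, so trunc/1 brings it back to an integer
# partition count; this works on Elixir versions without Kernel.ceil/1.
partitions =
  options[:partitions] ||
    (System.schedulers_online() / 4) |> Float.ceil() |> trunc()
```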
