Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Named services #6

Merged
merged 20 commits into from
May 20, 2024
Merged

Named services #6

merged 20 commits into from
May 20, 2024

Conversation

djmb
Copy link
Contributor

@djmb djmb commented May 7, 2024

Name services so changing the host automatically removes the previous host mapping. Also allows us to remove the service by name without worrying about its current mapping.

@djmb djmb requested a review from kevinmcconnell May 7, 2024 14:17
@djmb
Copy link
Contributor Author

djmb commented May 7, 2024

I think we need to rethink the structure of the services and targets and how the state is saved. I think we'll probably want to add more complicated routing in future (e.g. path based), so we need a future proof state file format.

Copy link
Collaborator

@kevinmcconnell kevinmcconnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a good idea to name the services like this 👍 We didn't have a way to update the host before. It makes the API a little more verbose in the common case, but if we're not planning to use this proxy outside of Kamal than it doesn't matter, since nobody is writing these commands by hand.

I'm in the middle of a couple of changes that were needed to support the pausing & rollout features that will be impacted by this, and in some places could simplify it. I'd like to simplify the way the router matches targets to requests, and I've also ditched the idea of "adding" services. But I can refactor around this after it's merged.

I think we need to rethink the structure of the services and targets and how the state is saved.

Yes, I agree. It's beginning to outgrow the structure it had before. And the host-based lookup is getting a bit more complex now. But as I say, I've got more changes coming along to support the other features, so it's probably a good idea for me to finish that and then refactor it all together. If you want to merge these changes first I'd be happy to do another pass on the structure at that point?

I think we'll probably want to add more complicated routing in future (e.g. path based)

I think that's fine if we need it, but we should also be careful about what we add here. It feels like the sort of thing that leads to Traefik-like complexity once you can have complicated routing. I would rather try to keep this as simple as possible to avoid that.

internal/cmd/deploy.go Outdated Show resolved Hide resolved
internal/cmd/deploy.go Outdated Show resolved Hide resolved
Comment on lines 239 to 261
state.ActiveTargets[name] = savedTarget{
Host: service.host,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, I've also been moving this structure around a bit while adding the rollout feature (although that's not merged yet). It needs to support some different options, and multiple targets per host, so I've been looking at making it more general and easier to evolve over time. Should be fine to merge it with this, just mentioning as a heads-up.

internal/server/router.go Outdated Show resolved Hide resolved
return nil
}

func (r *Router) defaultService() *Service {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be as well setting this at deploy time, rather than looking it up repeatedly on every request. The loop will be very fast, but still, it's not something we have to do repeatedly.

Copy link
Collaborator

@kevinmcconnell kevinmcconnell May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the more I think about it, I wonder if we should keep the same internal structure we had before. The router is a host-based router, and the primary access pattern we have is to look up the service for a host, with a fallback to the default. The map access it had before is a bit simpler and (marginally) faster for that access pattern.

The names feel a bit like labels that we use to locate the services when updating configuration. But that access happens much less often.

I think that would make this a smaller change too -- rather than restructuring the internals, it's adding the service name to be used for configuration updates. But the internal structure of the router still reflects how it works.

If we ever change from a host-based router to a something-else-based router we'd want to change this internal structure, but that might never happen. And even if it does, delaying the change until we know what we need means we'll have more information about how it should be structured.

I can see arguments either way for this, and I don't have very strong feelings either way. But I wonder if changing the internal structure is a bit speculative here, and is making it more general just in case we need it later. What do you think?

The persisted state should be independent of this structure of course. For that, we might be better to use a flat list so that no internal structure is even implied. I'm also planning to change this to use custom marshalling so that the Service objects can do their own serialisation. But I'd like to tackle that bit separately, after this lands, because it's separate, and I'll need to make sure it can accommodate the rollout information anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping the hosts to services no longer makes sense I think, because we could be changing the host in which case the active target is for one host and the adding target is for another.

We could keep a second map from hosts -> targets for performance though. Then that's kind of a routing implementation detail and if more routing was added later we can replace that part without changing the service map.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the host would be fine in any case, as long as we remove the previous item when we deploy so it's always mapped from the right host. Changing host seems like a rare case, whereas lookup by host is happening on every request.

I thought about the second map approach too. The actual performance difference is likely to be quite tiny, in any case, so maybe it's not worth it just for that reason alone (although easy enough to add if we want). It more just felt like the map is now backwards from how the router works.

But as I say, I don't feel very strongly about it. Happy to go with the name-based map and see how that works out 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to look up services by name for all the commands for having a map for that makes things simpler for that I think. Also until a target becomes active, we don't want to map the host to it, so at that point there's no way to find a service via the host.

I've moved the host onto the target right now. I'm not sure that feels quite right though, maybe we just need an intermediate object that contains the host and target instead.

But putting them together I think simplifies things. There's a separate activeHostTargets map now for routing.

I've removed the check for if a host is already in use. It's tricky because we want to reserve the host when you start deploying but only start routing to it when the host becomes active, so you need to make sure to unreserve it when the deploy fails. Seems simpler just to allow multiple hosts to be mapped.

The single host map works against us a little bit here though. So right now if you do this:

docker compose exec proxy parachute deploy service1 --target parachute-web-1:3000 requests
# service1 serves all requests

docker compose exec proxy parachute deploy service2 --target parachute-web-1:3000
# service2 serves all requests

docker compose exec proxy parachute remove service2
# Now nothing serves requests!

Just keying by host would have the same problem...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep the host in the Service, but update it only when the new target is moved to active. Both will happen while it has the write lock on the map, so they'll take effect together. It also feels more like a routing-related concern than a "target" one to me, so I think it belongs there.

This will also matter when I add in the rollout part, because we'll have multiple targets on a single service, but only one host setting for that service.

maybe we just need an intermediate object that contains the host and target instead.

The service is intended to be this intermediate object (although previously the host was the map key). The router should know about a bunch of services that I can serve traffic to, the Service (or ServiceMap) holds the information needed to route between them, and then the Target handles the actual request.

I've removed the check for if a host is already in use. It's tricky because we want to reserve the host when you start deploying but only start routing to it when the host becomes active, so you need to make sure to unreserve it when the deploy fails.

I think this wasn't a problem before (if I'm understanding what you mean correctly) because the host was the identifier. So putting a new target onto the same host was intentionally taking over the traffic from that host.

With the new way I think we need the check. Instead of reserving it, could we just wait to check it we're ready to make the target active? If it's not available then, we'd fail the deploy. Worst case you wait a moment for a healthcheck only to find out you have that conflict, but it's not really a problem. (We could also check at the start as well if we want to avoid that, but I think it's probably overkill).

So the gist of this would be:

  • The Service stores the host that it's routing for
  • When deploying a target, wait for the new host to become healthy. If it doesn't become healthy, fail the deploy.
  • Update the Service with the new host and active Target. But if the new host is already in use in another service, fail the deploy instead.

We don't need to save the new host anywhere until we use it, because it all happens inside a method. In my rollout branch I've ditched the concept of those adding fields in the Service for the same reason -- putting them in the struct just complicates things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the gist of this would be:

  • The Service stores the host that it's routing for
    *When deploying a target, wait for the new host to become healthy. If it doesn't become healthy, fail the deploy.
  • Update the Service with the new host and active Target. But if the new host is already in use in another service, fail the deploy instead.

Yep that makes sense 👍
From the point of view of Kamal I think trying to deploy two different services with the same host would be an error.

I was getting a bit tied up about locking the host once the deploy starts, but you are right that waiting until making the host active would be much simpler and would work just fine.

In general adding a service and then removing it should leave things exactly as they were before it was added. Say you accidentally deploy to a host with another app on it (e.g. copy and paste error) you need to be able to run kamal remove and reverse it. That'll be the case with this setup 👍

internal/server/router.go Outdated Show resolved Hide resolved
@djmb
Copy link
Contributor Author

djmb commented May 7, 2024

Thanks @kevinmcconnell. I've got some merge conflicts with main now, so I'll sort those out tomorrow and address your feedback then 👍

I think we'll probably want to add more complicated routing in future (e.g. path based)

I think that's fine if we need it, but we should also be careful about what we add here. It feels like the sort of thing that leads to Traefik-like complexity once you can have complicated routing. I would rather try to keep this as simple as possible to avoid that.

Agreed that we want to keep it as simple as possible, but if we do add new ways of routing, I think we'd want to be able to add them without having to redesign the state format. So maybe abstract out routing rules, even if they only support a single host to start with (or ever!)

But yes anything new could bring a world of complications - rule precedence, multiple rules for the same host etc etc.

@kevinmcconnell
Copy link
Collaborator

if we do add new ways of routing, I think we'd want to be able to add them without having to redesign the state format.

Totally, agree 100% 👍 My concern was only about adding the features in the first place :) Not the state format part.

Adding to the state file is OK in any case because we can just add keys as the new features develop. But as I say, I've been looking at making it more general to support those things a bit more nicely. The structure is evolving a bit as we add these new things that mean that old representation is becoming more awkward.

Copy link
Collaborator

@kevinmcconnell kevinmcconnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djmb I added a couple more thoughts that occurred to me last night. Let me know what you think.

Also sorry about the merge conflicts! It was unlucky timing when I pushed my changes for the pause commands. I think they should be easy to resolve, but let me know if you run into anything awkward, as we could always pull those changes back out if they're in your way.

internal/cmd/deploy.go Outdated Show resolved Hide resolved
Comment on lines 131 to 132
func (r *Router) ListActiveServices() map[string]map[string]string {
result := map[string]map[string]string{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map[string]map[string]string here is kinda vague about what it contains. We'd be better defining a struct to give it a defined shape.

So maybe something like

type ServiceDescription struct {
  Host   string `json:"host"`
  Target string `json:"target"`
}

func (r *Router) ListActiveServices() map[string]ServiceDescription {
   ...
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also define the map type too, to make it even tidier:

// Maps the service name to its human-readable description
type ServiceDescriptionMap map[string]ServiceDescription

func (r *Router) ListActiveServices() ServiceDescriptionMap {
   ...
}

I also wondered, do you have a use case for this, and is the format it's in useful for that? I originally added a list function just to have something in there, but I didn't have a clear idea of what we'll actually want from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use case for list, is where you've deployed a bunch of stuff to a host and it doesn't seem to be working as you expect. You could run something like kamal proxy list and get the full output.

I think though we'd want something text based not JSON, like:

SERVICE          HOST                 TARGET
service1         foo.example.com      172.0.1.1:98765
service2         foo2.example.com     172.0.1.2:56789
service3         *                    172.0.1.2:56789

return nil
}

func (r *Router) defaultService() *Service {
Copy link
Collaborator

@kevinmcconnell kevinmcconnell May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the more I think about it, I wonder if we should keep the same internal structure we had before. The router is a host-based router, and the primary access pattern we have is to look up the service for a host, with a fallback to the default. The map access it had before is a bit simpler and (marginally) faster for that access pattern.

The names feel a bit like labels that we use to locate the services when updating configuration. But that access happens much less often.

I think that would make this a smaller change too -- rather than restructuring the internals, it's adding the service name to be used for configuration updates. But the internal structure of the router still reflects how it works.

If we ever change from a host-based router to a something-else-based router we'd want to change this internal structure, but that might never happen. And even if it does, delaying the change until we know what we need means we'll have more information about how it should be structured.

I can see arguments either way for this, and I don't have very strong feelings either way. But I wonder if changing the internal structure is a bit speculative here, and is making it more general just in case we need it later. What do you think?

The persisted state should be independent of this structure of course. For that, we might be better to use a flat list so that no internal structure is even implied. I'm also planning to change this to use custom marshalling so that the Service objects can do their own serialisation. But I'd like to tackle that bit separately, after this lands, because it's separate, and I'll need to make sure it can accommodate the rollout information anyway.

@djmb djmb force-pushed the named-services branch from 797a3ed to d846dd6 Compare May 8, 2024 15:34
@djmb
Copy link
Contributor Author

djmb commented May 9, 2024

@kevinmcconnell - I updated that now with your suggestions - definitely looks a lot simpler. I ported over your change to remove adding from the target, which made it easier to implement. I tried to cherry-pick it but things had diverged too much so I've lost the history on that, sorry!

I've gone back to keying the services by host, but the saved state and list commands are keyed by service. For the saved state I think that makes it more future proof and for the list command I think that's the order you'd want to see. All the commands are service based.

So I feel like it might be better to key the services by name and have an additional host map as a performance optimisation. Even if we don't add more complicated routing later it better to have it matching the saved state and list output. And if we do add more routing then we can replace the host map with a route mapper of some sort.

Copy link
Collaborator

@kevinmcconnell kevinmcconnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. It does seem to have become simpler. And thanks for porting over that adding thing, too.

the saved state and list commands are keyed by service

For the saved state, I've been thinking it should just be a list. There's really no need for structure there, it's just a dump of all the loaded services. And making it a list forces it to be distinct from the internal representation, so we don't accidentally lock ourselves into any assumptions. So something more like:

{
  "services": [
    { "name": "app1", "host": "one.example.com", "target": "server1:3000", ... },
    { "name": "app2", "host": "two.example.com", "target": "server2:3000", ... }
  ]
}

And then I think we could use UnmarshalJSON & MarshalJSON on the Service so it can handle its own serialization to & from that format. To restore it we'd unmarshal the list and then pop the services into our internal map structure, whichever form it currently is.

I haven't tried it yet, but I think it would clear up the state management quite a bit.

I do get what you mean about keying the services by name, and am sort of on the fence. But I also like how simple this change became. I tend to think we should merge this with the format it has now, and have it working first, but then we can remain open to changing the internal representation afterwards if we want. We can also see how it looks once the next few changes land -- I can imagine stuff like the rollout feature could end up pushing us more towards either direction, but I think we'll know more when we see it. Does that sound like a good plan to you?

@@ -38,11 +39,13 @@ func newDeployCommand() *deployCommand {
deployCommand.cmd.Flags().Int64Var(&deployCommand.args.TargetOptions.MaxRequestBodySize, "max-request-body", 0, "Max size of request body (default of 0 means unlimited)")
deployCommand.cmd.Flags().StringVar(&deployCommand.args.Host, "host", "", "Host to serve this target on (empty for wildcard)")

deployCommand.cmd.MarkFlagRequired("target")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL the MarkFlagRequired stuff existed! I see there's also a MarkFlagsRequiredTogether which I could use to replace the --tls & --host checks below, which will make that clearer. I'll do a pass of that after this lands to see where else I can make use of those.

Short: "Pause a service",
RunE: pauseCommand.run,
Args: cobra.NoArgs,
Use: "pause <service>",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commands are definitely nicer now that everything has a name 👍

target := r.activeTargetForHost(host)
if target == nil {
func (r *Router) PauseService(name string, drainTimeout time.Duration, pauseTimeout time.Duration) error {
service := r.serviceForName(name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to obtain a read lock when looking up the service, because the map isn't concurrency-safe. activeTargetForHost used to do that, but serviceForName doesn't. We'll also want to release the lock right after, since the Pause could take a long time.

Probably the simplest thing to do would be wrap the call in a withReadLock.

The pattern I tried to use before was that the helper accessors like activeTargetForHost would handle the read lock while looking something up, whereas the operations that are going to mutate the map handle their own locks and manipulate the map directly, but that's fiddly in a case like serviceForName that's used within mutating and non-mutating operations. I guess we could create locking and non-locking versions of it, and use some sort of naming convention to make it clearer which of these helpers take locks.

What do you think? If we want to just grab the lock at the call site for now, and possibly rethink locking later, that seems fine to me. But if you have a better idea I'm all ears!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a boolean to serviceForName indicating whether to take a read lock - a6a5514. How does that look?

@djmb
Copy link
Contributor Author

djmb commented May 9, 2024

I do get what you mean about keying the services by name, and am sort of on the fence. But I also like how simple this change became. I tend to think we should merge this with the format it has now, and have it working first, but then we can remain open to changing the internal representation afterwards if we want.

I've tried out switching to keying by name, and its not really any nicer right now so happy to stick with that 👍

@kevinmcconnell
Copy link
Collaborator

Oh btw, my idea about using a flat list for the state file isn’t something we need to do here (in case I made it sound like it was).

I think what you have here is good, and it improves on what was there before. That’s just me mulling something else we could try after!

@djmb djmb merged commit 03589d4 into main May 20, 2024
1 check passed
@djmb djmb mentioned this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants