Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for internet gateways #6475

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from
Draft

Support for internet gateways #6475

wants to merge 16 commits into from

Conversation

rcgoodfellow
Copy link
Contributor

@rcgoodfellow rcgoodfellow commented Aug 29, 2024

@rcgoodfellow rcgoodfellow added this to the 11 milestone Sep 6, 2024
@rcgoodfellow rcgoodfellow self-assigned this Sep 6, 2024
@rcgoodfellow
Copy link
Contributor Author

Something I'm not too sure about with this PR is how it needs to interact with the reconfigurator. If any services move around, we need to trigger the vpc_route_manager background task. @jgallagher do you have some insight here?

// I think we should move decisions around change
// propagation to be based on actual delta
// calculation, rather than trying to manually
// maintain a signal.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this comment out as something to discuss in this PR. CC @FelixMcFelix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had maybe overlooked the scope of things that could change in connection with an Internet Gateway. That's a big surface area to keep an eye on...

I'm not really sure how we would want to work around this, however. The motivation for using router versions was more for performance rather than correctness. My intent was to prevent each Nexus's copy of the background task from querying every Project's/VPC's/Subnet's state on a minute-by-minute (and per-wakeup) basis, since there are a lot of tables which are referenced during route resolution. This partly stems from the granularity of background task wakeups, which don't AFAIK allow a selective wakeup. So my initial feeling is that having Nexus store resolved tables and then send deltas is orthogonal to that?

It could also, in practice, not be too expensive to just rebuild VPC routing tables periodically! In which case, the places where we perform version bumps and/or background task wakeups would keep the system responsive rather than be crucial for correctness. If cost remains a concern, I wonder if we can just use modified timestamps in connected tables to drive the route update process (which would, I hope, be less error-prone).

@jgallagher
Copy link
Contributor

Something I'm not too sure about with this PR is how it needs to interact with the reconfigurator. If any services move around, we need to trigger the vpc_route_manager background task. @jgallagher do you have some insight here?

Maybe! The bit of reconfigurator that can actually move services around is also an RPW (blueprint_executor). However, it doesn't currently know whether any given execution actually moved services. It's structured as "idempotently send every sled the set of services it should be running", and almost always that does not result in any changes. So I think I have some questions and some options:

  • Does the vpc_route_manager already run periodically, so we're only looking for an optimization to tighten the window between "services moved around" and "VPC route manager fixed things"?
  • If so, would it make sense to trigger vpc_route_manager after every blueprint_executor run? That would effectively cause it to run at the period of blueprint execution + its own timer, I think; I don't know whether that would be okay.
  • I think blueprint_executor could know whether any sleds actually changed, if we modified the sled-agent API it's using to return information about whether it made any changes, at which point it could explicitly trigger vpc_route_manager only if services moved (or maybe only if certain services moved?). There might be wrinkles here I'm not aware of that make this tricky; should we look into this?

@twinfees twinfees added the customer For any bug reports or feature requests tied to customer requests label Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer For any bug reports or feature requests tied to customer requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement the Internet Gateway concept
4 participants