-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dns-server] only return answers to the incoming question #6308
base: main
Are you sure you want to change the base?
Conversation
As noted in #4051, queries to the internal DNS server would get any records for a name in response, rather than only records matching the query incoming query type. That behavior is confusing, but worse, wrong. Subtly, this is not actually a misbehavior I believe we can observe through `trust_dns_resolver`: the resolver from that crate includes its own `CachingClient`. As a side effect of upstream answers going through that caching client, incorrect Answers sections are cached and only correct answers actually make it out to us as consumers of `trust_dns_resolver`. But as is plenty clear in #4051, `dig` and other DNS clients can get incoherent answers! Simple enough to fix: only return answers that are answers to the question we were asked.
errr... didn't realize i'd broken something in |
ah, i think the test failures are in fact a bug in seems simple enough to fix? |
If I'm understand right, you're saying: the DNS client code is incorrect because it wants to be doing an SRV lookup, but it's doing a AAAA lookup. But this is working by accident because the DNS server is returning the SRV records for the AAAA query. So fixing that would break all of our DNS consumers. Is that right? And presumably the fix is to fix the DNS client to do an actual SRV lookup, which should continue to return the additional AAAA records? That all sounds good. The only question I have is whether we need a transitional release where the server still has the old behavior. I think not because the way we do upgrades by shutting down the whole stack, by the time the new DNS servers come up, there can be nothing still using the old client behavior. It'd be worth double-checking that. If wicket or wicketd rely on this DNS client code, for example, it seems like it could be a problem during the mupdate. |
yes, though i wouldn't say all.. definitely would break service discovery over DNS (which is presumably most of it). NTP lookups, for example, seem like that should still work...
precisely.
i was wondering about that, since fixing resolution in since the changes are both small i'm leaning towards a staged fix just to be very sure this won't cause an issue in the transition. |
and with 338b7f6 picked onto this branch, tests all pass again. instead of "fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I thin you'll have to sync up with main to pick up the hickory-dns update. Hopefully that won't break much here.
@@ -268,6 +268,19 @@ async fn handle_dns_message( | |||
let mut additional_records = vec![]; | |||
let response_records = records | |||
.into_iter() | |||
.filter(|record| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine but I'm curious why you decided to filter here and not in the store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly expecting that we would reasonably want related records in other cases. an immediate example might be returning AAAA records as additionals for an A query, or vice versa (admittedly tenuous, since there's not much reason A records would be anywhere.)
maybe more reasonably: this makes it a bit easier to detect (and warn?) about clients doing AAAA queries directly to the name for a SRV record.
if we weren't storing all the records for a name together, and filtering in the store could have saved some work, i'd be strongly in favor of filtering there.
dns-server/tests/basic_test.rs
Outdated
// transparently correct the server misbehavior "server sent AAAA answers to | ||
// an A query": the caching client will cache the extra record, then | ||
// determine that there are no A records to respond with, and send a | ||
// matching response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "matching response" here? A response indicating no records?
Does this comment mean that even if we'd had this test before, it would have passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"matching response" is an attempt to nod to the language a reader might find in RFC 1034 - in this case, yes, a response with all zero records matching the query. re-reading the comment, i could have phrased that better...
about the comment generally: if the test hinged on asserting that resolver.ipv4_lookup(name)
returned no records when we've added only an AAAA record, it would pass on main. the test as written here, avoiding CachingClient
would fail on main. (this was very confusing to discover, as i started by using ipv4_lookup
and realizing i couldn't make this test fail as expected..)
assert_eq!(response.header().response_code(), ResponseCode::NoError); | ||
assert_eq!(response.answers(), &[]); | ||
assert_eq!(response.additionals(), &[]); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding a case with an SRV lookup that produces AAAAs with additional records so that we verify that's still working? edit: I guess we test this behavior already in srv_crud()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, exactly, srv_crud() already covers that. i would kind of like to assert that the AAAAs are in Additionals specifically, but i didn't want to go too wild with raw_dns_client_query
off the bat.
dns-server/tests/basic_test.rs
Outdated
@@ -479,3 +562,36 @@ async fn dns_records_list( | |||
.map(|z| z.records) | |||
.unwrap_or_else(HashMap::new)) | |||
} | |||
|
|||
/// issue a DNS query of `record_ty` records for `name`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I (we?) use traditional capitalization. I see a lot of this file already doesn't use caps or punctuation, but that seems to be mostly the older one-line comments.
Co-authored-by: Eliza Weisman <[email protected]>
let stream = UdpClientStream::<tokio::net::UdpSocket>::new(resolver_addr); | ||
let (mut trust_client, bg) = AsyncClient::connect(stream).await.unwrap(); | ||
|
||
tokio::spawn(bg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we cancel and and then wait for this after we do the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think we'd get anything for the effort to do that: bg
does need to be polled, since that's what drives trust_client
's I/O. so if we held bg
and cancelled it after trust_client.query()
, i'm pretty sure the query would just time out .. :D
so we'd need to thread some signal through that the query is done and we should tear down bg
. but dropping trust_client
has a really similar outcome in getting DnsExchangeBackground
shutdown here.
is there state you're thinking might persist if bg
were to leak?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just worried about leaking those tasks (and bg
itself) every time we run a query, plus general principles -- it makes me nervous when stuff kicks stuff off in the background and forgets about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, fair. i'm less concerned about that here because this is only used in the context of some individual test, and the whole executor gets torn down when the test finishes?
As noted in #4051, queries to the internal DNS server would get any records we have for a name in response, rather than only records matching the query incoming query type. That behavior is confusing, but worse, wrong.
Subtly, this is not actually a misbehavior I believe we can observe through
trust_dns_resolver
: the resolver from that crate includes its ownCachingClient
. As a side effect of upstream answers going through that caching client, incorrect Answers records are cached and only correct answers actually make it out to us as consumers oftrust_dns_resolver
.But as is plenty clear in #4051,
dig
and other DNS clients can get incoherent answers!Simple enough to fix: only return answers that are answers to the question we were asked.