<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body dir="auto"><div dir="auto">Fuck you</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div id="composer_signature" dir="auto"><div style="font-size:85%;color:#575757" dir="auto">Sent from my Galaxy</div></div><div dir="auto"><br></div><div><br></div><div align="left" dir="auto" style="font-size:100%;color:#000000"><div>-------- Original message --------</div><div>From: Dominik &lt;dl6er@dl6er.de&gt; </div><div>Date: 01.04.21 09:52 (GMT+01:00) </div><div>To: Tony Ambardar &lt;tony.ambardar@gmail.com&gt;, dnsmasq-discuss@lists.thekelleys.org.uk </div><div>Subject: Re: [Dnsmasq-discuss] Partial denial of service with dnsmasq on
resource constrained systems </div><div><br></div></div>Hey Tony,<br><br>On Wed, 2021-03-31 at 19:43 -0700, Tony Ambardar wrote:<br>> You're right that text segments are fairly small and shared; memory usage<br>> was dominated by storage for blocklists read from file. This makes the<br>> problem more general than just tiny systems, since people tend to size<br>> their blocklists proportional to system memory size.<br><br>I wouldn't say this. Users also try to squeeze in overly large files when<br>they do not have enough memory for them...<br><br>On Wed, 2021-03-31 at 19:43 -0700, Tony Ambardar wrote:<br>> You're also right that actual memory footprint increases only minimally<br>> with each fork() thanks to copy-on-write; I'm certain these OOM systems<br>> aren't really exhausting memory. But I do think there's confusion around<br>> memory usage optimizations like COW vs. memory accounting used for OOM.<br><br>OOM is just severely broken IMO, as a concept. Linux should likely not<br>allow overcommitment at all; there is just no way for software to account<br>for memory becoming unavailable after it was successfully allocated some<br>time ago.<br><br>On Wed, 2021-03-31 at 19:43 -0700, Tony Ambardar wrote:<br>> I recall looking at dnsmasq process statistics on OOM invocation, and<br>> noticed their VM set sizes were usually close to total system memory,<br>> i.e.<br>> COW wasn't relevant. And from a dnsmasq proc memory map, the large<br>> segment<br>> storing the blocklist was marked read-write. I suspect that despite COW,<br>> since that memory is *potentially* writable it's being accounted for at<br>> fork() time.<br><br>The fork technically needs to allocate as much memory as the program is<br>currently using, but /proc/[pid]/maps won't tell you if the memory is<br>copy-on-write or not. It is for sure read-write because, otherwise, the<br>fork would be sent SIGSEGV when it wrote to it. 
Instead, when trying to write<br>to a copy-on-write page, you will trigger a page fault, the page will be<br>duplicated, and you can continue happily as if nothing had happened.<br>Also, the "p" (private) doesn't help much here because it only<br>distinguishes the mapping from "s" (shared) at this point.<br><br>It *should* be possible to extract the relevant information from<br>/proc/[pid]/pagemap and then check the details of the page(s) in<br>/proc/kpageflags for KPF_SWAPBACKED (page is backed by swap/RAM). This is<br>the only way I'm aware of to check if this is a copy-on-write page existing<br>in multiple places.<br><br>If you know a simpler way to do this, I'd be happy to learn.<br><br>On Wed, 2021-03-31 at 19:43 -0700, Tony Ambardar wrote:<br>> A possible fix I'd suggest is to update dnsmasq's memory handling. IIRC,<br>> we use the same cache structure and memory allocation for both DNS cache<br>> and storing static server lists read from file. Perhaps use a separate,<br>> page-aligned memory pool to store these lists, then after initialization<br>> (and before forking) use mprotect() to set the region as read-only.<br>> <br>> Assuming it works, this would have the advantage of being a no-knobs<br>> solution vs. setting kludgey process or connection limits.<br><br>I like the idea of splitting the cache into two parts, say a static and a<br>dynamic cache. Using mprotect() shouldn't even be necessary, but it helps<br>to ensure we're not writing to the static part of the cache anywhere in<br>the code.<br><br>KSM (kernel samepage merging) comes to mind as well, but this seems to<br>be the wrong tool for the job. Figured I should mention it nonetheless.<br><br>On Wed, 2021-03-31 at 19:43 -0700, Tony Ambardar wrote:<br>> One other thing I saw while testing with large blocklists was a<br>> noticeable<br>> latency increase, likely related to lookup times. I recall some<br>> discussion<br>> on the ML where you mentioned work on a hash/tree solution was in<br>> progress. 
Were those changes completed?<br><br>Yes, dnsmasq uses hash buckets to minimize the amount of memory it has to<br>loop over when trying to find a name.<br><br>Best,<br>Dominik<br><br><br><br>_______________________________________________<br>Dnsmasq-discuss mailing list<br>Dnsmasq-discuss@lists.thekelleys.org.uk<br>https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss<br></body></html>