No Food for Thought

Food is something you should provide to your brain long before coming to this blog. You will find no food recipes here, only raw, serious, non-fake news for mature minds.

Optimizing the optimization - Performant Incremental Updates for Packages files

admin Wednesday January 6, 2016

In 2006, Michael Vogt implemented support for PDiff (differential Packages) files in APT to optimize the process of updating Packages. At the time, the Packages file, which was already several MB-s compressed, needed to be downloaded entirely to update package indices. Joerg Jaspert and probably other members of the archive maintenance team implemented support for generating PDiff files on the archive side.

Unfortunately, APT's performance when applying (several) PDiff files was quite poor, sometimes worse than the performance of a non-incremental update, as reported (at least) in ticket 372712 and ticket 376158. This was particularly problematic for testing users, until APT 1.1.7, a nice X-mas present from Julian Andres Klode, who identified the bottleneck and optimized the process.

I haven't had the time to test testing since Jessie's release, but I'm starting to miss it :-)
I wish to thank Michael, Anthony Towns, Andreas Barth, Joerg and others who contributed to the initial implementation, as well as jak, who's finalizing this work a decade later with an optimization job even more thankless than the initial implementation.
The next step? Differential updating of packages, with a lowercase "p"... which promises to be even harder to get right.

Finally, I'm using this opportunity to thank APT contributors - particularly its current maintainers mvo and jak - for all of their work. Progress has been slow over the last decade, but the direction is right, and each step is appreciated.

Electoral reform coming to Canada?

admin Tuesday January 5, 2016

After the last election, I wrote about the promised federal electoral reform. Nothing has really changed since then, which is why I am writing a new post.

Since the election, I have seen electoral reform discussed several times on the CBC's At Issue panel, by several commentators. Just yesterday, Tasha Kheiriddin mentioned reform on another CBC panel. Since the election, I am not sure any minister received more media attention than Maryam Monsef.

10 weeks after the election, the media still hasn't forgotten the Liberal Party's promise of electoral reform. It really seems like the government will propose electoral reform. What will happen - what system will be proposed, whether a referendum will be held, and the result of such a referendum - is still unknown, although the Liberal Party ruled out a referendum just last week. But clearly, the next months of Canadian politics will be exciting to watch. The system proposed will certainly be extremely suboptimal. But any change will probably be the greatest advancement in governance at the federal level since women's suffrage, a century ago. Canadian citizens could realize in 2019 that they have (slightly) more political power than checking a box every 4 years. The next generation may realize that democracy is not merely FPTP, except if we want it to be kept in its infancy.

On the other hand, if loyalist Canadians fear taking the lead over the UK for once and reject reform by referendum, governance reform could become a topic as taboo as constitutional changes and could be set back by decades.

Finally, while achieving proportional representation is just one governance improvement for me, I would like to congratulate Fair Vote Canada for all they have done during the campaign and after. FVC probably did not influence the results of the last election in the end, but your continual activity may still prove useful in the upcoming debate. Thank you, Anita Nickerson, Kelly Carmichael and all others for all the energy you invest in our goal. Keep up the good work.

2017 update: No

Civilization: Beyond Earth on Debian GNU/Linux? Good luck

admin Saturday December 26, 2015

Ever since I moved to GNU/Linux, the video game I missed the most was Sid Meier's Civilization. The only version ported to GNU/Linux was Sid Meier's Alpha Centauri, probably my favorite version. But that port seemed to be an afterthought. One needed to look for the special installer, which was buggy.

With the release of CivBE, I was under the impression that Firaxis was finally truly making GNU/Linux a supported platform for Civilization. The GNU/Linux version was released less than 2 months after the Microsoft Windows version. Mac/Linux was even the fourth item in the game's official FAQ. For the first time in many years, I put a video game on my wish list. To my surprise, my mother offered it to me this week (I suppose she did not realize it was the same series I spent so many hundreds of hours playing over nearly 2 decades :-P).

I was also happy to see the game's box didn't have the huge Games for Windows banner anymore. Unfortunately, the system requirements claimed Windows was necessary. But I thought those were just randomly written system requirements, as usual (how credible are requirements asking for "Windows Vista SP2/ Windows 7" for a Q4 2014 game anyway?). I was less impressed when I inserted the DVD and realized there was absolutely no material for GNU/Linux, nor any documentation explaining where to go. The FAQ item mentioned above still discusses a Linux version as a future release (although Wikipedia says it was released 2014-12-18). And I cannot even find installation instructions when searching on Google.

Is Civilization: Beyond Earth beyond Windows? I am far from being convinced at this point.

Hopefully, the game will at least be stable, without serious bugs such as those I experienced playing the original versions of Civilization III and V (let alone the serious networking issues with Civilization IV).

Memory usage of Apache's PHP child processes

admin Monday December 14, 2015

I ran a PHP benchmark for which I allowed PHP to take as much memory as it wanted. The benchmark worked, but I then realized Apache was using 2 GB of RAM. The parent process was fine, but it turned out the apache2 child process which had run the benchmark was still using 2 GB (RES).

I thought that was abnormal, but I checked on ##php and eventually had confirmation from several people that, to my great surprise, this is not a memory leak. This behavior is expected. And indeed, I can re-run the same benchmark and it will never run out of memory if it succeeded in reserving enough memory the first time. I am not a sysadmin, but that was still quite a shock. I was told PHP has its own memory manager, and only releases memory if the Apache child is restarted. In reality though, other processes (including Apache children) will manage to "steal" memory reserved by idle children. This is surely the part I find most amazing. I am curious to learn how Linux manages that.

So, the memory Apache grants to PHP children will sometimes only be released when these child processes are restarted, but other processes will manage to reclaim that memory if needed. This holds at the very least in our configuration (Debian 8's PHP 5.6.14 on Apache 2.4.10 with prefork MPM).

One important word above is "sometimes". For some reason, children sometimes immediately release their memory. I initially thought it took 2 executions for memory to stick, but a second execution does not always lock it. It seems memory will not be freed if 2 requests come with little idle time in between (seconds). I would welcome pointers to discussion of this behavior.

The following shows well enough an Apache restart freeing 2 GB of RAM:

root@Daphnis:/var/log/apache2# free -h; grep Mem /proc/meminfo; service apache2 restart; free -h; grep Mem /proc/meminfo
             total       used       free     shared    buffers     cached
Mem:          3,0G       2,4G       660M       9,5M       688K        75M
-/+ buffers/cache:       2,3G       736M
Swap:         713M       276M       437M
MemTotal:        3173424 kB
MemFree:          675824 kB
MemAvailable:     634736 kB
             total       used       free     shared    buffers     cached
Mem:          3,0G       216M       2,8G       9,4M       756K        88M
-/+ buffers/cache:       126M       2,9G
Swap:         713M       270M       443M
MemTotal:        3173424 kB
MemFree:         2951552 kB
MemAvailable:    2917400 kB
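
As a rough check of the figures above, the two MemFree readings can be compared directly. The following is only a minimal Python sketch: the /proc/meminfo excerpts are copied from the output above, and the helper name parse_meminfo is hypothetical.

```python
def parse_meminfo(text):
    """Parse '/proc/meminfo'-style lines into a dict mapping field name to kB."""
    values = {}
    for line in text.strip().splitlines():
        key, rest = line.split(":", 1)
        values[key.strip()] = int(rest.split()[0])
    return values


# Excerpts copied from the console output above (before and after the restart).
BEFORE = """
MemTotal:        3173424 kB
MemFree:          675824 kB
MemAvailable:     634736 kB
"""
AFTER = """
MemTotal:        3173424 kB
MemFree:         2951552 kB
MemAvailable:    2917400 kB
"""

before = parse_meminfo(BEFORE)
after = parse_meminfo(AFTER)
freed_kb = after["MemFree"] - before["MemFree"]
print(f"MemFree grew by {freed_kb} kB ({freed_kb / 1024**2:.2f} GiB)")
```

The difference is 2275728 kB, or about 2.17 GiB, which is consistent with the 2 GB held by the child process.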

Transition to the SI - A matter of numerous Ms-s

admin Saturday December 12, 2015
##php wrote:

(19:32:13) chealer: so if I consider that PHP's 0 ds should be 1 ds, then that proves my understanding that it's not the DB which adds that extra second.
(19:33:33) Literphor: chealer: What is ds? A decisecond?
(19:33:41) chealer: Literphor: yeah. it's all on a 1 Gb/s LAN, but that probably explain the 3 ds difference.
(19:33:45) chealer: Literphor: right
(19:34:03) Literphor: chealer: Heh you’re the first person I’ve ever seen use those units
(19:34:40) chealer: Literphor: Heh, you're not the first person telling me I'm the first person they see use those units.

dig(1) and other DNS clients sometimes taking 5 seconds to return the results of a local query

admin Friday November 27, 2015

After installing a few Debian VMs inside our Windows environment this week, I noticed very strange performance problems resolving local domain names on local DNS servers. Simple queries which should have taken milliseconds would sometimes be very slow. And these slow queries would consistently take 50 deciseconds to resolve, never 49 or less. It looked like a timeout, but the logs mentioned none, and it was hard to tell when the timeouts would occur, except that they would occur more often on the first test after I had stopped testing for a few minutes. For example, a trivial connection to a local MySQL server could take just above 50 ds to establish:

$ time echo 'SELECT 1;'|mysql -u [...] --password=[...] -h PC-0002
1
1

real 0m5.014s
user 0m0.000s
sys 0m0.004s
pcloutier@deimos:/var/lib/dpkg/info$


This was far from MySQL-specific. dig(1) would suffer from the same delays:

$ time dig @phobos.lan.gvq titan.lan.gvq

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> @phobos.lan.gvq titan.lan.gvq
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15593
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280
;; QUESTION SECTION:
;titan.lan.gvq. IN A

;; ANSWER SECTION:
titan.lan.gvq. 3600 IN A 10.10.1.29

;; Query time: 0 msec
;; SERVER: 10.10.1.23#53(10.10.1.23)
;; WHEN: Fri Nov 27 12:14:42 EST 2015
;; MSG SIZE  rcvd: 58



real 0m5.018s
user 0m0.012s
sys 0m0.004s
pcloutier@deimos:/var/lib/dpkg/info$

...where phobos.lan.gvq is a local DNS server, and titan is just a local hostname which is supposed to resolve very quickly. Attentive readers will notice that Query time indicates 0 ms. This is because the DNS query proper does take 0 ms. The delay comes from the resolution of the name server itself, which I specified by name. This cannot be reproduced with dig if the name server is specified by IP.

This turned out to be an IPv6-related glibc issue. The first big advance came from a Stack Exchange thread, which allowed me to confirm that the delay was due to a timeout in glibc's getaddrinfo(3). This can be confirmed with high certainty by changing that delay using the resolv.conf timeout option. glibc's default timeout is 5 seconds. For example, if you notice that the delay decreases to 3 s after setting "options timeout:3", then you are clearly experiencing timeouts. If not, sorry, this post will not help you.
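
The check described above can be sketched in Python, whose socket.getaddrinfo() wraps the C library's getaddrinfo() on GNU/Linux. This is only an illustration under stated assumptions: the helper names (time_call, looks_like_timeout, measure) and the 0.5 s slack are hypothetical choices, not part of any tool mentioned in this post.

```python
import socket
import time


def time_call(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = func(*args)
    return result, time.monotonic() - start


def looks_like_timeout(elapsed, timeout=5.0, slack=0.5):
    """True when elapsed is just above the resolver timeout.

    timeout should match the resolv.conf "options timeout:N" value
    (5 seconds by default). slack is an arbitrary margin (assumption).
    """
    return timeout <= elapsed < timeout + slack


def measure(host, timeout=5.0):
    """Resolve host once and report whether the delay looks like a timeout."""
    _, elapsed = time_call(socket.getaddrinfo, host, None)
    return elapsed, looks_like_timeout(elapsed, timeout)


# Example (not run here): measure("phobos.lan.gvq"), then again with
# timeout=3.0 after setting "options timeout:3" in resolv.conf.
```

If the slow resolutions follow the configured value (just above 5 s by default, just above 3 s with "options timeout:3"), the delay is a resolver timeout.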

The next step was to determine whether that timeout was IPv6-related. This can be achieved by disabling IPv6 on the GNU clients, but it may be simpler to just set options single-request and single-request-reopen. If none of these helps, you know your problem is caused by timeouts, but the cause is different from ours, and the rest of this post will not help.

If disabling IPv6 helped but single-request and single-request-reopen did not, sorry, I do not know more about your issue. But if single-request or single-request-reopen helped, your problem must be similar to ours. Due to a glibc 2.9 change (see section "DNS NSS improvement"), getaddrinfo() often causes a communication issue between itself and the DNS server when querying both IPv4 and IPv6 addresses, due to what Ulrich Drepper describes as server-side breakage. Since at least glibc 2.10, if glibc detects that this glitch may have happened, it works around it by re-sending the queries serially rather than in parallel, so the problem "merely" causes a timeout. If there is a firewall between your DNS server and you, see the Stack Exchange thread above. If a firewall issue is excluded and your DNS server is running Windows Server, you are probably experiencing the same incompatibility as we are.

I first thought our Windows Server 2008 [R1] servers were causing this because of an old bug, but according to a 2014 blog post, this still happens with Windows Server 2012 R2. Although the tcpdump shown on the Stack Exchange thread describes pretty well what is going on, I had to perform my own captures to understand why the timeout would only happen sometimes, and succeeded quickly enough. When the problem does not happen, getaddrinfo() queries both A and AAAA (IPv6) records in parallel in packets 7 and 8 and receives both replies in packets 9 and 10:

Capture 1 - no problem

Packets 11 and 12 show the DNS query proper, since this capture shows the full activity for the dig command explained above.

When the problem happens, what was packet 9 in capture 1 is gone, which is why getaddrinfo() retries 5 seconds later (after the gap between packet 26 and 30), in packets 30 and 32, but now sequentially:

Capture 2 - serial retry after 5 seconds timeout


Why does the problem happen in capture 2? Surely because of that extra color... the beige ARP packets at 24 and 25. In other words, in the first case, the DNS client's IP address is in the DNS server's ARP cache, so the server does not need to resolve the client's IP address. In the second case, the DNS client's entry in the DNS server's ARP cache has expired, so the server needs to perform an ARP query before being able to send what would be packets 9 and 10 in the first case (I would have thought the server could figure out the MAC address from packets 22 and 23, but apparently that is not how Windows works).

As explained in Microsoft's ARP caching behavior documentation, in recent Windows versions, an ARP cache record is [usually] maintained for a random time between 30 and 90 seconds after the last time it was used. This must be why that bug was pretty hard to track. Therefore, if the server and the client communicate at least every 30 seconds, this timeout should only be experienced once. This means that in the case of Windows Server DNS servers, the behavior would be the same if glibc didn't fall back to serial queries after the timeout.
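
The reasoning above can be illustrated with a deliberately simplified Python model. It assumes the worst-case cache lifetime of 30 seconds, that every query refreshes the entry, and that no other traffic between the two hosts does; the function name slow_queries is hypothetical.

```python
def slow_queries(query_times, cache_lifetime=30.0):
    """Return the query times (in seconds) expected to hit the timeout.

    A query is considered slow when the server's ARP entry for the client
    is missing: on the first query, or when more than cache_lifetime
    seconds have passed since the previous query.
    """
    slow = []
    last_use = None
    for t in query_times:
        if last_use is None or t - last_use > cache_lifetime:
            slow.append(t)
        last_use = t  # any exchange refreshes the entry (assumption)
    return slow


# A client querying every 25 s only suffers on its first query...
print(slow_queries([0, 25, 50, 75]))
# ...while longer pauses bring the timeout back each time.
print(slow_queries([0, 100, 110, 250]))
```

In this model, only queries that follow a pause longer than the cache lifetime are slow, which matches the observation that timeouts mostly occurred on the first test after a few idle minutes.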

Causes and solutions

I have not found a server-side workaround (besides, I guess, disabling IPv6). Unfortunately, I believe this needs to be worked around on every GNU client.

It is more interesting to try determining the root cause of this issue and definitive solutions. glibc developers consider it a Windows bug. But would Microsoft leave a bug which must be triggered millions of times per day unfixed for years?

Windows Server

The captures clearly show that glibc starts with the IPv4 query. Which means the Windows server can only send the AAAA reply after it can send the A reply. In general, that must mean it replies to both. But when the server has to wait for an ARP reply before sending its DNS reply, it may have received the AAAA request before it is able to send the A reply. I would need to perform a server-side capture to confirm that, but it could be that Windows detects that situation and decides to send a single reply to save bandwidth and/or favor IPv6 usage. If the goal was simply to favor IPv6, it would probably be better to just send the AAAA reply before the A reply.

Windows may be applying a heuristic optimization by guessing that the client just needs one address, which would certainly be wrong sometimes. This could be considered a bug insofar as failure to reply constitutes a bug.

DNS clients and the protocol

But there is certainly a client-side issue as well at least in this case. The client requests both an IPv4 address and an IPv6 address while it only needs one. Unless this is a strategy to minimize further queries, this is inefficient.

According to this Stack Overflow thread, it is not clear that requesting both A and AAAA records in a single DNS query is possible. And even that would not be the optimal solution, which would be to request whatever single IP address should be used.

From getaddrinfo()'s perspective, it cannot be optimized, since the caller has requested any address to be returned. So the problem is really in dig and other DNS clients calling getaddrinfo() just to resolve a hostname. These clients are all suboptimal. gethostbyname() is optimal, but obsolete since it is not compatible with IPv6. There should be a resolving function which either returns the first IP address obtained, or returns both without blocking while waiting for the second. Clearly, each program cannot implement such a function itself. I do not know glibc, but a C library's API should allow such a resolution. If it doesn't, glibc has an issue too.
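
To make the idea concrete, here is a minimal Python sketch of a resolving function that returns the first IP address obtained, without blocking on the other address family. It is not glibc's API and not how dig works: the names first_address and _lookup are hypothetical, and it uses one thread per address family on top of Python's socket.getaddrinfo().

```python
import concurrent.futures
import socket


def _lookup(host, family):
    """Resolve host for a single address family; return a list of addresses."""
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def first_address(host, lookup=_lookup,
                  families=(socket.AF_INET, socket.AF_INET6)):
    """Return the first address obtained from any family, or None.

    The lookups run in parallel; the caller gets the first successful
    answer and does not wait for the remaining family.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(families))
    futures = [pool.submit(lookup, host, family) for family in families]
    try:
        for future in concurrent.futures.as_completed(futures):
            try:
                addresses = future.result()
            except OSError:
                continue  # this family failed; wait for the other one
            if addresses:
                return addresses[0]
        return None
    finally:
        pool.shutdown(wait=False)  # do not block on the slower lookup
```

With such a function, a lost reply for one record type would not delay a caller that only needs one address, although the slower lookup still runs to completion in the background.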

HTML/CSS - Centering

admin Wednesday November 11, 2015

Centering in CSS is not easy. But each time I must vertically center, I must search the web to convince myself that I have no choice other than using a hack. So I found it comforting to see this admission, coming from the W3C itself:

At this time (2014), a good way to center blocks vertically without using absolute positioning (which may cause overlapping text) is still under discussion.

Windows Firewall dangers - Is your Windows [8] PC's networking broken after you joined a domain?

admin Friday November 6, 2015

I hate firewalls. One of the first things I do on any personal Windows I install since Vista is to disable Windows Firewall. Usually, that's all it takes... plus disabling the maintenance center's firewall monitoring so it stops harassing you about the firewall, of course.

So when I noticed my PC's Apache was no longer reachable from other machines and that the PC would no longer answer pings, Windows Firewall did not come to my mind as an obvious suspect. Only after I realized that the problem started shortly after I joined the install to the enterprise domain did I start suspecting that some GPO was now forcing the firewall. Of course, I then went to check the firewall's status, using the maintenance center. In order to check its status, I clicked Turn on messages about network Firewall. The maintenance center then displayed:
Windows Firewall
In English: "The Windows Firewall is disabled or configured incorrectly."
I was quite sure the firewall wasn't configured incorrectly, since the only configuration I had done was to disable it, so I assumed the firewall was disabled and proceeded to waste at least 10 minutes in further troubleshooting before finally realizing that the damn firewall was actually enabled... despite the button offering me to "Enable now".

In the end, this had nothing to do with Group Policy. The problem is you can't even directly turn off the firewall completely; you have to disable it for every network type: private, public and - when you're on a domain - domain networks, which wasn't done on my install. So I clicked Disable Windows Firewall, closed the window, and proceeded to verify that the network was working again - which, of course, was not the case. After trying to reset the network card without success, I went back to the panel to notice that my change hadn't taken effect. Great, so for that specific panel, your changes are discarded without warning unless you select OK.

Conclusion

If your Windows machine's networking stopped working after joining a domain and won't even send ICMP replies, do verify Windows Firewall, and do so by going to the configuration panel and to the Windows Firewall panel. And if you need to disable it, select OK.

Addendum

After more issues with Windows Firewall, I dedicated it a new post.

Debian KDE - A natural choice?

admin Sunday October 25, 2015
Testing migration summary 2015-10-22 wrote:

[...]
caribou 0.4.19-1 0.4.18.1-1
celery 3.1.18-3 3.1.18-2
cervisia 4:15.08.2-1 4:15.08.1-1
django-nose 1.4.2-1 1.4.1-1
dolphin 4:15.08.2-1 4:15.08.1-1
dolphin-plugins 4:15.08.2-1 4:15.08.1-1
dragon 4:15.08.2-1 4:15.08.1-1
dropbear 2015.68-1 2014.65-1
[...]


This semi-random Debian package list with several KDE elements suggests it tries to portray itself that way.

TP-Link TL-WR1043ND v1 on OpenWrt 15.05

admin Sunday October 4, 2015

I switched my TP-Link TL-WR1043ND v1 from TP-Link's firmware to OpenWrt 15.05 "Chaos Calmer" a couple of weeks ago. Besides errors when trying to connect from PPTP clients, there were no unfortunate surprises.

I was happy to see OpenWrt now includes a web interface (LuCI) enabled by default. It is not exactly the user-friendliest, but I found my way easily enough.

Although I did not do much with it, I found a few bugs, notably:

  • Broken realtime graphs
  • ddns-scripts sending unencrypted passwords without warning
  • SSH server (Dropbear) apparently only accessible from LAN, despite the configuration


The documentation is extensive, but its quality is poor. Installing while playing safe took me quite some time, though part of that was due to a bug in the previous firmware not accepting long filenames. Overall, I am not impressed, but I have no regrets. Coming from a bunch of volunteers, fair software.

I eventually realized that we have been experiencing "constant" intermittent wireless connectivity problems in 2 locations of the house. One of these is a decameter away from the router. The other is slightly farther, but on the same floor, and there is no exterior wall in between. At times, there was high packet loss and extreme latency. After discovering OpenWrt bug #12372, which possibly persists in OpenWrt 15.05, I suspected that our issue might have been a symptom of this bug, but the same problem persisted after going back to the manufacturer's firmware or to DD-WRT, so I ended up replacing the router with a TP-Link Archer C8.

Fully Free

Kune ni povos is seriously free, though not completely humor-free:

  • Free to read,
  • free to copy,
  • free to republish;
  • freely licensed.
  • Free from influence, advertisement-free (Original content on Kune ni povos is created independently. KNP is entirely funded by its freethinker-in-chief and author, and does not receive any more funding from any corporation, government or think tank, or any other entity, whether private or public.)
  • Calorie-free (but also recipe-free)
  • Disinformation-free, stupidity-free
  • Bias-free, opinion-free (OK, feel free to disagree on the latter.)
  • Powered by a free CMS...
  • ...running on a free OS...
  • ...hosted on a server shared by a great friend for free