Loading...
 

No Food for Thought

Food is something you should provide to your brain long before coming to this blog. You will find no food recipes here, only raw, serious, non-fake news for mature minds.

Civilization: Beyond Earth on Debian GNU/Linux? Good luck

admin Saturday December 26, 2015

Ever since I moved to GNU/Linux, the video game I missed the most was Sid Meier's Civilization. The only version ported to GNU/Linux was Sid Meier's Alpha Centauri, probably my favorite version. But that port seemed to be an afterthought. One needed to look for the special installer, which was buggy.

With the release of CivBE, I was under the impression that Firaxis was finally truly making GNU/Linux a supported platform for Civilization. The GNU/Linux version was released less than 2 months after the Microsoft Windows version. Mac/Linux was even the fourth item in the game's official FAQ. For the first time in many years, I put a video game on my wish list. To my surprise, my mother offered it to me this week (I suppose she did not realize it was the same series I spent so many hundreds of hours playing over nearly 2 decades razz).

I was also happy to see the game's box didn't have the huge Games for Windows banner anymore. Unfortunately, system requirements claimed Windows was necessary. But I thought that was just randomly written system requirements, as usual (how credible are requirements asking for "Windows Vista SP2/ Windows 7" for a Q4 2014 game anyway?). I was less impressed when I inserted the DVD and realized there was absolutely no material for GNU/Linux, nor any documentation explaining where to go. And now, I cannot even find instructions on the Internet. The FAQ item mentioned above still discusses a Linux version as something future (although Wikipedia says it was released 2014-12-18). And I cannot even find installations instructions when searching on Google.

Is Civilization: Beyond Earth beyond Windows? I am far from being convinced at this point.

Hopefully, at least the game will be stable - without serious bugs as those which I experienced playing the original versions of Civilization III and V (let alone serious networking issues with Civilization IV).

Memory usage of Apache's PHP children processes

admin Monday December 14, 2015

I ran a PHP benchmark for which I allowed PHP to take as much memory as it wanted. The benchmark worked, but I then realized Apache was using 2 GB of RAM. The parent process was fine, but it turned out the apache2 child process which had run the benchmark was still using 2 GB (RES).

I thought that was abnormal, but I verified on ##php and eventually had confirmation from several people that - to my great surprise - this is not a memory leak. This behavior is expected. And indeed, I can re-run the same benchmark and it will never run out of memory if it succeeded to reserve enough memory the first time. I am not a sysadmin, but that was still quite a shock. I was told PHP has its own memory manager, and only releases memory if the Apache child is restarted. In reality though, other processes (including Apache children) will manage to "steal" memory reserved by idle children. This is surely the part I find most amazing. I am curious to learn how Linux manages that.

So, the memory Apache grants to PHP children will sometimes only be released when these children processes are restarted, but other processes will manage to reclaim that memory if needed. At the very least in our configuration (Debian 8's PHP 5.6.14 on Apache 2.4.10 with prefork MPM).

One important word above is "sometimes". For some reason, children sometimes immediately release their memory. I initially thought it took 2 executions for memory to stick, but a second execution does not always lock. Which is why I would welcome pointers to discussion of this behavior. It seems memory will not be freed if 2 requests come with little idle time in between (seconds).

The following shows well enough an Apache restart freeing 2 GB of RAM:

root@Daphnis:/var/log/apache2# free -h; grep Mem /proc/meminfo; service apache2 restart; free -h; grep Mem /proc/meminfo total used free shared buffers cached Mem: 3,0G 2,4G 660M 9,5M 688K 75M -/+ buffers/cache: 2,3G 736M Swap: 713M 276M 437M MemTotal: 3173424 kB MemFree: 675824 kB MemAvailable: 634736 kB total used free shared buffers cached Mem: 3,0G 216M 2,8G 9,4M 756K 88M -/+ buffers/cache: 126M 2,9G Swap: 713M 270M 443M MemTotal: 3173424 kB MemFree: 2951552 kB MemAvailable: 2917400 kB

Transition to the SI - A matter of numerous Ms-s

admin Saturday December 12, 2015
##php wrote:

(19:32:13) chealer: so if I consider that PHP's 0 ds should be 1 ds, then that proves my understanding that it's not the DB which adds that extra second.
(19:33:33) Literphor: chealer: What is ds? A decisecond?
(19:33:41) chealer: Literphor: yeah. it's all on a 1 Gb/s LAN, but that probably explain the 3 ds difference.
(19:33:45) chealer: Literphor: right
(19:34:03) Literphor: chealer: Heh you’re the first person I’ve ever seen use those units
(19:34:40) chealer: Literphor: Heh, you're not the first person telling me I'm the first person they see use those units.

dig(1) and other DNS clients sometimes taking 5 seconds to return the results of a local query

admin Friday November 27, 2015

After installing a few Debian VMs inside our Windows environment, I noticed very strange performance problems resolving local domain names on local DNS servers this week. Simple queries which should have taken milliseconds would sometimes be very slow. And these slow queries would constantly take 50 deciseconds to resolve - never 49 or less. It looked like a timeout, but logs had no such mentions, and it was hard to tell when the timeouts would occur, except that they would occur more on a first test after I stopped testing for a few minutes. For example, a trivial connection to a local MySQL server could take just above 50 ds to establish:

$ time echo 'SELECT 1;'|mysql -u [...] --password=[...] -h PC-0002
1
1

real 0m5.014s
user 0m0.000s
sys 0m0.004s
pcloutier@deimos:/var/lib/dpkg/info$


This was far from MySQL-specific. dig(1) would suffer from the same delays:

$ time dig @phobos.lan.gvq titan.lan.gvq

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> @phobos.lan.gvq titan.lan.gvq
; (1 server found)

; global options
+cmd

;; Got answer:

; ->>HEADER<<- opcode
QUERY, status: NOERROR, id: 15593
; flags
qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1


;; OPT PSEUDOSECTION:

EDNS
version: 0, flags:; udp: 1280

;; QUESTION SECTION:
;titan.lan.gvq. IN A

;; ANSWER SECTION:
titan.lan.gvq. 3600 IN A 10.10.1.29

; Query time
0 msec
; SERVER
10.10.1.23#53(10.10.1.23)
; WHEN
Fri Nov 27 12:14:42 EST 2015
; MSG SIZE rcvd
58



real 0m5.018s
user 0m0.012s
sys 0m0.004s
pcloutier@deimos:/var/lib/dpkg/info$

...where phobos.lan.gvq is a local DNS server, and titan is just a local hostname which is supposed to resolve very quickly. Attentive readers will notice that Query time indicates 0 ms. This is because the DNS query proper does take 0 ms. The delay comes from the resolution of the name server itself, which I specified by name. This cannot be reproduced with dig if the name server is specified by IP.

This turned out to be an IPv6-related glibc issue. The first big advance came from a Stack Exchange thread, which allowed me to confirm that the delay was due to a timeout in glibc's getaddrinfo(3). This can be achieved with high certitude by changing that delay using the resolv.conf timeout option. glibc's default timeout is 5 seconds. For example, if you notice that the delay decreases to 3 s after setting "options timeout:3", then you are clearly experiencing timeouts. If not, sorry, this post will not help you.

The next step was to determine whether that timeout was IPv6-related. This can be achieved by disabling IPv6 on the GNU clients, but it may be simpler to just set options single-request and single-request-reopen. If none of these helped, you know your problem is caused by timeouts, but the cause is different than ours, and the rest of this post will not help.

If disabling IPv6 helped but single-request and single-request-reopen do not, sorry, I do not know more about your issue. But if single-request or single-request-reopen helped, your problem must be similar to ours. Due to a glibc 2.9 change (see section "DNS NSS improvement"), getaddrinfo() often causes a communication issue between itself and the DNS server when querying either IPv4 or IPv6 addresses due to what Ulrich Drepper describes as server-side breakage. Since at least glibc 2.10, if glibc detects that glitch may have happened, it workarounds by re-sending the queries serially rather than in parallel, so the problem "merely" causes a timeout. If there is a firewall between your DNS server and you, see the Stack Exchange thread above. If a firewall issue is excluded and your DNS server is running Windows Server, you are probably experiencing the same incompatibility as ours.

I first thought our Windows Server 2008 [R1] servers were causing this because of an old bug, but according to a 2014 blog post, this still happens with Windows Server 2012 R2. Although the tcpdump shown on the Stack Exchange thread describes pretty well what is going on, I had to perform my own captures to understand why the timeout would only happen sometimes, and succeeded quickly enough. When the problem does not happen, getaddrinfo() queries both A and AAAA (IPv6) records in parallel in packets 7 and 8 and receives both replies in packets 9 and 10:

Capture 1 - no problem
Capture 1 - no problem

Packets 11 and 12 show the DNS query proper, since this capture shows the full activity for the dig command explained above.

When the problem happens, what was packet 9 in capture 1 is gone. Which is why getaddrinfo() retries 5 seconds later (after the gap between packet 26 and 30), in packets 30 and 32, but now sequentially:

Capture 2 - serial retry after 5 seconds timeout
Capture 2 - serial retry after 5 seconds timeout


Why does the problem happen in capture 2? Surely because of that extra color... the beige ARP packets at 24 and 25. In other words, in the first call, the DNS client's IP address is in the DNS server's ARP cache, so the server does not need to resolve the client IP address. In the second case, the DNS clients's ARP cache in the DNS server has expired, so the server needs to perform an ARP query before being able to send what would be packets 9 and 10 in the first case (I would have thought the server could figure out the ARP address from packets 22 and 23, but apparently that is not how that Windows works).

As explained in Microsoft's ARP caching behavior documentation, in recent Windows versions, an ARP cache record is [usually] maintained for a random time between 30 and 90 seconds after the last time it was used. This must be why that bug was pretty hard to track. Therefore, if the server and the client communicate at least each 30 seconds, this timeout should only be experienced once. This means that in the case of Windows Server DNS servers, the behavior would be the same if glibc didn't fallback to serial queries after the timeout.

Causes and solutions

I have not found a server-side workaround (besides, I guess, disabling IPv6). Unfortunately, I believe this needs to be worked around on every GNU client.

It is more interesting to try determining the root cause of this issue and definitive solutions. glibc developers consider it a Windows bug. But would Microsoft leave a bug which must be triggered millions of times per day unfixed for years?

Windows Server

The captures clearly show that glibc starts with the IPv4 query. Which means the Windows server can only send the AAAA reply after it can send the A reply. In general, that must mean it replies to both. But when the server has to wait for an ARP reply before sending its DNS reply, it may have received the AAAA request before it is able to send the A reply. I would need to perform a server-side capture to confirm that, but it could be that Windows detects that situation and decides to send a single reply to save bandwidth and/or favor IPv6 usage. If the goal was simply to favor IPv6, it would probably be better to just send the AAAA reply before the A reply.

Windows may be doing a heuristic optimization by guessing that the client just needs one address, which would certainly be wrong sometimes. This could be considered a bug in so far as failure to reply constitutes a bug.

DNS clients and the protocol

But there is certainly a client-side issue as well at least in this case. The client requests both an IPv4 address and an IPv6 address while it only needs one. Unless this is a strategy to minimize further queries, this is inefficient.

According to this Stack Overflow thread, it is not clear that requesting both A and AAAA records in a single DNS query is possible. And even that would not be the most optimal solution — that is, requesting whatever single IP address should be used.

From getaddrinfo()'s perspective, it cannot be optimized, since the caller has requested any address to be returned. So the problem is really in dig and other DNS clients calling getaddrinfo() just to resolve a hostname. These clients are all suboptimal. gethostbyname() is optimal, but obsolete since it is not compatible with IPv6. There should be a resolving function which either returns the first IP address obtained, or returns both without blocking while waiting for the second. Clearly, each program cannot implement such a function itself. I do not know glibc, but a C library's API should allow such a resolution. If it doesn't, glibc has an issue too.

HTML/CSS - Centering

admin Wednesday November 11, 2015

Centering in CSS is not easy. But each time I must vertically center, I must search the web to convince myself that I have no choice other than using a hack. So I found it comforting to see this admission, coming from the W3C itself:

At this time (2014), a good way to center blocks vertically without using absolute positioning (which may cause overlapping text) is still under discussion.

Windows Firewall dangers - Is your Windows [8] PC's networking broken after you joined a domain?

admin Friday November 6, 2015

I hate firewalls. One of the first things I do on any personal Windows I install since Vista is to disable Windows Firewall. Usually, that's all it takes... plus disabling the maintenance center's firewall monitoring so it stops harassing you about the firewall, of course.

So when I noticed my PC's Apache was no longer reachable from other machines and that it would no longer ping, Windows Firewall did not come to my mind as an obvious suspect. Only after I realized that the problem started shortly after I joined the install to the entreprise domain did I start suspecting that some GPO was now forcing the firewall. Of course, I then went to check the firewall's status, using the maintenance center. In order to check its status, I clicked Turn on messages about network Firewall. The maintenance centre then displayed:
Windows Firewall
In English: "The Windows Firewall is disabled or configured incorrectly."
I was quite sure the firewall wasn't configured incorrectly, since the only configuration I had done was to disable it, so I assumed the firewall was disabled and proceeded to waste at least 10 minutes in further troubleshooting before finally realizing that the damn firewall was actually enabled... despite the button offering me to "Enable now".

In the end, this had nothing to do with Group Policy. The problem is you can't even directly turn off the firewall completely; you have to disable for every network type: private, public and - when you're on a domain - domain networks, which wasn't done on my install. So I clicked Disable Windows Firewall, closed the window, and proceeded to verify that the network was working again - which, of course, was not the case. After trying to reset the network card without success, I went back to the panel to notice that my change hadn't taken effect. Great, so for that specific panel, your changes are discarded without warning unless you select OK.

Conclusion

If your Windows machine's networking stopped working after joining a domain and won't even send ICMP replies, do verify Windows Firewall, and do so by going to the configuration panel and to the Windows Firewall panel. And if you need to disable it, select OK.

Addendum

After more issues with Windows Firewall, I dedicated it a new post.

Debian KDE - A natural choice?

admin Sunday October 25, 2015
Testing migration summary 2015-10-22 wrote:

[...]
caribou 0.4.19-1 0.4.18.1-1
celery 3.1.18-3 3.1.18-2
cervisia 4:15.08.2-1 4:15.08.1-1
django-nose 1.4.2-1 1.4.1-1
dolphin 4:15.08.2-1 4:15.08.1-1
dolphin-plugins 4:15.08.2-1 4:15.08.1-1
dragon 4:15.08.2-1 4:15.08.1-1
dropbear 2015.68-1 2014.65-1
[...]


This semi-random Debian package list with several KDE elements suggests it tries to portray itself that way.

TP-Link TL-WR1043ND v1 on OpenWrt 15.05

admin Sunday October 4, 2015

I switched my TP-Link TL-WR1043ND v1 from TP-Link's firmware to OpenWrt 15.05 "Chaos Calmer" a couple of weeks ago. Besides errors when trying to connect from PPTP clients, there were no unfortunate surprises.

I was happy to see OpenWrt now includes a web interface (LuCI) enabled by default. It is not exactly the user-friendliest, but I found my way easily enough.

Although I did not do much with it, I found a few bugs, notably:

  • Broken realtime graphs
  • ddns-scripts sending unencrypted passwords without warning
  • SSH server (Dropbear) apparently only accessible from LAN, despite the configuration


The documentation is extensive, but its quality is poor. Installing while playing safe took me quite some time, though part of that was due to a bug in the previous firmware not accepting long filenames. Overall, I am not impressed, but I have no regrets. Coming from a bunch of volunteers, fair software.

I eventually realized that we have been experiencing "constant" intermittent wireless connectivity problems in 2 locations of the house. One of these is a decameter away from the router. The other is slightly more, but at the same floor and there is no exterior wall in between. At times, there was high packet loss and extreme latency. After discovering OpenWrt bug #12372, which possibly persists in OpenWrt 15.05, I suspected that our issue might have been a symptom of this bug, but the same problem persisted after going back to the manufacturer's firmware or to DD-WRT, so I ended up replacing with a TP-Link Archer C8.

Error 619 when trying to connect a NAT-ed client to a PPTP server - watch your router

admin Saturday October 3, 2015

Today, I realized my PPTP connections from Windows 7 and 8 machines were no longer able to connect to the PoPToP server I setup at the office. Strangely, nothing had changed on the server (still Debian 7 running PoPToP 1.3.4), on the server-side router, and my clients obviously had not both been changed enough to explain the breakage. The Windows error messages were not too precise. Server logs were also unhelpful, apparently pointing to a bug in PoPToP, which timed out after 30 seconds:

daemon.log
Oct 3 16:47:25 deimos pptpd[24176]: CTRL: Client 173.178.241.108 control connection started Oct 3 16:47:25 deimos pptpd[24176]: CTRL: Starting call (launching pppd, opening GRE) Oct 3 16:47:25 deimos pptpd[24176]: GRE: Bad checksum from pppd. Oct 3 16:47:55 deimos pptpd[24176]: GRE: read(fd=6,buffer=804f620,len=8196) from PTY failed: status = -1 error = Input/output error, usually caused by unexpected termination of pppd, check option syntax and pppd logs Oct 3 16:47:55 deimos pptpd[24176]: CTRL: PTY read or GRE write failed (pty,gre)=(6,7) Oct 3 16:47:55 deimos pptpd[24176]: CTRL: Reaping child PPP[24177] Oct 3 16:47:55 deimos pptpd[24176]: CTRL: Client 173.178.241.108 control connection finished


I finally realized the change to blame was me switching my TP-Link router from TP-Link's firmware to OpenWrt (15.05). I do not understand much of how PPTP works, but it's quite complicated. Apparently, it uses non-standard GRE packets. Therefore, I am not sure if this is a PPTP bug or an OpenWrt bug, but for me the solution was most simple.

As explained in this description of error 619, there are several possible causes, but even if the client is clearly reaching the server, the issue can be client-side. If there is no firewall on the client OS, you should verify any client-side router, which I did by plugging one of the affected PC-s directly on the modem. The VPN could connect again, which confirmed the router was to blame.

OpenWrt does not have a "PPTP VPN passthrough" option to check, but a package to install (which is not installed by default in Chaos Calmer). Following the instructions on OpenWrt's PPTP NAT Traversal document (installing kmod-nf-nathelper-extra), I managed to get the clients to connect while NAT-ed behind OpenWrt.

WampServer? Wait!

admin Wednesday July 1, 2015

Short version

Do not use WampServer.

Long version

The first time I installed an AMP on Windows, I chose EasyPHP. When I came back to PHP development on Windows, around 2013, I chose WampServer. I knew about XAMPP and EasyPHP, but for some reason (probably the precise software offered), I went with WampServer. That decision would cost me several hours...

The good

PHP is already buggy, but I suppose I didn't realize the distribution you chose could make it much more buggy. I sticked with WampServer, initially in version 2.4, because it contained Xdebug, was simple to install and worked immediately. I must admit the UI looks good - the design is good, and it lets you choose your PHP and MySQL versions.

The bad

Things changed when I started configuring PHP. Inevitably, the first problem you'll notice is that WampServer ignores your changes to php.ini. It turns out that is because WampServer installs (at least) 2 different php.ini files, one in php/, one in apache/. The file used by Apache is the latter... but of course, the one linked by WampServer's UI is the one Apache ignores.

That must be when I had to familiarize myself with the wamp directory's hierarchy. Duplication can be deplored, but the worst part is naming. Indeed, both of WampServer's php.ini files are found in... wamp\bin\. Well, I suppose whoever decided that scheme considered that all files were binary, indeed.

Eventually, some development required me to enable 2 PHP extensions disabled by default - intl and ldap. In both cases, I must have wasted an hour until I figured out that icu and sasl DLL-s needed to be copied from php\ to apache\ for the extensions to work with Apache.

The really bad

After hitting some bugs possibly in PHP, I tried upgrading it to 5.5, which required me to upgrade from WampServer 2.4 to 2.5. Unfortunately, WampServer does not offer upgrades. What do you do with such Windows programs? Of course, you download the new version and run the installer. I did that and everything went fine, except servers would no longer start, throwing cryptic errors. That's when I headed to the net and found a post on WampServer's forum explaining that upgrading is not supported. Good thing I had the presence of mind to backup wamp\ first!

It turns out the way to upgrade is indeed to uninstall, and only then to install the new version. You must think I had surely ignored a README which warned against such a crazy idea. But... there is no README. The "official" upgrade instructions are in a forum post!

The only good thing about this part is the honesty of the instructions' writer, who warns that upgrading will take 2 hours, up to 4 for inexperienced users. And indeed, if you had enabled a few extensions, had a few virtual hosts and some customization, it will take hours, even if you've been using PHP, Apache and MySQL for a decade.

The ugly

One of my projects used extensions which are disabled by default. Even after enabling them, they wouldn't show in phpinfo. PHP would merely log a useful error each startup:

PHP Warning: PHP Startup: in Unknown on line 0


After an hour of searches, I eventually applied the same workaround I had used (and since forgotten) when enabling them for the first time, copying the missing DLL-s to apache\.

The workaround

After so much frustration, I realized that the huge release notes for 2.5 (again in the form of a forum post) mentioned a simple fix, not only for extensions not loading, but for the tray icon shortcut pointing to the wrong php.ini. That workaround is to click on Apache->Version->2.4.9 via the tray icon. Yes, clicking on that will create symlinks (yes, symlinks on Windows!) from apache\ to php\ for the missing DLL-s, and make the php.ini in apache\ a symlink to the php.ini in php\. Yes, that is the fix, even if that version is already checked. But do not do that without first backing up your php.ini - it will be silently overwritten.

Still think you might give WampServer a go? Here's one last reason why you shouldn't. Playing with the version made me realize one more issue - the links to get different Apache, PHP and MySQL versions are all broken.

The non-lesson

After wasting all of these hours, I figured I would at least go contribute something to the issue tracker. But WampServer has no issue tracker. Not even a forum about issues. Are WampServer developers aware of these issues, and since when? Who knows.

So the only lesson here is already known - avoid software without an issue tracker. OK really...
Even if the website is fancy,
Even if the project seems active,
Even if you share your native language with the developers...
Do not use software without an issue tracker if you don't have some kind of contract with its developers.
And please, developers, avoid versioning your release 2.x until your software is reasonably functional and has an issue tracker.

Unfortunately, this is fundamentally a rant, not a review (and the list of issues is not complete). I will only dump WampServer when the next time to upgrade comes, and I won't recommend an alternative I have not tested until then. Meanwhile, I found a fairly complete review of WAMP stacks, but it's from 2012. If you're starting PHP development, I'm sure you'll find a better WAMP stack than WampServer 2.5. You may want to read about my unsatisfactory experience with XAMPP or about my similarly unsatisfactory experience with EasyPHP.

Fully Free

Kune ni povos is seriously freethough not completely humor-free:

  • Free to read,
  • free to copy,
  • free to republish;
  • freely licensed.
  • Free from influenceOriginal content on Kune ni povos is created independently. KNP is entirely funded by its freethinker-in-chief and author, and does not receive any more funding from any corporation, government or think tank, or any other entity, whether private or public., advertisement-free
  • Calorie-free*But also recipe-free
  • Disinformation-free, stupidity-free
  • Bias-free, opinion-free*OK, feel free to disagree on the latter.
  • Powered by a free CMS...
  • ...running on a free OS...
  • ...hosted on a server sharedby a great friend for free