2007-07-22 16:50:20

Recently in Brussels

Recently in Brussels:

Customer: Can you also help us with infrastructure and hardware problems?

IT specialist: Of course! What is it?

Customer: We need help upgrading to Vista and Office 2007.

IT specialist: Ooops...

Customer: Is there a problem?

IT specialist: Well, you currently run Windows XP and Office 2003. Honestly, that's everything you need. In fact, if you want to upgrade, I would only recommend that you have a look at OpenOffice.org version 2.2, since that supports the latest standards in document formats. So do you want me to help you migrate to that?

Customer: We cannot use OpenOffice.org or Office 2003 because we want to read and edit Office 2007 DOCX files.

IT specialist: Why? No one in this world uses this format, except Microsoft.

Customer: Every day, we receive documents in the DOCX format from the European Commission. We also participate in a lot of groups in Brussels, and they all send out information and forms as DOCX.

IT specialist: So tell them to convert it to the old DOC format or even the new standard ODF format. There are probably hundreds of other interest groups out there that receive these files and cannot read them, and don't want to spend thousands of euros on Office 2007.

Customer: No, everyone else in Brussels uses Office 2007 already, and they will not send the documents in a different format than DOCX. We already asked them and got that answer.

IT specialist: But why did they all upgrade to Office 2007 all of a sudden?!

Customer: The Commission's administrative staff got the licenses from Microsoft - for free.

 

And so Microsoft gained more market share in an important sector.


Posted by Tonnerre Lombard | Permanent link | File under: politics, standards, recently_in_brussels

2007-07-20 23:06:13

Evaluation of RSS feed readers

Today, I looked at a couple of console-based RSS readers. My criteria were:

  1. Of course, it would have to advertise new entries and not advertise old ones.
  2. Since my network at home is IPv6-only, it would have to be IPv6 capable.
  3. It has to work with curses or some familiar interface, when SSHing in from any type of system.

The first candidate is the most prominent one: snownews. Adding a URL of an IPv6-only site fails because the name does not get resolved. Opening the entry is not possible. However, with IPv4 records, everything works fine.

Conclusion: not IPv6 compatible.

The second candidate is Raggle. Here, the feed lists stay at (0/0) all the time, so apparently Raggle doesn't work either.

Conclusion: not IPv6 compatible.

The third candidate in the list was Olive. However, when adding a feed, it responds:
500 Can't connect to blog.pas-un-geek-en-tant-que-tel.ch:80 (connect: No route to host)

Conclusion: not IPv6 compatible.
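
What all three readers seem to be missing can be sketched in a few lines. This is a minimal illustration in Python, not code from any of the tested programs: the function name connect_any and the timeout value are made up for the example. The idea is to ask the resolver for every address family and try each result in turn, instead of assuming IPv4.

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Try every address getaddrinfo() returns, IPv6 and IPv4 alike.

    connect_any is a hypothetical helper name used only for this sketch.
    """
    last_error = None
    # AF_UNSPEC asks the resolver for AAAA and A records, not just A records.
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # the first address that answers wins
        except OSError as error:
            last_error = error
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses found for %r" % host)
```

On an IPv6-only network, a client written this way simply ends up with an AF_INET6 socket; on a dual-stack network it falls back to whatever address works.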

wnews and newsbeuter are as yet untested, but better results are not expected from them either...


Posted by Tonnerre Lombard | Permanent link | File under: general

2007-07-20 20:36:56

Evaluation of compression algorithms

The task was clear: we needed a compression program that was capable of compressing a large amount of data into a small piece. In particular, we had to compress an SQL dump that was growing rapidly every day and transmit it over a relatively slow line.

The original file:
959M edispo.sql

Running the algorithms

The first guess: bzip2. bzip2 has a fairly small memory footprint, but the compression operation took more than 8 minutes on a dual-core AMD Opteron. Decompression took 27 seconds. The resulting file:
35M edispo.sql.bz2

This was way too large. Within a couple of weeks, the file would cross the 100MB boundary. So we tried another candidate: rzip. rzip uses a vast amount of memory for its dictionaries, but after only 1 minute(!), the end result was quite impressive. The decompression took 33 seconds, slightly longer than with bzip2. The resulting file:
4.3M edispo.sql.rz

After this impressive result, we tried another competitor: lzma. With lzma, we had to wait a very long time again: 14.5 minutes. During all of that time, the memory of the machine was almost exhausted. Decompression, however, used almost no memory, and after 50 seconds, the file was decompressed. The resulting file:
3.7M edispo.sql.lzma

The rest of the compression algorithms were well above that number. As it turned out, though, rzip was not all that useful on smaller files. There is a combined algorithm called lrzip, which uses lzma as a function in the rzip algorithm and is said to have even better compression on large files while still being useful on smaller ones. Unfortunately, lrzip was not in pkgsrc.

Summary

Algorithm  File size  Compression    Decompression  Memory use
cp         959M       0 min 34 sec   0 min 34 sec   0%
bzip2      35M        8 min 42 sec   0 min 27 sec   2%
rzip       4.3M       1 min 3 sec    0 min 33 sec   20%
lzma       3.7M       14 min 32 sec  0 min 50 sec   90%
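
For those who want to repeat this kind of measurement, here is a rough sketch in Python. It is not the procedure used for the numbers above, which came from the command line tools. The sample data is synthetic (the table and column names are made up), rzip is left out because it has no module in the Python standard library, and Python's lzma module writes the newer .xz container by default rather than the .lzma format. Expect different figures on real data.

```python
import bz2
import lzma
import time


def make_sample_dump(rows=20000):
    """Build a synthetic, highly repetitive SQL dump (stand-in data only)."""
    line = "INSERT INTO bookings (id, room, status) VALUES (%d, 'A%03d', 'confirmed');\n"
    return "".join(line % (i, i % 250) for i in range(rows)).encode("ascii")


def benchmark(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the packed size."""
    start = time.perf_counter()
    packed = compress(data)
    packed_at = time.perf_counter()
    restored = decompress(packed)
    done = time.perf_counter()
    if restored != data:
        raise ValueError("%s did not round-trip the data" % name)
    return {
        "name": name,
        "size": len(packed),
        "compress_seconds": packed_at - start,
        "decompress_seconds": done - packed_at,
    }


def run_comparison(data):
    """Run the two algorithms that ship with the standard library."""
    return [
        benchmark("bzip2", bz2.compress, bz2.decompress, data),
        benchmark("lzma", lzma.compress, lzma.decompress, data),
    ]


if __name__ == "__main__":
    sample = make_sample_dump()
    print("original: %d bytes" % len(sample))
    for row in run_comparison(sample):
        print("%(name)-6s %(size)8d bytes  %(compress_seconds).2fs / %(decompress_seconds).2fs" % row)
```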

Conclusions

The favorite algorithm in this pack is certainly rzip. While lzma still features a slightly better compression ratio, it is very intensive in terms of time and memory in doing so. If the most important constraint is really space, one should most likely go for lzma. However, when it comes to normal-life tasks, rzip will most likely do the job just as well.

The only problem with the rzip algorithm is that it is impossible to pipe it. Thus, it cannot be used as an intermediate algorithm in data processing, nor as a link-layer compression for some protocol. However, for compression of large files, it is most likely the best algorithm you can get.
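
To illustrate what being pipeable means in practice, here is a small sketch using the bzip2 bindings from the Python standard library. It is only an illustration: the function name compress_stream is made up, and it says nothing about rzip's internals. The point is that a stream compressor can emit output while input is still arriving, without ever seeing the whole file.

```python
import bz2


def compress_stream(chunks):
    """Yield compressed bytes as input chunks arrive.

    compress_stream is a hypothetical helper for this sketch. The
    compressor never needs the complete input or a seekable file.
    """
    compressor = bz2.BZ2Compressor()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer and return nothing yet
            yield out
    yield compressor.flush()  # emit whatever is still buffered


if __name__ == "__main__":
    pieces = (b"INSERT INTO t VALUES (%d);\n" % i for i in range(100000))
    packed = b"".join(compress_stream(pieces))
    print("%d bytes after streaming compression" % len(packed))
```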

For those who wonder why copying the file actually takes longer than decompressing it, the answer is pretty simple: it is easier to load a compressed 3.7M file into memory and write the 959M output file once than to seek on the disk between the input and the output file. If the file is copied, cp fills its buffers with data from the input file and writes it to the output file. For this purpose, a pretty large buffer is needed, otherwise the hard drive has to seek between the two files all the time. This takes a lot of time.

As proof: copying the file with a 256M buffer takes only 24 seconds, while copying the file with a 64k buffer takes the full 33 seconds.
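
A copy with an adjustable buffer can be sketched like this in Python. This is not the tool used for the timings above, and the name timed_copy is made up for the example. The measured difference also depends heavily on the disk and on operating system caching, so treat it as a way to experiment rather than as a benchmark.

```python
import shutil
import time


def timed_copy(src_path, dst_path, buffer_size):
    """Copy a file in chunks of buffer_size bytes and return the elapsed seconds.

    timed_copy is a hypothetical helper for this sketch.
    """
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj reads buffer_size bytes, writes them, and repeats,
        # so a larger buffer means fewer switches between the two files.
        shutil.copyfileobj(src, dst, buffer_size)
    return time.perf_counter() - start
```

Calling it once with 64 * 1024 and once with 256 * 1024 * 1024 on the same large file gives the comparison described above.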


Posted by Tonnerre Lombard | Permanent link | File under: programming

2007-07-20 20:14:58

Bush removes 5th amendment

In an executive order, Bush has put the fifth amendment to the US constitution out of order.

While the fifth amendment states that no person shall be held to answer for a serious crime unless he or she has been indicted by a Grand Jury, it also states that exceptions may take place in times of war or public danger. Bush has now found that these times have arrived, and has issued an executive order to consider people criminals without prior conviction if they threaten the stabilization process in Iraq.

It would thereby of course be up to the executive forces to decide whether or not someone constitutes a threat to the stabilization process. Since this is clearly causing more injuries, and is clearly going to destabilize the situation in Iraq further, Bush has possibly incriminated himself with this order. It appears to be just another helpless attempt by a firm believer in law and order to get back control over the situation in Iraq, in order to look better on the day of deauguration. Bush does not know anything else to do other than reinforce the executive, so that is what he does. As the saying goes, if all you have is a hammer, everything starts to look like a nail.

Bush was recently described in the Austrian newspaper »Der Standard« as a man who has just jumped off the roof of a high building and, while still falling, asks the spectators not to judge his situation too early.


Posted by Tonnerre Lombard | Permanent link | File under: news, politics

2007-07-19 20:17:44

Teaching psi IPv6

Today, I attempted to find out why my favorite Jabber client doesn't work in my new IPv6-only environment at home. The answer was simple: psi attempted to find a route to the IPv4 address of the foreign host and found out that there was no such thing as an IPv4 route.

I quickly found a thread on a web forum where someone talked about the same problem. Apparently, psi calls gethostbyname() manually instead of using getaddrinfo(). The fix, which was committed in version 0.9.2 but not activated by default, was to pass the unresolved hostname to the Qt library rather than to do the resolution oneself.
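
The difference between the two resolver calls is easy to see from Python, whose socket module wraps both. This is only an illustration of the behavior and not psi's code, which is C++ on top of Qt; the helper names are made up, and port 5222 is simply the usual Jabber client port.

```python
import socket


def ipv4_only_lookup(host):
    """Old-style lookup: gethostbyname() can only ever return an IPv4 address."""
    return socket.gethostbyname(host)


def any_family_lookup(host, port):
    """getaddrinfo() returns (family, address) pairs for IPv4 and IPv6 alike."""
    results = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return [(family, sockaddr[0]) for family, _type, _proto, _name, sockaddr in results]


if __name__ == "__main__":
    # "::1" stands in for a host that only has an IPv6 address.
    print(any_family_lookup("::1", 5222))
    try:
        print(ipv4_only_lookup("::1"))
    except OSError as error:
        print("gethostbyname failed:", error)
```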

The rest of the code doesn't have any problems with IPv6 compatibility. Psi doesn't use sockets directly at all; it uses the QSocket interface of Qt. This interface has supported IPv6 «automagically» since Qt 3.3, so once psi's own DNS resolution is disabled, psi does IPv6 flawlessly.

One has to wonder why this flag was not active by default, since it doesn't do any harm. But it's nice to see that the fix is so easy this time.


Posted by Tonnerre Lombard | Permanent link | File under: programming