Quoting in Documentation

I just ran across a small niggle in the DateTime::Format::Strptime docs:

%N Nanoseconds. For other sub-second values use %[number]N.

I needed a 3-decimal sub-second value, so I went off and dutifully wrote:

Note my simple-stupid copy of the square brackets. This creates dates like:


Which is not at all what I had in mind. Then I smacked my forehead, remembering printf() conventions, and wrote:

No square brackets, and everything works fine.

This issue of direct quoting conventions reminds me of an entry on writing style from the Jargon File:

Consider, for example, a sentence in a vi tutorial that looks like this:

Then delete a line from the file by typing “dd”.

Standard usage would make this

Then delete a line from the file by typing “dd.”

but that would be very bad — because the reader would be prone to type the string d-d-dot, and it happens that in vi(1), dot repeats the last command accepted. The net result would be to delete two lines!

This isn’t quite a “standard usage” issue. It’s an American usage issue. British style for quotes sensibly puts the period on the outside, in agreement with the Jargon File above.

The problem with the Strptime method above is that the convention is ad-hoc, hoping that the reader will catch on to the author’s intent. This tends to cause problems for newer programmers, and even more experienced programmers may take a few iterations to catch on. In some cases, it could also confuse programmers with a different social background.

One solution is to lay out a style guideline at the beginning of the documentation. Programming books will often do this. It’s also similar to the way RFC2119 lays out the definitions of “MUST”, “SHOULD”, etc. in a way that’s intended to be used by other RFCs.

The problem here is that you’re expecting your readers to know that much more before getting into the nuts and bolts of your library. Language and protocol definitions need to be specified very precisely, but library documentation doesn’t necessarily require that–clarity of understanding and succinctness can be better than rigor. Plus, there’s bound to be competing systems of style guidelines, and readers will have to know all of them, or at least the more popular ones.

Instead, it’s enough to give a quick example. Consider if the Strptime docs had said something like:

%N Nanoseconds. For other sub-second values use %[number]N. Example: ‘%T.%3N’

And now there is no question on the square brackets being included in the string or not. Everything the reader needs to know is encapsulated in a single location of the docs–no referencing back to other sections or documents.

Don’t use is() in Test::More

This results in:

The issue is fairly obvious when it’s laid out like this: is() does a string comparison, which means that trailing zero matters. One day, though, I promise you, you’ll hit a problem like this and will spend hours debugging it, only to smack your forehead when you finally see it. You can prevent that by explicitly giving a comparison operator to cmp_ok() instead.

This is somewhat similar to the issues seen in the ‘==’ operator in JavaScript and PHP, where the language makes a guess about how to coerce the types and gets it wrong (or at least, produces unexpected behavior). Those languages invented ‘===’ to solve that, which is madness.

Musings on Big-Oh and the Schwartzian Transform

Not a new topic, but one I’ve been mulling over recently.

Let’s start with a basic sort operation in Perl over a series of DateTime objects:

Perl 5.6 and earlier used quicksort, and after that used mergesort. There’s also a sort pragma if you want to specify something else.

Now, quicksort has an average runtime of O(n * log(n) ), but there’s a few terrible cases where it goes quadratic–O(n2). Quadratic is bad; the only good thing about quadratic is that it’s not polynomial (that is, O(2n), where the exponent is variable rather than the base, which makes us start using phrases like “assume we have until the end of the Universe to finish this run . . . “). The one nice thing is that quicksort has a few friendly cases where it’s O(n). With mergesort, almost everything is O(n * log(n) )–that is, giving up a few nice cases in exchange for never going quadratic.

What really is all this Big-Oh stuff? One way to think about it is that it’s the number of times you loop over something. It doesn’t tell you how long each iteration of the loop will take, just on how many times it will iterate. If you’re walking a list, that’s O(n), where n is the number of elements of a list. If you walk it twice, that’s O(2n). For an algorithm that runs in O(n * log(n)), things will take longer than O(n) when n > 10 (because log(10) == 1).

(This is why a hash lookup is sometimes slower than a balanced binary tree. Hash lookups are constant time, O(1) (well, it is if we squint a little), and balanced binary trees are O(log(n)) worst case. But that constant time in hashes is very big, and the tree’s loop is very tight. It doesn’t take many entries for hashes to catch up, though.)

So when we look at the sort above, we notice that we’ll want to keep expensive operations out of the subroutine we’re passing to sort. That subroutine will be called O(n * log(n)) times (roughly speaking), so let’s try to hoist expensive operations out of it.

This is exactly what the Schwartzian Transform does. Let’s start by having a map() execute before the sort to pull out some easily sortable data from our DateTime objects. The Unix epoch will do nicely for this (use hires epoch if you need to):

Now we can modify our sort algorithm to pull out that first entry in the arrayref and do a straight numeric comparison:

Perl’s spaceship (<=>) operator does a numeric sort, which will be faster than the lexicographical sorting of the cmp operator (which gets even more complicated if you’re Tom Christiansen). If you can sort based on numbers rather than strings, that’s always better.

Now we want to get at our original object rather than the arrayrefs we created, so we do another map() to pull that out:

Now let’s combine all that together in a single statement to get the Schwartzian Transform as it’s usually seen:

This ends up being O(2n) for the two maps, plus O(n * log(n)) for the sort. But we’re making that O(n * log(n)) cheaper per iteration, so it’s a win overall if @dates is big enough.

Can we take this further? If we can collapse that arrayref into a straight string or number, then we can remove the subroutine passed to sort entirely. This means perl will do a lexicographic sort completely within its internals without the extra time taken to call a Perl subroutine (which more than makes up for the extra time of a lexicographic sort rather than straight numeric). This isn’t always easy, but in this case we could do it by throwing away our DateTime objects entirely in the first map() (first by order of operations, that is), and then rebuilding it with the epoch data passed to the second map():

This form is called the “Guttman-Rosler Transform”. It may or may not be faster than an equivalent Schwartzian, depending on how expensive destructing/reconstructing the object is. That would also do all kinds of fucky things to runtimes with threaded garbage collectors, but it’s not too bad for a refcounting language like Perl5. If you’re serializing things into strings, care has to be taken that you’ll get the output objects deserialized the same way on the flip side. Big-Oh is the same as the Schwartzian.

One thing that I think is overlooked about the Schwartzian is improving separation of concerns. The sort routine is not responsible for formatting data in a way that’s sortable, and there’s no need to copy/paste the same code for $a and $b. That’s the sole job of the first map(). Once you recognize the idiom, your eyes can easily fall to the place where the data munging is happening. This makes the Schwartzian a nice idiom even when its performance improvement is marginal.

Welcome, EEWeb.com Readers

The Wumpus Cave is EEWeb.com‘s site of the day! Here’s a few links to old entries that y’all might be interested in:

Happy reading!

Improving Perl’s GPS Support

A little while ago, I wrote an article for PerlTricks on GPS and Perl, primarily using the GPS::NMEA module. Accessing GPS data was one part of a larger project I’m working on (recording telemetry data for car racing). Unfortunately, I also discovered some limitations in the GPS::NMEA module.

Firstly, as we discovered after the article was published, the coordinates returned are not decimals that you can just stick into Google Maps and get the right location. I had thought the inaccuracy was due to taking readings indoors (it was only off by a few miles), but it turned out to be something else. The decimal portion is actually the number of arcminutes rather than an actual decimal. This isn’t documented anywhere in the module. The correction is easy, but I feel like the module ought to be doing it for me, or at least document what’s coming out.

Secondly, the module gathers data like this:

This comes out of GPS::NMEA->get_position(). This reads the data off the serial port line-by-line until it sees one starting with “GPRMC”, which indicates the line has the position data. There are other commands being sent, such as altitude and velocity, but those are thrown away until it hits the next position data line. This means that if you want to read both position and velocity (as I do in my project), you’ll be waiting much longer than you should for each of those.

This would benefit from a callback interface to get the data as it comes in a loop.

Lastly, there are license issues. It’s under the classic “same terms as Perl itself”, which is problematic. What version of Perl? Earlier ones were GPL-only, and newer ones add the option for the Artistic license.

There are issues with the Artistic license itself, too, especially the first version (which Perl5 still uses). The FSF has criticized it as “too vague; some passages are too clever for their own good, and their meaning is not clear. We urge you to avoid using it, except as part of the disjunctive license of Perl.”

Also, you can’t guarantee what terms Perl will have in the future. While it’s unlikely to happen, the US government could argue that since Larry wrote the early versions under their watch, they could swallow up the whole thing. A more likely scenario (given the problems with the Artistic license above ) is that the Perl devs could change the license at any time. They wouldn’t necessarily change it to something you like–remember the flamewars about the GPLv3 in the larger FOSS community?

I hate to rewrite GPS::NMEA, because some of the comments and code indicates that there was a lot of time spent getting it to work with the quirks of different GPS modules and OSen. For Perl code that originated in the late-90s, it’s very well written. If it were just a matter of the first two technical issues, we could probably work out a better interface on top of the existing code. I’m uncomfortable working with the license issues, though, so a rewrite may be the only way to go.

Top Gear is dead . . .

. . . and I’m OK with that. I was rewatching shows from series 7 and 8, and those were the types of shows I wanted to watch. At some point, it became less about cars and more about cocking about in vaguely car-related ways. It was still fun, and I still watched it, but I felt like it was missing something.

The BBC may try to revive it with new presenters. Problem is, the current three have perfect chemistry together, which was a 1 in a million chance of working out (which is the fundamental reason why Top Gear Australia and US were never going to work). They might try to change it back into a straight car show like old Top Gear, but the brand probably can’t go back to that.

The current presenters may try to revive the concept under a new organization (Netflix? who knows). Fans underestimate the complications involved in that. They may not have access to the track, studio, production staff, or connections to the car industry that they did before. There’s more to putting on a show than filming a couple of old blokes with cars.

Meanwhile, some of the actions of the fanbase have been horrid. Jezza punches a guy for not providing hot meals, and they blame the BBC for suspending and firing Jezza. Plus, the victim started to get death threats, which is all kinds of fucked up.

Even the fanbase that doesn’t go to the deep end of the shit pool still dips their toes in. “Je Suis Clarkson”? Charlie Hebdo is a satirical newspaper that got shot up by religious extremists. Clarkson punched a guy in the face. Let’s have some perspective here, huh?

I still really like Regular Car Reviews. All the review sites were trying to ape the Top Gear style for years. Regular Car Reviews is the first in a while to try something different. Their 1995 Miata review was an instant classic (just like the car itself).

How did Linux become a second-class citizen on Arduino?

Using Linux with Arduino reminds me a lot of using Oracle with Perl; it works, but there’s a lot of things that aren’t up to the same level as other combinations. For Oracle and Perl, it’s somewhat understandable, as Perl comes from a FOSS background and Oracle, well…doesn’t.

That’s not true of Arduino however. Mac OSX seems to have better support than Linux, and Windows is almost seamless. I find it all very odd, because when it comes to most FOSS stuff, Linux support is easy, Mac OSX tends to work with minimal effort, and you’re a bit surprised when things work right on Windows. Somehow, Arduino has gone the other way round.

Just some examples I’ve come across:

  • I haven’t tried the latest versions of the IDE, but burning the Arduino bootloader to a fresh ATmega from the Arduino IDE doesn’t work. The reason is that the Makefile needs to know which of the USB ports to use, but it doesn’t even use the port that you selected in the GUI. The port that it does use is hardcoded to a Windows naming convention.
  • The Arudino IDE serial port monitor always crashes on me.
  • I’ve been playing with UnoJoy lately, which turns an Arduino Uno into a USB joystick. Which is great, but look at the instructions and download pages. There’s all the help you need on Windows and Mac OSX. There’s a wiki page for help on Linux, and it starts with “I really don’t know much about Linux at all . . . “
  • A while back, I was playing with HCDlib for controlling small toy quadcopters. When I tried to compile it, there was a compiler error about a missing arduino.h file. When I raised an issue with the author, he said my environment was screwed up. Once I dug a little deeper, the problem was that the proper name of the file is Arduino.h. Assbackwards compatibility on Windows means that you can get away with case-insensitivity like that, but not on Linux.

I suspect that these issues came about because Arduino was successfully marketed to people outside of the normal FOSS community. This meant a lot of people were developing and testing things on Windows and often Mac, but rarely Linux.