Underappreciated Perl Modules–Tree::Trie

Before the Perl community settled on the term “hashes”, many called them “associative arrays”. This was a common term for an array that looked up elements by a string among dynamic languages in the old days, such as Tcl.

Although the hash algorithm is the popular choice in modern times, it wasn’t always this way. Binary search trees were also popular, but have the disadvantage that they have a worst-case time of O(n), and an average time of O(log(n)) (where n is the number of elements). Hashes lookup keys in constant time; they stay the same speed no matter how many elements you have.

There’s a caveat to hashes though: they have large constant factors, so a binary search tree tends to be faster for a small number of elements. They also don’t come back in any obvious order unless you make them that way by alternative means.

Hashes also have a problem where the value a string hashes to may collide with another string’s hash. When that happens, the value is appended to the list. On lookup, we then have to walk that list. This means that while hashes are generally close to constant time, they actually have a worst-case time of O(n), just like binary search trees. Since hashes are used so much on Perl, it tends to be very careful about colliding hash values, but it’s still a weakness in the algorithm.

This is where the trie comes in (usually pronounced “try”). A trie will lookup keys in O(m) time, where m is the length of the lookup string. Note that hashes actually have this limitation, too. It takes at least O(m) time to hash the string so we can look it up.

Oh, and you can get back the keys in sorted order by doing a pre-order traversal.

Oh, and you can get back a list of keys that match a prefix with practically no additional overhead.

All else being equal, a trie will be on par or better than the speed of hashes. However, in Perl, not all is equal. Perl’s hash implementation is handled by the language internally at the C level. Any trie library you use will go through subroutine/method lookups, which will be substantially slower. If we were rewritting Perl from scratch, maybe you could argue for replacing hashes with tries. But if you were rewritting Perl from scratch, that’s probably pretty low on the list of changes.

But never mind that. Sorted returns and prefix matching are still pretty cool if you need that sort of thing.

Tree::Trie looks like a good place to start experimenting with tries. To get behavior equivalent to hashes, you need to use the methods marked *_data. The other methods only store keys without a value.

Some example code

my $trie = Tree::Trie->new;
$trie->add_data( foo => 1 );
$trie->add_data( bar => 2 );
$trie->add_data( qux => 3 );
$trie->add_data( quuux => 4 );

say $trie->lookup_data( 'bar' ); # Prints "2"

my @qu  = $trie->lookup( 'qu' );
# Prints:
# qux: 3
# quux: 4
say "$_: " . $trie->lookup_data( $_ ) for @qu;

$trie->delete_data( 'qu' ); # Deletes 'qux' AND 'quux'

That’s all I have to say on tries. If you have suggestions for future underappreciated Perl modules, feel free to leave a comment below.

Perl Modules: AnyEvent::ReadLine::Gnu

REPLs (Read-Eval–Print Loop) can be handy little things. In UAV::Pilot, the uav shell takes arbitrary Perl expressions and eval()‘s them.

Before integrating with AnyEvent, handling the prompt was done by Term::ReadLine::Gnu. When AnyEvent was integrated, I wanted the shell to use AnyEvent’s non-blocking I/O, so it was migrated to AnyEvent::ReadLine::Gnu.

This also handles command history. Hit the ‘Up’ arrow to get your previous command. No code is necessary; AnyEvent::ReadLine::Gnu does it for you.

ReadLine also has options for tab-completion. I would like to add this to the uav shell eventually.

Using the AnyEvent version is quite simple. You pass a callback that takes input. In the uav callback, we only run the code when it ends with a semicolon (ignoring trailing whitespace). If it doesn’t, we save it in a buffer and wait for more input.

Here’s how this is implemented in UAV::Pilot:

    my $readline; $readline = AnyEvent::ReadLine::Gnu->new(
        prompt => 'uav> ',
        on_line => sub {
            my ($line) = @_;
            add_cmd( $line );
            if( $line =~ /; \s* \z/x ) {
                my $cmd = full_cmd;
                my $do_continue = run_cmd( $cmd, $repl );
                $cv->send( $do_continue ) unless $do_continue;

The add_cmd() method adds the input to the buffer. If that did end with a semicolon, then we call full_cmd() to get back the text of the code. It also clears the buffer. run_cmd() then eval()‘s the code. If we’re meant to exit the program, it returns false, which we handle with the $cv->send.

The $readline->hide and $readline->show calls stop ReadLine from outputting the prompt when we might have other output going.

Underappreciated Perl Modules: File::ShareDir

Problem: you have some kind of data that needs to be distributed with your Perl module. Where do you put it in a cross-platform way?

Solution #1: Put it in a giant datastructure inside some module. This ends up with a big .pm file that chews up memory.

Solution #2: Put it in a __DATA__ section. But you only get one of those per module, and binary data might get hairy.

Best solution: File::ShareDir.

If you’ve ever looked through your Perl module directory, you’ve no doubt seen a directory called ‘auto’. This was used for a few different autoloading systems, but there’s no reason you can’t use it for your own module.

When I added SDL navigation data output to UAV::Pilot, the text overlayed on the screen needed a specific font. SDLx::Text can take a path to a TrueType font file, but where do you put that file?

The answer is that you create a ‘share/’ directory in your project and drop it in. The files there will be placed in your module’s ‘auto/’ entry. You can then get the path to the file with:

use File::ShareDir 'dist_dir';
use File::Spec;

my $dir = dist_dir( 'UAV-Pilot' );
my $file = File::Spec::catfile( $dir, 'font.ttf' );

Thus providing a safe, cross-platform way of storing module files.