Language Community Litmus Test: Database Placeholders

Search for “[language] database tutorial”. Out of the top 5 results, how many of them show how to use safe practices (preferably placeholders, but safe quoting functions are OK, too) at the earliest available opportunity?

I award a score of 5 points for the first result, 4 for the second, and so on. A perfect score would be 15. If the first INSERT or SELECT statement has simple static data, I give it a pass and find the first statement that’s filling in data from variables.

  • Haskell: 15
  • Python: 12
  • Perl: 11
  • Ruby: 11
  • Node.js: 4
  • PHP: 4

There are some caveats with the judging of this data:

  • Haskell’s interfaces seem to use declarative functions to build the SQL statement rather than concatenating strings. I’m presuming these libraries quote things safely.
  • The third result for Python was to a wiki page that linked to various database-related info. Many of the links had placeholders and quoting mentioned early on, so I gave it points for this one.
  • The last two results on Python were to a Stack Overflow question without many direct examples, and a YouTube vid that has no dynamic statement usage. No points for these.
  • Perl’s second link was to an About.com (??!) tutorial with no placeholders/quote functions shown. I think we in the Perl community might need to fix our SEO on this one.
  • Ruby’s second result was to a general Rails tutorial without much direct SQL access. No points here, but like Perl’s About.com result, this may be more of a case of bad SEO.
  • Tempted not to give points to Ruby’s third result, which took a while to get to a statement with any dynamic data. But when you got there, it did do it safely, so I gave it a pass.
  • Edit: I initially didn’t give Ruby points for the fourth link. It was a short page that had no complex SQL statements. I see now that it’s one part of a larger document, and the subsequent link does use Ruby ActiveRecords. Ruby now gets points for this.
  • Fifth Ruby result was a link to the Ruby/DBI homepage. Clicked on “tutorial” sidebar link, which did show best practices, so I gave it points.
  • This test is perhaps unfair towards Node.js, because most of its top results covered MongoDB rather than an SQL database. While I have mean things to say about NoSQL, critiquing it wasn’t the intent of this test. The second link does use SQL, and does things correctly, so points awarded there.
  • All but the second PHP result concatenates $_POST[‘…’] or $_GET[‘…’] directly into statements in its earliest examples. The second result is trying desperately to show how things should be done, and is the only one to get points. One bad result can be passed off as an SEO issue. All but one is a real problem.

For Mustache Templates in Perl, use Text::Caml

I like Mustache templates. Unfortunately, the obvious Mustache parser in Perl (Template::Mustache) is under-documented, over-documented, and not worth documenting. It has little example code and doesn’t make it at all clear that you basically have to subclass it for any real use.

Fortunately, this problem has been solved in Text::Caml. My one issue is that it’s too sensitive to whitespace. It’s good to give your eyes a resting place for sections and partials, like this:

But Text::Caml will consider the leading whitespace as part of the name. Looking over the main Mustache docs, none of the examples show leading whitespace, and it’s never specified what to do with whitespace. So I guess Text::Caml isn’t technically wrong. I just don’t like it this way.

Other than that, it’s great.

Edit: Turns out the Mustache spec does say that partials and sections should be insensitive to whitespace:

https://github.com/mustache/spec/blob/master/specs/partials.yml (starting line 104)
https://github.com/mustache/spec/blob/master/specs/sections.yml (starting line 252)

Edit 2: I reported the whitespace issue to the author’s github issue tracker, expecting it to be a fairly low priority. He ended up fixing it the next day :)

Review: Perl One-Liners

Full Disclosure: I received a free review copy of this book

I’m primarily a backend web programmer. I try to write in an object-oriented way with test-driven development practices, with each object having a well-defined purpose. I do this primarily in Perl, a language that causes many developers to get a twitch. The reason is that Perl has an old reputation as being messy line noise. Part of the reason for that reputation is because of the one-liner obfuscation games that used to be quite popular.

So when I first heard of this book, I was naturally inclined to avoid it. In a way, it’s perpetuating an aspect of Perl that’s best left as part of history. Besides, as a web developer, what did this book have to offer me?

What I found was that one-liners offer a different perspective, and it’s a useful perspective to have. Converting a text document to double-spaced lines isn’t a problem I have every day, but maybe I will at some point. When that or one of the many other problems this book addresses comes up, I’ll have something within easy reach.

Perhaps more importantly than any specific problem is the attitude that programming in this fashion doesn’t have to be inscrutable. Every one-liner here comes with a detailed explanation, and many come with alternative solutions with their own explanations. If we can document our concise solution in less space than a more “well-developed” solution would take, what’s the problem?

Donald Knuth had attempted a system called “Literate Programming”, where documentation and code are carefully interleaved. Jon Bentley, writer of the Programming Pearls column in Communications of the ACM, challenged Knuth to write a Literate Program for the following problem: read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.

Knuth wrote a 10-page program. Bentley then turned around and wrote a 6-command shell pipeline to do the same thing, which also happened to miss a few bugs that Knuth had stumbled over. The entire shell script and its documentation fit on a post-it note.*

One-liners can look like inscrutable line noise. But if we can otherwise document them in a succinct way, does it really matter?

That’s what this book does: provides solid explanations for extremely concise code. No matter what you do in Perl or your skill level, there’s something in here for everyone.

* See: http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/

Announcing: The Great UAV::Pilot Split

The main UAV::Pilot distro was getting too big. The WumpusRover server had already been spun off into its own distribution, and now the same is happening for many other portions of the system.

Reasons:

  • UAV::Pilot hasn’t been passing CPAN smoke testing because of the ffmpeg dependency. Splitting that off means the main distro can still be tested.
  • There’s no reason to get the WumpusRover stuff if you just want to play with the AR.Drone
  • General aim towards making UAV::Pilot into the DBI of Drones

UAV::Pilot itself will contain all the basic roles used to build UAV clients and servers, as well as what video handling we can do in pure Perl.

UAV::Pilot::Video::Ffmpeg will have the non-pure Perl video decoding.

UAV::Pilot::SDL will take the current SDL handling, such as for joysticks and video display.

UAV::Pilot::ARDrone and UAV::Pilot::WumpusRover will take the client end of things for thier respective drones.

I expect the first release to have interdependency issues. I’ve tested things out on a clean install, but I still might have missed something. I’m going to be watching the CPAN smoke test reports closely and filling in fixes as they come.

UAV::Pilot v0.8 Released — Now Supports WumpusRover

At long last, UAV::Pilot v0.8 has been released. This is a big update with lots of API improvements. Most of those improvements were decoupling the code to support my own WumpusRover in addition to the Parrot AR.Drone. That means a big goal has been reached, where UAV::Pilot can support multiple types of automated vehicles. It also means UAV::Pilot is a major component of the code running on board a UAV, in addition to running the client side.

The WumpusRover is a project I’ve been working on for a while. It’s an old RC car I had laying around, retrofitted with a brushless motor controller, an Arduino, and a Raspberry Pi.

Update: Video of the WumpusRover will be up at http://youtu.be/yeDwb-6ASxw. Going to be leaving soon out of town, but wanted to make sure this gets up before I go out the door.

UAV::Pilot runs on the Raspberry Pi as a server, taking packets from the client and passing turning and throttle data to the Arduino. The reason for the split between Rapsberry Pi and Arduino is:

  1. The Arduino has better support for the communication pulses used by RC servos and ESCs
  2. The Raspberry Pi with Linux is not a real-time OS–that means a carefully timed signal to the servo could be interrupted by the OS
  3. The Raspberry Pi can run Perl, support any WiFi adaptor that Linux does, and has an excelent camera module

(The camera module is not yet implemented directly with Perl support. This is one of my upcoming projects, which will give the WumpusRover a video stream, among other things.)

As you can see, the two complement each other’s strengths. In the future, we might see cheap boards that can combine these two uses; the recently announced Arduino Tre looks promising.

I’ll be posting more detailed instructions later. If you’d like to get started on your own rover, you can start with the Arduino code here:

https://github.com/frezik/wumpus-rover

There are also some improvements to the older AR.Drone code. While making changes to the joystick API to support the WumpusRover, I found a case in the AR.Drone where its navdata sends a floating point -NaN. (More evidence that they shouldn’t have been using floating point for this purpose, and that the AR.Drone was badly implemented in general). UAV::Pilot was crashing in this case, but now handles it gracefully.

Also, the nav data will now be sent over ol’ fashioned unicast IP by default instead of multicast. This should make Mac users happy, as multicast isn’t setup out of the box on OSX.

How UAV::Pilot got Real Time Video, or: So, Would You Like to Write a Perl Media Player?

Real-time graphics isn’t something people normally do in Perl, and certainly not video decoding. Video decoding is too computation-intensive to be done in pure Perl, but that doesn’t stop us from interfacing to existing libraries, like ffmpeg.

The Parrot AR.Drone v2.0 has an h.264 video stream, which you get by connecting to TCP port 5555. Older versions of the AR.Drone had its own encoding mechanism, which their SDK docs refer to as “P.264″, and which is a slight variation on h.264. I don’t intend to implement the older version. It’s for silly people.

Basics of h.264

Most compressed video works by taking an initial “key frame” (or I-frame), which is the complete data of the image. This is followed by several “predicted frames” (or P-frame), which hold only the differences compared to the previous frame. If you think about a movie with a simple dialog scene between two characters, you might see a character on camera not moving very much except for their mouth. This can be compressed very efficiently with a single big I-frame and lots of little P-frames. Then the camera switches to the other character, at which point a good encoder will choose to put in a new I-frame. You could technically keep going with P-frames, but there are probably too many changes to keep track of to be worth it.

Since correctly decoding a P-frame depends on getting all the frames back to the last I-frame right, it’s a good idea for encoders to throw in a new I-frame on a regular basis for error correction. If you’ve ever seen a video stream get mangled for a while and then suddenly correct itself, it’s probably because it hit a new I-frame.

(One exception to all this is Motion JPEG, which, as the name implies, is just a series of JPEG images. These tend to have a higher bitrate than h.264, but are also cheaper to decode and avoid having errors affect subsequent frames.)

If you’ve done any kind of graphics programming, or even just HTML/CSS colors, then you know about the RGB color space. Each of the Red, Green, and Blue channels gets 8 bits. Throw in an Alpha (transparency) channel, and things fit nice into a 32 bit word.

Videos are different. They use the “YCbCr” color space, at term which is sometimes used interchangeably with “YUV”. The “Y” is luma, while “Cb” and “Cr” is blue and red, respectively. There are bunch of encoding variations, but the most important one for our purposes is YUV 4:2:2.

The reason this is that YUV can do a clever trick where it sends the Y channel on every pixel on a row, but only sends the U and V channels on every other pixel. So where RGB has 24 bits per pixel (or 32 for RGBA), YUV averages to only 16.

The h.264 format internally stores things in YUV 4:2:2, which corresponds to SDL::Overlay‘s flag of SDL_YV12_OVERLAY.

Getting Data From the AR.Drone

As I said before, the AR.Drone sends the video stream over TCP port 5555. Before getting the h.264 frame, a “PaVE” header is sent. The most important information in that header is the packet size. Some resolution data is nice, too. This is all processed in UAV::Pilot::Driver::ARDrone::Video.

The Video object can take a list of objects that do the role UAV::Pilot::Video::H264Handler. This role requires a single method to be implemented, process_h264_frame(), which is passed the frame and some width/height data.

The first object to do that role was UAV::Pilot::Video::FileDump, which (duh) dumps the frames to a file. The result could be played on VLC, or encoded into an AVI with mencoder. This is as far as things got for UAV::Pilot version 0.4.

(In theory, you should have been able to play the stream in real time on Unixy operating systems by piping the output to a video player that can take a stream on STDIN, but it never seemed to work right for me.)

Real Time Display

The major part of version 0.5 was to get the real time display working. This meant brushing up my rusty C skills and interfacing to ffmpeg and SDL. Now, SDL does have Perl bindings, but they aren’t totally suitable for video display (more on that later). There are also two major bindings to ffmpeg on CPAN: Video::FFmpeg and FFmpeg. Neither was suitable for this project, because they both rely on having a local file that you’re processing, rather than having frames in memory.

Fortunately, the ffmpeg library has an excellent decoding example. Most of the xs code for UAV::Pilot::Video::H264Decoder was copy/pasted from there.

Most of that code involves initializing ffmpeg’s various C structs. Some of the most important lines are codec = avcodec_find_decoder( CODEC_ID_H264 );, which gets us an h.264 decoder, and c->pix_fmt = PIX_FMT_YUV420P;, which tells ffmpeg that we want to get data back in the YUV 4:2:2 format. Since h.264 stores in this format internally, this will keep things fast.

In process_h264_frame(), we call avcodec_decode_video2() to decode the h.264 frame and get us the raw YUV array. At this point, the YUV data is in C arrays, which are nothing more than a block of memory.

High-level languages like Perl don’t work on blocks of memory, at least not in ways that the programmer is usually supposed to care about. They hold variables in a more sophisticated structure, which in Perl’s case is called an ‘SV’ for scalars (or ‘AV’ for array, or ‘HV’ for hashes). For details, see Rob Hoelz’s series on Perl internals, or read perlguts for all the gory details.

If we wanted to process that frame data in Perl, we would have iterate through the three arrays (one for each YUV channel). As we go, we would put the content in an SV, then push that SV onto an AV. Those AVs can then be passed back from C and into Perl code. The function get_last_frame_pixels_arrayref() handles this conversion, if you really want to do that. Protip: you really don’t want to do that.

Why? Remember that YUV sends Y for every pixel in a row, and U and V for every other pixel, for an average of 2 bytes per pixel, and therefore 2 SVs per pixel (again, on average). If we assume a resolution of 1280×720 (720p), then there are 921,600 pixels, or 1,843,200 SVs to create and push. You would need to do this 25-30 times per second to keep up with a real time video stream, on top of the video decoding and whatever else the CPU needs to be doing while controlling a flying robot.

This would obviously be too taxing on the CPU and memory bandwidth. My humble laptop (which has a AMD Athlon II P320 dual-core CPU) runs up to about 75% CPU usage in UAV::Pilot while decoding a 360p video stream. That laptop is starting to show its age, but it’s clear that the above scheme would not work even on newer and beefier machines.

Fortunately, there’s a little trick that’s hinted at in perlguts. The SV struct is broken down into more specific types, like SViv. The trick is that the IV type is guaranteed to be big enough to store a pointer, which means we can store a pointer to the frame data in an SV and then pass it around in Perl code. This means that instead of 1.8 million SVs, we make just one for holding a pointer to the frame struct.

This trick is pretty common in xs modules. If you’ve ever run Data::Dumper on a XML::LibXML node, you may have noticed that it just shows a number. That number is actually a memory address that points to the libxml2 struct for that particular DOM node. The SDL bindings also do this.

The tradeoff is that the data can never be actually processed by Perl, just passed around between one piece of C code to another. The method get_last_frame_c_obj() will give you those pointers for passing around to whatever C code you want.

This is why SDL::Overlay isn’t exactly what we need. To pass the data into the Perl versions of the overlay pixels() and pitches() methods, we would have to do that whole conversion process. Then, since the SDL bindings are a thin wrapper around C code, it would undo the conversion all over again.

Instead, UAV::Pilot::SDL::Video uses the Perl bindings to initialize everything in Perl code. Since SDL is doing that same little C pointer trick, we can grab the SDL struct for the overlay the same way. When it comes time to draw the frame to the screen, the module’s xs code gets the SDL_Overlay C struct and feeds in the frame data we already have. Actual copying of the data is done by the ffmpeg function sws_scale(), because that’s solution I found, and I freely admit to cargo-culting it.

At this point, it all worked, I jumped for joy, and put the final touches on UAV::Pilot version 0.5.

Where to go From Here

I would like to be able to draw right on the video display, such as to display nav data like the one in this video:

http://www.youtube.com/watch?v=ipFo8YPCs-E

Preliminary work is done in UAV::Pilot::SDL::VideoOverlay (a role for objects to draw things on top of the video) and UAV::Pilot::SDL::VideoOverlay::Reticle (which implements that role and draws a reticule).

The problem I hit is that you can’t just draw on the YUV overlay using standard SDL drawing commands for lines or such. They come up black and tend to flicker. Part of the reason appears to go back to YUV only storing the UV channels on every other pixel, which screws up 1-pixel wide lines 50% of the time. The other reason is that hardware accelerated YUV overlays are rather complicated. Notice that linked discussion thread goes back to 2006, and things don’t appear to have gotten better until maybe just recently with the release of SDL2.

The video frame could be converted to RGB in software, but that would probably be too expensive in real time. The options appear to be to either work it out with SDL2, or rewrite things in OpenGL ES. OpenGL would add a lot more boilerplate code, but could have side benefits for speed on top of just plain working correctly.

Once you can draw on the screen, you could do some other cool things like doing object detection and displaying boxes around those objects. Image::ObjectDetect is a Perl wrapper around the opencv object detection library, though you’ll run into the same problem of copying SVs shown above. Best to use the opencv library directly.

Perl Modules: AnyEvent::ReadLine::Gnu

REPLs (Read-Eval–Print Loop) can be handy little things. In UAV::Pilot, the uav shell takes arbitrary Perl expressions and eval()‘s them.

Before integrating with AnyEvent, handling the prompt was done by Term::ReadLine::Gnu. When AnyEvent was integrated, I wanted the shell to use AnyEvent’s non-blocking I/O, so it was migrated to AnyEvent::ReadLine::Gnu.

This also handles command history. Hit the ‘Up’ arrow to get your previous command. No code is necessary; AnyEvent::ReadLine::Gnu does it for you.

ReadLine also has options for tab-completion. I would like to add this to the uav shell eventually.

Using the AnyEvent version is quite simple. You pass a callback that takes input. In the uav callback, we only run the code when it ends with a semicolon (ignoring trailing whitespace). If it doesn’t, we save it in a buffer and wait for more input.

Here’s how this is implemented in UAV::Pilot:

The add_cmd() method adds the input to the buffer. If that did end with a semicolon, then we call full_cmd() to get back the text of the code. It also clears the buffer. run_cmd() then eval()‘s the code. If we’re meant to exit the program, it returns false, which we handle with the $cv->send.

The $readline->hide and $readline->show calls stop ReadLine from outputting the prompt when we might have other output going.

Dynamic Perl Code Loading

There’s a trick done by mod_perl for running old-style CGI scripts. These have to be loaded on-demand, but how do we load code into the parser at runtime?

Well, eval(STRING), of course. Really, there’s no magic to it beyond that. This often maligned feature is at the heart of not just mod_perl, but also pretty much any templating system that embeds Perl code inside. Isn’t that dangerous, you ask? Not really; you presumably trust the code coming from template files on your own server. It’s no more dangerous than downloading a random CPAN module and loading it up.

Generally, they don’t eval the straight code. They wrap it up inside a package and a subroutine, then call the sub. Something like:

Things can get more sophisticated than this. Perhaps you expect to load up multiple code files, so you generate a predictable package name for each one (maybe based on the file name). Maybe the run() sub takes an argument, like $request for the Apache request object, which will then be available to the loaded code.

One problem with the above as written is that it screws up error messages for the filename and line number reporting. If you use the debugger, it’ll screw up it outputting the code you’re stepping through. This is where a trick with Perl comments comes in handy, where the line number and file in error messages can be set in a comment. It’ll work something like:

And all is well in the world. In the case of templating systems, you might want to keep track of the line number you’re getting the code from, and fill that into the line directive, too.