New module: SQL::Functional

If you have an application with simple database needs, writing the SQL strings directly in the program works fine. If it’s bigger, an Object-Relational Mapper like DBIx::Class starts to make sense. I find the additional boilerplate of an ORM isn’t worth the effort for simple needs, but it doesn’t take much more for the overhead to be worthwhile.

Then there’s another point beyond that where the ORM doesn’t know how to effectively and efficiently run a request. When you get there, it’s time to write raw SQL again, perhaps giving the result set back to your ORM so you can continue on as if nothing happened.

How do you write that SQL? Probably with strings again, or a helper library like SQL::Abstract.

The problem with raw strings is brought up in Epigrams in Programming:

  • It’s difficult to extract sense from strings, but they’re the only communication coin we can count on.
  • The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

The nature of our database problem is that we’ve hit on a very complicated case, and now we’re stuck using the most ad-hoc way of structuring data. Not a good combination!

SQL::Abstract tries to address this by providing a structured way of building a string. We still have to output a string (” . . . they’re the only communication coin we can count on”), but at least we don’t have to deal with it directly. The problem is that, again, we’re getting to the most complicated cases of SQL, and SQL::Abstract abuses syntax tricks for these cases. Consider this case I pulled out of its docs:

my %where = {
  start0 => { -between => [ 1, 2 ] },
  start1 => { -between => \["? AND ?", 1, 2] },
  start2 => { -between => \"lower(x) AND upper(y)" },
  start3 => { -between => [
    \["upper(?)", 'stuff' ],
  ] },

Why are we taking references to scalars? And references to arrayrefs, for that matter? Could you copy-and-paste this to a DBA and explain what’s going on?

When you drill down a bit, the reason for these issues is that SQL::Abstract uses an Object Oriented interface to build this string, and OO is just not a good fit for this problem.

On the other hand, the problem can be solved more cleanly by combining functions that look vaguely like SQL:

my ($sql, @sql_params) = SELECT [qw{ bar baz }],
    FROM( 'foo' ),
        match( 'bar', '=', 1 ),
            match( 'baz', '=', 2 ),
            match( 'qux', '=', 3 ),

This is exactly what SQL::Functional does. Version 0.1 is now up on CPAN.

Language Community Litmus Test: Database Placeholders

Search for “[language] database tutorial”. Out of the top 5 results, how many of them show how to use safe practices (preferably placeholders, but safe quoting functions are OK, too) at the earliest available opportunity?

I award a score of 5 points for the first result, 4 for the second, and so on. A perfect score would be 15. If the first INSERT or SELECT statement has simple static data, I give it a pass and find the first statement that’s filling in data from variables.

  • Haskell: 15
  • Python: 12
  • Perl: 11
  • Ruby: 11
  • Node.js: 4
  • PHP: 4

There are some caveats with the judging of this data:

  • Haskell’s interfaces seem to use declarative functions to build the SQL statement rather than concatenating strings. I’m presuming these libraries quote things safely.
  • The third result for Python was to a wiki page that linked to various database-related info. Many of the links had placeholders and quoting mentioned early on, so I gave it points for this one.
  • The last two results on Python were to a Stack Overflow question without many direct examples, and a YouTube vid that has no dynamic statement usage. No points for these.
  • Perl’s second link was to an (??!) tutorial with no placeholders/quote functions shown. I think we in the Perl community might need to fix our SEO on this one.
  • Ruby’s second result was to a general Rails tutorial without much direct SQL access. No points here, but like Perl’s result, this may be more of a case of bad SEO.
  • Tempted not to give points to Ruby’s third result, which took a while to get to a statement with any dynamic data. But when you got there, it did do it safely, so I gave it a pass.
  • Edit: I initially didn’t give Ruby points for the fourth link. It was a short page that had no complex SQL statements. I see now that it’s one part of a larger document, and the subsequent link does use Ruby ActiveRecords. Ruby now gets points for this.
  • Fifth Ruby result was a link to the Ruby/DBI homepage. Clicked on “tutorial” sidebar link, which did show best practices, so I gave it points.
  • This test is perhaps unfair towards Node.js, because most of its top results covered MongoDB rather than an SQL database. While I have mean things to say about NoSQL, critiquing it wasn’t the intent of this test. The second link does use SQL, and does things correctly, so points awarded there.
  • All but the second PHP result concatenates $_POST[‘…’] or $_GET[‘…’] directly into statements in its earliest examples. The second result is trying desperately to show how things should be done, and is the only one to get points. One bad result can be passed off as an SEO issue. All but one is a real problem.