Delving the depths of computing,
hoping not to get eaten by a wumpus

How to write regexes that are almost readable

2022-06-06


Let’s start with a moderately simple regex:

/\A(?:\d{5})(?:-(?:\d{4}))?\z/

Some of you might be smacking your forehead at the thought of this being “simple”, but bear with me. Let’s break down what it does piece by piece. Match five digits, then, optionally, a dash followed by four digits. All in non-capturing groups, and anchored to the beginning and end of the string. That tells you what it does, but not what it’s for.

Spelling out the details in plain English, as in the paragraph above, doesn’t help anyone understand what it’s for, either. What does help is good variable naming and commenting, such as:

# Matches US zip codes with optional extensions
my $us_zip_code_re = qr/\A(?:\d{5})(?:-(?:\d{4}))?\z/;

As with any other code, we convey contextual clues about its purpose through variable names and comments. In Perl, qr// gives you a precompiled regex that you can carry around and match like:

if( $incoming_data =~ $us_zip_code_re ) { ... }

Which some languages handle by having a Regex object that you can put in a variable.
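In TypeScript, for instance, the same idea is a regex literal stored in a well-named constant. A minimal sketch, with the caveat that JavaScript regexes use ^ and $ instead of \A and \z, and the variable names here are just for illustration:

// Matches US zip codes with an optional +4 extension
const us_zip_code_re = /^\d{5}(?:-\d{4})?$/;

const incoming_data = "53202-1234"; // whatever came in from the form
if( us_zip_code_re.test( incoming_data ) ) { /* ... */ }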

There are various proposals for improved regex syntax, but a different syntax wouldn’t remove this essential complexity. It could help with overall readability, though, and Perl implemented a feature for that a long time ago without drastically changing the approach: the /x modifier. It lets you add whitespace and comments, which means you can indent things:

my $us_zip_code_re = qr/\A
    (?:
        \d{5} # First five digits required
    )
    (?:
        # Dash and next four digits are optional
        -
        (?:
            \d{4}
        )
    )?
\z/x;

Which admittedly still isn’t perfect, but gives you hope of being maintainable. Your eyes don’t immediately get lost in the punctuation.

I’ve used arrays and join() to implement a similar style in other languages, but it isn’t quite the same:

let us_zip_code_re = [
    "^",          // JS regexes don't support \A, so anchor with ^
    "(?:",
        "\\d{5}", // First five digits required
    ")",
    "(?:",
        // Dash and next four digits are optional
        "-",
        "(?:",
            "\\d{4}",
        ")",
    ")?",
    "$",          // ...and $ in place of \z
].join( '' );

Which helps, but autoindenting text editors try to be smart about it and fail. Perl having the // syntax for regexes also means editors can handle syntax highlighting inside the regex, which doesn’t work when it’s just a bunch of strings.
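In languages with multiline strings, you can get closer with a small helper that strips comments and whitespace before compiling. Here’s a rough TypeScript sketch; rx is just an illustrative helper, not a standard API:

function rx( source: string, flags?: string ): RegExp {
    // Strip // comments, then all whitespace, before compiling.
    // (Naive: this would also eat a literal "//" inside the pattern.)
    const cleaned = source
        .replace( /\/\/[^\n]*/g, '' )
        .replace( /\s+/g, '' );
    return new RegExp( cleaned, flags );
}

const us_zip_code_re = rx( `
    ^
    (?:
        \\d{5}     // First five digits required
    )
    (?:
        // Dash and next four digits are optional
        -
        (?: \\d{4} )
    )?
    $
` );

It isn’t a real /x, and editors still see only a string, but it keeps the indentation and the comments.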

More languages should implement the /x modifier.


Converted Blog To Gemini

2022-05-05


I haven’t updated the blog in a while, and I’m also rethinking the use of WordPress. So I decided to dump the old posts and convert them to gemtext, the Gemini version of markdown.

https://gemini.circumlunar.space/

The blog will be hosted on Gemini, as well as static HTTP.

Not all the old posts converted cleanly. A lot of the code examples didn’t come through wrapped in gemtext’s preformatted blocks. Some embedded YouTube vids need their iframes converted to links. Comments are all tossed, which is no big loss, since 95% of them were spam anyway.


Running Remote X11 Applications on a Raspberry Pi (or: Bad Minecraft) [Five Minute Building Blocks]

2021-02-08


https://www.youtube.com/watch?v=DcEMNovvb1s


Are A2 (or A1) Application Class SD cards marketing BS?

2021-01-01


People often misunderstand the speed ratings of SD cards. They are built cheaply, and have historically targeted digital cameras first. That meant emphasizing sequential IO performance, because that’s how pictures and video are handled. When we plug an SD card into a Raspberry Pi, however, most of our applications use random reads and writes, which these cards often handle abysmally.

Enter the A1 and A2 speed ratings. These were created by the SD Card Association to guarantee a minimum random IO performance, which is exactly what we want for single board computers.

There’s been a blog post floating around about how the A2 class is marketing BS. In some cases, I’ve seen this cited as evidence of the A1 class also being marketing BS. After all, the top cards on the chart for random IO, the Samsung Evo+ and Pro+, don’t have any A class marks on them.

This misunderstands what these marks are for and how companies make their cards. Your card has to meet a minimum speed to carry a given mark. You are free to exceed it. It just so happens that Samsung makes some really good cards and has yet to apply for the mark on them.

Samsung could swap the underlying parts for ones that still meet the model’s certification marks, but with far worse random IO performance. I don’t think Samsung would do that, but they would technically be within their rights to do so. This kind of thing has happened in the storage industry before. Just this past year, Western Digital put Shingled Magnetic Recording on NAS drives (SMR is a hard drive tech that craters random write performance, making those drives completely unsuitable for NAS usage). XPG swapped out the controller on the SX8200 Pro for a cheaper, slower one, invalidating the praiseworthy reviews they got at launch.

Even if Samsung wouldn’t do that, you still have to go out of your way to find the best SD card. Well-informed consumers will do that, but the Raspberry Pi serves a broad market. Remember, its original purpose was education. You can’t expect people to even know what to research when they’re just starting out. It also doesn’t help that major review sites don’t touch on SD cards. Benchmarking is tricky to do right, and most hobbyist bloggers don’t have the resources for a good, controlled test, even if they mean well.

What the A class marks do is give a clear indication of meeting a certain minimum standard, at least in theory. Independent reviews are always good in order to keep everyone honest, but if you don’t have time to look at them, you can grab an A1 card and put it in your Pi and you’ll be fine.

As the blog post noted above states, A2 cards don’t always live up to their specs. According to a followup post, this appears to be due to features the OS kernel needs to support, rather than the card itself. It’s also possible for a company to try to pull a fast one, launching with a card that meets spec and then quietly changing it. If they do, though, they don’t just have consumer backlash to contend with, but also the SD Card Association’s lawyers. Since the association holds the trademark on the A class marks, it has the right to sue anyone misusing them.

Marketing isn’t just about making people into mindless consumers. It can also be about conveying correct information about your product. That’s what the A classes are intended to do. Nobody knew that Samsung Evo and Pro cards were good until somebody tested them independently. With the A class marks, we have at least some kind of promise backed up with legal implications for breaking it.


Looking for a new maintainer on GStreamer1

2020-10-28


I slapped GStreamer1 together some years ago using the introspection bindings in Glib. Basically, you point Glib to the right file for the bindings, and it does most of the linking to the C library for you. They worked well enough for my project, so I put them up on CPAN. They cover GStreamer 1.0, as opposed to the GStreamer module, which covers the deprecated 0.10 series.

It has an official Debian package now, but I was recently informed that it would be removed if development remained inactive. Besides not having enough time, I also don’t feel totally qualified to keep it under my name. I don’t know Gtk all that well, and don’t have any other reason to learn it anytime soon. I was just the guy who made some minimal effort and threw it up on CPAN.

I’ve found GStreamer to be a handy way to access the camera data in Perl on the Raspberry Pi. It’s a bit of a dependency nightmare, and keeping it in the Debian apt-get repository makes installation a lot easier and faster.

So it’s time to pass it on. As I mentioned, they’re mostly introspection bindings at this point. It would be good to add a few custom bindings for things the introspection bindings don’t cover (enums, I think), which would round out the whole feature set.


The Seeed XIAO

2020-09-07


The Seeed XIAO board

Seeed has been gaining traction in the maker community as a source of low-cost alternatives to popular development boards. The Rock Pi S is an excellent competitor to the Raspberry Pi Zero, providing the quad core processor that the Pi Zero lacks. They have also released the Seeed XIAO, a small ARM-based microcontroller board competing with Arduino and Teensy boards, and sent me units for this review.

The board is small but capable, with an ARM Cortex M0+ 32-bit processor running at 48MHz, 256KB of flash, and 32KB of SRAM. Programming is done through a USB-C port, which also powers the device. There are 14 pins, including one 5V output, one 3.3V output, and one ground. The rest are I/O pins, all of which are capable of digital GPIO, analog input, and PWM output. There’s also a single DAC channel, plus I2C, SPI, and UART. Also included are stickers that fit on the board’s main chip and show the basic pinout of the main 10 pins.

Shipping came directly from China and took several months. At least some of the delay was likely due to coronavirus issues. Still, this isn’t the first time I’ve seen Seeed be slow to ship. If you’re interested in this product, I suggest ordering well in advance.

Once it arrived, the board was easy to set up. Pin headers come in the package, but need to be soldered on. Their wiki page for the board was easy to follow, especially if you have previous experience with Arduino-based boards. Programming is done through the Arduino IDE, using a plugin that provides the right compiler environment and firmware flasher. Many new boards prefer the free Microsoft Visual Studio Code with PlatformIO, but Seeed chose to stick with the Arduino IDE for this one.

The board can be tricky to get into programming mode. When uploading code, I often encountered an error like this in the IDE console (on a Windows machine):

processing.app.debug.RunnerException ... Caused by: processing.app.SerialException: Error touching serial port 'COM5'. ... Caused by: jssc.SerialPortException: Port name - COM5; Method name - openPort(); Exception type - Port busy.

The fix is to double-tap the reset pad by briefly bridging it with a wire to the pad right next to it. When the board enters bootloader mode correctly, Windows will often pop up a file manager window for the device, as if a USB flash drive had been connected. This may also change the COM port, so be sure to check that before trying to upload again.

With that issue sorted, I uploaded the basic LED blinker example from Arduino, and it worked fine against the LED on the board itself. That shows the environment was sane and working.

I next turned my attention to a project I’ve had sitting on my shelf for a while: modifying an old-fashioned oil lantern to hold a NeoPixel. I received this lamp at a secret santa exchange party last year, and was told that while it had all the functional parts, it wasn’t designed to take the heat of a real burning wick. So I picked up Adafruit’s 7-pixel NeoPixel Jewel and combined it with NeoCandle, which gives a slight flickering effect. NeoCandle was designed for 8 NeoPixels rather than 7, but commenting out the call to the eighth pixel worked fine.

Complicated libraries like this are often made for Arduino boards first, and everything else second, so I was bracing myself for a long debugging session to figure out compatibility. My worry was unnecessary, as the Adafruit Neopixel library worked without a hitch.

Neopixel lantern

(White balance on my phone camera wasn’t cooperating. The color of the light is more orange in person. It also has a very subtle flicker effect.)

At $4.90 each, this is a nice little board with a good feature set. It’s cheaper than the Teensy LC, and in terms of processing power, more capable than Arduino Minis. Getting the binary uploaded can be janky, but to be fair, that’s common on a lot of ARM-based microcontrollers. When a project doesn’t need a lot of output pins, I would use this in place of a Teensy LC. Just watch for those shipping times.


Castellated: An Adaptable, Robust Password Storage System for Node.js

2020-05-11


If you ask for advice on how to store passwords, you’ll get some responses that are sensible enough. Don’t roll your own crypto. Salted hashes aren’t good enough against GPU attacks. Use a function with a configurable parameter that causes exponential growth in computation, like bcrypt. If the posters are real go-getters that day, they might even bring up timing attacks.

This is fine advice as far as it goes, but I think it’s missing a big picture item: the advice has changed a number of times over the years, and we’re probably not at the final answer yet. When people see the current standard advice, they tend to write systems that can’t be easily changed. This is part of the reason why you still have companies using crypt() or unsalted MD5 for their passwords, approaches that are two or three generations of advice out of date. Someone has to flag it and then spend the effort needed for a migration. Having run such a migration myself, I’ve seen it reveal some thorny internal issues along the way.

Consider these situations:

Castellated is an approach for Node.js, written in TypeScript, that allows password storage to be easily migrated to new methods. Out of the box, it supports encoding in argon2, bcrypt, scrypt, and plaintext (the last mostly to assist testing). A password encoded this way will look something like this:

ca571e-v1-bcrypt-10-$2b$10$wOWIkiks.tbbftwkJ81BNeuOtq631SzbsVOO7VAHf5ziH.edAAqJi

This stores the hashing method, its parameters, and the encoded string. We pull this string out of a database, check that the user’s password matches, and then see if the encoding is the preferred method. Arguments (such as the cost parameter) are also checked. If these don’t match up, we reencode the password with the preferred method and store the result. This means that every time a user logs in (meaning we have the plaintext password available to us), the system automatically switches over to the preferred method.
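The login-time flow is simple enough to sketch. This is not Castellated’s actual API, just the shape of the idea; the function and parameter names are hypothetical, and verify/encode stand in for whichever hashing library is configured:

type Verifier = ( stored: string, plaintext: string ) => Promise<boolean>;
type Encoder = ( plaintext: string ) => Promise<string>;

async function check_and_maybe_reencode(
    stored: string,            // e.g. "ca571e-v1-bcrypt-10-..."
    plaintext: string,
    preferred_prefix: string,  // e.g. "ca571e-v1-argon2-" plus its parameters
    verify: Verifier,
    encode: Encoder,
    save: ( reencoded: string ) => Promise<void>
): Promise<boolean> {
    // Always check the password first, against whatever method it was stored with
    const ok = await verify( stored, plaintext );
    if( !ok ) {
        return false;
    }

    // The password matches and we have the plaintext in hand, so this is the
    // one chance to quietly upgrade to the preferred method and parameters
    if( !stored.startsWith( preferred_prefix ) ) {
        await save( await encode( plaintext ) );
    }
    return true;
}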

There is also a fallback method which will help in migrating existing systems.

All matching is done with a timing-safe comparison, whose run time doesn’t depend on where the strings differ, to prevent timing attacks.

It’s all up on npm now.


Doorbot.ts - Let me in the building, with Typescript

2020-02-13


A common issue with makerspaces is letting people in the door. Members sign up and should have access immediately. Members leave and should have their access dropped just as fast. Given the popularity of the Raspberry Pi among makers, it makes sense to start there, but how do you handle the software end?

Doorbot.ts is a modular solution to this problem. It’s split into three parts: readers, which take a credential (such as a keyfob number) from some input; authenticators, which decide whether that credential gets access; and activators, which do something (like unlocking the door) when authentication succeeds.

A basic setup might read a keyfob from the reader, authenticate against an API held on a central server, and then (assuming everything checks out) activate a GPIO pin on the Raspberry Pi for 30 seconds. This pin could be connected to a solenoid or magnetic hold to unlock the door (usually with a MOSFET or relay, since the RPi can’t drive the voltages for either of those).

The parts that interface to a Raspberry Pi (such as a Wiegand reader or activating a GPIO pin) are in a separate repository here: https://github.com/frezik/rpi-doorbot-ts

Security

The system can be compromised in quite a few ways. RFID tags can be copied. Cheap 10 digit keys can be brute forced. If you buy cheaply in bulk, there’s a good chance those RFIDs are sequential or otherwise predictable. The Raspberry Pi can be compromised. The database can be compromised.

Which is not to say it’s hopeless. That is, it’s no more hopeless than what security you already have. An attacker can put a rock through any window. Physical locks can be picked, often without much skill at all. If you’re like many makerspaces, you’ll give a keyfob to anyone who shows up with membership dues, and you probably don’t even check their ID.

So no, it’s not the most secure system in the world, but it’s also not the weakest point in the chain, either.

This isn’t to say security should be ignored. Take basic precautions here, like using good passwords, and keeping the systems up to date.

Wiegand

Wiegand is a common protocol for RFID readers. There are cheap readers that work like a USB keyboard, “typing” the numbers of the fob. However, they’re usually not built for outdoor mounting. There are cheap Wiegand options that are.

The protocol runs over two wires (plus power/ground). There is a Reader module in rpi-doorbot-ts for handling it directly. I don’t recommend using it. The issue is that reading the data over GPIO pins means using tight timing, and doing this in Node.js is often unreliable. It often resulted in needing to scan fobs two or three times before getting a clean read.

Instead, you can use a C program whose only job is to read Wiegand, interpret the bits, and spit out the number. Doorbot.ts then has a Reader that can read a filehandle. So you launch the C program, pipe its stdout into the Reader, and there you go. In practical testing, this has been far more reliable at scanning fobs on the first try.
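Assuming the C program prints one fob number per line on stdout, the wiring is just a shell pipe into the Node process. Here, wiegand_read is a placeholder name, and index.js is the compiled entry point from the deployment described below:

$ ./wiegand_read | node index.js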

Installing the System

I recommend creating a directory in your home dir on your Raspberry Pi:

$ mkdir doorbot-deployment
$ cd doorbot-deployment

And then create a basic package.json:

{
  "name": "doorbot",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@frezik/doorbot-ts": "^0.13.0",
    "@frezik/rpi-doorbot-ts": "^0.6.0"
  }
}

Be sure typescript is installed globally, and then install all the dependencies:

$ sudo npm install -g typescript
$ npm install

Next, make an index.ts to run the system:

// Use the doorbot-ts package installed above
import * as Doorbot from '@frezik/doorbot-ts';

const INPUT_FILE = "database.json";


// Read key codes from stdin
let reader = new Doorbot.FHReader(
    process.stdin
);
// Check them against a JSON file of known keys
let auth = new Doorbot.JSONAuthenticator( INPUT_FILE );
// On a successful check, just log instead of driving real hardware
let act = new Doorbot.DoNothingActivator( () => {
    console.log( "Activate!" );
});

Doorbot.init_logger();
Doorbot.log.info( "Init" );

// Wire the pipeline together: reader -> authenticator -> activator
reader.setAuthenticator( auth );
auth.setActivator( act );
reader.init();

Doorbot.log.info( "Ready to read" );
reader
    .run()
    .then( () => {} );

This code creates a reader that works from STDIN, and authenticates against a JSON file, which would have entries like this:

{
    "1234": true,
    "5678": true,
    "0123": false
}

It then hooks the reader, authenticator, and activator together and starts it running. You can type the key codes (“1234”, “5678”, etc.) in to get a response.

I’ll be following up with some more advanced usages in future posts.


The Wisdom of TAP Numbering in a JavaScript World

2019-08-26


TAP started as a simple way to test the Perl interpreter. It worked by outputting a test count on the first line, followed by a series of “ok” and “not ok” strings on subsequent lines. The hash character could be used for comments.

1..4
ok
ok
not ok # Better check if the frobniscator is working
ok

There have been some additions over the years, but the output of individual tests still looks a lot like this. Multiple test files can be wrapped together with prove. Here’s what it looks like to run that on Graphics::GVG:

$ prove -I lib t/*
t/001_load.t ............. ok     
t/002_pod.t .............. ok     
t/010_single_line.t ...... ok   
t/020_many_lines.t ....... ok   
t/030_circle.t ........... ok   
t/040_rect.t ............. ok   
t/050_color_variable.t ... ok   
t/060_num_variable.t ..... ok   
t/070_ellipse.t .......... ok   
t/080_regular_polygon.t .. ok   
t/090_int_variable.t ..... ok   
t/100_glow_effect.t ...... ok   
t/110_comments.t ......... ok   
t/120_point.t ............ ok   
t/130_include.t .......... skipped: Implement include files
t/140_include_vars.t ..... skipped: Implement include files
t/150_block_var.t ........ ok   
t/160_poly_offset.t ...... ok   
t/170_ast_to_string.t .... ok   
t/180_two_parses.t ....... ok   
t/190_meta.t ............. ok   
t/200_renderer.t ......... ok   
t/210_two_meta.t ......... ok   
t/220_named_params.t ..... ok   
t/230_renderer_class.t ... ok   
All tests successful.
Files=25, Tests=91, 10 wallclock secs ( 0.10 usr  0.05 sys +  8.86 cusr  0.57 csys =  9.58 CPU)
Result: PASS

Since TAP is just text output, it’s easy to implement libraries for other languages. With prove’s --exec argument, we just pass an interpreter. Here’s an example of a TypeScript project of mine:

$ prove --exec ts-node test/*
test/activator_do_nothing.ts ........ ok   
test/activator_multi.ts ............. ok   
test/authenticator_always_false.ts .. ok   
test/authenticator_always.ts ........ ok   
test/authenticator_multi_fails.ts ... ok   
test/authenticator_multi.ts ......... ok   
test/logger.ts ...................... ok   
test/reader_fh.ts ................... ok   
test/reader_mock.ts ................. ok   
test/sanity.ts ...................... ok   
All tests successful.
Files=10, Tests=12, 17 wallclock secs ( 0.05 usr  0.01 sys + 30.40 cusr  1.16 csys = 31.62 CPU)
Result: PASS

Although the TAP package for JavaScript has a nice little runner all its own, which includes an automatic coverage report:

$ node_modules/.bin/tap --ts test/**/*.ts
 PASS  test/activator_do_nothing.ts 1 OK 5s
 PASS  test/authenticator_always_false.ts 1 OK 5s
 PASS  test/activator_multi.ts 2 OK 5s
 PASS  test/authenticator_always.ts 1 OK 5s
 PASS  test/authenticator_multi_fails.ts 1 OK 5s
 PASS  test/authenticator_multi.ts 1 OK 5s
 PASS  test/logger.ts 1 OK 5s
 PASS  test/reader_fh.ts 2 OK 5s
 PASS  test/sanity.ts 1 OK 3s
 PASS  test/reader_mock.ts 1 OK 3s

                         
  🌈 SUMMARY RESULTS 🌈
                         

Suites:   10 passed, 10 of 10 completed
Asserts:  12 passed, of 12
Time:     14s
--------------------------|----------|----------|----------|----------|-------------------|
File                      |  % Stmts | % Branch |  % Funcs |  % Lines | Uncovered Line #s |
--------------------------|----------|----------|----------|----------|-------------------|
All files                 |    93.04 |    78.57 |    82.86 |    93.33 |                   |
 doorbot.ts               |      100 |       80 |      100 |      100 |                   |
  index.ts                |      100 |       80 |      100 |      100 |                26 |
 doorbot.ts/src           |     91.4 |    77.78 |    81.82 |    92.31 |                   |
  activator_do_nothing.ts |      100 |      100 |      100 |      100 |                   |
  activator_multi.ts      |      100 |      100 |      100 |      100 |                   |
  authenticator_always.ts |      100 |      100 |      100 |      100 |                   |
  authenticator_multi.ts  |      100 |       80 |      100 |      100 |                22 |
  read_data.ts            |      100 |      100 |      100 |      100 |                   |
  reader.ts               |    28.57 |      100 |       25 |    33.33 |       39,40,41,44 |
  reader_fh.ts            |    81.25 |        0 |       50 |    81.25 |          43,53,54 |
  reader_mock.ts          |      100 |      100 |      100 |      100 |                   |
--------------------------|----------|----------|----------|----------|-------------------|

Which, of course, can be dropped into the scripts -> test section of your package.json.

There’s a feature of TAP that’s been falling out of use in recent years, even in Perl, and that’s test counts. Modern TAP allows you to put the test counts at the end, which can be done in Test::More by calling done_testing(). In the tap module for Node.js, you can simply not set a test count at all and it does it for you. Here’s an example of that with a sanity test, which just makes sure we can run the test suite at all:

import * as tap from 'tap';
import * as doorbot from '../index';

tap.pass( "Things basically work" );

Avoiding a test count seems to be the trend in Perl modules these days. After all, automated test libraries in other languages don’t have anything similar, and they seem to get by fine. If the test fails in the middle, that can be detected by a non-zero exit code. It’s always felt like annoying bookkeeping, so why bother?

For simple tests like the above, I think that’s fine. Failing with a non-zero exit code has worked reliably for me in the past.

However, there’s one place where I think TAP had the right idea way back in 1988: event driven or otherwise asynchronous code. Systems like this have been popping up in Perl over the years, but naturally, it’s Node.js that has built an entire ecosystem around the concept. Here’s one of my tests that uses a callback system:

import * as tap from 'tap';
import * as doorbot from '../index';
import * as os from 'os';

doorbot.init_logger( os.tmpdir() + "/doorbot_test.log"  );

tap.plan( 1 ); // Declare up front that exactly one assertion must run


const always = new doorbot.AlwaysAuthenticator();
const act = new doorbot.DoNothingActivator( () => {
    tap.pass( "Callback made" );
});
always.setActivator( act );

const data = new doorbot.ReadData( "foo" );
const auth_promise = always.authenticate( data );

auth_promise.then( (res) => {} );

If the callback that runs tap.pass() never gets hit, this test will fail. If you removed the call to tap.plan( 1 ) near the start, it would pass as long as the file compiled and didn’t otherwise hit a fatal error. Which is a pretty big thing to miss, since the callback is critical to the functionality of this particular module. Even if I knew it worked now, it might break in regression testing later.

Most (all?) other automated test frameworks around JavaScript have this problem. Sometimes, you can get around it by writing in clever ways, but it is far too easy to write test code that can erroneously declare success. Besides, shouldn’t your tests be written in the most obvious, straightforward way? TAP had the answer decades ago.


Setting up a Reverse Proxy with the Raspberry Pi

2019-01-15


https://www.youtube.com/watch?v=EtxFP93h6ss