Populating Test Data

With the beginning of the new term — and, consequently, a new course load — I’ve been trying to work out the right balance between class work and hobby projects. And maybe even spending some time away from the computer (imagine that!). For that reason, I put aside Conway in order to focus on a lighter task: futzing about with the web application database. Though I did get a few tasks accomplished, I was doing just that: futzing. So this update won’t be particularly fascinating, by any accounts.

Rewind to a few weeks ago, and you’ll recall I was discussing the basics of designing and building a database. Primary Keys and Foreign Key contraints and all that. Well, with the database schema (the configuration of tables & columns) firmly in place and tested, I was almost ready to build a very basic prototype/proof-of-concept. All I lacked was actual data to manipulate, which meant it was time to generate some. Verisimilitude wasn’t the top priority here, but I figured, the more realistic the data, the better: the prototype will then provide a better subjective “sense” of how the app works.

MySQL naturally offers some flexible INSERT INTO syntax for adding records into tables. Ye olde standard command looks something like this:

INSERT INTO Person (name, email)
VALUES ("usr1", "usr1@example.com");

Assuming, of course, that your table is named “Person” and has columns “name” and “email” (if those are the only columns it has, you don’t need to specify them). There’s a very obvious problem with using this to populate an entire database, though: it can enter only 1 record at a time. I need to create dozens if not hundreds of entries in one go. Furthermore, there’s the self-referential problem. Remember all those foreign keys? I want to have data that adheres to those constraints, e.g. that refer to actual rows in other tables. Planning that out ahead of time would be murder.

One problem at a time. First, we get around the need for INSERT INTO by instead relying on LOAD DATA INFILE. This is a handy-dandy command that, as advertised, can parse a text file, adding each line as a row in the specified table. So one could type out data.txt, like so:

usr1|usr1@example.com
usr2|usr2@example.com
[...]
usrX|usrX@example.com

And add all those records at once by typing the following in MySQL:

LOAD DATA LOCAL INFILE "data.txt"
INTO TABLE Person
FIELDS TERMINATED BY '|'
(name,email);

The specifics of what this command is doing should be pretty obvious. You specify that your data infile is on the “LOCAL” filesystem (as opposed to on some server somewhere); you give the name of the infile (“data.txt”), the table in which the records should be placed (“Person”), and, in this case, the delimiter used to differentiate between different fields. The use of the pipe ‘|’ here is completely arbitrary; I just think it makes everything more legible. You can also specify a delimiter between records, if your text file has everything on one line.

(Note to self: when running MySQL from the command line, be sure to start it with the “–local-infile” flag. Otherwise, security measures will prevent LOAD DATA LOCAL INFILE from working.)

And now for the second problem: I don’t actually want to think up several hundred distinct records to write those data infiles. So I wrote a little script to automate the task for me. And because I hadn’t given it any love & attention recently, I used Python to do so. Now, I can empty and fill up all my tables with comprehensive test data with two simple commands: one for the Python script, one for loading the data files.

The script’s process is mind-numbingly simple: you open a new file, write a set number of records, and close the file. As long as you specify predictable ID numbers for each record, you can reuse those IDs as foreign keys in other tables. For instance, say you create users with IDs 1-200. When creating Friends records, pick two random numbers in that range; the people with those IDs are now friends. For other fields, there is always the option of generating random content — as long as you can control the degree of randomness, you can come up with some pretty sensible information.

With that ideal in mind, I ported an old name generator I once wrote in Java into the script. It’s nothing sophisticated, simply mixing & matching a series of first and last names. But it allows for legible names, usernames (first name + last name), and emails (username + “@example.com”). Besides, it injects a little humor into the work. My database now contains users soberly named Sara Thompson, William Ives, and Alice Furst. But it also contains such characters as Joel the Bane of B-Trees, Tim the Ringbearer, Lief Von HelYoel, Cthulu Dragonslayer, and Abraham of the Questionable Hygiene. I can hardly wait to see who has befriended whom in this mad system I’ve created.

Anyway, this is all a bit of silliness as I shuffle things into place for the more interesting work. Next up on the agenda: diving into PHP, the single most-used web development tool out there. And perhaps letting myself getting a little distracted by AJAX (ooh, shiny!).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: