Choosing a data format

It’s no secret that I hate just about every data format out there. SQL is non-portable. XML is usually Wrong: abused, badly laid out, and not appendable. YAML doesn’t include enough structure. Text files are usually too ad hoc, and hard to index. RDF is beautiful, but hard to parse.

If you need microformat interoperability (that is, the ability to yank out little pieces of information from larger documents or data structures), look at XML and especially RDF/XML, and take note of existing (meta)data standards like Dublin Core. RDF has a whole suite of well-specified semantic types, which are importable by reference to the specs. Parsing is exceedingly well defined, but hell to do in an ad-hoc way. This plays nice with some DTD-based XML, and most DTD-less XML. The lack of easy appending isn’t often a problem in cases like this, especially with “documents” that are usually updated whole, not streams of data. XML is good at corruption detection because of its well-formedness rules: a truncated or mangled file simply fails to parse.
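To make the corruption-detection point concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample strings are made up for illustration, and the truncation only simulates a file cut short by a disk error.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample document.
complete = "<entry><title>Choosing a data format</title></entry>"
# Simulate a file that lost its tail: the open tags are never closed.
truncated = complete[:30]

print(is_well_formed(complete))
print(is_well_formed(truncated))
```

Well-formedness only catches structural damage. A flipped character inside the text content would still parse, so this is a cheap first check, not a checksum.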

If you need streams of data, look at separate files for each entry, or a text file you can append to. If indexing or querying is important, look at SQL, too; sometimes it’s really the best tool for the job. SQL lets you formally specify data types, which is good, but relations are left up to the query, which can be bad: there’s no universal namespace, so data formats end up being very ad hoc. Binary storage like a SQL database isn’t resilient in the case of format upgrades or disk errors, so keep backups. Good ones.
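As a rough illustration of the indexing-and-querying case, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names and dates are all invented for the example, and SQLite is only one SQL engine among many (it is also fairly relaxed about enforcing declared column types).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declared column types are the "formally specify data types" part.
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY,"
    " posted TEXT NOT NULL,"
    " body TEXT NOT NULL)"
)
# An index on the column we expect to query by.
conn.execute("CREATE INDEX entries_posted ON entries (posted)")

# Hypothetical rows; the dates are made up.
conn.executemany(
    "INSERT INTO entries (id, posted, body) VALUES (?, ?, ?)",
    [
        (519, "2004-05-28", "IRC and window switching"),
        (520, "2004-05-29", "Airfare for Italy"),
        (532, "2004-06-02", "Choosing a data format"),
    ],
)

recent = conn.execute(
    "SELECT id, body FROM entries WHERE posted >= ? ORDER BY posted",
    ("2004-05-29",),
).fetchall()
print(recent)
```

Note that nothing in the schema says what `posted` or `body` mean outside this one database, which is the "no universal namespace" complaint in practice.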

If you want to import the semantics of various IETF RFCs, look at MIME-formatted files, and at using HTTP as a transport. There are relatively easy conversions between mail and web, and parsers are very easy to implement. Files are text files, so with a human to go through the data, even things like having sectors missing from your disk may not render the data entirely unusable. Searching with linear search programs like grep is easy, and if metadata is in the header of the data, it’s relatively easy to match as well.
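A small sketch of what a MIME-style entry might look like, parsed with Python's standard `email` package. The header values and the body are invented for illustration; only the header-block, blank-line, body layout comes from the mail RFCs.

```python
from email import message_from_string

# Hypothetical entry: headers, a blank line, then the body.
raw = (
    "Subject: Choosing a data format\n"
    "Date: Tue, 01 Jun 2004 09:00:00 -0600\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Figure out what your atomic edit is.\n"
)

msg = message_from_string(raw)

subject = msg["Subject"]               # header lookup is case-insensitive
content_type = msg.get_content_type()  # media type without parameters
body = msg.get_payload()               # everything after the blank line

print(subject)
print(content_type)
print(body)
```

Because the metadata sits in plain-text headers at the top of the file, something like `grep '^Subject:'` over a directory of such files works without any parser at all.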

If you want to mark up arbitrary text with arbitrary annotations, look at XML with mixed-content DTDs like XHTML. Mixing sets of annotations is relatively easy for many tasks, and the ability to just mark relevant bits of text with semantic annotation makes for very strong, parseable documents. Pure XHTML 1.0, 1.1 and 2.0 are all easily indexed as well, with search tools like Google, Lucene and ht://Dig.
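Mixed content is easier to see than to describe. The sketch below parses a made-up XHTML-flavoured paragraph with Python's `xml.etree.ElementTree` and pulls out both the annotations and the plain running text. The sample sentence and variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph: running text with inline semantic markup.
doc = ET.fromstring(
    "<p>I fixed the <abbr title='AOL Instant Messenger'>AIM</abbr> "
    "gateway on <cite>nbtsc.org</cite> this morning.</p>"
)

# The annotations: each child element and the text it wraps.
annotations = [(el.tag, el.text) for el in doc]

# The plain text: what an indexer would see with the markup stripped.
plain = "".join(doc.itertext())

print(annotations)
print(plain)
```

The same document serves both readers: a human or a full-text indexer gets the sentence, and a tool that cares about `abbr` or `cite` gets exactly the marked spans.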

If you need simple config files, first see if your implementation language itself is a suitable choice. If not, look at YAML: its utter simplicity at representing hashes and lists makes it a natural choice for very simple config data. (I find that most software configuration can be simplified to a hash.)
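Here is one way the "use the implementation language itself" option might look in Python: the config file is just a dict literal, read with `ast.literal_eval` so that only literals are accepted and no code runs. The keys and values are hypothetical, and this is a sketch of the idea, not a recommendation over YAML for every case.

```python
import ast

# Hypothetical config file contents: a plain Python dict literal.
config_text = """
{
    "hostname": "example.org",
    "port": 5222,
    "gateways": ["aim", "icq"],
}
"""

# literal_eval accepts only literals (dicts, lists, strings, numbers...),
# so a config file cannot smuggle in function calls or imports.
config = ast.literal_eval(config_text.strip())

print(config["port"])
print(config["gateways"])
```

The result is exactly the hash-of-simple-values shape that most configuration boils down to.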

If you’re dealing with encryption, look into the bulk of ASN.1 instead of XML, or at least get a handle on treating data with binary exactness, and not being loose with white-space and newline translations. Have some idea what your canonical format is.
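The newline problem is easy to demonstrate. In this sketch (Python, standard `hashlib`), two byte strings that look identical in a text editor hash differently until both are reduced to one canonical form. The choice of "LF-only line endings" as the canonical form is an assumption for the example; a real protocol would specify its own.

```python
import hashlib


def canonical(data: bytes) -> bytes:
    """Assumed canonical form for this example: LF-only line endings."""
    return data.replace(b"\r\n", b"\n")


def digest(data: bytes) -> str:
    """SHA-256 of the exact bytes given, as a hex string."""
    return hashlib.sha256(data).hexdigest()


unix_bytes = b"line one\nline two\n"
dos_bytes = b"line one\r\nline two\r\n"  # same text after a newline translation

raw_match = digest(unix_bytes) == digest(dos_bytes)
canonical_match = digest(canonical(unix_bytes)) == digest(canonical(dos_bytes))

print(raw_match)
print(canonical_match)
```

Any signature or checksum computed over the raw bytes would break the moment a transport "helpfully" rewrote the line endings, which is why the canonical form has to be decided up front.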

If you’re building compound documents, especially on the Macintosh, take a hint from MacOS and use a directory instead of a file. OS support for directories as opaque objects is getting better.
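A directory-as-document can be sketched in a few lines of Python. The layout here (a `body.txt` plus a `meta.json` inside a folder) and the function names are invented for illustration; real bundle formats define their own structure.

```python
import json
import tempfile
from pathlib import Path


def save_bundle(root: Path, text: str, metadata: dict) -> None:
    """Write a compound document as a directory of simple files."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "body.txt").write_text(text, encoding="utf-8")
    (root / "meta.json").write_text(json.dumps(metadata), encoding="utf-8")


def load_bundle(root: Path) -> tuple[str, dict]:
    """Read the parts back; each part can also be opened on its own."""
    text = (root / "body.txt").read_text(encoding="utf-8")
    metadata = json.loads((root / "meta.json").read_text(encoding="utf-8"))
    return text, metadata


with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "recipe.bundle"
    save_bundle(bundle, "Agra Greens", {"tags": ["curry"]})
    names = sorted(p.name for p in bundle.iterdir())
    text, meta = load_bundle(bundle)

print(names)
print(text, meta)
```

Each part stays an ordinary file, so ordinary tools can read or replace one piece without understanding the rest.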

In all these cases, figure out what your atomic edit is — if an entire document gets saved at once, XML might be fine. If edits are always appends, XML is going to be ugly. If you need random-access and concurrent updates, look at one file per entry, or SQL, since concurrency is well supported in these cases.
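The two simplest atomic edits, "save the whole document" and "append one entry", might be sketched like this in Python. The function names and file names are hypothetical. The whole-document save writes to a temporary file and renames it into place, which is one common way to avoid leaving a half-written file behind.

```python
import os
import tempfile
from pathlib import Path


def save_whole(path: Path, text: str) -> None:
    """Atomic edit = the whole document: write a temp file, then rename."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    # Readers see either the old file or the new one, never a partial write.
    os.replace(tmp_name, path)


def append_entry(path: Path, line: str) -> None:
    """Atomic edit = one entry: add a single line to the end of the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "doc.xml"
    save_whole(doc, "<doc>v1</doc>")
    save_whole(doc, "<doc>v2</doc>")
    doc_text = doc.read_text(encoding="utf-8")

    log = Path(tmp) / "entries.log"
    append_entry(log, "first entry")
    append_entry(log, "second entry")
    log_lines = log.read_text(encoding="utf-8").splitlines()

print(doc_text)
print(log_lines)
```

Appending to the XML file the same way would put the new entry after the closing tag and break well-formedness, which is the ugliness mentioned above.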

532

_E_UNTASTE: cannot place cell of color “maroon” inside table of color “pink”.

Just because I can, bluesbodger

Quote of the Day

Night_Kitten is amused by Sean-speak.

Sean says “I fully realise that Seanish is t3h secks”

lev('Seanish', 'Spanish') => 1
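For anyone who wants to check the arithmetic, `lev` here reads as Levenshtein edit distance. A minimal Python sketch of the usual dynamic-programming version follows. The implementation is an assumption for illustration, not whatever produced the line above.

```python
def lev(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts,
    deletes or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


print(lev("Seanish", "Spanish"))
```

One substitution ("e" for "p") is all it takes.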

530

The café this morning was overload. Both of the Haga sisters are training to work there at the moment: Marisa “Looks like Mariah Carey” Haga (age 18 and going on 27) and her sister Alicia (age 16, and could pass for 25 with no problem whatsoever), both wearing manipulatively attractive clothing, the sort I find both intriguing and abhorrent.

And on the left, some cute construction worker that I couldn’t help but think “Geez, he’s cute” over. Obviously apprentice-level, but smart. Very soft-spoken. Twinkly eyes. Tousled hair, and nicely trimmed. I think I must have it in for Irish genotypes.

529

I finally fixed the busted Jabber↔AIM gateway on nbtsc.org this morning. Perhaps now I’ll actually be able to tell reliably when my AIM friends are online. And now I can talk to people @mac.com.

Now if everyone used Jabber, this wouldn’t be a problem.

Now I just have to package up this version for PLD.

527

Today kicked ass in every way. I got tons done, I wore my wonderful new shirt (and my mother complimented me on it, so it must be either boyish or nice and I know it’s not the first), I ate lunch with my sister. The billing didn’t generate a million phone-calls — just one, in fact. Jem had a good day. Robyn and I talked briefly. I had good conversations this evening. I watered the garden, and made a damn good curry. My kitchen is cleaner than when I started. I got the wireless on my laptop fixed, and checked the fix into PLD’s CVS, so I won’t have to repeat this next time around. I fixed a minor but annoying bug in Ruby relatively neatly, and sent the patch off. I saw a beautiful sunset, I drank enough water, and life is just generally grand today.

Agra Greens à la Ari

Alternately titled “My feet are covered in llama poop and I’m eating weeds, and I like it.”

  • two medium chopped onions, gently sautéed in oil,

  • one half teaspoon each of clove, coriander, paprika, red pepper, turmeric, cinnamon, cumin, fennel, black pepper and allspice,

  • sauté the spices a moment,

  • add one tablespoon chopped fresh ginger,

  • add eight ounces of water and eight ounces of tomato paste,

  • blend and set aside. That’s a basic curry sauce recipe.

  • sauté one chopped onion until translucent,

  • add one half cup of whole milk yoghurt and cook until some of the water disappears,

  • add 2/3 cup of the sauce,

  • add two and a half pounds of garden weeds (I prefer dandelion, lambsquarter and marshmallow) or spinach, and a little water. Put a lid over it and let the greens steam down into a mushy puddle.

This is really delicious.

526

Quote of the day: “I am just hated by the world (aka my ex-girlfriend)” — vanilla_megami.

525

It’s a long story, but I’ve got a ton of donated computer equipment sitting in my office that’s supposed to be destined for disabled folks in my area. However, a lot of it is decent stuff that isn’t suitable for the people it’s destined for. If anyone wants some, just comment and I’d be glad to get rid of it.

  • Scanjet IIc scanner — good for robotics projects, full of servo motors. Works, but it’s SCSI, so not many people could make use of it as is.
  • A few parallel port scanners — Visioneer, Mustek and HP
  • More CRTs than I know what to do with — 13, 14, 15, and 17 inches.
  • Some mediocre computer mics.

524

Have you ever wanted to take your favorite cartoon character and say “Zoë, you’re being a moron!”?

523

I just realized that I hated pyjamas from the moment I got yellow ones and my sister got pink, until this month, when I realized I could wear girly pyjamas.

God, my brain sucks sometimes.

522

Best quote ever: “Like some sort of Xerox bordello, catering to those with bad taste in neckties.” — wolftracks

521

This book-meme thing is on everyone’s minds. Even Peter St. Andre has a book list this week.

520

Airfare for Italy is damn cheap. $800. I am so planning a trip.

Update: If you actually plan to leave later than a week from now, make that $600.

519

I remember when I used to hang out on IRC, and a different tone of conversation would happen in every window. My mood would change right along with the window switching — happy, sad, happy, excited, intense. I used to code alongside this, and be able to flip from focused to not. I haven’t been able to do that in a while. I think it comes from not fully connecting online as much as I used to — and I am again, finally, and it feels like an important part of me is back.