Transitioning from a case-insensitive to a case-sensitive server

Converting sites that have been hosted on a Windows server is often frustrating, as IIS allows files to be accessed with any case in their filename. Here’s a simple solution for a site made of static files, using PHP and an Apache 404 handler:

In 404-case-insensitive.php

<?php

/* Copyright 2010 The Internet Company LLC
 *
 * May be copied under the terms of the MIT software license.
 */

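	// Strip any query string so it does not end up in the filename comparison.
	$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
	$directory = dirname($path);
	$base = basename($path);
	if($directory == '/') $directory = '';

	// glob() returns false on error; fall back to an empty list.
	$potential = glob($_SERVER['DOCUMENT_ROOT'].$directory."/*") ?: array();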

	foreach($potential as $e) {
		$e = basename($e);
		if(strtolower($e) == strtolower($base)) {
			header("Location: $directory/$e");
			exit(0);
		}
	}

	header("HTTP/1.1 404 Not Found");
	echo("Page not found.");

?>

And in .htaccess

ErrorDocument 404 /404-case-insensitive.php
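
The handler above only corrects the case of the last path component, so a request for /images/logo.png will not find /Images/Logo.PNG. As a rough sketch of how the same idea could be extended to every component, here it is in Python rather than PHP. The function name resolve_case_insensitive is made up for illustration, and this is not a drop-in replacement for the handler:

```python
import os


def resolve_case_insensitive(root, url_path):
    """Return the on-disk spelling of url_path under root, or None if no match.

    Hypothetical helper: walks the path one component at a time and picks
    the first directory entry whose name matches ignoring case.
    """
    current = root
    resolved = []
    for part in url_path.strip("/").split("/"):
        try:
            names = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            return None
        # os.listdir never returns "." or "..", so "../" segments cannot match.
        match = next((n for n in names if n.lower() == part.lower()), None)
        if match is None:
            return None
        resolved.append(match)
        current = os.path.join(current, match)
    return "/" + "/".join(resolved)
```

If two entries differ only by case, this takes whichever the directory listing returns first, which is the same ambiguity the PHP version has.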

Banana and gjetost quesadilla

You have no idea how weird my diet is until you see me eat something like this.

Fry one sliced banana in coconut oil until browned and soft.

Put several strips of gjetost cheese on a tortilla, and melt. Let the tortilla crisp slightly.

Add the bananas and try not to moan in ecstasy while eating.

Ceviches

Mexican-style ceviche

Two fillets of tilapia, cut into 1 cm cubes.

50 ml lime juice

50 ml lemon juice

50 ml rice vinegar

One roasted fresno pepper or ripe jalapeño pepper, chopped finely.

50 g of finely chopped onion.

2 g salt

Let the fish marinate in the rest of the ingredients. Serve and enjoy.

Dill trout ceviche

Three tiny or two small trout fillets, cut into small pieces

50 g finely chopped onion

50 ml lemon juice

50 ml balsamic vinegar

5 g dill seeds

5 g dill weed, cut

2 g salt

Marinate the fish in the rest, serve and enjoy.

Node.js Streams

Making an object that speaks the node.js streams interface is surprisingly difficult.

There are a fair number more interfaces than meet the eye:

You have the interplay of stream.readable and stream.resume().

You have the fact that streams speak in both Buffers and Strings.

sys.pump doesn’t relay errors, so you have to attach handlers to the right objects – I’m not sure if that one’s a problem yet or not.

Stewed eggplant with sweet rice (vegan!)

Eggplant are in season. We’re eating them stewed.

Chop two large eggplant into half inch cubes.

Chop a large onion. Fry it in a generous portion of olive oil.

Add herbs. Tonight’s: oregano, a head of garlic, a bit of paprika. Last night’s: a touch of cinnamon, paprika, oregano, dill. Fry them into the onion, then add the eggplant. Let the eggplant brown slightly, then add two large cans of tomatoes, or a couple pounds of fresh tomatoes. Add a spoonful of sugar, possibly some balsamic vinegar.

Let this cook down. It’ll stick slightly. If so, it’s caramelizing, and that’s just what you want. Don’t let it stick too badly, but it should sizzle when you stir it down to the bottom of the pot.

Let it cook until it’s a thick paste. It won’t be smooth, but it’ll be a really rich spread.

Cook rice; I used a short-grain white rice.

Rehydrate some raisins. Drain them.

Fry half an onion in a frying pan. Let it start to caramelize and brown. Add a half teaspoon of turmeric and a teaspoon of paprika.

Add a tablespoon of sugar. Let it caramelize slightly. Add the rice, the raisins, and a tablespoon of poppy seeds. Salt just a little.

Serve side by side, let the flavors contrast. The intense richness with the velvety texture of the eggplant, with the sweet chewiness of the rice and raisins. The bright yellow-orange of the rice with the deep red of the eggplant and tomatoes.

Statistics from mail filters

Entities: connections, messages, sending IPs, destination email addresses and domains, sending email addresses and domains

  • RBL hits per entity
  • Minimum, maximum, average (mean), deviation
  • Bad RCPTs per entity
  • Total RCPTs per entity

I’m sure there are more; this post will be edited as I think of them.

You can detect VERP senders by looking for a high correlation between sending domain and receiver email address.

You can detect dictionary attacks by looking for a high correlation between the sending IP, domain or receiver email address and the receiving domain.
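
As a rough illustration of what the bad-RCPT and total-RCPT counters buy you, here is a sketch that flags sending IPs whose recipients are mostly invalid. It uses a plain ratio rather than a real correlation measure, and the function names and both thresholds are assumptions picked for the example, not tuned values:

```python
from collections import defaultdict


def rcpt_stats(events):
    """events: iterable of (sending_ip, rcpt_was_valid) pairs.

    Returns {ip: (total_rcpts, bad_rcpts)}.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for ip, valid in events:
        totals[ip] += 1
        if not valid:
            bad[ip] += 1
    return {ip: (totals[ip], bad[ip]) for ip in totals}


def likely_dictionary_attackers(events, min_rcpts=20, min_bad_ratio=0.8):
    """Return sorted IPs with enough RCPTs and a high share of bad ones.

    min_rcpts and min_bad_ratio are hypothetical thresholds.
    """
    flagged = []
    for ip, (total, bad) in rcpt_stats(events).items():
        if total >= min_rcpts and bad / total >= min_bad_ratio:
            flagged.append(ip)
    return sorted(flagged)
```

The same counters kept per receiving domain instead of per sending IP would show which domains are being targeted.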

Mail filter actions

Most mail filters get something major wrong. Most use an ordered list of actions, but limited to narrow scopes, in the order that they occur in SMTP: first check the sender, then the receivers, then check the content.

Mail filter plugins should be run first in order of what phase of processing they need to be in, but evaluated in order of finality of their decision. Check RBLs that outright block hosts first, then ones that are used to decide to quarantine. Then check for viruses, things that will get a message outright rejected or quarantined, then check spam filters.

Execute in parallel, in fact. Many checks involve waiting on networks, disks and other resources, so there’s no reason not to set several actions off at once and wait for completion.
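
A minimal sketch of that idea in Python, using a thread pool since the checks mostly wait on I/O. The three check functions are stand-ins invented for the example, with a short sleep in place of the real network or disk wait:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_rbl(message):
    time.sleep(0.05)  # stands in for a DNS lookup
    return ("rbl", "no match")


def check_virus(message):
    time.sleep(0.05)  # stands in for a round trip to a scanner daemon
    return ("virus", "no match")


def check_spam(message):
    time.sleep(0.05)  # stands in for a learning-filter query
    return ("spam", "not spam")


def run_checks(message, checks):
    """Start every check at once, then wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = [pool.submit(check, message) for check in checks]
        return dict(future.result() for future in futures)
```

Calling run_checks(msg, [check_rbl, check_virus, check_spam]) should take about as long as the slowest check rather than the sum of all three.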

There are several sets of actions that happen: responses to the SMTP client that’s sending us the message, internal processing of the message, logs, and notices to receivers about exceptional events. Once a message is accepted at SMTP time, we no longer have the option to bounce it: if it disappears into the aether, it had better really be junk, because nobody will know what happened to it. Each stream of actions (smtp, receiver, message, system) is independent: rules will continue to be evaluated until all specified actions have been satisfied.

The actions one might want: tempfail, accept, reject, notify, drop, log, record, add-header, add-footer, filter-message, redirect, quarantine, and continue.

The redirect and quarantine actions merely change the destination of the message, and don’t stop processing.

I figure on grouping them numerically, with the highest priority overriding any lower priorities. Let groups be ORed together. Stop when you have a definite answer.

There are two kinds of actions: `on` actions react to the conditions of the group -- if a whitelist matches or not, if a spamfilter returns 'spam', 'not spam' or 'unsure'. `on .. when` actions are triggered when the condition of the `when` clause matches as well, forming a primitive boolean AND while still respecting an idea of priorities.

defaults {
	on error tempfail all;
	on success continue all;
	on any log all;
}

group virus {
	checkcontent clamd;
	on match reject all, log system, log receiver;
}

group user-whitelist {
	check whitelist;
	on match accept all;
	on match when virus match notify receiver;
}

group {
	checkrbl b.barracudacentral.com;
	checkrbl b.spamcop.org;
	on match reject all, log system;
}

group {
	checkcontent lmtp:///tmp/spamd.sock;
	checkcontent blacklistedwords;
	on spam accept smtp, quarantine message;
}

finally {
	on any accept all;
}

A message comes in from 127.0.0.2: RBLs come up saying to block it. Because no higher rule will accept it, it gets rejected before DATA. The connection attempt is logged to the user, but no message is accepted at all.

A virus-bearing message comes in from 1.2.3.4, from a white-listed sender: RBLs don’t reject it, since the IP isn’t listed. The SMTP connection gets as far as DATA, and the virus scanner is fired off and returns a ‘virus’ response. The whitelist is lower priority than the virus scanner, so the message is still rejected on the SMTP side. However, since the whitelist group also has an action aimed at the receiver, that event fires and a notice with the details is sent to the receiver of the message. At this point, evaluation stops since there are no more actions that could happen.
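
To check that the two scenarios above hang together, here is a small Python sketch of one way the priorities could be evaluated. Everything in it is my own guess at the semantics: the split into "final" actions that settle a stream versus extras like log and notify, the name "rbl" for the unnamed RBL group, and the Rule and Group classes. It leaves out the defaults block, the spam group and the early stop:

```python
from dataclasses import dataclass

STREAMS = ("smtp", "receiver", "message", "system")
# Assumption: these actions settle a stream; everything else is an extra.
FINAL = {"tempfail", "accept", "reject", "drop"}


@dataclass
class Rule:
    action: str
    streams: tuple   # streams the action is aimed at
    when: str = ""   # optional name of another group that must also match


@dataclass
class Group:
    name: str
    rules: list


def evaluate(groups, matched):
    """groups: highest priority first. matched: names of groups whose checks hit."""
    verdict = {}  # stream -> first final action seen, so higher priority wins
    extras = []   # non-final actions in the order they fire
    for group in groups:
        if group.name not in matched:
            continue
        for rule in group.rules:
            if rule.when and rule.when not in matched:
                continue  # the "on .. when" AND condition failed
            for stream in rule.streams:
                if rule.action in FINAL:
                    verdict.setdefault(stream, rule.action)
                else:
                    extras.append((rule.action, stream))
    return verdict, extras


GROUPS = [
    Group("virus", [Rule("reject", STREAMS), Rule("log", ("system",)),
                    Rule("log", ("receiver",))]),
    Group("user-whitelist", [Rule("accept", STREAMS),
                             Rule("notify", ("receiver",), when="virus")]),
    Group("rbl", [Rule("reject", STREAMS), Rule("log", ("system",))]),
    Group("finally", [Rule("accept", STREAMS)]),
]
```

With matched set to {"virus", "user-whitelist", "finally"}, the SMTP verdict comes out as reject and the receiver still gets a notify, which is the behavior described for the white-listed virus sender.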

Thoughts and suggestions are welcome.

Mail filter extensibility

The biggest internal requirement that I have for a new mail filter setup is extensibility. The actual decision as to what is and is not spam needs to be left up to modules.

I hesitate to write a system that is a suite of full ACLs, like Exim or Postfix’s access controls. Postfix’s are barely flexible enough to work at all, and Exim’s are so overwhelming and yet limited that you have to be a programmer to write a system that’s not going to break or lose mail, and a clever programmer at that.

Every technique for filtering has a natural place in the flow of things: RBLs are early, at HELO or RCPT TO time; learning filters must come after DATA has been received, and could either stream the message or receive it as a single dump. Filtering at HELO time should be rare: you can’t check a per-destination whitelist that early. You have to wait for RCPT TO, and in fact, many senders may retry again and again and again if you reject at HELO instead of RCPT TO.

So each plugin receives some part of the SMTP-time data: early ones get IPs and connection-related information, and later ones get the full message data.

Plugins essentially distill their input into a status: “good”, “bad”, or “not sure”.
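
One way this could look, sketched in Python. The Phase and Status enums, the two toy plugins and the shape of the context dictionary are all invented for illustration; the only parts taken from the text are the phases and the three-way status:

```python
from enum import Enum


class Phase(Enum):
    HELO = 1
    RCPT = 2
    DATA = 3


class Status(Enum):
    GOOD = "good"
    BAD = "bad"
    NOT_SURE = "not sure"


class RblPlugin:
    """Early plugin: only sees connection information."""
    phase = Phase.RCPT

    def __init__(self, listed_ips):
        self.listed_ips = set(listed_ips)

    def check(self, context):
        if context["client_ip"] in self.listed_ips:
            return Status.BAD
        return Status.NOT_SURE


class WordListPlugin:
    """Late plugin: needs the full message body."""
    phase = Phase.DATA

    def __init__(self, words):
        self.words = [w.lower() for w in words]

    def check(self, context):
        body = context["body"].lower()
        if any(word in body for word in self.words):
            return Status.BAD
        return Status.NOT_SURE


def run_phase(plugins, phase, context):
    """Run only the plugins that belong to this phase; return name -> Status."""
    return {type(p).__name__: p.check(context)
            for p in plugins if p.phase is phase}
```

The plugins never decide what happens to the message; they only report a status, and the action rules decide what to do with it.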

Mail filter requirements

It’s time to update the spam filter at The Internet Company again.

I’m getting a lot of feedback from users of both my system and another I administer that they need several different things in a spam filter.

My users need:

  • The ability to retrieve a filtered message. Even if it’s rejected, in most cases, being able to fetch it from a quarantine is necessary. Some things can be hard-rejects, like virus-infected mails and things from very obvious spam sources, but the grey area needs to be very wide.
  • Some degree of control over what techniques are used: degree of quarantining, whether blacklists are used, and whether they reject or merely quarantine mail.
  • Whitelisting, both by individual user and by domain.
  • Blacklisting, both by individual user and by domain, including whether to quarantine or reject.
  • The ability to retrain a learning filter while still using a POP3 mail client. This means a ‘signature’ with the saved full text of the message, as DSPAM or CRM114’s mailreaver do, so that mail can be forwarded back for retraining even after being altered by mail clients with no interest in preserving formatting, like Microsoft Outlook, or so that there can be a web interface to retrain.

The overall themes here are ‘user control’ and ‘ability to retrieve a missed message’. Spam filters can be highly accurate in practice, with well-trained users who understand how the filters work, but most aren’t accurate enough or careful enough while training to be able to reject mail based on a learning filter alone. Business users could lose a thousand dollars or more on certain emails from previously unknown senders, so the ability to review and recover from the filter’s decisions is very important.

Tomato-tahini pizza sauce

This rocked on a red-pepper-and-onion pizza last night.

Fry a half dozen cloves of garlic, chopped fine, in some oil or fat.

Add a can of tomato paste, and let it caramelize around the edges, stirring occasionally over a couple of minutes.

Add a cup and a half of chicken or turkey stock, preferably the gelatinous kind.

Add two or three tablespoons of tahini.

Add some balsamic vinegar and salt to taste.

Add some hot red pepper paste, or a little red pepper powder. (I used both, and the pepper powder was habanero. Hot and delicious.)

Let it cook down until thick. With my turkey stock, that doesn’t take long.

Makes a fantastic pizza.

Tonight's creative output

Light is painfully bright, after being in room for so long. Door opens, a slight sucking noise, pressure matches outside.

There is nothing to see. Just blinding whiteness, sunlight glares fiercely.

No alarms sound. Hum of generators, gentle whistle of air scrubbers, all quiet. No noise.

Light fades as irises tighten, world comes into focus, slowly, detail emerges.

Rubble is everywhere. Almost everything is ashy white, scorched and scorched again, until even black char marks are burned away in intense heat.

A little more in bright light, and shadows snap into place. Faint against burned objects, but there. Grey-white shadow, hints of what things had been before.

Silence.

There is no breeze. Sky is brilliant, cloudless blue. Sun feels white hot, tempting to look. Too much. Too much heat.

Blink.

Blink.

Stretch, as if waking from slumber.

Move rubble.

Glad that door opens inward. So much right there, it would not have moved if it opened outward.

Drop rubble. Silence shatters. A clatter. Gone again. More silence.

Sun beats down from overhead. Skin prickles.

Another piece of rubble. Set gently down, more slowly this time.

Blink.

Blink.

Just rubble, heaps large and small, a sort of pattern. Maybe like cells. Cells, only stone and concrete and large. Too large.

A loud, metal bang. Maybe close.

Turn, but see nothing.

A clatter of rubble being moved. Definitely close.

Blink. Still bright.

Figures stand on rubble. Not far, just as far as a body. Body. A body of cells. Reach, reach for figures. Too far. More than a body can reach.

Move another piece. More noise.

Silence.

Figures. Two. Eyes. Hands. Feet. Two figures. Two figures. Many hands. Many feet.

Long sleep. Not sleep. Long wait. A long wait, then brightness, then everything is new again. Now one and one is two again, two and two is four. Bodies have eyes, eyes see sky. Sun is bright. Noon sun. Any noon. No dates now. No time. Just days. The world is new again.

The world is new again.

An HTML5 parser for Javascript

I’ve been writing a port of the HTML5lib HTML5 parser to Javascript, for the moment targeting node.js specifically.

The parsing algorithms laid out in the spec are really excellent: The fallbacks for various cases where tags are omitted are mostly elegant and entirely clever. Supporting fragments of XML languages like SVG and MathML inline in HTML is excellent – with any luck, we’ll see a lot more rich vector graphics in web pages now, without dropping down to a box full of Flash.

The parser is currently a bit slow, and I’ll blog about why soon – suffice it to say that V8’s string-handling leaves a lot to be desired when you’re poking at numerous, tiny pieces of them, rather than larger manipulations.

Anyway, check it out.

Back-to-Back Cisco 828 SHDSL Routers

Something I’ve struggled with often in my career is getting a link between two buildings on the same property – a house to an office, a house to an outbuilding, two offices, two apartments – when there is no line of sight between them, so wireless links start getting expensive. (900 MHz equipment runs $500 per end if you’re getting it prefab; even assembling it yourself runs $250 per end, plus time.)

This particular property already had phones running from the house to the barn/office, so I knew there was some sort of cable between them – turns out to be a reasonably good Category 3 phone cable.

We purchased two Cisco 828 DSL modems, and set them up thus. On the master:

bridge irb
!
interface Ethernet0
 no ip address
 bridge-group 1
 hold-queue 100 out
!
interface ATM0
 no ip address
 no atm ilmi-keepalive
 dsl equipment-type CO
 dsl operating-mode GSHDSL symmetric annex A
 dsl linerate AUTO
 bridge-group 1
 pvc 0/35
  ubr 2312
 !
!
interface BVI1
 ip address 192.168.44.1 255.255.255.0
!
ip forward-protocol nd
ip forward-protocol spanning-tree
no ip http server

bridge 1 protocol ieee
bridge 1 route ip

And on the slave end:

bridge irb
!
interface Ethernet0
 no ip address
 bridge-group 1
 hold-queue 100 out
!
interface ATM0
 no ip address
 no atm ilmi-keepalive
 dsl equipment-type CPE
 dsl operating-mode GSHDSL symmetric annex A
 dsl linerate AUTO
 bridge-group 1
 pvc 0/35
  ubr 2312
  encapsulation aal5snap
 !
!
interface BVI1
 ip address 192.168.44.2 255.255.255.0
!
ip forward-protocol nd
ip forward-protocol spanning-tree
!
bridge 1 protocol ieee
bridge 1 route ip

And we have a working 2 Mbps link between buildings.

The one downside we ran into is that this property is so far from the ISP central office that the ADSL signal on their main phone line is very, very weak. The SHDSL signal is very, very strong. We had to lower the line rate of the SHDSL to get the link budget low enough to back off the power.

We ended up lowering the rate to 1032 kbps, which allowed the ADSL just enough wiggle room (moving from 6 dB to 8 dB of noise margin!) to sync at 1 Mbps, rather than 288 kbps.

Resetting the error bit in ext3

A server I had to work on over the weekend has several Very Large Filesystems – checks take about an hour, and every hour of downtime means angry customers.

The system clock was being reset incorrectly, which made ext3 fail a consistency check because the time of the last mount was after the current time. It would then set the “filesystem has errors” bit on the filesystem, making a disk check mandatory.

Eventually, I reset the filesystem’s error bit, so that it would ignore the trivial error:

debugfs -w -R 'set_super_value state 1' /dev/diskname
tune2fs -T now /dev/diskname

State 1 is “clean, no errors”.

Danger, Will Robinson, if you’re doing this to bypass serious errors, but in this trivial case, it saved several hours of downtime.

Yup. Exactly.