Node.js Streams

Making an object that speaks the node.js streams interface is surprisingly difficult.

There are a fair number more interfaces than meet the eye:

You have the interplay of stream.readable and stream.resume()

You have the fact that streams speak in both Buffers and Strings.

sys.pump doesn’t relay errors, so you have to attach handlers to the right objects – I’m not sure if that one’s a problem yet or not.

Stewed eggplant with sweet rice (vegan!)

Eggplant are in season. We’re eating them stewed.

Chop two large eggplant into half inch cubes.

Chop a large onion. Fry it in a generous portion of olive oil.

Add herbs. Tonight’s: oregano, a head of garlic, a bit of paprika. Last night’s: a touch of cinnamon, paprika, oregano, dill. Fry them into the onion, then add the eggplant. Let the eggplant brown slightly, then add two large cans of tomatoes, or a couple pounds of fresh tomatoes. Add a spoonful of sugar, possibly some balsamic vinegar.

Let this cook down. It’ll stick slightly. If so, it’s caramelizing, and that’s just what you want. Don’t let it stick too badly, but it should sizzle when you stir it down to the bottom of the pot.

Let it cook until it’s a thick paste. It won’t be smooth, but it’ll be a really rich spread.

Cook rice; I used a short-grain white rice.

Rehydrate some raisins. Drain them.

Fry half an onion in a frying pan. Let it start to caramelize and brown. Add a half teaspoon of turmeric and a teaspoon of paprika.

Add a tablespoon of sugar. Let it caramelize slightly. Add the rice, the raisins, and a tablespoon of poppy seeds. Salt just a little.

Serve side by side, let the flavors contrast. The intense richness with the velvety texture of the eggplant, with the sweet chewiness of the rice and raisins. The bright yellow-orange of the rice with the deep red of the eggplant and tomatoes.

Statistics from mail filters

Entities: connections, messages, sending IPs, destination email addresses and domains, sending email addresses and domains

  • RBL hits per entity
  • Minimum, maximum, average, mean, deviation
  • Bad RCPTs per entity
  • Total RCPTs per entity

I’m sure there are more; this post will be edited as I think of them.

You can detect VERP senders by having a high correlation of sending domain and receiver email address.

You can detect dictionary attacks by having a high correlation of sending IP, domain or receiver email address and receiving domain.
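As a rough sketch of the dictionary-attack heuristic, assume a hypothetical log of RCPT attempts recorded as `(sending_ip, recipient, accepted)` tuples. The record shape, the function name and the thresholds are my assumptions for illustration, not part of any existing filter. The idea is to group attempts per sending IP and receiving domain, then flag pairs that tried many distinct recipients, most of them bad:

```python
from collections import defaultdict

def dictionary_attack_suspects(rcpt_log, min_rcpts=5, bad_ratio=0.8):
    """Return (sending_ip, receiving_domain) pairs that look like dictionary attacks.

    rcpt_log is an iterable of (sending_ip, recipient_address, accepted) tuples.
    min_rcpts and bad_ratio are illustrative thresholds, not tuned values.
    """
    total = defaultdict(set)   # (ip, domain) -> distinct recipients tried
    bad = defaultdict(set)     # (ip, domain) -> distinct recipients rejected
    for ip, rcpt, accepted in rcpt_log:
        domain = rcpt.rsplit("@", 1)[-1].lower()
        key = (ip, domain)
        total[key].add(rcpt.lower())
        if not accepted:
            bad[key].add(rcpt.lower())
    suspects = []
    for key, rcpts in total.items():
        # Many distinct recipients, most of them nonexistent: likely guessing.
        if len(rcpts) >= min_rcpts and len(bad[key]) / len(rcpts) >= bad_ratio:
            suspects.append(key)
    return sorted(suspects)
```

The same per-entity counters (bad RCPTs, total RCPTs) could be kept for sending domains or sender addresses instead of IPs.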

Mail filter actions

Most mail filters get something major wrong. Most use an ordered list of actions, but limited to narrow scopes, in the order that they occur in SMTP: first check the sender, then the receivers, then check the content.

Mail filter plugins should be run first in order of what phase of processing they need to be in, but evaluated in order of finality of their decision. Check RBLs that outright block hosts first, then ones that are used to decide to quarantine. Then check for viruses, things that will get a message outright rejected or quarantined, then check spam filters.

Execute in parallel, in fact. Many checks involve waiting on networks, disks and other resources, so there’s no reason not to set several actions off at once and wait for completion.
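A minimal sketch of setting several checks off at once and waiting for all of them, using Python's `asyncio`. The check functions here are hypothetical stand-ins that only sleep; a real RBL lookup or virus scan would do network I/O in their place. The `"error"` result and the timeout value are also my assumptions:

```python
import asyncio

async def run_checks(checks, message, timeout=5.0):
    """Run every check concurrently and collect {name: result}.

    checks maps a name to an async callable taking the message.
    A check that raises or times out is reported as "error".
    """
    async def guarded(check):
        try:
            return await asyncio.wait_for(check(message), timeout)
        except Exception:
            return "error"

    names = list(checks)
    results = await asyncio.gather(*(guarded(checks[n]) for n in names))
    return dict(zip(names, results))

# Hypothetical stand-ins for an RBL lookup and a virus scan.
async def fake_rbl(message):
    await asyncio.sleep(0.01)   # pretend to wait on DNS
    return "no match"

async def fake_virus_scan(message):
    await asyncio.sleep(0.01)   # pretend to wait on a scanner socket
    return "match"
```

Calling `asyncio.run(run_checks({"rbl": fake_rbl, "virus": fake_virus_scan}, message))` waits roughly as long as the slowest check rather than the sum of all of them.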

There are several sets of actions that happen: responses to the SMTP client that’s sending us the message, and internal processing of the message, logs, notices to receivers about exceptional events. Once a message is accepted at SMTP time, we no longer have the option to bounce it: if it disappears into the aether, it had better really be junk, because nobody will know what happened to it. Each stream of actions is independent: rules will continue to be evaluated until all specified actions have been satisfied. (smtp, receiver, message, system)

The actions one might want: tempfail, accept, reject, notify, drop, log, record, add-header, add-footer, filter-message, redirect, quarantine, and continue.

The redirect and quarantine actions merely change the destination of the message, and don’t stop processing.

I figure on grouping them numerically, with the highest priority overriding any lower priorities. Let groups be ORed together. Stop when you have a definite answer.

There are two kinds of actions: `on` actions react to the conditions of the group -- if a whitelist matches or not, if a spamfilter returns 'spam', 'not spam' or 'unsure'. `on .. when` actions are triggered when the condition of the when clause matches as well, forming a primitive boolean AND while still respecting an idea of priorities.

    defaults {
        on error tempfail all;
        on success continue all;
        on any log all;
    }

    group virus {
        checkcontent clamd;
        on match reject all, log system, log receiver;
    }

    group user-whitelist {
        check whitelist;
        on match accept all;
        on match when virus match notify receiver;
    }

    group {
        checkrbl b.barracudacentral.org;
        checkrbl bl.spamcop.net;
        on match reject all, log system;
    }

    group {
        checkcontent lmtp:///tmp/spamd.sock;
        checkcontent blacklistedwords;
        on spam accept smtp, quarantine message;
    }

    finally {
        on any accept all;
    }

A message comes in from 127.0.0.2: RBLs come up saying to block it. Because no higher rule will accept it, it gets rejected before DATA. The connection attempt is logged to the user, but no message is accepted at all.

A virus-bearing message comes in from 1.2.3.4, from a white-listed sender: RBLs don’t reject it, since the IP isn’t listed. The SMTP connection gets as far as DATA, and the virus scanner is fired off and returns a ‘virus’ response. The whitelist is lower priority than the virus scanner, so the message is still rejected on the SMTP side. However, since there is also an action aimed at the receiver, that event fires and a notice is sent to the receiver of the message with the details. At this point, evaluation stops since there are no more actions that could happen.
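One simplified reading of this evaluation, sketched in Python. The split into "final" actions and everything else, the function name, and the data shapes are my assumptions; the design above does not pin them down. Groups that matched are listed highest priority first. For each stream the first final action wins, while other actions such as log or notify accumulate:

```python
FINAL = {"tempfail", "accept", "reject", "drop"}
STREAMS = ("smtp", "receiver", "message", "system")

def resolve(matched_groups):
    """Settle actions per stream from groups listed highest priority first.

    Each group is a list of (action, target) pairs; target is a stream
    name or "all". The first final action seen for a stream wins; other
    actions (log, notify, quarantine, ...) accumulate in order.
    """
    final = {}
    extras = {s: [] for s in STREAMS}
    for group in matched_groups:
        for action, target in group:
            targets = STREAMS if target == "all" else (target,)
            for stream in targets:
                if action in FINAL:
                    final.setdefault(stream, action)  # keep the higher-priority answer
                else:
                    extras[stream].append(action)
    return final, extras

# The whitelisted-sender-with-virus case from the text, as matched groups.
virus = [("reject", "all"), ("log", "system"), ("log", "receiver")]
whitelist = [("accept", "all"), ("notify", "receiver")]
final, extras = resolve([virus, whitelist])
```

Here the whitelist's accept loses to the virus group's reject on every stream, but its notify still reaches the receiver. This sketch walks every matched group instead of stopping early once all streams are decided.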

Thoughts and suggestions are welcome.

Mail filter extensibility

The biggest internal requirement that I have for a new mail filter setup is extensibility. The actual decision as to what is and is not spam needs to be left up to modules.

I hesitate to write a system that is a suite of full ACLs, like Exim or Postfix’s access controls. Postfix’s are barely flexible enough to work at all, and Exim’s are so overwhelming and yet limited that you have to be a programmer to write a system that’s not going to break or lose mail, and a clever programmer at that.

Every technique for filtering has a natural place in the flow of things: RBLs are early, at HELO or RCPT TO time; learning filters must come after DATA has been received, and could either stream the message or receive it as a single dump. Filtering at HELO time should be rare: you can’t check a per-destination whitelist that early. You have to wait for RCPT TO, and in fact, many senders may retry again and again and again if you reject at HELO instead of RCPT TO.

So each plugin receives some part of the SMTP-time data: early ones get IPs and connection-related information, and later ones get the full message data.

Plugins essentially distill their input into a status: “good”, “bad”, or “not sure”.
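A minimal sketch of what such a plugin interface might look like in Python. The class names, the phase names and the `context` dictionary are hypothetical; only the three statuses come from the text:

```python
from enum import Enum

class Status(Enum):
    GOOD = "good"
    BAD = "bad"
    NOT_SURE = "not sure"

class Plugin:
    """Base class: a plugin names the SMTP phase it needs and returns a Status."""
    phase = "data"   # one of "helo", "rcpt", "data" (assumed phase names)

    def check(self, context):
        raise NotImplementedError

class StaticBlocklist(Plugin):
    """Toy RBL stand-in: runs at RCPT time and only needs the client IP."""
    phase = "rcpt"

    def __init__(self, listed_ips):
        self.listed_ips = set(listed_ips)

    def check(self, context):
        ip = context.get("client_ip")
        if ip is None:
            return Status.NOT_SURE
        return Status.BAD if ip in self.listed_ips else Status.GOOD
```

A content filter would declare `phase = "data"` and read the message body from the same context, so the core only has to know when to call each plugin, not how it decides.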

Mail filter requirements

It’s time to update the spam filter at The Internet Company again.

I’m getting a lot of feedback from users of both my system and another I administer that they need several different things in a spam filter.

My users need:

  • The ability to retrieve a filtered message. Even if it’s rejected, in most cases, being able to fetch it from a quarantine is necessary. Some things can be hard-rejects, like virus-infected mails and things from very obvious spam sources, but the grey area needs to be very wide.
  • Some degree of control over what techniques are used: degree of quarantining, whether blacklists are used, and whether they reject or merely quarantine mail
  • Whitelisting, both by individual user and by domain.
  • Blacklisting, both by individual user and by domain, including whether to quarantine or reject.
  • Ability to retrain a learning filter while still using a POP3 mail client. This means a ‘signature’ with saved fulltext of the message, like DSPAM or CRM114’s mailreaver do, so that a message can still be matched after being forwarded back by mail clients, such as Microsoft Outlook, that have no interest in preserving formatting, or so that there can be a web interface to retrain.
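The signature idea in the last item can be sketched roughly as follows. This is not how DSPAM or mailreaver implement it; the class, the marker format and the in-memory dictionary are all assumptions for illustration. The filter saves the full text under a short signature and embeds that signature somewhere likely to survive forwarding. Retraining then looks the original up by signature instead of trusting the mangled copy:

```python
import hashlib
import re

class SignatureStore:
    """Keep the full text of each filtered message under a short signature."""

    def __init__(self):
        self._messages = {}

    def save(self, raw_message: bytes) -> str:
        signature = hashlib.sha256(raw_message).hexdigest()[:16]
        self._messages[signature] = raw_message
        return signature

    def marker(self, signature: str) -> str:
        # Hypothetical marker text the filter would add to the delivered copy.
        return f"[filter-signature {signature}]"

    def find_for_retraining(self, forwarded_text: str):
        """Return the original message for the first known signature in the text."""
        for candidate in re.findall(r"\b[0-9a-f]{16}\b", forwarded_text):
            if candidate in self._messages:
                return self._messages[candidate]
        return None
```

A web interface for retraining could use the same lookup, with the signature taken from a link instead of a forwarded message.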

The overall themes here are ‘user control’ and ‘ability to retrieve a missed message’. Spam filters can be highly accurate in practice, with well-trained users who understand how the filters work, but most aren’t accurate enough or careful enough while training to be able to reject mail based on a learning filter alone. Business users could lose a thousand dollars or more on certain emails from previously unknown senders, so the ability to review and recover from the filter’s decisions is very important.

Tomato-tahini pizza sauce

This rocked on a red-pepper-and-onion pizza last night.

Fry a half dozen cloves of garlic, chopped fine, in some oil or fat.

Add a can of tomato paste, and let it caramelize around the edges, stirring occasionally over a couple minutes.

Add a cup and a half of chicken or turkey stock, preferably the gelatinous kind.

Add two or three tablespoons of tahini.

Add some balsamic vinegar and salt to taste.

Add some hot red pepper paste, or a little red pepper powder. (I used both, and the pepper powder was habanero. Hot and delicious.)

Let it cook down until thick. With my turkey stock, that doesn’t take long.

Makes a fantastic pizza.

Tonight's creative output

Light is painfully bright, after being in room for so long. Door opens, a slight sucking noise, pressure matches outside.

There is nothing to see. Just blinding whiteness, sunlight glares fiercely.

No alarms sound. Hum of generators, gentle whistle of air scrubbers, all quiet. No noise.

Light fades as irises tighten, world comes into focus, slowly, detail emerges.

Rubble is everywhere. Almost everything is ashy white, scorched and scorched again, until even black char marks are burned away in intense heat.

A little more in bright light, and shadows snap into place. Faint against burned objects, but there. Grey-white shadow, hints of what things had been before.

Silence.

There is no breeze. Sky is brilliant, cloudless blue. Sun feels white hot, tempting to look. Too much. Too much heat.

Blink.

Blink.

Stretch, as if waking from slumber.

Move rubble.

Glad that door opens inward. So much right there, it would not have moved if it opened outward.

Drop rubble. Silence shatters. A clatter. Gone again. More silence.

Sun beats down from overhead. Skin prickles.

Another piece of rubble. Set gently down, more slowly this time.

Blink.

Blink.

Just rubble, heaps large and small, a sort of pattern. Maybe like cells. Cells, only stone and concrete and large. Too large.

A loud, metal bang. Maybe close.

Turn, but see nothing.

A clatter of rubble being moved. Definitely close.

Blink. Still bright.

Figures stand on rubble. Not far, just as far as a body. Body. A body of cells. Reach, reach for figures. Too far. More than a body can reach.

Move another piece. More noise.

Silence.

Figures. Two. Eyes. Hands. Feet. Two figures. Two figures. Many hands. Many feet.

Long sleep. Not sleep. Long wait. A long wait, then brightness, then everything is new again. Now one and one is two again, two and two is four. Bodies have eyes, eyes see sky. Sun is bright. Noon sun. Any noon. No dates now. No time. Just days. The world is new again.

The world is new again.

An HTML5 parser for Javascript

I’ve been writing a port of the html5lib HTML5 parser to JavaScript, at the moment specifically for node.js.

The parsing algorithms laid out in the spec are really excellent: The fallbacks for various cases where tags are omitted are mostly elegant and entirely clever. Supporting fragments of XML languages like SVG and MathML inline in HTML is excellent – with any luck, we’ll see a lot more rich vector graphics in web pages now, without dropping down to a box full of Flash.

The parser is currently a bit slow, and I’ll blog about why soon – suffice it to say that V8’s string-handling leaves a lot to be desired when you’re poking at numerous, tiny pieces of them, rather than larger manipulations.

Anyway, check it out.

Back-to-Back Cisco 828 SHDSL Routers

Something I’ve struggled with often in my career is getting a link between two buildings on the same property – a house to an office, a house to an outbuilding, two offices, two apartments – and having it be non-line-of-sight, so wireless links start getting expensive (900 MHz equipment runs $500 per end if you’re getting them prefab; even assembling them yourself runs $250 per end, plus time).

This particular property already had phones running from the house to the barn/office, so I knew there was some sort of cable between them – turns out to be a reasonably good Category 3 phone cable.

We purchased two Cisco 828 DSL modems, and set them up thus. On the master:

    bridge irb
    !
    interface Ethernet0
     no ip address
     bridge-group 1
     hold-queue 100 out
    !
    interface ATM0
     no ip address
     no atm ilmi-keepalive
     dsl equipment-type CO
     dsl operating-mode GSHDSL symmetric annex A
     dsl linerate AUTO
     bridge-group 1
     pvc 0/35
      ubr 2312
     !
    !
    interface BVI1
     ip address 192.168.44.1 255.255.255.0
    !
    ip forward-protocol nd
    ip forward-protocol spanning-tree
    no ip http server

    bridge 1 protocol ieee
    bridge 1 route ip

And on the slave end:

    bridge irb
    !
    interface Ethernet0
     no ip address
     bridge-group 1
     hold-queue 100 out
    !
    interface ATM0
     no ip address
     no atm ilmi-keepalive
     dsl equipment-type CPE
     dsl operating-mode GSHDSL symmetric annex A
     dsl linerate AUTO
     bridge-group 1
     pvc 0/35
      ubr 2312
      encapsulation aal5snap
     !
    !
    interface BVI1
     ip address 192.168.44.2 255.255.255.0
    !
    ip forward-protocol nd
    ip forward-protocol spanning-tree
    !
    bridge 1 protocol ieee
    bridge 1 route ip

And we have a working 2 Mbps link between buildings.

The one downside we ran into is that this property is so far from the ISP central office that the ADSL signal on their main phone line is very, very weak. The SHDSL signal is very, very strong. We had to lower the line rate of the SHDSL to get the link budget low enough to back off the power.

We ended up lowering the rate to 1032 kbps, which allowed the ADSL just enough wiggle room (moving from 6 dB to 8 dB of noise margin!) to sync at 1 Mbps, rather than 288 kbps.

Resetting the error bit in ext3

A server I had to work on over the weekend has several Very Large Filesystems – checks take about an hour, and every hour of downtime means angry customers.

The date was being reset badly, and making EXT3 throw an inconsistency check, because the time of the last mount was after the current time. It’d set the “filesystem has errors” bit on the filesystem, making a disk check mandatory.

Eventually, I reset the filesystem’s error bit, so that it would ignore the trivial error:

    debugfs -w -R 'set_super_value status 1' /dev/diskname
    tune2fs -T now /dev/diskname

Status 1 is “clean, no errors”.

Danger, Will Robinson, if you’re doing this to bypass serious errors, but in this trivial case, it saved several hours of downtime.

Yup. Exactly.

Defeating the fake Antivirus

This applies to things like “Windows Antivirus Pro”, “Antivirus 2010”, “Antivirus Live”, “Antivirus Pro 2009”, among others. They’re a dime a dozen and the names change often.

There are two ways these nasty things work: some install a module that keeps any but a few programs from starting in the first place; others close down programs they don’t want to let you run after the fact.

Often, the first kind can be defeated by right-clicking the program you want to run and clicking “Run as”, then selecting this user and un-checking the box to protect your computer from the program.

That bypasses the weakest of the modules that won’t let programs run.

The others can usually be defeated by renaming the executable you’re trying to run to iexplore.exe or explorer.exe.

The trick is to then shut down the fake antivirus that’s blocking removal tools. I usually start by running the Windows Task Manager, taskmgr.exe, and shutting down as much as I can – shut down the most random-looking process names first, then if there’s nothing left that’s not part of Windows, shut down explorer.exe. Within Task Manager, you can run things like the installers for anti-malware tools, web browsers, etc. If you’ve got the first sort of blocker, you’ll have to rename each executable to get ‘em to run, if running from Run As didn’t do the trick.

I usually open up regedit and look in HKLM\Software\Microsoft\Windows\CurrentVersion\Run and HKCU\Software\Microsoft\Windows\CurrentVersion\Run for anything I don’t recognize; temporary folders especially. Nothing in there is critical, so remove things and sort them out later if you’re not sure what they are.

Next look in HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon. Check the Userinit value for paths other than the one to userinit.exe, and remove any you find. Look in the Notify subkey for modules with odd names. Antiviruses usually show up here, as well as Windows Genuine Advantage modules. Sometimes something obvious shows up here. Google if you have to.

You may have to run netsh winsock reset to get network access running again.

Once you can get something like Malwarebytes Anti-Malware running and scanning, you’re usually golden. Do a quick scan, then a full scan. Get your preferred antivirus installed and up to date.

With any luck, you’ve defeated the fake antiviruses.

Sea Vegetable and Garlic Dressing

  • 1/2 cup canola oil
  • 2 tablespoons balsamic vinegar
  • 1 tablespoon lime juice
  • 1 tablespoon ume plum vinegar
  • 1 tablespoon soy sauce
  • 1 tablespoon kelp granules
  • 1 tablespoon dulse flakes
  • 1-2 tablespoons brown rice syrup
  • 4-5 cloves of garlic, chopped fine

Just mix and shake. The syrup and the carrageenan in the kelp together make a decent emulsifier, so this separates more slowly than other vinaigrettes.

KB977165 causes a blue screen

Apparently it’s quite common for the fix for MS10-015, that is, Tuesday’s KB977165, to cause a blue screen of death after it’s installed.

The computer reboots in an endless loop, and if you start it up disabling the reboot after crash, you see a STOP error:

Page_Fault_In_Non-Paged_Area

STOP 0x00000050 (0x80097004,0x00000001,0x80516103,0x00000000)

The security fix fixes one of the longest-standing bugs in the Windows kernel, a seventeen year old bug that’s recently been used in the Chinese attacks on Google, among other attacks.

A prime cause of the crash is being infected with a virus that relies on the old bug. Viruses like this live in device drivers, particularly ATAPI.SYS (the CD-ROM device driver).

Fixing the problem involves uninstalling KB977165 while started into the rescue console from the Windows CD, and replacing ATAPI.SYS with the stock copy from the CD:

    cd \windows\$NTUninstallKB977165$\spuninst
    batch spuninst.txt
    cd \windows\system32\drivers
    expand d:\i386\atapi.sy_
    exit

Do a virus scan afterward, and re-install KB977165. In my case, the virus that ESET NOD32 detects is Win32/Olmarik.SJ; others may have similar or the same symptoms and fix.