June 20th, 2015

Port numbers and URLs

Today someone asked on the node.js mailing list why the URL that Express.js gave them to access their application had a port number in it, and whether they could get rid of it (since other sites don’t have it).

My explanation is this:

There are some interesting details to this!

Each well-known service on the Internet has a port assigned to it by a group called IANA. http is port 80, ssh is 22, https is 443, xmpp is 5222 (and a few others, because it’s complicated), pop3 is 110 and imap is 143. If the service is running on its normal port, clients don’t usually need to be told the port because they can just assume the usual one. In http URLs, this lets us leave the port number out: http://example.org/ and http://example.org:80/ in theory identify the same thing. Some systems treat them as ‘different’ when comparing URLs, but they access the same resource.
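One way to see the default-port rule in action is Node's built-in WHATWG URL class, which drops a port that matches the scheme's default when it parses a URL. This is only a small sketch: sameResource is a helper name made up for illustration, and comparing normalized strings is just one possible notion of "the same".

```javascript
// URL is a global in current Node.js releases, so no import is needed.
const explicit = new URL('http://example.org:80/');
const custom = new URL('http://example.org:8080/');

// A port equal to the scheme's default is normalized away during parsing.
console.log(JSON.stringify(explicit.port)); // ""
console.log(explicit.href);                 // http://example.org/
console.log(custom.port);                   // 8080

// Hypothetical helper: compare URLs after normalization, not as raw strings.
function sameResource(a, b) {
  return new URL(a).href === new URL(b).href;
}

console.log(sameResource('http://example.org/', 'http://example.org:80/'));   // true
console.log(sameResource('http://example.org/', 'http://example.org:8080/')); // false
```

A system that compares the two strings character by character would call them different, which is the mismatch described above.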

Now if you’re not on the default port, you have to specify it, so Express apps in particular suggest you access http://localhost:8080/ (or 3000; there are a couple of common ports for “this is an app fresh off of a generator, customize from here”). This is actually just a hint: usually they listen on more than localhost, and the URL they report back is not very robust, but it works well enough to get people off the ground while they learn to write web services.
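To make the "just a hint" part concrete, here is a rough sketch using Node's plain http module rather than Express itself (Express sits on top of the same listen call). The names hintUrl and startApp are made up for this example, and the assumption that "localhost" is the right name to print only holds on a development machine.

```javascript
const http = require('node:http');

// Build the "open this in your browser" hint from what the server reports.
// Assumption: we are on a development machine, so "localhost" will work.
function hintUrl(address) {
  const port = address.port === 80 ? '' : `:${address.port}`;
  return `http://localhost${port}/`;
}

// Hypothetical starter, roughly what a freshly generated app does.
function startApp(port) {
  const server = http.createServer((req, res) => {
    res.end('hello\n');
  });
  // No host argument: Node listens on the unspecified address, which means
  // every interface on the machine, not only localhost.
  server.listen(port, () => {
    const address = server.address(); // e.g. { address: '::', family: 'IPv6', port: 3000 }
    console.log(`bound to ${address.address} port ${address.port}`);
    console.log(`try ${hintUrl(address)}`);
  });
  return server;
}
```

Calling startApp(3000) would typically report a bound address of :: or 0.0.0.0 while still telling you to try the localhost URL, so the printed URL is a convenience and not a description of where the server is actually listening.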

If you run your app on port 80, you won’t need that.

However!

Unix systems reserve ports under 1024 for the system, a simple enough restriction to keep a user from starting up something in place of a system service at startup time, in the era of shared systems. That means you have to run something as root to bind port 80, unless you use special tools. There are a few options. One is a tool called authbind (found most commonly on Debian-derived Linuxes) that lets you bind a privileged port without being root. Another is to start as root and call process.setgid and process.setuid to relinquish root privilege after binding (a common tactic in classic unix systems), though there are some fiddly details there that could leave you exposed if someone manages to inject executable code into what you’re running. And finally, one can proxy from a ‘trusted’ system daemon to your app on some arbitrary port; nginx is a popular choice for this, as are haproxy, stunnel and others.
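The bind-then-drop tactic might look something like the sketch below. It is a minimal illustration and not a hardened recipe: dropPrivileges and listenThenDrop are names invented here, the proc parameter exists only so the call order can be exercised without being root, and the user and group you pass in are whatever unprivileged account exists on your system.

```javascript
const http = require('node:http');

// Give up root once the privileged port is bound.
// `proc` defaults to the real process object.
function dropPrivileges(user, group, proc = process) {
  // getuid is not available on Windows; uid 0 is root on Unix.
  if (typeof proc.getuid !== 'function' || proc.getuid() !== 0) {
    return false; // not root, nothing to drop
  }
  proc.initgroups(user, group); // reset supplementary groups while still root
  proc.setgid(group);           // group first: after setuid we may not be allowed to
  proc.setuid(user);
  return true;
}

// Hypothetical wrapper: bind first (needs root for ports under 1024), then drop.
function listenThenDrop(port, user, group) {
  const server = http.createServer((req, res) => res.end('hello\n'));
  server.on('error', (err) => {
    // A non-root process binding a port under 1024 typically gets EACCES here.
    console.error(`could not bind port ${port}: ${err.code}`);
  });
  server.listen(port, () => {
    const dropped = dropPrivileges(user, group);
    console.log(dropped ? `now running as ${user}` : 'was not root, nothing dropped');
  });
  return server;
}
```

Started as root, something like listenThenDrop(80, 'nobody', 'nogroup') would bind port 80 and then continue as the unprivileged user (those account names are an assumption and vary between systems). The order matters: changing the user first can leave the process unable to change its group afterwards, which is one of the fiddly details mentioned above.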

Now as to why it’s just a hint: the problem of an app figuring out its own URL(s) is actually very hard, and often unsolvable even in simple cases, given the myriad things we do to networking. NAT and proxies in particular confuse this, and there’s no requirement that you be able to look up a hostname from an IP address, even if the hostname can be looked up to get the IP address. None of this matters for localhost though, which has a nice known name and a nice known IP. Most people do development on their own computers, so we can hand-wave all this complexity away until later, after someone has something up and running.
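As a small illustration of how little an app can know about its own address, this sketch lists the URLs you could build from the machine's own IPv4 addresses. candidateUrls is a made-up helper, and it skips IPv6 purely to keep the example short.

```javascript
const os = require('node:os');

// Hypothetical helper: URLs built from this machine's own IPv4 addresses.
// `interfaces` has the shape returned by os.networkInterfaces().
function candidateUrls(port, interfaces = os.networkInterfaces()) {
  const urls = [];
  for (const addresses of Object.values(interfaces)) {
    for (const entry of addresses || []) {
      // family is the string 'IPv4' in most releases; some Node 18 releases reported 4.
      const isIPv4 = entry.family === 'IPv4' || entry.family === 4;
      if (isIPv4 && !entry.internal) {
        urls.push(`http://${entry.address}:${port}/`);
      }
    }
  }
  return urls;
}

console.log(candidateUrls(3000));
```

Behind NAT these are usually private addresses that nobody outside the local network can reach, and nothing here reveals the public name or the port a proxy in front of the app is using. The list is a set of guesses, which is why printing localhost is the pragmatic choice during development.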