Caching Servers

For the front layer of your server architecture, i.e. your cache servers, it is also important to have a clear understanding of how TCP connections scale.

The first thing you may notice when putting load on your system is that the cache server process runs out of file handles (if the start script does not increase the relevant limit). This is because the OS uses one file handle per connection (as this is what applications can relate to), and the default number of handles a given user process may open is, on many systems, 1024. However, this is easily remedied, e.g. with the ulimit -n command (on at least Linux and FreeBSD), or persistently via /etc/sysctl.conf (on Linux and FreeBSD) or /etc/system (on Solaris). The number of file handles can be cranked up to many hundred thousand and is thus no real limitation.
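A process can also inspect and raise its own file descriptor limit at startup, which is what a well-behaved start script effectively does. A minimal sketch using Python's standard resource module (the soft limit can be raised up to the hard limit without privileges; raising the hard limit itself requires root):

```python
import resource

# Read the current per-process file descriptor limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit to the hard limit; no privileges needed for this,
# since we stay within the hard limit the kernel already granted us.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"after raising: soft limit == hard limit == {hard}")
```

This only moves the per-process ceiling; the system-wide maximum (e.g. fs.file-max on Linux) is what the sysctl.conf route adjusts.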

What is more interesting is that the OS, when creating the connection, identifies it by a local port on one end and an anonymous (ephemeral) port on the requesting host on the other:

cache01:2323 -> otherhost:1237

The port numbers are defined in the TCP protocol as an unsigned 16-bit number, yielding a maximum of 65535 usable ports. Luckily, the local port number need not be unique across requesting hosts. Thus, the same port can be used for multiple connections to different IPs:

cache01:2323 -> otherhost:1237
cache01:2323 -> yetanotherhost:4545
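You can observe this 4-tuple behaviour directly: every connection accepted by a listening socket shares the same local address and port, and uniqueness comes entirely from the remote side. A small self-contained sketch (two loopback clients stand in for two requesting hosts):

```python
import socket

# A listening socket; port 0 lets the OS pick a free port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(2)
local = srv.getsockname()

# Two independent clients connect to the same local address/port.
c1 = socket.create_connection(local)
c2 = socket.create_connection(local)

a1, peer1 = srv.accept()
a2, peer2 = srv.accept()

print("local side :", a1.getsockname(), a2.getsockname())  # identical
print("remote side:", peer1, peer2)                        # ports differ
```

Both accepted sockets report the same local endpoint; only the remote (address, port) pairs distinguish the connections.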

This means that the theoretical maximum number of connections a cache server can handle is (65535 - <number of ports reserved for system services, normally 1024>) * number of incoming IPs. For this to work as desired, it is important that the load balancer in front of the cache is fully transparent, exposing the IP of the requesting client to the cache server(s) rather than the IP of the balancer itself.
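As a back-of-the-envelope illustration of that formula (the client count is a hypothetical figure, not from any real deployment):

```python
# Theoretical connection ceiling: usable ports per remote IP,
# multiplied by the number of distinct client IPs the cache sees.
usable_ports = 65535 - 1024   # 1024 low ports reserved for system services
client_ips = 10_000           # hypothetical number of distinct source IPs

max_connections = usable_ports * client_ips
print(max_connections)        # 645,110,000
```

With even a modest number of distinct client IPs, the per-IP port limit stops being the bottleneck; other resources (file handles, memory, kernel state) take over long before.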

To illustrate the last point, given the use case where three users are visiting your web site:

user1:2213 -> load-balancer:80 -> cache01:80
user2:1212 -> load-balancer:80 -> cache01:80    
user3:5333 -> load-balancer:80 -> cache01:80

To benefit from this, cache01 must either see the IPs of the requesting clients (user1, user2 and user3) rather than the IP of the load balancer, which is the optimal situation, or at least see several different IPs belonging to the load balancer.

The latter solution is somewhat of a hack, but it works too: simply adding an additional interface/IP to the load balancer and/or the cache server generates an additional set of origin host/port and destination host/port combinations, which also lets the cache server handle more than 65535 connections.
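The mechanics of the extra-IP hack can be sketched in a few lines: binding outgoing sockets to different local addresses produces distinct 4-tuples even toward the same destination. This sketch assumes Linux, where the whole 127.0.0.0/8 range (e.g. 127.0.0.2) is routable on the loopback device without extra configuration:

```python
import socket

# Destination: a listening socket on 127.0.0.1.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(2)
dest = srv.getsockname()

# First client: source IP 127.0.0.1.
c1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c1.bind(("127.0.0.1", 0))
c1.connect(dest)

# Second client: source IP 127.0.0.2 - a second local IP opens up a
# whole new range of source ports toward the same destination.
c2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c2.bind(("127.0.0.2", 0))
c2.connect(dest)

_, p1 = srv.accept()
_, p2 = srv.accept()
print(p1[0], p2[0])  # two different source IPs, same destination
```

Each additional IP on the balancer or cache multiplies the available port space in exactly this way.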

However, if you can, go with the first option and make the load balancer transparent. Your cache server will then be able to handle as many TCP connections as your load balancer can pass on (provided, of course, that your OS kernel can allocate and recycle TCP connection state quickly enough).