Single Point of Failure

One way to remove a single point of failure (which is what we inevitably have with the RMI hub and NFS) is to install and configure the same software on two hosts, with only one of them actually running it. A heartbeat checks the availability of the service. If the service goes down, the heartbeat starts it on the other (backup) host.
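
The check-and-fail-over loop described above can be sketched roughly as follows. This is only an illustration, not the behaviour of any particular heartbeat product: the class name HeartbeatMonitor, the check and start_backup callables, and the rule of waiting for three consecutive failed probes before failing over are all assumptions made for the example.

```python
from typing import Callable


class HeartbeatMonitor:
    """Counts consecutive failed probes and starts the backup once (hypothetical sketch)."""

    def __init__(
        self,
        check: Callable[[], bool],           # returns True while the service answers
        start_backup: Callable[[], None],    # starts the service on the backup host
        max_failures: int = 3,               # assumed threshold, not from the text
    ) -> None:
        self.check = check
        self.start_backup = start_backup
        self.max_failures = max_failures
        self.failures = 0
        self.failed_over = False

    def tick(self) -> bool:
        """Run one probe; return True if the backup host is (now) active."""
        if self.failed_over:
            return True
        if self.check():
            # A healthy probe resets the counter, so a single glitch does not fail over.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.start_backup()
            self.failed_over = True
        return self.failed_over
```

In a real deployment tick() would be called at a fixed interval, and check would probe the actual service (for example by opening a connection to it). Requiring several failed probes in a row is one common way to avoid failing over on a brief network hiccup.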

The heartbeat/failover solution should also include a virtual IP address (often referred to as a VIP), which is held by the host running the critical software. All servers that use the service refer only to the VIP. If that host goes down and the heartbeat starts the service on the backup host, the VIP is moved from the primary host to the backup host.
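
To make the VIP move more concrete, here is a small sketch that only builds the commands involved; it does not execute them. It assumes Linux hosts managed with the iproute2 "ip addr" command, and the address 192.0.2.10/24, the interface name eth0 and the function names are hypothetical values chosen for the example. Failover software normally performs this step itself.

```python
from typing import Dict, List


def vip_command(vip: str, prefix_len: int, interface: str, action: str) -> List[str]:
    """Build the iproute2 command that adds or removes the VIP on one host."""
    if action not in ("add", "del"):
        raise ValueError("action must be 'add' or 'del'")
    return ["ip", "addr", action, f"{vip}/{prefix_len}", "dev", interface]


def failover_plan(vip: str, prefix_len: int, interface: str) -> Dict[str, List[str]]:
    """Commands to run on each host when the VIP moves from primary to backup."""
    return {
        # Release the address on the primary, if that host is still reachable.
        "primary": vip_command(vip, prefix_len, interface, "del"),
        # Claim the address on the backup so that clients reach it via the same VIP.
        "backup": vip_command(vip, prefix_len, interface, "add"),
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for host, command in failover_plan("192.0.2.10", 24, "eth0").items():
        print(host, " ".join(command))
```

Because the clients are configured with the VIP only, nothing on their side needs to know which physical host currently holds it.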

This way, the other servers do not have to change their configuration in any way. Normally they only lose their current transactions/connections (those made to the server that went down); operation then resumes as normal with subsequent requests/transactions, which are handled by the backup host.
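
From the point of view of a server using the service, a failover looks like a dropped connection followed by a working one at the same address. The sketch below shows one way a caller might cope with that. The function name call_via_vip, the number of attempts and the delay are assumptions made for the example, and whether a lost transaction may safely be resubmitted depends on the application.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_via_vip(
    request: Callable[[], T],                       # performs one request against the VIP
    attempts: int = 3,                              # assumed retry budget
    delay_seconds: float = 1.0,                     # assumed pause while the backup takes over
    sleep: Callable[[float], None] = time.sleep,    # injectable so tests need not wait
) -> T:
    """Retry a request that failed because the connection was lost."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionError:
            # The in-flight request is lost; the next attempt uses the same VIP,
            # which by then may be held by the backup host.
            if attempt == attempts:
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

The caller never changes the address it connects to; only the retry is needed to get past the moment of failover.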