The info cluster synchronizes files between the web servers with the wwwpush command, which is called when the -push flag is given to wwwuinst or wwwdinst (the tools used to install files). The same mechanism is used to synchronize FTP files, but for simplicity this document refers only to the web servers.
The main program used to synchronize files is rdist, a UNIX program. The host info.cac.washington.edu ("info" in this document) is the rdist client and the web servers are the rdist servers (because the rdist command is initiated on the host info). Much terminology specific to rdist is used in this document; the terms are explained in the rdist documentation.
Usually files and directories are pushed by content developers, but all directories are automatically pushed every morning, forcing everything to be in sync.
The steps involved in using the wwwpush command are:
The major steps are described in detail below.
The locking mechanism on info affects both files and directories. If a directory is locked, then all files under that directory are implicitly locked too. If a file is locked, then all of its parent directories are implicitly locked as well.
For example, if the path world/webinfo/behind/ is locked, then a request to lock world/webinfo/behind/push.html will need to wait until the directory is unlocked. Likewise, if world/webinfo/behind/push.html is locked, a request to lock world/webinfo/behind/ will not be granted until the file is unlocked.
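The rule above amounts to a prefix test on path components: two lock requests conflict when the paths are identical or when one is an ancestor of the other. The following is a minimal sketch of that test only; the function names conflicts and can_lock are hypothetical, and the actual lock storage and waiting logic on info are not described here and are not modeled.

```python
from pathlib import PurePosixPath


def conflicts(requested: str, locked: str) -> bool:
    """Return True if the paths are equal or one is an ancestor of the other."""
    a = PurePosixPath(requested).parts
    b = PurePosixPath(locked).parts
    shorter = min(len(a), len(b))
    # If the shorter path is a component-wise prefix of the longer one,
    # one path contains the other, so the locks overlap.
    return a[:shorter] == b[:shorter]


def can_lock(requested: str, held_locks: list[str]) -> bool:
    """Return True if no currently held lock overlaps the requested path."""
    return not any(conflicts(requested, held) for held in held_locks)
```

With world/webinfo/behind/ held, can_lock("world/webinfo/behind/push.html", ...) is false, while a request for a sibling such as world/webinfo/other.html would be allowed.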
To tell rdist which files to send to the web servers, the push mechanism builds one or more distfiles. As a basis for these distfiles, info keeps a skeleton distfile named distfile.skel. For the web servers there are several distfile.skel files, each used to build distfiles for a different kind of web server; if a push command involves files that a particular distfile.skel does not use, that distfile.skel is skipped.
Each distfile.skel has a list of files which should not be sent to that particular server. When distfiles are built, files to be pushed that appear in the exception list (or match the exception patterns) are not added as source files. Target hostnames are not added either, since those are defined on the command line. The generated distfiles are stored in a temporary area under unique names, so that multiple push processes don't overwrite each other's distfiles.
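The filtering and unique-naming steps can be sketched as follows. This is an illustration under assumptions, not the actual push code: the names select_sources and write_unique are hypothetical, shell-style wildcards are assumed for the exception patterns, and the file written is a placeholder (skeleton text followed by one source path per line), not real distfile syntax.

```python
import fnmatch
import os
import tempfile


def select_sources(requested, exceptions, patterns):
    """Drop paths that are listed as exceptions or match an exception pattern."""
    excluded = set(exceptions)
    return [
        path
        for path in requested
        if path not in excluded
        and not any(fnmatch.fnmatchcase(path, pat) for pat in patterns)
    ]


def write_unique(skeleton_text, sources, directory=None):
    """Write the skeleton plus the selected sources to a uniquely named file."""
    # mkstemp creates the file atomically under a name no other process has,
    # so concurrent pushes cannot overwrite each other's files.
    fd, path = tempfile.mkstemp(prefix="distfile.", dir=directory, text=True)
    with os.fdopen(fd, "w") as out:
        out.write(skeleton_text)
        for source in sources:
            out.write(source + "\n")  # placeholder layout, not distfile syntax
    return path
```

Note that with fnmatch-style matching a `*` also matches `/`, so a pattern such as `*.bak` excludes matching files at any depth; whether the real exception patterns behave this way is an assumption.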
To prevent info from doing too many rdists at one time, info has an rdist locking mechanism which limits the number of push processes. Once a process is given an rdist slot, it can actually start as many rdist processes as it wants, so the maximum number of slots needs to be chosen to take this into account.
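One common way to limit the number of concurrent processes is a fixed set of slot files, each claimed by exclusive creation. The sketch below shows that general technique only; it is not necessarily how the rdist locking on info is implemented, and the names acquire_slot, release_slot, and the slot.N file naming are hypothetical.

```python
import os


def acquire_slot(slot_dir, max_slots):
    """Try to claim one of max_slots slot files; return its path or None."""
    for index in range(max_slots):
        path = os.path.join(slot_dir, f"slot.{index}")
        try:
            # O_EXCL makes creation fail if another process holds this slot.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        with os.fdopen(fd, "w") as slot:
            slot.write(str(os.getpid()))  # record the holder for debugging
        return path
    return None  # all slots are taken; the caller should wait and retry


def release_slot(path):
    """Give the slot back so another push process can claim it."""
    os.unlink(path)
```

Because each slot holder may start several rdist processes, the effective ceiling on concurrent rdists is the slot count multiplied by the number of rdists each push starts, which is why the slot count has to be chosen with that in mind.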
While the rdists can be done serially, doing them in parallel greatly increases performance, since much of the time rdist on info will be waiting for a response from the remote web server (which is either reading its directory and comparing files, or writing a new or updated file).
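Since each rdist spends most of its time waiting on the remote host, running one per host from a small thread pool overlaps those waits. The following is a minimal sketch of that idea; the exact rdist command line is not given in this document, so the command is supplied by a caller-provided build_command function (a hypothetical name, as is push_parallel).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def push_parallel(hosts, build_command, max_workers=8):
    """Run one command per host concurrently; return {host: exit status}."""

    def run(host):
        # Each worker thread blocks on its own child process, so the waits
        # on the remote hosts overlap instead of adding up.
        result = subprocess.run(build_command(host), capture_output=True, text=True)
        return host, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, hosts))
```

A caller would pass a function that returns the argument list for one host, then inspect the returned mapping for nonzero exit statuses to find the hosts whose push failed.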
To further improve performance, multiple hosts are used to update the web servers, and each of these push servers runs parallel rdists.