Geographically Distributed Web-Server Systems


GeoWeb prototype

Distributed Web systems can use a request redirection mechanism as second-level dispatching, because DNS dispatching alone has limited control over the offered load. We have implemented a geographically distributed Web-server system in which request dispatching takes place in two stages, based on DNS dispatching and HTTP redirection.
Our prototype realizes a fully distributed control scheme, in which both the decision to activate the redirection process and the localization of the destination server are carried out by the Web servers themselves. On the basis of simulation studies and qualitative observations about the Web system architecture, we found that fully distributed schemes achieve performance comparable to that of their centralized counterparts, with much lower system complexity.

DNS dispatching is carried out by a simple round-robin algorithm. The actual load distribution is achieved through a redirection mechanism that each server activates autonomously.
The key features of our prototype are:

Figure 1 depicts the main system components of the GeoWeb prototype. The authoritative DNS server (A-DNS) performs only the first-level dispatching among the Web servers. It does not communicate with the software components located on the Web servers, which manage the redirection process autonomously. The A-DNS assigns address requests to the Web servers through a round-robin policy; in our prototype, we use BIND 9.2.1 as DNS server software.
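As an illustration of the first-level policy only, the following minimal sketch shows load-unaware round-robin assignment. It is not how BIND is configured in the prototype; the class name RoundRobinDispatcher and its resolve method are hypothetical, and the address pool is just an example.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Hands out Web-server addresses in strict rotation, ignoring server load.

    Hypothetical helper for illustration; it mimics the policy, not BIND itself.
    """

    def __init__(self, addresses):
        addresses = list(addresses)
        if not addresses:
            raise ValueError("at least one server address is required")
        self._rotation = cycle(addresses)

    def resolve(self):
        # Each address request simply receives the next server in the rotation.
        return next(self._rotation)


# Example pool (illustrative addresses).
dispatcher = RoundRobinDispatcher(["160.80.85.38", "155.185.54.131"])
answers = [dispatcher.resolve() for _ in range(4)]
```

Because the rotation never looks at server load, any imbalance has to be corrected afterwards by the servers themselves, which is the role of the second-level redirection.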
On each Web server, the redirection schemes require a server load monitor and a redirection component. For this purpose, our prototype uses a modified Apache server. The monitor tracks the load of the server resources and sends this information to the redirection component of its server. The redirection component implements the activation, request selection and node localization policies, as detailed in Figure 2.

Figure 1: High-level system architecture.

We have implemented the GeoWeb prototype as an extension of the Apache HTTP server. We chose Apache as the basis for our prototype for a number of reasons:


The GeoWeb redirection functionalities have been realized in four distinct Apache modules. These modules are implemented using the API and the module interface provided by Apache. The module interface lets us insert processing handlers into the Apache server's request processing cycle, which is divided into several phases. The GeoWeb modules then use those handlers to process incoming requests. Specifically, four distinct Apache modules have been realized for:

In addition to these modules, two external programs, load_monitor and site_parser, have been written in C. The first monitors the load of the Web server, while the second provides information needed by the request-aware request selection policies.
load_monitor evaluates the utilization of the server's CPU(s) and disk(s) by analyzing the Linux /proc/stat system file. site_parser pre-processes the document tree of the Web site to compute the mean number of embedded objects per page and the mean page size. mod_selection can use this information, depending on the chosen request selection policy. Figure 2 shows a high-level data flow diagram (DFD) of the system architecture with the four modules.
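To make the load measurement concrete, here is a minimal sketch of how CPU utilization can be derived from two samples of /proc/stat. It is written in Python for brevity and is not the code of load_monitor, which is a C program. The function names and the sample counters are hypothetical, disk utilization is not covered, and treating only the idle counter as non-busy time is an assumption of this sketch.

```python
def parse_cpu_line(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Use at most eight counters: user nice system idle iowait irq
            # softirq steal. Later fields (guest, guest_nice) are already
            # included in user/nice, so they are skipped to avoid double counting.
            values = [int(v) for v in fields[1:9]]
            return values[3], sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before, after):
    """Fraction of non-idle time between two /proc/stat snapshots (0.0 to 1.0)."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0  # No time has passed between the samples.
    return 1.0 - (idle1 - idle0) / elapsed


# Made-up snapshots; on a Linux host they would be two reads of /proc/stat
# taken a short interval apart.
SAMPLE_BEFORE = "cpu  100 0 50 850 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0\n"
SAMPLE_AFTER = "cpu  160 0 90 950 0 0 0 0\ncpu0 160 0 90 950 0 0 0 0\n"
print(cpu_utilization(SAMPLE_BEFORE, SAMPLE_AFTER))  # prints 0.5
```

The counters in /proc/stat are cumulative since boot, so utilization is only meaningful as a difference between two samples, which is why the sketch takes a pair of snapshots.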

Figure 2: DFD diagram of the redirection system.

In our prototype, we use the redirection mechanism provided by the HTTP protocol, which allows a Web server to respond to a client request with a 302 status code in the response header. This code instructs the client to resubmit its request to another node. The main drawback of HTTP redirection is that it adds an extra round-trip time to the request processing, as every redirection requires the client to initiate a new TCP connection with another node. This extra round-trip time increases the network component of the response time, which implies that redirection must be used sparingly and with caution. Nevertheless, a redirected page does not necessarily experience a slower response time, because the increased network time may be compensated by a significant reduction in the server component of the response time, especially when the first contacted Web server is highly loaded.
To avoid multiple redirections of the same request, when constructing the new URL to be placed in the Location field of the HTTP response header, the localization module also inserts into the URL a notification that a redirection has occurred. Specifically, the redirecting server not only rewrites the hostname with the IP address of the destination server, but also adds a marker string (a <redirected> string) to the URL pathname, so as to notify the destination server that the request has already been redirected. Accordingly, as its first step the activation module checks the requested URL to find out whether the request has already been redirected. As an example, let us assume that the request http://www.site.org/papers.html has been selected for redirection from server 160.80.85.38 to server 155.185.54.131; then, the redirecting server inserts into the Location field the URL http://155.185.54.131/<redirected>/papers.html.
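The URL rewriting and the check for an earlier redirection can be sketched as follows. This is an illustration in Python, not the code of the Apache modules. The function names build_location and already_redirected are hypothetical, and the literal <redirected> marker is taken from the example above. Angle brackets are not valid unescaped URL characters, so an actual deployment would presumably use a URL-safe string.

```python
from urllib.parse import urlsplit, urlunsplit

# Marker string copied from the example in the text (assumed literal here).
REDIRECT_MARKER = "<redirected>"


def already_redirected(url):
    """True if the first path segment of the URL is the redirection marker."""
    segments = urlsplit(url).path.split("/")
    return len(segments) > 1 and segments[1] == REDIRECT_MARKER


def build_location(url, destination_ip):
    """Return the URL for the Location header, or None if the request
    has already been redirected once and must be served locally."""
    if already_redirected(url):
        return None
    parts = urlsplit(url)
    # Replace the hostname with the destination IP and prefix the marker.
    new_path = "/" + REDIRECT_MARKER + (parts.path or "/")
    return urlunsplit(
        (parts.scheme, destination_ip, new_path, parts.query, parts.fragment)
    )


location = build_location("http://www.site.org/papers.html", "155.185.54.131")
# location == "http://155.185.54.131/<redirected>/papers.html"
```

When the destination server receives the rewritten URL, the same check returns None, so the request is served there instead of being redirected a second time.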



http://www.ce.uniroma2.it/geoweb/prototype.html
Last updated: October 25, 2002