GeoWeb prototype
Distributed Web systems can use a request redirection mechanism
as a second-level dispatching stage, because DNS dispatching alone has
limited control over the offered load.
We have implemented a geographically distributed Web-server system,
where request dispatching takes place in two stages and
is based on DNS dispatching and HTTP redirection.
Our prototype realizes a fully distributed control scheme,
where both the decisions for the activation of the redirection
process and for the localization of the destination server
are carried out by the Web servers.
Indeed, on the basis of simulation studies and qualitative
observations about the Web system architecture, we found that
fully distributed schemes achieve a performance comparable
to that of the centralized counterparts, and present
a much lower system complexity.
DNS dispatching is carried out by a simple round-robin algorithm.
The actual load distribution is achieved through a redirection mechanism
that each server activates autonomously.
The key features of our prototype are:
- a distributed policy for the activation
of the redirection process;
the activation decision policy we consider is threshold-based
and uses local load information.
In particular, each Web server activates redirection only when its load
exceeds a given threshold.
- a distributed policy for the localization of the server
to which the request is redirected; we have currently implemented only
stateless policies such as random and round-robin that do
not require any information exchange among the servers in the system.
- a request selection policy that determines
which requests for an entire Web page are to be redirected.
We have implemented request-blind policies
that redirect all or a random subset of requests reaching the server
when the redirection mechanism is activated, as well as
request-aware policies that consider the characteristics of the
HTTP request content (in particular, the page size and the
number of objects composing the page).
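
As an illustration, the three policies listed above can be sketched in a few lines of Python. This is only a minimal sketch under our own assumptions, not the code of the Apache modules: the class name RedirectionPolicy, the threshold value of 0.8, the redirect_fraction parameter and the peer addresses are all hypothetical, and only the request-blind selection policy is shown.

```python
import itertools
import random


class RedirectionPolicy:
    """Hypothetical sketch of the per-server redirection decisions."""

    def __init__(self, peers, threshold=0.8, redirect_fraction=1.0, rng=None):
        self.peers = list(peers)                    # other Web servers (assumed known)
        self.threshold = threshold                  # assumed utilization threshold in [0, 1]
        self.redirect_fraction = redirect_fraction  # 1.0 redirects all requests
        self._rr = itertools.cycle(self.peers)      # round-robin state is purely local
        self._rng = rng or random.Random()

    def is_active(self, local_load):
        # Activation: threshold-based, uses only local load information.
        return local_load > self.threshold

    def select(self):
        # Request-blind selection: redirect all or a random subset of requests.
        return self._rng.random() < self.redirect_fraction

    def locate_round_robin(self):
        # Stateless localization: no information exchange among servers.
        return next(self._rr)

    def locate_random(self):
        return self._rng.choice(self.peers)

    def decide(self, local_load):
        """Return the destination peer, or None to serve the request locally."""
        if self.is_active(local_load) and self.select():
            return self.locate_round_robin()
        return None
```

With two peers and the default parameters, decide(0.5) serves the request locally, while successive calls with a load above the threshold alternate between the two peers.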
Figure 1 depicts the main system components of the GeoWeb prototype.
The authoritative DNS server performs only the first-level dispatching
among the Web servers, without
communicating with the software components located on the Web servers
that manage the redirection process autonomously.
The A-DNS executes the task of assigning address requests
to the Web servers through a round-robin policy; in our prototype,
we use BIND 9.2.1
as DNS server software.
In the Web server, the redirection schemes require a server load
monitor and a redirection component.
For this purpose, our prototype uses a modified
Apache server.
The monitor tracks the load of the server resources and sends this
information to the redirection component of its server.
The redirection component implements
the activation, request selection and node localization policies,
as detailed in Figure 2.
Figure 1: High-level system architecture.
We have implemented the GeoWeb prototype
as an extension of the Apache HTTP server.
We chose Apache for a number of reasons:
- Apache is currently the most widely used Web server software;
the recent Netcraft
Web Server Survey
found that over 60% of Web sites use Apache.
- Apache source code is freely available.
- Apache provides a Web server API that we can use to extend the
functionality of the Web server itself by linking new modules
directly to the server executable.
The GeoWeb redirection functionalities have been realized in four distinct
Apache modules. These modules are implemented using the API and
the module interface provided by Apache.
Apache's module interface lets us insert processing
handlers into the server's request processing cycle, which is divided
into several phases.
The GeoWeb modules then use those handlers to process incoming
requests.
Specifically, four distinct Apache modules have been realized for:
- the activation process (mod_activation),
- the selection decision (mod_selection),
- the localization decision (mod_localization),
- the construction of the appropriate HTTP response header when
the request has to be redirected (mod_redirection).
In addition to these modules, two external programs, load_monitor
and site_parser, have been written in C.
The former monitors the load of the Web server, while the latter
provides information needed by the request-aware selection policies.
load_monitor evaluates the utilization of the server's
CPU(s) and disk(s) by analyzing the /proc/stat Linux system file.
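
The CPU part of such a monitor can be sketched as follows. This is a Python illustration under our own assumptions, not the C code of load_monitor: the function names are hypothetical, utilization is taken as the fraction of non-idle time between two samples of the aggregate "cpu" line, and disk utilization is not covered.

```python
def cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Columns: user nice system idle [iowait irq softirq steal ...].
            # Older kernels report only the first four; the trailing guest
            # columns of newer kernels are skipped because they are already
            # included in user/nice.
            values = [int(v) for v in fields[1:9]]
            total = sum(values)
            idle = values[3]
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before, after):
    """Fraction of non-idle CPU time between two /proc/stat snapshots."""
    busy_b, total_b = cpu_times(before)
    busy_a, total_a = cpu_times(after)
    elapsed = total_a - total_b
    return (busy_a - busy_b) / elapsed if elapsed > 0 else 0.0


def read_proc_stat(path="/proc/stat"):
    # Linux only; call twice with a pause in between to obtain two snapshots.
    with open(path) as f:
        return f.read()
```

In use, one would call read_proc_stat(), wait for a sampling interval, call it again, and pass both snapshots to cpu_utilization; the resulting value could then be compared with the activation threshold.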
site_parser pre-processes the document tree of the Web site to
find out the mean number of embedded objects in the pages and the
mean page size. This information can be used by mod_selection
on the basis of the chosen request selection policy.
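
The kind of pre-processing performed by site_parser might look like the sketch below. It is written in Python for brevity and rests on several assumptions of ours: pages are files ending in .html or .htm, embedded objects are the src targets of img, script and embed tags, and the page size is the size of the HTML file plus that of every embedded object found through a relative path on the local disk. The names EmbeddedObjectCounter and page_stats are hypothetical.

```python
import os
from html.parser import HTMLParser


class EmbeddedObjectCounter(HTMLParser):
    """Collect the src attribute of tags that embed an object in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag in ("img", "script", "embed") and src:
            self.sources.append(src)


def page_stats(root):
    """Return (mean embedded objects per page, mean page size in bytes)."""
    counts, sizes = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            parser = EmbeddedObjectCounter()
            with open(path, encoding="utf-8", errors="replace") as f:
                parser.feed(f.read())
            size = os.path.getsize(path)
            for src in parser.sources:
                # Only relative references that resolve to local files
                # contribute to the page size in this sketch.
                obj = os.path.normpath(os.path.join(dirpath, src))
                if os.path.isfile(obj):
                    size += os.path.getsize(obj)
            counts.append(len(parser.sources))
            sizes.append(size)
    if not counts:
        return 0.0, 0.0
    return sum(counts) / len(counts), sum(sizes) / len(sizes)
```

A request-aware selection policy could then compare the requested page against these two means, for example redirecting only pages that are larger than average.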
Figure 2 shows a high-level data flow diagram (DFD) of the
system architecture with the four modules.
Figure 2: Data flow diagram of the redirection system.
In our prototype, we use the redirection mechanism provided by the HTTP protocol,
which allows a Web server to respond to a client request
with a 302 status code in the response header.
This code instructs the client to resubmit its request to another node.
The main drawback of HTTP redirection is that it introduces an
extra round-trip time to the request processing, as every redirection
requires the client to initiate a new TCP connection with another node.
This extra round-trip time increases the network component of the response time
and implies that redirection must be used sparingly.
Nevertheless, a redirected page does not necessarily experience a slower
response time, because the increased network time may be compensated
by a significant reduction in the response time component due to the
server, especially when the first contacted Web server is highly loaded.
To avoid multiple redirections of the same request, when constructing
the new URL to be placed into the location field in the HTTP response header,
the localization module also inserts in the URL
a notification of the occurred redirection.
Therefore, as a first step, the activation module checks the requested URL
to determine whether the request has already been redirected.
When building the new URL for the location field of the response header,
the redirecting server not only replaces the hostname with the IP address
of the destination server, but also adds a marker string
to the URL pathname (a <redirected> string),
so as to notify the destination server
that the request has already been redirected.
As an example, let us assume that the request
http://www.site.org/papers.html
has been selected for redirection from server 160.80.85.38
to server 155.185.54.131; then, the redirecting server inserts into
the location field the URL
http://155.185.54.131/<redirected>/papers.html.
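
The URL rewriting in this example can be sketched as follows. This is a Python illustration, not the code of the GeoWeb modules; the function names are hypothetical, and the marker is kept literally as written above (in a real deployment the angle brackets would presumably have to be percent-encoded or replaced by a URL-safe string).

```python
from urllib.parse import urlsplit, urlunsplit

REDIRECT_MARKER = "<redirected>"  # marker string as written in the text


def already_redirected(path):
    """True if the URL pathname carries the redirection marker."""
    return path.startswith("/" + REDIRECT_MARKER + "/")


def build_location(url, dest_ip):
    """Rewrite the hostname to dest_ip and prepend the marker to the pathname."""
    parts = urlsplit(url)
    path = parts.path or "/"
    new_path = "/" + REDIRECT_MARKER + path
    return urlunsplit((parts.scheme, dest_ip, new_path, parts.query, parts.fragment))


def redirect_response(url, dest_ip):
    """Return (status, headers) for a redirection, or None if not allowed."""
    if already_redirected(urlsplit(url).path):
        return None  # never redirect the same request twice
    return 302, {"Location": build_location(url, dest_ip)}
```

Calling redirect_response("http://www.site.org/papers.html", "155.185.54.131") yields a 302 status with the location URL shown above, while a request whose pathname already starts with the marker is served locally.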
http://www.ce.uniroma2.it/geoweb/prototype.html
Last updated: October 25, 2002