Set up a Web server cluster in 5 easy steps
Get up and running with the Linux Virtual Server and Linux-HA.org's Heartbeat
Summary: Construct a highly available Apache Web server cluster that spans multiple physical or virtual Linux® servers in 5 easy steps with Linux Virtual Server and Heartbeat v2.
Date: 22 Aug 2007
Spreading a workload across multiple processors, coupled with various software recovery techniques, provides a highly available environment and enhances overall RAS (Reliability, Availability, and Serviceability) of the environment. Benefits include faster recovery from unplanned outages, as well as minimal effects of planned outages on the end user.
To get the most out of this article, you should be familiar with Linux and basic networking, and you should have Apache servers already configured. Our examples are based on standard SUSE Linux Enterprise Server 10 (SLES10) installations, but savvy users of other distributions should be able to adapt the methods shown here.
This article illustrates the robust Apache Web server stack with 6 Apache server nodes (though 3 nodes is sufficient for following the steps outlined here) as well as 3 Linux Virtual Server (LVS) directors. We used 6 Apache server nodes to drive higher workload throughputs during testing and thereby simulate larger deployments. The architecture presented here should scale to many more directors and backend Apache servers as your resources permit, but we haven't tried anything larger ourselves. Figure 1 shows our implementation using the Linux Virtual Server and the linux-ha.org components.
Figure 1. Linux Virtual Servers and Apache
As shown in Figure 1, the external clients send traffic to a single IP address, which may exist on any of the LVS director machines. The director machines actively monitor the pool of Web servers they relay work to.
Note that the workload progresses from the left side of Figure 1 toward the right. The floating resource address for this cluster will reside on one of the LVS director instances at any given time. The service address may be moved manually through a graphical configuration utility, or (more commonly) it can be self-managing, depending on the state of the LVS directors. Should any director become ineligible (due to loss of connectivity, software failure, or similar) the service address will be relocated automatically to an eligible director.
The floating service address must span two or more discrete hardware instances in order to continue operation with the loss of one physical machine. With the configuration decisions presented in this article, each LVS director is able to forward packets to any real Apache Web server regardless of physical location or proximity to the active director providing the floating service address. This article shows how each of the LVS directors can actively monitor the Apache servers in order to ensure requests are sent only to operational back-end servers.
With this configuration, practitioners have successfully failed entire Linux instances with no interruption of service to the consumers of the services enabled on the floating service address (typically http and https Web requests).
You can duplicate our configuration using an entirely open source software stack consisting of Heartbeat technology components provided by linux-ha.org, and server monitoring via mon and Apache. As stated, we used SUSE Linux Enterprise Server for testing our configuration.
All of the machines used in the LVS scenario reside on the same subnet and use the Network Address Translation (NAT) configuration. Numerous other network topologies are described at the Linux Virtual Server Web site (see Resources); we favor NAT for simplicity. For added security, you should limit traffic across firewalls to only the floating IP address that is passed between the LVS directors.
The Linux Virtual Server suite provides a few different methods to accomplish a transparent HA back-end infrastructure. Each method has advantages and disadvantages. LVS-NAT operates on a director server by grabbing incoming packets that are destined for configuration-specified ports and rewriting the destination address in the packet header dynamically. The director does not process the data content of the packets itself, but rather relays them on to the realservers. The destination address in the packets is rewritten to point to a given realserver from the cluster. The packet is then placed back on the network for delivery to the realserver, and the realserver is unaware that anything has gone on. As far as the realserver is concerned, it has simply received a request directly from the outside world. The replies from the realserver are then sent back to the director where they are again rewritten to have the source address of the floating IP address that clients are pointed at, and are sent along to the original client.
Using the LVS-NAT approach means the realservers require only simple TCP/IP functionality. The other modes of LVS operation, namely LVS-DR and LVS-Tun, require more complex networking concepts. The major benefit behind the choice of LVS-NAT is that very little alteration is required to the configuration of the realservers. In fact, the hardest part is remembering to set the routing statements properly.
Begin by making a pool of Linux server instances, each running Apache Web server, and ensure that the servers are working as designed by pointing a Web browser to each realserver's IP address. Typically, a standard install will be configured to listen on port 80 on its own IP address (in other words, on a different IP for each realserver).
Next, configure the default Web page on each server to display a static page containing the hostname of the machine serving the page. This ensures that you always know which machine you are connecting to during testing.
As a precaution, check that IP forwarding on these systems is OFF by issuing the following command:
# cat /proc/sys/net/ipv4/ip_forward
If it's not OFF and you need to disable it, issue this command:
# echo "0" >/proc/sys/net/ipv4/ip_forward
An easy way to ensure that each of your realservers is properly listening on the http port (80) is to use an external system and perform a scan. From some other system with network connectivity to your server, you can use the nmap utility to make sure the server is listening.
Listing 1. Using nmap to make sure the server is listening
Be aware that some organizations frown on the use of port scanning tools such as nmap: make sure that your organization approves before using it.
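One way to script that check across the whole pool is sketched below. The realserver addresses are hypothetical placeholders rather than values from our lab, and the script only prints the nmap commands so that you can review them (and get approval) before anything is actually scanned. The -p option restricts the scan to the listed port.

```shell
#!/bin/bash
# Sketch: print one nmap command per realserver to verify the http port.
# The addresses below are hypothetical placeholders; substitute your own.
REALSERVERS="192.168.71.220 192.168.71.221 192.168.71.222"

# Emit the scan command for a single host (printed, not executed).
scan_cmd() {
    echo "nmap -p 80 $1"
}

for rs in $REALSERVERS; do
    scan_cmd "$rs"
done
```

Once the printed commands look right, you could pipe the output to sh to run them. A listening server should be reported with port 80 in the open state.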
Next, point your Web browser to each realserver's actual IP address to ensure each is serving the appropriate page as expected. Once this is completed, go to Step 2.
Now you are ready to construct the 3 LVS director instances needed. If you are doing a fresh install of SUSE Linux Enterprise Server 10 for each of the LVS directors, be sure to select the high availability packages relating to heartbeat, ipvsadm, and mon during the initial installation. If you have an existing installation, you can always use a package management tool, such as YAST, to add these packages after your base installation. It is strongly recommended that you add each of the realservers to the /etc/hosts file. This will ensure there is no DNS-related delay when servicing incoming requests.
At this time, double check that each of the directors is able to perform a timely ping to each of the realservers:
Listing 2. Pinging the realservers
Once completed, install ipvsadm, Heartbeat, and mon from the native package management tools on the server. Recall that Heartbeat will be used for intra-director communication, and mon will be used by each director to maintain information about the status of each realserver.
If you have worked with LVS before, keep in mind that configuring Heartbeat Version 2 on SLES10 is quite a bit different than it was for Heartbeat Version 1 on SLES9. Where Heartbeat Version 1 used files (haresources, ha.cf, and authkeys) stored in the /etc/ha.d/ directory, Version 2 uses the new, XML-based Cluster Information Base (CIB). The recommended approach for upgrading is to use the haresources file to generate the new cib.xml file. The contents of a typical ha.cf file are shown in Listing 3.
We took the ha.cf file from a SLES9 system and added the bottom 3 lines (respawn, ping, and crm) for Version 2. If you have an existing version 1 configuration, you may opt to do the same. If you are using these instructions for a new installation, you can copy Listing 3 and modify it to suit your production environment.
Listing 3. A sample /etc/ha.d/ha.cf config file
The respawn directive specifies a program to run and monitor while it runs. If this program exits with anything other than exit code 100, it will be automatically restarted. The first parameter is the user id to run the program under, and the second parameter is the program to run. The -m parameter sets the attribute pingd to 100 times the number of ping nodes reachable from the current machine, and the -d parameter delays 5 seconds before modifying the pingd attribute in the CIB. The ping directive declares the PingNode to Heartbeat, and the crm directive specifies whether Heartbeat should run the 1.x-style cluster manager or the 2.x-style cluster manager that supports more than 2 nodes.
This file should be identical on all the directors. It is absolutely vital that you set the permissions appropriately such that the hacluster daemon can read the file. Failure to do so will cause a slew of warnings in your log files that may be difficult to debug.
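As an illustration of how the directives described above fit together, the following sketch prints a small ha.cf. Treat every value as an assumption: litsha23 is only a guess at a third director name, the ping node address is a placeholder, and the timing values are illustrative rather than the settings we tested.

```shell
#!/bin/bash
# Sketch: emit a Heartbeat v2 ha.cf. All values are illustrative assumptions.
# Each node name must match `uname -n` on that director.
NODES="litsha21 litsha22 litsha23"   # litsha23 is a guessed name
PING_NODE="192.168.71.1"             # hypothetical ping node, e.g. a gateway

gen_ha_cf() {
    local n
    echo "logfacility local0"
    echo "udpport 694"
    echo "keepalive 2"      # seconds between heartbeats
    echo "deadtime 30"      # seconds of silence before a node is declared dead
    echo "bcast eth0"       # send heartbeats as broadcasts on eth0
    for n in $NODES; do
        echo "node $n"
    done
    # The three version 2 additions discussed in the text:
    echo "crm yes"
    echo "ping $PING_NODE"
    echo "respawn hacluster /usr/lib64/heartbeat/pingd -m 100 -d 5s"
}

gen_ha_cf
```

Redirecting the output to a scratch file first and comparing it with your existing /etc/ha.d/ha.cf is a low-risk way to adopt it. After copying the result to each director, confirm with ls -l that the file is readable as described above.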
For a release 1-style Heartbeat cluster, the haresources file specifies the node name and networking information (floating IP, associated interface, and broadcast). For us, this file remained unchanged:
This file will be used only to generate the cib.xml file.
The authkeys file specifies a shared secret allowing directors to communicate with one another. The shared secret is simply a password that all the heartbeat nodes know and use to communicate with one another. The secret prevents unwanted parties from trying to influence the heartbeat server nodes. This file also remained unchanged:
1 sha1 ca0e08148801f55794b23461eb4106db
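If you are creating this file from scratch, note that Heartbeat also expects an auth line that selects which numbered key to use, and that the file should be readable only by its owner (mode 600). The following is a minimal sketch under those assumptions; the function name is hypothetical, and the demonstration writes to a scratch file rather than to /etc/ha.d/authkeys.

```shell
#!/bin/bash
# Sketch: write an authkeys file with owner-only permissions.
# Usage: write_authkeys <path> <secret>
write_authkeys() {
    local path="$1" secret="$2"
    # "auth 1" selects key 1; the next line defines key 1 as sha1 plus secret.
    printf 'auth 1\n1 sha1 %s\n' "$secret" > "$path"
    chmod 600 "$path"   # keep the shared secret private to the owner
}

# Demonstration against a scratch file.
demo=$(mktemp)
write_authkeys "$demo" "ca0e08148801f55794b23461eb4106db"
cat "$demo"
rm -f "$demo"
```

Use the same secret on every director, and choose your own value rather than reusing the sample key shown in this article.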
The next few steps show you how to convert the version 1 haresources file to the new version 2 XML-based configuration format (cib.xml). Though it should be possible to simply copy and use the configuration file in Listing 4 as a starting point, it is strongly suggested that you follow along to tailor the configuration for your deployment.
To convert file formats to the XML-based CIB (Cluster Information Base) file you will use in deployment, issue the following command:
python /usr/lib64/heartbeat/haresources2cib.py /etc/ha.d/haresources > /var/lib/heartbeat/crm/test.xml
A configuration file similar to the one shown in Listing 4 will be generated and placed in /var/lib/heartbeat/crm/test.xml.
Listing 4. Sample CIB.xml file
Once your configuration file is generated, move test.xml to cib.xml, change the owner to hacluster and the group to haclient, and then restart the heartbeat process.
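Those three actions can be sketched as the commands below. The script only prints them; the init script path is an assumption for SLES-style systems and may differ on your distribution.

```shell
#!/bin/bash
# Sketch: print the commands that activate the generated CIB.
# The init script path is an assumption; adjust for your distribution.
CRM_DIR=/var/lib/heartbeat/crm

activate_cib_cmds() {
    echo "mv $CRM_DIR/test.xml $CRM_DIR/cib.xml"
    echo "chown hacluster:haclient $CRM_DIR/cib.xml"
    echo "/etc/init.d/heartbeat restart"
}

activate_cib_cmds
```

Run the printed commands as root on each director once you are satisfied with the contents of test.xml.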
Now that the heartbeat configuration is complete, set heartbeat to start at boot time on each of the directors. To do this, issue the following command (or equivalent for your distribution) on each director:
# chkconfig heartbeat on
Restart each of the LVS directors to ensure the heartbeat service starts properly at boot. By halting the machine that holds the floating resource IP address first, you can watch as the other LVS Director images establish quorum, and instantiate the service address on a newly-elected primary node within a matter of seconds. When you bring the halted director image back online, the machines will re-establish quorum across all nodes, at which time the floating resource IP may transfer back. The entire process should take only a few seconds.
Additionally, at this time you may wish to use the graphical utility for the heartbeat process, hb_gui (see Figure 2), to manually move the IP address around in the cluster by setting various nodes to the standby or active state. Retry these steps numerous times, disabling and re-enabling various machines that are active or inactive. With the choice of configuration policy selected earlier, as long as quorum can be established and at least one node is eligible, the floating resource IP address will remain operational. During your testing, you can use simple pings to ensure that no packet loss occurs. When you have finished experimenting, you should have a strong feel for how robust your configuration is. Make sure you are comfortable with the HA configuration of your floating resource IP before continuing on.
Figure 2. Graphical configuration utility for the heartbeat process, hb_gui
Figure 2 illustrates the graphical console as it appears after login, showing the managed resources and associated configuration options. Note that you must log into the hb_gui console when you first launch the application; the credentials used will depend on your deployment.
Notice in Figure 2 how the nodes in the cluster, the litsha2* systems, are each in the running state. The system labeled litsha21 is the current active node, as indicated by the addition of a resource displayed immediately below and indented (IPaddr_1).
Also note that the selection labeled "No Quorum Policy" is set to the value "stop". This means that any isolated node releases resources it would otherwise own. The implication of that decision is that at any given time, 2 heartbeat nodes must be active to establish quorum (in other words, a voting majority). Even a single active, 100% operational node will voluntarily release the resource if it loses connection to its peer systems due to network failure, or if both the inactive peers halt simultaneously.
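The majority arithmetic behind that statement can be sketched in a few lines of shell. This is only an illustration of the voting rule (more than half of the configured nodes), not code taken from Heartbeat; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: smallest number of nodes that forms a voting majority.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when the reachable node count still holds quorum.
# Arguments: total configured nodes, nodes currently reachable (including self).
has_quorum() {
    local total="$1" reachable="$2"
    [ "$reachable" -ge "$(quorum_needed "$total")" ]
}

quorum_needed 3    # prints 2
has_quorum 3 1 && echo "keep resources" || echo "release resources"    # prints "release resources"
```

With 3 directors, a node that can see only itself counts 1 of 3 votes, which is below the required 2, so under the "stop" policy it gives up the floating address.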
The next step is to take the floating resource IP address and build on it. Because LVS is intended to be transparent to remote Web browser clients, all Web requests must be funneled through the directors and passed on to one of the realservers. Then any results need to be relayed back to the director, which then returns the response to the client who initiated the Web page request.
To accomplish that flow of requests and responses, first configure each of the LVS directors to enable IP forwarding (thus allowing requests to be passed on to the realservers) by issuing the following commands:
# echo "1" >/proc/sys/net/ipv4/ip_forward
# cat /proc/sys/net/ipv4/ip_forward
If all was successful, the second command would return a "1" as output to your terminal. To add this permanently, add:
Next, to tell the directors to relay incoming HTTP requests to the HA floating IP address on to the realservers, use the ipvsadm command.
First, clear the old ipvsadm tables:
# /sbin/ipvsadm -C
Before you can configure the new tables, you need to decide what kind of workload distribution you want the LVS directors to use. On receipt of a connect request from a client, the director assigns a realserver to the client based on a "schedule," and you will set the scheduler type with the ipvsadm command. Available schedulers include:
- Round Robin (RR): New incoming connections are assigned to each realserver in turn.
- Weighted Round Robin (WRR): RR scheduling with additional weighting factor to compensate for differences in realserver capabilities such as additional CPUs, more memory, and so on.
- Least Connected (LC): New connections go to the realserver with the least number of connections. This is not necessarily the least-busy realserver, but it is a step in that direction.
- Weighted Least Connection (WLC): LC with weighting.
It is a good idea to use RR scheduling for testing, as it is easy to confirm. You may want to add WRR and LC to your testing routine to confirm that they work as expected. The examples shown here assume RR scheduling and its variants.
Next, create a script to enable ipvsadm service forwarding to the realservers, and place a copy on each LVS director. This script will not be necessary when the later configuration of mon is done to automatically monitor for active realservers, but it aids in testing the ipvsadm component until then. Remember to double-check for proper network and http/https connectivity to each of your realservers before executing this script.
Listing 5. The HA_CONFIG.sh file
As you can see in Listing 5, the script simply enables the ipvsadm services, then has virtually identical stanzas to forward Web and SSL requests to each of the individual realservers. We used the -m option to specify NAT, and weight each realserver equally with a weight of 1 (-w 1). The weights specified are superfluous when using normal round robin scheduling (as the default weight is always 1). The option is presented only so that you may deviate to select weighted round robin. To do so, change rr to wrr on the 2 consecutive lines below the comment about using round robin, and of course do not forget to adjust the weights accordingly. For more information about the various schedulers, consult the man page for ipvsadm.
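For readers who want a starting point, here is a minimal sketch of a script in the same spirit. It is not our HA_CONFIG.sh: the floating address and realserver addresses are hypothetical placeholders, and the script prints the ipvsadm commands instead of running them so that you can inspect the resulting table definition first.

```shell
#!/bin/bash
# Sketch of an HA_CONFIG.sh-style script; prints ipvsadm commands for review.
# VIP and REALSERVERS are hypothetical placeholders.
VIP="192.168.71.205"
REALSERVERS="192.168.71.220 192.168.71.221 192.168.71.222"
SCHED="rr"   # round robin; set to wrr for weighted round robin

lvs_cmds() {
    local port rs
    echo "/sbin/ipvsadm -C"    # clear any existing tables
    for port in 80 443; do
        # -A adds a virtual TCP service; -s selects the scheduler
        echo "/sbin/ipvsadm -A -t $VIP:$port -s $SCHED"
        for rs in $REALSERVERS; do
            # -a adds a realserver; -m selects NAT (masquerading); -w sets the weight
            echo "/sbin/ipvsadm -a -t $VIP:$port -r $rs:$port -m -w 1"
        done
    done
}

lvs_cmds
```

After running the printed commands as root on a director, ipvsadm -L -n lists the active table so that you can confirm both services and all realservers are present.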
You have now configured each director to handle incoming Web and SSL requests to the floating service IP by rewriting them and passing the work on to the realservers in succession. But in order to get traffic back from the realservers, and do the reverse process before handing the requests back to the client who made the request, you need to alter a few of the networking settings on the directors. This is necessary because of the decision to implement LVS directors and realservers in a flat network topology (that is, all on the same subnet). We need to perform the following steps to force the Apache response traffic back through the directors rather than answering directly themselves:
echo "0" > /proc/sys/net/ipv4/conf/all/send_redirects
echo "0" > /proc/sys/net/ipv4/conf/default/send_redirects
echo "0" > /proc/sys/net/ipv4/conf/eth0/send_redirects
This was done to prevent the active LVS director from trying to take a TCP/IP shortcut by informing the realserver and floating service IP to talk directly to one another (since they are on the same subnet). Normally redirects are useful, as they improve performance by cutting out unnecessary middlemen in network connections. But in this case, it would have prevented the response traffic from being rewritten as is necessary for transparency to the client. In fact, if redirects were not disabled on the LVS director, the traffic being sent from the realserver directly to the client would appear to the client as an unsolicited network response and would be discarded.
At this point, it is time to set the default route of each of the realservers to point at the service floating IP address to ensure all responses are passed back to the director for packet rewriting before being passed back to the client that originated the request.
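A minimal sketch of that routing change follows, using the ip tool from iproute2. The floating address is a hypothetical placeholder, and the script only prints the command to be run as root on each realserver.

```shell
#!/bin/bash
# Sketch: print the command that points a realserver's default route at the
# floating service address. VIP is a hypothetical placeholder.
VIP="192.168.71.205"

default_route_cmd() {
    echo "ip route replace default via $1"
}

default_route_cmd "$VIP"
```

Afterward, ip route show on the realserver should list the floating address as the default gateway. Making the route persist across reboots is distribution-specific.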
Once redirects are disabled on the directors, and the realservers are configured to route all traffic through the floating service IP, you may proceed to test your HA LVS environment. To test the work done thus far, point a Web browser on a remote client to the floating service address of the LVS directors.
For testing in the laboratory, we used a Gecko-based browser (Mozilla), though any browser should suffice. To ensure the deployment was successful, disable caching in the browser, and click the refresh button multiple times. With each refresh, you should see that the Web page displayed is one of the self-identifying pages configured on the realservers. If you are using RR scheduling, you should observe the page cycling through each of the realservers in succession.
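If you prefer the command line to a browser, the same check can be approximated by tallying which realserver answered each request. The sketch below assumes each test page consists of a single line containing the serving hostname; the helper name and the sample hostnames are hypothetical.

```shell
#!/bin/bash
# Sketch: count how often each realserver answered.
# Reads one identifying line per response on stdin and prints "name count".
tally_backends() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Stand-in responses. Against a live cluster you might instead feed the tally
# from a loop such as:
#   for i in 1 2 3 4 5 6; do curl -s http://VIP/; done
# where VIP is your floating service address.
printf '%s\n' web1 web2 web3 web1 web2 web3 | tally_backends
```

With RR scheduling and equal weights, the counts should come out even across the realservers, as they do for the stand-in data here.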
Are you now thinking of ensuring that the LVS configuration starts automatically at boot? Don't do that just yet! There is one more step needed (Step 5) to perform active monitoring of the realservers (thus keeping a dynamic list of which Apache nodes are eligible to service work requests).
So far, you have established a highly available service IP address and bound that to the pool of realserver instances. But you must never trust any of the individual Apache servers to be operational at any given time. With RR scheduling across 6 realservers, if any one realserver becomes disabled, or ceases to respond to network traffic in a timely fashion, 1/6th of the HTTP requests would fail!
Thus it is necessary to implement monitoring of the realservers on each of the LVS directors in order to dynamically add or remove them from the service pool. Another well-known open source package called mon is well suited for this task.
The mon solution is commonly used for monitoring LVS realnodes. Mon is relatively easy to configure, and is very extensible for people familiar with shell scripting. There are essentially three main steps to get everything working: installation, service monitoring configuration, and alert creation. Use your package management tool to handle the installation of mon. When finished with the installation, you need only to adjust the monitoring configuration, and create some alert scripts. The alert scripts are triggered when the monitors determine that a realserver has gone offline, or come back online.
Note that with heartbeat v2 installations, monitoring of realservers can be accomplished by making all the realserver services resources. Or, you can use the Heartbeat ldirectord package.
By default, mon comes with several monitor mechanisms ready to be used. We altered the sample configuration file in /etc/mon.cf to make use of the HTTP service.
In the mon configuration file, ensure the header reflects the proper paths. SLES10 is a 64-bit Linux image, but the sample configuration as shipped was for the default (31- or 32-bit) locations. The sample configuration file assumed the alerts and monitors are located in /usr/lib, which was incorrect for our particular installation. The parameters we altered were as follows:
alertdir = /usr/lib64/mon/alert.d
mondir = /usr/lib64/mon/mon.d
As you can see, we simply changed lib to lib64. Such a change may not be necessary for your distribution.
The next change to the configuration file was to specify the list of realservers to monitor. This was done with the following 6 directives:
Listing 6. Specifying realservers to monitor
If you want to add additional realservers, simply add additional entries here.
Once you have all of your definitions in place, you need to tell mon how to watch for failure, and what to do in case of failure. To do this, add the following monitor sections (one for each realserver). When done, you will need to place both the mon configuration file and the alert on each of the LVS heartbeat nodes, enabling each heartbeat cluster node to independently monitor all of the realservers.
Listing 7. The /etc/mon/mon.cf file
Listing 7 tells mon to use the http.monitor, which is shipped with mon by default. Additionally, port 80 is specified as the port to use. Listing 7 also provides the specific page to request; you may opt to transmit a more efficient small segment of html as proof of success rather than a complicated default html page for your Web server.
The alert and upalert lines invoke scripts that must be placed in the alertdir specified at the top of the configuration file. The directory is typically the distribution default, such as "/usr/lib64/mon/alert.d". The alerts are responsible for telling LVS to add or remove Apache servers from the eligibility list (by invoking the ipvsadm command, as we shall see in a moment).
When one of the realservers fails the http test, dowem.down.alert will be executed by mon with several arguments automatically. Likewise, when the monitors determine that a realserver has come back online, the mon process executes dowem.up.alert with numerous arguments automatically. Feel free to alter the names of the alert scripts to suit your own deployment.
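To make the shape of these definitions concrete, the sketch below generates one hostgroup line and one watch section per realserver. The realserver names, the 4-second interval, and the /index.html URL are assumptions for illustration; only http.monitor, port 80, and the dowem alert names come from the discussion above.

```shell
#!/bin/bash
# Sketch: generate hostgroup and watch stanzas for the mon configuration file.
# Realserver names are hypothetical; interval and URL are illustrative.
REALSERVERS="web1 web2 web3"

mon_stanza() {
    local host="$1"
    cat <<EOF
hostgroup $host $host

watch $host
    service http
        interval 4s
        monitor http.monitor -p 80 -u /index.html
        period wd {Sun-Sat}
            alert dowem.down.alert
            upalert dowem.up.alert

EOF
}

for rs in $REALSERVERS; do
    mon_stanza "$rs"
done
```

Giving each realserver its own single-member hostgroup means an alert always concerns exactly one machine, which keeps the alert scripts simple. Append the generated text below the header of your mon configuration file and adjust it to taste.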
Save this file, and create the alerts (using simple bash scripting) in the alertdir. Listing 8 shows a bash script alert that will be invoked by mon when a real server connection is re-established.
Listing 8. Simple alert: we have connectivity
Listing 9 shows a bash script alert that will be invoked by mon when a real server connection is lost.
Listing 9. Simple alert: we have lost connectivity
Both of those scripts make use of the ipvsadm command-line tool to dynamically add and remove realservers from the LVS tables. Note that these scripts are far from perfect. With mon monitoring only the http port for simple Web requests, the architecture as outlined here is vulnerable to situations where a given realserver might be operating correctly for http requests but not for SSL requests. Under those circumstances, we would fail to remove the offending realserver from the list of https candidates. Of course, this is easily remedied by making more advanced alerts specifically for each type of Web request in addition to enabling a second https monitor for each realserver in the mon configuration file. This is left as an exercise for the reader.
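As a rough illustration of what such alerts do, the sketch below combines both directions in one function. It is not one of our alert scripts: it assumes that mon passes the affected host with -h and marks an upalert with -u, the floating address is a placeholder, and the ipvsadm commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch of logic that could back both alerts. Assumes mon passes the affected
# host with -h and flags upalerts with -u; VIP is a hypothetical placeholder.
VIP="192.168.71.205"

alert_cmds() {
    local host="" up=0 port
    while [ $# -gt 0 ]; do
        case "$1" in
            -h) host="$2"; shift ;;   # host that failed or recovered
            -u) up=1 ;;               # present only for upalerts
        esac
        shift
    done
    [ -n "$host" ] || return 1
    for port in 80 443; do
        if [ "$up" -eq 1 ]; then
            # Recovered: add the realserver back with NAT and weight 1
            echo "/sbin/ipvsadm -a -t $VIP:$port -r $host:$port -m -w 1"
        else
            # Failed: delete the realserver from the virtual service
            echo "/sbin/ipvsadm -d -t $VIP:$port -r $host:$port"
        fi
    done
}

alert_cmds -s http -g web1 -h 192.168.71.220       # realserver went down
alert_cmds -s http -g web1 -h 192.168.71.220 -u    # realserver came back
```

Like the scripts described above, this sketch changes both the http and https entries based on the http monitor alone, so it shares the same SSL blind spot.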
To ensure monitoring has been activated, enable and disable the Apache process on each of the realservers in sequence, observing each of the directors for their reaction to the events. Only when you have confirmed that each director is properly monitoring each realserver should you use the chkconfig command to make sure that the mon process starts automatically at boot. The specific command used was chkconfig mon on, but this may vary based on your distribution.
With this last piece in place, you have finished the task of constructing a cross-system, highly-available Web server infrastructure. Of course, you might now opt to do more advanced work. For instance, you may have noticed that the mon daemon itself is not monitored (the heartbeat project can monitor mon for you), but with this last step, the basic foundation has been laid.
There are many reasons why an active node could stop functioning properly in an HA cluster, either voluntarily or involuntarily. The node could lose network connectivity to the other nodes, the heartbeat process could be stopped, there might be any one of a number of environmental occurrences, and so on. To deliberately fail the active node, you can issue a halt on that node, or set it to standby mode using the hb_gui (clean take down) command. If you feel inclined to test the robustness of your environment, you might opt to be a bit more aggressive (yank the plug!).
There are two types of log file indicators available to the system administrator responsible for configuring a Linux HA heartbeat system. The log files vary depending on whether or not a system is the recipient of the floating resource IP address. Log results for cluster members that did not receive the floating resource IP address look like so:
Listing 10. Log results for also-rans
As you can see from Listing 10, a roll is taken, and sufficient members for quorum are available for the vote. A vote is taken, and normal operation is resumed with no further action needed.
In contrast, log results for cluster members that did receive the floating resource IP address are as follows:
Listing 11. The log file of the resource holder
As shown in Listing 11, the /var/log/messages file shows this node has acquired the floating resource. The ifconfig line shows the eth0:0 device being created dynamically to maintain service.
And as you can see from Listing 11, a roll is taken, and sufficient members for quorum are available for the vote. A vote is taken, followed by the ifconfig commands that are issued to claim the floating resource IP address.
As an additional means of indicating when a failure has occurred, you can log in to any of the cluster members and execute the hb_gui command. Through this method, you can determine by visual inspection which system has the floating resource.
Lastly, we would be remiss if we did not illustrate a sample log file from a no-quorum situation. If any single node cannot communicate with either of its peers, it has lost quorum (since 2 of 3 is the majority in a three-member voting party). In this situation, the node understands that it has lost quorum, and invokes the no quorum policy handler. Listing 12 shows an example of the log file from such an event. When quorum is lost, a log entry indicates it. The cluster node showing this log entry will disown the floating resource. The ifconfig down statement releases it.
Listing 12. Log entry showing loss of quorum
As you can see from Listing 12, when quorum is lost for any given node, it relinquishes any resources as a result of the chosen no quorum policy configuration. The choice of no quorum policy is up to you.
One of the more interesting implications of a properly-configured Linux HA system is that you do not need to take any action to re-instantiate a cluster member. Simply activating the Linux instance is sufficient to let the node rejoin its peers automatically. If you have configured a primary node (that is, one that is favored to gain the floating resource above all others), it will regain the floating resources automatically. Non-favored systems will simply join the eligibility pool and proceed as normal.
Adding another Linux instance back into the pool will cause each node to take notice, and if possible, re-establish quorum. The floating resources will be re-established on one of the nodes if quorum can be re-established.
Listing 13. Quorum is re-established
In Listing 13, you see that quorum has been re-established. When quorum is re-established, a vote is performed and litsha22 becomes the active node with the floating resource.