How to Nail Server Load Balancing: HAProxy Tutorial and Use Cases

JetRuby Agency · May 18, 2018 · 7 min read

In this very special episode, we’ll talk about HAProxy. We’ll find out what this software is capable of and look at its main pros and cons.

First and foremost, you may be wondering: why do I actually need HAProxy? What’s the main benefit for me?

The main advantage is that it distributes connection requests across multiple server nodes. As a result, HAProxy can handle a huge amount of HTTP and HTTPS traffic with very modest resource usage.

What’s the benefit for businesses?

Simply put, this technology simplifies deploying high-load applications while also increasing their performance and stability.

Now, let’s look at HAProxy from the technical perspective. First and foremost, we need to understand what HAProxy means. HAProxy (High Availability Proxy) is fast, efficient open-source software that provides a high-availability load balancer and proxy server for both TCP- and HTTP-based applications, spreading requests across multiple servers.

HAProxy is used by various high-traffic websites, such as Instagram, Reddit, and Twitter. Its main features are:

  • HTTP and TCP balancing;
  • Automatic restarts without losing any active connections;
  • A built-in web interface that shows statistics behind HTTP basic authentication;
  • SSL termination.

HTTP and TCP connection balancing

HAProxy supports balancing of HTTP, HTTPS, and HTTP/2 traffic (such as web servers or Elasticsearch) as well as TCP connections (such as Postgres or Redis). Let’s look at the request-balancing algorithms supported by HAProxy:

  • round-robin: requests are sent to the backends in turn. This is especially useful for balancing stateless protocols such as HTTP. However, this method won’t work for applications whose data lives not in a shared data store but on one specific backend.
  • leastconn: a request is sent to the backend with the fewest open connections;
  • source: requests from the same IP will always be sent to the same backend. This can be helpful for services such as Memcached.

There are many other algorithms, some of them specific to HTTP. You’ll find more information in the HAProxy Configuration Manual.
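To make this concrete, here’s a minimal sketch of how an algorithm is chosen with the balance directive (the backend name and server addresses below are made up for illustration):

backend web-nodes
    balance leastconn   # or: roundrobin, source, ...
    server web-1 10.0.0.1:80 check
    server web-2 10.0.0.2:80 check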

An automatic restart without losing any active connections

HAProxy doesn’t support re-reading its config on the fly. Nevertheless, it can be restarted without losing any open connections. For this purpose, a new process is started to accept new connections, while the old one keeps serving the connections it already has. You can find more information about this mechanism in chapter 4 of the HAProxy Management Guide.
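In practice, such a “restart” looks roughly like this (a sketch, assuming the pid file lives at /run/haproxy.pid):

# initial start: write a pid file
haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

# graceful restart: the new process takes over and asks the old
# pids (-sf) to finish their current connections and then exit
haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /run/haproxy.pid)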

Using various balancing algorithms

Source-based balancing

Frequently, especially with TCP backends, you need to ensure that all requests from the same client always go to the same backend. This is what the “source” balancing algorithm is for. This way, we can balance load between several instances of Memcached or Redis (provided all the Redis instances are independent and not master/slave replicas of one another).

Imagine that we need to provide a single entry point for Memcached: to the clients it should look like a single Memcached server. However, if a client saved data to backend X, its read requests also have to go to backend X. For this purpose, we can use the following config:

listen memcached *:11211
    option tcplog
    balance source  # ip-based balancing
    server memcached-1 host-1.example.com:11211 check inter 3s fall 3 minconn 50
    server memcached-2 host-2.example.com:11211 check inter 3s fall 3 minconn 50
    server memcached-3 host-3.example.com:11211 check inter 3s fall 3 minconn 50

While all three backends are up, requests are balanced between them in this way. If one of them goes down, HAProxy excludes it from the pool and sends its requests to another backend. The client loses its cache but keeps access to the service.

Round-robin balancing

If it doesn’t matter which backend serves a request, you can use the round-robin algorithm. If the backends are of the same size, this algorithm balances requests cyclically and evenly. Since Elasticsearch cluster nodes distribute records between themselves on their own, this algorithm can be used to balance both reads and writes in Elasticsearch. Consequently, our config looks as follows:

listen elasticsearch *:9200
    option tcplog
    balance roundrobin  # the default algorithm, stated explicitly
    server elastic-1 host-1.example.com:9200 check inter 3s fall 3 minconn 50
    server elastic-2 host-2.example.com:9200 check inter 3s fall 3 minconn 50
    server elastic-3 host-3.example.com:9200 check inter 3s fall 3 minconn 50

Using a TCP backend: Redis in a master-slave scheme

All the balancing examples above assumed that the services don’t depend on each other. However, if there’s a primary service and a standby one, you’ll need a more complicated configuration: HAProxy has to determine which of the two services is the primary one, ready to accept requests.

Take several instances of Redis as an example. One of the instances is the master and the others are its replicas. Requests from users should go only to the master. That’s why HAProxy should understand which of the backends is the master at any given moment.

By default, HAProxy uses a simple port-availability check to determine whether a backend is functioning. Additionally, it supports more elaborate checks that send data to the specified port and verify the response. This way, HAProxy can talk to the Redis daemon and ask for its role (master/slave). Here’s one example of such a setting.
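A widely used check of this kind looks roughly like the sketch below: HAProxy speaks the Redis protocol, requests the replication info, and only marks a server as up if it reports the master role (the hostnames and timings are placeholders):

backend redis-master
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-1 host-1.example.com:6379 check inter 1s
    server redis-2 host-2.example.com:6379 check inter 1s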

Although this setting is relatively simple and works in most cases, it doesn’t determine the master with 100% accuracy. In the event of a network split, two different Redis instances may both consider themselves the master, and this may fool HAProxy. As a result, it’s better to use Redis Sentinel as the source of truth; there’s much more information in this article. Nevertheless, that significantly complicates the HAProxy config.

However, if you need to send all writes to the master and reads to the slaves, it’s better to use not HAProxy but a specialized solution, such as pgpool for Postgres.

Using HAProxy in a Docker container

Restart without losing connections

Up to version 1.5.16, HAProxy ran as the main process of the standard Docker container, i.e. as PID 1. That made a restart without losing connections impossible: when PID 1 exits, the container exits with it. That’s why it’s important to make sure that HAProxy is not that process. Starting with version 1.5.16, haproxy-systemd-wrapper runs as PID 1 instead. It allows restarting HAProxy without killing the container. An additional advantage is the possibility to stop the container both with Ctrl+C and with docker stop.
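With the wrapper as PID 1, a graceful reload can be triggered by signalling the container, roughly like this (my-haproxy is a made-up container name; this assumes the config is bind-mounted so your edits are visible inside the container):

docker kill -s HUP my-haproxy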

Logging

Applications running in Docker containers are expected to write their logs to stdout; in that case the logs are available through docker logs. HAProxy doesn’t support logging to a file or to stdout, only to syslog. Still, there’s a way to get the logs: point HAProxy at /dev/log. For that to happen, we need to specify in the config:

defaults
    log /dev/log local0 debug
    option dontlognull
    option dontlog-normal

/dev/log is made available inside the container by bind-mounting it:

docker run -v /dev/log:/dev/log haproxy

However, keep in mind that the logs will end up in the host’s system log, not in docker logs. On modern distributions, you can read them with journalctl.
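For example, assuming HAProxy’s default syslog tag:

journalctl -t haproxy -f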

Why do you need HAProxy?

Nowadays, HAProxy is not the only solution for HTTP and TCP proxying; it’s no secret that nginx also supports TCP proxying. However, there are seven reasons why we give preference to HAProxy:

  1. Trusted over the years. The first release of HAProxy came out in December 2001, and it has been in constant development ever since.
  2. High performance. HAProxy is written in C, which is why it’s used in high-load systems.
  3. High customizability. The documentation is huge, so you can customize literally everything you want.
  4. Reliability. This is probably one of the strongest sides of HAProxy.
  5. Wide scope of application. There are dozens of scenarios for using HAProxy: reverse proxying, load balancing, failover, HTTP and TCP support.
  6. A built-in web interface with statistics (see the sketch after this list).
  7. Monitoring. The same stats interface can output its data not only as an HTML page but also in CSV format, so it can be integrated with monitoring systems.
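Enabling that stats page takes only a few lines; here’s a minimal sketch (the port, URI, and credentials are placeholders you’d choose yourself):

listen stats *:8404
    stats enable
    stats uri /stats
    stats auth admin:change-me

The CSV variant of the same data is then available by appending ;csv to the stats URI.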

The Bottom Line

Congrats! If you’ve read the whole article, you now have the basic concepts of HAProxy and load balancing, which play a crucial role in the world of web development.

HAProxy is one of the most popular open-source HTTP/TCP load balancers. At the same time, it’s a proxy server for systems such as Linux, FreeBSD, and Solaris. How will it be useful for you? First of all, by balancing load between several servers, you’ll significantly improve the performance and robustness of your server environment.

We have great experience with this outstanding technology and have already implemented it in various projects. If you want to find out more about it or use it in your project, feel free to send us an email and subscribe to our blog :)


JetRuby is a Digital Agency that doesn’t stop moving. We expound on subjects as varied as mobile app development and disruptive technologies.