Detect and Fix HAProxy+Apache+Passenger Queue Backlogs

To inspire hard work, some young men hang a poster on their wall that includes (1) an exotic sports car, (2) a scantily clad lady, and (3) a beach house. My inspirational poster would be much less attractive: a friendly butler who offers time-honored wisdom (with an accent because people with accents are smarter) and absolutely loves running errands for me.

I don't like running errands because I don't like waiting in lines. My nightmare: having to pick up groceries during a busy weekend afternoon. There are 3 queues at the grocery store that can cause a delay: finding a parking spot, getting a cart, and checking out.

Modern web apps face the same queuing issues serving web requests under heavy traffic. For example, a web request served by Scout passes through several queues.

That's Apache (for SSL processing) to HAProxy on the load balancer, then Apache to Passenger to the Rails app on a web server.

A request can get stuck in any of those five spots. The worst part about queues? Time in queue is easy to miss. Most of the time, people look at the application log when they suspect a slowdown. However, a slowdown in any of the four earlier queues won’t show up in your application log. Just looking at your application and database activity for slowdowns is like recording the time it takes to get your groceries from the time you grab the first item on the shelf until you start waiting to check out: you're leaving out the time it takes to find a parking spot, get a cart, and check out.

Now, before you start worrying about queues, take a deep breath. First, each of these systems is super reliable. For the most part, they just work. Second, it's much more likely your application logic is the cause of a performance issue than a queuing problem. Look there first.

Third (and most importantly), each of these systems handles queues in remarkably similar ways. Understanding some basic queuing concepts will go a long way. Let's take a look at some basics and then specific examples for Apache, HAProxy, and Passenger.

Queuing Basics

Global queues prevent large outliers

If you're shopping during the holiday season, you'll sometimes see stores modify the checkout procedure. They'll put everyone in one line and then direct them to an open register. This way, the poor guy who's trying to find a credit card that isn't maxed out doesn't hold up the people directly behind him.

You don't need to do anything to enable global queuing for Apache and HAProxy. For Passenger, it depends on your version: according to the Passenger docs, the default value for the PassengerUseGlobalQueue directive is "on". In the past, it was "off", so you may need to consult docs for your installed version of Passenger.
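To see why a single line helps, here's a minimal sketch that compares the two checkout styles. It isn't how Apache, HAProxy, or Passenger are implemented: the function names and the workload (one slow request followed by three fast ones, all arriving at once) are made up for illustration.

```python
import heapq

def waits_per_worker_queues(service_times, workers):
    """Pin each request to a worker's private queue (round-robin); return each wait."""
    free_at = [0] * workers              # time at which each worker clears its own queue
    waits = []
    for i, service in enumerate(service_times):
        w = i % workers                  # the request is stuck behind whatever this worker has
        waits.append(free_at[w])
        free_at[w] += service
    return waits

def waits_global_queue(service_times, workers):
    """One shared line: each request goes to whichever worker frees up first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    waits = []
    for service in service_times:
        start = heapq.heappop(free_at)   # earliest available worker
        waits.append(start)
        heapq.heappush(free_at, start + service)
    return waits

# Hypothetical workload: service times in arbitrary units, all arriving at time 0.
workload = [10, 1, 1, 1]
print(max(waits_per_worker_queues(workload, workers=2)))  # worst wait with private queues
print(max(waits_global_queue(workload, workers=2)))       # worst wait with one global queue
```

With private queues, one fast request gets stuck behind the slow one. With the global queue, the fast requests all flow through the other worker.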

Beware of cascading backlogs

You're opening a hot new club in the warehouse area of the city because all hot new clubs open in warehouse areas. You tell the bouncer to keep the line outside the door long to make your club look busy. On the inside though, things are calm: there's no wait at the bar.

However, your burly bouncer is a teddy bear and lets everyone inside so they don't have to wait in the cold. Suddenly, there's no line, but the bartenders are overwhelmed. The backlog was just shifted to another queue.

It's the same with your web app: raising HAProxy's maximum connection count will push more traffic to your web servers. This may cause backlogs on the web servers. It may cause higher database activity. You'll need to closely monitor the performance of your app when you open the floodgates on your load balancer.

Faster app performance = fewer backlogs

Busy lunch spots know the faster they turn tables, the shorter the wait for a table. It's the same for web apps: when your Rails app gets faster, the queue backlog will drop across the stack.
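A rough way to put numbers on this is Little's law: the average number of requests in service equals the arrival rate times the average time each request takes. The sketch below is a back-of-the-envelope estimate, not a capacity planner. The function name and the traffic figures are hypothetical.

```python
import math

def busy_processes_needed(requests_per_second, avg_response_ms):
    """Little's law: average requests in service = arrival rate x time in service."""
    return math.ceil(requests_per_second * avg_response_ms / 1000)

# Hypothetical traffic: 100 requests per second.
print(busy_processes_needed(100, 200))  # app processes kept busy at 200 ms per request
print(busy_processes_needed(100, 100))  # halve the response time, halve the busy processes
```

If you have fewer app processes than the estimate, the difference piles up in a queue somewhere in the stack.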

More capacity = more memory

Brick and mortar businesses need a healthy balance of staff vs. customers. Too much staff during a non-peak time wastes money; too little during a busy time means upset customers.

It's the same for your Rails app. Increasing your capacity has a real-world cost: memory. The biggest consumer of memory will likely be Passenger as it serves the actual Rails app (which may be hundreds of MB in size).
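As a sketch of that trade-off, here's a simple memory budget. All of the figures (total RAM, memory reserved for everything else, per-process size) are hypothetical. Measure your own app before picking a process count.

```python
def max_app_processes(total_mb, reserved_mb, per_process_mb):
    """How many app processes fit in RAM after reserving room for everything else."""
    usable_mb = total_mb - reserved_mb
    if usable_mb <= 0 or per_process_mb <= 0:
        return 0
    return usable_mb // per_process_mb

# Hypothetical web server: 4 GB of RAM, 1 GB kept for Apache and the OS,
# and a Rails process that weighs in around 200 MB.
print(max_app_processes(total_mb=4096, reserved_mb=1024, per_process_mb=200))
```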

Queuing Specifics

Apache

Requests will back up in Apache if the maximum number of allowed connections is reached. To look for an Apache backup, we use the following command:

watch --interval=1 "/usr/sbin/apachectl status"

Watch this line:

423 requests currently being processed, 55 idle workers

If the number of idle workers drops to zero, you have a backup.

The command above refreshes the Apache Server Status every second. We use watch because the worker output is only valid for that point in time – we want to watch it over a longer period.
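If you'd rather not stare at a terminal, the same check can be scripted. This is a minimal sketch that assumes the status output contains a line shaped like the one above. The function names are my own, and the apachectl path is the one from our command, so adjust it for your system.

```python
import re
import subprocess

# Matches the summary line shown above, e.g.
# "423 requests currently being processed, 55 idle workers"
WORKER_LINE = re.compile(
    r"(\d+) requests currently being processed, (\d+) idle workers"
)

def parse_worker_line(status_text):
    """Return (busy, idle) from Apache status output, or None if the line is missing."""
    match = WORKER_LINE.search(status_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def is_backed_up(status_text):
    """True when the status output reports zero idle workers."""
    parsed = parse_worker_line(status_text)
    return parsed is not None and parsed[1] == 0

def current_status(apachectl="/usr/sbin/apachectl"):
    """Run `apachectl status` once and return its text (requires mod_status)."""
    result = subprocess.run([apachectl, "status"], capture_output=True, text=True)
    return result.stdout

# Usage, on a server where the status page is enabled:
#   print(is_backed_up(current_status()))
```

Call it from a loop or a cron job and record the result. A single reading has the same point-in-time limitation as the command above.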

To enable the status page, you’ll need to use the mod_status module and specify a location entry in your Apache config file.

Tuning Apache

The MaxClients directive sets the maximum number of simultaneous connections Apache will serve (it was renamed MaxRequestWorkers in Apache 2.4). See the Apache docs for background on this directive.

HAProxy

Requests will back up in HAProxy if the global connection limit or an individual server's connection limit is reached.

The easiest way to check for an HAProxy backlog is to examine the output of the HAProxy stats page. There are four important metrics to watch.

At the top of the page, maxconn shows the maximum number of connections HAProxy will handle. current conns shows the number of connections HAProxy is handling now:

In this case, we're using 20 of 4096 available connections. There is plenty of headroom.

For each backend server, look at the Max and Limit columns under the Sessions heading:

In this case, the maximum number of concurrent sessions a single web server handled was 561. This is a bit more than half of the 1,000 connection limit specified in the adjacent column.
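The stats page can also be exported as CSV (append ;csv to the stats URI), which makes the same comparison easy to script. Here's a minimal sketch that reads the smax (session max), slim (session limit), and qcur (current queue) columns for each server. The sample data is made up, borrowing the 561 and 1,000 figures from above, and the function name is my own.

```python
import csv
import io

# Illustrative sample only: a real export has many more columns and rows.
SAMPLE_CSV = (
    "# pxname,svname,qcur,qmax,scur,smax,slim,stot\n"
    "web,web1,0,0,20,561,1000,90210\n"
    "web,BACKEND,0,0,20,561,,90210\n"
)

def server_session_usage(stats_csv):
    """Return peak session usage per server from HAProxy's CSV stats export."""
    # The header row starts with "# ", which we strip so DictReader sees clean names.
    reader = csv.DictReader(io.StringIO(stats_csv.lstrip("# ")))
    usage = []
    for row in reader:
        # Skip the FRONTEND/BACKEND summary rows and rows without a usable limit.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        if not row["smax"] or not row["slim"]:
            continue
        smax, slim = int(row["smax"]), int(row["slim"])
        usage.append({
            "server": row["svname"],
            "smax": smax,
            "slim": slim,
            "queued": int(row["qcur"] or 0),
            "peak_ratio": smax / slim,   # how close the server came to its limit
        })
    return usage

for server in server_session_usage(SAMPLE_CSV):
    print(server["server"], server["smax"], server["slim"], server["queued"])
```

A peak ratio creeping toward 1.0, or a non-zero queued count, is the cue to look closer at that server.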

Enabling the stats page

Use the stats directive in your HAProxy configuration file to enable the stats page. For example:

  stats enable
  stats uri     /haproxy?stats
  stats auth administrator:PASS

Tuning HAProxy

To modify the global maximum number of connections, change the maxconn directive in the HAProxy config file. To modify maximum connections for a specific server, modify its maxconn option:

  server web1 web1.host.com:80 maxconn 1000

Passenger

Phusion Passenger serves our Rails and Sinatra apps. To look for a queue backlog we use the following command:

watch --interval=1 "sudo passenger-status"

This displays information on each Passenger process. If the Sessions count is high for a process, it has a queue of requests waiting to be processed:

PID: 26049 Sessions: 20 Processed: 2977 Uptime: 2h 53m 55s

There is a 20-session backlog for the process above: it looks like we need to increase the number of Passenger processes.
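To pick out backlogged processes automatically, you can parse that output. This sketch assumes each process line looks like the one above. The function name is my own, and the threshold for a "high" Sessions count is a hypothetical default you should tune.

```python
import re

# Matches process lines such as:
# "PID: 26049 Sessions: 20 Processed: 2977 Uptime: 2h 53m 55s"
PROCESS_LINE = re.compile(r"PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)")

def backlogged_processes(status_text, threshold=1):
    """Return (pid, sessions) for every process whose Sessions count exceeds threshold."""
    backlogged = []
    for match in PROCESS_LINE.finditer(status_text):
        pid, sessions = int(match.group(1)), int(match.group(2))
        if sessions > threshold:         # threshold is an assumption, not a Passenger rule
            backlogged.append((pid, sessions))
    return backlogged

sample = (
    "PID: 26049 Sessions: 20 Processed: 2977 Uptime: 2h 53m 55s\n"
    "PID: 26050 Sessions: 1 Processed: 3010 Uptime: 2h 53m 55s\n"  # made-up second process
)
print(backlogged_processes(sample))
```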

Tuning Passenger

As I mentioned earlier, Passenger is likely to be the biggest consumer of memory in your web stack. It requires some special attention. Take a look at our previous post, "Production Rails Tuning with Passenger: PassengerMaxProcesses", for instructions on tuning Passenger.

Ongoing monitoring

A big challenge with ongoing queue monitoring is sampling: the commands used to watch for queue backups in Apache, HAProxy, and Passenger all show the current status. If you aren’t watching these over time, you may miss a backup.

The easiest solution we’ve found is monitoring request times at the highest point in the stack. For us, this is Apache on our load balancer. Anything that happens beneath Apache (HAProxy, or on an individual web server) will show up here. If we’re seeing larger request times we can dig deeper into the stack.
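If you want a feel for the idea without any plugins, here's a minimal sketch that pulls request times out of an Apache access log. It assumes your LogFormat ends with %D (time taken to serve the request, in microseconds), which is not part of the default formats. The function names, sample lines, and the one-second threshold are hypothetical.

```python
def request_times_ms(log_lines):
    """Extract request times in milliseconds, assuming %D is the last field."""
    times = []
    for line in log_lines:
        fields = line.split()
        if fields and fields[-1].isdigit():      # skip lines without a numeric last field
            times.append(int(fields[-1]) / 1000)
    return times

def slow_fraction(times_ms, threshold_ms):
    """Fraction of requests slower than threshold_ms (0.0 when there is no data)."""
    if not times_ms:
        return 0.0
    return sum(t > threshold_ms for t in times_ms) / len(times_ms)

# Made-up log lines in common log format with %D appended.
sample_log = [
    '10.0.0.1 - - [01/Jan/2011:12:00:00 +0000] "GET / HTTP/1.1" 200 512 120000',
    '10.0.0.2 - - [01/Jan/2011:12:00:01 +0000] "GET /about HTTP/1.1" 200 512 95000',
    '10.0.0.3 - - [01/Jan/2011:12:00:02 +0000] "GET /reports HTTP/1.1" 200 2048 2400000',
    'malformed line',
]
times = request_times_ms(sample_log)
print(times, slow_fraction(times, threshold_ms=1000))
```

Because the log records every request, it doesn't have the sampling problem: a backlog that came and went still leaves slow entries behind.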

We use Scout's Apache Log Analyzer and Rails Monitoring plugins to monitor request times. A spike in Apache request times can indicate a queue backlog.

We use Scout’s HAProxy Monitoring plugin to monitor HAProxy.

See our previous post, "Is your Rails app under-provisioned?", for more.

Relevant Links