Deserialized The Ramblings of a Web Architect


Reverse Proxy Performance – Varnish vs. Squid (Part 2)

Posted by Bryan Migliorisi


In part one of this series I tested the raw throughput performance of Varnish and Squid.  My results are consistent with all the blogs and comments floating around the blogosphere – Varnish blows away Squid.

Unfortunately, the first series of tests was somewhat uninformative.  Because it only tested the raw performance of serving cached content from memory, it did not mimic a real world scenario in which the proxy serves cached content while also fetching content from the backend and caching it.

While we would hope for a primed, full cache, that is unlikely to happen, and you will undoubtedly see a decent number of backend requests from your caching proxy.

A better test of the two proxies would involve a large set of random URLs, but not too random because we want to simulate both cache hits and cache misses.  To accomplish this, I wrote a small PHP script that would take two parameters: total number of URLs to generate and the hostname for those URLs.

Generating a usable URL list

Generating the list is simple.  The script looks like this:

    … 1) {
        …
    } else {
        $as = "";
    }
    echo "http://$host/varnish/gen/$random$as\n";
    flush(); ob_flush();
}
echo "http://$host/varnish/gen/$random$as";

All this does is create a long list of URLs.  I used PHP's output buffering functions to flush the buffer, which is necessary when creating large URL lists so that you don’t wait forever for output.  Maybe it could have been written better but I don't care – that wasn't the point of this test.

The URLs that are created are in the format of:



This URL is mapped to another PHP file that simply generates dummy data of the size specified in the URL.  In the above cases, the files would be 50KB in size.  The query parameter “as” is a dummy value whose only purpose is to tell the proxy to cache the response.  If the “as” query parameter does not exist, the proxy will forward the request to the backend and not cache it.  It's a simple way to generate cacheable and non-cacheable URLs.
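The same idea can be sketched in Python. This is a minimal sketch under assumptions, not the original script: the function name `generate_urls`, the `pool` and `cacheable_ratio` parameters, and the numeric value given to “as” are all hypothetical. Only the `/varnish/gen/` path, the 50KB size, and the presence or absence of the “as” parameter come from the post.

```python
import random


def generate_urls(total, host, size_kb=50, pool=100, cacheable_ratio=0.5, seed=None):
    """Return `total` URLs, a mix of cacheable ("as" present) and non-cacheable ones.

    Assumptions (not taken from the original script): the value of "as" is
    drawn from a small pool so that cacheable URLs repeat and produce cache
    hits, and `cacheable_ratio` controls the share of cacheable URLs.
    """
    rng = random.Random(seed)
    urls = []
    for _ in range(total):
        if rng.random() < cacheable_ratio:
            suffix = f"?as={rng.randrange(pool)}"  # cacheable: carries the "as" parameter
        else:
            suffix = ""                            # non-cacheable: goes to the backend
        urls.append(f"http://{host}/varnish/gen/{size_kb}{suffix}")
    return urls
```

A smaller `pool` means more repeated URLs and therefore more cache hits. A larger one pushes more requests through to the backend.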

To generate the list and store it in a local file, I used this command:

	> urls-10k.txt

Verify the results of the script

For your own sanity, make sure that the script did in fact generate a list of URLs that suits your needs.

Count the number of URLs generated:

cat urls-10k.txt | wc -l

(yes, I know it creates one extra URL … It's fine by me.)

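Count the number of cacheable URLs containing the “as” query parameter (matching on “?as”, because a plain “as” also matches the “varnish” in every URL's path):

cat urls-10k.txt | grep -F '?as' | wc -l

Count the number of unique cacheable URLs:

cat urls-10k.txt | grep -F '?as' | sort | uniq | wc -l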
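The three counts can also be taken in one pass with a short Python sketch. The function name `summarize_urls` is my own. It inspects only the query string of each URL, so the letters “as” elsewhere in the URL are never counted.

```python
from urllib.parse import parse_qs, urlsplit


def summarize_urls(lines):
    """Return (total, cacheable, unique_cacheable) for an iterable of URL lines."""
    total = 0
    cacheable = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        total += 1
        # Look at the parsed query string only; keep_blank_values also
        # accepts a bare "?as" with no value.
        params = parse_qs(urlsplit(url).query, keep_blank_values=True)
        if "as" in params:
            cacheable.append(url)
    return total, len(cacheable), len(set(cacheable))
```

To run it against the generated file, open `urls-10k.txt` and pass the file object to `summarize_urls`.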

Running the tests

In part one I used ApacheBench to load the servers, but for these tests I used Siege and http_load, both of which can load URLs from a file.

I started with Varnish using the following commands:

	> urls-100k.txt
http_load -parallel 10 -fetches 100000 urls-100k.txt
http_load -parallel 25 -fetches 100000 urls-100k.txt
http_load -parallel 50 -fetches 100000 urls-100k.txt
http_load -parallel 100 -fetches 100000 urls-100k.txt
http_load -parallel 200 -fetches 100000 urls-100k.txt
http_load -parallel 400 -fetches 100000 urls-100k.txt

In between each http_load command, I restarted the Varnish service so that each test ran with an empty cache.  When I was done with the Varnish tests, I ran the same tests against Squid using the same commands above.
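If you would rather script the runs than type them, the command lines above can be generated. This is only a sketch: the helper name `http_load_commands` is hypothetical, and it deliberately leaves out the proxy restart between runs because the restart command depends on your setup.

```python
CONCURRENCY_LEVELS = (10, 25, 50, 100, 200, 400)


def http_load_commands(url_file, fetches=100000, levels=CONCURRENCY_LEVELS):
    """Return one http_load argument list per concurrency level."""
    return [
        ["http_load", "-parallel", str(level), "-fetches", str(fetches), url_file]
        for level in levels
    ]
```

Each returned list can be handed to `subprocess.run` once the proxy has been restarted with an empty cache.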

The results

The results of these tests represent the typical web application much better than the original tests did.

This first graph shows the average time for the proxy to accept a connection.  As concurrency goes up, it is expected that the time to connect would go up too.  Squid suffers more than Varnish does, but the difference is negligible.


The second graph is much more interesting.  As concurrency goes up, the Time-To-First-Byte for Squid goes up very sharply while Varnish holds its ground and remains very quick around 25ms.


This third graph shows another interesting behavior.  As concurrency goes up, Varnish levels off at just under 800 fetches per second, while Squid peaks at around 1100 fetches per second with around 50 concurrent connections and then drops off sharply as concurrency keeps rising.



Squid versus Varnish is just another holy war that may never end.  The tests that I have performed have been very helpful for me and my team but your results may vary.  Of course, there are many more things to consider and I plan to write about some of the major differences between Squid and Varnish.

My results show that in raw cache hit performance, Varnish puts Squid to shame.  In real world scenarios I found that Squid can hold its own when dealing with small amounts of traffic, but its performance drops off very sharply as it begins to handle more connections. Varnish handles them without breaking a sweat, as it was designed to do.

My next blog post will detail the differences between Varnish and Squid’s architecture, features, and the reasons I am pushing for Varnish in our environment.


Some people are complaining in comments on Reddit and HackerNews that I have not provided any information about the hardware or operating system for my tests.  This information was posted in Part one of this post.


Reverse Proxy Performance – Varnish vs. Squid (Part 1)

Posted by Bryan Migliorisi

Typical web applications require dozens of SQL queries to generate a single page.  When your application is serving over 1,000,000 pages per day, you quickly realize that the performance bottleneck is your database.  The typical answer to slow database queries is “just use memcached!”  Memcached and other data caches can only take you so far.  This is where reverse proxies come in.  There are a handful of them out there, including Nginx, Perlbal, Squid and Varnish.  Which to use is up to you.


Deciding what is best for you

Assuming that you have taken a step back and really analyzed your problem first, the next step is to analyze the possible solutions.  For us, Varnish seems like the best option with Squid close behind.  To be fair, I’ve set up a test server with both Varnish and Squid running.  I’ll use ApacheBench to generate load and requests.

I’ve analyzed our pages to see what the typical page size is and recorded the average page sizes for 5 different page types.  They range from around 10KB to 35KB (gzipped).  For my test, I’ll be benchmarking with 10KB, 15KB, 20KB, 25KB, 30KB, 40KB, and 50KB files to get a good range of different size requests.

To test under different load levels, I’ll use ApacheBench to generate load with different numbers of concurrent users, ranging from 10 to 400.


The test

I’ll be using two identical machines on the same local class C network to eliminate (as much as possible) network latency. 

The machines look something like this:

  • Pentium 4 3GHz (8KB Level 1, 512KB Level 2)
  • 2GB (4x512 DDR 400MHz)
  • 120GB ATA Western Digital Caviar WD1200JB
  • CentOS 5

(I don't have more information than that.  Suffice it to say that they are a few years old and not very powerful.)

I am using Varnish 2.0.4 and Squid 2.6.STABLE21.  There are newer versions of Squid but I am using this version because the 3.x branch is missing features found in the 2.x branch and I have read several reports of 2.7 crashing, etc.


The command to run the load test looks something like this:

ab -c concurrent_users -n total_requests "url"

This will let you specify how many concurrent users to run and how many requests to make.  I have the proxy servers running on ServerA and I run the benchmark from ServerB.


The results

In general, Varnish seems to perform about twice as well as Squid does.  In every test, Varnish serves roughly 1.6 to 2.1 times as many requests per second, with a correspondingly lower average response time.

                         Varnish                                     Squid
File size  Concurrent    Requests    Avg across all   Average        Requests    Avg across all   Average
           users         per second  requests (ms)    request (ms)   per second  requests (ms)    request (ms)
10k        10            6592        0.152            1              3078        0.325            3
10k        25            6915        0.145            3              3568        0.280            7
10k        50            7071        0.141            7              3539        0.283            14
10k        100           6860        0.146            13             3565        0.280            28
10k        200           7252        0.138            27             3506        0.285            57
10k        400           7181        0.139            56             3518        0.284            113
15k        10            4636        0.216            2              2949        0.339            3
15k        25            5954        0.168            4              3168        0.316            7
15k        50            6036        0.166            8              3118        0.321            16
15k        100           6060        0.165            16             3247        0.308            30
15k        200           6066        0.165            32             3226        0.310            61
15k        400           6048        0.165            66             3092        0.323            129
20k        10            4689        0.213            2              2553        0.392            3
20k        25            5342        0.187            4              2675        0.374            9
20k        50            5422        0.184            9              2799        0.357            17
20k        100           5446        0.184            18             2861        0.349            34
20k        200           5430        0.184            36             2795        0.358            71
20k        400           5400        0.185            74             2656        0.376            150
25k        10            4135        0.242            2              2331        0.429            4
25k        25            4485        0.223            5              2308        0.433            10
25k        50            4488        0.223            11             2221        0.450            22
25k        100           4446        0.225            22             2217        0.451            45
25k        200           4311        0.232            46             2180        0.459            91
25k        400           4160        0.240            96             2026        0.493            197
30k        10            3463        0.289            2              1936        0.516            5
30k        25            3689        0.271            6              2002        0.499            12
30k        50            3661        0.273            13             1887        0.530            26
30k        100           3627        0.276            27             1778        0.562            56
30k        200           3589        0.279            55             1746        0.573            114
30k        400           3541        0.282            112            1798        0.556            222
40k        10            2752        0.363            3              1602        0.624            6
40k        25            2824        0.354            8              1584        0.631            15
40k        50            2826        0.354            17             1492        0.670            33
40k        100           2827        0.354            35             1551        0.645            64
40k        200           2822        0.354            70             1538        0.65             130
40k        400           2794        0.358            143            1372        0.728            291
50k        10            2254        0.443            4              1401        0.713            7
50k        25            2265        0.441            11             1379        0.725            18
50k        50            2266        0.441            22             1368        0.731            36
50k        100           2268        0.441            44             1360        0.735            73
50k        200           2266        0.441            88             1230        0.813            162
50k        400           2267        0.441            176            1216        0.822            328
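
The timing figures are easier to read next to the throughput they derive from. As a rough sketch (the function names are mine, not ApacheBench's), the “avg across all requests” value is the reciprocal of requests per second expressed in milliseconds, and the Varnish-to-Squid ratio follows directly from the two throughput columns:

```python
def ms_per_request(requests_per_second):
    """Mean time per request across all concurrent requests, in milliseconds."""
    return 1000.0 / requests_per_second


def speedup(varnish_rps, squid_rps):
    """How many times as many requests per second Varnish served compared to Squid."""
    return varnish_rps / squid_rps


# Values from the 10k file / 10 concurrent users row of the table.
print(round(ms_per_request(6592), 3))  # 0.152
print(round(ms_per_request(3078), 3))  # 0.325
print(round(speedup(6592, 3078), 2))   # 2.14
```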

Here are the graphs of the above data for easier visualization:



Something is wrong here

These are simply benchmarks and are not meant to represent real world scenarios for a few reasons.  Most importantly, this test takes place on a local network that goes through one router.  Running this test on a local network does not take into consideration the typical network latency you would find across the internet.

Secondly, this test only illustrates the raw speed of serving up cached content which isn’t a typical real world scenario.  To really test the overall performance of both of these, we need to simulate the three major steps of a reverse proxy:

  1. Forwarding a request to a backend server
  2. Physically caching it (memory or disk)
  3. Serving the cached data

Testing any one of these three steps in isolation is useful and shows the raw performance of that function, but it doesn’t give us a picture of the overall performance.
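
To make the three steps concrete, here is a toy in-memory model in Python. It is a sketch for illustration only: the class name `TinyCachingProxy` and its `fetch_from_backend` callable are hypothetical, and it says nothing about how Varnish or Squid are actually implemented.

```python
class TinyCachingProxy:
    """Toy model of a reverse proxy's three steps; not a real HTTP proxy."""

    def __init__(self, fetch_from_backend):
        self.fetch_from_backend = fetch_from_backend  # callable: url -> response body
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.cache:
            self.hits += 1
            return self.cache[url]           # step 3: serve the cached data
        self.misses += 1
        body = self.fetch_from_backend(url)  # step 1: forward the request to the backend
        self.cache[url] = body               # step 2: store the response in the cache
        return body
```

A benchmark that only requests already-cached URLs exercises just the first branch of `get`. A mixed URL list exercises all three steps, which is what a more realistic test needs to do.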


Next Steps

I need to come up with a way to generate load on the server such that it represents the typical flow of requests that we would normally see on a server.  I am running this on a test server, not against production data, so if anyone has an idea of how I can do this, please do let me know.  The results of this test will be Part 2 of this post.

Additionally, please let me know if you spot inefficiencies in my testing methodology. I don’t claim to be a load testing expert so any advice you can offer is appreciated.