Written by Stas Bekman
Correct configuration of the
is very important. There are no defaults. If they are too low, you
will under-use the system's capabilities. If they are too high, the
chances are that the server will bring the machine to its knees.
All the above parameters should be specified on the basis of the
resources you have. With a plain apache server, it's no big deal if
you run many servers since the processes are about 1Mb and don't eat a
lot of your RAM. Generally the numbers are even smaller with memory
sharing. The situation is different with mod_perl. I have seen
mod_perl processes of 20Mb and more. Now if you have
set to 50: 50x20Mb = 1Gb. Do you have 1Gb of RAM? Maybe not. So how
do you tune the parameters? Generally by trying different
combinations and benchmarking the server. Again mod_perl processes
can be of much smaller size with memory sharing.
Before you start this task you should be armed with the proper weapon.
You need the crashme utility, which will load your server with the
mod_perl scripts you possess. You need it to have the ability to
emulate a multiuser environment and to emulate the behavior of
multiple clients calling the mod_perl scripts on your server
simultaneously. While there are commercial solutions, you can get
away with free ones which do the same job. You can use the
ApacheBench utility which comes with the Apache distribution, the
crashme script which uses
LWP::Parallel::UserAgent, httperf or
http_load all discussed in one of the previous articles.
It is important to make sure that you run the load generator (the client which generates the test requests) on a system that is more powerful than the system being tested. After all we are trying to simulate Internet users, where many users are trying to reach your service at once. Since the number of concurrent users can be quite large, your testing machine must be very powerful and capable of generating a heavy load. Of course you should not run the clients and the server on the same machine. If you do, your test results would be invalid. Clients will eat CPU and memory that should be dedicated to the server, and vice versa.
I'm going to use
ab) utility to tune our server's
configuration. We will simulate 10 users concurrently requesting a
very light script at
Each simulated user makes 10 requests.
% ./ab -n 100 -c 10 http://www.example.com/perl/access/access.cgi
The results are:
Document Path: /perl/access/access.cgi Document Length: 16 bytes Concurrency Level: 10 Time taken for tests: 1.683 seconds Complete requests: 100 Failed requests: 0 Total transferred: 16100 bytes HTML transferred: 1600 bytes Requests per second: 59.42 Transfer rate: 9.57 kb/s received Connnection Times (ms) min avg max Connect: 0 29 101 Processing: 77 124 1259 Total: 77 153 1360
The only numbers we really care about are:
Complete requests: 100 Failed requests: 0 Requests per second: 59.42
Let's raise the request load to 100 x 10 (10 users, each makes 100 requests):
% ./ab -n 1000 -c 10 http://www.example.com/perl/access/access.cgi Concurrency Level: 10 Complete requests: 1000 Failed requests: 0 Requests per second: 139.76
As expected, nothing changes -- we have the same 10 concurrent users. Now let's raise the number of concurrent users to 50:
% ./ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi Complete requests: 1000 Failed requests: 0 Requests per second: 133.01
We see that the server is capable of serving 50 concurrent users at
133 requests per second! Let's find the upper limit. Using
10000 -c 1000 failed to get results (Broken Pipe?). Using
-c 500 resulted in 94.82 requests per second. The server's
performance went down with the high load.
The above tests were performed with the following configuration:
MinSpareServers 8 MaxSpareServers 6 StartServers 10 MaxClients 50 MaxRequestsPerChild 1500
Now let's kill each child after it serves a single request. We will use the following configuration:
MinSpareServers 8 MaxSpareServers 6 StartServers 10 MaxClients 100 MaxRequestsPerChild 1
Simulate 50 users each generating a total of 20 requests:
% ./ab -n 1000 -c 50 http://www.example.com/perl/access/access.cgi
The benchmark timed out with the above configuration.... I watched the
ps as I ran it, the parent process just wasn't capable
of respawning the killed children at that rate. When I raised the
MaxRequestsPerChild to 10, I got 8.34 requests per second. Very
bad - 18 times slower! You can't benchmark the importance of the
StartServers with this
kind of test.
Now let's reset
MaxRequestsPerChild to 1500, but reduce
MaxClients to 10 and run the same test:
MinSpareServers 8 MaxSpareServers 6 StartServers 10 MaxClients 10 MaxRequestsPerChild 1500
I got 27.12 requests per second, which is better but still 4-5 times
slower. (I got 133 with
MaxClients set to 50.)
Summary: I have tested a few combinations of the server
configuration variables (
results I got are as follows:
StartServers are only
important for user response times. Sometimes users will have to wait a
The important parameters are
MaxClients should be not too big, so it will not abuse your
machine's memory resources, and not too small, for if it is your users
will be forced to wait for the children to become free to serve them.
MaxRequestsPerChild should be as large as possible, to get the full
benefit of mod_perl, but watch your server at the beginning to make
sure your scripts are not leaking memory, thereby causing your server
(and your service) to die very fast.
Also it is important to understand that we didn't test the response times in the tests above, but the ability of the server to respond under a heavy load of requests. If the test script was heavier, the numbers would be different but the conclusions very similar.
The benchmarks were run with:
HW: RS6000, 1Gb RAM SW: AIX 4.1.5 . mod_perl 1.16, apache 1.3.3 Machine running only mysql, httpd docs and mod_perl servers. Machine was _completely_ unloaded during the benchmarking.
After each server restart when I changed the server's configuration, I made sure that the scripts were preloaded by fetching a script at least once for every child.
It is important to notice that none of the requests timed out, even if it was kept in the server's queue for more than a minute! That is the way ab works, which is OK for testing purposes but will be unacceptable in the real world - users will not wait for more than five to ten seconds for a request to complete, and the client (i.e. the browser) will time out in a few minutes.
Now let's take a look at some real code whose execution time is more than a few milliseconds. We will do some real testing and collect the data into tables for easier viewing.
I will use the following abbreviations:
NR = Total Number of Request NC = Concurrency MC = MaxClients MRPC = MaxRequestsPerChild RPS = Requests per second
Running a mod_perl script with lots of mysql queries (the script under test is mysqld limited) (http://www.example.com/perl/access/access.cgi?do_sub=query_form), with the configuration:
MinSpareServers 8 MaxSpareServers 16 StartServers 10 MaxClients 50 MaxRequestsPerChild 5000
NR NC RPS comment ------------------------------------------------ 10 10 3.33 # not a reliable figure 100 10 3.94 1000 10 4.62 1000 50 4.09
Conclusions: Here I wanted to show that when the application is slow (not due to perl loading, code compilation and execution, but limited by some external operation) it almost does not matter what load we place on the server. The RPS (Requests per second) is almost the same. Given that all the requests have been served, you have the ability to queue the clients, but be aware that anything that goes into the queue means a waiting client and a client (browser) that might time out!
Now we will benchmark the same script without using the mysql (code limited by perl only): (http://www.example.com/perl/access/access.cgi), it's the same script but it just returns the HTML form, without making SQL queries.
MinSpareServers 8 MaxSpareServers 16 StartServers 10 MaxClients 50 MaxRequestsPerChild 5000
NR NC RPS comment ------------------------------------------------ 10 10 26.95 # not a reliable figure 100 10 30.88 1000 10 29.31 1000 50 28.01 1000 100 29.74 10000 200 24.92 100000 400 24.95
Conclusions: This time the script we executed was pure perl (not
limited by I/O or mysql), so we see that the server serves the
requests much faster. You can see the number of requests per second
is almost the same for any load, but goes lower when the number of
concurrent clients goes beyond
MaxClients. With 25 RPS, the
machine simulating a load of 400 concurrent clients will be served in
16 seconds. To be more realistic, assuming a maximum of 100
concurrent clients and 30 requests per second, the client will be
served in 3.5 seconds. Pretty good for a highly loaded server.
Now we will use the server to its full capacity, by keeping all
MaxClients clients alive all the time and having a big
MaxRequestsPerChild, so that no child will be killed during the
MinSpareServers 50 MaxSpareServers 50 StartServers 50 MaxClients 50 MaxRequestsPerChild 5000 NR NC RPS comment ------------------------------------------------ 100 10 32.05 1000 10 33.14 1000 50 33.17 1000 100 31.72 10000 200 31.60
Conclusion: In this scenario there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.
Now we will change
MaxClients and watch the results: Let's reduce
MaxClients to 10.
MinSpareServers 8 MaxSpareServers 10 StartServers 10 MaxClients 10 MaxRequestsPerChild 5000 NR NC RPS comment ------------------------------------------------ 10 10 23.87 # not a reliable figure 100 10 32.64 1000 10 32.82 1000 50 30.43 1000 100 25.68 1000 500 26.95 2000 500 32.53
Conclusions: Very little difference! Ten servers were able to
serve almost with the same throughput as 50 servers. Why? My guess
is because of CPU throttling. It seems that 10 servers were serving
requests 5 times faster than when we worked with 50 servers. In that
case, each child received its CPU time slice five times less
frequently. So having a big value for
MaxClients, doesn't mean
that the performance will be better. You have just seen the numbers!
Now we will start drastically to reduce
MinSpareServers 8 MaxSpareServers 16 StartServers 10 MaxClients 50
NR NC MRPC RPS comment ------------------------------------------------ 100 10 10 5.77 100 10 5 3.32 1000 50 20 8.92 1000 50 10 5.47 1000 50 5 2.83 1000 100 10 6.51
Conclusions: When we drastically reduce
performance starts to become closer to plain mod_cgi.
Here are the numbers of this run with mod_cgi, for comparison:
MinSpareServers 8 MaxSpareServers 16 StartServers 10 MaxClients 50 NR NC RPS comment ------------------------------------------------ 100 10 1.12 1000 50 1.14 1000 100 1.13
Conclusion: mod_cgi is much slower. :) In the first test, when NR/NC was 100/10, mod_cgi was capable of 1.12 requests per second. In the same circumstances, mod_perl was capable of 32 requests per second, nearly 30 times faster! In the first test each client waited about 100 seconds to be served. In the second and third tests they waited 1000 seconds!
MaxClients directive sets the limit on the number of
simultaneous requests that can be supported. No more than this number
of child server processes will be created. To configure more than 256
clients, you must edit the
HARD_SERVER_LIMIT entry in
and recompile. In our case we want this variable to be as small as
possible, because in this way we can limit the resources used by the
server children. Since we can restrict each child's process size with
Apache::GTopLimit, the calculation of
MaxClients is pretty straightforward:
Total RAM Dedicated to the Webserver MaxClients = ------------------------------------ MAX child's process size
So if I have 400Mb left for the webserver to run with, I can set
MaxClients to be of 40 if I know that each child is limited to 10Mb
of memory (e.g. with
You will be wondering what will happen to your server if there are
more concurrent users than
MaxClients at any time. This situation
is signified by the following warning message in the
[Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting, consider raising the MaxClients setting
There is no problem -- any connection attempts over the
limit will normally be queued, up to a number based on the
ListenBacklog directive. When a child process is freed at the end
of a different request, the connection will be served.
It is an error because clients are being put in the queue rather than getting served immediately, despite the fact that they do not get an error response. The error can be allowed to persist to balance available system resources and response time, but sooner or later you will need to get more RAM so you can start more child processes. The best approach is to try not to have this condition reached at all, and if you reach it often you should start to worry about it.
It's important to understand how much real memory a child occupies.
Your children can share memory between them when the OS supports that.
You must take action to allow the sharing to happen. We have
disscussed this in one of the previous article whose main topic was
shared memory. If you do this, the chances are that your
can be even higher. But it seems that it's not so simple to calculate
the absolute number. If you come up with a solution please let us
know! If the shared memory was of the same size throughout the
child's life, we could derive a much better formula:
Total_RAM + Shared_RAM_per_Child * (MaxClients - 1) MaxClients = --------------------------------------------------- Max_Process_Size
Total_RAM - Shared_RAM_per_Child MaxClients = --------------------------------------- Max_Process_Size - Shared_RAM_per_Child
Let's roll some calculations:
Total_RAM = 500Mb Max_Process_Size = 10Mb Shared_RAM_per_Child = 4Mb
500 - 4 MaxClients = --------- = 82 10 - 4
With no sharing in place
500 MaxClients = --------- = 50 10
With sharing in place you can have 64% more servers without buying more RAM.
If you improve sharing and keep the sharing level, let's say:
Total_RAM = 500Mb Max_Process_Size = 10Mb Shared_RAM_per_Child = 8Mb
500 - 8 MaxClients = --------- = 246 10 - 8
392% more servers! Now you can feel the importance of having as much shared memory as possible.
MaxRequestsPerChild directive sets the limit on the number of
requests that an individual child server process will handle. After
MaxRequestsPerChild requests, the child process will die. If
MaxRequestsPerChild is 0, then the process will live forever.
MaxRequestsPerChild to a non-zero limit solves some memory
leakage problems caused by sloppy programming practices, whereas a
child process consumes more memory after each request.
If left unbounded, then after a certain number of requests the children will use up all the available memory and leave the server to die from memory starvation. Note that sometimes standard system libraries leak memory too, especially on OSes with bad memory management (e.g. Solaris 2.5 on x86 arch).
If this is your case you can set
MaxRequestsPerChild to a small
number. This will allow the system to reclaim the memory that a
greedy child process consumed, when it exits after
But beware -- if you set this number too low, you will lose some of
the speed bonus you get from mod_perl. Consider using
Apache::PerlRun if this is the case.
Another approach is to use the
Apache::SizeLimit or the
Apache::GTopLimit modules. By using either of these modules you
should be able to discontinue using the
although for some developers, using both in combination does the
job. In addition the latter module allows you to kill any servers
whose shared memory size drops below a specified limit.
The mod_perl site's URL: http://perl.apache.org/