Simple tests
1. For the impatient
polysrv \
--config /usr/local/polygraph/workloads/simple.pg \
--verb_lvl 10
polyclt \
--config /usr/local/polygraph/workloads/simple.pg \
--verb_lvl 10
# watch console output then kill polysrv and polyclt
2. Introduction
The Polygraph distribution includes client and server simulators called
polyclt and polysrv. You need to run both programs to
simulate the desired workload. The server(s) should be launched first. We will
show the command line for polysrv (polyclt) followed by the
polysrv (polyclt) output generated in our environment.
For simplicity, we will start both simulators on one machine. To run the
tests below, your machine must be allowed to connect to itself via
the 127.0.0.1 IP address. If you want to start Polygraph on a different
port or host, you must adjust configuration files accordingly.
Polygraph workloads are specified using command line options and
a configuration file written in Polygraph Language (PGL).
The tests and workloads described here are not meant to be used for
production benchmarking! They only illustrate basic Polygraph
usage.
3. Hello, world
This simple run will test basic Polygraph functionality.
The Polygraph distribution already contains a simple workload specification.
The specs can be found in simple.pg. Other
workload examples can be found in the polygraph/workloads/ directory
as well.
Normally, the workload description includes several phases.
Polygraph stops when all phases reach their goals. For these simple tests, we
will use the default phase that does not have any goal. Hence, we will have to
terminate polyclt and polysrv manually by pressing
^C or sending an interrupt signal.
Let's start the server.
> polysrv \
--config /usr/local/polygraph/workloads/simple.pg \
--verb_lvl 10
000.00| Content distribution on server S101:
content planned% likely% error%
some-content 100.00 100.00 0.00
expected average cachability: 80.00%
expected average object size: 13331.30Bytes
bin/polysrv: warning: no run phases were specified; ...
000.00| fyi: no bench selected with use(); ...
000.01| Command: polysrv
--config /usr/local/polygraph/workloads/simple.pg --verb_lvl 10
000.01| Configuration:
version: 2.6.0b5
host_type: i386-unknown-freebsd3.4
verb_lvl: 10
dump: <none>
dump_size: 1.000KB
notify: <none>
label: <none>
fd_limit: 15878
config: /usr/local/polygraph/workloads/simple.pg
cfg_dirs:
console: -
log: <none>
log_size: -1
sample_log: <none>
sample_log_size: -1
stats_cycle: 5.00sec
sync_phases: on
file_scan: poll
priority_sched: 5
new_oids_per_msg_max:16
fake_hosts:
idle_tout: 5.00min
rng_seed: 1
unique_world: on
new_oids_history: 2048
ign_urls: off
000.01| Phases:
phase load_beg load_end rec_beg rec_end smsg_beg smsg_end goal
dflt 1.00 1.00 1.00 1.00 1.00 1.00 <none>
000.01| StatsSamples:
static stats samples: 0
dynamic stats samples: 0
000.01| FDs: 16384 out of 16384 FDs can be used; safeguard limit: 15878
000.01| resource usage:
CPU Usage: 7msec sys + 1.52sec user = 1.53sec
Maximum Resident Size: 7.922MB
Page faults with physical i/o: 0
000.01| group-id: 0480bcf4.2e7d019d:00000002 pid: 413
000.01| current time: 987891522.111529 or Sat, 21 Apr 2001 22:18:42 GMT
000.01| fyi: PGL configuration stored (3054bytes)
000.01| fyi: current state (1) stored
000.01| starting 1 HTTP agents...
000.01| starting S101[1 / 0480bcf4.2e7d019d:00000004]
on 127.0.0.1:9090
000.10| i-dflt 0 0.00 -1 -1.00 0 1
000.18| i-dflt 0 0.00 -1 -1.00 0 1
000.26| i-dflt 0 0.00 -1 -1.00 0 1
000.35| i-dflt 0 0.00 -1 -1.00 0 1
...
As you can see, the polysrv process created one server agent bound to
localhost (127.0.0.1), port 9090. Polysrv complained
about no phases found in simple.pg and generated a ``default''
infinite phase. Since there is no client running yet, the run-time statistics
are all zeros. For details about console output format look elsewhere.
Polysrv will continue to do nothing until we start the client.
You may need to open a new window or virtual terminal to get a second
command line prompt.
> polyclt \
--config /usr/local/polygraph/workloads/simple.pg \
--verb_lvl 10
000.00| Content distribution on server S101:
content planned% likely% error%
some-content 100.00 100.00 0.00
expected average cachability: 80.00%
expected average object size: 13331.30Bytes
bin/polyclt: warning: no run phases were specified; ...
000.00| fyi: no bench selected with use(); ...
000.01| Command: bin/polyclt --config workloads/simple.pg --verb_lvl 10
000.01| Configuration:
version: 2.6.0b5
host_type: i386-unknown-freebsd3.4
verb_lvl: 10
dump: <none>
dump_size: 1.000KB
notify: <none>
label: <none>
fd_limit: 15878
config: /usr/local/polygraph/workloads/simple.pg
cfg_dirs:
console: -
log: <none>
log_size: -1
sample_log: <none>
sample_log_size: -1
stats_cycle: 5.00sec
sync_phases: on
file_scan: poll
priority_sched: 5
new_oids_per_msg_max:16
fake_hosts:
idle_tout: <none>
rng_seed: 1
unique_world: on
proxy: <none>
ports: <none>
icp_tout: 2.00sec
new_oids_prefetch: 256
ign_false_hits: on
ign_bad_cont_tags: off
prn_false_misses: off
000.01| Phases:
phase load_beg load_end rec_beg rec_end smsg_beg smsg_end goal
dflt 1.00 1.00 1.00 1.00 1.00 1.00 <none>
000.01| StatsSamples:
static stats samples: 0
dynamic stats samples: 0
000.01| FDs: 16384 out of 16384 FDs can be used; safeguard limit: 15878
000.01| resource usage:
CPU Usage: 23msec sys + 1.51sec user = 1.53sec
Maximum Resident Size: 8.203MB
Page faults with physical i/o: 18
000.01| group-id: 0480bd09.3a2501a0:00000002 pid: 416
000.01| current time: 987891543.872993 or Sat, 21 Apr 2001 22:19:03 GMT
000.01| fyi: PGL configuration stored (3054bytes)
000.01| fyi: current state (1) stored
000.01| starting 1 HTTP agents...
000.01| starting R101[1 / 0480bd09.3a2501a0:00000004] on 127.0.0.1
000.01| fyi: server scan completed with all local robots ready ...
000.10| i-dflt 3296 659.19 1 0.00 0 2
000.18| i-dflt 6585 657.79 1 0.00 0 2
000.26| i-dflt 9891 661.19 1 0.00 0 2
000.35| i-dflt 13109 643.54 1 0.00 0 2
000.43| i-dflt 16380 654.05 1 0.00 0 2
000.51| i-dflt 19667 657.29 1 0.00 0 2
000.60| i-dflt 22972 660.94 1 0.00 0 2
000.68| i-dflt 26228 651.18 1 0.00 0 2
000.76| i-dflt 29500 654.33 1 0.00 0 2
000.85| i-dflt 32775 654.91 1 0.00 0 2
000.93| i-dflt 35992 643.38 1 0.00 0 2
001.01| i-dflt 39223 646.19 1 0.00 0 2
001.10| i-dflt 42479 651.19 1 0.00 0 2
001.18| i-dflt 45687 641.55 1 0.00 0 2
001.26| i-dflt 48954 653.37 1 0.00 0 2
001.35| i-dflt 52189 646.95 1 0.00 0 2
001.43| i-dflt 55418 645.79 1 0.00 0 2
001.51| i-dflt 58691 654.51 1 0.00 0 2
001.60| i-dflt 61955 652.77 1 0.00 0 2
001.68| i-dflt 65304 669.72 1 0.00 0 2
001.76| i-dflt 68593 657.71 1 0.00 0 2
...
Now we can see some traffic on both the client side (above) and the server side
(below). The console output tells us that Polygraph is doing around 650
requests per second with response times of 1msec, and that there are no
hits or errors. Note that we are running a very simple back-to-back workload.
Your numbers will differ depending on how powerful your OS and hardware are.
We will kill the experiment now by pressing Control+C in the client
and server windows. Here is the rest of the server output.
...
000.51| i-dflt 5436 656.53 0 0.00 0 2
000.60| i-dflt 8724 657.52 0 0.00 0 2
000.68| i-dflt 11985 652.20 0 0.00 0 2
000.76| i-dflt 15224 647.73 0 0.00 0 2
000.85| i-dflt 18499 654.98 0 0.00 0 2
000.93| i-dflt 21818 663.77 0 0.00 0 2
001.01| i-dflt 25075 651.38 0 0.00 0 2
001.10| i-dflt 28370 658.96 0 0.00 0 2
001.18| i-dflt 31631 652.18 0 0.00 0 1
001.26| i-dflt 34865 646.75 0 0.00 0 2
001.35| i-dflt 38092 645.31 0 0.00 0 2
001.43| i-dflt 41333 648.19 0 0.00 0 2
001.51| i-dflt 44551 643.51 0 0.00 0 2
001.60| i-dflt 47814 652.55 0 0.00 0 2
001.68| i-dflt 51047 646.56 0 0.00 0 2
001.76| i-dflt 54281 646.80 0 0.00 0 2
001.85| i-dflt 57526 648.96 0 0.00 0 2
001.93| i-dflt 60774 649.50 0 0.00 0 2
002.01| i-dflt 64103 665.70 0 0.00 0 1
002.10| i-dflt 67429 665.18 0 0.00 0 2
002.18| i-dflt 70705 655.17 0 0.00 0 2
002.25| SrvConnMgr.cc:77: error: 1/1 (c16) connection closed ...
002.26| i-dflt 73421 543.14 0 0.00 0 1
002.35| i-dflt 73421 0.00 -1 -1.00 0 1
002.43| i-dflt 73421 0.00 -1 -1.00 0 1
^Cgot shutdown signal (2)
002.44| noticed shutdown signal (2)
002.44| resource usage:
CPU Usage: 24.75sec sys + 29.63sec user = 54.37sec
Maximum Resident Size: 8.824MB
Page faults with physical i/o: 0
002.44| fyi: current state (2) stored
002.44| server 127.0.0.1:9090 is closing listen socket 3
after 73421 xactions
002.44| got 73421 xactions and 0 errors
002.44| shutdown reason: got shutdown signal
And the rest of the client output.
...
001.85| i-dflt 71839 649.16 1 0.00 0 2
^Cgot shutdown signal (2)
001.89| noticed shutdown signal (2)
001.89| resource usage:
CPU Usage: 25.18sec sys + 34.89sec user = 1.00min
Maximum Resident Size: 8.879MB
Page faults with physical i/o: 18
001.89| fyi: current state (2) stored
001.89| got 73421 xactions and 0 errors
001.89| shutdown reason: got shutdown signal
Now is a good time for you to look through the simple.pg file
and PGL documentation to see what kind of workload we were using during this
simple test.
4. Adding a proxy
In the previous test, polyclt was talking directly to
polysrv running on port 9090. Now we want to introduce a proxy
into the setup.
If your proxy runs in a transparent mode, you will probably need to run
polyclt and polysrv on different hosts and move
polysrv to port 80 so that Polygraph traffic will get
automagically redirected to the proxy. We will not demonstrate transparent
setup here.
Our proxy is running on host 10.44.0.100 and listening for HTTP
queries on port 3128. We need to tell the polyclt process which
proxy to connect to using the --proxy command line option.
Since our proxy is not running on the same machine as polysrv, we
can no longer use the loopback interface and have to move our server agent to an
address that the proxy can connect to. The IP address and port number (currently
'127.0.0.1:9090') for the server agent are specified in the simple.pg
configuration file. Below are the relevant lines.
Server S = {
kind = "S101";
contents = [ SimpleContent ];
direct_access = contents;
addresses = ['127.0.0.1:9090' ]; // where to create these server agents
};
You will need to edit those lines to change the 127.0.0.1 address to the IP
address of your machine. Our machine has the address 10.44.128.61. We recommend
that you do not edit simple.pg but create and edit a copy instead. Let's call
that copy my-simple.pg. Here is how the modified part of
my-simple.pg looks in our case.
Server S = {
kind = "S101";
contents = [ SimpleContent ];
direct_access = contents;
addresses = ['10.44.128.61:9090' ]; // new server address
};
With recent versions of Polygraph, a similar change of IP address is
required for the robot as well. This is because Polygraph robots now always
bind to the specified address. If a robot remains bound to 127.0.0.1, it will
not be able to receive responses from the proxy without special routes or NAT.
Change the addresses field of your robot specification to contain the
primary address of the polyclt machine. In our case, that address is
10.44.128.61 because we use the same machine for both client- and server-side
processes (which is a bad idea for production tests!).
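The copy-and-edit step can also be scripted. The following is only a sketch: the helper name retarget_workload is hypothetical, and it assumes that every 127.0.0.1 in the workload file (server and robot addresses alike) should become the same new address, which holds for our single-machine setup but may not hold for yours.

```shell
# retarget_workload SRC DST IP
# Write a copy of workload file SRC to DST with every 127.0.0.1 replaced
# by IP. SRC itself is left unmodified. (Hypothetical helper, not part of
# Polygraph.)
retarget_workload() {
    sed "s/127\.0\.0\.1/$3/g" "$1" > "$2"
}

# Example with the paths and address used in this walkthrough:
#   retarget_workload /usr/local/polygraph/workloads/simple.pg \
#                     /tmp/my-simple.pg 10.44.128.61
```

Review the resulting file before using it; ports are left untouched, so the server agent still listens on 9090.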
Finally, we want to log detailed run-time statistics into
/tmp/clt.log and /tmp/srv.log files using the --log
command line option.
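For reference, the command lines for this run can be assembled as follows. The options and values are the ones echoed in the console output of this test; the variable names are ours. The snippet only prints the commands, so that each can be pasted into its own terminal, server first.

```shell
CFG=/tmp/my-simple.pg      # edited copy of simple.pg
PROXY=10.44.0.100:3128     # proxy host:port used in this example

SRV_CMD="polysrv --config $CFG --verb_lvl 10 --log /tmp/srv.log"
CLT_CMD="polyclt --config $CFG --verb_lvl 10 --proxy $PROXY --log /tmp/clt.log"

# Print both command lines; start the server one before the client one.
echo "$SRV_CMD"
echo "$CLT_CMD"
```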
A sample from the server-side output is below.
...
000.01| Command: bin/polysrv
--config /tmp/my-simple.pg --verb_lvl 10
--log /tmp/srv.log
000.01| Configuration:
config: /tmp/my-simple.pg
log: /tmp/srv.log
...
000.01| starting 1 HTTP agents...
000.01| starting S101[1 / 0480cdae.630701cc:00000004]
on 10.44.128.61:9090
000.10| i-dflt 0 0.00 -1 -1.00 0 1
000.18| i-dflt 33 6.60 0 0.00 0 1
000.26| i-dflt 58 5.00 0 0.00 0 1
000.35| i-dflt 97 7.80 40 0.00 0 1
000.43| i-dflt 101 0.80 374 0.00 0 1
000.51| i-dflt 123 4.40 57 0.00 0 1
000.60| i-dflt 139 3.20 0 0.00 0 1
000.68| i-dflt 179 8.00 33 0.00 0 2
000.76| i-dflt 182 0.60 442 0.00 0 1
000.85| i-dflt 190 1.60 5 0.00 0 1
000.93| i-dflt 194 0.80 344 0.00 0 1
001.01| i-dflt 207 2.60 114 0.00 0 2
001.10| i-dflt 227 4.00 237 0.00 0 1
001.18| i-dflt 238 2.20 0 0.00 0 1
001.26| i-dflt 259 4.20 0 0.00 0 1
001.35| i-dflt 263 0.80 3 0.00 0 1
001.43| i-dflt 263 0.00 -1 -1.00 0 1
001.51| i-dflt 263 0.00 -1 -1.00 0 1
^Cgot shutdown signal (2)
001.52| noticed shutdown signal (2)
001.52| resource usage:
CPU Usage: 118msec sys + 1.67sec user = 1.79sec
Maximum Resident Size: 8.848MB
Page faults with physical i/o: 0
001.52| fyi: current state (2) stored
001.52| server 10.44.128.61:9090 is closing listen socket 4
after 263 xactions
001.52| got 263 xactions and 0 errors
001.52| shutdown reason: got shutdown signal
And here is the polyclt output.
...
000.01| Command: bin/polyclt
--config /tmp/my-simple.pg --verb_lvl 10
--proxy 10.44.0.100:3128 --log /tmp/clt.log
000.01| Configuration:
config: /tmp/my-simple.pg
log: /tmp/clt.log
...
000.01| starting 1 HTTP agents...
000.01| starting R101[1 / 0480cdb5.5c1701cd:00000004]
on 10.44.128.61
000.01| fyi: server scan completed with all local robots ready ...
000.10| i-dflt 98 19.60 50 58.16 0 2
000.18| i-dflt 163 13.00 69 49.23 0 2
000.26| i-dflt 221 11.60 77 56.90 0 2
000.35| i-dflt 272 10.20 117 60.78 0 2
000.43| i-dflt 288 3.20 281 62.50 0 2
000.51| i-dflt 353 13.00 82 47.69 0 2
000.60| i-dflt 392 7.80 106 46.15 0 2
000.68| i-dflt 419 5.40 221 62.96 0 2
000.76| i-dflt 420 0.20 3111 0.00 0 2
000.85| i-dflt 433 2.60 454 61.54 0 2
000.93| i-dflt 466 6.60 137 48.48 0 2
001.01| i-dflt 519 10.60 121 54.72 0 2
001.10| i-dflt 555 7.20 128 58.33 0 2
001.18| i-dflt 586 6.20 140 67.74 0 2
^Cgot shutdown signal (2)
001.20| noticed shutdown signal (2)
001.20| resource usage:
CPU Usage: 331msec sys + 2.00sec user = 2.33sec
Maximum Resident Size: 8.902MB
Page faults with physical i/o: 0
001.20| fyi: current state (2) stored
001.20| got 588 xactions and 0 errors
001.20| shutdown reason: got shutdown signal
Let's concentrate on the client-side console output. Note that various proxy
and network overheads increased transaction response time to more than
100msec, causing the request rate to drop to less than 10 req/sec (the
correlation is due to the best-effort mode of simple robots; production
workloads virtually never use best-effort robots). Also, polyclt is now getting
some hits (the hit ratio is about 55%). The measurements reported on the console
are unstable due to the relatively low request rate (not enough sample data in a
5-second stats window).
If you are not getting any hits from a proxy while everything else works as
expected, it is possible that the proxy under test is picky about object
expiration time and other freshness info. Some proxies would not cache an
object without certain HTTP header fields. The simple.pg workload
does not have an ``Object Life Cycle'' model configured, and servers generate no
freshness headers. To get hits with picky proxies, you can either use an
advanced workload such as PolyMix or modify your workload specs to
include an Object Life Cycle model.
Here is a simple modification to our workload that makes Squid cache
objects. We added an olcStatic object of type ObjLifeCycle
and used that object in SimpleContent configuration. No other
changes were made.
ObjLifeCycle olcStatic = {
birthday = now + const(-1year); // born a year ago
length = const(2year); // two year cycle
variance = 0%; // no variance
with_lmt = 100%; // all responses have LMT
expires = [nmt + const(0sec)]; // everything expires when modified
};
// we start with defining content properties for our server to generate
Content SimpleContent = {
size = exp(13KB); // response sizes distributed exponentially
cachable = 80%; // 20% of content is uncachable
obj_life_cycle = olcStatic;
};
The olcStatic definition was borrowed from the
contents.pg file distributed with Polygraph. Instead of copying the
definition into your workload, you can simply #include that file like
most standard workloads do. The contents.pg file contains other, more
sophisticated Object Life Cycle configurations.
4.1 Looking at binary logs
The binary logs created during the last test can be analyzed with the
lr (``Log Reader'') and lx (``Log Extractor'') tools
included in the Polygraph distribution. For example, let's get the response time
histogram and mean on the client side (after the experiment is over).
> lx --objects rep.rptm.hist /tmp/clt.log
rep.rptm.hist:
# bin min max count % acc%
3 2 2 26 4.42 4.42
4 3 3 160 27.21 31.63
5 4 4 89 15.14 46.77
6 5 5 81 13.78 60.54
7 6 6 18 3.06 63.61
8 7 7 21 3.57 67.18
9 8 8 30 5.10 72.28
10 9 9 20 3.40 75.68
11 10 10 25 4.25 79.93
12 11 11 21 3.57 83.50
13 12 12 17 2.89 86.39
14 13 13 10 1.70 88.10
15 14 14 10 1.70 89.80
16 15 15 3 0.51 90.31
17 16 16 5 0.85 91.16
18 17 21 5 0.85 92.01
23 22 58 6 1.02 93.03
72 71 234 6 1.02 94.05
365 364 1267 7 1.19 95.24
830 1268 1351 6 1.02 96.26
854 1364 1475 5 0.85 97.11
882 1476 2807 6 1.02 98.13
1123 2832 2975 6 1.02 99.15
1142 2984 3431 5 0.85 100.00
> lx --objects rep.rptm.mean /tmp/clt.log
rep.rptm.mean: 119.35
As you can see, 79.93% of responses had a response time of less than 11msec, but
about 5% of transactions took more than a second, increasing mean response
time to 119msec.
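Figures like the 79.93% above can be recomputed from the histogram text. The sketch below assumes the lx output layout shown above (columns: bin, min, max, count, %, acc%); share_within is a hypothetical helper name, not a Polygraph tool.

```shell
# share_within LIMIT
# Read `lx --objects rep.rptm.hist` output on stdin and print the percentage
# of responses that fall into bins whose maximum is at most LIMIT msec.
share_within() {
    awk -v limit="$1" '
        /^[[:space:]]*[0-9]/ {              # data rows start with a bin number
            total += $4                     # column 4: responses in this bin
            if ($3 <= limit) within += $4   # column 3: bin maximum, in msec
        }
        END { if (total > 0) printf "%.2f\n", 100 * within / total }
    '
}
```

For example, after saving the histogram with lx --objects rep.rptm.hist /tmp/clt.log > /tmp/hist.txt, running share_within 10 < /tmp/hist.txt on the data above should print 79.93 (470 of 588 responses).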
You can get most of the aggregate stats collected during the experiment by
running lx with no --objects option.
> lx /tmp/clt.log
Finally, you can generate a full-blown report using the binary log and
Report Generator tools that come with Polygraph. You will probably want to run
a longer test to get better graphs though.
5. Specifying request rate
Simple robots are best-effort robots. A best-effort robot submits
the next request right after receiving a response to the previous one.
Best-effort robots are useless for most benchmarking tasks because you do not
want request rate to be tied to transaction response time. In real traffic,
the two are usually orthogonal characteristics.
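To see how tightly the two are coupled for a best-effort robot, here is a rough estimate. It assumes a single robot that keeps only one request outstanding at a time, so the rate cannot exceed one request per mean response time; the function name is hypothetical.

```shell
# best_effort_rate RPTM_MSEC
# Upper bound on the request rate (req/sec) of one best-effort robot with a
# single outstanding request, given the mean response time in milliseconds.
best_effort_rate() {
    awk -v rptm="$1" 'BEGIN { printf "%.2f\n", 1000 / rptm }'
}

best_effort_rate 119.35   # mean response time from the proxy test; prints 8.38
```

This is in line with the proxy test, where the client completed 588 transactions in about 1.2 minutes, or roughly 8 requests per second.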
The following instructions will require modifying the workload file. We
strongly recommend that you copy simple.pg to a different file and
modify only that copy. Always keep distributed workload files unmodified. It
may not matter in this simple case, but it is a pain to spend hours debugging
a workload only to find out that you "temporarily" modified a file that the
workload is using but never reverted the changes.
It is simple to tell the robot to emit a realistic Poisson request stream
with a given mean rate. All you need to do is add a req_rate setting
to the robot configuration. Let's use a load of 1 request per second (per
robot).
We will also increase the number of robots to 10 by cloning the robot's address
10 times. Here is the new robot configuration.
Robot R = {
kind = "R101";
public_interest = 50%;
pop_model = { pop_distr = popUnif(); };
recurrence = 55% / SimpleContent.cachable; // adjusted to get 55% DHR
req_rate = 1/sec;
origins = S.addresses; // where the origin servers are
addresses = ['10.44.128.61' ** 10 ]; // use clone operator
};
Try using this new robot and see how console output changes. You should see
a cumulative request rate of about 10 requests per second. There should be
more concurrent connections now because each robot can open several
connections (if response time is more than one second), and there are ten
robots. The response time may change as well.
If your device under test cannot handle 10 req/sec load, decrease the per-robot
request rate or decrease the number of robots.
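The offered load is simply the number of robots multiplied by the per-robot mean rate, so either knob scales it. A small sketch of that arithmetic (offered_load is a hypothetical helper, not a Polygraph tool):

```shell
# offered_load ROBOTS RATE
# Print the total mean request rate (req/sec) for ROBOTS robots that each
# emit RATE requests per second on average.
offered_load() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.2f\n", n * r }'
}

offered_load 10 1     # the configuration above: prints 10.00
offered_load 5 1      # half as many robots: prints 5.00
offered_load 10 0.5   # same robots, half the per-robot rate: prints 5.00
```

Both ways of halving the load give the same mean rate, but they differ in how many robot addresses and concurrent connections the device under test sees.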
Most production Polygraph workloads use thousands of robots with very low
individual request rates (e.g., 0.4/sec) to simulate large end-user
populations. However, as the above examples demonstrate, you can create a
workload that matches your testing needs. We still recommend starting with
standard workloads so that you gain experience using what has been proven to
work before experimenting with custom designs.