Simple tests

1. For the impatient

    polysrv \
        --config /usr/local/polygraph/workloads/simple.pg \
        --verb_lvl 10

    polyclt \
        --config /usr/local/polygraph/workloads/simple.pg \
        --verb_lvl 10

    # watch console output then kill polysrv and polyclt

2. Introduction

The Polygraph distribution includes client and server simulators called polyclt and polysrv. You need to run both programs to simulate the desired workload. The server(s) should be launched first. We will show the command line for polysrv (polyclt) followed by the polysrv (polyclt) output generated in our environment.

For simplicity, we will start both simulators on one machine. To run the tests below, your machine must be allowed to connect to itself via the 127.0.0.1 IP address. If you want to start Polygraph on a different port or host, you must adjust the configuration files accordingly.

Polygraph workloads are specified using command line options and a configuration file written in Polygraph Language (PGL).

The tests and workloads described here are not meant to be used for production benchmarking! They only illustrate basic Polygraph usage.

3. Hello, world

This simple run will test basic Polygraph functionality.

The Polygraph distribution already contains a simple workload specification, simple.pg. Other workload examples can be found in the polygraph/workloads/ directory as well.

Normally, the workload description includes several phases. Polygraph stops when all phases reach their goals. For these simple tests, we will use the default phase that does not have any goal. Hence we would have to terminate polyclt and polysrv manually by pressing ^C or sending an interrupt signal.

Let's start the server.

    > polysrv \
        --config /usr/local/polygraph/workloads/simple.pg \
        --verb_lvl 10
    000.00| Content distribution on server S101:
            content planned% likely% error%
            some-content 100.00 100.00 0.00
            expected average cachability: 80.00%
            expected average object size: 13331.30Bytes
    bin/polysrv: warning: no run phases were specified; ...
    000.00| fyi: no bench selected with use(); ...
    000.01| Command: polysrv --config /usr/local/polygraph/workloads/simple.pg --verb_lvl 10
    000.01| Configuration:
            version: 2.6.0b5
            host_type: i386-unknown-freebsd3.4
            verb_lvl: 10
            dump: <none>
            dump_size: 1.000KB
            notify: <none>
            label: <none>
            fd_limit: 15878
            config: /usr/local/polygraph/workloads/simple.pg
            cfg_dirs:
            console: -
            log: <none>
            log_size: -1
            sample_log: <none>
            sample_log_size: -1
            stats_cycle: 5.00sec
            sync_phases: on
            file_scan: poll
            priority_sched: 5
            new_oids_per_msg_max:16
            fake_hosts:
            idle_tout: 5.00min
            rng_seed: 1
            unique_world: on
            new_oids_history: 2048
            ign_urls: off
    000.01| Phases:
            phase load_beg load_end rec_beg rec_end smsg_beg smsg_end goal
            dflt 1.00 1.00 1.00 1.00 1.00 1.00 <none>
    000.01| StatsSamples:
            static stats samples: 0
            dynamic stats samples: 0
    000.01| FDs: 16384 out of 16384 FDs can be used; safeguard limit: 15878
    000.01| resource usage:
            CPU Usage: 7msec sys + 1.52sec user = 1.53sec
            Maximum Resident Size: 7.922MB
            Page faults with physical i/o: 0
    000.01| group-id: 0480bcf4.2e7d019d:00000002 pid: 413
    000.01| current time: 987891522.111529 or Sat, 21 Apr 2001 22:18:42 GMT
    000.01| fyi: PGL configuration stored (3054bytes)
    000.01| fyi: current state (1) stored
    000.01| starting 1 HTTP agents...
    000.01| starting S101[1 / 0480bcf4.2e7d019d:00000004] on 127.0.0.1:9090
    000.10| i-dflt 0 0.00 -1 -1.00 0 1
    000.18| i-dflt 0 0.00 -1 -1.00 0 1
    000.26| i-dflt 0 0.00 -1 -1.00 0 1
    000.35| i-dflt 0 0.00 -1 -1.00 0 1
    ...

As you can see, the polysrv process created one server agent bound to localhost (127.0.0.1), port 9090. Polysrv complained that no phases were found in simple.pg and generated a "default" infinite phase. Since there is no client running yet, the run-time statistics are all zeros. The console output format is documented elsewhere.

Polysrv will continue to do nothing until we start the client. You may need to open a new window or virtual terminal to get a second command line prompt.

    > polyclt \
        --config /usr/local/polygraph/workloads/simple.pg \
        --verb_lvl 10
    000.00| Content distribution on server S101:
            content planned% likely% error%
            some-content 100.00 100.00 0.00
            expected average cachability: 80.00%
            expected average object size: 13331.30Bytes
    bin/polyclt: warning: no run phases were specified; ...
    000.00| fyi: no bench selected with use(); ...
    000.01| Command: bin/polyclt --config workloads/simple.pg --verb_lvl 10
    000.01| Configuration:
            version: 2.6.0b5
            host_type: i386-unknown-freebsd3.4
            verb_lvl: 10
            dump: <none>
            dump_size: 1.000KB
            notify: <none>
            label: <none>
            fd_limit: 15878
            config: /usr/local/polygraph/workloads/simple.pg
            cfg_dirs:
            console: -
            log: <none>
            log_size: -1
            sample_log: <none>
            sample_log_size: -1
            stats_cycle: 5.00sec
            sync_phases: on
            file_scan: poll
            priority_sched: 5
            new_oids_per_msg_max:16
            fake_hosts:
            idle_tout: <none>
            rng_seed: 1
            unique_world: on
            proxy: <none>
            ports: <none>
            icp_tout: 2.00sec
            new_oids_prefetch: 256
            ign_false_hits: on
            ign_bad_cont_tags: off
            prn_false_misses: off
    000.01| Phases:
            phase load_beg load_end rec_beg rec_end smsg_beg smsg_end goal
            dflt 1.00 1.00 1.00 1.00 1.00 1.00 <none>
    000.01| StatsSamples:
            static stats samples: 0
            dynamic stats samples: 0
    000.01| FDs: 16384 out of 16384 FDs can be used; safeguard limit: 15878
    000.01| resource usage:
            CPU Usage: 23msec sys + 1.51sec user = 1.53sec
            Maximum Resident Size: 8.203MB
            Page faults with physical i/o: 18
    000.01| group-id: 0480bd09.3a2501a0:00000002 pid: 416
    000.01| current time: 987891543.872993 or Sat, 21 Apr 2001 22:19:03 GMT
    000.01| fyi: PGL configuration stored (3054bytes)
    000.01| fyi: current state (1) stored
    000.01| starting 1 HTTP agents...
    000.01| starting R101[1 / 0480bd09.3a2501a0:00000004] on 127.0.0.1
    000.01| fyi: server scan completed with all local robots ready ...
    000.10| i-dflt 3296 659.19 1 0.00 0 2
    000.18| i-dflt 6585 657.79 1 0.00 0 2
    000.26| i-dflt 9891 661.19 1 0.00 0 2
    000.35| i-dflt 13109 643.54 1 0.00 0 2
    000.43| i-dflt 16380 654.05 1 0.00 0 2
    000.51| i-dflt 19667 657.29 1 0.00 0 2
    000.60| i-dflt 22972 660.94 1 0.00 0 2
    000.68| i-dflt 26228 651.18 1 0.00 0 2
    000.76| i-dflt 29500 654.33 1 0.00 0 2
    000.85| i-dflt 32775 654.91 1 0.00 0 2
    000.93| i-dflt 35992 643.38 1 0.00 0 2
    001.01| i-dflt 39223 646.19 1 0.00 0 2
    001.10| i-dflt 42479 651.19 1 0.00 0 2
    001.18| i-dflt 45687 641.55 1 0.00 0 2
    001.26| i-dflt 48954 653.37 1 0.00 0 2
    001.35| i-dflt 52189 646.95 1 0.00 0 2
    001.43| i-dflt 55418 645.79 1 0.00 0 2
    001.51| i-dflt 58691 654.51 1 0.00 0 2
    001.60| i-dflt 61955 652.77 1 0.00 0 2
    001.68| i-dflt 65304 669.72 1 0.00 0 2
    001.76| i-dflt 68593 657.71 1 0.00 0 2
    ...

Now we can see some traffic on both the client side (above) and the server side (below). The console output tells us that Polygraph is doing around 650 requests per second with response times of 1msec, and that there are no hits or errors. Note that we are running a very simple back-to-back workload. Your numbers will differ depending on how powerful your OS and hardware are.

We will kill the experiment now by pressing Control+C in the client and server windows. Here is the rest of the server output.
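If you want to process these console lines with a script, a small parser is enough. The Python sketch below is not part of Polygraph, and the field names are our own labels. The transaction count, request rate, response time, hit ratio, and error columns follow the interpretation given above; reading the timestamp as minutes and the last column as a connection count are assumptions.

```python
from typing import NamedTuple


class StatsLine(NamedTuple):
    minutes: float         # timestamp column (assumed to be minutes since start)
    phase: str             # phase label without the "i-" prefix, e.g. "dflt"
    xactions: int          # cumulative transaction count
    req_rate: float        # requests per second in the last stats cycle
    resp_time_msec: int    # mean response time; -1 when there was no traffic
    hit_ratio: float       # hit ratio in percent; -1.00 when there was no traffic
    errors: int            # error count
    conns: int             # last column (assumed: open connections)


def parse_stats_line(line: str) -> StatsLine:
    """Parse one 'NNN.NN| i-<phase> ...' console statistics line."""
    stamp, rest = line.split("|", 1)
    label, xact, rate, rptm, hits, errs, conns = rest.split()
    phase = label[2:] if label.startswith("i-") else label
    return StatsLine(float(stamp), phase, int(xact), float(rate),
                     int(rptm), float(hits), int(errs), int(conns))


sample = parse_stats_line("000.10| i-dflt 3296 659.19 1 0.00 0 2")
print(sample.req_rate, sample.resp_time_msec, sample.errors)
```

Lines that do not match this layout (for example, the Configuration dump) will raise ValueError, so filter for the "i-" label before calling the parser.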

    ...
    000.51| i-dflt 5436 656.53 0 0.00 0 2
    000.60| i-dflt 8724 657.52 0 0.00 0 2
    000.68| i-dflt 11985 652.20 0 0.00 0 2
    000.76| i-dflt 15224 647.73 0 0.00 0 2
    000.85| i-dflt 18499 654.98 0 0.00 0 2
    000.93| i-dflt 21818 663.77 0 0.00 0 2
    001.01| i-dflt 25075 651.38 0 0.00 0 2
    001.10| i-dflt 28370 658.96 0 0.00 0 2
    001.18| i-dflt 31631 652.18 0 0.00 0 1
    001.26| i-dflt 34865 646.75 0 0.00 0 2
    001.35| i-dflt 38092 645.31 0 0.00 0 2
    001.43| i-dflt 41333 648.19 0 0.00 0 2
    001.51| i-dflt 44551 643.51 0 0.00 0 2
    001.60| i-dflt 47814 652.55 0 0.00 0 2
    001.68| i-dflt 51047 646.56 0 0.00 0 2
    001.76| i-dflt 54281 646.80 0 0.00 0 2
    001.85| i-dflt 57526 648.96 0 0.00 0 2
    001.93| i-dflt 60774 649.50 0 0.00 0 2
    002.01| i-dflt 64103 665.70 0 0.00 0 1
    002.10| i-dflt 67429 665.18 0 0.00 0 2
    002.18| i-dflt 70705 655.17 0 0.00 0 2
    002.25| SrvConnMgr.cc:77: error: 1/1 (c16) connection closed ...
    002.26| i-dflt 73421 543.14 0 0.00 0 1
    002.35| i-dflt 73421 0.00 -1 -1.00 0 1
    002.43| i-dflt 73421 0.00 -1 -1.00 0 1
    ^Cgot shutdown signal (2)
    002.44| noticed shutdown signal (2)
    002.44| resource usage:
            CPU Usage: 24.75sec sys + 29.63sec user = 54.37sec
            Maximum Resident Size: 8.824MB
            Page faults with physical i/o: 0
    002.44| fyi: current state (2) stored
    002.44| server 127.0.0.1:9090 is closing listen socket 3 after 73421 xactions
    002.44| got 73421 xactions and 0 errors
    002.44| shutdown reason: got shutdown signal

And the rest of the client output.

    ...
    001.85| i-dflt 71839 649.16 1 0.00 0 2
    ^Cgot shutdown signal (2)
    001.89| noticed shutdown signal (2)
    001.89| resource usage:
            CPU Usage: 25.18sec sys + 34.89sec user = 1.00min
            Maximum Resident Size: 8.879MB
            Page faults with physical i/o: 18
    001.89| fyi: current state (2) stored
    001.89| got 73421 xactions and 0 errors
    001.89| shutdown reason: got shutdown signal

Now is a good time to look through the simple.pg file and the PGL documentation to see what kind of workload we were using during this simple test.

4. Adding a proxy

In the previous test, polyclt was talking directly to polysrv running on port 9090. Now we want to introduce a proxy into the setup.

If your proxy runs in a transparent mode, you will probably need to run polyclt and polysrv on different hosts and move polysrv to port 80 so that Polygraph traffic will get automagically redirected to the proxy. We will not demonstrate transparent setup here.

Our proxy is running on host 10.44.0.100 and listening for HTTP queries on port 3128. We need to tell the polyclt process which proxy to connect to using the --proxy command line option.

Since our proxy is not running on the same machine as polysrv, we can no longer use the loopback interface and have to move our server agent to an address that the proxy can connect to. The IP address and port number (currently '127.0.0.1:9090') for the server agent are specified in the simple.pg configuration file. Below are the relevant lines.

    Server S = {
        kind = "S101";
        contents = [ SimpleContent ];
        direct_access = contents;

        addresses = ['127.0.0.1:9090' ]; // where to create these server agents
    };

You will need to edit those lines to change the 127.0.0.1 address to the IP address of your machine. Our machine's address is 10.44.128.61. We recommend that you do not edit simple.pg but create and edit a copy of it instead. Let's call that copy my-simple.pg. Here is how the modified part of my-simple.pg looks in our case.

    Server S = {
        kind = "S101";
        contents = [ SimpleContent ];
        direct_access = contents;

        addresses = ['10.44.128.61:9090' ]; // new server address
    };

With recent versions of Polygraph, a similar change of IP address is required for the robot as well. This is because Polygraph robots now always bind to the specified address. If a robot remains bound to 127.0.0.1, it will not be able to receive responses from the proxy without special routes or NAT. Change the addresses field of your robot specification to contain the primary address of the polyclt machine. In our case, that address is 10.44.128.61 because we use the same machine for both client- and server-side processes (which is a bad idea for production tests!).
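The copy-and-edit step can also be scripted. The Python sketch below is a plain textual substitution and knows nothing about PGL syntax; the helper names retarget and make_copy are ours. It assumes that every occurrence of 127.0.0.1 in the workload file (the server address and the robot address) should become the new address, so check the resulting file by hand before using it.

```python
from pathlib import Path


def retarget(pgl_text: str, old_ip: str, new_ip: str) -> str:
    """Return the workload text with every old_ip replaced by new_ip."""
    return pgl_text.replace(old_ip, new_ip)


def make_copy(src: str, dst: str, old_ip: str, new_ip: str) -> None:
    """Write a retargeted copy of src to dst, leaving src untouched."""
    Path(dst).write_text(retarget(Path(src).read_text(), old_ip, new_ip))


# Demonstration on the Server fragment quoted above.
fragment = """Server S = {
    kind = "S101";
    contents = [ SimpleContent ];
    direct_access = contents;
    addresses = ['127.0.0.1:9090' ];
};
"""
print(retarget(fragment, "127.0.0.1", "10.44.128.61"))
```

In our setup, the whole file would be converted with make_copy("/usr/local/polygraph/workloads/simple.pg", "/tmp/my-simple.pg", "127.0.0.1", "10.44.128.61").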

Finally, we want to log detailed run-time statistics into /tmp/clt.log and /tmp/srv.log files using the --log command line option.
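For reference, the two command lines for this test can be assembled as follows. This Python sketch only builds and prints the commands; the helper name polygraph_cmd is ours. The options (--config, --verb_lvl, --proxy, --log), the file names, and the proxy address are the ones that appear in the Command: lines of the console output for this test.

```python
import shlex


def polygraph_cmd(tool, config, verb_lvl=10, proxy=None, log=None):
    """Build an argument list for polysrv or polyclt."""
    argv = [tool, "--config", config, "--verb_lvl", str(verb_lvl)]
    if proxy is not None:
        argv += ["--proxy", proxy]   # only polyclt needs to know the proxy
    if log is not None:
        argv += ["--log", log]       # binary log file for later analysis
    return argv


srv = polygraph_cmd("polysrv", "/tmp/my-simple.pg", log="/tmp/srv.log")
clt = polygraph_cmd("polyclt", "/tmp/my-simple.pg",
                    proxy="10.44.0.100:3128", log="/tmp/clt.log")

# Start the server command first, then the client command.
print(shlex.join(srv))
print(shlex.join(clt))
```

The argument lists can be passed to subprocess.Popen unchanged if you prefer to launch both processes from a script.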

A sample from the server-side output is below.
    ...
    000.01| Command: bin/polysrv --config /tmp/my-simple.pg --verb_lvl 10 --log /tmp/srv.log
    000.01| Configuration:
            config: /tmp/my-simple.pg
            log: /tmp/srv.log
    ...
    000.01| starting 1 HTTP agents...
    000.01| starting S101[1 / 0480cdae.630701cc:00000004] on 10.44.128.61:9090
    000.10| i-dflt 0 0.00 -1 -1.00 0 1
    000.18| i-dflt 33 6.60 0 0.00 0 1
    000.26| i-dflt 58 5.00 0 0.00 0 1
    000.35| i-dflt 97 7.80 40 0.00 0 1
    000.43| i-dflt 101 0.80 374 0.00 0 1
    000.51| i-dflt 123 4.40 57 0.00 0 1
    000.60| i-dflt 139 3.20 0 0.00 0 1
    000.68| i-dflt 179 8.00 33 0.00 0 2
    000.76| i-dflt 182 0.60 442 0.00 0 1
    000.85| i-dflt 190 1.60 5 0.00 0 1
    000.93| i-dflt 194 0.80 344 0.00 0 1
    001.01| i-dflt 207 2.60 114 0.00 0 2
    001.10| i-dflt 227 4.00 237 0.00 0 1
    001.18| i-dflt 238 2.20 0 0.00 0 1
    001.26| i-dflt 259 4.20 0 0.00 0 1
    001.35| i-dflt 263 0.80 3 0.00 0 1
    001.43| i-dflt 263 0.00 -1 -1.00 0 1
    001.51| i-dflt 263 0.00 -1 -1.00 0 1
    ^Cgot shutdown signal (2)
    001.52| noticed shutdown signal (2)
    001.52| resource usage:
            CPU Usage: 118msec sys + 1.67sec user = 1.79sec
            Maximum Resident Size: 8.848MB
            Page faults with physical i/o: 0
    001.52| fyi: current state (2) stored
    001.52| server 10.44.128.61:9090 is closing listen socket 4 after 263 xactions
    001.52| got 263 xactions and 0 errors
    001.52| shutdown reason: got shutdown signal

And here is the polyclt output.
    ...
    000.01| Command: bin/polyclt --config /tmp/my-simple.pg --verb_lvl 10 --proxy 10.44.0.100:3128 --log /tmp/clt.log
    000.01| Configuration:
            config: /tmp/my-simple.pg
            log: /tmp/clt.log
    ...
    000.01| starting 1 HTTP agents...
    000.01| starting R101[1 / 0480cdb5.5c1701cd:00000004] on 10.44.128.61
    000.01| fyi: server scan completed with all local robots ready ...
    000.10| i-dflt 98 19.60 50 58.16 0 2
    000.18| i-dflt 163 13.00 69 49.23 0 2
    000.26| i-dflt 221 11.60 77 56.90 0 2
    000.35| i-dflt 272 10.20 117 60.78 0 2
    000.43| i-dflt 288 3.20 281 62.50 0 2
    000.51| i-dflt 353 13.00 82 47.69 0 2
    000.60| i-dflt 392 7.80 106 46.15 0 2
    000.68| i-dflt 419 5.40 221 62.96 0 2
    000.76| i-dflt 420 0.20 3111 0.00 0 2
    000.85| i-dflt 433 2.60 454 61.54 0 2
    000.93| i-dflt 466 6.60 137 48.48 0 2
    001.01| i-dflt 519 10.60 121 54.72 0 2
    001.10| i-dflt 555 7.20 128 58.33 0 2
    001.18| i-dflt 586 6.20 140 67.74 0 2
    ^Cgot shutdown signal (2)
    001.20| noticed shutdown signal (2)
    001.20| resource usage:
            CPU Usage: 331msec sys + 2.00sec user = 2.33sec
            Maximum Resident Size: 8.902MB
            Page faults with physical i/o: 0
    001.20| fyi: current state (2) stored
    001.20| got 588 xactions and 0 errors
    001.20| shutdown reason: got shutdown signal

Let's concentrate on the client-side console output. Note that various proxy and network overheads increased the mean transaction response time to more than 100msec, causing the request rate to drop to roughly 10 req/sec or less (the correlation is due to the best-effort mode of simple robots; production workloads virtually never use best-effort robots). Also, polyclt is now getting some hits (the hit ratio is about 55%). The measurements reported on the console are unstable due to the relatively low request rate (not enough sample data in a 5-second stats window).
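The link between response time and request rate for a best-effort robot can be checked with simple arithmetic. The Python sketch below is our own back-of-the-envelope model, not Polygraph code: it assumes that a robot sends the next request as soon as the previous response arrives and ignores all other overheads. The 119.35msec mean is the value that lx reports for this run, and the observed rate assumes that the console timestamps are minutes (588 transactions in 1.20 minutes).

```python
def best_effort_rate(mean_resp_time_msec: float, robots: int = 1) -> float:
    """Approximate req/sec when each robot waits for a response before
    sending its next request (one outstanding request per robot)."""
    return robots * 1000.0 / mean_resp_time_msec


# Predicted rate from the mean response time of the proxied run.
predicted = best_effort_rate(119.35)

# Observed rate: total client transactions divided by the run duration.
observed = 588 / (1.20 * 60)

print(f"predicted {predicted:.2f} req/sec, observed {observed:.2f} req/sec")
```

Both numbers come out a little above 8 req/sec, which is why a slower proxy directly lowers the load that a best-effort robot can generate.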

If you are not getting any hits from a proxy while everything else works as expected, it is possible that the proxy under test is picky about object expiration time and other freshness info. Some proxies will not cache an object without certain HTTP header fields. The simple.pg workload does not have an "Object Life Cycle" model configured, and servers generate no freshness headers. To get hits with picky proxies, you can either use an advanced workload such as PolyMix or modify your workload specs to include an Object Life Cycle model.

Here is a simple modification to our workload that makes Squid cache objects. We added an olcStatic object of type ObjLifeCycle and used that object in the SimpleContent configuration. No other changes were made.

    ObjLifeCycle olcStatic = {
        birthday = now + const(-1year); // born a year ago
        length = const(2year);          // two year cycle
        variance = 0%;                  // no variance
        with_lmt = 100%;                // all responses have LMT
        expires = [nmt + const(0sec)];  // everything expires when modified
    };

    // we start with defining content properties for our server to generate
    Content SimpleContent = {
        size = exp(13KB);  // response sizes distributed exponentially
        cachable = 80%;    // 20% of content is uncachable
        obj_life_cycle = olcStatic;
    };

The olcStatic definition was borrowed from the contents.pg file distributed with Polygraph. Instead of copying the definition into your workload, you can simply #include that file like most standard workloads do. The contents.pg file contains other, more sophisticated Object Life Cycle configurations.

4.1 Looking at binary logs

The binary logs created during the last test can be analyzed with the lr ("Log Reader") and lx ("Log Extractor") tools included in the Polygraph distribution. For example, let's get the response time histogram and mean on the client side (after the experiment is over).

    > lx --objects rep.rptm.hist /tmp/clt.log
    rep.rptm.hist:
    # bin   min   max  count       %    acc%
        3     2     2     26    4.42    4.42
        4     3     3    160   27.21   31.63
        5     4     4     89   15.14   46.77
        6     5     5     81   13.78   60.54
        7     6     6     18    3.06   63.61
        8     7     7     21    3.57   67.18
        9     8     8     30    5.10   72.28
       10     9     9     20    3.40   75.68
       11    10    10     25    4.25   79.93
       12    11    11     21    3.57   83.50
       13    12    12     17    2.89   86.39
       14    13    13     10    1.70   88.10
       15    14    14     10    1.70   89.80
       16    15    15      3    0.51   90.31
       17    16    16      5    0.85   91.16
       18    17    21      5    0.85   92.01
       23    22    58      6    1.02   93.03
       72    71   234      6    1.02   94.05
      365   364  1267      7    1.19   95.24
      830  1268  1351      6    1.02   96.26
      854  1364  1475      5    0.85   97.11
      882  1476  2807      6    1.02   98.13
     1123  2832  2975      6    1.02   99.15
     1142  2984  3431      5    0.85  100.00

    > lx --objects rep.rptm.mean /tmp/clt.log
    rep.rptm.mean: 119.35

As you can see, 79.93% of responses had response time less than 11msec, but about 5% of transactions took more than a second, increasing mean response time to 119msec.
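The two percentages quoted above can be recomputed from the count column of the histogram. The Python sketch below hard-codes the (min, max, count) values from the lx output shown earlier; it does not read the binary log, and the variable names are ours.

```python
# (min msec, max msec, count) rows copied from the rep.rptm.hist output.
rows = [
    (2, 2, 26), (3, 3, 160), (4, 4, 89), (5, 5, 81), (6, 6, 18),
    (7, 7, 21), (8, 8, 30), (9, 9, 20), (10, 10, 25), (11, 11, 21),
    (12, 12, 17), (13, 13, 10), (14, 14, 10), (15, 15, 3), (16, 16, 5),
    (17, 21, 5), (22, 58, 6), (71, 234, 6), (364, 1267, 7),
    (1268, 1351, 6), (1364, 1475, 5), (1476, 2807, 6),
    (2832, 2975, 6), (2984, 3431, 5),
]

total = sum(count for _, _, count in rows)

# Responses in bins that lie entirely below 11 msec.
fast = sum(count for _, hi, count in rows if hi < 11)

# Responses in bins that lie entirely above one second.
slow = sum(count for lo, _, count in rows if lo > 1000)

print(f"total responses:   {total}")
print(f"faster than 11ms:  {100.0 * fast / total:.2f}%")
print(f"slower than 1 sec: {100.0 * slow / total:.2f}%")
```

The total matches the 588 transactions reported by polyclt at shutdown, and the slow share is the "about 5%" tail that pulls the mean up.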

You can get most of the aggregate stats collected during the experiment by running lx without the --objects option.

    > lx /tmp/clt.log

Finally, you can generate a full-blown report using the binary log and Report Generator tools that come with Polygraph. You will probably want to run a longer test to get better graphs though.

5. Specifying request rate

Simple robots are best-effort robots. A best-effort robot submits the next request right after receiving a response to the previous one. Best-effort robots are useless for most benchmarking tasks because you do not want the request rate to be tied to transaction response time. In real traffic, the two are usually orthogonal characteristics.

The following instructions require modifying the workload file. We strongly recommend that you copy simple.pg to a different file and modify only that copy. Always keep distributed workload files unmodified. It may not matter in this simple case, but it is a pain to spend hours debugging a workload only to find out that you "temporarily" modified a file that the workload is using but never reverted the changes.

It is simple to tell the robot to emit a realistic Poisson request stream with a given mean rate. All you need to do is add a req_rate setting to the robot configuration. Let's use a load of 1 request per second (per robot).

We will also increase the number of robots to 10 by cloning the robot's address 10 times. Here is the new robot configuration.

    Robot R = {
        kind = "R101";
        public_interest = 50%;
        pop_model = { pop_distr = popUnif(); };
        recurrence = 55% / SimpleContent.cachable; // adjusted to get 55% DHR

        req_rate = 1/sec;

        origins = S.addresses; // where the origin servers are
        addresses = ['10.44.128.61' ** 10 ]; // use clone operator
    };

Try using this new robot and see how console output changes. You should see a cumulative request rate of about 10 requests per second. There should be more concurrent connections now because each robot can open several connections (if response time is more than one second), and there are ten robots. The response time may change as well.
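To see why ten robots at 1 request per second add up to about 10 requests per second, you can simulate the streams. The Python sketch below uses the textbook construction of a Poisson process (exponentially distributed gaps between requests); it is an illustration under that assumption, not a description of how Polygraph generates requests internally. The function name, seed, and duration are arbitrary choices of ours.

```python
import random


def poisson_arrivals(rate_per_sec, duration_sec, rng):
    """Return request times (seconds) of one Poisson stream with the given
    mean rate, generated by summing exponential inter-request gaps."""
    times = []
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times


rng = random.Random(1)   # fixed seed so that repeated runs agree
robots = 10              # ten cloned robot addresses
duration = 1000.0        # simulated seconds

# Merge the independent per-robot streams into one request stream.
all_arrivals = sorted(
    t for _ in range(robots) for t in poisson_arrivals(1.0, duration, rng)
)
aggregate_rate = len(all_arrivals) / duration

print(f"{len(all_arrivals)} requests, {aggregate_rate:.2f} req/sec")
```

Note that the request times do not depend on any response time, which is exactly the property that best-effort robots lack.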

If your device under test cannot handle a 10 req/sec load, decrease the per-robot request rate or the number of robots.

Most production Polygraph workloads use thousands of robots with very low individual request rates (e.g., 0.4/sec) to simulate large end-user populations. However, as the above examples demonstrate, you can create a workload that matches your testing needs. We still recommend starting with standard workloads so that you gain experience using what has been proven to work before experimenting with custom designs.