Simple tests

1. For the impatient

polysrv \
    --config /usr/local/polygraph/workloads/simple.pg \
    --verb_lvl 10
polyclt \
    --config /usr/local/polygraph/workloads/simple.pg \
    --verb_lvl 10
# watch console output then kill polysrv and polyclt

2. Introduction

Polygraph distribution includes a client and server simulators called polyclt and polysrv. You need to run both programs to simulate the desired workload. The server(s) should be launched first. We will show the command line for polysrv (polyclt) followed by the polysrv (polyclt) output generated in our environment.

For simplicity, we will start both simulators on one machine. To run the tests below, your machine must be allowed to connect to itself via 127.0.0.1 IP address. If you want to start Polygraph on a different port or host, you must adjust configuration files accordingly.

Polygraph workloads are specified using command line options and configuration file written in Polygraph Language (PGL).

The tests and workloads described here are not meant to be used for production benchmarking! They only illustrate basic Polygraph usage.

3. Hello, world

This simple run will test basic Polygraph functionality.

Polygraph distribution already contains a simple workload specification. The specs can be found in simple.pg. Other workload examples can be found in the polygraph/workloads/ directory as well.

Normally, the workload description includes several phases. Polygraph stops when all phases reach their goals. For these simple tests, we will use the default phase that does not have any goal. Hence we would have to terminate polyclt and polysrv manually by pressing ^C or sending an interrupt signal.

Let's start the server.

> polysrv \
    --config /usr/local/polygraph/workloads/simple.pg \
    --verb_lvl 10
000.00| Content distribution on server S101:
        content        planned%         likely%          error%
   some-content          100.00          100.00            0.00
expected average cachability: 80.00%
expected average object size: 13331.30Bytes

bin/polysrv: warning: no run phases were specified; ...
000.00| fyi: no bench selected with use(); ...
000.01| Command: polysrv 
    --config /usr/local/polygraph/workloads/simple.pg --verb_lvl 10
000.01| Configuration:
    version:            2.6.0b5
    host_type:          i386-unknown-freebsd3.4
    verb_lvl:           10
    dump:               <none>
    dump_size:          1.000KB
    notify:             <none>
    label:              <none>
    fd_limit:           15878
    config:             /usr/local/polygraph/workloads/simple.pg
    cfg_dirs:           
    console:            -
    log:                <none>
    log_size:           -1
    sample_log:         <none>
    sample_log_size:    -1
    stats_cycle:        5.00sec
    sync_phases:        on
    file_scan:          poll
    priority_sched:     5
    new_oids_per_msg_max:16
    fake_hosts:         
    idle_tout:          5.00min
    rng_seed:           1
    unique_world:       on
    new_oids_history:   2048
    ign_urls:           off
000.01| Phases:
     phase load_beg load_end  rec_beg  rec_end smsg_beg smsg_end     goal
      dflt     1.00     1.00     1.00     1.00     1.00     1.00 <none>    

000.01| StatsSamples:
    static stats samples:   0
    dynamic stats samples:  0

000.01| FDs: 16384 out of 16384 FDs can be used; safeguard limit: 15878
000.01| resource usage: 
    CPU Usage: 7msec sys + 1.52sec user = 1.53sec
    Maximum Resident Size: 7.922MB
    Page faults with physical i/o: 0

000.01| group-id: 0480bcf4.2e7d019d:00000002 pid: 413
000.01| current time: 987891522.111529 or Sat, 21 Apr 2001 22:18:42 GMT
000.01| fyi: PGL configuration stored (3054bytes)
000.01| fyi: current state (1) stored
000.01| starting 1 HTTP agents...
000.01| starting S101[1 / 0480bcf4.2e7d019d:00000004] 
    on 127.0.0.1:9090
000.10| i-dflt      0   0.00     -1  -1.00   0    1
000.18| i-dflt      0   0.00     -1  -1.00   0    1
000.26| i-dflt      0   0.00     -1  -1.00   0    1
000.35| i-dflt      0   0.00     -1  -1.00   0    1
...

As you can see, polysrv process created one server agent bound to localhost (127.0.0.1), port 9090. Polysrv complained about no phases found in simple.pg and generated a ``default'' infinite phase. Since there is no client running yet, the run-time statistics are all zeros. For details about console output format look elsewhere.

Polysrv will continue to do nothing until we start the client. You may need to open a new window or virtual terminal to do get a second command line prompt.

> polyclt \
    --config /usr/local/polygraph/workloads/simple.pg \
    --verb_lvl 10
000.00| Content distribution on server S101:
        content        planned%         likely%          error%
   some-content          100.00          100.00            0.00
expected average cachability: 80.00%
expected average object size: 13331.30Bytes

bin/polyclt: warning: no run phases were specified; ...
000.00| fyi: no bench selected with use(); ...
000.01| Command: bin/polyclt --config workloads/simple.pg --verb_lvl 10
000.01| Configuration:
    version:            2.6.0b5
    host_type:          i386-unknown-freebsd3.4
    verb_lvl:           10
    dump:               <none>
    dump_size:          1.000KB
    notify:             <none>
    label:              <none>
    fd_limit:           15878
    config:             /usr/local/polygraph/workloads/simple.pg
    cfg_dirs:           
    console:            -
    log:                <none>
    log_size:           -1
    sample_log:         <none>
    sample_log_size:    -1
    stats_cycle:        5.00sec
    sync_phases:        on
    file_scan:          poll
    priority_sched:     5
    new_oids_per_msg_max:16
    fake_hosts:         
    idle_tout:          <none>
    rng_seed:           1
    unique_world:       on
    proxy:              <none>
    ports:              <none>
    icp_tout:           2.00sec
    new_oids_prefetch:  256
    ign_false_hits:     on
    ign_bad_cont_tags:  off
    prn_false_misses:   off
000.01| Phases:
     phase load_beg load_end  rec_beg  rec_end smsg_beg smsg_end     goal
      dflt     1.00     1.00     1.00     1.00     1.00     1.00 <none>    

000.01| StatsSamples:
    static stats samples:   0
    dynamic stats samples:  0

000.01| FDs: 16384 out of 16384 FDs can be used; safeguard limit: 15878
000.01| resource usage: 
    CPU Usage: 23msec sys + 1.51sec user = 1.53sec
    Maximum Resident Size: 8.203MB
    Page faults with physical i/o: 18

000.01| group-id: 0480bd09.3a2501a0:00000002 pid: 416
000.01| current time: 987891543.872993 or Sat, 21 Apr 2001 22:19:03 GMT
000.01| fyi: PGL configuration stored (3054bytes)
000.01| fyi: current state (1) stored
000.01| starting 1 HTTP agents...
000.01| starting R101[1 / 0480bd09.3a2501a0:00000004] on 127.0.0.1
000.01| fyi: server scan completed with all local robots ready ...
000.10| i-dflt   3296 659.19      1   0.00   0    2
000.18| i-dflt   6585 657.79      1   0.00   0    2
000.26| i-dflt   9891 661.19      1   0.00   0    2
000.35| i-dflt  13109 643.54      1   0.00   0    2
000.43| i-dflt  16380 654.05      1   0.00   0    2
000.51| i-dflt  19667 657.29      1   0.00   0    2
000.60| i-dflt  22972 660.94      1   0.00   0    2
000.68| i-dflt  26228 651.18      1   0.00   0    2
000.76| i-dflt  29500 654.33      1   0.00   0    2
000.85| i-dflt  32775 654.91      1   0.00   0    2
000.93| i-dflt  35992 643.38      1   0.00   0    2
001.01| i-dflt  39223 646.19      1   0.00   0    2
001.10| i-dflt  42479 651.19      1   0.00   0    2
001.18| i-dflt  45687 641.55      1   0.00   0    2
001.26| i-dflt  48954 653.37      1   0.00   0    2
001.35| i-dflt  52189 646.95      1   0.00   0    2
001.43| i-dflt  55418 645.79      1   0.00   0    2
001.51| i-dflt  58691 654.51      1   0.00   0    2
001.60| i-dflt  61955 652.77      1   0.00   0    2
001.68| i-dflt  65304 669.72      1   0.00   0    2
001.76| i-dflt  68593 657.71      1   0.00   0    2
...

Now we can see some traffic on both client side (above) and server side (below). The console output tells us that Polygraph is doing around 650 requests per second with response times of 1msec, and that there are no hits or errors. Note that we are running a very simple back-to-back workload. Your numbers will differ depending how powerful your OS and hardware are.

We will kill the experiment now by pressing Control+C in client and server windows. Here is the rest of the server output.

...
000.51| i-dflt   5436 656.53      0   0.00   0    2
000.60| i-dflt   8724 657.52      0   0.00   0    2
000.68| i-dflt  11985 652.20      0   0.00   0    2
000.76| i-dflt  15224 647.73      0   0.00   0    2
000.85| i-dflt  18499 654.98      0   0.00   0    2
000.93| i-dflt  21818 663.77      0   0.00   0    2
001.01| i-dflt  25075 651.38      0   0.00   0    2
001.10| i-dflt  28370 658.96      0   0.00   0    2
001.18| i-dflt  31631 652.18      0   0.00   0    1
001.26| i-dflt  34865 646.75      0   0.00   0    2
001.35| i-dflt  38092 645.31      0   0.00   0    2
001.43| i-dflt  41333 648.19      0   0.00   0    2
001.51| i-dflt  44551 643.51      0   0.00   0    2
001.60| i-dflt  47814 652.55      0   0.00   0    2
001.68| i-dflt  51047 646.56      0   0.00   0    2
001.76| i-dflt  54281 646.80      0   0.00   0    2
001.85| i-dflt  57526 648.96      0   0.00   0    2
001.93| i-dflt  60774 649.50      0   0.00   0    2
002.01| i-dflt  64103 665.70      0   0.00   0    1
002.10| i-dflt  67429 665.18      0   0.00   0    2
002.18| i-dflt  70705 655.17      0   0.00   0    2
002.25| SrvConnMgr.cc:77: error: 1/1 (c16) connection closed ...
002.26| i-dflt  73421 543.14      0   0.00   0    1
002.35| i-dflt  73421   0.00     -1  -1.00   0    1
002.43| i-dflt  73421   0.00     -1  -1.00   0    1
^Cgot shutdown signal (2)
002.44| noticed shutdown signal (2)
002.44| resource usage: 
    CPU Usage: 24.75sec sys + 29.63sec user = 54.37sec
    Maximum Resident Size: 8.824MB
    Page faults with physical i/o: 0

002.44| fyi: current state (2) stored
002.44| server 127.0.0.1:9090 is closing listen socket 3 
    after 73421 xactions
002.44| got 73421 xactions and 0 errors
002.44| shutdown reason: got shutdown signal

And the rest of the client output.

...
001.85| i-dflt  71839 649.16      1   0.00   0    2
^Cgot shutdown signal (2)
001.89| noticed shutdown signal (2)
001.89| resource usage: 
    CPU Usage: 25.18sec sys + 34.89sec user = 1.00min
    Maximum Resident Size: 8.879MB
    Page faults with physical i/o: 18

001.89| fyi: current state (2) stored
001.89| got 73421 xactions and 0 errors
001.89| shutdown reason: got shutdown signal

Now it is a good time for you to look through the simple.pg file and PGL documentation to see what kind of workload we were using during this simple test.

4. Adding a proxy

In the previous test, polyclt was talking directly to polysrv running on port 9090. Now we want to introduce a proxy into the setup.

If your proxy runs in a transparent mode, you will probably need to run polyclt and polysrv on different hosts and move polysrv to port 80 so that Polygraph traffic will get automagically redirected to the proxy. We will not demonstrate transparent setup here.

Our proxy is running on host 10.44.0.100 and listening for HTTP queries on port 9090. We need to tell polyclt process which proxy to connect to using the --proxy command line option.

Since our proxy is not running on the same machine as polysrv, we can no longer use loopback interface and have to move our server agent to an address that a proxy can connect to. The IP address and port number (currently '127.0.0.1:9090') for the server agent are specified in the simple.pg configuration file. Below are the relevant lines.

Server S = {
    kind = "S101"; 
    contents = [ SimpleContent ];
    direct_access = contents;

    addresses = ['127.0.0.1:9090' ]; // where to create these server agents
};

You will need to edit those lines to change 127.0.0.1 address to the IP of your machine. Our machine has 10.44.128.61 address. We recommend that you do not edit simple.pg but create and edit its copy instead. Let's call that copy file my-simple.pg. Here is how the modified part of my-simple.pg looks in our case.

Server S = {
    kind = "S101";
    contents = [ SimpleContent ];
    direct_access = contents;

    addresses = ['10.44.128.61:9090' ]; // new server address
};

With recent versions of Polygraph, a similar change of IP address is required for the robot as well. This is because Polygraph robots now always bind to the specified address. If a robot remains bound to 127.0.0.1, it will not be able to receive responses from the proxy without special routes or NAT. Change the addresses field of your robot specification to contain the primary address of the polyclt machine. In our case, that address is 10.44.128.61 because we use the same machine for both client- and server-side processes (which is a bad idea for production tests!).

Finally, we want to log detailed run-time statistics into /tmp/clt.log and /tmp/srv.log files using the --log command line option.

A sample from the server-side output is below.

...
000.01| Command: bin/polysrv 
    --config /tmp/my-simple.pg --verb_lvl 10 
    --log /tmp/srv.log
000.01| Configuration:
    config:             /tmp/my-simple.pg
    log:                /tmp/srv.log
    ...
000.01| starting 1 HTTP agents...
000.01| starting S101[1 / 0480cdae.630701cc:00000004] 
    on 10.44.128.61:9090
000.10| i-dflt      0   0.00     -1  -1.00   0    1
000.18| i-dflt     33   6.60      0   0.00   0    1
000.26| i-dflt     58   5.00      0   0.00   0    1
000.35| i-dflt     97   7.80     40   0.00   0    1
000.43| i-dflt    101   0.80    374   0.00   0    1
000.51| i-dflt    123   4.40     57   0.00   0    1
000.60| i-dflt    139   3.20      0   0.00   0    1
000.68| i-dflt    179   8.00     33   0.00   0    2
000.76| i-dflt    182   0.60    442   0.00   0    1
000.85| i-dflt    190   1.60      5   0.00   0    1
000.93| i-dflt    194   0.80    344   0.00   0    1
001.01| i-dflt    207   2.60    114   0.00   0    2
001.10| i-dflt    227   4.00    237   0.00   0    1
001.18| i-dflt    238   2.20      0   0.00   0    1
001.26| i-dflt    259   4.20      0   0.00   0    1
001.35| i-dflt    263   0.80      3   0.00   0    1
001.43| i-dflt    263   0.00     -1  -1.00   0    1
001.51| i-dflt    263   0.00     -1  -1.00   0    1
^Cgot shutdown signal (2)
001.52| noticed shutdown signal (2)
001.52| resource usage: 
    CPU Usage: 118msec sys + 1.67sec user = 1.79sec
    Maximum Resident Size: 8.848MB
    Page faults with physical i/o: 0

001.52| fyi: current state (2) stored
001.52| server 10.44.128.61:9090 is closing listen socket 4 
    after 263 xactions
001.52| got 263 xactions and 0 errors
001.52| shutdown reason: got shutdown signal

And here is the polyclt output.

...
000.01| Command: bin/polyclt 
    --config /tmp/my-simple.pg --verb_lvl 10 
    --proxy 10.44.0.100:3128 --log /tmp/clt.log
000.01| Configuration:
    config:             /tmp/my-simple.pg
    log:                /tmp/clt.log
    ...
000.01| starting 1 HTTP agents...
000.01| starting R101[1 / 0480cdb5.5c1701cd:00000004] 
    on 10.44.128.61
000.01| fyi: server scan completed with all local robots ready ...
000.10| i-dflt     98  19.60     50  58.16   0    2
000.18| i-dflt    163  13.00     69  49.23   0    2
000.26| i-dflt    221  11.60     77  56.90   0    2
000.35| i-dflt    272  10.20    117  60.78   0    2
000.43| i-dflt    288   3.20    281  62.50   0    2
000.51| i-dflt    353  13.00     82  47.69   0    2
000.60| i-dflt    392   7.80    106  46.15   0    2
000.68| i-dflt    419   5.40    221  62.96   0    2
000.76| i-dflt    420   0.20   3111   0.00   0    2
000.85| i-dflt    433   2.60    454  61.54   0    2
000.93| i-dflt    466   6.60    137  48.48   0    2
001.01| i-dflt    519  10.60    121  54.72   0    2
001.10| i-dflt    555   7.20    128  58.33   0    2
001.18| i-dflt    586   6.20    140  67.74   0    2
^Cgot shutdown signal (2)
001.20| noticed shutdown signal (2)
001.20| resource usage: 
    CPU Usage: 331msec sys + 2.00sec user = 2.33sec
    Maximum Resident Size: 8.902MB
    Page faults with physical i/o: 0

001.20| fyi: current state (2) stored
001.20| got 588 xactions and 0 errors
001.20| shutdown reason: got shutdown signal

Let's concentrate on client-side console output. Note that various proxy and network overheads increased transaction response time to more than 100msec, causing request rate drop to less than 10 req/sec (the correlation is due to the best-effort mode of simple robots; production workloads virtually never use best-effort robots). Also, polyclt is now getting some hits (hit ratio is about 55%) The measurements reported on the console are unstable due to relatively low request rate (not enough sample data in a 5-second stats window).

If you are not getting any hits from a proxy while everything else works as expected, it is possible that the proxy under test is picky about object expiration time and other freshness info. Some proxies would not cache an object without certain HTTP header fields. The simple.pg workload does not have ``Object Life Cycle'' model configured, and servers generate no freshness headers. To get hits with picky proxies, you can either use an advanced workload such as PolyMix or modify your workload specs to include Object Life Cycle model.

Here is a simple modification to our workload that makes Squid to cache objects. We added an olcStatic object of type ObjLifeCycle and used that object in SimpleContent configuration. No other changes were made.

ObjLifeCycle olcStatic = {
    birthday = now + const(-1year); // born a year ago
    length = const(2year);            // two year cycle
    variance = 0%;                  // no variance
    with_lmt = 100%;                // all responses have LMT
    expires = [nmt + const(0sec)];  // everything expires when modified
};

// we start with defining content properties for our server to generate
Content SimpleContent = {
    size = exp(13KB); // response sizes distributed exponentially
    cachable = 80%;   // 20% of content is uncachable
    obj_life_cycle = olcStatic;
};

The olcStatic definition was borrowed from the contents.pg file distributed with Polygraph. Instead of copying the definition into your workload, you can simply #include that file like most standard workloads do. The contents.pg file contains other, more sophisticated Object Life Cycle configurations.

4.1 Looking at binary logs

The binary logs created during the last test can be analyzed with the lr (``Log Reader'') and lx (``Log Extractor'') tools included in the Polygraph distribution. For example, let's get response time histogram and mean on the client side (after the experiment is over).

> lx --objects rep.rptm.hist /tmp/clt.log
rep.rptm.hist:
# bin   min   max   count     %   acc% 
    3     2     2      26  4.42   4.42
    4     3     3     160 27.21  31.63
    5     4     4      89 15.14  46.77
    6     5     5      81 13.78  60.54
    7     6     6      18  3.06  63.61
    8     7     7      21  3.57  67.18
    9     8     8      30  5.10  72.28
   10     9     9      20  3.40  75.68
   11    10    10      25  4.25  79.93
   12    11    11      21  3.57  83.50
   13    12    12      17  2.89  86.39
   14    13    13      10  1.70  88.10
   15    14    14      10  1.70  89.80
   16    15    15       3  0.51  90.31
   17    16    16       5  0.85  91.16
   18    17    21       5  0.85  92.01
   23    22    58       6  1.02  93.03
   72    71   234       6  1.02  94.05
  365   364  1267       7  1.19  95.24
  830  1268  1351       6  1.02  96.26
  854  1364  1475       5  0.85  97.11
  882  1476  2807       6  1.02  98.13
 1123  2832  2975       6  1.02  99.15
 1142  2984  3431       5  0.85 100.00
> lx --objects rep.rptm.mean /tmp/clt.log
rep.rptm.mean:           119.35

As you can see, 79.93% of responses had response time less than 11msec, but about 5% of transactions took more than a second, increasing mean response time to 119msec.

You can get most of the aggregate stats collected during the experiment by running lx with no --objects option.

> lx /tmp/clt.log

Finally, you can generate a full-blown report using the binary log and Report Generator tools that come with Polygraph. You will probably want to run a longer test to get better graphs though.

5. Specifying request rate

Simple robots are best-effort robots. A best-effort robot submits the next request right after receiving a response to the previous one. Best-effort robots are useless for most benchmarking tasks because you do not want request rate to be tied to transaction response time. In real traffic, the two are usually orthogonal characteristics.

The following instructions will require modifying the workload file. We strongly recommend that you copy simple.pg to a different file and modify only that copy. Always keep distributed workload files unmodified. It may not matter in this simple case, but it is a pain to spend hours debugging a workload only to find out that you "temporary" modified a file that workload is using but never reversed the changes.

It is simple to tell the robot to emit a realistic Poisson request stream with a given mean rate. All you need to do is to add req_rate setting to the robot configuration. Let's use 1 request per second load (per robot).

We will also increase the number of robots to 10 by cloning robots address 10 times. Here is the new robot configuration.

Robot R = {
    kind = "R101";
    public_interest = 50%;
    pop_model = { pop_distr = popUnif(); };
    recurrence = 55% / SimpleContent.cachable; // adjusted to get 55% DHR

    req_rate = 1/sec;

    origins = S.addresses;      // where the origin servers are
    addresses = ['10.44.128.61' ** 10 ]; // use clone operator
};

Try using this new robot and see how console output changes. You should see a cumulative request rate of about 10 requests per second. There should be more concurrent connections now because each robot can open several connections (if response time is more than one second), and there are ten robots. The response time may change as well.

If your device under test cannot handle 10 req/sec load, decrease per-robot request rate or decrease the number of robots.

Most production Polygraph workloads use thousands of robots with very low individual request rates (e.g., 0.4/sec) to simulate large end-user populations. However, as the above examples demonstrate, you can create a workload that matches your testing needs. We still recommend starting with standard workloads so that you gain experience using what has been proven to work before experimenting with custom designs.