Many performance workloads are designed to verify that the device under
test (DUT) can sustain a given level of load. Such a design simplifies
workload creation and test result analysis, but requires a priori
knowledge of the device's sustained peak performance level. One way to find
that level is to repeat tests, increasing the load with each test.
This approach is very simple but takes a lot of time. A binary search
can reduce the number of tests, but may still require days of testing.
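To make the cost comparison concrete, here is a rough Python sketch (not
PGL) that counts how many separate tests each strategy needs. The peak of
1000 req/sec, the 500 req/sec starting point, the 10 req/sec step and
resolution, and the function names are all illustrative assumptions, and
the device is idealized as having a single fixed breaking point.

```python
def linear_search_tests(peak, start, step):
    """Count tests needed when the load grows by `step` until a test fails."""
    tests, load = 0, start
    while True:
        tests += 1
        if load > peak:  # this test fails: the device cannot sustain `load`
            return tests
        load += step


def binary_search_tests(low, high, resolution):
    """Count tests needed to bracket the peak to within `resolution`.

    Each test halves the [low, high] interval, so the count depends only
    on the interval width, not on where the peak actually is.
    """
    tests = 0
    while high - low > resolution:
        tests += 1
        high = low + (high - low) / 2  # either half has the same width
    return tests


linear = linear_search_tests(peak=1000, start=500, step=10)
binary = binary_search_tests(low=0, high=2000, resolution=10)
```

Under these assumptions the stepwise approach needs 52 tests and the
binary search needs 8. If every test has to run for hours to show that the
load is sustained, even 8 tests add up to days.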
Ideally, we want the benchmark to find the peak and then test that the
peak can be sustained. The search should be done without a priori
knowledge of the device's abilities. In practice, it is possible to
approach this ideal using the Polygraph features described here.
Testing scenarios other than finding sustained peak performance can
also benefit from the features below. For example, it is often useful to
see how the box behaves under overload conditions and [D]DoS attacks.
Normally, a single test simulates a single attack, and many tests are
needed to build a comprehensive picture. A better approach, available to
Polygraph users, is to create a workload that simulates several attacks,
configuring Polygraph to "back off" and give the device under test a break
when it is clear that the device has become unusable.
This page describes a set of features related to DUT Watchdogs
supported by Polygraph. Using watchdogs, one can implement a peak finder,
a series of DoS attacks, and many other useful workloads.
Our first attempt to implement a peak finder feature was based on the
Rptmstat (response time [thermo]stat) approach: the user would configure a
desirable response time range and let Polygraph increase the load when
response time is below the range and decrease the load when response time
gets too high.
Rptmstat worked in some environments but not others. It turned out that
having just a couple of knobs is insufficient to accommodate the behavior
of many real-world devices Polygraph has to test. For example, if the
device under test did not slow down gradually under load, rptmstat could
not detect overload conditions fast enough or could not decrease the
load fast enough. Thermostats work well in rooms with gradual changes in
temperature, but are not appropriate for environments that may require
rapid, varying, and complex actions (e.g., a nuclear reactor core).
The Watchdog approach described here is an attempt to allow the user
(i.e., workload writer) to specify when and how the offered load (or other
run-time factors) should change. This is very different from our initial
rptmstat approach where a rigid algorithm was hard-coded into Polygraph,
exposing just a couple of control knobs. We want the user to be able
to say something along these non-PGL lines:
run the following script every few transactions:
- if device under test is happy, then increase the load
- if something went wrong, then pause the test for 5 minutes
The Watchdog feature allows Polygraph to constantly monitor current
conditions and act when those conditions meet pre-defined criteria. The
sections below describe how conditions can be monitored and what actions
can be taken.
3.1 Current State
In all use cases discussed so far, Polygraph should act based on
various performance measurements. Virtually all performance
measurements reported by Polygraph (run-time and post-mortem) are
accessible. Note that most measurements can only be defined for a
sample of test transactions. A sample can be explicitly defined (see
the sampling technique described in the next section), or the statistics
collected so far during the current phase can be used. Both methods
yield a StatSample object.
Measurements are accessible via StatSample object fields, described
elsewhere. Here is a simple example:
StatSample sample = currentSample();
if (sample.real.miss.rptm.mean > 100msec) then {
    ... // do something
} else {
    ... // do something else
}
Note that StatSample fields are measurements (facts), not knobs
(variables). One cannot change their values.
3.2 Sampling goal
All measurement categories mentioned above require some sort of
aggregation of information. For example, one has to observe several
transactions to make accurate estimations/measurements of the response
rate. Similarly, error count only means something if the observation
interval has been specified.
To specify the sampling duration, an object of type Goal is used:
Goal smallSample = { duration = 3sec; };
Goal bigSample = { xactions = 10000; };
Goal halfwaySample = { duration = somePhase.goal.duration/2; };
Usually, the longer it takes Polygraph to satisfy the goal, the
more accurate the collected measurements are. On the other hand, long
sampling intervals prevent Polygraph from reacting to sudden changes
in behavior. Fortunately, different watchdogs may have different
sampling goals (see below).
As usual, goal settings are ORed together. For example, the
following code describes a watchdog goal that will be satisfied when
either 30 seconds of data is collected or 100 successful transactions
finish their execution:
Goal goal = {
    duration = 30sec;
    xactions = 100;
};
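The OR semantics can be modelled in a few lines of Python (not PGL). This
is only a sketch of the rule stated above: the class and its field and
method names are illustrative assumptions, not Polygraph's implementation,
and unset limits are simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    """A sampling goal with optional limits; any reached limit satisfies it."""
    duration_sec: Optional[float] = None
    xactions: Optional[int] = None

    def reached(self, elapsed_sec: float, xactions_done: int) -> bool:
        by_time = (self.duration_sec is not None
                   and elapsed_sec >= self.duration_sec)
        by_count = (self.xactions is not None
                    and xactions_done >= self.xactions)
        return by_time or by_count  # limits are ORed, not ANDed


goal = Goal(duration_sec=30, xactions=100)
early_by_count = goal.reached(elapsed_sec=5, xactions_done=100)
late_by_time = goal.reached(elapsed_sec=30, xactions_done=10)
not_yet = goal.reached(elapsed_sec=5, xactions_done=10)
```

With both limits set, a busy test satisfies the goal after 100 transactions
even if only 5 seconds have passed, while an idle test satisfies it after
30 seconds regardless of the transaction count.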
3.3 Actions
A watchdog specification includes arbitrary PGL code. Polygraph
does not interpret that code until the sampling goal is reached at
run time. Moreover, Polygraph interprets that code every time the
sampling goal is reached.
Certain PGL calls are especially useful in a watchdog context:
- float currentLoadFactor()
- float currentPopulusFactor()
- StatSample currentSample()
- StatSample currentPhase()
- setLoadFactorTo(float factor)
- changeLoadFactorBy(float percentage)
- setPopulusFactorTo(float factor)
- changePopulusFactorBy(float percentage)
- setSamplingGoalTo(Goal)
- print(...)
3.4 Every statement: putting all things together
A watchdog object is built from the above components using an
"every" PGL statement:
every Goal do Code;
Watchdogs are specified on a per-phase basis, using Phase's
"script" field. Here is an example of two simple watchdog objects
attached to a phase configuration:
Goal smallSample = { duration = 3sec; };
Goal bigSample = { xactions = 10000; };
Phase phase = {
    name = "peak_finder";
    goal.duration = 1hour;
    script = {
        // fast watchdog: back off quickly when misses become slow
        every smallSample do {
            time t = currentSample().real.miss.rptm.mean;
            if (t > 100msec) then {
                print("miss response time too large at ", t);
                changeLoadFactorBy(-30%);
            }
        }
        // slow watchdog: raise the load while hits remain fast
        every bigSample do {
            time t = currentSample().real.hit.rptm.mean;
            if (t < 50msec) then {
                print("hit response time too small at ", t);
                changeLoadFactorBy(+10%);
            } else {
                print("hit response time is OK at ", t);
            }
        }
    };
    ...
};
It is very important to understand how watchdogs are interpreted.
For a single watchdog, Polygraph waits until the Goal guard of the
corresponding every statement is satisfied. Meanwhile, Polygraph
accumulates the statistics necessary to implement the currentSample() and
currentPhase() PGL function calls. Once the guard goal is satisfied,
the corresponding do code of the every-statement is
interpreted (i.e., executed). After the execution, the sample
statistics are reset, and the cycle starts from scratch. In other
words, sample statistics are collected and the do-code is interpreted
once per Goal-controlled, non-overlapping interval. No sliding
windows are used for sample statistics collection. Phase statistics are
not reset during a phase and are constantly updated during the lifetime
of the phase.
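The difference between sample and phase statistics can be illustrated
with a small Python model (not PGL). It is a sketch of the cycle described
above, assuming a transaction-count goal and a mean response time as the
only measurement; the class, function, and variable names are hypothetical.

```python
class Watchdog:
    """Fires `action` once per non-overlapping sample of `xactions_goal`."""

    def __init__(self, xactions_goal, action):
        self.xactions_goal = xactions_goal
        self.action = action
        self.sample = []  # sample statistics, reset after every firing

    def note(self, rptm_msec):
        self.sample.append(rptm_msec)
        if len(self.sample) >= self.xactions_goal:  # guard goal satisfied
            self.action(sum(self.sample) / len(self.sample))
            self.sample = []  # start the next interval from scratch


def run_phase(response_times, watchdog):
    """Feed every transaction to the watchdog; return the phase-wide mean."""
    phase = []  # phase statistics, never reset during the phase
    for rptm in response_times:
        phase.append(rptm)
        watchdog.note(rptm)
    return sum(phase) / len(phase)


fired = []  # sample means seen by the watchdog code
dog = Watchdog(xactions_goal=3, action=fired.append)
phase_mean = run_phase([10, 20, 30, 40, 50, 60, 70], dog)
```

Here the watchdog code runs twice, seeing means of 20 and 50 for the two
disjoint three-transaction samples. The seventh transaction belongs to a
sample that is still incomplete, while the phase mean of 40 covers all
seven transactions.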
Each watchdog is treated in isolation from other watchdogs. It is
possible for two watchdogs to reach their guard goals and fire their
code at about the same time. To resolve conflicts, if needed, make
sure that all if-statements guarding test-changing calls describe
disjoint, non-overlapping conditions. Often, however, concurrent
execution is not a problem.
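One way to check that guard conditions are disjoint is to evaluate them
over a range of measurement values and look for values that satisfy more
than one guard. The Python sketch below (not PGL) assumes, for simplicity,
that both guards look at the same response time measurement; the rule
functions and thresholds are illustrative.

```python
def decrease_rule(rptm_msec):
    """Guard for a load-decreasing action."""
    return rptm_msec > 100


def increase_rule(rptm_msec):
    """Guard for a load-increasing action."""
    return rptm_msec < 50


def conflicts(rules, values):
    """Return the values for which more than one guard is true."""
    return [v for v in values if sum(1 for rule in rules if rule(v)) > 1]


def sloppy_increase_rule(rptm_msec):
    """An overlapping guard: still wants more load at up to 120 msec."""
    return rptm_msec < 120


disjoint = conflicts([decrease_rule, increase_rule], range(0, 201))
overlapping = conflicts([decrease_rule, sloppy_increase_rule], range(0, 201))
```

The first pair of guards never fires together, while the second pair
conflicts for every response time from 101 to 119 milliseconds, where one
watchdog would decrease the load and the other would increase it.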
A simple workload using watchdog features is available elsewhere. Below
is the interesting part of the console output of the corresponding
Polygraph test.
000.10| i-finder 2557 511.40 1 0.00 0 1
000.18| i-finder 5132 514.99 2 0.00 0 3
000.23| script output: increasing load factor by 10%
000.23| fyi: changing load factor level from 100.00% to 110.00%
000.27| i-finder 7724 518.40 2 0.00 0 0
000.35| i-finder 10556 566.39 2 0.00 0 0
000.43| i-finder 13473 583.40 2 0.00 0 0
000.45| script output: increasing load factor by 10%
000.45| fyi: changing load factor level from 110.00% to 121.00%
000.52| i-finder 16527 610.80 4 0.00 0 0
000.60| i-finder 19570 608.57 4 0.00 0 6
000.67| script output: increasing load factor by 10%
000.67| fyi: changing load factor level from 121.00% to 133.10%
000.68| i-finder 22708 627.47 4 0.00 0 1
000.77| i-finder 26025 663.38 4 0.00 0 0
000.85| i-finder 29406 676.09 3 0.00 0 4
000.88| script output: increasing load factor by 10%
000.88| fyi: changing load factor level from 133.10% to 146.41%
000.93| i-finder 32944 707.25 5 0.00 0 4
001.02| i-finder 36629 736.96 4 0.00 0 4
001.10| script output: increasing load factor by 10%
001.10| fyi: changing load factor level from 146.41% to 161.05%
001.10| i-finder 40113 696.77 3 0.00 0 2
001.18| i-finder 44112 799.80 4 0.00 0 0
001.27| i-finder 48083 794.05 5 0.00 0 1
001.32| script output: increasing load factor by 10%
001.32| fyi: changing load factor level from 161.05% to 177.16%
001.35| i-finder 52242 831.20 5 0.00 0 11
001.43| i-finder 56667 884.39 7 0.00 0 5
001.52| i-finder 61129 891.84 8 0.00 0 10
001.53| script output: increasing load factor by 10%
001.53| fyi: changing load factor level from 177.16% to 194.87%
001.60| i-finder 65955 965.07 15 0.00 0 3
001.68| i-finder 70861 981.18 13 0.00 0 3
001.75| script output: increasing load factor by 10%
001.75| fyi: changing load factor level from 194.87% to 214.36%
001.77| i-finder 75726 972.75 14 0.00 0 14
001.85| i-finder 80972 1049.20 23 0.00 0 31
001.93| i-finder 86389 1083.35 44 0.00 0 55
001.97| script output: increasing load factor by 10%
001.97| fyi: changing load factor level from 214.36% to 235.79%
002.02| script output: decreasing load factor by 30%
002.02| fyi: changing load factor level from 235.79% to 165.06%
002.02| i-finder 91471 1015.15 158 0.00 0 691
002.08| script output: decreasing load factor by 30%
002.08| fyi: changing load factor level from 165.06% to 115.54%
002.10| i-finder 96108 927.40 85 0.00 0 0
002.18| i-finder 99028 584.00 3 0.00 0 0
002.27| i-finder 101909 576.16 2 0.00 0 1
002.35| i-finder 104902 598.60 2 0.00 0 0
002.40| script output: increasing load factor by 10%
002.40| fyi: changing load factor level from 115.54% to 127.09%
002.43| i-finder 107908 601.20 2 0.00 0 0
002.52| i-finder 111152 648.72 3 0.00 0 2
002.60| i-finder 114317 632.97 2 0.00 0 1
002.62| script output: increasing load factor by 10%
002.62| fyi: changing load factor level from 127.09% to 139.80%
002.68| i-finder 117684 673.37 3 0.00 0 1
002.77| i-finder 121169 696.90 3 0.00 0 2
002.83| script output: increasing load factor by 10%
002.83| fyi: changing load factor level from 139.80% to 153.78%
002.85| i-finder 124737 713.52 4 0.00 0 1
002.93| i-finder 128629 778.13 4 0.00 0 2
003.02| i-finder 132363 746.80 5 0.00 0 0
003.05| script output: increasing load factor by 10%
003.05| fyi: changing load factor level from 153.78% to 169.16%
003.10| i-finder 136512 829.72 6 0.00 0 2
003.18| i-finder 140702 837.50 5 0.00 0 9
003.27| script output: increasing load factor by 10%
003.27| fyi: changing load factor level from 169.16% to 186.08%
003.27| i-finder 144884 836.37 8 0.00 0 22
003.35| i-finder 149497 922.55 11 0.00 0 12
003.43| i-finder 154202 940.98 9 0.00 0 19
003.48| script output: increasing load factor by 10%
003.48| fyi: changing load factor level from 186.08% to 204.69%
003.52| i-finder 159051 968.61 14 0.00 0 11
003.60| i-finder 164144 1018.49 22 0.00 0 2
003.68| i-finder 169202 1011.38 31 0.00 0 11
003.70| script output: increasing load factor by 10%
003.70| fyi: changing load factor level from 204.69% to 225.15%
003.75| script output: decreasing load factor by 30%
003.75| fyi: changing load factor level from 225.15% to 157.61%
003.77| i-finder 174405 1040.60 102 0.00 0 2
003.82| script output: decreasing load factor by 30%
003.82| fyi: changing load factor level from 157.61% to 110.33%
003.85| i-finder 177877 694.40 4 0.00 0 0
003.93| i-finder 180561 536.71 2 0.00 0 9
004.02| i-finder 183285 544.80 2 0.00 0 0
004.10| i-finder 186043 551.49 2 0.00 0 4
004.13| script output: increasing load factor by 10%
004.13| fyi: changing load factor level from 110.33% to 121.36%
004.18| i-finder 189094 610.16 2 0.00 0 3
004.27| i-finder 192114 603.98 2 0.00 0 5
004.35| script output: increasing load factor by 10%
As you can see, Polygraph starts at a rate of about 500 requests per
second, and the load factor is increased in several steps because response
time is below dutGood.rptm_max, or 50 milliseconds. About 2 minutes into
the test, the response time climbs above that threshold and the second
watchdog decreases the load factor by 30%. About four seconds later, the
load factor is decreased by 30% again. After that, the response time comes
back to normal levels and Polygraph starts increasing the load factor
again. This pattern repeats throughout the test.
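The load factor levels in the log are consistent with each percentage
change being applied to the current level rather than to the initial 100%.
The following Python sketch (not PGL) replays the first nine 10% increases
and the two 30% decreases from the log; the helper function is a
hypothetical stand-in for changeLoadFactorBy(), not Polygraph code.

```python
def change_load_factor_by(factor, percentage):
    """Apply a relative change to the current load factor (in percent)."""
    return factor * (1 + percentage / 100)


factor = 100.0
levels = []
for _ in range(9):  # nine "+10%" steps
    factor = change_load_factor_by(factor, +10)
    levels.append(f"{factor:.2f}")
for _ in range(2):  # two "-30%" back-offs
    factor = change_load_factor_by(factor, -30)
    levels.append(f"{factor:.2f}")
```

The resulting levels (110.00, 121.00, 133.10, ..., 235.79, then 165.06 and
115.54) match the "changing load factor level" lines up to minute 002.08.
Because the changes compound, two 30% decreases cut the load roughly in
half, undoing about seven 10% increases.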
Request rate analysis shows that a safe peak rate is somewhere around
950-1050 requests per second. The absolute values are not important,
though: we are just illustrating the technique with a very simple no-proxy
test.
This example is closer to a repeated DoS simulation than a
Sustained Peak Finder workload. A good Peak Finder workload
should be adjusted to try to sustain a high request rate just below the
breaking point. It is quite possible that the device under test can
survive short peaks of 900 req/sec load, but cannot sustain 700 req/sec
load for more than 5 minutes. Such a workload improvement is possible by
writing more complex watchdogs.