Polygraph has many built-in distributions: normal, exponential, Zipf,
constant, etc. In some cases, however, a user-defined distribution is
required. You can instruct Polygraph to use an arbitrary distribution by
specifying its probability density function (PDF) or value frequency
histogram. Such a histogram is placed in a separate file, whose format
and usage are described below.
The table-distribution file format is different from PGL. The format is
line-based; that is, elements cannot span multiple lines.
Here is a simple example of a distribution called "PConnDream", which
might be used to specify the usage limit for persistent connections:
# comments are allowed
int_distr PConnDream = { # mandatory header
1 51.0 # value 1 has frequency 51
2 28.7
3 13.3
4 2.3 # value 4 has frequency 2.3
5 1.3
6 0.7
7 0.4
8 */100 # value 8 absorbs the rest (out of 100 total)
} # closing bracket is required!
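To make the line-based rules concrete, here is a minimal Python sketch (not Polygraph's actual parser) that reads a point-valued table such as PConnDream: it drops `#` comments, checks the `<type> <name> = {` header and the closing `}`, and collects one integer value and frequency per line. The function `parse_point_table` and its return layout are hypothetical, and range entries are deliberately not handled here.

```python
import re

def parse_point_table(text):
    """Parse a point-valued table distribution into (type, name, rows).

    Hypothetical helper: each row is (value, frequency) or, for a
    wild-card entry such as "*/100", (value, ("*", total)).
    """
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            lines.append(line)

    header = re.fullmatch(r"(\w+)\s+(\w+)\s*=\s*\{", lines[0])
    if header is None:
        raise ValueError("missing '<type> <name> = {' header")
    if lines[-1] != "}":
        raise ValueError("missing closing bracket")

    rows = []
    for line in lines[1:-1]:
        value, freq = line.split()  # exactly one entry per line
        if freq.startswith("*/"):
            rows.append((int(value), ("*", float(freq[2:]))))
        else:
            rows.append((int(value), float(freq)))
    return header.group(1), header.group(2), rows
```

Feeding it the PConnDream text above yields the type `int_distr`, the name `PConnDream`, and eight rows, the last of which is the wild-card entry for value 8.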
The "frequency" column may have arbitrary values, and they do not have
to add up to 100. One can use percentages, probabilities, actual
counters, etc. as frequencies. Polygraph simply sums all the values and
treats that sum as the equivalent of 100%.
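The normalization rule can be sketched in a few lines of Python. The helper `normalize` is hypothetical and only mirrors the behavior described above (sum all frequencies and treat the sum as 100%); it is not Polygraph code.

```python
def normalize(freqs):
    """Scale non-negative frequencies so that they sum to 1.0.

    The sum of all frequencies plays the role of 100%, so counters,
    percentages, and probabilities describing the same shape agree.
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must have a positive sum")
    return [f / total for f in freqs]

print(normalize([2, 1, 1]))     # counters:    [0.5, 0.25, 0.25]
print(normalize([50, 25, 25]))  # percentages: [0.5, 0.25, 0.25]
```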
The last bin of the PConnDream histogram is interesting. We used
percentages as frequency values (although Polygraph did not know that we
chose percentages rather than, say, counters!). We were too lazy to
calculate what percentage was left after the first 7 entries, so we told
Polygraph to do the math for us. Note that we had to specify the total
of 100 in the wild-card entry (*/100).
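The arithmetic behind the wild-card entry is just "total minus everything listed so far". Below is a hypothetical Python helper, `resolve_wildcard`, that performs it for PConnDream; raising an error when the explicit entries exceed the total is this sketch's own choice, since the text does not say how Polygraph handles that case.

```python
def resolve_wildcard(known, total):
    """Return the frequency absorbed by a "*/total" wild-card entry.

    known: frequencies of all explicit (non-wild-card) entries.
    total: the number after "*/" (100 in PConnDream).
    """
    rest = total - sum(known)
    if rest < 0:
        # Assumption: treat an over-full table as an error.
        raise ValueError("explicit frequencies already exceed the total")
    return rest

pconn = [51.0, 28.7, 13.3, 2.3, 1.3, 0.7, 0.4]  # values 1..7
print(round(resolve_wildcard(pconn, 100), 1))    # value 8 gets 2.3
```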
Also note that Polygraph requires the type of histogram values to be
specified. For PConnDream, that type is simply an integer (the number
of transactions per connection), hence int_distr. Similarly, time- and
size-based histograms require the time_distr and size_distr types,
respectively.
For now, time_distr histograms use seconds as the unit, and size_distr
histograms use bytes.
Here is a more complex example. The distribution below was adapted from
real response time measurements on a cache server.
time_distr client_http_svc_time = {
1.017:1.943 2 # [min:max) range!
:3.069 34 # max above becomes current min
:4.050 303 # [3.069 : 4.050)
:4.938 792 # [4.050 : 4.938)
....
:870251.625 9
:918521.893 9
:969469.420 14
:1023243.301 6 # [969469.420 : 1023243.301)
}
For client_http_svc_time, we used raw counters as frequencies to avoid
the tedious conversion into percentages or probabilities.
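As the client_http_svc_time example illustrates, you can specify ranges
of values and "borrow" maximum values from preceding lines. Note that a
single value N (as opposed to a range) produces, for the purpose of
borrowing only!, a maximum value of N+1. You may find this behavior
"natural" for some applications. In the size histogram below, for
example, the range that follows the single value 0 starts at 1, and the
range that follows the single value 1048576 starts at 1048577: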
size_distr reply_sizes = {
0 .01 # zero sized replies
:1025 .30 # a [1:1025) range, 0 is not included!
:2049 .15 # a [1025:2049) range
....
1048576 .02 # a 1MB reply precisely: [1MB : 1MB]
:2097152 .01 # replies in [1MB+1byte : 2MB) range
}
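The borrowing rules can be expressed as a small Python function that turns table entries into explicit bins. This is a sketch of the rules as described above, not Polygraph's parser; `parse_bins` and its `(spec, freq)` input layout are hypothetical, and the elided "...." lines of the examples are simply left out.

```python
def parse_bins(entries):
    """Turn table entries into (lo, hi, freq) bins using the borrowing rules.

    entries: list of (spec, freq), where spec is "min:max", ":max", or a
    single value "N". Ranges are half-open [lo, hi). A single value N is
    the point bin [N, N] and, for borrowing only, leaves N + 1 as the
    maximum that a following ":max" entry borrows as its minimum.
    """
    bins = []
    prev_max = None
    for spec, freq in entries:
        if ":" in spec:
            lo_text, hi_text = spec.split(":", 1)
            if lo_text:
                lo = float(lo_text)       # explicit "min:max"
            elif prev_max is None:
                raise ValueError("':max' entry has no preceding maximum")
            else:
                lo = prev_max             # borrowed ":max"
            hi = float(hi_text)
            bins.append((lo, hi, freq))
            prev_max = hi
        else:
            n = float(spec)
            bins.append((n, n, freq))     # exactly N
            prev_max = n + 1              # N + 1, for borrowing only
    return bins

print(parse_bins([("0", .01), (":1025", .30), (":2049", .15)]))
print(parse_bins([("1048576", .02), (":2097152", .01)]))
```

The second call reproduces the last two lines of reply_sizes: the point bin at 1MB followed by the range starting at 1MB+1 byte.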
A user-defined distribution can appear in PGL anywhere a
distribution is expected. For example:
Content cntImage = {
size = table("/tmp/reply_sizes.pgd", "size");
...
};
...
Server S = {
contents = [ cntImage ];
pconn_use_lmt = table("/tmp/pconn-dream.pgd", "int");
...
};
The special distribution name "table" instructs Polygraph to load a
distribution histogram from the specified file (/tmp/reply_sizes.pgd or
/tmp/pconn-dream.pgd). You must specify the distribution value type as
the second parameter of the table distribution.
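Once loaded, a table distribution serves as a source of random values. The text above does not say how Polygraph picks a value inside a range bin, so the following Python sketch simply assumes a common approach: choose a bin with probability proportional to its frequency, then draw uniformly from its [min, max) range (a single-value bin always yields that value). `make_sampler` and the `(lo, hi, freq)` bin layout are hypothetical.

```python
import bisect
import random

def make_sampler(bins, rng=random):
    """Return a function that draws values from (lo, hi, freq) bins.

    ASSUMPTION (not stated in the Polygraph docs): a bin is picked with
    probability freq / sum(freq), and a value is then drawn uniformly from
    [lo, hi). A point bin with lo == hi always yields lo.
    """
    cumulative = []
    total = 0.0
    for _, _, freq in bins:
        total += freq
        cumulative.append(total)
    if total <= 0:
        raise ValueError("frequencies must have a positive sum")

    def sample():
        r = rng.random() * total  # a point in [0, total)
        # bisect_right skips zero-frequency bins; min() guards rounding.
        i = min(bisect.bisect_right(cumulative, r), len(bins) - 1)
        lo, hi, _ = bins[i]
        return lo + (hi - lo) * rng.random()

    return sample

# The first four bins of client_http_svc_time:
bins = [
    (1.017, 1.943, 2),
    (1.943, 3.069, 34),
    (3.069, 4.050, 303),
    (4.050, 4.938, 792),
]
sample = make_sampler(bins, random.Random(42))
print([round(sample(), 3) for _ in range(5)])
```

Passing a seeded `random.Random` instance keeps the draws reproducible, which is convenient when comparing runs.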
A user-defined distribution can also appear on the command line
anywhere a distribution is expected. For example:
> distr_test ... --distr table:/tmp/pconn-dream-num.pgd,num
Here, too, the special distribution name "table" instructs Polygraph to
load a distribution histogram from the specified file
(/tmp/pconn-dream-num.pgd). You must specify the distribution value type
as the second parameter of the table distribution (num, after the comma).
Note that the distr_test program accepts only "num_distr" distributions.
Thus, for the example to work, we had to copy pconn-dream.pgd to
pconn-dream-num.pgd and change the distribution type from int_distr to
num_distr.
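The copy-and-retype step can be scripted. The Python sketch below assumes that the header is the first line that is neither blank nor a `#` comment and that its first word ends in `_distr`; the helpers `retype_text` and `retype_file` are hypothetical and change nothing but that first word.

```python
import re

def retype_text(text, new_type="num_distr"):
    """Return table text with the header's distribution type replaced.

    The header is taken to be the first line that is not blank or a '#'
    comment, e.g. "int_distr PConnDream = { # mandatory header".
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if not line.split("#", 1)[0].strip():
            continue  # comment or blank line before the header
        lines[i] = re.sub(r"^(\s*)\w+_distr\b", r"\g<1>" + new_type,
                          line, count=1)
        break
    return "".join(lines)

def retype_file(src, dst, new_type="num_distr"):
    """Copy src to dst, changing only the distribution type in the header."""
    with open(src) as f:
        text = f.read()
    with open(dst, "w") as f:
        f.write(retype_text(text, new_type))

# Hypothetical usage mirroring the manual steps described above:
# retype_file("/tmp/pconn-dream.pgd", "/tmp/pconn-dream-num.pgd")
```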