This page describes how to specify arbitrary table-based distributions in Polygraph.
1. Overview
2. File format
3. PGL usage
4. Command-line usage
Polygraph has many built-in distributions: normal, exponential, Zipf, constant, etc. In some cases a user-defined distribution is required. You can instruct Polygraph to use an arbitrary distribution by specifying a probability density function (PDF) or a value-frequency histogram. Such a histogram is placed in a separate file. The file format and usage are described below.
The table-distribution file format is different from PGL. The format is line-based. That is, elements cannot span multiple lines.
Here is a simple example of a distribution called "PConnDream". This distribution might be used for specifying the usage limit for persistent connections, for example.
    # comments are allowed
    int_distr PConnDream = { # mandatory header
        1 51.0    # value 1 has frequency 51
        2 28.7
        3 13.3
        4  2.3    # value 4 has frequency 2.3
        5  1.3
        6  0.7
        7  0.4
        8 */100   # value 8 absorbs the rest (out of 100 total)
    } # closing bracket is required!

The "frequency" column may have arbitrary values. The sum of those values does not have to add up to 100. One can use percents, probabilities, actual counters, etc. as frequencies. Polygraph will simply sum all the values and take that sum as an equivalent of 100%.
The last bin of the PConnDream histogram is interesting. We used percentages as frequency values (but Polygraph did not know we chose percentages and not, say, counters!). We were too lazy to calculate what percentage was left after the first 7 entries, so we told Polygraph to do the math for us. Note that we had to specify the total of 100 in the wildcard entry.
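The arithmetic behind the wildcard entry and the "sum equals 100%" rule can be sketched in a few lines of Python. This is only an illustration of the rules described above, not Polygraph's implementation; the function name `resolve_frequencies`, the list-of-tuples input, and the restriction to a single wildcard entry are all assumptions made for the example.

```python
def resolve_frequencies(bins):
    """Turn (value, frequency) pairs into (value, probability) pairs.

    A frequency is either a number or a wildcard string "*/TOTAL".
    Assumption for this sketch: at most one wildcard entry is allowed.
    """
    wildcards = [f for _, f in bins if isinstance(f, str)]
    if len(wildcards) > 1:
        raise ValueError("this sketch supports at most one wildcard entry")

    # Sum of all explicitly given frequencies.
    known = sum(f for _, f in bins if not isinstance(f, str))

    resolved = []
    for value, freq in bins:
        if isinstance(freq, str):
            if not freq.startswith("*/"):
                raise ValueError("malformed wildcard: %r" % freq)
            total = float(freq[2:])
            freq = total - known  # the wildcard absorbs the rest
            if freq < 0:
                raise ValueError("explicit frequencies exceed the stated total")
        resolved.append((value, freq))

    # Whatever units were used, the grand total is treated as 100%.
    grand_total = sum(f for _, f in resolved)
    return [(value, freq / grand_total) for value, freq in resolved]


# The PConnDream histogram from the example above.
pconn_dream = [
    (1, 51.0), (2, 28.7), (3, 13.3), (4, 2.3),
    (5, 1.3), (6, 0.7), (7, 0.4), (8, "*/100"),
]

probs = resolve_frequencies(pconn_dream)
for value, p in probs:
    print(value, round(p, 4))
```

The first seven entries add up to 97.7, so the wildcard bin for value 8 receives the remaining 2.3 out of 100.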
Also note that Polygraph requires that the type of histogram values is specified. For PConnDream, that type is simply an integer (the number of transactions per connection), hence int_distr. Similarly, for time- or size-based histograms, one has to specify time_distr or size_distr type.
For now, time_distr histograms use seconds as the unit, and size_distr histograms use bytes.
Here is a more complex example. The distribution below was adapted from real response time measurements on a cache server.
    time_distr client_http_svc_time = {
        1.017:1.943    2     # [min:max) range!
        :3.069         34    # max above becomes current min
        :4.050         303   # [3.069 : 4.050)
        :4.938         792   # [4.050 : 4.938)
        ....
        :870251.625    9
        :918521.893    9
        :969469.420    14
        :1023243.301   6     # [969469.420 : 1023243.301)
    }

For client_http_svc_time, we used real counters to represent frequencies to avoid boring recomputation into percents or probabilities.
As the example above illustrates, you can specify ranges of values and "borrow" maximum values from preceding lines. Note that a single value (as opposed to a range), say N, produces (for the purpose of borrowing only!) a maximum value of N+1. You may find this behavior "natural" for some applications.
    size_distr reply_sizes = {
        0          .01   # zero sized replies
        :1025      .30   # a [1,1025) range, 0 is not included!
        :2049      .15   # a [1025:2049) range
        ....
        1048576    .02   # a 1MB reply precisely: [1MB : 1MB]
        :2097152   .01   # replies in [1MB+1byte : 2MB) range
    }
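The borrowing rules can be made concrete with a short Python sketch that expands each entry into an explicit half-open [min, max) interval. It is an illustration only, not Polygraph's parser: the function name `expand_ranges`, the string-based entry format, and the restriction to integer values are assumptions. A single value N is represented here as [N, N+1), which for integers is the same as "exactly N" and yields the N+1 maximum that the next line borrows.

```python
def expand_ranges(entries):
    """Expand (spec, frequency) pairs into (lo, hi, frequency) triples.

    spec is one of "N", "MIN:MAX", or ":MAX" (MIN borrowed from the
    preceding entry). Intervals are half-open: [lo, hi).
    Assumption for this sketch: all values are integers.
    """
    bins = []
    prev_max = None
    for spec, freq in entries:
        if ":" in spec:
            lo_text, hi_text = spec.split(":", 1)
            if lo_text:
                lo = int(lo_text)
            elif prev_max is None:
                raise ValueError("no preceding maximum to borrow for %r" % spec)
            else:
                lo = prev_max  # max above becomes current min
            hi = int(hi_text)
        else:
            lo = int(spec)
            hi = lo + 1  # a single value N lends N+1 to the next line
        bins.append((lo, hi, freq))
        prev_max = hi
    return bins


# Visible entries of the reply_sizes example; the elided middle
# entries of the original are left out here.
reply_sizes = [
    ("0", 0.01),
    (":1025", 0.30),
    (":2049", 0.15),
    ("1048576", 0.02),
    (":2097152", 0.01),
]

for lo, hi, freq in expand_ranges(reply_sizes):
    print("[%d, %d) %s" % (lo, hi, freq))
```

The entry after the single value 0 starts at 1, and the entry after 1048576 starts at 1048577, matching the "0 is not included" and "1MB+1byte" comments in the example.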
A user-defined distribution can appear in PGL anywhere a distribution is expected. For example:
    Content cntImage = {
        size = table("/tmp/reply_sizes.pgd", "size");
        ...
    };
    ...
    Server S = {
        contents = [ cntImage ];
        pconn_use_lmt = table("/tmp/pconn-dream.pgd", "int");
        ...
    };

The special distribution name "table" instructs Polygraph to load a distribution histogram from the specified file (/tmp/reply_sizes.pgd or /tmp/pconn-dream.pgd). You must specify the distribution value type as the second parameter of the table distribution.
A user-defined distribution can appear on the command line anywhere a distribution is expected. For example:
    > distr_test ... --distr table:/tmp/pconn-dream-num.pgd,num

The special distribution name "table" instructs Polygraph to load a distribution histogram from the specified file (/tmp/pconn-dream-num.pgd). You must specify the distribution value type as the second parameter of the table distribution.
Note that the distr_test program accepts "num_distr" distributions only. Thus, we had to copy pconn-dream.pgd to pconn-dream-num.pgd and change the distribution type from int_distr to num_distr for the example to work.
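The copy-and-retype step can be scripted. The Python sketch below rewrites only the type keyword in the header line and leaves everything else untouched. The helper name `retype_distribution` is hypothetical, and the sketch assumes that the header starts its own line and that the type keyword ends in `_distr`, as in the examples on this page.

```python
import re

# Matches a header such as "int_distr PConnDream = {" at the start of a line.
HEADER = re.compile(r"^(\s*)\w+_distr(\s+\w+\s*=\s*\{)", re.MULTILINE)


def retype_distribution(text, new_type):
    """Return text with the type keyword of the first header replaced."""
    new_text, count = HEADER.subn(
        lambda m: m.group(1) + new_type + m.group(2), text, count=1
    )
    if count != 1:
        raise ValueError("no distribution header found")
    return new_text


sample = """# comments are allowed
int_distr PConnDream = { # mandatory header
1 51.0
8 */100
}
"""

converted = retype_distribution(sample, "num_distr")
print(converted)
```

To produce the file used in the command above, one would read /tmp/pconn-dream.pgd, pass its contents through the function with "num_distr", and write the result to /tmp/pconn-dream-num.pgd.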