Here is PolyMix-4 at a glance.

Workload Name: PolyMix-4
Polygraph Version: 2.7
Configuration: workloads/polymix-4.pg
Parameters: peak request rate, fill rate, cache size
How-Tos: available
Results: available
Synopsis: workload for testing forward caching proxies, fourth generation.

1. Background
2. Feature overview
3. Details
3.1 Phase schedule
3.2 Servers configuration
3.3 Robots configuration
3.4 Content types
3.5 WAN latency and packet loss
3.6 Other
4. Parameters
4.1 Peak request rate
4.2 Fill request rate
4.3 Proxy cache size
5. Addresses
5.1 Robot addresses
5.2 Server addresses
5.3 Proxy address

1. Background
PolyMix-4 is based on our experience using PolyMix-3 during the third cache-off and other tests. We have eliminated some of the known problems of the old workloads and added new features. The ultimate goal is, of course, to bring our model closer to the real world.
2. Feature overview
The PolyMix environment models many key Web traffic characteristics, including the following.
- a mixture of content types
- varying offered load, depending on the test phase
- a working set of URLs that changes its content with time but can preserve its size
- all distributed clients can share information about the global URL set
- hot subsets simulating flash crowds
- virtually infinite number of different objects that are added to the working set as needed
- DNS names in URLs
- object life-cycles (expiration and last-modification times)
- persistent connections
- network packet loss
- reply sizes
- server-side latencies
- a mixture of cache hits and cache misses
- a mixture of cachable and uncachable responses
- object popularity (recurrence)
- request rates and interarrival times
- embedded objects and browser behavior
- cache validation (IMS requests)
- forced cache validations (reloads)
3. Details
This section describes individual components of the PolyMix-4 workload and is mostly auto-generated from PGL configuration files. The configuration files should be consulted whenever a conflict in documentation is suspected.
3.1 Phase schedule
The workload schedule consists of 10 phases: 9 phases with time-based goals and 1 other phase. The total test duration (based on the time-based goals) is about 10.33 hours.
       Factors (%)
                Populus      Recurrence    Special Msgs
Phase       beg     end     beg     end     beg     end  Other
framp      0.04   50.00    9.09    9.09   10.00   10.00
fill      50.00   50.00    9.09    9.09   10.00   10.00  wait for WSS to freeze
fexit     50.00    0.04    9.09  100.00   10.00  100.00
inc1       0.04  100.00  100.00  100.00  100.00  100.00
top1     100.00  100.00  100.00  100.00  100.00  100.00
dec1     100.00   10.00  100.00  100.00  100.00  100.00
idle      10.00   10.00  100.00  100.00  100.00  100.00
inc2      10.00  100.00  100.00  100.00  100.00  100.00
top2     100.00  100.00  100.00  100.00  100.00  100.00
dec2     100.00    0.04  100.00  100.00  100.00  100.00

Phase "framp" lasts for 20.00min. During this phase, the robot population size increases from 0.04% to 50.00%. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 9.09% of robot recurrence ratios. The portion of special messages remains stable at 10.00%.
Phase "fill" does not have a time-based duration configured. During this phase, the robot population size remains stable at 50.00% of its peak level. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 9.09% of robot recurrence ratios. The portion of special messages remains stable at 10.00%. 1 sample of per-transaction statistics is collected. The phase will continue until the working set size is frozen.
Phase "fexit" lasts for 20.00min. During this phase, the robot population size decreases from 50.00% to 0.04%. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level increases from 9.09% to 100.00% of robot recurrence ratios. The portion of special messages increases from 10.00% to 100.00%.
Phase "inc1" lasts for 20.00min. During this phase, the robot population size increases from 0.04% to 100.00%. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%.
Phase "top1" lasts for 4.00hour. During this phase, the robot population size remains stable at 100.00% of its peak level. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%. 1 sample of per-transaction statistics is collected.
Phase "dec1" lasts for 20.00min. During this phase, the robot population size decreases from 100.00% to 10.00%. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%.
Phase "idle" lasts for 20.00min. During this phase, the robot population size remains stable at 10.00% of its peak level. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%.
Phase "inc2" lasts for 20.00min. During this phase, the robot population size increases from 10.00% to 100.00%. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%.
Phase "top2" lasts for 4.00hour. During this phase, the robot population size remains stable at 100.00% of its peak level. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%. 1 sample of per-transaction statistics is collected.
Phase "dec2" lasts for 20.00min. During this phase, the robot population size decreases from 100.00% to 0.04%. The offered per-robot load remains stable at 100.00% of its peak level. The recurrence level remains stable at 100.00% of robot recurrence ratios. The portion of special messages remains stable at 100.00%.
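The stated total of about 10.33 hours can be checked by summing the time-based phases. The sketch below is written in Python rather than PGL, and the names in it are illustrative; the durations are copied from the phase descriptions above, with "fill" left out because it has no time-based goal.

```python
# Durations of the time-based phases, in minutes, as listed in the schedule.
# "fill" is omitted: it runs until the working set size is frozen.
PHASE_MINUTES = {
    "framp": 20, "fexit": 20, "inc1": 20, "top1": 240, "dec1": 20,
    "idle": 20, "inc2": 20, "top2": 240, "dec2": 20,
}


def total_timed_hours(phases=PHASE_MINUTES):
    """Return the summed duration of the time-based phases, in hours."""
    return sum(phases.values()) / 60.0


print(f"{len(PHASE_MINUTES)} timed phases, {total_timed_hours():.2f} hours")
```

The actual test runs longer than this figure, by however long the fill phase takes.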
3.2 Servers configuration
The workload defines 1 server type.
Server "PolyMix-4-srv" hosts the following 4 content types: "image" (65.00% of all hosted content), "HTML" (15.00%), "download" (0.50%), and "other" (19.50%). The following 3 content types can be accessed directly: "HTML", "download", and "other". Server "think time" distribution is set to norm(2.50sec, 1.00sec). This server uses persistent connections. The number of transactions per connection is distributed as zipf(16). Idle persistent connections are closed after a 15.00sec timeout. Only basic reply types are used.
3.3 Robots configuration
The workload defines 1 robot type.
Robot "PolyMix-4-rbt" is a "constant request rate" robot with request rate of 0.40 requests per second. About 50.00% of requests refer to URLs in a globally shared, public working set. This robot revisits 91.67% of previously requested URLs (offering a hit when a URL is cachable). About 100.00% of embedded objects will be loaded.
This robot is not allowed to open more than 4 connections at any given time, even if that limit causes a decrease in request rate or memory exhaustion. Moreover, the waiting transaction queue can grow without bounds. The robot's private cache is limited to 1000 entries. This robot uses persistent connections. The number of transactions per connection is distributed as zipf(64). Idle persistent connections are never closed by this robot. The following 3 request types are used: "IMS" (20.00% of all possible request types), "Reload" (5.00%), and "Basic" (75.00%).
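To give a feel for the request-type mix, the following Python sketch computes the expected number of requests of each type for one robot over one hour at full per-robot load. The names are illustrative and the figures are expectations only; actual counts in a run will vary.

```python
# Request-type mix for "PolyMix-4-rbt", in percent, copied from the text above.
REQUEST_TYPE_PERCENT = {"Basic": 75, "IMS": 20, "Reload": 5}

# One robot at 0.4 req/sec issues 0.4 * 3600 = 1440 requests per hour.
REQUESTS_PER_ROBOT_HOUR = 1440


def expected_request_counts(total_requests):
    """Expected number of requests of each type among total_requests."""
    return {
        name: total_requests * percent / 100
        for name, percent in REQUEST_TYPE_PERCENT.items()
    }


for name, count in expected_request_counts(REQUESTS_PER_ROBOT_HOUR).items():
    print(f"{name}: {count:.0f} requests per robot-hour")
```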
"PolyMix-4-rbt" robots direct 10.00% of all requests to 1.00% of the working set, using popUnif() popularity distribution.
3.4 Content types
The workload uses 4 unique content types.
Type      Reply Size                  Cachability  Extensions
image     exp(4.500KB)                80.00%       .gif, .jpeg, and .png
HTML      exp(8.500KB)                90.00%       .html and .htm
download  logn(300.000KB, 300.000KB)  95.00%       .exe, .zip, and .gz
other     logn(25.000KB, 10.000KB)    72.00%

The size distribution for "image" content type is exp(4.500KB). About 80.00% of "image" objects are cachable. This content type does not contain other types. The following 3 extensions may appear at the end of URLs: ".gif", ".jpeg", and ".png".
The size distribution for "HTML" content type is exp(8.500KB). About 90.00% of "HTML" objects are cachable. This content type is a container. Objects may contain (embed) the following 1 content type: "image" (100.00% of all embedded content). The number of embedded objects per container is distributed as zipf(13). The following 2 extensions may appear at the end of URLs: ".html" and ".htm".
The size distribution for "download" content type is logn(300.000KB, 300.000KB). About 95.00% of "download" objects are cachable. This content type does not contain other types. The following 3 extensions may appear at the end of URLs: ".exe", ".zip", and ".gz".
The size distribution for "other" content type is logn(25.000KB, 10.000KB). About 72.00% of "other" objects are cachable. This content type does not contain other types.
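Combining these content types with the hosting shares from section 3.2 gives a rough picture of the average hosted object. The Python sketch below assumes that the first argument of exp() and logn() is the distribution mean; that reading, and all names in the code, are assumptions rather than something stated above. The averages are per hosted object, not per request.

```python
# name: (share of hosted content in %, assumed mean size in KB, cachable %)
CONTENT_TYPES = {
    "image": (65.0, 4.5, 80.0),
    "HTML": (15.0, 8.5, 90.0),
    "download": (0.5, 300.0, 95.0),
    "other": (19.5, 25.0, 72.0),
}


def mean_reply_size_kb(types=CONTENT_TYPES):
    """Share-weighted mean object size in KB."""
    total_share = sum(share for share, _, _ in types.values())
    return sum(share * size for share, size, _ in types.values()) / total_share


def cachable_share_percent(types=CONTENT_TYPES):
    """Share-weighted percentage of cachable objects."""
    total_share = sum(share for share, _, _ in types.values())
    return sum(share * cach for share, _, cach in types.values()) / total_share


print(f"mean size: {mean_reply_size_kb():.3f} KB")
print(f"cachable:  {cachable_share_percent():.3f} %")
```

Under these assumptions the mean hosted object is about 10.6 KB and about 80% of hosted objects are cachable.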
3.5 WAN latency and packet loss
The Polygraph client and server machines are configured to use FreeBSD's DummyNet feature.
We configure Polygraph servers with 40 millisecond delays (per packet, incoming or outgoing), and with a 0.05% probability of dropping a packet (incoming or outgoing). Server think times are normally distributed with a 2.5 second mean and a 1 second standard deviation. Note that the server think time does not depend on the oid. Instead, it is randomly chosen for every request.
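The server-side numbers above can be turned into a few back-of-the-envelope helpers. This Python sketch is illustrative only: it assumes packet drops are independent, and it clamps negative think-time draws to zero, neither of which is stated in the text.

```python
import random

PACKET_DELAY_MS = 40        # per packet, incoming or outgoing
PACKET_LOSS_PROB = 0.0005   # 0.05% per packet
THINK_MEAN_SEC = 2.5
THINK_STDDEV_SEC = 1.0


def added_round_trip_ms():
    """Delay added to one round trip: one delayed packet in each direction."""
    return 2 * PACKET_DELAY_MS


def prob_any_loss(packets):
    """Chance that at least one of `packets` is dropped (independence assumed)."""
    return 1.0 - (1.0 - PACKET_LOSS_PROB) ** packets


def sample_think_time(rng):
    """Draw one think time in seconds; negative draws are clamped (assumed)."""
    return max(0.0, rng.gauss(THINK_MEAN_SEC, THINK_STDDEV_SEC))


rng = random.Random(2024)  # a fresh draw per request, independent of the oid
print(added_round_trip_ms(), "ms added per round trip")
print(f"{prob_any_loss(2):.8f} chance of a loss in a two-packet exchange")
print(f"{sample_think_time(rng):.2f} sec sample think time")
```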
We do not use packet delays or packet loss on Polygraph clients.
3.6 Other
Histograms and aggregate stats for PolyMix content types based on Squid caching proxy tests are available elsewhere.
4. Parameters
Here is an explanation of some of the workload parameters.
4.1 Peak request rate
This parameter specifies the request rate for the peak-load ("plat") phases of the test, which the phase schedule names "top1" and "top2". The minimum request rate is 0.4 req/sec. The maximum request rate (given the PolyMix-4 address allocation rules) is probably around 15500 req/sec.
4.2 Fill request rate
This parameter specifies the request rate for the "fill" phase of the test. The fill rate must be between 10% and 100% of the peak request rate.
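The two rate constraints are easy to check before starting a test. The helper below is a hypothetical Python sketch, not part of Polygraph; it only encodes the 0.4 req/sec minimum and the 10% to 100% fill-rate window given above.

```python
MIN_PEAK_RATE = 0.4  # req/sec, the load of a single robot


def check_rates(peak_rate, fill_rate):
    """Return a list of problems; an empty list means both rates look valid."""
    problems = []
    if peak_rate < MIN_PEAK_RATE:
        problems.append(f"peak rate {peak_rate} is below {MIN_PEAK_RATE} req/sec")
    # Compare fill_rate * 10 with peak_rate to avoid rounding in 0.1 * peak_rate.
    if not (fill_rate * 10 >= peak_rate and fill_rate <= peak_rate):
        problems.append(
            f"fill rate {fill_rate} is outside 10%-100% of peak rate {peak_rate}"
        )
    return problems


print(check_rates(1000, 500))
print(check_rates(1000, 50))
```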
4.3 Proxy cache size
Proxy cache size is the configured cache size plus the total amount of RAM in the proxy box. The configured cache size is whatever is specified in the proxy configuration file, or the best approximation of that. High/low water marks for garbage collection and other proxy-specific settings and algorithms should not affect this parameter. Proxy cache size is used to determine the duration of the fill phase and has no direct effect on other phases (though there may be performance side effects, of course).
5. Addresses
This section describes algorithms and rules used to allocate domain names and IP addresses used in PolyMix-4. Most of the addresses are computed automatically based on request rates and address space parameters specified in the workload file.
5.1 Robot addresses
The number of PolyMix-4 robots is determined by peak request rate. Each robot is capable of producing 0.4 req/sec load. The total number of robots is adjusted so that every client-side host has the same number of robots (other similar minor adjustments are also made). The number of hosts is determined based on the maximum host load of 500 req/sec.
Two robots share the same IP address. All IP addresses use a /22 subnet.
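The sizing rules above can be sketched in Python as follows. The rounding choices here are assumptions: the text says only that the robot count is adjusted so that every host has the same number of robots, and the "other similar minor adjustments" are not modeled. The function and key names are illustrative.

```python
import math
from fractions import Fraction

ROBOT_RATE = Fraction(2, 5)   # 0.4 req/sec per robot, kept exact
MAX_HOST_RATE = 500           # req/sec per client-side host


def plan_client_side(peak_rate):
    """Estimate hosts, robots, and IPs needed for a given peak request rate."""
    peak = Fraction(str(peak_rate))
    hosts = max(1, math.ceil(peak / MAX_HOST_RATE))
    robots = math.ceil(peak / ROBOT_RATE)
    # Assumed adjustment: round up so that every host runs the same number.
    robots_per_host = math.ceil(Fraction(robots, hosts))
    return {
        "hosts": hosts,
        "robots_per_host": robots_per_host,
        "robots": hosts * robots_per_host,
        "ips_per_host": math.ceil(robots_per_host / 2),  # two robots per IP
    }


print(plan_client_side(1000))
```

For a peak rate of 1000 req/sec this gives 2 hosts with 1250 robots and 625 IP addresses each.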
To allocate IP addresses for robot pairs, Polygraph iterates through the client-side addr_space array and gives the next robot pair the next IP address, until enough IP addresses are allocated for a host. Polygraph then skips remaining IP addresses that belong to the same /22 subnet (if any), and starts allocation for the next host (if any).
The above scheme ensures that individual IPs do not "migrate" from one host to another when request rate changes. Instead, only the number of IPs "enabled" on each host changes.
Robot addresses are bound to loopback interfaces. The bench setup must provide appropriate routes for robots to be able to communicate with the world.
PolyMix-4 uses lo0::10.X.0-123.1-250/22 client-side address space, where X is the bench ID that can vary from 100 to 199.
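The allocation walk described above can be sketched in Python. This is an illustration, not Polygraph's code: it hard-codes the 10.X.0-123.1-250 client-side space, treats a /22 subnet as four consecutive values of the third octet, and uses made-up function names.

```python
def client_addresses():
    """Yield (third, fourth) octets of 10.X.0-123.1-250 in allocation order."""
    for third in range(0, 124):
        for fourth in range(1, 251):
            yield third, fourth


def allocate_robot_ips(bench_id, hosts, ips_per_host):
    """Return one list of IP strings per host, skipping to a new /22 per host."""
    if ips_per_host < 1:
        raise ValueError("each host needs at least one IP address")
    space = client_addresses()
    current = next(space, None)
    allocation = []
    for _ in range(hosts):
        host_ips = []
        while current is not None and len(host_ips) < ips_per_host:
            host_ips.append(current)
            current = next(space, None)
        if len(host_ips) < ips_per_host:
            raise ValueError("client-side address space exhausted")
        # Skip whatever is left of the /22 subnet this host ended in.
        last_subnet = host_ips[-1][0] // 4
        while current is not None and current[0] // 4 == last_subnet:
            current = next(space, None)
        allocation.append([f"10.{bench_id}.{t}.{f}" for t, f in host_ips])
    return allocation


ips = allocate_robot_ips(100, 2, 625)
print(ips[0][0], ips[0][-1], ips[1][0])
```

With 625 addresses per host (1250 robots, or 500 req/sec), the first host ends at 10.X.2.125 and the second starts at 10.X.4.1. The 0-123 range holds 31 such /22 subnets, so this sketch fits at most 31 fully loaded hosts, which would account for the roughly 15500 req/sec maximum quoted in section 4.1.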
5.2 Server addresses
The server-side IP allocation algorithm is very similar to the client-side algorithm described above. The only significant difference is that the total number of server agents is computed as 500 + 0.1*R, where R is the total number of robots.
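As a quick illustration of the server-count formula, here is a small Python helper. Rounding a fractional result up to a whole agent is an assumption; the text does not say how 0.1*R is rounded.

```python
def server_agent_count(robots):
    """Return 500 + 0.1*R, rounded up to a whole agent (rounding is assumed)."""
    tenth_rounded_up = -(-robots // 10)  # integer ceiling of robots / 10
    return 500 + tenth_rounded_up


print(server_agent_count(2500))
```

For example, the 2500 robots needed for 1000 req/sec would be served by 750 server agents.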
Each server gets a unique IP address. PolyMix-4 uses lo0::10.101.128-251.1-250:80/22 server-side address space.
In addition to server IP addresses, PolyMix-4 uses domain names. Each server gets a unique domain name derived from the server IP address using the IpsToNames() PGL function call. The mapping is 1:1. All domain names have the same length. All domain names belong to the "bench.tst" zone. Robots know the servers by their domain names rather than IP addresses.
Needless to say, PolyMix-4 tests require a functioning DNS server.
5.3 Proxy address
If Polygraph robots should use a proxy address, that single address must be 172.16.X.32 where X is the bench ID.
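A bench setup script could derive the proxy address from the bench ID as sketched below in Python. The function name is hypothetical, and reusing the 100 to 199 bench ID range from the client-side address space is an assumption.

```python
import ipaddress


def proxy_address(bench_id):
    """Return the proxy address 172.16.X.32 for bench ID X."""
    if not 100 <= bench_id <= 199:  # range taken from section 5.1 (assumed here)
        raise ValueError(f"bench ID {bench_id} is outside 100-199")
    return ipaddress.IPv4Address(f"172.16.{bench_id}.32")


print(proxy_address(101))
```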