Both polyclt and polysrv can report "address already
in use" errors. The exact error message text is OS-dependent, but the
causes of the error are similar for all OSes.
1.1 Server Side
The "address already in use" error usually occurs shortly after
polysrv is launched, while the polysrv process configures
and starts individual server agents. Each server agent tries to bind to
its address and port number. If that address and port number pair is
already in use by some other application, the error occurs and
polysrv quits.
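The bind failure described above can be reproduced outside Polygraph. The following is a minimal Python sketch, not Polygraph code: it binds one listening socket to a loopback port and then tries to bind a second socket to the same address and port. The helper name try_bind is hypothetical, and the loopback address is used only to keep the example harmless.

```python
import errno
import socket

def try_bind(addr, port):
    """Try to bind and listen on addr:port.

    Returns (socket, None) on success or (None, errno_value) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno

# Port 0 lets the OS pick a free port, so the first bind should succeed.
first, first_err = try_bind("127.0.0.1", 0)
taken_port = first.getsockname()[1]

# A second bind to the same address:port pair is expected to fail.
second, second_err = try_bind("127.0.0.1", taken_port)
conflict = second_err == errno.EADDRINUSE
print("second bind failed with EADDRINUSE:", conflict)

first.close()
if second is not None:
    second.close()
```

The second socket plays the role of a server agent that starts while another application already holds the address.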
The most common reason for the address conflict is that another
application (e.g. a real Web server or another polysrv process)
is listening on the address that your server agent is trying to bind to.
You need to find out what that application is and kill it. Alternatively,
you can change the server agent address or port number to avoid the
conflict.
The netstat tool can be used on most operating systems to
check that a given address:port pair is indeed in use. Make sure
you are using the right netstat command line options to
display all allocated addresses and to disable address-to-name resolution.
For example, on FreeBSD, you might run netstat -na.
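When the netstat listing is long, a small script can do the lookup. The sketch below assumes the usual netstat -na column layout (protocol, two queue counters, local address, foreign address, state) and accepts both the BSD "addr.port" and the Linux "addr:port" notation. The function name is_listening and the sample lines are made up for illustration; real output varies between operating systems.

```python
def is_listening(netstat_output, addr, port):
    """Return True if netstat_output shows a TCP listener on addr:port.

    Wildcard listeners ("*" or "0.0.0.0") are treated as a match because
    they normally conflict with a bind to any specific local address.
    """
    wanted = set()
    for host in (addr, "*", "0.0.0.0"):
        wanted.add(f"{host}:{port}")   # Linux-style notation
        wanted.add(f"{host}.{port}")   # BSD-style notation
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines.
        if len(fields) < 5 or not fields[0].lower().startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[-1]
        if state.startswith("LISTEN") and local_addr in wanted:
            return True
    return False

# Illustrative sample only, not captured from a real host.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 10.0.1.1.80            *.*                    LISTEN
tcp4       0      0 10.0.1.1.22            10.0.2.7.51515         ESTABLISHED
"""

print(is_listening(SAMPLE, "10.0.1.1", 80))
print(is_listening(SAMPLE, "10.0.1.1", 8080))
```

To use it on a live system, pass the text printed by netstat -na instead of SAMPLE.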
Tools to find out which process is listening on a given address are
also available. Lsof is one
of them. These tools are handy if you cannot figure out what is listening
on the address that you want to use for Polygraph servers.
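As one possible way to invoke lsof for this purpose, the sketch below builds a command line that restricts the listing to TCP listeners on a single address and port. The function names are hypothetical. The options used (-n, -P, -i, -s) are standard lsof options, but their availability may depend on the lsof version installed, and seeing processes owned by other users usually requires root privileges.

```python
import subprocess

def lsof_listen_command(addr, port):
    """Build an lsof command line that lists TCP listeners on addr:port."""
    return [
        "lsof",
        "-n",                     # do not resolve host names
        "-P",                     # do not resolve port names
        f"-iTCP@{addr}:{port}",   # restrict to this TCP address and port
        "-sTCP:LISTEN",           # show listening sockets only
    ]

def find_listeners(addr, port):
    """Run lsof and return its raw output ("" if lsof is not installed)."""
    try:
        result = subprocess.run(
            lsof_listen_command(addr, port),
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return ""
    return result.stdout

print(" ".join(lsof_listen_command("10.0.1.1", 80)))
```

A listener bound to the wildcard address may not be matched when a specific address is given; dropping the "@addr" part of the -i argument widens the search to all local addresses.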
1.2 Client Side
The "address already in use" error may occur on the client side of the
test. When opening a new connection, a robot agent tries to bind the
client side of the connection to the configured robot address. The port
number for the bind system call is set to zero by default. A zero port
number tells the TCP stack to find any suitable "ephemeral" port for the given
address. If the --ports command line
option is used, Polygraph will decide which port to use and pass that port
number to the OS.
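The two binding modes can be illustrated with plain sockets. This is a minimal sketch of the system-call pattern, not Polygraph's implementation; bind_client is a hypothetical helper, and the loopback address stands in for a configured robot address.

```python
import socket

def bind_client(addr, port=0):
    """Bind a TCP client socket to addr:port before connecting.

    port=0 asks the OS to choose an ephemeral port; any other value
    requests that specific port, as an application-chosen port would.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((addr, port))
    return sock

# Default mode: the OS picks the port.
client = bind_client("127.0.0.1")
assigned_addr, assigned_port = client.getsockname()
print("OS-assigned ephemeral port:", assigned_port)
client.close()

# Explicit mode: the application passes the port number to the OS.
explicit = bind_client("127.0.0.1", assigned_port)
explicit_port = explicit.getsockname()[1]
explicit.close()
```

In the explicit mode, the bind call raises an "address already in use" error if the requested port is taken, so the caller has to be prepared to try another port.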
The most common reason for the address conflict is a race condition
inside a busy OS kernel: the TCP stack thinks that the port is available,
assigns it to a connection, but does not actually reserve that port until
the connect system call. If another bind/connect sequence happens before
our connection is opened, and if that sequence uses the same address:port,
an error occurs.
It is also possible that the error is returned when the TCP stack has
run out of free port numbers for a given address. The number of available
ephemeral port numbers (i.e., those assigned by the OS, see above) is usually
quite small (a few thousand) by default, but can be increased using
sysctl or other OS-specific methods.
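To see how many ephemeral ports a given range provides, count the ports between its bounds, inclusive. The sketch below does that and, as an assumption specific to Linux, reads the range from /proc/sys/net/ipv4/ip_local_port_range (on FreeBSD the corresponding settings are the net.inet.ip.portrange.first and net.inet.ip.portrange.last sysctl variables). The function names and the 1024-5000 range are illustrative.

```python
def count_ports(first, last):
    """Number of ports in the inclusive range first..last."""
    return last - first + 1

def parse_linux_port_range(text):
    """Parse the two whitespace-separated bounds used by Linux."""
    first, last = (int(value) for value in text.split())
    return first, last

def read_linux_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return (first, last) from the Linux proc file, or None elsewhere."""
    try:
        with open(path) as handle:
            return parse_linux_port_range(handle.read())
    except OSError:
        return None

# An illustrative small range of "a few thousand" ports.
small = count_ports(*parse_linux_port_range("1024 5000"))
print("ports in 1024-5000:", small)

current = read_linux_port_range()
if current is not None:
    print("ports in current range:", count_ports(*current))
```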
The error is more likely to occur on untuned kernels with default MSL
values (which determine how long a connection stays in the TIME_WAIT state)
because connections in the TIME_WAIT state do consume port
numbers, increasing the chances of a conflict or of running out of ports.
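A rough back-of-the-envelope estimate shows why the MSL matters. TCP keeps a closed connection in TIME_WAIT for twice the MSL, so if every connection from one client address leaves its port in TIME_WAIT, the sustainable connection rate is bounded by the number of usable ports divided by 2*MSL. The sketch below computes that bound. It is a simplification that assumes the client side closes every connection and that ports are never reused early; the function name and the sample values are illustrative.

```python
def max_connection_rate(num_ports, msl_seconds):
    """Upper bound on new connections per second for one client address.

    Assumes each connection's port stays unavailable for the whole
    TIME_WAIT period, which lasts 2 * MSL.
    """
    time_wait_seconds = 2 * msl_seconds
    return num_ports / time_wait_seconds

# A few thousand ephemeral ports with a 30-second MSL.
untuned = max_connection_rate(3977, 30)

# A wider port range (27001 ports) with the same MSL, then a shorter MSL.
wider = max_connection_rate(27001, 30)
wider_short_msl = max_connection_rate(27001, 3)

print(f"untuned: {untuned:.1f} conn/sec")
print(f"wider range: {wider:.1f} conn/sec")
print(f"wider range, short MSL: {wider_short_msl:.1f} conn/sec")
```

Both more ports and a shorter MSL raise the bound, and the two effects multiply.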
To reduce the number of address conflicts, use the --ports command line
option. We usually use the --ports 3000:30000 range. You can also
tune your OS to have more ephemeral ports and/or shorter MSL timeouts.
The netstat tool can be used on most operating systems to
check how many connections are in the TIME_WAIT state.
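Counting TIME_WAIT entries by hand is tedious on a busy client, so a short script can tally connection states from netstat output. The sketch below assumes that each TCP line of netstat -na ends with the connection state, which holds for the common BSD and Linux layouts but is not guaranteed everywhere. The function name count_tcp_states and the sample lines are made up for illustration.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Tally TCP connection states found in netstat -na style output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines: proto, recv-q, send-q, local, foreign, state.
        if len(fields) >= 6 and fields[0].lower().startswith("tcp"):
            counts[fields[-1]] += 1
    return counts

# Illustrative sample only, not captured from a real host.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 10.0.1.1.3001          10.0.2.1.80            TIME_WAIT
tcp4       0      0 10.0.1.1.3002          10.0.2.1.80            TIME_WAIT
tcp4       0      0 10.0.1.1.3003          10.0.2.1.80            ESTABLISHED
"""

states = count_tcp_states(SAMPLE)
print("TIME_WAIT connections:", states["TIME_WAIT"])
```

Comparing the TIME_WAIT count with the size of the port range in use gives a quick indication of how close the client is to running out of ports.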