This section documents optional configuration options for the advanced user. You do not
need to be concerned with any of these options in order to get your csWMPI application to
run; however, some of them might prove convenient.
Machines Definition:
The machines section starts with the string "/Machines". In this
section, create an entry for each machine of the cluster
using the following syntax:
/Machines
<Machine name>
<device1> <machine id using the device1>
<device2> <machine id using the device2>
<device3> <machine id using the device3>
... |
Each entry starts with the identification of a machine, which is its
DNS name or IP address.
You then specify the devices that the machine can use to communicate with
the other machines. For each device, the machine has an identifier that
is unique within that device (e.g. for the TCP device the identifier is the
IP address or host name of the machine).
An example of an entry to specify the machine mountain, which
has the devices tcp and shmem, is:
/Machines
mountain
tcp mountain.criticalsoftware.com
shmem mountain |
Connections Definition:
After defining all the machines, it is necessary to provide information on how
the processes of each machine communicate with other processes. Default
communication devices can be specified for communications between processes
running on the same machine (internal) and for processes on remote machines
(external). You can also specify devices for specific connections between
machines. Finally, you have to specify an intercomputation device, which is the
device to use for computations connecting at runtime using MPI_Comm_connect and
MPI_Comm_accept. Currently, only the tcp device can be used as the
intercomputation device.
The format of the Connections section is shown below:
/Connections
[internal device <device>]
[external device <device>]
[intercomputation device <device>]
[<Machine1> <Machine2> <device>] |
An example of the Connections section is:
/Machines
mountain
tcp mountain.criticalsoftware.com
shmem mountain
squirrel
tcp squirrel.criticalsoftware.com
shmem squirrel
pacific
tcp pacific.criticalsoftware.com
#Note: This machine (pacific) has no shared memory device
/Connections
internal device shmem
external device tcp
intercomputation device tcp
pacific pacific tcp |
This configuration file specifies three machines. Two of them, mountain and
squirrel, have the devices tcp and shmem, while the third, pacific, has only
the device tcp. The /Connections section sets the default internal device to
shmem, so by default shared memory is used for communication between processes
residing on the same machine. Likewise, the default external device is tcp,
which means that by default processes residing on different machines
communicate via tcp. The intercomputation device is also set to tcp.
The default configuration is overridden for the machine named pacific: the last
line of the example specifies that tcp should be used for internal
communications between processes residing on pacific.
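The precedence described above (a specific machine-pair entry wins over the internal and external defaults) can be sketched in Python. This is only an illustration of the documented rules, not csWMPI code: the function name `select_device` and the dictionary layout are hypothetical, and treating a specific connection as symmetric between the two machines is an assumption of the sketch.

```python
# Hypothetical model of the /Connections section from the example above.
DEFAULTS = {"internal": "shmem", "external": "tcp"}

# Specific connections, keyed by an unordered machine pair.
# Assumption: an entry applies regardless of the order of the two machines.
SPECIFIC = {frozenset(["pacific"]): "tcp"}  # "pacific pacific tcp"


def select_device(machine_a, machine_b, defaults=DEFAULTS, specific=SPECIFIC):
    """Return the device used between processes on machine_a and machine_b."""
    pair = frozenset([machine_a, machine_b])
    if pair in specific:
        # A specific connection entry overrides the defaults.
        return specific[pair]
    kind = "internal" if machine_a == machine_b else "external"
    return defaults[kind]


print(select_device("mountain", "mountain"))  # shmem
print(select_device("mountain", "squirrel"))  # tcp
print(select_device("pacific", "pacific"))    # tcp
```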
Security section:
For each machine you must specify the security context for processes running on
that machine. A domain/user pair defines the security context.
It is possible to set a default user name and domain name to be used on
all the machines. You can specify an empty domain name when the same user name is used
on machines outside a Windows domain or on Linux machines.
The format of the Security section is shown below:
/Security
[default user <user name>]
[default domain [<domain name>]]
[<Machine> <user name> [<domain name>]] |
Using the example from the Connections Definition above, assume
that the machines mountain and squirrel belong to the domain csWMPI,
while the machine pacific does not belong to any domain. The
security section should then be:
/Security
default user csWMPI_user
default domain csWMPI
pacific parallel_user pacific |
csWMPI_user must be an account in the domain csWMPI, and
parallel_user must be an account on the pacific machine.
If the passwords are already present in the system (registered using the
csWMPIreguser tool), csWMPI uses them directly.
Otherwise, you will be prompted for the passwords.
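As a rough illustration of how the defaults and the per-machine entries combine, the following Python sketch parses the Security section shown above and resolves the user/domain pair for each machine. The function names are hypothetical, the parser is deliberately minimal, and skipping lines that start with "#" is an assumption based on the comment lines used in the Machines examples.

```python
SECURITY_SECTION = """\
/Security
default user csWMPI_user
default domain csWMPI
pacific parallel_user pacific
"""


def parse_security(text):
    """Parse a /Security section into (default_user, default_domain, overrides)."""
    default_user, default_domain, overrides = None, "", {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("/", "#")):
            continue  # skip blank lines, the section header and comments
        if fields[:2] == ["default", "user"]:
            default_user = fields[2]
        elif fields[:2] == ["default", "domain"]:
            # The domain name is optional and may be left empty.
            default_domain = fields[2] if len(fields) > 2 else ""
        else:
            # <Machine> <user name> [<domain name>]
            domain = fields[2] if len(fields) > 2 else ""
            overrides[fields[0]] = (fields[1], domain)
    return default_user, default_domain, overrides


def security_context(machine, parsed):
    """Return the (user, domain) pair that applies to a machine."""
    default_user, default_domain, overrides = parsed
    return overrides.get(machine, (default_user, default_domain))


parsed = parse_security(SECURITY_SECTION)
for name in ("mountain", "squirrel", "pacific"):
    print(name, security_context(name, parsed))
```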
Cluster Configuration file example
This is an example of a complete cluster configuration file:
/Machines
mountain
startup address mountain.criticalsoftware.com
tcp mountain.criticalsoftware.com
shmem mountain
squirrel
startup address squirrel.criticalsoftware.com
tcp squirrel.criticalsoftware.com
shmem squirrel
pacific
startup address pacific.criticalsoftware.com
tcp pacific.criticalsoftware.com
#Note: This machine has no shared memory device
/Connections
internal device shmem
external device tcp
intercomputation device tcp
pacific pacific tcp
/Security
default user csWMPI_user
default domain csWMPI
pacific parallel_user pacific |
Creating a portable Cluster Configuration file
A portable configuration file can be created using the wildcard character
"." (period). Wherever it is used, it represents the default system
entry for that field. It can be used to specify the name of a machine (the
current machine name), the name of the user (the user currently logged on and
starting the computation) or the name of the domain (the domain the user is logged on to).
Below is an example of a portable configuration file:
/Machines
.
startup address .
shmem .
/Connections
internal device shmem
external device shmem
/Security
default user .
default domain . |
Note that wildcards are not accepted in specific connections or in specific security entries.
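The wildcard rules can be illustrated with a small Python sketch. It is not how csWMPI implements the expansion; the helper names and the sample system values are hypothetical. The first helper substitutes the system default for a "." field, and the second rejects wildcards in specific connection or security entries, as stated above.

```python
def expand_wildcard(value, system_default):
    """Replace the '.' wildcard with the system default for that field."""
    return system_default if value == "." else value


def check_specific_entry(fields):
    """Reject wildcards in specific connection or security entries."""
    if "." in fields:
        raise ValueError(f"wildcard '.' is not accepted in specific entries: {fields!r}")
    return fields


# Hypothetical system defaults for the machine running the computation.
SYSTEM = {"machine": "mountain", "user": "csWMPI_user", "domain": "csWMPI"}

print(expand_wildcard(".", SYSTEM["machine"]))         # mountain
print(expand_wildcard("squirrel", SYSTEM["machine"]))  # squirrel
print(check_specific_entry(["pacific", "pacific", "tcp"]))
```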
To improve ease of use and to provide a Process Group configuration file that is
flexible and easy to maintain, PG2 file support was added to csWMPI.
All files are parsed as PG2 files unless they have the extension .pg, in which case they
are parsed as old-style .pg files.
A PG2 file is a valid XML file that can include a wide range of options, as follows:
<job>
<set>
<executable>myapp.exe</executable>
<arguments>"argument1 with spaces" argument2</arguments> <!-- optional -->
<wdir>X:\temp</wdir> <!-- optional -->
<path>X:\apps-dir</path> <!-- optional -->
<processes>2</processes><!-- processes per machine -->
<drivemap><!-- optional -->
<drive>X:</drive>
<share>\\ideafix\public</share>
</drivemap>
<environment><!-- optional -->
<variable name="variable1_name">variable1_value</variable> <!-- optional -->
<variable name="variable2_name">variable2_value</variable> <!-- optional -->
</environment>
<monitor>mpi</monitor> <!-- optional -->
<machine name="machine1"/>
<machine name="machine2"/>
<machine name="machine3">
<processes>4</processes><!-- optional -->
<executable>machine3_executable.exe</executable><!-- optional -->
<arguments>machine3_arg1</arguments><!-- optional -->
</machine>
</set>
</job>
|
Let's look at each of the elements one by one.
All PG2 file elements must be inside a <job> element, as required for a well-formed XML file.
A <set> groups a number of common options. A <job> can have multiple
<set> elements. MPI process ranks are distributed first to all processes described
in the first set, then to those in the second set, and so on. A <job> must have at least one
<set> element.
<executable>myapp.exe</executable>
|
The <executable> element specifies the executable of each process of this set. It can be a
relative or an absolute path. If an absolute path is used, note
that it must be valid on every machine running the executable.
<arguments>"argument1 with spaces" argument2</arguments> <!-- optional -->
|
The <arguments> element specifies the arguments for each process of this set. Arguments
containing spaces must be enclosed in quotes. This element is optional, so processes
without arguments do not need to define it.
<wdir>X:\temp</wdir> <!-- optional -->
|
The <wdir> element specifies the working directory of each process. Note that it is relative
to each process's machine and must be valid on all machines of this set. This element is
optional; if it is not present, the working directory is the executable's directory.
Network drives are mapped before the working directory of the process is set, so a directory
on a mapped network drive can be used as the working directory.
<path>X:\apps-dir</path> <!-- optional -->
|
The <path> element specifies the path to be searched for the executable. Note that it is relative
to each process's machine. Network drives are mapped before searching for the executable,
so the path can refer to mapped network drives, allowing the process executable to reside on
a shared drive.
It can contain a list of paths (valid in Windows or Linux) separated by semicolons, e.g.
<path>X:\apps-dir;c:\other-appsdir;/home/csWMPI-user/apps</path>
The '.' wildcard is expanded to the current directory when starting the computation with
mpiexec, and to the executable's directory when using direct-run.
<processes>2</processes><!-- processes per machine -->
|
The <processes> element specifies the number of processes to be created on each machine. Please note
that this is NOT the total number of processes in the computation.
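To make the difference between the per-machine count and the total concrete, here is a hedged Python sketch that parses a trimmed version of the PG2 example with the standard `xml.etree.ElementTree` module and sums the processes over all machines. The function name `total_processes` is hypothetical, and the sketch assumes every set defines a `<processes>` element.

```python
import xml.etree.ElementTree as ET

PG2 = """
<job>
  <set>
    <executable>myapp.exe</executable>
    <processes>2</processes>
    <machine name="machine1"/>
    <machine name="machine2"/>
    <machine name="machine3">
      <processes>4</processes>
    </machine>
  </set>
</job>
"""


def total_processes(pg2_text):
    """Sum the processes of every machine in every set of a PG2 job."""
    job = ET.fromstring(pg2_text)
    total = 0
    for pset in job.findall("set"):
        # Per-machine default of this set (assumed to be present).
        per_machine = int(pset.findtext("processes"))
        for machine in pset.findall("machine"):
            # A <processes> element inside <machine> overrides the set value.
            override = machine.findtext("processes")
            total += int(override) if override is not None else per_machine
    return total


print(total_processes(PG2))  # 2 + 2 + 4 = 8
```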
<drivemap><!-- optional -->
<drive>X:</drive>
<share>\\ideafix\public</share>
</drivemap>
|
The <drivemap> element specifies a network share that will be mapped on each machine. One can
specify multiple <drivemap> elements in each set to map more than one network share.
This is the easiest way to share an executable with all machines of a cluster. Note that
Windows might have license limitations on sharing a file with many machines.
The <drive> element specifies the drive letter and the <share> element the
network share to map to that drive. The share is accessed as the owner of the process to be
created.
When using MPI_Comm_spawn and MPI_Comm_spawn_multiple, if no additional drivemaps are defined
using the info object, the drivemaps of the first set are activated in the newly
spawned processes.
<environment><!-- optional -->
<variable name="variable1_name">variable1_value</variable> <!-- optional -->
<variable name="variable2_name">variable2_value</variable> <!-- optional -->
</environment>
|
The <variable> elements inside the <environment> element specify environment
variables that will be defined in all MPI processes. One can define as many environment
variables as desired (limited by OS restrictions only). When using direct-run, these
environment variables are defined in rank 0 (the starting process) after calling MPI_Init.
If a PATH variable is defined here, it is added to, rather than overwriting, the PATH environment
variable of each csWMPI Service/Daemon when starting the processes. Note that this PATH will not be
set when looking for the MPI executable. For that, use the <path>
element.
When using MPI_Comm_spawn and MPI_Comm_spawn_multiple, if no additional environment variables
are defined using the info object, the environment variables of the first
set are defined in the newly spawned processes.
<monitor>mpi</monitor> <!-- optional -->
|
The csWMPI Services monitor the processes they create. By default, created processes are
assumed to be MPI processes. If the created processes are not MPI processes, the csWMPI Service
needs to know this so that it can monitor the behavior of the process and act accordingly.
The value of the <monitor> element specifies the monitoring level and the behavior
expected of the created processes. It must be one of the
following values:
mpi - (default value) created processes are MPI processes, and the computation
aborts if a process dies unexpectedly. If the computation aborts, the MPI processes
are terminated.
process - created processes are not MPI processes, but they create the MPI
process as their child. If a created process dies, the abortion sequence is NOT started.
If the MPI computation aborts, these processes are terminated. This can be useful when
using intermediate wrappers that start the MPI processes.
permanent - created processes are not MPI processes, but they create the MPI
processes as their child. If the created processes die, the abortion sequence is NOT started.
If the MPI computation aborts, these processes are NOT terminated. This option can be
useful with third-party schedulers or debuggers.
none - created processes are NOT monitored by the csWMPI Service.
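The four values can be summarized as a small lookup table. The Python sketch below is only a restatement of the descriptions above; the names `MonitorMode` and `monitor_mode` are hypothetical, and fields that the text does not describe are left as `None` rather than guessed.

```python
from collections import namedtuple

# monitored:        the csWMPI Service watches the created process
# aborts_on_death:  an unexpected death starts the abortion sequence
# killed_on_abort:  the process is terminated when the computation aborts
MonitorMode = namedtuple("MonitorMode", "monitored aborts_on_death killed_on_abort")

MONITOR_MODES = {
    "mpi": MonitorMode(True, True, True),
    "process": MonitorMode(True, False, True),
    "permanent": MonitorMode(True, False, False),
    # The text only states that such processes are not monitored; the other
    # two fields are undocumented and therefore left as None.
    "none": MonitorMode(False, None, None),
}


def monitor_mode(value=None):
    """Return the behavior for a <monitor> value; a missing element means mpi."""
    name = "mpi" if value is None else value.strip()
    if name not in MONITOR_MODES:
        raise ValueError(f"invalid <monitor> value: {name!r}")
    return MONITOR_MODES[name]


print(monitor_mode())             # default: mpi
print(monitor_mode("permanent"))
```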
<machine name="machine3">
<processes>4</processes><!-- optional -->
<executable>machine3_executable.exe</executable><!-- optional -->
<arguments>machine3_arg1</arguments><!-- optional -->
</machine>
|
Multiple <machine> elements specify the nodes where the processes will run.
By default, when no inner elements are defined, csWMPI uses the definitions of the set
to create the processes on the machine.
Optionally, to configure a different executable, number of processes or arguments for a
specific machine, add the corresponding <processes>, <executable> or <arguments>
element inside its <machine> element. If the executable is not
a fully qualified name, csWMPI uses the <path> element of the <set> to look for it.
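The fallback from machine-level to set-level values can be sketched as follows. This Python example only mirrors the description above and is not csWMPI code: the function name `effective_settings` is hypothetical, and values are kept as the strings found in the XML.

```python
import xml.etree.ElementTree as ET

SET_XML = """
<set>
  <executable>myapp.exe</executable>
  <arguments>"argument1 with spaces" argument2</arguments>
  <processes>2</processes>
  <machine name="machine1"/>
  <machine name="machine3">
    <processes>4</processes>
    <executable>machine3_executable.exe</executable>
    <arguments>machine3_arg1</arguments>
  </machine>
</set>
"""


def effective_settings(set_xml):
    """Map each machine name to the settings that apply to it."""
    pset = ET.fromstring(set_xml)
    result = {}
    for machine in pset.findall("machine"):
        settings = {}
        for tag in ("executable", "arguments", "processes"):
            own = machine.findtext(tag)
            # Use the machine's own element if present, else the set's value.
            settings[tag] = own if own is not None else pset.findtext(tag)
        result[machine.get("name")] = settings
    return result


for name, settings in effective_settings(SET_XML).items():
    print(name, settings)
```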
The normal behavior, in case an error occurs, is for csWMPI to generate an error message,
display it in the console of the process, and send a pop-up message to the master machine
if the process is on a different machine from rank 0.
While developing applications with csWMPI, you might find it convenient to redirect
the output of errors to files. The environment variable csWMPI_MASTER_ERROR_OUTPUT sets the
output filename for the master process (rank 0 of MPI_COMM_WORLD),
while csWMPI_SLAVE_ERROR_OUTPUT sets the output filename for all
other processes. The files are created on the machines that host the processes. If these
variables have the value null (a string with the value "null"), no output is generated.
csWMPI reads the security context from the cluster
configuration file. It then searches the registry/personal files for the passwords that users specified.
Through the csWMPI_PASSWORD_SEMANTICS environment variable, you can configure how csWMPI
behaves when a user's password is not found in the registry. This environment variable can take
three different values:
| Value: | Description: |
| ask_user | Prompts the user for the password on stdin. |
| return_error | Exits application with error. |
| get_environment | Tries to get passwords from environment variables (see below). |
If csWMPI_PASSWORD_SEMANTICS is assigned the value get_environment,
csWMPI attempts to find environment variables with names corresponding to the domain\user pairs specified
in the cluster configuration file. For example, if the cluster configuration specifies the user
GALIA\asterix for a machine, that user's password should be assigned to an environment variable
named galia\asterix (must be in lowercase letters). The value of galia\asterix
should be the password for the user in clear text.
Storing users' passwords in environment variables can be a serious security risk, so
the get_environment password semantics should be used with extreme care.
This functionality was added to let users work around the "feature" that Windows does
not load a user's profile when the user is logged in through Windows API calls. However, this mechanism
is also available on Linux systems.
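The naming rule for get_environment can be illustrated with a short Python sketch. It only shows how the variable name is derived from a domain\user pair as described above; the helper names are hypothetical, the password value is a placeholder, and the lookup uses a plain dictionary standing in for the process environment.

```python
def password_variable_name(domain, user):
    """Return the lowercase domain\\user name of the password variable."""
    return f"{domain}\\{user}".lower()


def lookup_password(domain, user, environ):
    """Return the clear-text password from environ, or None if it is not set."""
    return environ.get(password_variable_name(domain, user))


# A dictionary standing in for the process environment (placeholder password).
fake_environ = {"galia\\asterix": "placeholder-password"}

print(password_variable_name("GALIA", "asterix"))         # galia\asterix
print(lookup_password("GALIA", "asterix", fake_environ))  # placeholder-password
print(lookup_password("GALIA", "obelix", fake_environ))   # None
```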
This section contains a list and description of the environment variables recognized by csWMPI.
| Environment Variable: |
Description: |
| MPI_ROOT |
Denotes the location of the root directory of the csWMPI installation. Default value: C:\Program Files\csWMPI.
|
| csWMPI_CLUSTER_CONF_FILE |
Denotes the full path and file name of the cluster configuration file to use. If this
environment variable is not set, the default behavior is to search in the current directory
for a file named csWMPI.clusterconf (See Cluster Configuration).
Default value: Not set.
|
| csWMPI_PG_FILENAME |
Denotes the full path and file name of the process group file to use. If this
environment variable is not set, the default behavior is to search in the current directory
for a file named [program name].pg. If this file is not found csWMPI
attempts to read a file named: csWMPI.pg
(See Process Group).
Default value: Not set.
|
| csWMPI_NO_OUTPUT_PREFIX |
When using mpiexec, if this variable
is defined with a value other than 0, no prefix such as "Rank 0: " is added to the processes'
output lines. This is the same as using the -noprefix argument of mpiexec.
|
| csWMPI_MASTER_ERROR_OUTPUT |
If this variable is set to a filename, the master process' output (stdout) is redirected to
this file. If the variable is set to null, no output is generated.
(See Error Output Redirection).
Default value: Not set.
|
| csWMPI_SLAVE_ERROR_OUTPUT |
If this variable is set to a filename, the slave processes' output (stdout) is redirected to
this file. If the variable is set to null, no output is generated.
(See Error Output Redirection).
Default value: Not set.
|
| csWMPI_PASSWORD_SEMANTICS |
Defines how csWMPI should behave in case the security context information is
not found in the registry. Valid values are ask_user,
get_environment, and return_error.
(See Password Checking).
Default value: Not set.
|
| csWMPI_COLL_SYNC_COMM_START |
MPI_Alltoall variants can suffer from floods in network switches when multiple
processes send to the same target process. To avoid that, processes can synchronize with
the receiver before sending. This behaviour does not occur in small communicators
and depends on the network devices.
This variable sets the minimum communicator size that will require synchronization of
processes when sending/receiving.
Default value: 10
|
| csWMPI_COLL_SYNC_MSG_START |
MPI_Alltoall can suffer from floods in network switches when multiple
processes send to the same target process. To avoid that, processes can synchronize with
the receiver before sending. This behaviour does not occur with small messages
and depends on the network devices.
This variable sets the minimum message size that will require synchronization of
processes when sending/receiving. It is only applied when the communicator size is
at least the size specified through csWMPI_COLL_SYNC_COMM_START.
Default value: 8192 (8KB)
|
|
csWMPI_TCP_RENDEZVOUS_START
|
Specifies the minimum message size (in bytes) of messages transferred using a rendezvous
protocol. The rendezvous protocol synchronizes sender and receiver, and the data
is sent only once the receiver has specified the buffer to receive the data, that is, once it
has called the matching receive function. This reduces the number of memory copy operations
and avoids allocation and deallocation of big memory buffers.
Default value: 1048576 (1 MB).
|
|
csWMPI_TCP_RECV_BUFFER
|
Specifies the TCP socket's receive buffer in bytes.
Default value: 32768 (32 KB)
|
|
csWMPI_TCP_SEND_BUFFER
|
Specifies the TCP socket's send buffer in bytes.
Default value: 16384 (16 KB)
|
|
csWMPI_TCP_RT_SIGNAL
|
The tcp device for Linux uses a realtime signal during communication.
This signal cannot be used by any other library of the process.
Default value (Linux only): SIGRTMIN+2
|
|
csWMPI_SHMEM_SIZE
|
Specifies the size (in bytes) of the memory region shared by processes on the
machine using the shmem device. The minimum and default
size of this region is 16MB. (See shmem device). Default value: Not set.
|
|
csWMPI_SHMEM_END_POINT
|
The end point of the shared memory region. The default is the bottom of the address
space; unfortunately some other DLLs might load into or otherwise use this region.
In cases where such territorial conflicts occur between DLLs, you can set this variable
to some unused memory region. (See shmem device). Default value: Not set.
|
|
csWMPI_SHMEM_RENDEZVOUS_START
|
For big messages, a rendezvous protocol is needed to avoid flooding the shared memory segment.
In MS Windows, the rendezvous protocol is implemented through a zero-copy operation, copying
directly from one process's address to another's. In Linux, the message will be transferred
in small pieces of data. Since such a protocol requires the two participating
processes to rendezvous, it is often only beneficial to use it
for messages above some size. Messages below this size are
temporarily stored in the memory region shared by the processes local to a machine.
Default value for MS Windows: 65536 (64 KB)
Default value for Linux: 131072 (128 KB)
|
|
csWMPI_SHMEM_RT_SIGNAL
|
The shmem device for Linux uses a realtime signal during communication.
This signal cannot be used by any other library of the process.
Default value (Linux only): SIGRTMIN+3
|
|
csWMPI_SHMEM_UNIVERSE_SIZE
|
Specifies the maximum number of processes that can connect to the shared memory device
on a single machine. The default value is 256.
(See shmem device). Default value: Not set.
|
|
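The way the threshold variables in the table interact can be sketched in Python. This is one reading of the descriptions above, not csWMPI's implementation: the function names are hypothetical, "minimum size" is interpreted as an inclusive lower bound, and the defaults are the ones listed in the table.

```python
import os


def int_setting(name, default, environ=None):
    """Read an integer setting from the environment, falling back to a default."""
    environ = os.environ if environ is None else environ
    return int(environ.get(name, default))


def alltoall_synchronizes(comm_size, msg_size, environ=None):
    """True if senders would synchronize with the receiver before sending."""
    comm_start = int_setting("csWMPI_COLL_SYNC_COMM_START", 10, environ)
    msg_start = int_setting("csWMPI_COLL_SYNC_MSG_START", 8192, environ)
    # The message threshold only applies once the communicator is large enough.
    return comm_size >= comm_start and msg_size >= msg_start


def tcp_uses_rendezvous(msg_size, environ=None):
    """True if a message of msg_size bytes would use the rendezvous protocol."""
    return msg_size >= int_setting("csWMPI_TCP_RENDEZVOUS_START", 1048576, environ)


# An empty dictionary means "no variables set", so the defaults apply.
print(alltoall_synchronizes(16, 8192, {}))   # True
print(alltoall_synchronizes(8, 65536, {}))   # False: communicator too small
print(tcp_uses_rendezvous(1048575, {}))      # False: one byte below the default
```

Passing an explicit dictionary keeps the example independent of the real environment; omitting the argument reads the variables from `os.environ`.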