PARALLEL VIRTUAL MACHINE

A Parallel Programming Environment
for UNIX Workstations

Essentials of PVM
1.0 Introduction

PVM is a computing environment that will allow a "master" program to run parallel "slave" processes on other computers. The programmer must decide how to design the code in order to take advantage of increased speed through concurrent computation, but not lose too much time to overhead since processes must communicate by passing messages over a network. Debugging PVM programs can be difficult at times because slave processes are not able to print information to the programmer's screen since they are running on remote systems. Sometimes programmers will write debugging messages to a file, but that can be difficult too because all the processes may share the same file space on the network. Therefore, the programmer must consider how to avoid having two or more processes try to write to the same file at the same time. It is best to write error-free code that runs perfectly on the first compilation so that there is no need to debug. (Ha! Ha!!)

2.0 Setting up the Environment

2.1 System Wide Defaults:   .profile

Before trying to run PVM at Jefferson, there are several environment variables and paths that need to be set properly. If the system-wide paths are not already set properly, you may need to add a file called ".profile", which should be in your home directory. This will make things run more smoothly in your local UNIX environment. Currently, these are already set, so only make these changes if you are using a custom .profile which does not recognize the proper paths.

To make the environment aware of various details such as where the root directory was installed for PVM on our systems, the type of architecture being used (LINUX), and the location for the slave PVM daemon processes, include these lines in the .profile file:

export PVM_ROOT=/usr/local/pvm3
export PVM_ARCH=LINUX
export PVM_DPATH=/usr/local/pvm3/lib/pvmd

In order for processes to know where to find system wide as well as your own compiled PVM binaries, include these lines under the section for creating a PATH:
[ -d /usr/local/pvm3/bin ] && PATH=/usr/local/pvm3/bin:$PATH
[ -d /usr/local/pvm3/lib ] && PATH=/usr/local/pvm3/lib:$PATH
[ -d ~/pvm3/bin ] && PATH=~/pvm3/bin:$PATH

In order to have access to the online help through the "man" pages, you may also include this line under the section for creating a MANPATH:
[ -d /usr/local/pvm3/man ] && MANPATH=/usr/local/pvm3/man:$MANPATH
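
If you are not sure whether your login environment picked up these settings, a small check like the following may help. This is only a sketch: the function name count_missing_pvm_vars is made up for this example, and it assumes that the three variables listed above are the ones your installation needs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns how many of the PVM variables named in
   this section are unset or empty in the current environment. */
int count_missing_pvm_vars(void)
{
    static const char *required[] = { "PVM_ROOT", "PVM_ARCH", "PVM_DPATH" };
    int missing = 0;
    int i;

    for (i = 0; i < 3; i++) {
        const char *value = getenv(required[i]);
        if (value == NULL || value[0] == '\0') {
            fprintf(stderr, "%s is not set\n", required[i]);
            missing++;
        }
    }
    return missing;
}
```

Call count_missing_pvm_vars() at the top of a small test program. A result of 0 means all three variables are visible to programs started from that shell.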

2.2 Trusted Host Machines:   .rhosts

PVM uses a set of "trusted" remote hosts that you will access without requiring a password. For that to work properly, you need to create a file called ".rhosts" in your home directory that will contain the names of all systems that you want to trust. You may limit the entries to just a few machines, or else include every computer in the lab. You may copy the .rhosts file for our basic LINUX environment, or create the file yourself with entries that look like the following:
station1.tjhsst.edu
station21.tjhsst.edu
station5.tjhsst.edu
megatron.tjhsst.edu

Note that in our current configuration, the "Transformer" systems have been aliased to the previous station names. Therefore, station1 is the same as optimusprime, station2 is the same as megatron, and so forth. There is no need to have both sets of names since pvm will give you an error message about duplicate hosts if you try to have both host names operational at the same time. The new elements will eventually work with the help of our sysadmin team.

2.3 Creating the Proper Directories

You will also need to create some directories that PVM will expect to find. In your home directory, create a directory called "pvm3", and under that create a directory called "bin". On that same level as "bin", you may also want to create a directory called "src" for source files. Under the "bin" directory, make a subdirectory called "LINUX" for binaries compiled on our LINUX machines. Use the directory name "SGI5" for pvm binaries that run on the INDYs.
               username
                  \
                  pvm3
                    \
                    bin
                      \
                      LINUX
                          \
                         files 

2.4 Modifying Your Compile Script

You may also wish to have a simple script to help compile your PVM programs and to include necessary libraries for the appropriate architecture. Here is an example script you may use:

gcc $1.c -o $1   -L/usr/pvm3/lib/LINUX   -lpvm3
or
g++ $1.cpp -o $1   -L/usr/pvm3/lib/LINUX   -lpvm3

Note:   These scripts will run the standard GNU compilers (gcc for ANSI C or g++ for C++). Each script expects the name of a source file, given without its file extension (.c or .cpp respectively), and creates an executable file of the same name. It will link in the necessary libraries for the LINUX version of PVM version 3. You may wish to add the pvm options to your existing compiler script that we used for OpenGL, Lgcc or Lg++. To do this, just add the missing portions of the above scripts that deal with pvm3 to the one you are currently using.

3.0 Running PVM

The PVM environment is controlled by several independent processes that will be spawned on different computers. One process, "pvmd3", is considered the master daemon process and is run from your console. The other, "pvmd", is for slave processes that run on the remote systems.

To start the console process, run the program by typing:

"pvm"

or "/usr/pvm3/lib/pvm" if paths are not set up properly.

When the console process is running in a window, the prompt will change to the string "pvm>" instead of whatever you had previously. At this point, the main daemon is running, but the remote systems have not been started.

When the environment is set up and running, then just execute PVM programs from any window other than the console used to configure the environment. Type the name of the executable as you would any other program, and the environment will handle all of the parallel communication and execution of slave task processes.

3.1 Adding Systems to the Environment:   "add"

When pvm first begins, the computer that is running the console has the ability to spawn tasks on itself but there is no parallelism. To incorporate other systems listed in your .rhosts file to the active environment, just type "add" followed by the host name at the prompt.

add station21

3.2 Other Console Commands:   "conf" and "help"

If everything is satisfactory, the new host will be added to the active environment and a subordinate daemon will be started on that computer. To review which systems have been added to your configuration, type the command "conf". To find a list of other console commands, type "help", but you might have to look at the man pages to see what arguments they require.

3.3 Don't Forget:   "halt"

Since processes are running remotely on machines that you are not currently logged into, it is important to type the command "halt" before quitting PVM. This will terminate the console process while it signals all of the remote processes to stop, and will also delete some temporary files associated with your user ID located in /tmp on each machine.

If a user accidentally forgets to type "halt" before logging off, there will be daemon processes left running on the remote machines as well as a number of temporary files that must be deleted before the pvm will work properly again. To recover, try the following steps:

  1. Login to each machine that was being used in the previous environment and delete any currently active pvm processes under your username. This can be done by typing the command
    ps -auwx   |  grep username
    dhyatt   10532  0.0  0.5  1384  724 ?        S    15:57   0:00 /usr/local/pvm3/lib/LINUX/pvmd3 -s -d0x0 -nstation2 1 c6261101:075f 4
    dhyatt   11099  0.0  0.8  1752 1028 ttyp0    S    17:04   0:00 -bash
    dhyatt   11122  0.0  0.8  2776 1140 ttyp0    R    17:05   0:00 ps -auwx
    dhyatt   11123  0.0  0.3  1124  408 ttyp0    S    17:05   0:00 grep dhyatt
    

  2. Next look in that list and delete ones that involve pvm by using the "kill" command and the associated "process-id":
    kill -9 process-id
          or in this case...
    kill -9 10532

  3. Go to that computer's temporary file directory called /tmp and look for temporary files that belong to you and have the word pvm in the file name. This can be done by typing the following command:
    ls -alF pvm*   |   grep username
    -rw-------   1 dhyatt   faculty        15 Apr 15 15:55 pvmd.1024
    -rw-------   1 dhyatt   faculty       129 Apr 15 15:55 pvml.1024
    
  4. Delete those files by typing:
    rm pvmd.1024
    rm pvml.1024

    Note: the pvmd is the most important one to delete. The other one will actually be written over if you are successful in running pvm again.

4.0 Simple Send and Receive Program

In this example, the master0 program sends an integer to 10 slave0 programs at the same time. Each slave0 program calculates the reciprocal of that number, and returns the floating point value of the reciprocal, as well as the individual ID of the process.

4.1 The Master Program:   master0

Below is the code for the master program, master0.c

master0.c

// The Master program will send an integer to the slave and will expect to 
// receive back the reciprocal of that number

#include <stdio.h>
#include "/usr/local/pvm3/include/pvm3.h"
#define SLAVENAME "slave0"
#define MAXTASKS 10

int main(void)
{
    int nproc, numtasks, num, i, who, msgtype;
    int mytid;                  /* my task id */
    float reciprocal;    	/* reciprocal */
    int tids[MAXTASKS];		/* slave task ids */
    struct pvmhostinfo *hostp[MAXTASKS]; /* pointers to host information */
    /* Note: the above struct is not necessary in this minimal program 
       but is a common variable used in many larger PVM programs.  It is 
       required for a number of system calls.  We thank our substitute,
       Mr. Tepper, for pointing this out.  See "master2.0" for example. 
    */
       

/* Find this process's task id in PVM */
    mytid = pvm_mytid();

/* Set number of slave processes desired */
    nproc = MAXTASKS;

/* Start up slave tasks */  

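// If the number of tasks spawned does not match nproc, there's a problem!

    numtasks = pvm_spawn(SLAVENAME, (char**)0, 0, "", nproc, tids);
    if (numtasks < nproc)
      { fprintf(stderr, "Only %d of %d slaves started\n", numtasks, nproc);
        for (i = 0; i < numtasks; i++)  // Stop any slaves that did start
            pvm_kill(tids[i]);
        pvm_exit();
        return 1;
      }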

/* Broadcast data to each slave task */
msgtype = 99;

   for (i=0; i < nproc; i++)		// Cycle through all processes
      { num = i; 			// Assign value to num
	pvm_initsend(PvmDataDefault); 	// Get message buffer ready to send
	pvm_pkint(&num, 1, 1);    	// Pack the number into the buffer 
    	pvm_send(tids[num], msgtype);	// Send buffer to appropriate process
      }

/* Wait for results from slaves */

    msgtype = 55;  	// This value is arbitrary, just for ID purposes

    for( i=0 ; i < nproc ; i++ )  	// Wait for replies from all processes
      {
    	pvm_recv( -1, msgtype );	// Wait for message of right type
    	pvm_upkint( &who, 1, 1 );	// Find out who sent message
        pvm_upkfloat(&reciprocal, 1, 1);	// Unpack Reciprocal

	// Display results
    	printf("Process:  %d  Reciprocal: %10.6f\n",  who, reciprocal  );  
    }

/* Program Finished - exit PVM before stopping */
    pvm_exit();
    return 0;
}


4.2 The Slave Program:   slave0.c

Below is the code for the slave program, slave0.c

slave0.c

// This slave process will receive an integer, and send back the reciprocal

#include <stdio.h>
#include "/usr/local/pvm3/include/pvm3.h"

int main(void)
 { int mytid;
   int master, msgtype;
   float reciprocal;
   int num;

/* Enroll in pvm */
   mytid = pvm_mytid();    // Get my task ID

/* Receive data from master */
   msgtype = 99;  		// This was arbitrarily set in master0.c
   pvm_recv( -1, msgtype );	// Get ready to receive initial data
   pvm_upkint(&num,1, 1);     	// Unpack Number sent by master 

if (num==0)
    {reciprocal = 0.0;}
else
    {reciprocal = 1.0 / num;}

/* Send data back to master */
   pvm_initsend( PvmDataDefault );  	// Get ready to send data
   pvm_pkint( &mytid, 1, 1 );     	// Pack my task ID
   pvm_pkfloat( &reciprocal, 1, 1);	// Pack Reciprocal of number sent
   msgtype = 55;                        // Identify message type
   master = pvm_parent();              // Find out where I came from
   pvm_send( master, msgtype );        // Send message back to parent process

/* Program finished. Exit PVM before stopping */
   pvm_exit();
   return 0;

}
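
Notice that slave0 packs its task ID first and the reciprocal second, and master0 unpacks them in that same order. The sketch below imitates that rule without using PVM at all. The struct msgbuf type and the functions pack_reply and unpack_reply are invented for this illustration; a real PVM message buffer also takes care of data encoding between machines, which this does not model.

```c
#include <string.h>

/* Hypothetical stand-in for a message buffer: raw bytes plus a cursor. */
struct msgbuf {
    unsigned char data[64];
    size_t pos;
};

/* Copy n bytes into the buffer and advance the cursor. */
static void put_bytes(struct msgbuf *m, const void *src, size_t n)
{
    memcpy(m->data + m->pos, src, n);
    m->pos += n;
}

/* Copy n bytes out of the buffer and advance the cursor. */
static void get_bytes(struct msgbuf *m, void *dst, size_t n)
{
    memcpy(dst, m->data + m->pos, n);
    m->pos += n;
}

/* Sender side: task ID first, then the reciprocal. */
void pack_reply(struct msgbuf *m, int tid, float reciprocal)
{
    m->pos = 0;
    put_bytes(m, &tid, sizeof tid);
    put_bytes(m, &reciprocal, sizeof reciprocal);
}

/* Receiver side: must read the items back in the same order. */
void unpack_reply(struct msgbuf *m, int *tid, float *reciprocal)
{
    m->pos = 0;
    get_bytes(m, tid, sizeof *tid);
    get_bytes(m, reciprocal, sizeof *reciprocal);
}
```

If the receiver read the float before the int, it would pick up the bytes of the task ID and treat them as a reciprocal, which is the kind of garbage you get when the order does not match.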
  

4.3 Example Output

Below is a sample run of the master0 program. Notice that the order in which the values were sent out to the slave programs does not necessarily match the order in which they were returned. The time of execution depends upon what other tasks the remote machines were running. If the program were run a second time, the results would likely be different again since system loads and other circumstances might have changed.

Also look at the process ID numbers. Only three machines were running in the environment when this program was run, so the IDs beginning with 262*** were all on one machine, those beginning with 524*** were on another, and those beginning with 786*** were on the third.


Process:  786448  Reciprocal:   0.250000
Process:  262167  Reciprocal:   0.142857
Process:  262168  Reciprocal:   0.125000
Process:  786449  Reciprocal:   0.200000
Process:  524309  Reciprocal:   0.000000
Process:  786450  Reciprocal:   0.166667
Process:  524310  Reciprocal:   1.000000
Process:  524311  Reciprocal:   0.500000
Process:  524312  Reciprocal:   0.333333
Process:  262169  Reciprocal:   0.111111
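
The grouping of task IDs by machine can be checked with a little arithmetic. In the output above, every ID from the same machine shares the same value once the low 18 bits are dropped (262144 is 1 shifted left 18 places). The sketch below relies on that observation. Treat the 18-bit split and the 12-bit mask as assumptions drawn from this output rather than a documented rule, since the layout of a task ID is internal to PVM. The function name tid_host_index is made up for this example.

```c
/* Hypothetical helper: returns the part of a task ID that appears to
   identify the host, assuming the low 18 bits number the task on that
   host and the next 12 bits number the host itself. */
int tid_host_index(int tid)
{
    return (tid >> 18) & 0xFFF;
}
```

For the run shown above, tid_host_index(262167) gives 1, tid_host_index(524309) gives 2, and tid_host_index(786448) gives 3, matching the three machines.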

The code for these programs is available below:
master0.c

slave0.c


5.0 Another Send and Receive Program

The following two programs work as a unit. The master1 program spawns a set of processes that will run on available computers that are configured in the active PVM environment. It sends to each of them a random number. Every slave1 program accepts the data from the master, determines the name of the machine that it is running on, and then returns the random number, the process ID, and the machine name. The master program then displays that data on the screen as it is received from the various slaves.

The code for those programs is linked below followed by the example output.

master1.c

slave1.c

Example Output:

    corona:~/pvm3/bin/LINUX$     master1
    Enter the number of tasks to start? (Max is 10)     10
    System: (station1.tjhsst.edu)
        Process: 1048583 Random: 1693851068
    System: (station2.tjhsst.edu)
        Process: 786445 Random: 143796173
    System: (station5.tjhsst.edu)
        Process: 524301 Random: 2124593721
    System: (mirage.tjhsst.edu)
        Process: 1310733 Random: 1824686042
    System: (vortex.tjhsst.edu)
        Process: 1572877 Random: 376641819
    System: (station2.tjhsst.edu)
        Process: 786446 Random: 1297272263
    System: (station5.tjhsst.edu)
        Process: 524302 Random: 2120426857
    System: (mirage.tjhsst.edu)
        Process: 1310734 Random: 954237623
    System: (vortex.tjhsst.edu)
        Process: 1572878 Random: 1380927037
    System: (corona.tjhsst.edu)
        Process: 262159 Random: 2008959114
    corona:~/pvm3/bin/LINUX$
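
Each slave1 process reports the name of the machine it is running on. One common way to find that name on a UNIX system is the gethostname() call, sketched below. This is not necessarily how slave1.c does it, and the wrapper name get_machine_name is made up for this example.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: copies this machine's host name into buf.
   Returns 0 on success, -1 on failure. */
int get_machine_name(char *buf, size_t len)
{
    if (len == 0)
        return -1;
    if (gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';   /* Make sure the string ends even if the name was cut short */
    return 0;
}
```

A slave could call this with a buffer of 256 characters or so and then pack the resulting string into its reply to the master.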

6.0 Multicast with Arrays Program

The following programs were used to sum the elements in an array, row by row, and display the results on the screen. The program master2 sends the entire array of data to every processor in the configuration in one multicast approach. In other words, all machines are sent the data in one command. Each slave2 program decides which row it must add up, and returns the sum of that row to the calling program.
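
The work done by each slave can be pictured with two small functions, shown here without any PVM calls. This is a sketch under assumptions: the names find_my_index and row_sum are invented for this example, and it assumes the slave finds its row by looking for its own task ID in the list of IDs sent by the master.

```c
/* Hypothetical helper: returns the position of mytid in the list of
   task IDs sent by the master, or -1 if it is not in the list. */
int find_my_index(const int *tids, int ntasks, int mytid)
{
    int i;
    for (i = 0; i < ntasks; i++)
        if (tids[i] == mytid)
            return i;
    return -1;
}

/* Hypothetical helper: adds up the first ncols values of one row. */
float row_sum(const float *row, int ncols)
{
    float sum = 0.0f;
    int j;
    for (j = 0; j < ncols; j++)
        sum += row[j];
    return sum;
}
```

The slave whose task ID is found at position 3 would call row_sum on row 3 of the array and send that single number back to the master.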

Listed below are the two programs as well as the example output:

master2.c

This program spawns a specified number of slave processes on participating nodes, then sends to those nodes a list of ID's and an array of data. All of the data is packed into a communication packet, a copy of which is sent to each active daemon process. The program then waits for the slave processes to send the data back so that it can display the results. It is important to pack and unpack the mixed data items in exactly the same order, otherwise the content will become garbage.

slave2.c

The slave program unpacks the data packet and figures out which process it is with respect to the parent program. The slave then does its share of the work, in this case adding up the numbers in a specified row of the array. The slave process then sends that answer back to the parent process.

Example Run with Output

PVM is often run from two separate windows. In the "console window", the programmer sets up the PVM environment, starting a master daemon process on the computer. The programmer then creates a secondary "runtime window" where programs are run and output will be displayed.

The following text shows a sample run using the programs described previously.

Console Window:

station16:~$ pvm    Start running PVM from station16

pvm> add station1     Set up the environment
1 successful

pvm> add station2
1 successful

pvm> add station3
1 successful



pvm> conf     Check the configuration

4 hosts, 1 data format
HOST DTID ARCH SPEED
station16 40000 LINUX 1000
station1 80000 LINUX 1000
station2 c0000 LINUX 1000
station3 100000 LINUX 1000
pvm> halt


Runtime Window:

station16:~/pvm3/bin/LINUX$ master2     Run the master program from the LINUX directory on station16

How many slave programs (1-MAXTASKS)?
10

Init done:       Initialization is done and packets have been sent. The last column of the array will hold the sum of the elements in the row.

0: 5.00 40.00 73.00 14.00 21.00 77.00 86.00 0.00
1: 85.00 23.00 16.00 77.00 73.00 39.00 66.00 0.00
2: 97.00 65.00 71.00 32.00 10.00 80.00 72.00 0.00
3: 61.00 76.00 8.00 63.00 57.00 75.00 3.00 0.00
4: 24.00 1.00 0.00 81.00 93.00 26.00 96.00 0.00
5: 15.00 55.00 82.00 52.00 30.00 98.00 29.00 0.00
6: 55.00 89.00 47.00 4.00 7.00 71.00 36.00 0.00
7: 17.00 3.00 60.00 78.00 80.00 68.00 41.00 0.00
8: 37.00 96.00 97.00 13.00 49.00 97.00 95.00 0.00
9: 43.00 75.00 91.00 58.00 82.00 73.00 10.00 0.00

Slave processes are calculating sums and sending back results

I got 343.000000 from 3 which is tid 262213
I got 321.000000 from 4 which is tid 262214
I got 316.000000 from 0 which is tid 1048628
I got 484.000000 from 8 which is tid 786477
I got 379.000000 from 1 which is tid 1048629
I got 432.000000 from 9 which is tid 786478
I got 427.000000 from 2 which is tid 1048630
I got 361.000000 from 5 which is tid 524353
I got 309.000000 from 6 which is tid 524354
I got 347.000000 from 7 which is tid 524355

Master process has all data and can now print out results

Final:
0: 5.00 40.00 73.00 14.00 21.00 77.00 86.00 316.00
1: 85.00 23.00 16.00 77.00 73.00 39.00 66.00 379.00
2: 97.00 65.00 71.00 32.00 10.00 80.00 72.00 427.00
3: 61.00 76.00 8.00 63.00 57.00 75.00 3.00 343.00
4: 24.00 1.00 0.00 81.00 93.00 26.00 96.00 321.00
5: 15.00 55.00 82.00 52.00 30.00 98.00 29.00 361.00
6: 55.00 89.00 47.00 4.00 7.00 71.00 36.00 309.00
7: 17.00 3.00 60.00 78.00 80.00 68.00 41.00 347.00
8: 37.00 96.00 97.00 13.00 49.00 97.00 95.00 484.00
9: 43.00 75.00 91.00 58.00 82.00 73.00 10.00 432.00


Console Window:

pvm> halt     Always remember to shut down PVM by typing halt. Otherwise, the slave processes remain alive on the other systems even after the user has logged off the console.

station16:~$ logout



7.0 Examples of a Tree Computation

There is only one program here since the master and slave activities are handled by the same program. The tree program continues to call itself until a predetermined number of subcalls is reached. Then the spawned processes return data to the master program.
This program also has an example of how to use a series of unique files in order to keep track of debugging during the recursive calls.
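
One simple way to give every process its own debugging file is to build the file name from something unique to that process, such as its task ID. The sketch below shows the idea. It is not taken from tree2.c, and the function name make_debug_name and the "debug_<tid>.log" naming pattern are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes a name such as "debug_262167.log" into buf.
   Returns 0 on success, -1 if the name does not fit in len characters. */
int make_debug_name(char *buf, size_t len, int tid)
{
    int n = snprintf(buf, len, "debug_%d.log", tid);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

A process could pass the value returned by pvm_mytid() as tid and then open the resulting name with fopen(name, "a"). Since no two tasks share a task ID, no two processes write to the same file.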

Source Code     tree2.c


8.0 Some Student Programs Using PVM

Eamon Walsh

Mandelbrot Set in Progress



Mandelbrot Set and PVM

Eamon used PVM to improve calculation speed for fractals such as the Mandelbrot Set. The image to the left shows the familiar image in the process of being generated. The master program displays a row at a time as soon as the slave process has returned the necessary data.
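
To give an idea of the work a slave might do for one row, here is a sketch of the usual escape-time calculation for the Mandelbrot Set. It is not Eamon's code. The function names, the image bounds (-2 to 1 across, -1.5 to 1.5 down), and the idea of one call per row are assumptions made for this illustration.

```c
/* Counts iterations of z = z*z + c, starting from z = 0, until |z| > 2
   or max_iter is reached. */
int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;

    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Hypothetical slave task: fills out[0..width-1] with the iteration
   counts for one row of a width x height image. */
void compute_row(int row, int width, int height, int max_iter, int *out)
{
    double ci = -1.5 + 3.0 * (row + 0.5) / height;   /* Center of this pixel row */
    int x;

    for (x = 0; x < width; x++) {
        double cr = -2.0 + 3.0 * (x + 0.5) / width;  /* Center of this pixel column */
        out[x] = mandel_iters(cr, ci, max_iter);
    }
}
```

Because each row depends only on its own row number, rows can be handed to different slaves in any order and drawn as the results come back.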
master.cpp     slave.cpp

Mike Gordon

Computer Simulation and PVM

Mike used PVM to increase execution speed when simulating the behavior of projectiles in flight. This graphic represents the solution set of possible hits on a target when all possible angles between 0 and 90 degrees (the vertical axis) are matched with a full range of velocities (the horizontal axis). Every pixel on the screen represented a different combination of angle and velocity that was run through the simulation. The green line indicates those combinations which hit the target whereas black means the projectile missed. A more thorough description of this activity can be found in the article Getting Started with Supercomputing.
master.cpp     slave.cpp

9.0 Additional Resources on PVM



Instructors:

Donald W. Hyatt:     dhyatt@tjhsst.edu

Phyllis T. Rittman:     prittman@tjhsst.edu