Parallel Algorithm Configurations
by D.W. Hyatt
Three Different Algorithm Approaches in PVM
There are a number of possible algorithm configurations that one could use
when designing a PVM program. We will present three such configurations
here: 1) Master-Slave Approach, 2) Relay Approach (or Loop), and 3)
Tree Approach. Take a look at the following example programs and at how the
different algorithms are used.
Master-Slave Approach
The master-slave approach is the most common method for a PVM computation.
It utilizes the power of parallel programming more easily than the other
methods: a computational task is divided into sub-tasks, these smaller
units are worked on independently by the slave tasks, and the results are
sent back to the master when complete.
The following code shows how a master program sends an array of data out
to a number of slave programs, each of which calculates the sum of a
specific row. Each slave returns the sum of an individual row of data,
which the master then records. The data is sent out in what is known as
a "multicast" approach, where one packet is sent, yet all slaves listen
and receive simultaneously.
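Since the full example is not reproduced here, the following is a minimal
sketch of what such a pair of programs might look like. The executable
name "rowsum_slave", the array dimensions, and the message tags are all
illustrative choices, not the original code.

    /* master.c -- sketch of the master side */
    #include <stdio.h>
    #include "pvm3.h"

    #define NROWS      4         /* one slave per row (illustrative size) */
    #define NCOLS      8
    #define TAG_ROW    1         /* tells a slave which row it owns */
    #define TAG_DATA   2         /* carries the multicast array */
    #define TAG_RESULT 3         /* carries one row sum back */

    int main(void)
    {
        int tids[NROWS], data[NROWS][NCOLS];
        int i, j, row, sum;

        for (i = 0; i < NROWS; i++)          /* fill in sample data */
            for (j = 0; j < NCOLS; j++)
                data[i][j] = i + j;

        if (pvm_spawn("rowsum_slave", (char **)0, PvmTaskDefault,
                      "", NROWS, tids) != NROWS) {
            fprintf(stderr, "failed to spawn %d slaves\n", NROWS);
            pvm_exit();
            return 1;
        }

        for (i = 0; i < NROWS; i++) {        /* assign each slave a row */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&i, 1, 1);
            pvm_send(tids[i], TAG_ROW);
        }

        pvm_initsend(PvmDataDefault);        /* one packet, all slaves */
        pvm_pkint(&data[0][0], NROWS * NCOLS, 1);
        pvm_mcast(tids, NROWS, TAG_DATA);

        for (i = 0; i < NROWS; i++) {        /* collect sums as they finish */
            pvm_recv(-1, TAG_RESULT);
            pvm_upkint(&row, 1, 1);
            pvm_upkint(&sum, 1, 1);
            printf("sum of row %d is %d\n", row, sum);
        }

        pvm_exit();
        return 0;
    }

    /* rowsum_slave.c -- sketch of the matching slave */
    #include "pvm3.h"

    #define NROWS      4
    #define NCOLS      8
    #define TAG_ROW    1
    #define TAG_DATA   2
    #define TAG_RESULT 3

    int main(void)
    {
        int data[NROWS][NCOLS];
        int row, j, sum = 0;
        int master = pvm_parent();

        pvm_recv(master, TAG_ROW);           /* which row is mine? */
        pvm_upkint(&row, 1, 1);

        pvm_recv(master, TAG_DATA);          /* the multicast array */
        pvm_upkint(&data[0][0], NROWS * NCOLS, 1);

        for (j = 0; j < NCOLS; j++)          /* sum my row only */
            sum += data[row][j];

        pvm_initsend(PvmDataDefault);        /* report back to the master */
        pvm_pkint(&row, 1, 1);
        pvm_pkint(&sum, 1, 1);
        pvm_send(master, TAG_RESULT);

        pvm_exit();
        return 0;
    }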
In a multicast situation, it is not inherently clear whether the program
would be more efficient if the master sent each slave only the specific
row of data it is to calculate: doing so requires a number of smaller,
distinct packets sent out in sequence rather than one packet sent to all.
Whether the overhead of the extra pvm_pack( ) and pvm_send( ) calls is
greater than the overhead of the surplus data carried by the multicast is
not clear, and the relationship between the two factors is an interesting
thing to investigate.
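For comparison, the point-to-point version of the distribution loop might
look like the fragment below, reusing the names from the sketch above:
the master performs NROWS separate pack-and-send operations, but each
slave receives only the NCOLS values it actually needs.

    for (i = 0; i < NROWS; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&data[i][0], NCOLS, 1);    /* just this slave's row */
        pvm_send(tids[i], TAG_DATA);
    }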
Relay or Pipeline Approach
The rationale for the relay or pipeline approach parallels some common
approaches used in computer architecture for speeding up computations.
When there are a number of discrete steps that must nonetheless be
computed in a certain sequence, an algorithm similar to a
"vector pipeline" might be used. This
is an assembly-line approach where the first phase is done by the
first processor, after which it passes the results along to the next
processor in line. While the second processor is doing its task, the
first processor can begin working on another batch of data. When the
second processor is finished, it can pass the data on to a third, while
the second begins working on the next item from process one, and the
first process can begin on yet another piece of data.
Eventually, the pipeline gets filled, and throughput is significantly
increased because each stage of the assembly line is attacking the task in
parallel. If the amount of work to be done is small, the pipeline approach
is not efficient because of the overhead required to fill the pipeline. If
the different stages have some hardware advantage, however, then the
pipeline might be good for improving throughput.
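As an illustration, one stage of such a relay might be written roughly as
follows. The spawning program is assumed to have already sent each stage
the task id of its successor; the tags, the doubling "work" step, and the
-1 end-of-stream marker are all invented for this sketch.

    /* stage.c -- sketch of one pipeline stage */
    #include "pvm3.h"

    #define TAG_NEXT 10          /* carries the successor's task id */
    #define TAG_ITEM 11          /* carries one item of work */

    int main(void)
    {
        int next, item;

        pvm_recv(pvm_parent(), TAG_NEXT);   /* who is downstream of me? */
        pvm_upkint(&next, 1, 1);

        for (;;) {
            pvm_recv(-1, TAG_ITEM);         /* take the next item off the line */
            pvm_upkint(&item, 1, 1);

            if (item != -1)                 /* -1 marks end of the stream */
                item = item * 2;            /* this stage's share of the work */

            pvm_initsend(PvmDataDefault);   /* pass the result downstream */
            pvm_pkint(&item, 1, 1);
            pvm_send(next, TAG_ITEM);

            if (item == -1)                 /* forward the marker, then quit */
                break;
        }

        pvm_exit();
        return 0;
    }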
Tree Approach
The tree computation approach is more useful for a "divide and conquer"
strategy, where a task is sent out to multiple slave-type processes, which
in turn send off portions to one or more additional slaves. When the
calculations are eventually complete, the data is passed back up the
nodes of the tree and returned to the originator of the task.
One potential problem in this recursive approach is ensuring that there is
a terminating condition; otherwise, processes can continue to be spawned
by sub-processes, resulting in an endless propagation of child tasks that
eventually overwhelms the system resources.
Take a look at the program below that allows the generation of a tree
with a maximum depth of four levels.
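That program is not reproduced here, but a single node of such a tree
might be sketched as follows. The executable name "treenode", the
branching factor, the message tags, and the token "work" of summing task
ids are all illustrative choices.

    /* treenode.c -- sketch of a tree node with a depth limit */
    #include "pvm3.h"

    #define MAXDEPTH   4         /* stop spawning at four levels */
    #define BRANCH     2
    #define TAG_DEPTH  20        /* carries a node's level in the tree */
    #define TAG_RESULT 21        /* carries a subtree result back up */

    int main(void)
    {
        int kids[BRANCH];
        int depth, i, sub, total;
        int parent = pvm_parent();

        pvm_recv(parent, TAG_DEPTH);     /* my level in the tree */
        pvm_upkint(&depth, 1, 1);

        total = pvm_mytid();             /* this node's token contribution */

        if (depth < MAXDEPTH) {          /* the terminating condition */
            pvm_spawn("treenode", (char **)0, PvmTaskDefault,
                      "", BRANCH, kids);

            for (i = 0; i < BRANCH; i++) {   /* hand each child its depth */
                int d = depth + 1;
                pvm_initsend(PvmDataDefault);
                pvm_pkint(&d, 1, 1);
                pvm_send(kids[i], TAG_DEPTH);
            }

            for (i = 0; i < BRANCH; i++) {   /* gather the subtree results */
                pvm_recv(-1, TAG_RESULT);
                pvm_upkint(&sub, 1, 1);
                total += sub;
            }
        }

        pvm_initsend(PvmDataDefault);    /* pass the result back up */
        pvm_pkint(&total, 1, 1);
        pvm_send(parent, TAG_RESULT);

        pvm_exit();
        return 0;
    }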
Multiple Processes on the Linux Cluster
The Linux Cluster uses a different programming approach. Instead of
manually deciding how each independent machine will handle a separate
program, the general approach is that a single program makes multiple
copies of itself, often with different parameters, by "forking"
processes on the main node of the cluster.
The operating system then distributes those processes
to the various nodes of the cluster, and each node completes
the work it was given.
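A minimal sketch of that forking style, with an invented worker count and
a placeholder for the real computation, might look like this:

    /* forkdemo.c -- one driver forks several worker copies of itself */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 4           /* illustrative number of copies */

    static void do_work(int id)
    {
        /* stand-in for the real computation; each copy gets its own id */
        printf("worker %d running as pid %d\n", id, (int)getpid());
    }

    int main(void)
    {
        int i;

        for (i = 0; i < NWORKERS; i++) {
            pid_t pid = fork();
            if (pid == 0) {              /* child: do one share of the work */
                do_work(i);
                exit(0);
            } else if (pid < 0) {
                perror("fork");
                exit(1);
            }
            /* parent: keep launching the remaining copies */
        }

        for (i = 0; i < NWORKERS; i++)   /* collect all the children */
            wait(NULL);

        return 0;
    }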
The following two pages give examples of programs on the Cluster:
A Simple Cluster Program
Working with The Linux Cluster: POV-Ray Animation