MPI Lab 11 Parallel Programming Application 3
Sorting in Parallel: Odd-Even Transposition Sort
Part 1: One number per processor

Assignment
Program the Odd-Even Transposition sort using MPI and multiple processors, one number per processor.

In part 1, n = P, the number of values that are being sorted equals the number of processors

  • Version 1: Use one number on each processor
    Ex. sort 4 values with 4 processors, 8 values with 8 processors, 16 values with 16 processors, etc
  • oddEvenStarter.c

    1. Background reading, see Barry Wilkinson's slides on Sorting Algorithms (pdf), (his book is Parallel Programming)
      • Potential Speedup (Slide 418) when using parallel sorting methods
           O(nlogn) is optimal for any sequential sorting algorithm non-parallel
        
           In parallel, with n processors, the optimal time is O(nlogn)/n = O(logn)
         
      • Rank sort (Slide p. 420)
           Algorithm: Count how many numbers in a list are less than a particular selected number.  
           This count tells you the position in the list for the selected number.
           The time complexity for this sort is O(n2), i.e. not very good.
         
      • Parallel Rank sort (slide 422)
            Allocate one number for each processor.  Each processor will take O(n) steps 
            to determine x, the count of the values less than a[i].
         
      • Better than O(n)? (slides 423-425)
            Increasing the number of processors can theoretically reduce time to O(logn),
            but who has n2 processors?
         
    2. Background reading on parallel compare-and-exchange sorts
      • Compare and Exchange Sorts (sorts that use swap) - slide 427
      • Message passing (MPI) compare-and-exchange - slide 428, Version 1
            (this is sorting in ascending order, processor 1 should have a smaller value than processor 2)
             1. Processor 1 sends value A to processor 2
             2. Processor 2 compares its own value B with A. 
                If A > B, Processor 2 send B back to processor 1,
        	Otherwise processor 2 sends A back to processor 1.
         
      • Message passing (MPI) compare-and-exchange - slide 429, Version 2
             1. Processor 1 sends value A to processor 2, and processor 2 sends B to processor 1
             2. Processor 1 stores the smaller of A and B, and processor 2 stores the larger of A and B.
         
      • Data partitioning - more than one number per processor, Version 1, slide 431
            There are p processors and n numbers.  n/p numbers are therefore 
            on each processor.
            In this example n/p is 4 numbers per processor.
            Processor 1 sends [88,50,28,25] - already sorted values - to processor 2.
            Processor 2 merges these values with its already sorted values, [98,80,43,42].  
            Processor 2 sends back to processor 1 the lower half of these merged values - [43,42,28,25]
        
      • Data partitioning - more than one number per processor, Version 2, slide 432
            Processor 1 sends [88,50,28,25] - already sorted values - to processor 2,
              and processor 2 sends its values [98,80,43,42] to processor 1
            Processors 1 and 2 each merge the eight values to a sorted list. 
            Processor 1 keeps the lower half of the values, and processor 2 sends back 
              keeps the upper half.
        
      • Bubble sort, non-parallel (slides 433, 434)
            In phase 1 the largest number, 8, is placed in its correct position
            In phase 2 the next largest, 7, is correctly placed
            This continues until all are sorted.
            Time complexity is O(n2)
          
      • Odd-even transposition sort, a parallel variation of bubble sort, slide 436
        • Even phase: even numbered processes exchange numbers with their right neighbor
        • Odd phase: odd-numbererd processes exchange numbers with their right neighbor
      • Diagram of odd-even sort, slide 437
        • Steps 0, 2, 4, 6, ... (even phases)
          • even numbered processes exchange max values with odd processes to the RIGHT
          • The even numbered process will have the smaller value, and the odd numbered process will have the greater value.
        • Steps 1, 3, 5, 7, ... (odd phases)
          • odd processes exchange max values with even processes to the RIGHT
          • The odd numbered process will have the smaller value, and the even numbered process will have the greater value.
      • Switch to these slides for more information on odd-even sort. Go to slide 224 for Odd-even transposition sort code.
        • In this code, the number finally being stored in even Processors is B, and the number being stored on odd processors is A
        • Even phase:
          • Even processes: P(i), i = 0, 2, 4, 6, ...
            (even processes keep B, odd processes keep A)
             //Pass max value to the right
             recv(&A, P(i+1)); //from the right
             send(&B, P(i+1)); //to the right
             if (A < B) B = A;
             //B (on the even process) is less than the A (on the odd process) to the right
             
          • Odd processes: P(i), i = 1, 3, 5, 7, ...
             send(&A, P(i-1)); //Send to the left
             recv(&B, P(i-1)); //Receive B from the left
             //A (on the odd process) is larger than B (on the even process) to the left
             if (A < B) A = B;
             
             
        • Part 2: Odd phase
          • P(i), i = 1,3,5,7... (odd processors)
             send(&A, P(i+1)); //send value of A to the right
             recv(&B, P(i+1)); //recv B from the right
             if (A > B) A = B; //A (this time, the value on the left) is the min value
            
             
          • P(i), i = 0,2,4,6,... (even processors)
             recv(&A, P(i-1)); //recv A from the left
             send(&B, P(i-1)); //send value of B to the left
             if (A > B) B = A; //B (the value on the right) is max value
            
          In both cases,
        • odd-numbered processes execute their send() routines first
        • even-numbered processes execute their recv() routines first

          Combining:

        • P(i),i=1,3,5,...n-3 (odd processors)
           send(&A, P(i-1));
           recv(&B, P(i-1));
           if (A< B) A=B;
           if (i <= n-3) {
           send(&A, P(i+1));
           recv(&B, P(i+1));
           if (A> B ) A = B;
           }
          
        • P(i),i=0,2,4,...n-2 (even processors)
           recv(&A, P(i+1));
           send(&B, P(i+1));
           if (A< B) B=A;
           if (i >= 2) {
           recv(&A, P(i-1));
           send(&B, P(i-1));
           if (A>B) B = A;
           }
          
        • Example run through, using eight values to be sorted: 21, 16, 35, 50, 13, 20, 32, 45.
          In this example, eight processors are used to sort eight values.

          Values are arranged one number per processor. A's are on odd processors, and B's are on the even processors:

             0   1   2   3   4   5   6   7
             21  16  35  50  13  20  32  45
             B   A   B   A   B   A   B   A
          

          Part 1: The Even phase

          Step 1: If processor P has an odd rank, these odd processors send a copy of their A to the even processor on the left:

             0       1       2        3       4       5        6         7
           21 16 <-- 16    35 50 <--  50    13 20 <-- 20     32 45 <--   45
          
          Step 2: The even processors each send back the maximum of their two values to A in the odd processors to the right:
          (Only if the even processor is not the last process)
            0     1      2       3     4       5     6      7
           16 --> 21     35 -->  50    13 -->  20   32 -->  45
           B      A      B       A     B       A     B      A
          
          At this point, A (the max of A and B) is stored on the odd processors, and B (the minimum of A and B), is stored in the even processors.

          Part 2, the Odd phase.

          Step 1: If processor P is odd, send a copy of A to processor P+1 to the right:
          (Only if the odd processor is not the last processor)

              0    1       2      3       4       5       6       7
             16    21 --> 21 35   50 --> 50 13   20 -->  20 32   45
          
          Step 2: The Even processors (other than process 0) send the minimum value back to the processor on their left.
          At this point A = min (of A and B) on odd processors, and B = max of A and B on even nodes.
              0     1      2      3      4     5      6     7
             16    21 <-- 35     13 <-- 50    20 <-- 32    45
              B     A      B      A      B     A      B     A  
          
          The algorithm now repeats.

          Part 1: Repeat the Even phase.

          Current state of the values on the eight processors:

              0    1      2     3    4     5    6      7
             16    21    35    13    50   20    32    45
             B     A     B     A     B    A     B      A  
          
          Step 1: Odd processors send A to the left
           
              0        1      2          3     4          5     6        7
            16 21 <--  21    35 13 <--  13    50 20 <--  20    32 45 <-- 45
             B         A      B          A     B          A     B        A  
          
          Step 2: Even processors send back the max of A and B to the odd processor on the right (not if the even processor is the last one):
             0     1     2     3     4      5     6      7
            16 --> 21   13 --> 35   20 --> 50    32 --> 45
             B     A     B     A     B      A     B      A  
          
          Part 2: Odd phase

          Step 1: Odd processors send A to the right. (Not if the odd processor is the last one)

             0     1       2      3       4      5      6      7
            16     21 --> 21 13   35 --> 35 20  50 --> 50 32   45
             B     A       B      A       B      A      B      A  
          
          Step 2: Even processors send the minimum of A and B to the odd processor on the left (not processor 0):
             0    1      2     3      4      5      6      7
            16    13 <-- 21    20 <-- 35    32 <--  50     45
             B     A     B     A      B      A      B      A  
          
          This process continues logn times (logn rounded up to the next whole number)

    Assignment: Implement the odd-even sort using one number per processor, in other words use the same number of processors as numbers that you are sorting.

  • Version 1: Use one number on each processor
    Ex. sort 4 values with 4 processors, 8 values with 8 processors, 16 values with 16 processors, etc
  • oddEvenStarter.c