Hybrid OpenMP/MPI matrix multiplication


I am doing matrix multiplication the traditional way, by blocks, with one MPI task spawning threads. My problem is how to define the send, and when to receive the results from OpenMP. If anyone can give me a simple sample, that would be great.

There are different ways you can approach this problem. One is to break the first matrix up into groups of rows and send one group to each rank. From there, use OpenMP to parallelize the multiplication. Finally, recombine the results into a single matrix. Using this approach, you would use MPI_Send to send the groups out to each rank. Assuming rank 0 has the full matrix, you could use something like:

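    float a[ndim1*ndim2];
    float b[ndim2*ndim3];
    float c[ndim1*ndim3];

    int nrows = ndim1 / nranks;  // rows per rank

    for (int i = 1; i < nranks; i++)
    {
      int startrow = nrows * i;
      int nelems = nrows * ndim2;
      if (i == nranks - 1)  // there are better ways to do this, but this is a simple example
      {
        nelems += (ndim1 % nranks) * ndim2;  // the last rank takes the leftover rows
      }
      // startrow is a row index, so scale it by the row length to get the offset into a
      MPI_Send(&a[startrow * ndim2], nelems, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
    }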

Notice that the loop starts at rank 1, since there's no need for rank 0 to send to itself. We'll have rank 0 work on its part of the matrix as well.

To receive on each of the other ranks, use:

    int nelems = nrows * ndim2;
    if (myrank == nranks - 1)
    {
      nelems += (ndim1 % nranks) * ndim2;  // the last rank also receives the leftover rows
    }
    MPI_Recv(locala, nelems, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
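The sender and the receiver have to agree on the element count, so it can help to compute it in one place. Here is a minimal sketch of that idea; the helper names rows_for_rank and elems_for_rank are hypothetical (they are not part of MPI or of the code above), and the sketch assumes the same simple scheme where the last rank takes the leftover rows.

```c
#include <assert.h>

/* Hypothetical helper: computes which rows of an ndim1-row matrix belong to
 * `rank`. Every rank gets ndim1/nranks rows, and the last rank additionally
 * takes the ndim1 % nranks leftover rows. */
void rows_for_rank(int ndim1, int nranks, int rank, int *startrow, int *nrows)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    int base = ndim1 / nranks;
    *startrow = base * rank;
    *nrows = base;
    if (rank == nranks - 1) {
        *nrows += ndim1 % nranks;
    }
}

/* Hypothetical helper: number of floats to transfer for `rank` when each
 * row of the matrix has ndim2 columns. */
int elems_for_rank(int ndim1, int ndim2, int nranks, int rank)
{
    int startrow, nrows;
    rows_for_rank(ndim1, nranks, rank, &startrow, &nrows);
    return nrows * ndim2;
}
```

With helpers like these, rank 0 would send from &a[startrow * ndim2] with a count of elems_for_rank(ndim1, ndim2, nranks, i), and rank i would pass elems_for_rank(ndim1, ndim2, nranks, myrank) to MPI_Recv.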

On rank 0, you'll need to copy the first nrows rows of a into locala directly. You'll also need to send the entire b array to each rank, along with the needed dimensions (unless these values are already available there by other means).

Once all of the data is on each rank, split up the rows using OpenMP so that each thread handles one row at a time.

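    #pragma omp parallel for private(ia,ib,ic)
    for (int i = 0; i < localnrows; i++)
    {
      for (int j = 0; j < ndim3; j++)
      {
        ic = i*ndim3 + j;    // localc is localnrows x ndim3
        localc[ic] = 0.0f;
        for (int k = 0; k < ndim2; k++)
        {
          ia = i*ndim2 + k;  // locala is localnrows x ndim2
          ib = k*ndim3 + j;  // b is ndim2 x ndim3
          localc[ic] += locala[ia] * b[ib];  // accumulate the dot product
        }
      }
    }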
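To check the indexing on a small case without MPI, the per-rank loop can be wrapped in a function. This is only a sketch: the function name multiply_local_rows is hypothetical, and it assumes row-major storage with locala of size localnrows x ndim2, b of size ndim2 x ndim3, and localc of size localnrows x ndim3.

```c
#include <assert.h>

/* Hypothetical wrapper around the per-rank loop: multiplies the local rows
 * of A by B and stores the product in localc. All matrices are row-major.
 * If the compiler is not given an OpenMP flag, the pragma is ignored and
 * the loop runs serially with the same result. */
void multiply_local_rows(const float *locala, const float *b, float *localc,
                         int localnrows, int ndim2, int ndim3)
{
    assert(localnrows >= 0 && ndim2 >= 0 && ndim3 >= 0);
    /* Each thread takes whole rows of the output, so no two threads ever
     * write to the same element of localc. */
    #pragma omp parallel for
    for (int i = 0; i < localnrows; i++) {
        for (int j = 0; j < ndim3; j++) {
            float sum = 0.0f;  /* declared inside the loop, so private per thread */
            for (int k = 0; k < ndim2; k++) {
                sum += locala[i * ndim2 + k] * b[k * ndim3 + j];
            }
            localc[i * ndim3 + j] = sum;
        }
    }
}
```

Declaring the accumulator inside the loop body makes it private to each thread automatically, which avoids the need for a private clause.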

Then pass the localc arrays back to rank 0 in the same way locala was passed, swapping MPI_Send and MPI_Recv.

