Hybrid OpenMP/MPI matrix multiplication
I want to do matrix multiplication the traditional way with blocks: one MPI task spawns threads. The problem is how to define the send, and when to receive the results from OpenMP. If someone could give me a simple sample, that would be great.
There are different ways you can approach this problem. One is to break the first matrix into groups of rows and send one group to each rank. There, you use OpenMP to parallelize the multiplication. Finally, you recombine the results into a single matrix. Using this approach, you would use MPI_Send to send the groups out to each rank. Assuming rank 0 has the full matrix, you would use something like:

float a[ndim1*ndim2];
float b[ndim2*ndim3];
float c[ndim1*ndim3];
int nrows = ndim1/nranks;

for (int i = 1; i < nranks; i++)
{
    int startrow = nrows*i;
    int nelems = nrows*ndim2;
    if (i == nranks-1)  // there are better ways to do this, but it keeps the example simple
    {
        nelems += (ndim1%nranks)*ndim2;
    }
    MPI_Send(&a[startrow*ndim2], nelems, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
}
Notice that this starts at rank 1; there's no need for rank 0 to send to itself. Instead, we'll have rank 0 work on its part of the matrix as well.
To receive in each of the other ranks, use:
int nelems = nrows*ndim2;
if (myrank == nranks-1)
{
    nelems += (ndim1%nranks)*ndim2;
}
MPI_Recv(locala, nelems, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
On rank 0 you'll need to copy the first nrows rows of a into locala directly. You'll also need to send the entire b array to each rank, along with the needed dimensions (unless those values are available there by other means).
Once all of the data is in each rank, split the rows across threads with OpenMP, with each thread handling one row at a time:
#pragma omp parallel for private(ia, ib, ic)
for (int i = 0; i < localnrows; i++)
{
    for (int j = 0; j < ndim3; j++)
    {
        ic = i*ndim3 + j;          // localc is localnrows x ndim3, row-major
        localc[ic] = 0.0f;
        for (int k = 0; k < ndim2; k++)
        {
            ia = i*ndim2 + k;      // locala is localnrows x ndim2, row-major
            ib = k*ndim3 + j;      // b is ndim2 x ndim3, row-major
            localc[ic] += locala[ia]*b[ib];
        }
    }
}
Then pass the localc arrays back to rank 0 the same way locala was passed out, swapping the roles of MPI_Send and MPI_Recv.