
Defect: Performance of derived type coarrays #556

Open
@t-bltg

Description


Defect/Bug Report

  • OpenCoarrays Version: OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.1.0)
  • Fortran Compiler: gfortran 8.1.0 + patches
  • C compiler used for building lib: gcc 8.1.0
  • Installation method: cmake
  • Output of uname -a: Linux <...> 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • MPI library being used: mpich 3.2.1
  • Machine architecture and number of physical cores: x86_64 32 cores
  • Version of CMake: 3.11.4

Observed Behavior

I've observed a massive slowdown in my code when copying a coarray locally (no remote [...] references).

Expected Behavior

A memcpy should be used; failing that, we at least expect no MPI communication!

Steps to Reproduce

issue.f90

! caf -o issue issue.f90
! cafrun -np 2 ./issue
module co_obj
   implicit none
   type co
      real(8), allocatable :: a(:, :, :, :)[:]
   end type
end module

program main
   use co_obj
   use mpi
   implicit none

   type(co) :: lhs, rhs
   real(8) :: t0
   real(8), allocatable :: buf(:, :, :, :)
   integer :: ni, nj, nk, nl, i, j, k, l

   ni = 8
   nj = 8
   nk = 8
   nl = 8

   if (num_images() /= 2) error stop 1

   allocate( &
      lhs % a(ni, nj, nk, nl)[*], &
      rhs % a(ni, nj, nk, nl)[*], &
      buf(ni, nj, nk, nl))

   sync all

   print *, '==> START <=='
   t0 = mpi_wtime()
   buf(:, :, :, :) = rhs % a
   lhs % a = buf
   print *, 't1=', mpi_wtime() - t0

   sync all

   t0 = mpi_wtime()
   lhs % a = rhs % a ! implicit MPI transfer where there should be none!
   print *, 't2=', mpi_wtime() - t0

   sync all
   t0 = mpi_wtime()
   do l = 1, nl
      do k = 1, nk
         do j = 1, nj
            do i = 1, ni
               lhs % a(i, j, k, l) = rhs % a(i, j, k, l)
            end do
         end do
      end do
   end do
   print *, 't3=', mpi_wtime() - t0

   sync all
   print *, '==> STOP <=='

end program

output (decimals truncated)

 ==> START <==
 ==> START <==
 t1=   1.05E-004
 t1=   9.32E-005
 t2=   8.96     # <== yes this is clearly a bottleneck
 t2=   9.11   
 t3=   2.69E-005
 t3=   3.43E-005
 ==> STOP <==
 ==> STOP <==

Tracking down the source of this unwanted caf_send in the gfortran sources:

gcc/fortran/trans-expr.c around l. 10240

  else if (flag_coarray == GFC_FCOARRAY_LIB
	   && lhs_caf_attr.codimension && rhs_caf_attr.codimension
	   && ((lhs_caf_attr.allocatable && lhs_refs_comp)
	       || (rhs_caf_attr.allocatable && rhs_refs_comp)))
    {
      /* Only detour to caf_send[get][_by_ref] () when the lhs or rhs is an
	 allocatable component, because those need to be accessed via the
	 caf-runtime.  No need to check for coindexes here, because resolve
	 has rewritten those already.  */
      gfc_code code;
      gfc_actual_arglist a1, a2;
      /* Clear the structures to prevent accessing garbage.  */
      memset (&code, '\0', sizeof (gfc_code));
      memset (&a1, '\0', sizeof (gfc_actual_arglist));
      memset (&a2, '\0', sizeof (gfc_actual_arglist));
      a1.expr = expr1;
      a1.next = &a2;
      a2.expr = expr2;
      a2.next = NULL;
      code.ext.actual = &a1;
      code.resolved_isym = gfc_intrinsic_subroutine_by_id (GFC_ISYM_CAF_SEND);
      tmp = gfc_conv_intrinsic_subroutine (&code);
    }

So this is strange: gfortran is delegating the assignment to the underlying coarray library even though no explicit remote reference (arr(..)[]) is made!
"those need to be accessed via the caf-runtime" => why?
The documentation clearly states that caf_send is meant to send data to a remote image, not to copy locally ...

Question

Let's assume that the assignment really does need to be handled by the caf library: shouldn't we then try to use memcpy when we detect that remote_img == this_image?

If someone could clarify the strategy, should I:

  1. patch gfortran so that the assignment does not emit a caf_send,
    OR
  2. patch OpenCoarrays to avoid the MPI communication (see the sketch below)?
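
For reference, here is a minimal sketch of what option 2 could look like: a local fast path at the top of the runtime's send routine. The signature and field names below are simplified stand-ins, not the actual libcaf/OpenCoarrays interface, and a real patch would also need to verify that both descriptors are contiguous before falling back to a flat copy.

#include <string.h>

/* Simplified stand-ins for the runtime's descriptor and image id; the
   real libcaf types (gfc_descriptor_t, etc.) carry more bookkeeping
   than shown here.  */
typedef struct
{
  void *base_addr;   /* start of the array data       */
  size_t elem_len;   /* size of one element in bytes  */
  size_t nelems;     /* total number of elements      */
} desc_t;

extern int this_image_num;  /* hypothetical cached id of the local image */

void
caf_send_sketch (int image_index, desc_t *dest, const desc_t *src)
{
  if (image_index == this_image_num
      && dest->elem_len == src->elem_len
      && dest->nelems == src->nelems)
    {
      /* Source and destination both live in this image's memory:
	 a plain copy suffices, no MPI call needed.  memmove is used
	 in case the two coarrays ever alias.  */
      memmove (dest->base_addr, src->base_addr,
	       src->nelems * src->elem_len);
      return;
    }

  /* ... otherwise fall through to the existing MPI transfer path ... */
}

Option 1 would avoid the library call altogether, but a check like the one above would also cover any other code path that reaches the send routine with a local target image.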
