Defect/Bug Report
- OpenCoarrays Version: OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 2.1.0)
- Fortran Compiler: gfortran 8.1.0 + patches
- C compiler used for building lib: gcc 8.1.0
- Installation method: cmake
- Output of `uname -a`: Linux <...> 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- MPI library being used: mpich 3.2.1
- Machine architecture and number of physical cores: x86_64, 32 cores
- Version of CMake: 3.11.4
Observed Behavior
I've observed a massive slowdown in my code when copying coarrays locally (no [remote] references).
Expected Behavior
A plain `memcpy` should be used; failing that, we at least expect no MPI communication!
Steps to Reproduce
issue.f90:
```fortran
! caf -o issue issue.f90
! cafrun -np 2 ./issue
module co_obj
  implicit none
  type co
    real(8), allocatable :: a(:, :, :, :)[:]
  end type
end module

program main
  use co_obj
  use mpi
  implicit none
  type(co) :: lhs, rhs
  real(8) :: t0
  real(8), allocatable :: buf(:, :, :, :)
  integer :: ni, nj, nk, nl, i, j, k, l

  ni = 8
  nj = 8
  nk = 8
  nl = 8

  if (num_images() /= 2) error stop 1

  allocate( &
    lhs % a(ni, nj, nk, nl)[*], &
    rhs % a(ni, nj, nk, nl)[*], &
    buf(ni, nj, nk, nl))

  sync all
  print *, '==> START <=='

  ! t1: copy through a plain (non-coarray) temporary
  t0 = mpi_wtime()
  buf(:, :, :, :) = rhs % a
  lhs % a = buf
  print *, 't1=', mpi_wtime() - t0
  sync all

  ! t2: direct whole-array assignment between coarray components
  t0 = mpi_wtime()
  lhs % a = rhs % a ! implicit MPI transfer, where there should NOT be !
  print *, 't2=', mpi_wtime() - t0
  sync all

  ! t3: element-wise copy loop
  t0 = mpi_wtime()
  do l = 1, nl
    do k = 1, nk
      do j = 1, nj
        do i = 1, ni
          lhs % a(i, j, k, l) = rhs % a(i, j, k, l)
        end do
      end do
    end do
  end do
  print *, 't3=', mpi_wtime() - t0
  sync all
  print *, '==> STOP <=='
end program
```
Output (decimals truncated):
```
==> START <==
==> START <==
t1= 1.05E-004
t1= 9.32E-005
t2= 8.96 # <== yes this is clearly a bottleneck
t2= 9.11
t3= 2.69E-005
t3= 3.43E-005
==> STOP <==
==> STOP <==
```
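To put these timings in scale, the arithmetic can be written out as a small C sketch. The helper names are hypothetical, the sketch assumes `real(8)` is an 8-byte double (as with gfortran on x86_64), and it uses the first image's t1 and t2 values from the output above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for the arithmetic behind the reproducer.
   Assumes real(8) is an 8-byte double, as with gfortran on x86_64. */

/* Bytes in one ni x nj x nk x nl array of real(8). */
size_t array_bytes(size_t ni, size_t nj, size_t nk, size_t nl)
{
    return ni * nj * nk * nl * sizeof(double);
}

/* How many times slower t_slow is than t_fast. */
double slowdown(double t_slow, double t_fast)
{
    return t_slow / t_fast;
}

/* Average time per element if t were spread evenly over n elements. */
double time_per_element(double t, size_t n)
{
    assert(n > 0);
    return t / (double)n;
}
```

With ni = nj = nk = nl = 8 this gives 32768 bytes (32 KiB) per array, a t2/t1 ratio of roughly 85,000 (8.96 / 1.05E-004), and, if t2 were spread evenly over the 4096 elements, about 2.2 ms per element.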
Tracking down the source of this unwanted `caf_send` in the gfortran sources, `gcc/fortran/trans-expr.c` around l. 10240:
```c
else if (flag_coarray == GFC_FCOARRAY_LIB
         && lhs_caf_attr.codimension && rhs_caf_attr.codimension
         && ((lhs_caf_attr.allocatable && lhs_refs_comp)
             || (rhs_caf_attr.allocatable && rhs_refs_comp)))
  {
    /* Only detour to caf_send[get][_by_ref] () when the lhs or rhs is an
       allocatable component, because those need to be accessed via the
       caf-runtime. No need to check for coindexes here, because resolve
       has rewritten those already. */
    gfc_code code;
    gfc_actual_arglist a1, a2;
    /* Clear the structures to prevent accessing garbage. */
    memset (&code, '\0', sizeof (gfc_code));
    memset (&a1, '\0', sizeof (gfc_actual_arglist));
    memset (&a2, '\0', sizeof (gfc_actual_arglist));
    a1.expr = expr1;
    a1.next = &a2;
    a2.expr = expr2;
    a2.next = NULL;
    code.ext.actual = &a1;
    code.resolved_isym = gfc_intrinsic_subroutine_by_id (GFC_ISYM_CAF_SEND);
    tmp = gfc_conv_intrinsic_subroutine (&code);
  }
```
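The branch condition quoted from trans-expr.c can be modelled as a plain boolean function, which makes it easier to see why t1 is fast and t2 is not. This is a simplified, hypothetical model: `struct caf_side` and `routes_through_caf_send` are invented names, not gfortran internals, and only this one condition is modelled (other paths in trans-expr.c, such as whatever handles the element-wise loop in t3, are not).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the quoted code reads from one side
   of an assignment (lhs_caf_attr / lhs_refs_comp and the rhs ones). */
struct caf_side {
    bool codimension;  /* the expression is (part of) a coarray            */
    bool allocatable;  /* the referenced entity is allocatable             */
    bool refs_comp;    /* the expression references a derived-type component */
};

/* True when the quoted branch would hand the assignment to the
   caf_send intrinsic, i.e. to the coarray library (OpenCoarrays).
   lib_coarrays stands for flag_coarray == GFC_FCOARRAY_LIB
   (-fcoarray=lib, which the caf wrapper uses). */
bool routes_through_caf_send(bool lib_coarrays,
                             struct caf_side lhs, struct caf_side rhs)
{
    return lib_coarrays
        && lhs.codimension && rhs.codimension
        && ((lhs.allocatable && lhs.refs_comp)
            || (rhs.allocatable && rhs.refs_comp));
}
```

Under this model, `lhs % a = rhs % a` satisfies every clause, while in t1 one side is the plain array `buf` (no codimension), so neither `buf(:, :, :, :) = rhs % a` nor `lhs % a = buf` takes this branch. Notably, the condition never looks at whether a coindex is present.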
So this is strange: gfortran delegates the assignment to the underlying coarray library even though no explicit remote reference (`arr(..)[]`) is made!

The comment says these components "need to be accessed via the caf-runtime". Why?

The documentation clearly states that `caf_send` is meant to send data to a remote image, not to copy data locally...
Question
Let's assume the assignment really does need to be handled by the caf library. Shouldn't we then use `memcpy` when we detect that `remote_img == this_image`?

Could someone clarify the intended strategy? Should I
- patch gfortran so that the assignment does not go through `caf_send`, OR
- patch OpenCoarrays to avoid MPI communication in this case?
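On the library side, the shortcut suggested above amounts to checking the target image before choosing a transfer path. The sketch below is hypothetical and heavily simplified: `sketch_caf_send` and `remote_put` are invented names, the remote path is a stub that only counts calls (a real library would issue MPI communication there), and the data is assumed contiguous. The real entry points (`_gfortran_caf_send` and its `_by_ref` variant) take a coarray token and array descriptors and may have to handle strided sections and kind conversion, so this is not a patch against OpenCoarrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stub for the remote path: counts calls so the shortcut can be
   checked. In a real library this would be MPI communication. */
static int remote_transfers = 0;

static void remote_put(void *dst, const void *src, size_t nbytes)
{
    remote_transfers++;
    memcpy(dst, src, nbytes); /* placeholder for an MPI transfer */
}

/* Hypothetical simplified send: copy nbytes of contiguous data from
   src to dst, where dst lives on image image_index (1-based). */
void sketch_caf_send(void *dst, const void *src, size_t nbytes,
                     int image_index, int this_image)
{
    assert(image_index >= 1 && this_image >= 1);
    if (image_index == this_image) {
        /* Local target: plain memory copy, no MPI involved.
           memmove (the overlap-safe variant of memcpy) in case the
           lhs and rhs share storage. */
        memmove(dst, src, nbytes);
    } else {
        remote_put(dst, src, nbytes);
    }
}
```

For a contiguous whole-array copy such as `lhs % a = rhs % a` on the executing image, the local branch reduces to a single memory copy of the array.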