Thanks for your pull request and interest in making D better, @etcimon! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
    dub fetch digger
    dub run digger -- build "master + druntime#2234"
Actually it looks like DMD is already smart enough not to do this unnecessarily. See dmd/e2ir.d#L4080-L4109:
    // Convert from dynamic array to dynamic array
    if (tty == Tarray && fty == Tarray)
    {
        uint fsize = cast(uint)tfrom.nextOf().size();
        uint tsize = cast(uint)t.nextOf().size();
        if (fsize != tsize)
        {   // Array element sizes do not match, so we must adjust the dimensions
            if (fsize % tsize == 0)
            {
                // Set array dimension to (length * (fsize / tsize))
                // Generate pair(e.length * (fsize/tsize), es.ptr)
                elem *es = el_same(&e);
                elem *eptr = el_una(OPmsw, TYnptr, es);
                elem *elen = el_una(irs.params.is64bit ? OP128_64 : OP64_32, TYsize_t, e);
                elem *elen2 = el_bin(OPmul, TYsize_t, elen, el_long(TYsize_t, fsize / tsize));
                e = el_pair(totym(ce.type), elen2, eptr);
            }
            else
            {   // Runtime check needed in case arrays don't line up
                if (config.exe == EX_WIN64)
                    e = addressElem(e, t, true);
                elem *ep = el_params(e, el_long(TYsize_t, fsize), el_long(TYsize_t, tsize), null);
                e = el_bin(OPcall, totym(ce.type), el_var(getRtlsym(RTLSYM_ARRAYCAST)), ep);
            }
        }
        goto Lret;
    }
@etcimon Do you have an example where your branch would be taken?
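To make the two branches in the quoted e2ir.d code concrete, here is a minimal sketch of how they look from user code. The helper names `narrow` and `widen` are hypothetical and exist only for illustration. Casting to a smaller element size that evenly divides the source size only rescales the length, while the reverse direction is the one that needs the runtime check:

```d
// Hypothetical helpers, for illustration only; not part of druntime.

// fsize = 4, tsize = 1: fsize % tsize == 0, so the new length is simply
// length * (fsize / tsize) and no runtime divisibility check is needed.
ubyte[] narrow(uint[] a)
{
    return cast(ubyte[]) a;
}

// fsize = 1, tsize = 4: fsize % tsize != 0, so whether the byte length
// is a multiple of tsize is only known at run time.
uint[] widen(ubyte[] a)
{
    return cast(uint[]) a;
}
```

With these, a `uint[]` of 3 elements becomes a `ubyte[]` of 12 elements, and a `ubyte[]` of 16 elements becomes a `uint[]` of 4 elements. Only the second kind of cast would ever reach the check under discussion.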
Although there is still room for possible improvements. For instance, when
Sorry, I'm trying to update my server's dmd and checking whether every optimization made its way in. What I have is here: https://github.com/etcimon/druntime/blob/2.070-custom/src/rt/arraycast.d#L26 It looks like that modulo maybe wasn't even necessary?
It's not logically necessary, but D specifies it as a runtime error if the length in bytes isn't the same after the cast. Perhaps the check could be replaced with:
    version (D_NoBoundsChecks) {}
    else version (D_BetterC) { assert(nbytes % tsize == 0, "array cast misalignment"); }
    else if (nbytes % tsize != 0) { throw new Error("array cast misalignment"); }
Although the
Also most sensible sizes (like 99% in arrays at least!) are pow2. I’m certain you can make a fast bypass for that and gain a lot.
Most common casts I’ve seen are:
uint[] <-> ubyte[]
size_t[] <-> ubyte[]
Plus some structs that are basically single integer or pairs. All of these are power of 2, you don’t need modulo for them.
What then? It seems to me like it shouldn't be that big of a problem to discard the last bytes after an array cast, considering that most of the time the size is kept in another allocation reference and it wouldn't cause a memory leak.
src/rt/arraycast.d
Outdated
    {
        throw new Error("array cast misalignment");
    }

When you squash, please remove the extra blank line.
And the CI fails due to the extra whitespace here.
A small improvement but no reason not to accept it.
…s said and done and performance becomes critical (When no bounds checks are asked for)
I remember this being a huge bottleneck in crypto/math algorithms where ubyte[]<->int is done frequently.
In that case maybe it would make sense to add a second function optimized for the common case where division can be performed with a right shift.
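A minimal sketch of what such a fast path could look like, assuming a hypothetical helper `castLength` (not an existing druntime function) that only computes the new element count. When the target element size is a power of two, the remainder check becomes a mask and the division becomes a right shift. All other sizes fall back to the general modulo and division:

```d
// Hypothetical sketch; `castLength` is not a druntime function.
// Returns the element count after casting `length` elements of size
// `fsize` to elements of size `tsize`.
size_t castLength(size_t length, size_t fsize, size_t tsize)
{
    import core.bitop : bsf;

    assert(tsize != 0);
    immutable nbytes = length * fsize;

    if ((tsize & (tsize - 1)) == 0)
    {
        // Power-of-two target size: remainder is a mask, division is a shift.
        if ((nbytes & (tsize - 1)) != 0)
            throw new Error("array cast misalignment");
        return nbytes >> bsf(tsize);
    }

    // General case: fall back to modulo and division.
    if (nbytes % tsize != 0)
        throw new Error("array cast misalignment");
    return nbytes / tsize;
}
```

For example, `castLength(16, 1, 4)` takes the shift path and `castLength(6, 2, 3)` takes the general path. Whether the extra branch pays off in practice would need to be measured; this sketch makes no claim about actual speedups.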
On x86 and x86_64 removing the remainder check may not avoid as much work as you hope. With optimization enabled both DMD and LDC are smart enough to use a single DIV instruction to simultaneously calculate
A problem is that calls to
https://run.dlang.io/is/H2zTKs
    import std.stdio;
    int main()
    {
        align(8)
        ubyte[16] a;
        ubyte[] b = a[];
        uint[] c = cast(uint[]) b;
        return c[0] + c[$ - 1];
    }
That's definitely an issue that will need to be addressed. I know the ~40 cycles on the modulo were quite intimidating (vs. 1 cycle for most other math), and removing it makes things perform better in most algorithms where you need to convert some ubyte[] to int[] for the encryption/decryption/hashing operations.
I've been working on it and basically have the fundamentals working, but I've run into a problem: Currently runtime hooks that are neither For example,
I think if we can resolve that in some way, we'll have an implementation that can be inlined and better optimized. I'm awaiting a response from @WalterBright and @andralex about how to proceed.
It definitely sounds like there should be a
Yes, that can be done.
Catching
Undefined behavior quickly becomes undocumented feature |
As mentioned in the other PR, I don't think so, because it would get removed in
If it's an immutable object it is. This compiles and runs [1]:
    immutable error = new Error("foo");
    void foo() nothrow pure @nogc @safe
    {
        throw error;
    }
    void main()
    {
        foo();
    }
Although the line number will be wrong for the actual exception. But it's correct in the backtrace, so it might be ok anyway.
#2264 has been submitted to enable more opportunity for optimizing array casts.
Closing this because it is being made obsolete by #2264 and dlang/dmd#8531.
Optimization because, in most cases, the new array already has the right length, and a modulo is quite expensive...