From: "Martin Storsjö"
Subject: Re: [PATCH] msvcrt: Avoid disallowed unaligned writes in memset on ARM
Message-Id:
Date: Thu, 16 Sep 2021 13:01:52 +0300 (EEST)
In-Reply-To:
References: <20210915202745.3661089-1-martin@martin.st> <00876200-ab90-e23e-86e5-df022b6b0199@gmail.com>

On Thu, 16 Sep 2021, Martin Storsjö wrote:

> On Thu, 16 Sep 2021, Piotr Caban wrote:
>
>> Hi Martin,
>>
>> On 9/15/21 10:27 PM, Martin Storsjo wrote:
>>> ARM can do 64 bit writes with the STRD instruction, but that
>>> instruction requires a 32 bit aligned address - while these stores
>>> are unaligned.
>>>
>>> Two consecutive stores to uint32_t* pointers can also be fused
>>> into one single STRD, as a uint32_t* is supposed to be properly
>>> aligned - therefore, do these stores as stores to volatile uint32_t*
>>> to avoid fusing them.
>> How about letting the compiler know that the pointers are unaligned
>> instead? Is attached patch working for you?
>
> Thanks, that's even better!
>
> This way the compiler has more freedom to reason about it and can choose to
> use another instruction with less alignment requirements (both GCC and Clang
> seem to compile it to use a 16 byte VST, an unaligned SIMD store instead)
> which probably is much better than forcing the compiler to do a long sequence
> of 32 bit stores.
>
> Clang doesn't seem to know/exploit that the regular 32 bit store instructions
> work unaligned though, so the smaller stores get exploded into a long series
> of single byte writes. But I guess that's just a missed optimization
> opportunity in Clang, I'll see if I can report it.

FWIW this seems to be a target specific issue; Clang does optimize it
correctly for an armv7-linux-gnueabihf target, but not for armv7-windows.
I'll see about getting that fixed.

// Martin
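
For reference, a minimal sketch of what "letting the compiler know that the
pointers are unaligned" can look like with GCC and Clang. The attached patch
is not reproduced in this message, so this is not necessarily what it does;
the typedef and function names below are hypothetical and chosen only for
illustration. The sketch assumes the GCC/Clang `aligned` and `may_alias` type
attributes, and shows a portable memcpy based store next to it for comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical typedef: a 32 bit integer type whose required alignment is
 * lowered to 1 byte (GCC/Clang extension). may_alias lets it be used to
 * write into buffers of other types without violating strict aliasing. */
typedef uint32_t __attribute__((aligned(1), may_alias)) unaligned_u32;

/* Store through the under-aligned type: the compiler may no longer assume
 * 4 byte alignment, so it must pick instructions that tolerate any address. */
static void store_u32_unaligned(void *dst, uint32_t value)
{
    assert(dst != NULL);
    *(unaligned_u32 *)dst = value;
}

/* Portable alternative: memcpy carries no alignment assumption either, and
 * compilers typically lower a fixed 4 byte copy to a single store when the
 * target allows unaligned access. */
static void store_u32_memcpy(void *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof(value));
}
```

Either function can be called with an odd address, for example
store_u32_unaligned(buf + 1, v) on a byte buffer. Which instructions the
compiler picks for the store still depends on the target, as noted above for
armv7-windows.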