From: "Rémi Bernon" <rbernon@codeweavers.com>
Subject: Re: [PATCH vkd3d 1/2] vkd3d-shader: Optimize get_opcode_info with direct opcode_table access.
Message-Id: <95a6f7bc-cce7-b86a-8516-d47e49308154@codeweavers.com>
Date: Fri, 4 Oct 2019 10:05:46 +0200
In-Reply-To: <CAOsNvwyPA7HnNBdXn=kVOjdHV5dSdzFpfepupLDUi-ssL-azDQ@mail.gmail.com>
References: <20191003170933.12734-1-rbernon@codeweavers.com> <CAOsNvwyPA7HnNBdXn=kVOjdHV5dSdzFpfepupLDUi-ssL-azDQ@mail.gmail.com>

On 10/3/19 9:05 PM, Henri Verbeet wrote:
> On Thu, 3 Oct 2019 at 20:42, Rémi Bernon <rbernon@codeweavers.com> wrote:
>> The shader_sm4_read_instruction function shows up in perf report when
>> running SOTTR on Intel because of this loop.
>>
> That seems like a questionable claim. Does this actually improve
> things? Do you have numbers? Direct3D 12 applications should ideally
> not be creating pipeline states at all during rendering, but if they
> do, actual shader compilation is going to be much more expensive than
> anything we do here.
> 
> That's not to say this can't be improved though.
> 

Yes, I did the measurements: perf (with default settings) reports the
function's self overhead dropping from ~2.5% to 0.6% with this patch. For
the second patch, the other function went from 1.7% self overhead to not
being reported at all, because it now gets inlined somewhere, but the sum
of the overhead of all the vkd3d_spirv functions is lower.
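
To illustrate the kind of change the first patch makes, here is a
simplified sketch of replacing a linear search over an opcode table with
a direct, opcode-indexed lookup. All names, opcodes, and table contents
below are made up for the example and are not the actual vkd3d code:

```c
#include <stddef.h>

/* Hypothetical opcodes; the gap at value 2 is deliberate. */
enum opcode { OP_ADD = 0, OP_MUL = 1, OP_MOV = 3, OP_COUNT = 4 };

struct opcode_info
{
    enum opcode opcode;
    const char *name;
};

/* Densely packed table: finding an entry requires a loop. */
static const struct opcode_info opcode_table[] =
{
    {OP_ADD, "add"},
    {OP_MUL, "mul"},
    {OP_MOV, "mov"},
};

static const struct opcode_info *get_opcode_info_linear(unsigned int opcode)
{
    size_t i;

    for (i = 0; i < sizeof(opcode_table) / sizeof(*opcode_table); ++i)
    {
        if ((unsigned int)opcode_table[i].opcode == opcode)
            return &opcode_table[i];
    }
    return NULL;
}

/* Table indexed by opcode value: unused slots are zero-initialised,
 * so a NULL name marks an invalid opcode. */
static const struct opcode_info opcode_table_direct[OP_COUNT] =
{
    [OP_ADD] = {OP_ADD, "add"},
    [OP_MUL] = {OP_MUL, "mul"},
    [OP_MOV] = {OP_MOV, "mov"},
};

static const struct opcode_info *get_opcode_info_direct(unsigned int opcode)
{
    if (opcode >= OP_COUNT || !opcode_table_direct[opcode].name)
        return NULL;
    return &opcode_table_direct[opcode];
}
```

The direct version trades a little memory for the unused slots against a
constant-time lookup with a single bounds check.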

My interpretation is that the shader compilation that happens at startup
is CPU bound, and it is noticeable. I didn't let the game run for very
long after that, but it is highly GPU bound afterwards, so nothing in
particular shows up in perf.
-- 
Rémi Bernon <rbernon@codeweavers.com>