From: "Zebediah Figura (she/her)" <zfigura@codeweavers.com>
Subject: Re: [PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().
Message-Id: <e476bbf5-cd24-2677-817f-6de3a74ec9c1@codeweavers.com>
Date: Fri, 11 Jun 2021 16:00:41 -0500
In-Reply-To: <CABvNrtMxGqxE=C-S4ytGuBNPyzbub1kWxFsf8nngc8+j4FSNCA@mail.gmail.com>
References: <20210528052127.1589918-1-z.figura12@gmail.com> <CABvNrtML8201Fm6+mHqVdGUuMq_8qMGMRt-UutwKBFprits0fg@mail.gmail.com> <CAOsNvwyGqWAcGX2026Wq6R2wAuyDwr1pLeQqczX=hA48Nngo0w@mail.gmail.com> <7d0b188b-803a-0fc9-ea73-7803011fb9d5@codeweavers.com> <CAOsNvwxJNr62Lz0nwaNdx7zPtFcWas02U_8+FVaCqnqxYdMYvw@mail.gmail.com> <e6cc951f-a8eb-2d17-39ee-6d997a3796dd@codeweavers.com> <CABvNrtMxGqxE=C-S4ytGuBNPyzbub1kWxFsf8nngc8+j4FSNCA@mail.gmail.com>

On 6/11/21 2:44 AM, Matteo Bruni wrote:
> On Fri, Jun 11, 2021 at 5:40 AM Zebediah Figura (she/her)
> <zfigura@codeweavers.com> wrote:
>>
>> On 5/31/21 6:43 PM, Henri Verbeet wrote:
>>> On Mon, 31 May 2021 at 23:45, Zebediah Figura (she/her)
>>> <zfigura@codeweavers.com> wrote:
>>>> On 5/31/21 4:53 AM, Henri Verbeet wrote:
>>>>> My (very rough) proposal for this would be to replace the "data" field
>>>>> in the wined3d_cs_update_sub_resource structure with a
>>>>> wined3d_bo_address, and then introduce something like the following:
>>>>>
>>>>>        static bool wined3d_device_context_get_upload_bo(struct
>>>>> wined3d_device_context *context,
>>>>>                struct wined3d_resource *resource, unsigned int
>>>>> sub_resource_idx, const struct wined3d_box *box,
>>>>>                uint32_t flags, struct wined3d_bo_address *address, void **map_ptr)
>>>>>        {
>>>>>            ...
>>>>>        }
>>>>
>>>> This function strikes me as more than a little confusing, party because
>>>> it seems to be returning two partially overlapping things, with mostly
>>>> different usages...
>>>>
>>>> i.e. I'd presume the idea is for
>>>> wined3d_device_context_update_sub_resource() to retrieve this and then
>>>> call something like wined3d_context_copy_bo_address() to copy from
>>>> sysmem, which makes sense, but then what are we doing with map_ptr?
>>>>
>>> Well, wined3d_context_copy_bo_address() needs a wined3d_context, which
>>> you typically don't have on application threads.
>>>
>>> The basic idea is that you'd pass "address" to
>>> WINED3D_CS_OP_UPDATE_SUB_RESOURCE, and "map_ptr " to the application.
>>> (In case of map/unmap; for update_sub_resource() you'd just copy the
>>> data into "map_ptr".)
>>>
>>>> Or, if we use this in wined3d_device_context_map(), couldn't we just
>>>> call wined3d_context_map_bo_address() on the returned wined3d_bo_address?
>>>>
>>> Conceptually, sure, you could split this into allocate/map/unmap. In
>>> practice though, you'll always want the upload buffer to be mapped, in
>>> order to copy data into it from system memory. And in fact, if we're
>>> going to return a GPU buffer here, it should generally already be
>>> mapped. (For various reasons; for 64-bit applications, we use
>>> persistent maps if we can; for OpenGL, we want to avoid making GL
>>> calls from application threads; if we have a dedicated upload buffer
>>> belonging to the device context, it would have been mapped during
>>> creation.) When needed, unmap can happen when the upload buffer is
>>> consumed.
>>
>> Thanks for explaining the rationale there, that does help a lot.
>>
>> It still feels awkward to me to pass the same information maybe twice,
>> but I guess the wined3d_bo_address part should basically be considered
>> opaque above the CS thread.
> 
> Yes, as I understand it that's to be used by wined3d to keep track of
> the lifetime of the memory block.
> 
>>>>> For deferred contexts you should then be able to do something
>>>>> (conceptually) very similar to patch 2/2, and in the end we may not
>>>>> even need map/unmap/update_sub_resource in the
>>>>> wined3d_device_context_ops structure.
>>>>>
>>>> We'd still presumably need an unmap, though, unless the idea above is to
>>>> *always* allocate new (GPU or CPU) memory, which seems like a much
>>>> bigger step.
>>>>
>>> Not necessarily, but wined3d_cs_exec_update_sub_resource() can unmap if needed.
>>>
>>>> Unless I'm misreading things, this also seems to imply that
>>>> *_resource_sub_resource_map() shouldn't exist anymore, right?
>>>>
>>> Non-discard maps still exist, and
>>> wined3d_device_context_get_upload_bo() wouldn't be used for those.
>>>
>>
>> So the idea goes like this:
>>
>> (1) Pass a whole wined3d_bo_address as part of struct
>> wined3d_cs_update_sub_resource.
>>
>> (2) Introduce wined3d_device_context_get_upload_bo() to allocate memory.
>> At first, only use it for deferred contexts.
>>
>> (3) Use wined3d_device_context_get_upload_bo() for immediate contexts as
>> well, for both DISCARD and NOOVERWRITE maps and for update_sub_resource.
>> This implies always allocating new CPU/GPU memory, going through the
>> DEFAULT queue, and not waiting for the upload to complete.
> 
> wined3d_device_context_get_upload_bo() should probably avoid the
> queues entirely, at least in the normal case. More below.

Right, sorry, I don't mean that wined3d_device_context_get_upload_bo() 
goes through the queues, but that the WINED3D_CS_UPDATE_SUB_RESOURCE 
goes through the DEFAULT queue instead of the MAP queue once it's submitted.

>> The idea that map/unmap/update_sub_resource can go away seems to imply
>> (3) to me, but I'm more than a little curious as to whether it's
>> actually desirable in all cases. Even with persistent mapping you still
>> need to allocate new memory, which isn't exactly cheap, especially with
>> large resources.
> 
> You don't need to actually allocate new memory every time, just return
> memory that you know is available at the moment. The specific
> mechanism used by wined3d_device_get_upload_bo() isn't important to
> the callers but yes, the API we want would in fact return a new,
> independent memory area for each DISCARD map. It's the only way to
> make things fast with constant buffer updates, to name the most
> critical example. For some of those you'll easily have thousands of
> DISCARD maps per frame and if you don't want to block on buffer maps
> and e.g. you have one frame latency to play with, that means having
> thousands of "instances" of each buffer moving around at any given
> time.
> WRT the mechanism, it might be allocating a large chunk of system
> memory and suballocating from it, keeping track of the lifetime of
> each chunk as it is returned to the application and then back to
> wined3d. Or suballocating from GPU memory. It could even be creating a
> ton of GL buffers and flipping them around as needed, it doesn't
> matter; the idea is that each memory chunk can be reused, once we know
> the memory is available again because it's not used by wined3d (e.g.
> if it's a block of system memory that's been uploaded to the GPU) or
> the GPU (e.g. if we directly returned the memory where the GPU
> resource is stored) anymore. Ideally, when the application is in a
> steady state, there should be next to no actual memory allocation or
> GL / Vulkan resource creation. But, in principle, everything would
> work just as well (if not as fast) if we allocated system memory every
> time wined3d_device_context_get_upload_bo() is called and freed it
> once wined3d is done with uploading the data that was stored into it
> by the application.
> 
> FWIW, the DYNAMIC d3d usage is supposed to flag the resources that are
> going to be updated frequently and are performance sensitive. In
> principle we could reserve wined3d_device_context_get_upload_bo() to
> DYNAMIC resources only.
> 
>> And then there's the aforementioned non-discard maps—don't we still need
>> an explicit WINED3D_CS_OP_UNMAP for those? Or any case (are there any?)
>> where we don't want to persistently map a resource?
> 
> We might want to keep the existing CS operations for these cases,
> sure. I suspect it will become clearer further down the road.
> 

Okay, many thanks to you and Henri for those answers; I think I have a 
clear enough picture now.