From: "Zebediah Figura (she/her)" Subject: Re: [PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource(). Message-Id: <7d0b188b-803a-0fc9-ea73-7803011fb9d5@codeweavers.com> Date: Mon, 31 May 2021 16:45:39 -0500 In-Reply-To: References: <20210528052127.1589918-1-z.figura12@gmail.com> Replying to both mails at once, sort of... On 5/31/21 4:53 AM, Henri Verbeet wrote: > Yes, mostly what Matteo said. > > On Fri, 28 May 2021 at 15:39, Matteo Bruni wrote: >> So this is, AFAIR, more or less what the staging patchset did. I don't >> think that we want to put the data in the command stream though. >> > It may not be too bad for deferred contexts, but for immediate > contexts probably not, no. > >> Always copying the data from the user-provided source to some other >> storage is definitely the right choice: we don't want to block the >> application (more or less directly) to make sure that the data stays >> around until we need it. That's bad for performance for immediate >> contexts and just impossible for deferred, so not much to argue there. >> At the same time we also want to minimize the amount of copies. >> Ideally, for a GPU resource, here we'd upload the data right to the >> GPU (e.g. inside a persistently mapped buffer), ready to be used >> whenever it's time. To clarify, when talking about uploading directly to the GPU, are you just referring to map() here, or update_sub_resource() as well? Does it even make sense to use a persistently mapped GPU buffer for a resource mapped/updated by a deferred context? Obligatory disclaimer that I don't know anything about GPU-side performance, but obviously you can't reuse an earlier mapping for UpdateSubResource() or for DISCARD maps, and I'm failing to understand why an application would ever use NOOVERWRITE on a deferred context. >> For a CPU resource instead we'd copy the data into >> its final place in sysmem (what will eventually become its >> "heap_memory"). We don't quite have the infrastructure for it at the >> moment but that would be the goal IMO. >> What you do in patch 2/2 is more along the lines of this idea. The >> sysmem allocation mechanism works okay for the time being and could be >> extended to handle other types of memory, e.g. it could eventually >> take charge of the vulkan allocator, making it generic and usable from >> GL too. I don't think it has to be managed by each deferred context >> (i.e. it should probably be per-device), although the current scheme >> has the advantage of not having to care about concurrency. In a sense there is no "current scheme", i.e. deferred contexts are tracking which memory belongs to which resource because that *has* to be done per-context, but nobody's managing custom heaps. Unless I misunderstand what's meant here? >> So, at least in my view, I'd bring the allocator bits from patch 2/2 >> into patch 1/2 and use that for update_sub_resource, both the deferred >> and the immediate variants (the latter could be its own patch). You'd >> only need the DISCARD portion for update_sub_resource, so the rest can >> (and probably should) stay in the map() patch. >> > My (very rough) proposal for this would be to replace the "data" field > in the wined3d_cs_update_sub_resource structure with a > wined3d_bo_address, and then introduce something like the following: > > static bool wined3d_device_context_get_upload_bo(struct > wined3d_device_context *context, > struct wined3d_resource *resource, unsigned int > sub_resource_idx, const struct wined3d_box *box, > uint32_t flags, struct wined3d_bo_address *address, void **map_ptr) > { > ... > } This function strikes me as more than a little confusing, party because it seems to be returning two partially overlapping things, with mostly different usages... i.e. I'd presume the idea is for wined3d_device_context_update_sub_resource() to retrieve this and then call something like wined3d_context_copy_bo_address() to copy from sysmem, which makes sense, but then what are we doing with map_ptr? Or, if we use this in wined3d_device_context_map(), couldn't we just call wined3d_context_map_bo_address() on the returned wined3d_bo_address? > > For deferred contexts you should then be able to do something > (conceptually) very similar to patch 2/2, and in the end we may not > even need map/unmap/update_sub_resource in the > wined3d_device_context_ops structure. > We'd still presumably need an unmap, though, unless the idea above is to *always* allocate new (GPU or CPU) memory, which seems like a much bigger step. Unless I'm misreading things, this also seems to imply that *_resource_sub_resource_map() shouldn't exist anymore, right?