From: "Zebediah Figura (she/her)" Subject: Re: [PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource(). Message-Id: Date: Fri, 11 Jun 2021 16:00:41 -0500 In-Reply-To: References: <20210528052127.1589918-1-z.figura12@gmail.com> <7d0b188b-803a-0fc9-ea73-7803011fb9d5@codeweavers.com> On 6/11/21 2:44 AM, Matteo Bruni wrote: > On Fri, Jun 11, 2021 at 5:40 AM Zebediah Figura (she/her) > wrote: >> >> On 5/31/21 6:43 PM, Henri Verbeet wrote: >>> On Mon, 31 May 2021 at 23:45, Zebediah Figura (she/her) >>> wrote: >>>> On 5/31/21 4:53 AM, Henri Verbeet wrote: >>>>> My (very rough) proposal for this would be to replace the "data" field >>>>> in the wined3d_cs_update_sub_resource structure with a >>>>> wined3d_bo_address, and then introduce something like the following: >>>>> >>>>> static bool wined3d_device_context_get_upload_bo(struct >>>>> wined3d_device_context *context, >>>>> struct wined3d_resource *resource, unsigned int >>>>> sub_resource_idx, const struct wined3d_box *box, >>>>> uint32_t flags, struct wined3d_bo_address *address, void **map_ptr) >>>>> { >>>>> ... >>>>> } >>>> >>>> This function strikes me as more than a little confusing, party because >>>> it seems to be returning two partially overlapping things, with mostly >>>> different usages... >>>> >>>> i.e. I'd presume the idea is for >>>> wined3d_device_context_update_sub_resource() to retrieve this and then >>>> call something like wined3d_context_copy_bo_address() to copy from >>>> sysmem, which makes sense, but then what are we doing with map_ptr? >>>> >>> Well, wined3d_context_copy_bo_address() needs a wined3d_context, which >>> you typically don't have on application threads. >>> >>> The basic idea is that you'd pass "address" to >>> WINED3D_CS_OP_UPDATE_SUB_RESOURCE, and "map_ptr " to the application. >>> (In case of map/unmap; for update_sub_resource() you'd just copy the >>> data into "map_ptr".) >>> >>>> Or, if we use this in wined3d_device_context_map(), couldn't we just >>>> call wined3d_context_map_bo_address() on the returned wined3d_bo_address? >>>> >>> Conceptually, sure, you could split this into allocate/map/unmap. In >>> practice though, you'll always want the upload buffer to be mapped, in >>> order to copy data into it from system memory. And in fact, if we're >>> going to return a GPU buffer here, it should generally already be >>> mapped. (For various reasons; for 64-bit applications, we use >>> persistent maps if we can; for OpenGL, we want to avoid making GL >>> calls from application threads; if we have a dedicated upload buffer >>> belonging to the device context, it would have been mapped during >>> creation.) When needed, unmap can happen when the upload buffer is >>> consumed. >> >> Thanks for explaining the rationale there, that does help a lot. >> >> It still feels awkward to me to pass the same information maybe twice, >> but I guess the wined3d_bo_address part should basically be considered >> opaque above the CS thread. > > Yes, as I understand it that's to be used by wined3d to keep track of > the lifetime of the memory block. > >>>>> For deferred contexts you should then be able to do something >>>>> (conceptually) very similar to patch 2/2, and in the end we may not >>>>> even need map/unmap/update_sub_resource in the >>>>> wined3d_device_context_ops structure. >>>>> >>>> We'd still presumably need an unmap, though, unless the idea above is to >>>> *always* allocate new (GPU or CPU) memory, which seems like a much >>>> bigger step. >>>> >>> Not necessarily, but wined3d_cs_exec_update_sub_resource() can unmap if needed. >>> >>>> Unless I'm misreading things, this also seems to imply that >>>> *_resource_sub_resource_map() shouldn't exist anymore, right? >>>> >>> Non-discard maps still exist, and >>> wined3d_device_context_get_upload_bo() wouldn't be used for those. >>> >> >> So the idea goes like this: >> >> (1) Pass a whole wined3d_bo_address as part of struct >> wined3d_cs_update_sub_resource. >> >> (2) Introduce wined3d_device_context_get_upload_bo() to allocate memory. >> At first, only use it for deferred contexts. >> >> (3) Use wined3d_device_context_get_upload_bo() for immediate contexts as >> well, for both DISCARD and NOOVERWRITE maps and for update_sub_resource. >> This implies always allocating new CPU/GPU memory, going through the >> DEFAULT queue, and not waiting for the upload to complete. > > wined3d_device_context_get_upload_bo() should probably avoid the > queues entirely, at least in the normal case. More below. Right, sorry, I don't mean that wined3d_device_context_get_upload_bo() goes through the queues, but that the WINED3D_CS_UPDATE_SUB_RESOURCE goes through the DEFAULT queue instead of the MAP queue once it's submitted. >> The idea that map/unmap/update_sub_resource can go away seems to imply >> (3) to me, but I'm more than a little curious as to whether it's >> actually desirable in all cases. Even with persistent mapping you still >> need to allocate new memory, which isn't exactly cheap, especially with >> large resources. > > You don't need to actually allocate new memory every time, just return > memory that you know is available at the moment. The specific > mechanism used by wined3d_device_get_upload_bo() isn't important to > the callers but yes, the API we want would in fact return a new, > independent memory area for each DISCARD map. It's the only way to > make things fast with constant buffer updates, to name the most > critical example. For some of those you'll easily have thousands of > DISCARD maps per frame and if you don't want to block on buffer maps > and e.g. you have one frame latency to play with, that means having > thousands of "instances" of each buffer moving around at any given > time. > WRT the mechanism, it might be allocating a large chunk of system > memory and suballocating from it, keeping track of the lifetime of > each chunk as it is returned to the application and then back to > wined3d. Or suballocating from GPU memory. It could even be creating a > ton of GL buffers and flipping them around as needed, it doesn't > matter; the idea is that each memory chunk can be reused, once we know > the memory is available again because it's not used by wined3d (e.g. > if it's a block of system memory that's been uploaded to the GPU) or > the GPU (e.g. if we directly returned the memory where the GPU > resource is stored) anymore. Ideally, when the application is in a > steady state, there should be next to no actual memory allocation or > GL / Vulkan resource creation. But, in principle, everything would > work just as well (if not as fast) if we allocated system memory every > time wined3d_device_context_get_upload_bo() is called and freed it > once wined3d is done with uploading the data that was stored into it > by the application. > > FWIW, the DYNAMIC d3d usage is supposed to flag the resources that are > going to be updated frequently and are performance sensitive. In > principle we could reserve wined3d_device_context_get_upload_bo() to > DYNAMIC resources only. > >> And then there's the aforementioned non-discard maps—don't we still need >> an explicit WINED3D_CS_OP_UNMAP for those? Or any case (are there any?) >> where we don't want to persistently map a resource? > > We might want to keep the existing CS operations for these cases, > sure. I suspect it will become clearer further down the road. > Okay, many thanks to you and Henri for those answers; I think I have a clear enough picture now.