From: "Zebediah Figura (she/her)" Subject: Re: [PATCH] winegstreamer: Fixup raw audio caps to be compatible with IMFMediaType. Message-Id: <80b59151-654a-cf50-f687-819de4649e93@codeweavers.com> Date: Sun, 15 Nov 2020 20:06:22 -0600 In-Reply-To: References: <20201106181252.254864-1-dlesho@codeweavers.com> <55bd7125-2708-653f-63b2-1c7439a33aae@gmail.com> <2645ee43-6b7b-6b26-210f-9256f3b8314e@gmail.com> <5f9ea776-6f79-58be-daf4-b1af4d919cd7@codeweavers.com> <426d0875-0f29-2677-0f60-e81eafa2d8ef@gmail.com> On 11/6/20 4:32 PM, Derek Lesho wrote: > On 11/6/20 3:55 PM, Zebediah Figura wrote: > >> On 11/6/20 3:29 PM, Derek Lesho wrote: >>> On 11/6/20 3:10 PM, Zebediah Figura wrote: >>>> On 11/6/20 2:51 PM, Derek Lesho wrote: >>>>> On 11/6/20 2:32 PM, Zebediah Figura wrote: >>>>>> On 11/6/20 2:20 PM, Derek Lesho wrote: >>>>>>> On 11/6/20 2:13 PM, Zebediah Figura wrote: >>>>>>> >>>>>>>> This is done in a rather inconsistent way relative to how video >>>>>>>> streams >>>>>>>> are handled. >>>>>>> Yes, because the goals are different for each of the paths.  The >>>>>>> video >>>>>>> path is just an enhancement to report video formats in a defined >>>>>>> order >>>>>>> as if they were coming from a decoder, since right now we're >>>>>>> skipping >>>>>>> the decoder MFT step.  The step for fixing up the audio caps is >>>>>>> meant to >>>>>>> be a generic solution for any caps which are un-representable as a >>>>>>> IMFMediaType object.  This same path is used for compressed h.264 >>>>>>> video >>>>>>> on my local branch for example. >>>>>>> >>>>>> In both cases you're doing conversion from a type which may not be >>>>>> representable into a type which is. >>>>> No, in the case of the uncompressed video streams, the type is almost >>>>> definitely re-presentable.  Think of it as a necessary hack for the >>>>> bypassing of the decoder MFT we are doing.  On the other hand, >>>>> there are >>>>> plenty of cases where uncompressed audio may be read from a container, >>>>> and the fixup path would still be necessary in those cases. >>>> You can't assume that the type is representable. >>> After applying videoconvert to make it output the most common decoder >>> output types, I think I can. >>>>    In fact, you should >>>> make no assumptions about the type whatsoever. This is not only true in >>>> theory, but in practice—I've seen decoders output GST_VIDEO_FORMAT_RGB. >>> Even if we were outputting the video type decodebin directly feeds us >>> and it happened to be RGB, it wouldn't matter, as media foundation >>> supports RGB types: >>> https://docs.microsoft.com/en-us/windows/win32/medfound/video-subtype-guids#uncompressed-rgb-formats >>> >> Media Foundation supports *BGR* types, to use GStreamer's terminology. >> It does not support GST_VIDEO_FORMAT_RGB, which is identical to e.g. >> WINED3DFMT_B8G8R8_UNORM with swapped R and B channels. > Ah okay, well then if we every run up against that, we can add a > conversion to make_mf_compatible_caps. The thing is that you basically don't need to. That path is already covered by the existing code, almost; the only thing you need to do is not report the "passthrough" caps if they can't be expressed in terms of mfplat caps. Extending make_mf_compatible_caps() in this way (for video) would result in doing the same thing in two different places, and quite likely reporting the same type twice for no good reason. (Of course one can be conscious of this while programming, but why add another thing to be conscious of without need?) >> >>>>>>     The reasons for doing this >>>>>> conversion may be different, but there is no reason for the mechanism >>>>>> to be. >>>>> I would say there is, the conversion we're doing for the video streams >>>>> is unconditional, entirely specific to the media source output, and >>>>> doesn't output 1 fixed up caps structure per input caps structure.  On >>>>> the other hand, the audio and compressed type format would be >>>>> necessary >>>>> any time we want to feed gstreamer buffers with those caps to a media >>>>> foundation component, and is a 1 to 1 conversion in every case. >>>> I don't see any of those as reasons for the code structure to be >>>> different. >>> The make_mf_compatible_caps code is generic enough that we very likely >>> may use it in other areas in the future, while the uncompressed video >>> format conversion is a specific hack of the media-container media >>> source.  I gave potential examples of this in my previous email. >> You may need to be more specific, then, because I don't see how any of >> those examples would use the function in a new way, > The main alternate use-case would be any alternate media source we end > up adding, although I suppose this could also apply to a MFT as well if > we wanted to use it there.  In *any* possible implementation of > IMFMediaSource we provide, we'll need raw-audio output to be > representable by IMFMediaType.  Only in the current media source, whose > job it is to take container files, and (at-least in the currently > upstream version) output uncompressed streams, will we want to generate > multiple media types from one. >>   or not have an >> analogous situation with video. > I'm not saying that any situation doesn't have an analogous situation > with video. >> >>> The >>> semantics for a function which aligns one gstreamer caps structure with >>> another caps structure compatible with IMFMediaType, don't fit the >>> semantics of a function outputting multiple caps to provide >>> compatibility for hardcoded expectations of applications using the >>> source reader. >>>>    Note thought that even if they were, you probably want to >>>> make it possible to convert even from some representable formats. Not >>>> all systems can play back 64-bit float PCM, for example. >>> That code doesn't belong in winegstreamer.  That problem is solved by >>> the streaming audio renderer not supporting an IMFMediaType with 64-bit >>> float PCM, and the topology loader resolving that with the audio >>> conversion MFT.  This kind of problem is already solved in my complete >>> media foundation branch present on staging. >> I don't see any reason that the code shouldn't belong in winegstreamer. >> In fact, it's better to put it there, not only because then we don't >> have to write such a transform, > That's not a problem, I've already written such a transform, it's mostly > boilerplate anyway. There's more to the burden of having written code than actually writing out its first iteration. >>   but also because transforming entirely >> within the GStreamer pipeline will be more efficient in several ways. > I'm not sure we've seen any evidence of that.  Maybe provide > benchmarks?  Either way, due to some details with the topology loader, > we can't just rely on the media source providing every possible output > type under the sun and hooking it up.  Remember, on windows, the media > source usually only outputs one type, which is directly extracted from > the file.  Due to this, Microsoft has an attribute used during topology > resolution, MF_TOPOLOGY_ENUMERATE_SOURCE_TYPES, which UE4 does use.  > Because of this, any conversion to the desired output type must happen > through MFTs. >> >>>>> Examples of other areas where this would be necessary, off the top >>>>> of my >>>>> head, would be a separate uncompressed-audio-emitting media source >>>>> from, >>>>> say, a microphone, or any MFT which outputs compressed video or raw >>>>> audio, such an encoder MFT. >>>>>> Moreover, the goals are not entirely orthogonal; not all video >>>>>> will be >>>>>> output in the four types you have listed. >>>>> All video streams that take the videoconvert enumeration path >>>>> (uncompressed video) won't need any transformation to align with an >>>>> IMFMediaType object.  The only potential incompatibility would be the >>>>> layout, but that problem would never surface with the current media >>>>> source we are pretending that our output types have gone through a >>>>> stand >>>>> media foundation decoder.  An instance where we would want to put this >>>>> type of code in the make_mf_compatible_caps path would be a media >>>>> source >>>>> that provides uncompressed video on windows, such as from a webcam or >>>>> screen capture.  In the code for this media source, we'd >>>>> unconditionally >>>>> put the video caps through the make_mf_compatible_caps path, and add >>>>> code there to replace any unsupported layout with its closest >>>>> equivalent >>>>> defined in media foundation. >>>> "won't need any transformation" is only true because you're *already* >>>> applying a transformation. >>> Yes, a transformation unrelated to resolving incompatibilities between >>> GstCaps and IMFMediaType. >>>>    This code path *is* the transformation. If >>>> you did a similar thing with the audio stream, it "won't need any >>>> transformation" either. >>> Yes but I'm not aware of any requirements applications have on the audio >>> stream outputs of decoders yet, so we don't have to.  This is very >>> fortunate, as if that were the case, the code could get ugly >>> differentiating between audio streams that are raw because they've been >>> decoded, and audio streams that are raw because they were already raw in >>> the container. >> I don't think you should need any such code. >> >> The fundamental point I'm trying to get across is that even if *a* >> *reason* for doing a transformation differs, the transformation itself >> is very similar, > I wouldn't say that.  The make_mf_compatible_caps transformation > operates on caps, whereas the "advertise standard decoder types" hack > just modifies which IMFMediaTypes we advertise, and doesn't interact > with the Gstreamer caps at all.  A function which accounted for this > path wouldn't share the same parameters or outputs.  Just to reiterate, > make_mf_compatible_caps takes on GstCaps, and returns another GstCaps, > or NULL.  The standard decoder type advertisement hack takes one > IMFMediaType, and unconditionally outputs an ordered list of 5 clones of > said IMFMediaType, each with the desired MF_MT_SUBTYPE. The code path to advertise multiple YUV types (which, I must again stress, there is no reason to call a hack) could just as easily operate on GStreamer caps, and in fact probably should, if we're going to be able to translate from things like GST_VIDEO_FORMAT_RGB. That may be the linchpin accounting for the perceived rigidity of design here... >>   and there's no point in doing it in two different ways. >> Look at the similarities between the code paths, not their differences. >> You'll find that you don't actually have to account for the differences >> at all in the code structure—only in the comments that explain why >> you're doing what you're doing. > >> > -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEENTo75Twe9rbPART3DZ01igeheEAFAl+x3p4ACgkQDZ01igeh eEBI6ggAmnWOXhAC/70VFkLLGol4Ltl95807R6K2WPh8fUAxMdwPLv+5lVZaZu6L Ab4RtKSQPWXOxtvQnVm4FzuPhgnLuKIAyKM+gPPzTZWfb+r2L2Ybb3EBAaRRMvul jTC6ximjDG4PtO13FQXYF7XHi2N2geB6zgpJDr0TE/gDbwZOycjCrVHdVHz6ZKEB vCCUiefpd0ViwPar/4x1ZB8ZTbXEu/2YmIyTUnWKPCTmkEJWJAvJo+EdyoJQJxoK xMdMNCkc3AYaoj/jbyCIg2KvfP8uNYUtIIURhv/++f25sLLJScc3DF1898hUIR/X BEzfq1hKEq7UMuC/kwHwTDjojClJZA== =JMFv -----END PGP SIGNATURE-----