From: Zebediah Figura <z.figura12@gmail.com>
Subject: Re: [PATCH] winegstreamer: Fixup raw audio caps to be compatible with IMFMediaType.
Message-Id: <426d0875-0f29-2677-0f60-e81eafa2d8ef@gmail.com>
Date: Fri, 6 Nov 2020 15:55:11 -0600
In-Reply-To: <5f9ea776-6f79-58be-daf4-b1af4d919cd7@codeweavers.com>
References: <20201106181252.254864-1-dlesho@codeweavers.com> <b7696131-bb34-782c-76ae-05d7b9f198ba@gmail.com> <ed56d6dd-c8c8-2e95-84bc-0dfcbb08db76@codeweavers.com> <55bd7125-2708-653f-63b2-1c7439a33aae@gmail.com> <a5608fb3-5085-2d28-f995-2045a521c6a9@codeweavers.com> <2645ee43-6b7b-6b26-210f-9256f3b8314e@gmail.com> <5f9ea776-6f79-58be-daf4-b1af4d919cd7@codeweavers.com>

On 11/6/20 3:29 PM, Derek Lesho wrote:
> 
> On 11/6/20 3:10 PM, Zebediah Figura wrote:
>> On 11/6/20 2:51 PM, Derek Lesho wrote:
>>> On 11/6/20 2:32 PM, Zebediah Figura wrote:
>>>> On 11/6/20 2:20 PM, Derek Lesho wrote:
>>>>> On 11/6/20 2:13 PM, Zebediah Figura wrote:
>>>>>
>>>>>> This is done in a rather inconsistent way relative to how video
>>>>>> streams
>>>>>> are handled.
>>>>> Yes, because the goals are different for each of the paths.  The video
>>>>> path is just an enhancement to report video formats in a defined order
>>>>> as if they were coming from a decoder, since right now we're skipping
>>>>> the decoder MFT step.  The step for fixing up the audio caps is
>>>>> meant to
>>>>> be a generic solution for any caps which are un-representable as a
>>>>> IMFMediaType object.  This same path is used for compressed h.264
>>>>> video
>>>>> on my local branch for example.
>>>>>
>>>> In both cases you're doing conversion from a type which may not be
>>>> representable into a type which is.
>>> No, in the case of the uncompressed video streams, the type is almost
>>> definitely re-presentable.  Think of it as a necessary hack for the
>>> bypassing of the decoder MFT we are doing.  On the other hand, there are
>>> plenty of cases where uncompressed audio may be read from a container,
>>> and the fixup path would still be necessary in those cases.
>> You can't assume that the type is representable.
> After applying videoconvert to make it output the most common decoder
> output types, I think I can.
>>   In fact, you should
>> make no assumptions about the type whatsoever. This is not only true in
>> theory, but in practice—I've seen decoders output GST_VIDEO_FORMAT_RGB.
> Even if we were outputting the video type decodebin directly feeds us
> and it happened to be RGB, it wouldn't matter, as media foundation
> supports RGB types:
> https://docs.microsoft.com/en-us/windows/win32/medfound/video-subtype-guids#uncompressed-rgb-formats

Media Foundation supports *BGR* types, to use GStreamer's terminology.
It does not support GST_VIDEO_FORMAT_RGB, which is identical to e.g.
WINED3DFMT_B8G8R8_UNORM with swapped R and B channels.

> 
>>
>>>>    The reasons for doing this
>>>> conversion may be different, but there is no reason for the mechanism
>>>> to be.
>>> I would say there is, the conversion we're doing for the video streams
>>> is unconditional, entirely specific to the media source output, and
>>> doesn't output 1 fixed up caps structure per input caps structure.  On
>>> the other hand, the audio and compressed type format would be necessary
>>> any time we want to feed gstreamer buffers with those caps to a media
>>> foundation component, and is a 1 to 1 conversion in every case.
>> I don't see any of those as reasons for the code structure to be
>> different.
> The make_mf_compatible_caps code is generic enough that we very likely
> may use it in other areas in the future, while the uncompressed video
> format conversion is a specific hack of the media-container media
> source.  I gave potential examples of this in my previous email.

You may need to be more specific, then, because I don't see how any of
those examples would use the function in a new way, or not have an
analogous situation with video.

> The
> semantics for a function which aligns one gstreamer caps structure with
> another caps structure compatible with IMFMediaType, don't fit the
> semantics of a function outputting multiple caps to provide
> compatibility for hardcoded expectations of applications using the
> source reader.
>>   Note thought that even if they were, you probably want to
>> make it possible to convert even from some representable formats. Not
>> all systems can play back 64-bit float PCM, for example.
> That code doesn't belong in winegstreamer.  That problem is solved by
> the streaming audio renderer not supporting an IMFMediaType with 64-bit
> float PCM, and the topology loader resolving that with the audio
> conversion MFT.  This kind of problem is already solved in my complete
> media foundation branch present on staging.

I don't see any reason that the code shouldn't belong in winegstreamer.
In fact, it's better to put it there, not only because then we don't
have to write such a transform, but also because transforming entirely
within the GStreamer pipeline will be more efficient in several ways.

>>
>>> Examples of other areas where this would be necessary, off the top of my
>>> head, would be a separate uncompressed-audio-emitting media source from,
>>> say, a microphone, or any MFT which outputs compressed video or raw
>>> audio, such an encoder MFT.
>>>> Moreover, the goals are not entirely orthogonal; not all video will be
>>>> output in the four types you have listed.
>>> All video streams that take the videoconvert enumeration path
>>> (uncompressed video) won't need any transformation to align with an
>>> IMFMediaType object.  The only potential incompatibility would be the
>>> layout, but that problem would never surface with the current media
>>> source we are pretending that our output types have gone through a stand
>>> media foundation decoder.  An instance where we would want to put this
>>> type of code in the make_mf_compatible_caps path would be a media source
>>> that provides uncompressed video on windows, such as from a webcam or
>>> screen capture.  In the code for this media source, we'd unconditionally
>>> put the video caps through the make_mf_compatible_caps path, and add
>>> code there to replace any unsupported layout with its closest equivalent
>>> defined in media foundation.
>> "won't need any transformation" is only true because you're *already*
>> applying a transformation.
> Yes, a transformation unrelated to resolving incompatibilities between
> GstCaps and IMFMediaType.
>>   This code path *is* the transformation. If
>> you did a similar thing with the audio stream, it "won't need any
>> transformation" either.
> Yes but I'm not aware of any requirements applications have on the audio
> stream outputs of decoders yet, so we don't have to.  This is very
> fortunate, as if that were the case, the code could get ugly
> differentiating between audio streams that are raw because they've been
> decoded, and audio streams that are raw because they were already raw in
> the container.

I don't think you should need any such code.

The fundamental point I'm trying to get across is that even if *a*
*reason* for doing a transformation differs, the transformation itself
is very similar, and there's no point in doing it in two different ways.
Look at the similarities between the code paths, not their differences.
You'll find that you don't actually have to account for the differences
at all in the code structure—only in the comments that explain why
you're doing what you're doing.

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEENTo75Twe9rbPART3DZ01igeheEAFAl+lxj8ACgkQDZ01igeh
eEBHQQf/bmAXYqiVwD7h2JoMKlQJ4H6V0aLem6SvOqHScBlV6iJvy/0vs1WYcPrL
LRanS37JVuhsmVXtPwx8CCnXv7MH+wmCM3o12LOt+cWXDQAAEAKxhCHd9KKh28A9
yyFucLmLO2TiM8WQ9XxQT5dwBd+e7EMvvZFmD4kNBI6H9AQXLETs1zlpUeFW6XpZ
/9XmqVGcv8+IO7bLnkp8cXj/KjGWGFotiUKCRnbwXnmx25nl0ld7K0HCp4f29N49
pDCzLuYdbKQZ7+IfBLxaeDCKTdo6J63iIPL7lEJpl1Q3m5qFLxFQo4fn9/qbEbMf
Wjri5FuiG8jIqjMh+C16b0wHdlX8DQ==
=mDUf
-----END PGP SIGNATURE-----