From: Jeff Smith Subject: Re: [PATCH v3 3/3] xmllite: Expand test for any unparsed data at end of XML. Message-Id: Date: Sat, 7 Dec 2019 00:15:42 -0600 In-Reply-To: <30300517-0b38-2b54-d950-32726cd6c362@codeweavers.com> References: <20191205195326.928106-1-whydoubt@gmail.com> <20191205195326.928106-3-whydoubt@gmail.com> <30300517-0b38-2b54-d950-32726cd6c362@codeweavers.com> On Fri, Dec 6, 2019 at 4:19 PM Nikolay Sivov wrote: > > On 12/7/19 12:24 AM, Jeff Smith wrote: > > On Fri, Dec 6, 2019 at 11:16 AM Nikolay Sivov wrote: > >> On 12/5/19 10:53 PM, Jeff Smith wrote: > >>> @@ -2662,7 +2663,7 @@ static HRESULT reader_parse_nextnode(xmlreader *reader) > >>> hr = reader_parse_misc(reader); > >>> if (hr != S_FALSE) return hr; > >>> > >>> - if (*reader_get_ptr(reader)) > >>> + if (buffer->cur*sizeof(WCHAR) < buffer->written) > >>> { > >>> WARN("found garbage in the end of XML\n"); > >>> return WC_E_SYNTAX; > > Hi Nikolay, > > > >> That means we don't have enough data, > > How do you figure that? > > > >> it's another change not backed by tests > > This fixes two tests, and does not break any others. > > > >> and potentially depending on current read-ahead buffer size/filled level. > > I'm pretty sure reader_parse_misc would have read at least one byte > > ahead, which is all that is required for this to trigger, though I > > could double-check that. > > However, to your point made in the patch 2 of the set about not > > exposing the buffer at this level, I will also consider this something > > that potentially needs to be handled elsewhere. > My point is that we should always hit this single invalid syntax/garbage > at the end condition that we already have, That garbage-at-the-end condition, as it exists, is explicitly NOT triggered by a null character, but it should. So what we have here currently is not sufficient. While there may be cases that my patch does not cover, based on existing test cases, it is an improvement. > instead of doing fixups for specific node types. On Windows, the context in which the null character is encountered is significant. For instance, if any character other than '<' is encountered at the end of a whitespace sequence, it raises the syntax error without returning a Whitespace node. So we need to catch the invalid character and interrupt before a Whitespace node is returned.