Missing DTD in parsed document model #24
Comments
What is |
My apologies. I forgot to include the formatter declaration.
As for whether
All of this is irrelevant. It isn't the output that's the problem. That's just demonstrating the issue. As I said, it appears the DTD isn't even brought into the document. |
The XML notation is not a doctype - it's a preamble. Doctypes are serialized. A DTD (what you wrote in the title) would look like
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
So the doctype being |
Thanks for correcting my misunderstanding. I shouldn't refer to it as the DTD. I've gone and familiarized myself with the correct nomenclature. With regard to the preamble (or declaration, as I'm seeing it called elsewhere): while it is optional, it is important to XML documents. The example XML used may be small, but it is still a full document. |
Yes, I fully agree. Yet the job of the serializer is to serialize nodes - i.e., fragments. As mentioned I think we can go around and make an option to output the declaration for
There are multiple reasons for not having the declaration be part of the serialization (and therefore putting it in the hands of the application that uses the serializer). As an example, we don't know how / where / in what encoding the document will be stored. Therefore, going forward and specifying, e.g., UTF-8 is generally a mistake.
Maybe the best way forward is to introduce an overload for |
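The division of labour described in this comment, where the serializer emits only nodes and the application adds the declaration once it knows the storage encoding, can be sketched on the application side. This is a minimal illustration under stated assumptions, not AngleSharp.Xml's API: `XmlDeclarationWriter` and `Prepend` are hypothetical names, and the `body` string merely stands in for whatever the serializer returned.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of AngleSharp or AngleSharp.Xml.
static class XmlDeclarationWriter
{
    // Builds an XML declaration naming the encoding the caller will actually use
    // for storage, and places it in front of the already serialized markup.
    public static string Prepend(string serializedXml, Encoding encoding,
                                 string version = "1.0", bool? standalone = null)
    {
        var sb = new StringBuilder();
        sb.Append("<?xml version=\"").Append(version).Append('"');
        sb.Append(" encoding=\"").Append(encoding.WebName).Append('"');
        if (standalone.HasValue)
        {
            // The standalone pseudo-attribute is optional and comes last.
            sb.Append(" standalone=\"").Append(standalone.Value ? "yes" : "no").Append('"');
        }
        sb.Append("?>\n").Append(serializedXml);
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in for serializer output (assumption: it carries no declaration).
        var body = "<note><to>Tove</to></note>";

        // The application, not the serializer, knows the storage encoding.
        var encoding = Encoding.UTF8;
        var text = XmlDeclarationWriter.Prepend(body, encoding);

        // Encode with the same encoding that the declaration names.
        byte[] bytes = encoding.GetBytes(text);
        Console.WriteLine(text);
        Console.WriteLine(bytes.Length);
    }
}
```

The point of passing the `Encoding` explicitly is that the declared encoding and the bytes actually written cannot drift apart, which is the risk named above when a serializer hard-codes UTF-8.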
Perhaps an overloaded version of ToXml could take a formatter (similar to how ToHtml accepts one) that contains options specific to a declaration (see section 2.8 of the XML specification, "Prolog and Document Type Declaration"). |
Well,
I think having both,
Note that internally, i.e., when parsing a document, we already gather that information. We check that a provided version satisfies the 1.x constraint and use the standalone value to determine parsing options. The encoding then switches the character interpretation (same as a |
It's your library, so that's up to you. As a user, it was not intuitively obvious
With regard to having different "conflicting" options, that was just a follow-on to your initial statement "I think we can go around and make an option to output the declaration for IXmlDocument instances. The default can be
With this, you suggested overloads for
All of that aside, the initial motivation for this is: if a declaration appears in the original XML document, it should, by default, carry over to the generated output. Override options seem to fall within the domain of a formatter. |
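Until the library offers something along these lines, the "carry the original declaration over by default" behaviour can be approximated in application code. The sketch below is a workaround under stated assumptions, not an AngleSharp.Xml feature: `DeclarationRoundTrip`, `Extract`, and `Restore` are hypothetical names, it works on the raw source text rather than the parsed document, and it is only correct if the output is stored in the encoding the original declaration names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical application-side helper; it does not call any AngleSharp API.
static class DeclarationRoundTrip
{
    // An XML declaration, if present, must be the very first thing in the text.
    // The required whitespace after "xml" keeps "<?xml-stylesheet ...?>" from matching.
    private static readonly Regex Declaration =
        new Regex(@"\A<\?xml\s[^?]*\?>", RegexOptions.CultureInvariant);

    // Returns the declaration at the start of the source text, or "" if there is none.
    public static string Extract(string sourceXml)
    {
        var match = Declaration.Match(sourceXml);
        return match.Success ? match.Value : string.Empty;
    }

    // Puts the original declaration back in front of serializer output.
    // Assumption: the serializer output itself carries no declaration.
    public static string Restore(string sourceXml, string serializedXml)
    {
        var declaration = Extract(sourceXml);
        return declaration.Length == 0
            ? serializedXml
            : declaration + "\n" + serializedXml;
    }
}

static class Program
{
    static void Main()
    {
        var source = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<note><to>Tove</to></note>";

        // Stand-in for what a serializer returned after parsing `source`.
        var serialized = "<note><to>Tove</to></note>";

        Console.WriteLine(DeclarationRoundTrip.Restore(source, serialized));
    }
}
```

Copying the declaration verbatim keeps the original version, encoding, and standalone values, so an override mechanism is still needed whenever the application writes the result in a different encoding.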
Yeah this is where my note is important. The prolog is only important for consumption, but again - how it should be consumed depends on the creator of the document, i.e., your application. See my remarks regarding standalone or encoding. Even the version we don't know. You could make your own formatter that uses some formats used in some future / unknown version of XML.
Consequently, the original prolog really has no meaning. We consumed that document and now a completely new document (e.g., using some exotic encoding) might have been created.
I agree that the naming of
I'll be thinking of an appropriate API for this - but just to be clear: The current behavior will stay the default (to be backwards compatible) and any new API should fit nicely in (i.e., be backwards compatible, consistent to the but obvious in usage from the name). Any suggestions appreciated. |
Last response to this issue.
Sure. Which is the point of any override mechanism. But as I said, by default, preserve what comes in when making output.
Unfortunately, as there is zero documentation for AngleSharp.Xml, I had no other basis to go on regarding how to use these libraries other than the exposed API surface. It wasn't even until I saw issue #11 that I realized
Please close this issue. |
Bug Report
Description
I cannot see anything in the document model that seems to match the values defined in the DTD, nor am I seeing the DTD when performing a round-trip on the XML. I was initially investigating self-closing tags and found issue #11. From there, I took the example code to test with and confirmed it met my need. But I noticed my DTD wasn't getting written out. As far as I can tell, the DTD isn't brought into the parsed document.
Steps to Reproduce
(Tested in LINQPad)
Expected behavior:
Output to look similar to
Actual behavior:
Output is
Environment details:
Windows 10
LINQPad 7
AngleSharp 1.0.5 via NuGet
AngleSharp.Xml 1.0.0 via NuGet
Possible Solution
Am I missing some options/techniques to force the correct parse?