scala.xml.parsing.XhtmlParser silently ignores invalid XML #4296

scabug opened this issue Feb 24, 2011 · 3 comments

scabug commented Feb 24, 2011

=== What steps will reproduce the problem (please be specific and use wikiformatting)? ===

scala.xml.parsing.XhtmlParser(scala.io.Source.fromString("<p/><b/>"))

=== What is the expected behavior? ===

Expected it to throw an exception saying "document must contain exactly one element"

=== What do you see instead? ===

Repl output:

:1:8: document must contain exactly one element ^
List(

, )
res1: scala.xml.NodeSeq = Document()

(It's printing an error to the console???)
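
For anyone who needs the failure to surface as an exception in the meantime, here is a minimal sketch of a workaround. It assumes that `XhtmlParser` can be subclassed and that overriding `reportSyntaxError(pos: Int, str: String)` intercepts every reported syntax error, which may not hold for every scala-xml version. `StrictXhtmlParser` and `XhtmlSyntaxException` are hypothetical names and not part of the library.

```scala
import scala.io.Source
import scala.xml.NodeSeq
import scala.xml.parsing.XhtmlParser

// Hypothetical exception type; scala-xml does not define it.
final case class XhtmlSyntaxException(pos: Int, msg: String)
  extends RuntimeException(msg)

// Hypothetical subclass: throws instead of printing the error to the console.
class StrictXhtmlParser(src: Source) extends XhtmlParser(src) {
  override def reportSyntaxError(pos: Int, str: String): Unit =
    throw XhtmlSyntaxException(pos, str)
}

object StrictXhtmlParser {
  // Same call sequence as XhtmlParser.apply, but on the strict subclass.
  def apply(src: Source): NodeSeq =
    new StrictXhtmlParser(src).initialize.document()
}
```

Under those assumptions, `StrictXhtmlParser(Source.fromString("<p/><b/>"))` should throw rather than return a `Document`.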

=== Additional information ===
(for instance, a link to a relevant mailing list discussion)

=== What versions of the following are you using? ===

  • Scala:
  • Java:
  • Operating system:

scabug commented Feb 24, 2011

Imported From: https://issues.scala-lang.org/browse/SI-4296?orig=1
Reporter: Alex Black (alexblack)

scabug commented May 30, 2011

@dcsobral said:
Speaking only of XHTML, we have the following rules:

  • It must be an XML document.
  • It must conform to one of certain pre-defined DTDs.
  • The root element of the document must be html.
  • The root element of the document must contain an xmlns declaration.
  • There must be a DOCTYPE declaration in the document prior to the root element.
  • The DTD subset must not be used to override any parameter entities in the DTD.

Because it must be an XML document, it must also be well-formed, which means it must follow the production rule prolog element Misc*, where prolog and Misc contain only the XML declaration, the DOCTYPE declaration, comments, processing instructions, spaces and newlines.

So, taken all together, XhtmlParser should complain about the following:

  • There's more than one root.
  • It does not conform to any of the valid DTDs.
  • The root is not called html.
  • The root does not contain an xmlns declaration.
  • There's no DOCTYPE declaration prior to the root.
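
A rough sketch of what three of these checks could look like if done by the caller on the parsed top-level nodes. `XhtmlChecks` and `xhtmlProblems` are hypothetical names, the DTD and DOCTYPE checks are left out, and a null `namespace` on the root is used here as a stand-in for "no xmlns declaration".

```scala
import scala.xml.{Elem, Node}

object XhtmlChecks {
  // Returns a list of human-readable problems; an empty list means the
  // three checks covered here passed.
  def xhtmlProblems(topLevel: Seq[Node]): List[String] = {
    // Only elements count as roots; comments, PIs and text are ignored.
    val roots = topLevel.collect { case e: Elem => e }

    val countProblem =
      if (roots.size != 1)
        List(s"expected exactly one root element, found ${roots.size}")
      else Nil

    // Inspect the first root only, if there is one.
    val rootProblems = roots.headOption.toList.flatMap { root =>
      val label =
        if (root.label != "html") List(s"root is '${root.label}', not 'html'")
        else Nil
      val ns =
        if (root.namespace == null) List("root has no xmlns declaration")
        else Nil
      label ++ ns
    }

    countProblem ++ rootProblems
  }
}
```

Returning a list of problems instead of throwing on the first one leaves the caller free to decide how strict to be.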

I expect such strictness to be detrimental instead of a feature for most applications. However, failing and still returning something -- that does not represent the input given -- is definitely very bad.

For whatever it is worth, this reminds me a lot of #4520. Though the symptom is rather different, I suspect they both arise from the lack of an exception.

scabug commented Jul 17, 2015

@SethTisue said:
The scala-xml library is now community-maintained. Issues with it are now tracked at https://github.com/scala/scala-xml/issues instead of here in the Scala JIRA.

Interested community members: if you consider this issue significant, feel free to open a new issue for it on GitHub, with links in both directions.
