Clarity about embedded HTML and escaping #96

Open
spookylukey opened this issue Apr 12, 2018 · 3 comments

@spookylukey
Contributor

spookylukey commented Apr 12, 2018

Some examples in the guide use HTML snippets inside messages, e.g. http://projectfluent.org/fluent/guide/text.html

description =
    Loki is a simple micro-blogging
    app written entirely in <i>HTML5</i>.
    It uses FTL to implement localization.

The question then is what happens when this is used. I would expect Fluent itself not to do any HTML escaping. It is therefore up to the bindings to always HTML-escape the entire returned string when it is inserted into the DOM (client-side) or into a chunk of HTML (server-side). If the message contains any interpolated user-supplied input, this is vital for correctness and security (XSS etc.), but in any case we should not expect translators to know HTML syntax and manually escape ampersands etc.
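As a minimal Python sketch of that rule: the bindings format the message first and escape the whole result just before it goes into HTML. `format_greeting` is a hypothetical stand-in for the real formatting call (with a made-up pattern `Hello, { $name }!`); only `html.escape` is a real library function here.

```python
from html import escape


def format_greeting(name: str) -> str:
    # Hypothetical stand-in for a Fluent formatting call on a
    # message whose pattern is "Hello, { $name }!".
    return f"Hello, {name}!"


def render_into_html(text: str) -> str:
    # Escape the entire formatted string (&, <, >, quotes) so that
    # user-supplied input cannot inject markup into the page.
    return escape(text)


user_input = '<script>alert("x")</script> & co'
print(render_into_html(format_greeting(user_input)))
```

Translators never see or write `&amp;`; the escaping happens once, in code.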

However, with the above message, the HTML tags would end up as &lt;i&gt;HTML5&lt;/i&gt;, which a browser would render literally as <i>HTML5</i> rather than as an italicized HTML5. This is not what the example implies to me.
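Applying the same blanket escaping to the guide's example shows the problem. This sketch uses one sentence of that message as a plain string (the line breaks of the multiline FTL value are ignored for simplicity):

```python
from html import escape

# One sentence of the guide's description message, as formatted text.
formatted = "Loki is a simple micro-blogging app written entirely in <i>HTML5</i>."

# Escaping the whole result, as the bindings would for plain text,
# turns the <i> element into inert text that is displayed literally.
escaped = escape(formatted)
print(escaped)
```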

Looking around in this repo, it seems the current consensus is in agreement with what I've outlined above (see projectfluent/play#2 for example), and therefore it is the examples that are misleading/confusing.

This leaves the problem of what happens when a translated string actually needs to embed HTML. One possible solution is described in #16 (comment). A more lightweight but less robust solution I had been thinking about was a naming convention (e.g. any message whose id ends in -html is treated as HTML, and everything else as plain text).
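A rough Python sketch of how bindings might apply such a naming convention. The `MESSAGES` dict and `format_message` are hypothetical stand-ins for a real Fluent bundle and support only `{ $variable }` placeables. One detail the convention has to get right: in an `-html` message the translation is trusted, but the arguments still are not, so they are escaped before interpolation.

```python
import re
from html import escape

# Hypothetical stand-in for a Fluent bundle: message id -> pattern.
MESSAGES = {
    "welcome": "Welcome, { $name }!",
    "welcome-html": "Welcome, <b>{ $name }</b>!",
}

# Matches a "{ $variable }" placeable; a tiny subset of FTL syntax.
PLACEABLE = re.compile(r"\{\s*\$(\w+)\s*\}")


def format_message(message_id: str, args: dict) -> str:
    pattern = MESSAGES[message_id]
    return PLACEABLE.sub(lambda m: str(args[m.group(1)]), pattern)


def render_html(message_id: str, args: dict) -> str:
    if message_id.endswith("-html"):
        # Trusted HTML translation: escape each argument, not the result.
        safe_args = {k: escape(str(v)) for k, v in args.items()}
        return format_message(message_id, safe_args)
    # Plain-text message: escape the whole formatted result.
    return escape(format_message(message_id, args))


print(render_html("welcome", {"name": "<Ann>"}))
print(render_html("welcome-html", {"name": "<Ann>"}))
```

The weakness is that nothing stops a translator from adding markup to a message without the suffix; it will simply show up escaped.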

It is vital for this to be really well defined (and simple to implement); otherwise you end up with XSS, double escaping, or being unable to embed HTML in translated messages. I'm considering an implementation in Elm, and the only practical way it would work would be to compile FTL messages to Elm functions. For this to work, we'd need to know, for every message, what type of output (text or HTML) it returns, so that the generated function can have the correct type signature. I'm also considering a Python implementation that would integrate into a Django project, and there too we'd need to know very explicitly whether a message returns HTML or plain text.
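To illustrate what "knowing the output type of every message" could look like, here is a Python sketch with two distinct wrapper types. `Html`, `Text`, and the two compiled message functions are hypothetical names; Django's `mark_safe`/`SafeString` and `markupsafe.Markup` play a similar role to `Html` in real projects. The point is that escaping happens exactly once, at the Text-to-HTML boundary, and a type checker can flag HTML flowing into a plain-text context.

```python
from dataclasses import dataclass
from html import escape
from typing import Union


@dataclass(frozen=True)
class Html:
    """Markup that is already safe to insert into an HTML document."""
    value: str


@dataclass(frozen=True)
class Text:
    """Plain text; must be escaped before it goes into HTML."""
    value: str


# Hypothetical compiled messages: the return type records whether
# each translation is HTML or plain text (description is shortened).
def description() -> Html:
    return Html("Loki is written entirely in <i>HTML5</i>.")


def greeting(name: str) -> Text:
    return Text(f"Hello, {name}!")


def to_html(message: Union[Html, Text]) -> Html:
    # Html passes through untouched; Text is escaped exactly once,
    # which rules out double escaping.
    if isinstance(message, Html):
        return message
    return Html(escape(message.value))


def to_plain_text(message: Text) -> str:
    # Only Text is accepted; a type checker such as mypy would reject
    # passing an Html value here (e.g. into a plain-text email).
    return message.value
```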

@zbraniecki
Collaborator

Hi! Thanks for the writeup! Before we dive in - are you familiar with DOM Overlays?

Here's documentation of the first version - https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays and today we released v2 in fluent-dom 0.2.0.

DOM Overlays is how we approach DOM fragment localization with safety and flexibility. Version 0.2 adds the ability for developers to provide elements in the source HTML that get merged with the translation. We'll have to document the new features in v2 :)
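For server-side rendering, one ingredient of this kind of approach can be sketched with Python's standard `html.parser`: keep a small allowlist of text-level elements from the translation, strip their attributes, drop every other tag, and escape all text. This is only a rough analogue; the `ALLOWED` set and the drop-all-attributes policy are assumptions, not fluent-dom's actual rules, and it does not merge in developer-provided elements.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist of text-level elements a translation may use.
ALLOWED = {"em", "strong", "i", "b", "sup", "sub"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep allowed elements but drop all of their attributes.
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # All text, including that of dropped elements, is re-escaped.
        self.out.append(escape(data))


def sanitize(translation: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(translation)
    parser.close()
    return "".join(parser.out)


print(sanitize('<i>HTML5</i> <img src=x onerror=alert(1)>'))
```

A production version would also need to balance unclosed tags (e.g. a translation containing `<i>` with no `</i>`), which this sketch does not do.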

@spookylukey
Contributor Author

Thanks so much for that link; I think I had seen it before but had forgotten about it. I'm still at the stage of investigating Fluent and seeing whether it fits my needs. I'm currently not planning to use fluent-dom, because I have use cases where it won't work (e.g. plain-text emails), and because in some cases I really want server-side rendering (for all the usual reasons).

I guess fluent-dom may be the way to go in some cases, though, or I might need to implement similar functionality if I go with server-side rendering. I have questions like: what happens if I'm generating a plain-text email and there is a message like M<sup>me</sup> { $surname }? It feels like there needs to be a way to communicate the context to the translator so that this kind of thing can be avoided (as per the proposal in #16).
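To make the plain-text email concern concrete: a naive fallback is to strip tags from an HTML translation. This minimal sketch uses Python's standard `html.parser` (the surname `Dupont` is just sample data). It happens to give an acceptable `Mme Dupont` here, but the same approach turns `x<sup>2</sup>` into `x2`, which is why the translator needs to know the output context rather than relying on stripping.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(html_to_plain_text("M<sup>me</sup> Dupont"))
print(html_to_plain_text("x<sup>2</sup>"))
```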

@zbraniecki
Collaborator

yep, semantic comments are meant to help with that!
