
Commit c379fb6

committed
Implemented 'escapers' for compiler implementation.
1 parent 51f95f9 commit c379fb6


11 files changed

+1127
-71
lines changed


fluent.runtime/docs/escaping.rst

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
+Escaping and markup
+-------------------
+
+In some cases it is common to have other kinds of markup mixed in to
+translatable text, especially for things like HTML/web outputs. Handling these
+requires extra functionality to ensure that everything is escaped properly,
+especially external arguments that are passed in.
+
+For example, suppose you need embedded HTML in your translated text::
+
+    happy-birthday =
+        Hello { $name }, <b>happy birthday!</b>
+
+In this situation, it is important that ``$name`` is HTML-escaped. The rest of
+the text needs to be treated as already escaped (i.e. it is HTML markup), so
+that ``<b>`` is not changed to ``&lt;b&gt;``.
+
+python-fluent supports this use case by allowing a list of ``escapers`` to be
+passed to the ``FluentBundle`` constructor:
+
+.. code-block:: python
+
+    bundle = FluentBundle(['en'], escapers=[my_escaper])
+
+An ``escaper`` is an object that defines the following set of attributes. The
+object could be a module, or a simple namespace object you could create using
+``types.SimpleNamespace`` (or ``fluent.runtime.utils.SimpleNamespace`` on Python 2), or
+an instance of a class with appropriate methods defined. The attributes are:
+
+- ``name`` - a simple text value that is used in error messages.
+
+- ``select(**hints)``
+
+  A callable that is used to decide whether or not to use this escaper for a
+  given message (or message attribute). It is passed a number of hints as
+  keyword arguments, currently only the following:
+
+  - ``message_id`` - a string that is the name of the message or term. For terms
+    it is a string with a leading dash - e.g. ``-brand-name``. For message
+    attributes, it is a string in the form ``message-name.attribute-name``.
+
+  More hints will probably be passed in the future (for example, comments
+  attached to the message), so for future compatibility this callable should use
+  the ``**hints`` syntax to collect remaining keyword arguments.
+
+  The callable should return ``True`` if the escaper should be used for that
+  message, ``False`` otherwise. For every message and message attribute, the
+  ``select`` callable of each escaper in the list of escapers is tried in turn,
+  and the first to return ``True`` is used.
+
+- ``output_type`` - the type of values that are returned by ``escape``,
+  ``mark_escaped``, and ``join``, and therefore by the whole message.
+
+- ``escape(text_to_be_escaped)``
+
+  A callable that will escape the passed in text. It must return a value that is
+  an instance of ``output_type`` (or a subclass).
+
+  ``escape`` must also be able to handle values that have already been escaped
+  without escaping a second time.
+
+- ``mark_escaped(markup)``
+
+  A callable that marks the passed in text as markup i.e. already escaped. It
+  must return a value that is an instance of ``output_type`` (or a subclass).
+
+- ``join(parts)``
+
+  A callable that accepts an iterable of components, each of type
+  ``output_type``, and combines them into a larger value of the same type.
+
+- ``use_isolating``
+
+  A boolean or ``None`` that determines whether the normal bidi isolating
+  characters should be inserted. If it is ``None`` the value from the
+  ``FluentBundle`` will be used, otherwise use ``True`` or ``False`` to override.
+
+The escaping functions need to obey some rules:
+
+- escape must be idempotent:
+
+  ``escape(escape(text)) == escape(text)``
+
+- escape must be a no-op on the output of ``mark_escaped``:
+
+  ``escape(mark_escaped(text)) == mark_escaped(text)``
+
+- ``mark_escaped`` should be distributive with string
+  concatenation:
+
+  ``join([mark_escaped(a), mark_escaped(b)]) == mark_escaped(a + b)``
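Reviewer note: the three rules above can be checked mechanically. The following is a minimal sketch using only the standard library. ``SafeText``, ``toy_escaper`` and ``rules_hold`` are hypothetical names made up for illustration and are not part of python-fluent; the escaper is a toy built on ``html.escape``, not the MarkupSafe one from the docs.

```python
import html
from types import SimpleNamespace


class SafeText(str):
    """Hypothetical marker type: a str whose content is already HTML-escaped."""


def escape(value):
    # Values that are already SafeText pass through untouched. This is what
    # makes escape idempotent and a no-op on the output of mark_escaped.
    if isinstance(value, SafeText):
        return value
    return SafeText(html.escape(str(value)))


def join(parts):
    # Concatenate already-escaped parts and keep the SafeText type.
    return SafeText(''.join(parts))


toy_escaper = SimpleNamespace(
    name='toy_html_escaper',
    select=lambda message_id=None, **hints: True,
    output_type=SafeText,
    escape=escape,
    mark_escaped=SafeText,
    join=join,
    use_isolating=None,
)


def rules_hold(escaper, a, b):
    """Return True if the three documented rules hold for sample texts a and b."""
    idempotent = escaper.escape(escaper.escape(a)) == escaper.escape(a)
    noop_on_markup = escaper.escape(escaper.mark_escaped(a)) == escaper.mark_escaped(a)
    distributive = (
        escaper.join([escaper.mark_escaped(a), escaper.mark_escaped(b)])
        == escaper.mark_escaped(a + b)
    )
    return idempotent and noop_on_markup and distributive


result = rules_hold(toy_escaper, '<b> & "quotes"', '</b>')
```

A check like ``rules_hold`` could be run against any candidate escaper before passing it to ``FluentBundle``. It only samples two inputs, so it can catch violations but cannot prove the rules hold in general.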
+
+Example
+~~~~~~~
+
+This example is for
+`MarkupSafe <https://pypi.org/project/MarkupSafe/>`__:
+
+.. code-block:: python
+
+    from fluent.runtime.utils import SimpleNamespace
+    from markupsafe import Markup, escape
+
+    empty_markup = Markup('')
+
+    html_escaper = SimpleNamespace(
+        select=lambda message_id=None, **hints: message_id.endswith('-html'),
+        output_type=Markup,
+        mark_escaped=Markup,
+        escape=escape,
+        join=empty_markup.join,
+        name='html_escaper',
+        use_isolating=False,
+    )
+
+This escaper uses the convention that message IDs that end with
+``-html`` are selected by this escaper. This will match
+``message-html``, ``message.attr-html``, and ``-term-html``, for
+example, but not ``message-html.attr``.
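Reviewer note: the ``-html`` suffix convention is easy to confirm in isolation. The sketch below restates the ``select`` callable as a named function, ``select_html``, which is a hypothetical name. The ``None`` guard is an addition that the lambda in the docs does not have; it is there only so that a call without ``message_id`` returns ``False`` instead of raising ``AttributeError``.

```python
def select_html(message_id=None, **hints):
    # Extra keyword hints are accepted and ignored, as the docs recommend.
    # The None check is an assumption added here, not part of the original lambda.
    return message_id is not None and message_id.endswith('-html')


ids = ['message-html', 'message.attr-html', '-term-html', 'message-html.attr']
selected = {message_id: select_html(message_id=message_id) for message_id in ids}
```

Here ``selected`` maps the first three IDs to ``True`` and ``message-html.attr`` to ``False``, matching the list in the paragraph above.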
+
+We have set ``use_isolating=False`` here because isolation characters
+can cause problems in various HTML contexts - for example:
+
+::
+
+    signup-message-html =
+        Hello guest - please remember to
+        <a href="{ $signup_url}">make an account.</a>
+
+Isolation characters around ``$signup_url`` will break the link. For HTML, you
+should instead use the `bdi element
+<https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdi>`__ in the FTL
+messages when necessary.
+
+Escaper compatibility
+~~~~~~~~~~~~~~~~~~~~~
+
+When using escapers with messages that include other messages or terms,
+some rules apply:
+
+- A message or term with an escaper applied can include another message or term
+  with no escaper applied (the included message will have ``escape`` called on
+  its output).
+
+- A message with an escaper applied can include a message or term with the same
+  escaper applied.
+
+- A message with an escaper applied cannot include a message or term with a
+  different escaper applied - this will generate a ``TypeError`` in the list of
+  errors returned.
+
+- A message with no escaper applied cannot include a message with an escaper
+  applied.
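Reviewer note: the four compatibility rules above amount to a small decision table. The sketch below encodes that table as a plain function. ``inclusion_verdict`` and its return strings are hypothetical and only summarise the documented rules; this is not how the compiler in this commit is implemented. ``None`` stands for "no escaper applied", and the case where neither message has an escaper is assumed to be the ordinary, allowed one.

```python
def inclusion_verdict(outer_escaper, inner_escaper):
    """Classify whether a message using outer_escaper may include one using inner_escaper.

    Escapers are identified by name here; None means no escaper is applied.
    """
    if inner_escaper is None:
        if outer_escaper is None:
            # Plain text inside plain text: assumed to be the ordinary case.
            return 'allowed'
        # Rule 1: the included output is passed through the outer escape().
        return 'allowed, included output is escaped'
    if outer_escaper is None:
        # Rule 4: an unescaped message cannot include an escaped one.
        return 'rejected'
    if outer_escaper == inner_escaper:
        # Rule 2: same escaper on both sides.
        return 'allowed'
    # Rule 3: different escapers, reported as a TypeError in the error list.
    return 'rejected with TypeError'


verdicts = [
    inclusion_verdict('html', None),
    inclusion_verdict('html', 'html'),
    inclusion_verdict('html', 'markdown'),
    inclusion_verdict(None, 'html'),
]
```

The four entries of ``verdicts`` correspond to the four bullet points in order.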

fluent.runtime/fluent/runtime/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -30,7 +30,7 @@ class FluentBundleBase(object):
     See the documentation of the Fluent syntax for more information.
     """
 
-    def __init__(self, locales, functions=None, use_isolating=True):
+    def __init__(self, locales, functions=None, use_isolating=True, escapers=None):
         self.locales = locales
         _functions = BUILTINS.copy()
         if functions:
@@ -41,6 +41,7 @@ def __init__(self, locales, functions=None, use_isolating=True):
         self._parsing_issues = []
         self._babel_locale = self._get_babel_locale()
         self._plural_form = babel.plural.to_python(self._babel_locale.plural_form)
+        self._escapers = escapers
 
     def add_messages(self, source):
         parser = FluentParser()
@@ -149,7 +150,8 @@ def _compile(self):
             self._messages_and_terms,
             self._babel_locale,
             use_isolating=self._use_isolating,
-            functions=self._functions)
+            functions=self._functions,
+            escapers=self._escapers)
         self._mark_clean()
 
     # 'format' is the hot path for many scenarios, so we try to optimize it. To

fluent.runtime/fluent/runtime/codegen.py

Lines changed: 8 additions & 8 deletions
@@ -550,10 +550,8 @@ class StringJoin(Expression):
     def __init__(self, parts):
         self.parts = parts
 
-    def as_ast(self):
-        return MethodCall(String(''), 'join',
-                          [List(self.parts)],
-                          expr_type=self.type).as_ast()
+    def __repr__(self):
+        return 'StringJoin([{0}])'.format(', '.join(repr(p) for p in self.parts))
 
     def simplify(self, changes, simplifier):
         # Simplify sub parts
@@ -573,17 +571,19 @@ def simplify(self, changes, simplifier):
             changes.append(True)
         self.parts = new_parts
 
-        # See if we can eliminate the StringJoin altogether
-        if len(self.parts) == 0:
+        # See if we can eliminate the Join altogether
+        if len(self.parts) == 0 and self.type is text_type:
             changes.append(True)
             return simplifier(String(''), changes)
         if len(self.parts) == 1:
             changes.append(True)
             return simplifier(self.parts[0], changes)
         return simplifier(self, changes)
 
-    def __repr__(self):
-        return 'StringJoin([{0}])'.format(', '.join(repr(p) for p in self.parts))
+    def as_ast(self):
+        return MethodCall(String(''), 'join',
+                          [List(self.parts)],
+                          expr_type=self.type).as_ast()
 
 
 class VariableReference(Expression):
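Reviewer note: one plausible reading of the added ``self.type is text_type`` condition is that an empty join may only be replaced by the literal ``''`` when the join's result type is plain text; for an escaper's ``output_type`` the literal would have the wrong type. The sketch below illustrates that type difference with a stand-in class. ``Markup`` here is a hypothetical ``str`` subclass defined locally, not the MarkupSafe class and not code from this commit.

```python
class Markup(str):
    """Stand-in for an escaper's output_type (hypothetical, stdlib only)."""

    def join(self, parts):
        # Keep the subclass type when joining, as an output_type is expected to.
        return Markup(super().join(parts))


# Joining zero parts as plain text gives a plain str, so folding the join
# into the literal '' changes nothing.
empty_text = ''.join([])

# Joining zero parts as markup gives an empty Markup. Replacing this with
# the literal '' would silently change the result type from Markup to str.
empty_markup = Markup('').join([])

same_value = empty_text == empty_markup
same_type = type(empty_text) is type(empty_markup)
```

The two results compare equal as strings but differ in type, which is the distinction the extra condition appears to protect.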
