Implement MessageContext.format #67

Merged on Jan 18, 2019 (76 commits)

spookylukey
Collaborator

Implementation of #65

This is incomplete, but at a reviewable level - the remaining work to be done shouldn't influence the overall design that much.

I'm unlikely to get back to this soon, but leaving this here so that others are aware that a start has been made, and if anyone wants to give feedback then they can.

@stasm
Contributor

stasm commented Nov 3, 2018

Thanks again, @spookylukey, for putting so much effort and work into this PR. It's a pleasure to read the code and documentation. I'm still reviewing it, and I'd like to propose the following landing plan.

The current fluent package on PyPI is used in production for building Firefox. It only provides the parser module which is used to validate translations shipping in Firefox. To ensure maximum flexibility, we think it would be smart to version it separately from the bundle implementation from this PR. (MessageContext has been renamed to FluentBundle recently.) By versioning them separately, we'll be able to make changes to the parser on a timeline required by the shipping schedule of Firefox. In practice, this flexibility will continue to be important only for a couple more months, until we hit the 1.0 milestone. After 1.0 the Fluent Syntax spec will only change in a fully backwards-compatible way (compatible with all 1.x versions) making future updates easier. For the time being, a lot of coordination is required to make sure implementations in the runtime and in all tools align and can parse the same versions of the syntax.

This repository currently uses the structure suggested by https://packaging.python.org/guides/packaging-namespace-packages/, which allows other distribution packages to use the same fluent package namespace. We take advantage of this to make the fluent-migration module available as fluent.migrate. I'd like to suggest that we land the bundle implementation in a new repository in the projectfluent org on GitHub, use the package-namespace directory layout there too, and publish it as a separate distribution package on PyPI. We might also want to rename this repository, as well as the current fluent package on PyPI, to better reflect the fact that it only contains the parser implementation. I'm open to naming suggestions. For repos, do you think python-fluent-syntax and python-fluent-bundle would be suitable? Or maybe fluent-syntax.py and fluent-bundle.py? Are there strong conventions in naming repos hosting Python projects? For PyPI, fluent-syntax and fluent-bundle might be good choices, unless there's a way to group all published packages under a single org name, similar to @-scopes in npm? I'm not very familiar with good practices for publishing Python packages, and I'd love to get your advice on this.

I'd be most happy if you agreed to be the maintainer of this new repo and its corresponding package on PyPI. With separate versioning, the bundle development could proceed at its own pace, tied to a specific version of the fluent-syntax dependency. Most code which lands in Mozilla repos is reviewed before it's merged. Both @Pike and I will be happy to serve as first reviewers of changes landing in the new repo.

As far as this PR goes, I suggest that we keep it open for now so that we have a space to discuss the code. Expect review comments from @Pike, @zbraniecki and myself next week. When the review is completed, let's close this PR and land the code in the new repo.

Let me know if this sounds like a good plan. Thanks again!

@stasm
Contributor

stasm commented Nov 3, 2018

I'd like to suggest that we land the bundle implementation in a new repository in the projectfluent org on GitHub, use the package-namespace directory layout there too, and publish it as a separate distribution package on PyPI.

Or we could land it here but change the directory structure such that each distribution lives in its own directory, together with its setup.py:

fluent-syntax
    setup.py
    fluent
        __init__.py
        syntax
            __init__.py
            ...
fluent-bundle
    setup.py
    fluent
        __init__.py
        bundle
            __init__.py
            ...

I think it would allow us to publish each distribution independently, e.g. cd fluent-syntax; python setup.py ....
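A rough sketch of how two distribution directories can share one namespace, assuming the pkgutil-style `__init__.py` described in the packaging guide linked above (the thread does not say which namespace style the repository uses). The `demo_ns` namespace and `demo-*` directory names are made up for illustration and stand in for `fluent`, `fluent-syntax` and `fluent-bundle`.

```python
import importlib
import os
import sys
import tempfile

# pkgutil-style namespace __init__.py; every distribution ships the same file.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, dist_name, namespace, subpackage):
    """Create <root>/<dist_name>/<namespace>/<subpackage>/__init__.py on disk."""
    namespace_dir = os.path.join(root, dist_name, namespace)
    package_dir = os.path.join(namespace_dir, subpackage)
    os.makedirs(package_dir)
    with open(os.path.join(namespace_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write("NAME = %r\n" % dist_name)
    return os.path.join(root, dist_name)


root = tempfile.mkdtemp()
# Hypothetical names: "demo_ns" plays the role of the shared namespace package.
for dist, sub in [("demo-syntax", "syntax"), ("demo-bundle", "bundle")]:
    # Putting each distribution directory on sys.path imitates installing it.
    sys.path.insert(0, make_distribution(root, dist, "demo_ns", sub))

importlib.invalidate_caches()
syntax = importlib.import_module("demo_ns.syntax")
bundle = importlib.import_module("demo_ns.bundle")
```

Both subpackages import from the same top-level name even though they live in separate directories, which is what lets each directory carry its own setup.py.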

part_count = attr.ib(default=0)


def resolve(context, message, args, errors=None):
Collaborator Author

After a while away, I have my own comment here!

Instead of passing context to resolve, I think it would probably be nicer if we passed the things from MessageContext that are actually needed, namely:

  • the message and term dicts, possibly combined into a single dict.
  • locale
  • use_isolating
  • functions

This would be a bit more verbose, but it would have the effect that the resolver code no longer needs to access internals of MessageContext (e.g. ._babel_locale). Also plural_form_for_number would then come off MessageContext (which isn't a great home for it) and go somewhere better.

It would also make this code similar to the equivalent code in the compiler implementation.
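To make the idea concrete, here is a toy sketch of a resolver that receives its inputs explicitly rather than a context object. It is not the code from this PR: the message shape (a list of strings and `(kind, name)` placeholders), the `ToyContext` class and the error strings are all invented for illustration.

```python
FSI, PDI = "\u2068", "\u2069"  # Unicode bidi isolation marks


def resolve(message, args, entries, locale, use_isolating, functions, errors):
    """Toy resolver over a made-up message shape: a list of str parts and
    (kind, name) placeholders, where kind is "var", "ref" or "call"."""
    parts = []
    for part in message:
        if isinstance(part, str):
            parts.append(part)
            continue
        kind, name = part
        if kind == "var" and name in args:
            value = str(args[name])
        elif kind == "ref" and name in entries:
            value = resolve(entries[name], args, entries, locale,
                            use_isolating, functions, errors)
        elif kind == "call" and name in functions:
            value = functions[name](locale)
        else:
            # Record the problem and fall back to the raw name.
            errors.append("Unknown %s: %s" % (kind, name))
            value = name
        parts.append(FSI + value + PDI if use_isolating else value)
    return "".join(parts)


class ToyContext(object):
    """Hypothetical stand-in for MessageContext; it only hands its data over."""

    def __init__(self, locale, entries, use_isolating=True, functions=None):
        self._locale = locale
        self._entries = entries  # messages and terms in a single dict
        self._use_isolating = use_isolating
        self._functions = functions or {}

    def format(self, message_id, args=None):
        errors = []
        text = resolve(self._entries[message_id], args or {}, self._entries,
                       self._locale, self._use_isolating, self._functions, errors)
        return text, errors
```

The resolver never touches a private attribute of the context, at the cost of a longer argument list.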

Collaborator

I think that passing a context is quite safe here, and I've seen Stas use that convention elsewhere.

I do agree that a single dict for all IDs would be useful, but instead of passing a locale, I'd recommend passing a locale fallback chain as provided to the FluentBundle constructor.

Collaborator Author

@zbraniecki - the problem with passing a locale fallback chain is that resolve would then have to convert this to a Babel locale object every time it is called, in other words for every single call to format (even though it will always return the same thing), rather than doing it once up front.
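A small illustration of the cost being described, using a counting stand-in instead of Babel so that it runs without extra dependencies. `CountingParser` and both `resolve_*` functions are hypothetical and exist only to show how often the conversion runs.

```python
class CountingParser(object):
    """Stand-in for an expensive locale parser (the PR uses Babel; this does not)."""

    def __init__(self):
        self.calls = 0

    def parse(self, locales):
        self.calls += 1
        return locales[0].replace("-", "_")


def resolve_with_chain(parser, locales, text):
    # Fallback chain passed in: the conversion repeats on every call.
    return "[%s] %s" % (parser.parse(locales), text)


def resolve_with_locale(locale, text):
    # Pre-converted locale passed in: nothing left to convert here.
    return "[%s] %s" % (locale, text)


per_call = CountingParser()
for _ in range(3):
    resolve_with_chain(per_call, ["en-US", "en"], "hi")

up_front = CountingParser()
locale = up_front.parse(["en-US", "en"])  # done once, e.g. in a constructor
for _ in range(3):
    resolve_with_locale(locale, "hi")
```

After three format calls the first parser has run three times and the second only once.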

return self._messages[message_id]

@cachedproperty
def plural_form_for_number(self):
Collaborator Author

This doesn't seem a great place for this function, see my comment here - https://github.com/projectfluent/python-fluent/pull/67/files#r231033619

Collaborator

Yeah, I'd prefer not to put it here if possible. Happy to see it in a follow-up.

Collaborator

Maybe we could put it as a prop in the constructor?

    extra_requires = ['singledispatch>=3.4']
else:
    extra_requires = []

Collaborator Author

I think this will give us issues in creating wheels - we'll need different wheels for Python 2, Python 3 < 3.4 and Python 3 >= 3.4 due to different requirements, which is a bit annoying, but possible, as per PEP 425.
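For comparison, a sketch of the two ways the conditional dependency can be expressed. The `if` condition is an assumption, since the diff excerpt above only shows the two branches, and the environment-marker variant (PEP 508) is not something proposed in this thread; it is shown only because the installer evaluates a marker at install time, so the requirement list itself no longer differs between builds.

```python
import sys


def build_time_requires(version_info=sys.version_info):
    """Requirement list chosen while setup.py runs, so it is fixed per build.

    The version check is assumed; the excerpt does not show the condition.
    """
    if version_info < (3, 4):
        return ['singledispatch>=3.4']
    return []


# Same intent as a PEP 508 environment marker, evaluated by the installer.
MARKER_REQUIRES = ['singledispatch>=3.4; python_version < "3.4"']
```

Either list would be passed to `install_requires` in `setup()`.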

@spookylukey
Collaborator Author

@stasm I'm not experienced with creating namespace packages, so not sure what to recommend. Both of the options you gave seem sensible to me.

I initially leant to having a single repo with two setup.py files. However, if we want to have independent development, then it is annoying if PRs for the parser get snagged on the fact that the build now fails due to FluentBundle breaking. We probably also want docs to be separate - most people who want to use FluentBundle do not actually care about the parser and serializer, and vice versa. So I think we will have less friction with the rest of the tool ecosystem (Travis, Read The Docs etc) if we have separate repos, and that is probably the right way to go.

Regarding repo names, including .py in the name is not a normal convention, so I think we should avoid that. python-fluent-syntax and python-fluent-bundle seem good names for repos, and fluent-syntax and fluent-bundle for packages; you often see similar things. I thought about python-fluent-parser, but there is also the serializer and AST definition, so syntax seems good. We should just make sure the package short description contains "parser", because I think that is what most people will search for.

@stasm
Contributor

stasm commented Nov 6, 2018

However, if we want to have independent development, then it is annoying if PRs for the parser get snagged on the fact that the build now fails due to FluentBundle breaking.

Would this still be a problem if the bundle package always used published versions of the syntax package?

We probably also want docs to be separate - most people who want to use FluentBundle do not actually care about the parser and serializer, and vice versa.

We could still have docs/bundle and docs/syntax. They'd be hosted under the same RTD subdomain, but I see that as a feature. It will make linking and housekeeping easier.

I think I'm leaning towards keeping everything in a single repository. We already do this with fluent.js and I find it convenient that all issues live in the same place. This approach also makes it easier to create milestones and GitHub projects which comprise issues from all affected packages. For instance, we could set up a single short-lived project board tracking the implementation of Syntax 0.8 in both packages, or decide to use two project boards, one for each project.

It's also possible to use a CODEOWNERS file in the single repo scenario to define who is maintaining which directory, and automate review requests.

Lastly, it's easier to perform housekeeping tasks in a single repo. Updating READMEs, linting code, etc. can all be done with a single PR.

@spookylukey
Collaborator Author

Would this still be a problem if the bundle package always used published versions of the syntax package?

That's a good solution, I hadn't thought of that. The other things you mentioned sound fine, if you think it would be easier overall to have a single repo let's go with that. In fact if we decide to split at some point we can still do that, and it will be easier than trying to merge I imagine.

@Pike
Contributor

Pike commented Nov 6, 2018

I really think we should go for separate repos. Automation is going to be way easier to set up, and it'll be clear what tests to run, and what to blame for bustages.

Also, it took me a bit to find that you can actually refer to a subdirectory in VCS install links: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support. It's the first time I've come across them.

I would probably also find separate bug/issue lists beneficial, even more so now that you can move issues between repos.

Happy to help set up the repository structure for a corresponding repo.

A few high-level comments I remember from looking at the code.

I'd throw away Python 3.4 and older support; I think everybody these days is going for 2.7 and 3.5+?

We also talked about perf a while back, in the meantime I came across https://github.com/benfred/py-spy, which also works on macs :-)

Running the perf tests, I saw that the dispatch code was very hot. I've toyed with removing that for plain text messages, but I think the top-level dispatch on message/term was still hurting. Probably also something that can be lifted?

I did entertain the idea to have the parser generate custom and possibly optimized ASTs, where you could pass in a "resolver" AST module, which would have .resolve methods, but could also optimize out deep AST hierarchies that are not needed at parse time. That's not hard to add to the parser, but adds another difference between the js and the python parser, so we should look at that separately. That AST would also be part of fluent.bundle, though.

I also had a glance at the new dependencies. In the meantime, I've started using attrs on one of my own projects, with mixed emotions ;-). Some things I like, some things don't work as nicely in practice as in theory (the repr output, in particular, gets large easily on nested objects). I tried to find something more focused and lightweight than babel for intl/cldr wrapping, but there doesn't seem to be anything maintained? The weight of those dependencies is one reason why I lobbied to have the resolver in a different package behind the scenes.

@spookylukey
Collaborator Author

@Pike - regarding performance, I've done almost nothing in this branch, because I put all my effort into the compiler implementation instead - this is always going to be way faster, and if you want a fast implementation you should be using that instead. For the interpreter/evaluator, clarity is most important IMO. Of course, if there are easy performance gains then we should implement those - a single special case for plain text messages at the entry point probably wouldn't hurt clarity much.
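A minimal sketch of what such a special case could look like, assuming a resolver built on `functools.singledispatch`. The `Text` and `Pattern` classes and the `handle` and `format_pattern` functions are invented for illustration and are not the AST or the resolver from this PR.

```python
from functools import singledispatch


class Text(object):
    """Hypothetical AST node holding literal text."""

    def __init__(self, value):
        self.value = value


class Pattern(object):
    """Hypothetical AST node holding a list of elements."""

    def __init__(self, elements):
        self.elements = elements


@singledispatch
def handle(node):
    raise TypeError("No handler for %r" % type(node))


@handle.register(Text)
def _(node):
    return node.value


@handle.register(Pattern)
def _(node):
    return "".join(handle(element) for element in node.elements)


def format_pattern(pattern):
    elements = pattern.elements
    # Fast path: a single literal needs no dispatch at all.
    if len(elements) == 1 and type(elements[0]) is Text:
        return elements[0].value
    return handle(pattern)
```

The general path stays untouched, so the extra branch adds one check at the entry point and nothing else.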

Collaborator

@zbraniecki zbraniecki left a comment

Generally speaking this looks great!

I'm comfortable landing this, as my comments are rather minor and can be resolved in follow-ups, and most of them are implementation details that we can iron out over time.

What I think should happen soon is the naming scheme update to match the JS/Rust implementations, but even that can comfortably land in a follow up (before a release tho!)

>>> translated
"Hello, name!"
>>> errs
[FluentReferenceError('Unknown external: name')]
Collaborator

It's mostly a side note for @stasm, but every time I put a reader's hat on I see it as a flaw that we place the id as a raw string. I'd love us to consider { name } and { $name } here at least.

return babel.plural.to_python(self._babel_locale.plural_form)

@cachedproperty
def _babel_locale(self):
Collaborator

Why not resolve it in the constructor?

fluent/utils.py Outdated
@@ -0,0 +1,43 @@
class cachedproperty(object):
Collaborator

Is there value in that over the typical lazy override, where you define a property as a getter that, after its first execution, overrides the property with its result?

Collaborator Author

cachedproperty is quite neat in that it has zero overhead after it has run once (not even a function call, due to the way that Python looks things up in object/class dicts), plus it is neat to implement. (Maybe that is what you meant by "overriding the property with its result", not sure; cachedproperty simply wraps this behaviour up neatly.)

However, I don't think we really need it at all - the only use for this laziness was not really necessary.
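For readers unfamiliar with the pattern, this is one common way to write such a descriptor. It is a sketch of the general technique, not necessarily the code in fluent/utils.py, and the `Demo` class is made up.

```python
class cachedproperty(object):
    """Non-data descriptor that replaces itself with the computed value."""

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # The instance __dict__ takes precedence over a non-data descriptor,
        # so later lookups find the stored value and never reach __get__.
        instance.__dict__[self.func.__name__] = value
        return value


class Demo(object):
    def __init__(self):
        self.calls = 0

    @cachedproperty
    def expensive(self):
        self.calls += 1
        return 42
```

Because the class defines no `__set__`, attribute lookup prefers the instance dictionary, which is where the "zero overhead after the first run" comes from.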

@zbraniecki zbraniecki requested a review from stasm December 11, 2018 04:39
@zbraniecki
Collaborator

I'm adding @stasm as a reviewer since he said he'd like to look at the resolve before we land this.

Stas also wanted to update the directory structure and I'm not sure if it's something we want to do before landing this (and would require rebasing) or after (I'd vote for after ;)).

@stasm stasm mentioned this pull request Dec 18, 2018
@stasm
Contributor

stasm commented Dec 18, 2018

I opened #79 to discuss the planned changes to the directory structure. @spookylukey I'd love your input there and I'd also like to coordinate the order of landing. My understanding is that this PR needs to be rebased on top of master anyways. If we do it with the current directory structure, it will also require implementing changes introduced by Syntax 0.8, because that's what the fluent.syntax module supports right now on master. Or, we can first fix #79 which will allow fluent.bundle to pin an older version of fluent.syntax as a dependency. Syntax 0.8 can then be implemented in fluent.bundle on its own timeline.

@stasm
Contributor

stasm commented Jan 17, 2019

@spookylukey Should we close this in favor of #81?

@spookylukey
Collaborator Author

spookylukey commented Jan 17, 2019

@spookylukey Should we close this in favor of #81?

Yes, I was leaving it open in case there were any more comments on the code, not sure if closing affects that.

@spookylukey spookylukey merged commit 9d84579 into projectfluent:master Jan 18, 2019
@spookylukey spookylukey deleted the implement_format branch March 3, 2019 01:20