Skip to content

Introduce FluentResource and MessageContext.addResource #217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zbraniecki opened this issue Jun 4, 2018 · 18 comments
Closed

Introduce FluentResource and MessageContext.addResource #217

zbraniecki opened this issue Jun 4, 2018 · 18 comments

Comments

@zbraniecki
Copy link
Collaborator

The current public API of MessageContext makes it quite challenging to apply onto a dynamic state model like DOM, especially with caching and performance in mind.

If I understand correctly, @stasm wants to keep MessageContext immutable. That's quite possible, if we assume that creating new MessageContexts is cheap.

It will come especially useful in the environment where Fluent works in DOM with lazily loaded Web Components.
In such model we either have an inherently dynamic list of resources per context, or dynamic list of contexts. In the former case, we'd want to start with ["res1.ftl", "res2.ftl"] and then switch to ["res1.ftl", "res2.ftl", "res3.ftl"] without having to reparse res1/res2.
In the latter case, we'd want to start with ["res1.ftl", "rest2.ftl"] and then add a new context ["res1.ftl", "res3.ftl"] without having to reparse res1.

At this point MessageContext is basically a storage for two types of data - a combination of ftl resources plus a memoizer for Intl objects (plus algorithms of to resolve the entries).
I'll talk about memoizer in a separate issue, and here will focus on the set of fluent entries.

If we could allow the user of Fluent to store parsed resources it would become possible to create caches of FTL resources that can be mixed/matched making it cheap to recreate MessageContext whenever the list of resources changes.

One argument Stas brought against this model is that an API that accepts an AST would need such AST to be documented and standardized, and that's a huge effort for something we don't intend to keep frozen.

I'd like to suggest a potential solution for that - a new class FTLResource which has a constructor, and can be accepted into MessageContext. An example of how it might look like:

let res = new FTLResource(source);
let ctx = new MessageContext(locales);

ctx.loadResource(res);

The only piece we need to standardize is the constructor for the FTLResource and one new method which accepts an object of that type.
This allows the res to be cached and fed to new MessageContext with no parsing cost at all making MessageContext` actually just an immutable combination of resources.

The added benefit is that in Firefox scenario, we could easily introduce L10nRegistry.prefetch method to fetch resources eagerly, parse them, and have them ready for when MessageContext is needed. That would be an equivalent of CachedIterable.touchNext except that it wouldn't have to rely on a complete list of resources needed for a context, and wouldn't have to prematurely initialize the context.

Instead, it would react to <link/> being inserted by triggering a prefetch which would load and parse the file into cache.
Then, when MessageContext is constructed it would load already prefetched resources.

It seems like a cleaner and lighter architecture for the DOM needs, without much increase in complexity or API surface.

@stasm, @Pike - thoughts?

@stasm
Copy link
Contributor

stasm commented Jun 5, 2018

It sounds like FTLResource is a MessageContext for a single resource?

What are the benefits of introducing this new class over using existing MessageContexts to build new instances? In bug 1384236 I suggested a static MessageContext.from(...contexts):

let res1 = new MessageContext(locales);
res1.addMessages(source1);

let res2 = new MessageContext(locales);
res2.addMessages(source2);

// Throw if res1.locales ≠ res2.locales.
let ctx = MessageContext.from(res1, res2);

If we prefer to not throw in the constructor, we could also go for an API which is explicit about the locales of the context. In the snippet below, addMessages can detect its argument as an instance of MessageContext and act accordingly.

let ctx = new MessageContext(locales);
ctx.addMessages(res1);
ctx.addMessages(res2);

@zbraniecki
Copy link
Collaborator Author

What are the benefits of introducing this new class over using existing MessageContexts to build new instances?

Mostly scope. As far as I understand, FTLResource will:

  • never need to be composed of multiple resources
  • won't need any constructor options
  • won't need locales information
  • won't have any logic (resolver etc.)
  • won't have any intl memoization

I agree that it's possible to use a MessageContext to store a single, parsed, FTLResource, but it seems like an overkill.
This may come to play when designing IPC and XRay wrappers (see amount of boilerplate code in https://bugzilla.mozilla.org/show_bug.cgi?id=1425104). Designing IPC around a single container with no public methods should be easier than doing the same for the whole MessageContext.

@Pike
Copy link
Contributor

Pike commented Jun 7, 2018

One thing that I expect from FluentResource is an API to expose Terms, which I wouldn't expect on MessageContext. And looking at the members:

    this.locales = Array.isArray(locales) ? locales : [locales];
    this._terms = new Map();
    this._messages = new Map();
    this._functions = functions;
    this._useIsolating = useIsolating;
    this._transform = transform;
    this._intls = new WeakMap();

would just leave

    this.locale = locale;  // just one
    this._terms = new Map();
    this._messages = new Map();

Not sure if we should even do Maps here, given that we can't do final name collision test to begin with.

@stasm
Copy link
Contributor

stasm commented Jun 7, 2018

I think I'm still not understanding something. @zbraniecki Can you explain what you suggest the internal structure of MessageContext to be after this change? Is it still a flat map of messages and terms? Are they copied from the FluentResource? Or is MessageContext going to store a list of references to FluentResources? In this case, should we be concerned about the increased complexity of hasMessage and getMessage? And are FluentResources immutable? The description of this issue doesn't mention any API to amend the contents of a FluentResource instance. If there isn't any, I think the L10nRegistry will need to do the extra work to track which FluentResources are used in which MessageContexts and update or recreate the contexts when resources change. Otherwise the old resources will leak memory.

@zbraniecki
Copy link
Collaborator Author

Can you explain what you suggest the internal structure of MessageContext to be after this change?

I suggest no change.

Are they copied from the FluentResource?

Yeah, in the mock patch I just do:

  addMessages(source) {
    const [entries, errors] = parse(source);
    this.loadEntries(entries, errors);
    return errors;
  }

  loadResource(res) {
    this.loadEntries(res._entries);
  }

where addMessages parses, and loadResource just unpackages the structure for its entries. The rest looks the same as today.

And are FluentResources immutable?

Yes. You create them once, and pass them around eventually consuming them in MessageContext.

If there isn't any, I think the L10nRegistry will need to do the extra work to track which FluentResources are used in which MessageContexts and update or recreate the contexts when resources change. Otherwise the old resources will leak memory.

I don't think I understand. L10nRegistry will just cache parsed FTLResources in the FileSource. It will invalidate cache when source gets removed/updated.
It will return new MessageContext constructed out of FTLResources in generateContexts.

@stasm
Copy link
Contributor

stasm commented Jun 8, 2018

I recall that on Tuesday you said that FluentResource was meant to reduce the memory consumption of Fluent in Gecko. Do I remember this right? Given that most translations are simple translations and are currently stored as strings, if MessageContext stores their copies then the introduction of FluentResource doesn't really help: we're stil lcopying all those strings to every MessageContext that we create. Am I misunderstanding something?

If this is correct, then I think #220 will be important in reducing the memory footprint because it will make MessageContexts store only references to nodes "owned" by FluentResources.

I'm not opposed to the idea of introducing FluentResource. I hope that my questions help us clarify the goal of introducing a new type to Fluent. I think the first time we talked about caching parsed AST it was because of performance. Then we moved to memory consumption. It now seems like the main benefit will be IPC, at least for now.

@zbraniecki
Copy link
Collaborator Author

I recall that on Tuesday you said that FluentResource was meant to reduce the memory consumption of Fluent in Gecko. Do I remember this right? Given that most translations are simple translations and are currently stored as strings, if MessageContext stores their copies then the introduction of FluentResource doesn't really help: we're stil lcopying all those strings to every MessageContext that we create. Am I misunderstanding something?

I think so.

In the model I'm proposing here, MessageContext does not get cached. Only FTLResource does.
Which means that in the scenario I described where we first load Preferences with 12 resources, and then lazily add the thirteen, we currently will do:

  • Parse 12 files
  • Cache a MessageContext with them
  • Parse 13 files
  • Cache a MessageContext with them

In result we parsed 12 files twice, and cache two contexts, one with 12 files, second with 13.

In my proposal we would:

  • Parse and cache 12 files as FTLResource
  • Create a MessageContext for them
  • Parse one file and cache it as FTLResource
  • Create a MessageContext for all 13

In result we would only cache 13 files, and parse each file once.

@stasm
Copy link
Contributor

stasm commented Jun 21, 2018

I talked to @zbraniecki about this last week. The memory footprint of copying simple strings should be OK when translations are copied from FluentResources to MessageContexts. Strings are primitive but also immutable, which means that the engine is able to optimize them via string interning.

Introducing FluentResource is both:

  • a performance optimization: the FTL files are parsed once and then cached, so we only pay the parsing cost once per resource,
  • a memory optimization: we can build MessageContext instances which refer to AST nodes stored in FluentResources which means that complex AST nodes are stored in memory once, rather than as many times as the resource is parsed.

@stasm
Copy link
Contributor

stasm commented Jun 22, 2018

@zbraniecki also suggested that rather than flatten and copy the translations from FluentResources when a MessageContext is created, we could store a map of identifiers to FluentResources with that message or term.

@stasm
Copy link
Contributor

stasm commented Jun 26, 2018

Here's an early API design which I've been thinking about:

class FluentDocument extends Map {
    constructor(entries, errors = []) {
        super(entries);
        this.errors = errors;
    }
}

class MessageContext {
    // …

    static parseDocument(source) {
        // Alternatively, create the instance
        // of FluentDocument inside of parse().
        let [entries, errors] = parse(source);
        return new FluentDocument(entries, errors);
    }

    addDocument(doc) {
        let errors = [];
        for (let [id, entry] of doc) {
            if (id.charAt(0) === "-") {
                // Add to terms
            } else {
                // Add to messages
            }
        }
        return doc.errors.concat(errors);
    }

    addMessages(source) {
        let doc = this.parseDocument(source);
        return this.addDocument(doc);
    }
}

@stasm
Copy link
Contributor

stasm commented Jun 28, 2018

On Bugzilla, @zbraniecki said (comment):

I'm not particularly fond of a name "FluentDocument", because it doesn't correspond to anything else around it - we don't have CSSDocument or JSDocuments flying around.

We do have CSSStyleSheets and JS Programs (MDN), though. I'm looking for something like that but more suitable to Fluent and the fact that the FTL files contain translated content. I think calling them documents comes pretty close.

@Pike
Copy link
Contributor

Pike commented Jun 28, 2018

I'd move parsing out of MessageContext altogether here. I think the parsing abstraction is better suited for FluentDocument, say in a static FluentDocument.fromSource(), and that'd be used in addMessages.

That would leave MessageContext with holding globals, pseudo, and resource set. It would also enforce the difference between Term and Message. FluentDocument would hold the stuff that's about a single source file.

The Term and Message bit made me wonder a bit, but I think that's good that way, since Terms span multiple source files, and that level is MessageContext fodder. Just to answer my own question publicly.

@stasm
Copy link
Contributor

stasm commented Jun 28, 2018

This sounds good as well. I mostly suggested that parsing be in MessageContext because I thought we said FluentDocument should be about storage rather than logic. I think the reason was IPC. I can't find this anywhere, so maybe I remembered wrong.

@Pike
Copy link
Contributor

Pike commented Jun 28, 2018

As for the naming, I expect the FluentDocument to be completely internal to fluent implementations and platform bindings. Thus, we should try to spend only limited energy on its name. I think we're naming a code-representation of a fluent file. We only have two options so far, AFAICT, Document and Resource. From the dictionary finds on google:

A document is a written, drawn, presented, or memorialized representation of thought. The word originates from the Latin documentum, which denotes a "teaching" or "lesson": the verb doceō denotes "to teach".

I see the opportunity of thought in an html document, not so much in a fluent file.

Resource: a stock or supply of money, materials, staff, and other assets that can be drawn on by a person or organization in order to function effectively.

Also a bit of a load to carry for a poor little fluent file, but closer in my book.

Just my 2 cents as someone that's not a native speaker, so "close in my book" has little to do with the actual English language ;-)

@Pike
Copy link
Contributor

Pike commented Jun 28, 2018

Last comment:

I do think we should name things Fluent in code, in favor of FTL. The latter is a insider joke, and our file extension. Let's not bet too heavy on it. Also, Fluent reads better in CamelCase.

Maybe only do that in cases that are really close to fluent itself, and in bindings and above favor L10n and Localization?

@stasm
Copy link
Contributor

stasm commented Jun 28, 2018

As for the naming, I expect the FluentDocument to be completely internal to fluent implementations and platform bindings.

It's not completely internal. It's the name of the top-level AST node in the spec. I'll file an issue in the spec repo about this. (Update: projectfluent/fluent#149)

@stasm
Copy link
Contributor

stasm commented Jun 29, 2018

We talked about FluentDocument in the meeting today. I summarized the conversation in projectfluent/fluent#149. We're going to continue to use Resource.

@stasm stasm changed the title Introduce FTLResource and MessageContext.loadResource Introduce FluentResource and MessageContext.addResource Jul 11, 2018
@stasm
Copy link
Contributor

stasm commented Jul 11, 2018

This was implemented in #244.

@stasm stasm closed this as completed Jul 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants