module extension: add support for canonical name resolution #2013

Merged

gabrielschulhof
Contributor

Before attempting to load a module, each provided resolver must be given an
opportunity to examine the name of the requested module without actually
loading it so as to canonicalize it, in case a module can be referred to by
multiple names.

Then, modules are loaded and cached by their canonical name.
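
The flow described above can be sketched without the engine. This is only an illustration under stated assumptions: plain C strings stand in for `jerry_value_t` names, an `int` stands in for the loaded module, the cache is a toy array rather than a JavaScript object, and every `sketch_` identifier is hypothetical. The actual patch may also order the steps differently, for example by collecting all canonical names before consulting the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resolver: C strings replace jerry_value_t names, an int replaces the module. */
typedef struct
{
  const char *(*get_canonical_name) (const char *name); /* optional, may be NULL */
  int (*resolve) (const char *canonical_name, int *result_p); /* returns nonzero on success */
} sketch_resolver_t;

#define SKETCH_CACHE_SIZE 8

/* Toy cache keyed on canonical names. It stores pointers only, so keys must outlive the cache. */
static const char *sketch_cache_keys[SKETCH_CACHE_SIZE];
static int sketch_cache_values[SKETCH_CACHE_SIZE];
static size_t sketch_cache_count = 0;
static int sketch_load_count = 0; /* counts how often a module was actually loaded */

/* Treat "bob" as an alias of "alice"; every other name is already canonical. */
static const char *
sketch_get_canonical_name (const char *name)
{
  return (strcmp (name, "bob") == 0) ? "alice" : name;
}

/* Load the only module this resolver knows about. */
static int
sketch_resolve (const char *canonical_name, int *result_p)
{
  if (strcmp (canonical_name, "alice") != 0)
  {
    return 0;
  }
  sketch_load_count++;
  *result_p = 42;
  return 1;
}

static const sketch_resolver_t sketch_alias_resolver = { sketch_get_canonical_name, sketch_resolve };

/* Canonicalize first, consult the cache, and only then load and cache under the canonical name. */
static int
sketch_module_resolve (const char *name, const sketch_resolver_t **resolvers_p, size_t count, int *result_p)
{
  assert (name != NULL && resolvers_p != NULL && result_p != NULL);

  for (size_t i = 0; i < count; i++)
  {
    const sketch_resolver_t *res_p = resolvers_p[i];
    const char *canonical = (res_p->get_canonical_name != NULL) ? res_p->get_canonical_name (name) : name;

    for (size_t j = 0; j < sketch_cache_count; j++)
    {
      if (strcmp (sketch_cache_keys[j], canonical) == 0)
      {
        *result_p = sketch_cache_values[j];
        return 1;
      }
    }

    if (res_p->resolve (canonical, result_p))
    {
      if (sketch_cache_count < SKETCH_CACHE_SIZE)
      {
        sketch_cache_keys[sketch_cache_count] = canonical;
        sketch_cache_values[sketch_cache_count] = *result_p;
        sketch_cache_count++;
      }
      return 1;
    }
  }

  return 0;
}
```

Resolving "alice" and then "bob" through this sketch loads the module once; the second request is served from the cache because both names share one canonical key.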

@gabrielschulhof gabrielschulhof force-pushed the module-canonical-name branch 2 times, most recently from c9cb700 to 4b53e14 Compare September 17, 2017 21:29
jerry_value_t (*get_canonical_name) (const jerry_value_t name);
bool (*resolve) (const jerry_value_t canonical_name,
jerry_value_t *result);
};
Contributor

} jerryx_module_resolver_t;

@gabrielschulhof
Contributor Author

@LaszloLango fixed.

Member

@zherczeg zherczeg left a comment

Good patch overall.

{
jerry_value_t resulting_path;

/* Search name for "./" and "../", follow symlinks, etc., then create resulting_path via jerry_create_string () */
Member

Full stop after sentence.

```

We can now load JavaScript files:
```c
static const jerryx_module_resolver_t resolvers[] =
{
/* Consult the JerryScript module resolver first, in case the requested module is a compiled-in JerryScript module. */
jerryx_module_native_resolver,
&jerryx_module_native_resolver,
Member

Why do you need & ?

Contributor Author

Because jerryx_module_native_resolver is now a structure stored in a global variable, and this is an array of structure pointers, so I will update the declaration of resolvers a few lines up.

bool return_value = false;

jerry_size_t c_module_name_size = jerry_get_utf8_string_size (name);
jerry_char_t c_module_name[c_module_name_size];
Member

While this allocation is simple at the source code level, I always have doubts about its efficiency. You need to move a lot of data on the stack.

Contributor Author

So, should I use jmem_heap_alloc_block_null_on_error() instead?

Member

I am just asking what its cost is. We could introduce convenience helpers for such allocations (I saw you already created some macros for it).

jerry_value_t ret = 0;
const jerryx_module_resolver_t *resolver_p;
jerry_value_t instances;
jerry_value_t canonical_names_static[5];
Member

Let's use a define for this.

Contributor Author

What do you mean "use a define"?

Member

#define CANONICAL_NAMES_LOCAL_SIZE 4 or something. I think 4 is enough (probably 2 is the most frequent in practice).
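
A minimal sketch of what such a define could look like in use, assuming a small on-stack array for the common case and a heap fallback when there are more resolvers. The fallback, the use of `malloc`/`free`, and the `sketch_` names are assumptions for illustration; the thread notes elsewhere that `malloc`/`free` cannot always be assumed, so the real code may allocate differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CANONICAL_NAMES_LOCAL_SIZE 4

typedef unsigned int sketch_value_t; /* hypothetical stand-in for jerry_value_t */

/*
 * Reserve one slot per resolver. Returns 1 on success and 0 on allocation failure;
 * *used_heap_p reports whether the heap fallback was needed.
 */
static int
sketch_with_name_slots (size_t count, int *used_heap_p)
{
  sketch_value_t local_slots[CANONICAL_NAMES_LOCAL_SIZE];
  sketch_value_t *slots_p = local_slots;

  assert (used_heap_p != NULL);
  *used_heap_p = 0;

  if (count > CANONICAL_NAMES_LOCAL_SIZE)
  {
    /* Too many resolvers for the local array: fall back to the heap (assumed available here). */
    slots_p = malloc (count * sizeof (sketch_value_t));
    if (slots_p == NULL)
    {
      return 0;
    }
    *used_heap_p = 1;
  }

  for (size_t i = 0; i < count; i++)
  {
    slots_p[i] = 0; /* placeholder where a canonical name would be stored */
  }

  /* ... the canonical names in slots_p[0 .. count - 1] would be used here ... */

  if (slots_p != local_slots)
  {
    free (slots_p);
  }
  return 1;
}
```

With two resolvers the local array suffices; with nine the sketch takes the heap path.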


for (index = 0; index < count; index++)
{
canonical_names[index] = 0;
Member

0? Not undefined?

resolver_p = resolvers_p[index];
canonical_names[index] =
resolver_p->get_canonical_name == NULL ? jerry_acquire_value (name) :
resolver_p->get_canonical_name (name);
Member

Interesting alignment.

Contributor Author

If I move the else under jerry_acquire_value () I get this from check-vera.sh:

./jerry-ext/module/module.c:235: error: Indentation: 45 -> 4.

So, that's why it's like this.

Member

Use round brackets.

jerryx_module_resolver_t jerryx_module_native_resolver =
{
NULL,
jerryx_resolve_native_module
Contributor

.get_canonical_name = NULL,
.resolve = jerryx_resolve_native_module,

{
jerry_value_t (*get_canonical_name) (const jerry_value_t name); /**< module name to canonicalize */
bool (*resolve) (const jerry_value_t canonical_name, /**< requested module's canonical name */
jerry_value_t *result); /**< resulting module */
Contributor

@martijnthe martijnthe Sep 18, 2017

What's your rationale for passing around the name strings as jerry_value_ts?
I think most module resolvers will convert it right back to a C string.
Is it to avoid having to deal with memory management of the C strings?
(Probably a good reason since we can't assume malloc/free...)

Contributor Author

Yeah, and I was actually thinking that, for path resolution, you might not even convert the whole value to a C string, but you would rather use fixed size chunks to gradually retrieve the string and maybe assemble the canonical name in a similarly gradual fashion. Then, unless you had a directory with a really, really, really long name, you wouldn't really be allocating all that much.

Contributor Author

Call it streamed path resolution.

Contributor Author

Another very important reason: the module cache is stored as a JavaScript object, so I have to convert a C string back to a jerry_value_t in order to check if that JavaScript object has a module at the module name's key.

Contributor

Ah I see, yeah that makes sense.

@gabrielschulhof gabrielschulhof force-pushed the module-canonical-name branch 2 times, most recently from cd7e880 to ac23faa Compare September 23, 2017 14:40
@gabrielschulhof
Contributor Author

@martijnthe @LaszloLango @zherczeg I have updated the patch to reflect the changes to the moduling system.

{
jerry_value_t resulting_path;

/* Search name for "./" and "../", follow symlinks, etc., then create resulting_path via jerry_create_string (). */
Member

We should emphasize here that the point is that if two references point to the same physical file, it should be loaded only once.


/**
* Load a copy of a module into the current context using the provided module resolvers, or return one that was already
* loaded if it is found.
*/
jerry_value_t jerryx_module_resolve (const jerry_char_t *name, const jerryx_module_resolver_t *resolvers, size_t count);
jerry_value_t jerryx_module_resolve (const jerry_value_t name, /**< module's name */
Member

Ditto

{
jerry_value_t (*get_canonical_name) (const jerry_value_t name); /**< module name to canonicalize */
bool (*resolve) (const jerry_value_t canonical_name, /**< requested module's canonical name */
jerry_value_t *result); /**< resulting module */
Member

We do not use comments in header files.

{
const jerryx_native_module_t *module_p = NULL;

jerry_size_t name_size = jerry_get_utf8_string_size (canonical_name);
Member

Shall we call tostring on the name? Or check if it is a string.

Contributor Author

We shouldn't need to. This function should not be called directly, even though a pointer to it is available. It should only be called by jerryx_module_resolve () and it should only ever receive a jerry_value_t stored in the array of canonical module names created by jerryx_module_resolve (). In fact, this function is not even documented. Only the structure jerryx_module_native_resolver is documented, and only the availability of it as a global data symbol is documented, not its contents.

jerryx_module_resolve (const jerry_char_t *name, /**< name of the module to load */
const jerryx_module_resolver_t *resolvers_p, /**< list of resolvers */
jerryx_module_resolve (const jerry_value_t name, /**< name of the module to load */
const jerryx_module_resolver_t **resolvers_p, /**< list of resolvers */
Member

** ? A pointer to a list of pointers which points to the resolvers? Why not an array of resolvers?

Contributor Author

Because the resolvers are stored non-contiguously, and we do not want to make copies of them, right?

Member

Are they used multiple times? Or why are they scattered?

I think one application uses one resolver list, but this is not much of a code size increase, so leave it be.

Contributor Author

@gabrielschulhof gabrielschulhof Sep 25, 2017

Resolvers may be declared global in separate .c files (jerryx_module_native_resolver certainly is), and they are collected into an array only once - the place that calls jerryx_module_resolve ().
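
A small sketch of that layout, assuming simplified resolver structs with a single hypothetical `label_p` member. All `demo_` names are invented for illustration; only the idea of an array of pointers to separately defined globals comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for jerryx_module_resolver_t; the single member is hypothetical. */
typedef struct
{
  const char *label_p;
} demo_resolver_t;

/* Imagine these two definitions living in different .c files. */
const demo_resolver_t demo_native_resolver = { "native" };
const demo_resolver_t demo_file_resolver = { "file" };

/* Collected once, at the call site. The array holds addresses, so no resolver is copied. */
static const demo_resolver_t *demo_resolvers[] =
{
  &demo_native_resolver,
  &demo_file_resolver
};

/* Return the label of the resolver at the given index, or NULL when the index is out of range. */
static const char *
demo_resolver_label (const demo_resolver_t **resolvers_p, size_t count, size_t index)
{
  assert (resolvers_p != NULL);
  return (index < count) ? resolvers_p[index]->label_p : NULL;
}
```

Because the array stores pointers, `demo_resolvers[0]` is the same object as `demo_native_resolver`, wherever that global happens to be defined.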

resolver_p = resolvers_p[index];
canonical_names[index] = ((resolver_p && resolver_p->get_canonical_name != NULL) ?
resolver_p->get_canonical_name (name):
jerry_acquire_value (name));
Member

This is not how we align ?:

Contributor Author

It's pretty tough not breaking 120 columns with this line. I'm still trying to figure out how to align it.

Contributor Author

I renamed resolver_p to res_p then it fits.

Member

I would move the condition into a bool value rather than shortening the name.

@@ -61,41 +61,61 @@ const char eval_string5[] =
"}) ();";

static bool
resolve_differently_handled_module (const jerry_char_t *name,
resolve_differently_handled_module (const jerry_value_t name,
jerry_value_t *result)
Member

Comment is missing, spaces are wrong.

@gabrielschulhof
Contributor Author

@zherczeg I have addressed your review comments.

@gabrielschulhof gabrielschulhof force-pushed the module-canonical-name branch 2 times, most recently from 702fdd4 to 45dbd81 Compare September 25, 2017 10:31
@gabrielschulhof
Contributor Author

@zherczeg instead of a boolean I went straight for storing the function pointers in variables.

@gabrielschulhof
Contributor Author

... aaand I forgot to key the cache on the canonical name. Fixed now.

@gabrielschulhof
Contributor Author

@zherczeg could you or somebody with Travis access please restart the lint job? It seems to have been killed because it had stalled.

@gabrielschulhof
Contributor Author

Thanks!

struct jerryx_native_module_t *next_p; /**< pointer to next module in the list */
const jerry_char_t *name_p;
const jerryx_native_module_on_resolve_t on_resolve;
struct jerryx_native_module_t *next_p;
Contributor

Why did you remove the comments from here?

Contributor Author

@zherczeg said we didn't add such /**< comments to the header files. Or did I misinterpret?

Contributor

Yes, I think there was a misunderstanding here. We do not add comments to declarations, but only to definitions. The type definitions are usually in the header files, so we must put the comments there. Please don't remove these comments.

Member

Sorry, I meant only for function arguments, not for types.

Contributor

@LaszloLango LaszloLango left a comment

Nice improvement

@@ -14,11 +14,13 @@
*/

#include <string.h>
#include "jmem.h"
Contributor

I think we should not use internal headers in jerry-ext. I don't see what you are using from it. Please remove it if possible.

bool jerryx_module_native_resolver (const jerry_char_t *name, jerry_value_t *result);
typedef struct
{
jerry_value_t (*get_canonical_name) (const jerry_value_t name);
Contributor

Missing comments

Contributor Author

I need to know for sure. Do we, or do we not, add /**< comments after structure members and function arguments? How do we document function pointer structure members, since the formal parameters of the function signature also need to be documented?

@gabrielschulhof
Contributor Author

@LaszloLango fixed up the comments and added some _ps.

Member

@zherczeg zherczeg left a comment

LGTM with some style fix.

*/
typedef jerry_value_t (*jerryx_module_get_canonical_name_t) (const jerry_value_t name); /**< The name for which to
* compute the canonical
* name */
Member

Align the start of the comment lines to the same column.

bool jerryx_module_native_resolver (const jerry_char_t *name, jerry_value_t *result);
typedef bool (*jerryx_module_resolve_t) (const jerry_value_t canonical_name, /**< The module's canonical name */
jerry_value_t *result); /**< The resulting module, if the function returns
* true */
Member

Ditto. And similar to below.


return (!strcmp ((char *) name_string, "batman") ? jerry_acquire_value (name):
(!strcmp ((char *) name_string, "bruce-wayne") ? jerry_create_string ((jerry_char_t *) "batman"):
jerry_create_undefined ()));
Member

Fix the style of these lines.

jerry_init (JERRY_INIT_EMPTY);

jerry_value_t batman = jerry_create_string ((jerry_char_t *) "batman");
jerry_value_t bruce_wayne = jerry_create_string ((jerry_char_t *) "bruce-wayne");
Member

I hope these are not copyrighted.

Contributor Author

Good point. I'll use Liberator, which is public domain.

Member

They are still, however, "use at your own risk" characters. - this doesn't sound entirely public

Contributor Author

sigh Alice and Bob it is.

Contributor

@LaszloLango LaszloLango left a comment

LGTM after the style fixes.

@gabrielschulhof
Contributor Author

@zherczeg @LaszloLango please have another look. I updated the test to use #define-ed strings to store the alias and the actual name at the top, and I replaced the ternary composition with if/else if/else. I also changed the resolver initializer to use the { .member_name = value } notation.
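
The three style changes listed above can be illustrated together. This is a sketch and not the test from the patch: C strings stand in for `jerry_value_t`, `NULL` stands in for `jerry_create_undefined ()`, which of the two names is the alias is a guess, and the `example_` identifiers and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Alias and actual name kept in one place; which one is the alias is an assumption. */
#define EXAMPLE_ACTUAL_NAME "alice"
#define EXAMPLE_ALIAS_NAME "bob"

/* Simplified resolver type: C strings replace jerry_value_t, an int replaces the module. */
typedef struct
{
  const char *(*get_canonical_name) (const char *name);
  int (*resolve) (const char *canonical_name, int *result_p);
} example_resolver_t;

/* if / else if / else in place of a nested ternary expression. */
static const char *
example_get_canonical_name (const char *name)
{
  if (strcmp (name, EXAMPLE_ACTUAL_NAME) == 0)
  {
    return name;
  }
  else if (strcmp (name, EXAMPLE_ALIAS_NAME) == 0)
  {
    return EXAMPLE_ACTUAL_NAME;
  }
  else
  {
    return NULL; /* stand-in for jerry_create_undefined () */
  }
}

/* Succeed only for the canonical name of the one known module. */
static int
example_resolve (const char *canonical_name, int *result_p)
{
  assert (result_p != NULL);

  if (canonical_name == NULL || strcmp (canonical_name, EXAMPLE_ACTUAL_NAME) != 0)
  {
    return 0;
  }
  *result_p = 1;
  return 1;
}

/* Designated initializers name each member, so the order of members no longer matters. */
static const example_resolver_t example_resolver =
{
  .get_canonical_name = example_get_canonical_name,
  .resolve = example_resolve
};
```

Designated initializers also keep the initializer valid if members are later reordered or a new member is added to the struct.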

Before attempting to load a module, each provided resolver must be given an
opportunity to examine the name of the requested module without actually
loading it so as to canonicalize it, in case a module can be referred to by
multiple names.

Then, modules are loaded and cached by their canonical name.

JerryScript-DCO-1.0-Signed-off-by: Gabriel Schulhof [email protected]
@LaszloLango
Contributor

still LGTM

@zherczeg
Member

Still ok.

@zherczeg zherczeg merged commit 6d53931 into jerryscript-project:master Sep 29, 2017
pmarcinkiew pushed a commit to pmarcinkiew/jerryscript that referenced this pull request Oct 30, 2017
…ipt-project#2013)
pmarcinkiew pushed a commit to pmarcinkiew/jerryscript that referenced this pull request Oct 31, 2017
…ipt-project#2013)
pmarcinkiew pushed a commit to pmarcinkiew/jerryscript that referenced this pull request Oct 31, 2017
…ipt-project#2013)
@gabrielschulhof gabrielschulhof deleted the module-canonical-name branch June 18, 2019 00:08
4 participants