Better Importer interface #192

sbarzowski · 2018-02-18T21:55:16Z

As discussed in #190

sbarzowski · 2018-02-18T21:59:18Z

I haven't tested it beyond the existing test suite yet.

coveralls · 2018-02-18T22:00:11Z

Coverage increased (+0.1%) to 73.418% when pulling bd91264 on sbarzowski:importer into fde815f on google:master.

anguslees · 2018-02-18T23:11:28Z

I don't like the separation/race between canonicalise and import. Apart from regular race conditions, in my case I want to fetch from remote servers - and it is much more efficient to fetch content in the same operation as walking the search path (testing if the content actually exists).

Can we roll the two back into one Import operation, like before this PR? I think we can just document that it is important that FoundHere is in an absolute/canonicalised form.
Edit: Hrm, no, I see that it isn't that simple (since we want to look for cache hits before even calling into the importer). Perhaps we move the cache into the importer? The only other alternative I can see is to expose more of the search logic so the caller can check the cache at the right points (build the srcfile-relative path; check the abspath cache; attempt to read; check the search-relative cache; for each search path, attempt to read; store the positive/negative result in the cache).

sbarzowski · 2018-02-19T08:52:21Z

@anguslees
If you're using URLs to access remote servers, then Canonicalize can be a no-op because the paths are already canonical, i.e. it can be:

func (importer *RemoteImporter) Canonicalize(codePath, importedPath string) (string, error) {
	return importedPath, nil
}

The only requirement for a canonical path is that the importer knows how to fetch it; there is no need to check that it actually exists.

In the case of the file importer, I'm aware of the race, but it is completely benign. Jsonnet has every right to show an error if an imported file is deleted, and that's the only thing that can happen. And we need to check for existence in a bunch of locations anyway.

sparkprime · 2018-02-20T21:01:40Z

Don't we want the ability for a Jsonnet file hosted at http://foo/bar.jsonnet to do import "baz.jsonnet" and for that to be resolved to http://foo/baz.jsonnet ?

sparkprime · 2018-02-20T21:03:42Z

I wonder if we could let the importer do caching by passing it the cache singleton to use. I also think maybe we should cache the non-existence of files, to speed up future searches for the same thing.

sbarzowski · 2018-02-20T22:34:33Z

> Don't we want the ability for a Jsonnet file hosted at http://foo/bar.jsonnet to do import "baz.jsonnet" and for that to be resolved to http://foo/baz.jsonnet ?

It doesn't change anything. The redirections can be completely transparent. In your example http://foo/bar.jsonnet doesn't depend on where it is imported from, so it is already a good canonical path.

Perhaps "canonicalize" gives the wrong idea that it needs to be the sole identifier for the given content. Maybe getKey or something would be a better name. The only point is that the result no longer depends on where the import came from.

> I wonder if we could let the importer do caching by passing it the cache singleton to use.

I think it wouldn't help. It would still need to first get some key to the cache, then check it, then fetch the data. It would make it easier to carry some additional data between getting the key and fetching, but I don't know what it could be.

> I also think maybe we should cache the non-existence of files, to speed up future searches for the same thing.

It is something the importer could do internally. I also see other, potentially easier options, for example caching JPATH lookups separately (still within the importer). Anyway, it doesn't change the interface.

anguslees · 2018-02-21T00:47:56Z

I think I'm missing something in this discussion:

> The only requirement for a canonical path is that the importer knows how to fetch it; there is no need to check that it actually exists.

So let's walk through an example:

Let's say that http://foo/bar.jsonnet contains import "baz.libsonnet". I think this means:

Canonicalize("http://foo/bar.jsonnet", "baz.libsonnet") will presumably return "http://foo/baz.libsonnet"
check cache, missing
Import("http://foo/baz.libsonnet") is expected to return the content.

Now the above works fine for simple direct imports, but what if baz.libsonnet needs to be found along the search path? Now "http://foo/baz.libsonnet" is the wrong answer, and I need to go and probe for existence at Canonicalize time, which is expensive (the local-file equivalent is in tryPath, which is also called by Canonicalize).

I can of course avoid the second fetch by pre-stuffing the result in a cache during Canonicalize, and making Import always hit this cache - but this seems like I'm working around (not with) this new proposed API. In particular, with this workaround Canonicalize now has side effects, while Import doesn't, and I have to maintain my own shadow-cache :/
It seems like it would have been better just to embed the cache in my Importer in the first place, and return Import to the dumb/simple Import(from, path) -> ImportedData API.
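
To make the extra round trip concrete, here is a minimal sketch of the two-step flow with a search path. It is not code from this PR: the remote and searchImporter types, the single-argument Import signature, and the URLs are assumptions made for illustration; only the Canonicalize signature is taken from the snippet earlier in the thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// remote simulates a server and counts round trips (hypothetical helper).
type remote struct {
	files   map[string]string
	fetches int
}

func (r *remote) fetch(url string) (string, bool) {
	r.fetches++
	body, ok := r.files[url]
	return body, ok
}

// searchImporter is a hypothetical importer with a search path of base URLs.
type searchImporter struct {
	server     *remote
	searchPath []string
}

// Canonicalize must probe every candidate: only one that exists is usable.
func (s *searchImporter) Canonicalize(codePath, importedPath string) (string, error) {
	// Try next to the importing file first, then along the search path.
	dir := codePath[:strings.LastIndex(codePath, "/")+1]
	for _, base := range append([]string{dir}, s.searchPath...) {
		candidate := base + importedPath
		if _, ok := s.server.fetch(candidate); ok {
			return candidate, nil // the fetched body is discarded here
		}
	}
	return "", errors.New("import not found: " + importedPath)
}

// Import fetches the canonical path a second time.
func (s *searchImporter) Import(canonicalPath string) (string, error) {
	body, ok := s.server.fetch(canonicalPath)
	if !ok {
		return "", errors.New("import not found: " + canonicalPath)
	}
	return body, nil
}

func main() {
	srv := &remote{files: map[string]string{"http://lib/baz.libsonnet": "{}"}}
	imp := &searchImporter{server: srv, searchPath: []string{"http://lib/"}}

	key, err := imp.Canonicalize("http://foo/bar.jsonnet", "baz.libsonnet")
	if err != nil {
		panic(err)
	}
	if _, err := imp.Import(key); err != nil {
		panic(err)
	}
	fmt.Println(key, srv.fetches) // http://lib/baz.libsonnet 3
}
```

Two of the three fetches happen inside Canonicalize, and the third repeats the successful one. Avoiding it requires either a shadow cache or a single Import call that both searches and returns the content.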

sparkprime · 2018-02-21T05:23:23Z

I don't see how

func (importer *RemoteImporter) Canonicalize(codePath, importedPath string) (string, error) {
	return importedPath, nil
}

could ever work if the importedPath is relative, since it would collide with the result of Canonicalize for different files (ones with different content).
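
A small sketch of the collision, for illustration only. The function names noopKey and relativeKey and the file paths are made up; noopKey mirrors the no-op Canonicalize above, and relativeKey is one possible alternative that resolves against the importing file.

```go
package main

import (
	"fmt"
	"path"
)

// noopKey mirrors the no-op Canonicalize above: it ignores the importing file.
func noopKey(codePath, importedPath string) string {
	return importedPath
}

// relativeKey is a hypothetical alternative that resolves importedPath
// against the directory of the importing file.
func relativeKey(codePath, importedPath string) string {
	return path.Join(path.Dir(codePath), importedPath)
}

func main() {
	// Two different files, in different directories, import "lib.libsonnet".
	a := noopKey("a/main.jsonnet", "lib.libsonnet")
	b := noopKey("b/main.jsonnet", "lib.libsonnet")
	fmt.Println(a == b) // true: both imports share one cache key

	c := relativeKey("a/main.jsonnet", "lib.libsonnet")
	d := relativeKey("b/main.jsonnet", "lib.libsonnet")
	fmt.Println(c, d) // a/lib.libsonnet b/lib.libsonnet
}
```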

sbarzowski · 2018-02-21T09:45:34Z

> I don't see how [...] could ever work if the importedPath is relative...

Ok, I assumed that if you use URLs then you always use an absolute one, hence some misunderstandings. Of course if you want to have imports relative to the source and search path for remote stuff, then it gets more complicated.

sbarzowski · 2018-02-21T11:19:18Z

@anguslees
How would the embedded cache work? My guess is that it would keep contents (or nonexistence) of each URL it tries.

sbarzowski · 2018-02-21T13:00:33Z

Hmmm... I'm getting more and more convinced that we should just leave the caching of file contents to the importer, since we're talking about having some sort of cache in the importer anyway. It's also very flexible.

So now, my plan is to:

Change the Importer interface to Import(from, path) -> (contents, foundAt, err). The first argument (from) would be the path to the importing file, not its directory as it is now (to satisfy "Import path should be opaque", #190).
Decouple the code cache and the data (file contents) cache. The code cache would use foundAt as its key, so that each file would be parsed and analyzed exactly once.
The Importer wouldn't care about the code cache or any Jsonnet data structures.
It would be recommended that the Importer implement a cache, but that would be an implementation detail; the rest of the code wouldn't care about it.
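
The plan above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the parameter names, the mapImporter type, and the string standing in for a parsed AST are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Importer follows the shape proposed above; parameter names are assumptions.
type Importer interface {
	Import(importedFrom, importedPath string) (contents string, foundAt string, err error)
}

// mapImporter is a hypothetical importer that treats importedPath as an
// absolute key and ignores importedFrom.
type mapImporter map[string]string

func (m mapImporter) Import(importedFrom, importedPath string) (string, string, error) {
	if c, ok := m[importedPath]; ok {
		return c, importedPath, nil
	}
	return "", "", errors.New("import not found: " + importedPath)
}

// codeCache stores the result of parsing each file, keyed by foundAt.
// It knows nothing about how the importer finds or caches raw contents.
type codeCache struct {
	importer Importer
	parsed   map[string]string // stand-in for a parsed and analyzed AST
	parses   int               // counts how often parsing actually ran
}

func (c *codeCache) importCode(importedFrom, importedPath string) (string, error) {
	contents, foundAt, err := c.importer.Import(importedFrom, importedPath)
	if err != nil {
		return "", err
	}
	if ast, ok := c.parsed[foundAt]; ok {
		return ast, nil
	}
	c.parses++
	ast := "parsed(" + contents + ")"
	c.parsed[foundAt] = ast
	return ast, nil
}

func main() {
	cache := &codeCache{
		importer: mapImporter{"/lib/k.libsonnet": "{}"},
		parsed:   map[string]string{},
	}
	// Two different importing files resolve to the same foundAt.
	cache.importCode("/a/main.jsonnet", "/lib/k.libsonnet")
	cache.importCode("/b/other.jsonnet", "/lib/k.libsonnet")
	fmt.Println(cache.parses) // 1
}
```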

Does it make sense?

sparkprime · 2018-02-22T01:50:55Z

Yes, except we should encourage people to cache their I/O, otherwise it might expose the laziness.
If you can keep a simple key/value store in core that makes it trivial for the importer to do it, that would be awesome.

anguslees · 2018-02-22T06:00:38Z

Sgtm.

anguslees · 2018-02-22T06:02:37Z

Oh, I encourage you to break the existing function signature in some way to force users to reevaluate their use of this API as part of the upgrade (i.e. don't just change the semantics of the first string arg).

sparkprime · 2018-02-22T06:16:34Z

I think we're on first name terms with all of the users of this API so this should be OK :)

Quentin-M · 2018-03-06T18:24:37Z

I just briefly asked @sparkprime in Slack whether there is a reason MakeImportCache is exported, given that it doesn't implement the Importer interface.

It seems that the VM is very stateless at this point, but it would be very beneficial to have the imported libraries cached across multiple evaluate calls. When working with libraries such as ksonnet-lib, it takes ~1.9s on an Intel i7 (Kaby Lake) @ 3.1 GHz just to evaluate the following snippet. Those costs quickly add up when generating several files, and it would be silly to create a snippet that aggregates those files in order to do an evaluate multi purely for the purpose of caching the libraries, as that would make the operator lose 1/ the ability to debug the snippets easily, 2/ the ability to pass different ExtVar/ExtCode/TLA to those snippets, 3/ file-grained control over what gets executed and when, 4/ etc.

local k = import "k.libsonnet";
k.core.v1.list.new([])

I feel it would be reasonable to cache those in the VM/Importer.
I do not think that the libraries would usually change during the lifespan of a VM.
If they do, the operator is most likely aware of it, and a new VM/Importer instance can be created.

sbarzowski · 2018-03-07T00:29:09Z

@Quentin-M
Thanks, these are good points.

What takes time when importing k.libsonnet is parsing and analyzing it, so the code (AST) cache is the important part. The Importer will only have a byte-level cache (it doesn't know about the code), so further changes will be needed.

sbarzowski · 2018-03-17T09:22:16Z

@sparkprime I've updated it. Now Importer is responsible for data-level caching. The file cache now caches existence/contents of every tried absolute path.
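
As an illustration of what caching "existence/contents of every tried absolute path" can look like, here is a minimal sketch. It is not the code from imports.go: the fsCacheEntry fields follow the diff in this PR, while the cachingReader type, its reads counter, and the file names are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fsCacheEntry records the outcome of one attempted read.
type fsCacheEntry struct {
	exists   bool
	contents string
}

// cachingReader is a hypothetical helper; reads counts real disk accesses.
type cachingReader struct {
	cache map[string]*fsCacheEntry
	reads int
}

// tryPath returns the cached entry for absPath, reading the file only on
// the first attempt. A missing file is cached as a negative entry.
func (r *cachingReader) tryPath(absPath string) (*fsCacheEntry, error) {
	if entry, ok := r.cache[absPath]; ok {
		return entry, nil
	}
	r.reads++
	data, err := os.ReadFile(absPath)
	var entry *fsCacheEntry
	switch {
	case err == nil:
		entry = &fsCacheEntry{exists: true, contents: string(data)}
	case os.IsNotExist(err):
		entry = &fsCacheEntry{exists: false}
	default:
		return nil, err // other I/O errors are not cached
	}
	r.cache[absPath] = entry
	return entry, nil
}

func main() {
	dir, err := os.MkdirTemp("", "importer")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	present := filepath.Join(dir, "a.libsonnet")
	if err := os.WriteFile(present, []byte("{}"), 0o600); err != nil {
		panic(err)
	}
	missing := filepath.Join(dir, "missing.libsonnet")

	r := &cachingReader{cache: map[string]*fsCacheEntry{}}
	for i := 0; i < 3; i++ {
		r.tryPath(present)
		r.tryPath(missing)
	}
	fmt.Println(r.reads) // 2
}
```

Because the same pointer is returned for every lookup of a given path, a cache like this also satisfies a "same contents means same pointer" requirement without extra work.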

sparkprime · 2018-03-24T19:42:42Z

imports.go

-	importer Importer
+	foundAtVerification map[string]*string
+	codeCache           map[string]potentialValue
+	importer            Importer
 }

 // MakeImportCache creates and ImportCache using an importer.


creates an ImportCache

also capitalize importer

sparkprime · 2018-03-24T19:58:06Z

imports.go

-func (cache *ImportCache) importData(key importCacheKey) *ImportCacheValue {
-	if cached, ok := cache.cache[key]; ok {
-		return cached
+func (cache *ImportCache) importData(codePath, importedPath string) (contents *string, foundAt string, err error) {


Can you be consistent with the parameter names, because in Importer they are importedFrom, path

sparkprime · 2018-03-24T20:04:45Z

imports.go

-}
-
-type importCacheMap map[importCacheKey]*ImportCacheValue
-
 // ImportCache represents a cache of imported data.


Please add:

While the user-defined Importer implementations are required to cache file content, this cache is an additional layer of optimization that caches values (i.e. the result of executing the file content). It also verifies that the content pointer is the same for two foundAt values.

sparkprime · 2018-03-24T20:08:24Z

imports.go

+	//    returned on subsequent calls.
+	// b) for given foundAt, the contents are always the same
+	//
+	// Note that by "the same contents" we mean the same pointer and


It is a little bit weird (if you haven't read the implementation) to have to return the same pointer. We could alternatively say that "If you return the same foundAt for multiple imports, we use the content for the first one." That would mean caching the LiteralString in the codeCache.

On the other hand, it's not a big deal for implementers to actually meet these requirements, so I'm happy either way.
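
For readers who have not looked at the implementation, the pointer requirement can be checked along these lines. This is a sketch under assumptions rather than the PR's code: the verifier type and its check method are hypothetical, loosely modeled on the foundAtVerification map[string]*string field shown in the diff.

```go
package main

import "fmt"

// verifier remembers which content pointer was first returned for each foundAt.
type verifier struct {
	seen map[string]*string
}

// check reports an error if an importer returns a different pointer for a
// foundAt that has been seen before. The comparison is O(1) because only
// the pointers are compared, never the string data.
func (v *verifier) check(foundAt string, contents *string) error {
	if prev, ok := v.seen[foundAt]; ok {
		if prev != contents {
			return fmt.Errorf("importer returned different contents for %q", foundAt)
		}
		return nil
	}
	v.seen[foundAt] = contents
	return nil
}

func main() {
	v := &verifier{seen: map[string]*string{}}
	first, second := "{}", "{}"
	fmt.Println(v.check("/lib/a.libsonnet", &first) == nil)  // true
	fmt.Println(v.check("/lib/a.libsonnet", &first) == nil)  // true
	fmt.Println(v.check("/lib/a.libsonnet", &second) == nil) // false
}
```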

sparkprime · 2018-03-24T20:18:14Z

imports.go

-// Import imports a map entry.
-func (importer *MemoryImporter) Import(dir, importedPath string) (*ImportedData, error) {
+// Import fetches data from a map entry.
+func (importer *MemoryImporter) Import(codePath, importedPath string) (contents *string, foundAt string, err error) {


Ditto about parameter naming

sparkprime · 2018-03-24T20:18:21Z

imports.go

-func (importer *FileImporter) Import(dir, importedPath string) (*ImportedData, error) {
-	found, content, foundHere, err := tryPath(dir, importedPath)
+// Import imports file from the filesystem.
+func (importer *FileImporter) Import(codePath, importedPath string) (contents *string, foundAt string, err error) {


Ditto about parameter naming

sparkprime · 2018-03-24T20:26:30Z

imports.go

+			if os.IsNotExist(err) {
+				entry = &fsCacheEntry{
+					exists:   false,
+					contents: "",


Could omit this line

sparkprime · 2018-03-24T20:29:20Z

imports.go

 }

-// Import imports a map entry.
-func (importer *MemoryImporter) Import(dir, importedPath string) (*ImportedData, error) {
+// Import fetches data from a map entry.


The semantics of this are a bit weird now I think -- all paths are treated as absolute keys into the "Data" map. That probably should be documented, although I understand we only use this for tests so it does not matter?

sparkprime · 2018-03-24T20:30:47Z

LGTM, I'd like to hear @anguslees's opinion.

@Quentin-M This does not address your request, but that is strictly orthogonal to this (a case of re-using the FileImporter between VM calls). We should do it in a separate PR.


sbarzowski · 2018-04-21T08:57:50Z

@sparkprime I have just applied the suggestions, and I changed the interface slightly (a Contents struct) to catch some caching problems. This way it is clear what it means for "the contents" to be the same: the user has no way to change the underlying string. The same could be done with a builtin string (just a string, not a pointer), but to guarantee O(1) comparison I would need to use reflection to get the StringHeader, so I concluded that a wrapper is less ugly.
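
A minimal sketch of such a wrapper, for illustration. The field name and the MakeContents and String helpers are assumptions and may not match the merged code exactly.

```go
package main

import "fmt"

// Contents wraps a pointer to file data. The field is unexported, so code
// outside the package cannot replace or mutate the underlying string.
type Contents struct {
	data *string
}

// MakeContents allocates a new Contents value for s.
func MakeContents(s string) Contents {
	return Contents{data: &s}
}

// String returns the wrapped data.
func (c Contents) String() string {
	return *c.data
}

func main() {
	a := MakeContents("{}")
	b := a                  // a copy shares the same pointer
	c := MakeContents("{}") // same text, separate allocation

	// Struct equality compares the pointer, so it is O(1).
	fmt.Println(a == b) // true
	fmt.Println(a == c) // false
}
```

With this shape, an importer that caches its Contents values and hands out copies automatically returns "the same contents" for the same foundAt.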

sparkprime · 2018-04-22T04:37:36Z

LGTM @anguslees anything to say here? I'll merge if not.

googlebot added the cla: yes label Feb 18, 2018

sbarzowski mentioned this pull request Mar 7, 2018

Preserving cache between evaluate calls #208

Closed

sbarzowski force-pushed the importer branch from 446d177 to a4cec21 Compare March 15, 2018 19:07

sparkprime reviewed Mar 24, 2018

sparkprime approved these changes Mar 24, 2018


sbarzowski force-pushed the importer branch from a4cec21 to 77db59b Compare April 21, 2018 06:32

Better Importer interface

bd91264

As discussed in google#190

sbarzowski force-pushed the importer branch from 77db59b to bd91264 Compare April 21, 2018 08:29

sparkprime merged commit f4428e6 into google:master Apr 28, 2018
