Better Importer interface #192
@petr-k @sparkprime @anguslees I haven't tested it beyond the existing test suite yet.
I don't like the separation/race between canonicalise and import. Apart from regular race conditions, in my case I want to fetch from remote servers - and it is much more efficient to fetch content in the same operation as walking the search path (testing if the content actually exists). Can we roll the two back into one |
@anguslees
The only requirement for a canonical path is that the importer knows how to fetch it; there is no need to check that it actually exists. In the case of the file importer, I'm aware of the race, but it is completely benign. Jsonnet has every right to show an error if an imported file is deleted, and that's the only thing that can happen. And we need to check for existence in a bunch of locations anyway.
Don't we want the ability for a Jsonnet file hosted at http://foo/bar.jsonnet to do |
I wonder if we could let the importer do caching by passing it the cache singleton to use. I also think maybe we should cache the non-existence of files, to speed up future searches for the same thing.
It doesn't change anything. The redirections can be completely transparent. In your example Perhaps "canonicalize" gives a wrong idea that it needs to be the sole identifier for given content. Maybe
I think it wouldn't help. It would still need to first get some key to the cache, then check it, then fetch the data. It would make it easier to carry some additional data between getting the key and fetching, but I don't know what it could be.
It is something the importer could do internally. I also see other, potentially easier options, for example caching JPATH lookups separately (still within the importer). Anyway, it doesn't change the interface.
I think I'm missing something in this discussion:
So let's walk through an example: Let's say that
Now the above works fine for simple direct imports, but what if I can of course avoid the second fetch by pre-stuffing the result in a cache during |
I don't see how
could ever work if the importedPath is relative, since it would collide with the result of Canonicalize for different files (ones with different content). |
Ok, I assumed that if you use URLs then you always use an absolute one, hence some misunderstandings. Of course if you want to have imports relative to the source and search path for remote stuff, then it gets more complicated. |
@anguslees |
Hmmm... I'm getting more and more convinced that we should just leave the caching of file contents to the importer, since we're talking about having some sort of cache in the importer anyway. It's also very flexible. So now, my plan is to:
Does it make sense? |
Yes, except we should encourage people to cache their I/O; otherwise it might expose the laziness.
Sgtm. |
Oh, I encourage you to break the existing function signature in some way to force users to reevaluate their use of this API as part of the upgrade (i.e., don't just change the semantics of the first string arg).
I think we're on first name terms with all of the users of this API so this should be OK :) |
I just briefly asked @sparkprime in Slack if there is reason It seems that the VM is very stateless at this point, but it would be very beneficial to have the imported libraries cached across multiple
I feel it would be reasonable to cache those in the VM/Importer. |
@Quentin-M What takes time when importing |
@sparkprime I've updated it. Now |
imports.go
Outdated
importer Importer
foundAtVerification map[string]*string
codeCache map[string]potentialValue
importer Importer
}

// MakeImportCache creates and ImportCache using an importer.
creates an ImportCache
also capitalize importer
imports.go
Outdated
func (cache *ImportCache) importData(key importCacheKey) *ImportCacheValue {
if cached, ok := cache.cache[key]; ok {
return cached
func (cache *ImportCache) importData(codePath, importedPath string) (contents *string, foundAt string, err error) { |
Can you be consistent with the parameter names? In Importer they are importedFrom, path.
imports.go
Outdated
}

type importCacheMap map[importCacheKey]*ImportCacheValue

// ImportCache represents a cache of imported data.
Please add:
While the user-defined Importer implementations are required to cache file content, this cache is an additional layer of optimization that caches values (i.e. the result of executing the file content). It also verifies that the content pointer is the same for two foundAt values.
imports.go
Outdated
// returned on subsequent calls.
// b) for given foundAt, the contents are always the same
//
// Note that by "the same contents" we mean the same pointer and |
It is a little bit weird (if you haven't read the implementation) to have to return the same pointer. We could alternatively say: "If you return the same foundAt for multiple imports, we use the content for the first one." That would mean caching the LiteralString in the codeCache.
On the other hand, it's not a big deal for implementers to actually meet these requirements, so I'm happy either way.
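The two options in this comment can be compared side by side. The sketch below is an illustration only: `contentCheck` is a hypothetical type loosely modelled on the `foundAtVerification map[string]*string` field in the diff, `verify` enforces the "same pointer" rule, and `firstWins` shows the alternative where the first content seen for a foundAt is kept.

```go
package main

import "fmt"

// contentCheck is a hypothetical helper that remembers the contents pointer
// first seen for each foundAt.
type contentCheck struct {
	seen map[string]*string
}

func newContentCheck() *contentCheck {
	return &contentCheck{seen: make(map[string]*string)}
}

// verify enforces the stricter contract: the same foundAt must always come
// with the very same pointer, not merely an equal string.
func (c *contentCheck) verify(foundAt string, contents *string) error {
	if prev, ok := c.seen[foundAt]; ok {
		if prev != contents {
			return fmt.Errorf("importer returned a different contents pointer for %q", foundAt)
		}
		return nil
	}
	c.seen[foundAt] = contents
	return nil
}

// firstWins implements the alternative: later contents for a known foundAt
// are ignored and the first pointer is returned instead.
func (c *contentCheck) firstWins(foundAt string, contents *string) *string {
	if prev, ok := c.seen[foundAt]; ok {
		return prev
	}
	c.seen[foundAt] = contents
	return contents
}

func main() {
	c := newContentCheck()
	a, b := "{}", "{}" // equal text, distinct pointers
	fmt.Println(c.verify("/lib/x.libsonnet", &a))
	fmt.Println(c.verify("/lib/x.libsonnet", &b) != nil)
	fmt.Println(c.firstWins("/lib/x.libsonnet", &b) == &a)
}
```

The strict variant surfaces importers that forget to cache; the first-wins variant tolerates them silently. Either is cheap for implementers to satisfy, as the comment notes.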
imports.go
Outdated
// Import imports a map entry.
func (importer *MemoryImporter) Import(dir, importedPath string) (*ImportedData, error) {
// Import fetches data from a map entry.
func (importer *MemoryImporter) Import(codePath, importedPath string) (contents *string, foundAt string, err error) { |
Ditto about parameter naming
imports.go
Outdated
func (importer *FileImporter) Import(dir, importedPath string) (*ImportedData, error) {
found, content, foundHere, err := tryPath(dir, importedPath)
// Import imports file from the filesystem.
func (importer *FileImporter) Import(codePath, importedPath string) (contents *string, foundAt string, err error) { |
Ditto about parameter naming
imports.go
Outdated
if os.IsNotExist(err) {
entry = &fsCacheEntry{
exists: false,
contents: "",
Could omit this line
}

// Import imports a map entry.
func (importer *MemoryImporter) Import(dir, importedPath string) (*ImportedData, error) {
// Import fetches data from a map entry.
The semantics of this are a bit weird now I think -- all paths are treated as absolute keys into the "Data" map. That probably should be documented, although I understand we only use this for tests so it does not matter?
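For reference, the behaviour being described looks roughly like the sketch below. `memImporter` is a hypothetical stand-in (not the actual MemoryImporter code): the importing file's location is ignored and the imported path is used verbatim as a map key, so nothing is resolved relative to the importer.

```go
package main

import "fmt"

// memImporter is a hypothetical in-memory importer for illustration.
type memImporter struct {
	data map[string]*string
}

// newMemImporter stores one stable pointer per key, so repeated imports of
// the same key return the same contents pointer.
func newMemImporter(files map[string]string) *memImporter {
	data := make(map[string]*string, len(files))
	for k, v := range files {
		v := v // copy so each entry gets its own address
		data[k] = &v
	}
	return &memImporter{data: data}
}

// Import ignores importedFrom: importedPath is treated as an absolute key.
func (m *memImporter) Import(importedFrom, importedPath string) (contents *string, foundAt string, err error) {
	if c, ok := m.data[importedPath]; ok {
		return c, importedPath, nil
	}
	return nil, "", fmt.Errorf("import not available: %v", importedPath)
}

func main() {
	m := newMemImporter(map[string]string{"lib/util.libsonnet": "{ x: 1 }"})

	// Found regardless of which file does the importing.
	_, foundAt, err := m.Import("some/dir/main.jsonnet", "lib/util.libsonnet")
	fmt.Println(foundAt, err)

	// A path relative to the importing file is not resolved.
	_, _, err = m.Import("lib/main.jsonnet", "util.libsonnet")
	fmt.Println(err)
}
```

If relative lookups were ever wanted, the key would have to be derived from both arguments; for test-only use the flat keying is probably enough, provided it is documented.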
LGTM, I'd like to hear @anguslees's opinion. @Quentin-M This does not address your request, but that is strictly orthogonal to this (a case of re-using the FileImporter between VM calls). We should do it in a separate PR.
As discussed in google#190
@sparkprime I have just applied the suggestions and I changed the interface slightly ( |
LGTM @anguslees anything to say here? I'll merge if not. |