ParseUserAgent, ParseOS and ParseDevice are not using the cache #97
Adding a warning to the readme sounds like a good idea. The current caching behaviour is really basic (similar to the old lru cache: the cache is just nuked when full) as well as quite small (20 entries by default), so it would need to be updated more fundamentally, e.g. to use a FIFO replacement policy (which is what the stdlib's `re` module recently moved to). An interesting alternative would be to keep a bespoke cache (though probably with a better replacement policy than the current one) and allow partially filling entries, so e.g. `ParseUserAgent`, `ParseOS` and `ParseDevice` could share the cache with `Parse` instead of bypassing it.
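For illustration, a minimal sketch of that kind of FIFO cache with partially fillable entries, assuming a module-level dict; the names (`_cache`, `_lookup`, `MAX_CACHE_SIZE`) are illustrative stand-ins, not the library's actual internals:

```python
# Dicts preserve insertion order since Python 3.6, so the oldest
# entry in the cache is simply the first key.
MAX_CACHE_SIZE = 200
_cache = {}

def _lookup(key):
    entry = _cache.get(key)
    if entry is None:
        if len(_cache) >= MAX_CACHE_SIZE:
            del _cache[next(iter(_cache))]  # FIFO eviction: drop the oldest entry
        entry = _cache[key] = {}  # hand out an empty, partially fillable entry
    return entry
```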
@Greenscreen23 could you explain how you were using things? Did you explicitly bump up the cache size when using the library? With how small the cache is and the rough replacement policy, I wouldn't have expected "significant" performance degradation. In the meantime I've opened #104 to at least add a note to the readme.
An even better solution would be to cache them, but *just* caching them seems redundant with the existing cache, so things would need to be rethought. Fixes ua-parser#97
@masklinn I stumbled across this when we had to parse a dataset in one of my courses at the university. We had to parse a logfile with ~300,000 user agents, most of them being identical (real world data set). Here are the times I measured with the different functions:

Using `Parse(ua_string)['user_agent']`:

Using `ParseUserAgent(ua_string)`:

I also did not increase the cache size, I just imported the functions and used them. Thanks for opening the PR :)
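For context, a rough way to reproduce that kind of comparison; the dataset itself is not public, so the file name and its contents below are hypothetical:

```python
import time

from ua_parser import user_agent_parser

# Hypothetical log extract: one user agent string per line, ~300,000 lines.
with open("user_agents.txt") as f:
    uas = [line.strip() for line in f]

start = time.perf_counter()
for ua in uas:
    user_agent_parser.Parse(ua)['user_agent']  # goes through the cache
print("Parse:", time.perf_counter() - start)

start = time.perf_counter()
for ua in uas:
    user_agent_parser.ParseUserAgent(ua)  # bypassed the cache at the time
print("ParseUserAgent:", time.perf_counter() - start)
```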
Cool, yeah that is significant indeed. Is the dataset public, by any chance? Is it some sort of log file collected over some period of time? Might be interesting both as a way to build up plain benchmarks, and as a way to investigate changes to caching on something of a real-world dataset.
And thinking about it more, it does make a lot of sense: if the logs are "raw" and each page hit from a given visitor is an entry, identical entries would tend to cluster significantly, meaning even the small cache and simplistic eviction strategy would be helpful. Even more so if you decided to just sort the UAs before processing them.
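A sketch of that effect, reusing the `uas` list from the earlier snippet: sorting clusters duplicate strings together, so the working set at any point stays tiny.

```python
from ua_parser import user_agent_parser  # as above

# Duplicates end up adjacent after sorting, so even a 20-entry cache
# with a crude full-flush eviction policy would stay hot.
for ua in sorted(uas):  # `uas` from the earlier sketch
    user_agent_parser.Parse(ua)
```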
It's not public as far as I know and I don't think it will be published. It looks like it was collected on demo servers to include as many interesting user agents as possible, but there are still quite a lot of duplicates and "real world traffic".
Got it, thanks nonetheless. I guess I'll just add a note for now so unfortunate souls get the information in the meantime, and try to add caching before we cut 0.15, because it's a lot more significant than I would have initially expected. I'll look around for a test dataset to bench with; hopefully one of the sibling projects has one somewhere.
This is an intermediate stopgap before modifying caching so it works with and across all the functions, as the issue does seem pretty significant (cf ua-parser#97)
TIL github goes and finds "fixes" stanzas in PR descriptions as well as commit messages? That's weird. Anyway, as noted above this was not intended to be closed; I want to implement better caching.
- Adds caching to all Parse* functions, using a unified cache (so the functions can be intermixed, and the cache size applies to everything).
- Increases the default size of the cache: 20 seems very small, 200 doesn't seem too big (though it's completely arbitrary). Tried to play around with random samplings (using non-linear distributions) and it didn't exactly change anything but...
- Updates the cache replacement policy to use a FIFO-ish policy.

Unified Cache
=============

Split the parsers between a caching frontend and a "raw" backend (the old behaviour) to avoid Parse having to pay for multiple cache lookups in order to fill entries. Also decided to have the entries be handed out initialized, so `_lookup` *always* returns an entry, which can be partial. The caller is to check whether the key they handle (for specialised parsers) or all the sub-keys are filled, and fill them if necessary. This makes for relatively straightforward code even if it bounces around a bit.

The unified cache makes it so the functions are intermixed and benefit from one another's activity.

Also added a `**jsParseBits` parameter to `ParseDevice`: I guess that's basically never used, but `Parse` forwarded its own `jsParseBits`, which would have led to a `TypeError`.

Cache Policy
============

The cache now uses a FIFO policy (similar to recent updates to the stdlib's `re` module) thanks to dicts being ordered since 3.6. In my limited bench (possibly because the workload was so artificial) LRU didn't fare much better than FIFO (based on hit/miss ratios), and FIFO is simpler, so for now FIFO it is. That's easy to change anyway. The point was mostly that any cache which doesn't blow the entire thing away when full is almost certain to be an improvement.

A related change: the cache used to be blown only once it had MAX_CACHE_SIZE+1 entries, as it was cleared - on a cache miss - if the cache size was strictly larger than `MAX_CACHE_SIZE`, meaning the effective size of the cache was 21 (a pretty large increment given the small cache size). It now tops out at `MAX_CACHE_SIZE`.

Fixes ua-parser#97
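A hedged sketch of the frontend/backend split described above, building on the `_lookup` sketch earlier in the thread; the underscore-prefixed backends here are stubbed stand-ins for the real regex-matching internals, not the actual implementation:

```python
# Stand-in backends (the real ones run the uap-core regex matching).
def _ParseUserAgent(ua, **kw): return {'family': 'Other'}
def _ParseOS(ua, **kw): return {'family': 'Other'}
def _ParseDevice(ua, **kw): return {'family': 'Other'}

def ParseUserAgent(ua_string, **jsParseBits):
    entry = _lookup(ua_string)  # always returns an entry, possibly partial
    if 'user_agent' not in entry:
        # a specialised parser only fills the one key it owns
        entry['user_agent'] = _ParseUserAgent(ua_string, **jsParseBits)
    return entry['user_agent']

def Parse(ua_string, **jsParseBits):
    entry = _lookup(ua_string)
    # fill every missing sub-key, so later calls to the specialised
    # parsers (and vice versa) benefit from this work
    if 'user_agent' not in entry:
        entry['user_agent'] = _ParseUserAgent(ua_string, **jsParseBits)
    if 'os' not in entry:
        entry['os'] = _ParseOS(ua_string, **jsParseBits)
    if 'device' not in entry:
        entry['device'] = _ParseDevice(ua_string, **jsParseBits)
    return dict(entry, string=ua_string)
```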
Hi,

I was using this library for a project recently, and was really surprised that `ParseUserAgent` was significantly slower than `Parse`. After digging a bit into the code, I think this is because `ParseUserAgent` is not using the internal cache, whereas `Parse` is.

As both methods are documented in the README, I was expecting both to have similar performance, with `ParseUserAgent` being a bit faster because it has to parse less. Currently, both methods seem to be recommended for regular use, but `ParseUserAgent` seems significantly worse than `Parse`.

In my opinion it might be helpful to document this behavior in the README or change it, because it is currently faster to use `Parse(ua_string)['user_agent']` instead of `ParseUserAgent`.
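To make the reported gap concrete, the workaround from the issue looks like this (the user agent string is just a sample; any real one works):

```python
from ua_parser import user_agent_parser

ua_string = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # sample UA

# At the time of this issue, Parse was cached but ParseUserAgent was not,
# so on repeated strings the indirect form was the faster one.
via_parse = user_agent_parser.Parse(ua_string)['user_agent']
direct = user_agent_parser.ParseUserAgent(ua_string)
assert via_parse == direct  # same result either way
```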