Avoid forks to improve performance (especially on Cygwin) #298

m0vie · 2016-04-22T19:30:16Z

This pull-request improves z-sy-h performance by avoiding all forks.

Especially on Cygwin the improvement is extreme.

The only drawback so far seems to be that "hashed" commands cannot be distinguished from normal "commands" anymore -- which I think is very minor.

Anyways, this is more of a proof-of-concept.

danielshahaf · 2016-04-23T06:44:39Z

By and large looks good.

_zsh_highlight_command_type_cache is populated in the main highlighter but cleared in the driver; I guess it should be cleared in the main highlighter, never used in the driver, and accordingly renamed _zsh_highlight_main_…?

I've merged the first commit. I have some specific review questions on the rest, but first:

Why do you say it's a WIP/PoC? Other than not distinguishing hashed commands from commands?

danielshahaf · 2016-04-23T06:45:41Z

I've merged the first commit.

It's now 38c8fbe.

m0vie · 2016-04-24T15:13:38Z

_zsh_highlight_command_type_cache is populated in the main highlighter but cleared in the driver; I guess it should be cleared in the main highlighter, never used in the driver, and accordingly renamed _zsh_highlight_main_…?

That would be nicer, yes. But right now there is no API function for the individual highlighters called from _zsh_highlight_preexec_hook.

Why do you say it's a WIP/PoC? Other than not distinguishing hashed commands from commands?

This was all done very quickly. But it seems to work a lot better and faster than I originally expected.
I've been using this for a few weeks now without any problems.
I guess the WIP prefix can be removed. ;-)

danielshahaf · 2016-04-24T15:48:05Z

I guess the WIP prefix can be removed. ;-)

Done. Review:

First commit: typo in comment (parameters → parameter). Style nitpick: perhaps append a || true to be clear that an error is expected and ignored.
Second commit: s/local local/local/. The incumbent code makes the assumption that the value of an alias is a command name, which isn't true in general; your patch extends that assumption to the helper's name and docstring. It would be nice to fix this, but this won't block merging the PR.
Third commit: I propose to test ${+commands[…]}, since even taking PATH_DIRS into account, the check can false negative but can't false positive; and to fall back to REPLY=$(type -w …), which will cover both PATH_DIRS and possible future "new types". (There's no need for the none case, then; ~~it will also handle hashed commands correctly~~.) I realize this will require a fork if the command word has a slash, would that be a problem?
The use of (e) in associative array keys is harmless but not required since the keys are parameter expansions.

edited: Struck out incorrect statement

danielshahaf · 2016-04-24T15:58:59Z

Shouldn't $+saliases[(e)${1#*.}] be $+saliases[(e)${1##*.}]?

% alias -s foo.bar=deep bar=shallow 
% type 1.foo.bar 
bar is a suffix alias for shallow
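The difference between the two expansions can be checked without defining any aliases. A minimal sketch using only standard parameter expansion; the variable names `word`, `shortest`, and `longest` are illustrative and not taken from the patch:

```shell
word=1.foo.bar

# ${word#*.} removes the SHORTEST prefix matching "*.", leaving "foo.bar".
shortest=${word#*.}

# ${word##*.} removes the LONGEST prefix matching "*.", leaving only "bar",
# the part after the last dot, which is what the type output above reports.
longest=${word##*.}

printf 'shortest-match strip: %s\n' "$shortest"
printf 'longest-match strip:  %s\n' "$longest"
```

With the single `#`, a lookup keyed on the result would be testing `foo.bar` rather than `bar`.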

danielshahaf · 2016-04-25T16:31:26Z

I'd like to merge this and #301 next.

danielshahaf · 2016-04-29T15:18:22Z

This now conflicts due to 0893296 / 341a3ae.

m0vie · 2016-04-29T21:34:05Z

First commit: typo in comment (parameters → parameter). Style nitpick: perhaps append a || true to be clear that an error is expected and ignored.

Done.

Second commit: s/local local/local/. The incumbent code makes the assumption that the value of an alias is a command name, which isn't true in general; your patch extends that assumption to the helper's name and docstring. It would be nice to fix this, but this won't block merging the PR.

Done.

Third commit: I propose to test ${+commands[…]}, since even taking PATH_DIRS into account, the check can false negative but can't false positive; and to fall back to REPLY=$(type -w …), which will cover both PATH_DIRS and possible future "new types".

I'm not 100% sure where we left off in the IRC discussion. Do you want to go forward with the future-proof approach of using if type -w; then REPLY=$(type -w …), or merge it as it is now? Actually, even then we wouldn't be future-proof, since we would still have to change the script to support potential new types with different highlighting. I'd prefer it this way, for a consistent "no forks. ever.".

The use of (e) in associative array keys is harmless but not required since the keys are parameter expansions.

I think it's best to leave them in, to be on the safe side.

Shouldn't $+saliases[(e)${1#*.}] be $+saliases[(e)${1##*.}]?

Right. Fixed.

danielshahaf · 2016-04-29T23:30:25Z

Third commit: I propose to test ${+commands[…]}, since even taking PATH_DIRS into account, the check can false negative but can't false positive; and to fall back to REPLY=$(type -w …), which will cover both PATH_DIRS and possible future "new types".

I'm not 100% sure where we left off in the IRC discussion. Do you want to go forward with the future-proof approach of using if type -w; then REPLY=$(type -w …), or merge it as it is now? Actually, even then we wouldn't be future-proof, since we would still have to change the script to support potential new types with different highlighting. I'd prefer it this way, for a consistent "no forks. ever.".

First of all, let me post here the patch I devised during our IRC discussion:

--- a/highlighters/main/main-highlighter.zsh
+++ b/highlighters/main/main-highlighter.zsh
@@ -96,6 +96,7 @@ _zsh_highlight_main__type() {
   if (( $#options_to_set )); then
     setopt localoptions $options_to_set;
   fi
+  unset REPLY
   if zmodload -e zsh/parameter; then
     if (( $+aliases[(e)$1] )); then
       REPLY=alias
@@ -107,12 +108,11 @@ _zsh_highlight_main__type() {
       REPLY=function
     elif (( $+builtins[(e)$1] )); then
       REPLY=builtin
-    elif builtin type -w $1 >/dev/null 2>&1; then
+    elif (( $+commands[(e)$1] )); then
       REPLY=command
-    else
-      REPLY=none
     fi
-  else
+  fi
+  if ! (( $+REPLY )); then
     REPLY="${$(LC_ALL=C builtin type -w -- $1 2>/dev/null)#*: }"
   fi
   _zsh_highlight_command_type_cache[(e)$1]=$REPLY

Now, "no forks ever" seems to me like throwing the baby out with the bathwater: on most platforms, forks aren't expensive, so a single fork in a rare codepath would be preferable to giving an actively wrong answer (not just "I don't know" instead of "{new-in-5.3 thing}", but "{command}" instead of "{new-in-5.3 thing}"). The point about future-proofness is a legitimate one, but it can be handled by changing the last assignment to $REPLY in the above diff to REPLY=typedashwoutput-"${$(LC_ALL=C …)}", i.e., by namespacing the output of type -w to avoid style name clashes.

Keep in mind that there is an open feature request to make the command word as it is being typed a different color than green and red if it is the prefix of a valid command word; that can be implemented without forks (just by iterating once over the zsh/parameter hashes) and should make the fork a rare codepath.

So, while I would listen to an argument that the fork should be avoided on cygwin or made opt-outable, I think the default everywhere else should be to try $(type -w) before giving up, and even on cygwin I'd like to avoid confidently claiming "This word is a command" when that might not be the case.

The use of (e) in associative array keys is harmless but not required since the keys are parameter expansions.

I think it's best to leave them in, to be on the safe side.

+1

danielshahaf · 2016-05-01T00:30:29Z

@m0vie points out that current master doesn't map "type -w returns 0 but doesn't output any of the known strings" correctly. Something like this could handle it:

diff --git a/highlighters/main/main-highlighter.zsh b/highlighters/main/main-highlighter.zsh
index a0f8dba..9247f68 100644
--- a/highlighters/main/main-highlighter.zsh
+++ b/highlighters/main/main-highlighter.zsh
@@ -330,7 +330,8 @@ _zsh_highlight_main_highlighter()
         *': function')  style=function;;
         *': command')   style=command;;
         *': hashed')    style=hashed-command;;
-        *)              if _zsh_highlight_main_highlighter_check_assign; then
+        *)              if [[ $res != *': none' ]] ; then style=NEWTHING ; else 
+                        if _zsh_highlight_main_highlighter_check_assign; then
                           style=assign
                           if [[ $arg[-1] == '(' ]]; then
                             in_array_assignment=true
@@ -383,6 +384,7 @@ _zsh_highlight_main_highlighter()
                             style=unknown-token
                           fi
                         fi
+                        fi
                         ;;
       esac
      fi

But what should this case funnel to? What should NEWTHING be? [command], [unknown-token], [new-command-type-from-the-future]?

edit: This is now tracked separately as #316.

Found-by: m0viefreak (in #298)

danielshahaf · 2016-05-13T02:08:49Z

Review of latest version (cb99b23):

Fourth (ed003db):
1. Log message typo zsh/paramter (twice)
2. The test does not "issue a warning". The test used to PASS and is now an XFAIL (expected failure). Please use correct terminology.
3. The precise fallback order of _zsh_highlight_main_add_region_highlight commandtypefromthefuture-$res commandtypefromthefuture command still needs to be decided. (This is a design issue wider than just this patch; it can be decided post-merging too, so long as it's decided before release.)
Fifth (cb99b23):
1. What is the purpose of the cache? What use-case does it optimize? I believe its design purpose is to fast-path the "Unrecognised command word" case, to avoid forking $(type -w) once per highlighting pass. (It would also help systems where command substitutions (fork()) are expensive and zsh/parameter is unavailable.)
  
  What use-cases will generate large numbers of cache hits? That is: How common is it to have an unrecognised command word passed to type -w more than once? When that happens, what is the performance cost of the second-and-beyond $(type -w) calls (which the cache seeks to avoid)?
  
  How common will it be for an unrecognised command word to be passed to type -w after #244 ("[#148] command being typed as neither green nor red") is merged?
2. Shouldn't the clear-screen widget clear the cache? (Suppose somebody types rsync foo and the command name shows in red, so they install rsync in a different terminal, and then press Ctrl+L with the expectation that rsync will be recolored green.)

I'm sorry for not realizing these command type cache questions earlier, but in any case, I think we can merge the first four commits (with the log message fixes), close this issue, and open two spin-off issues for (a) finalizing the fallback order, (b) adding the command cache.

So, let me know once you've fixup'd the fourth? Or I can do that if you prefer.

* commit '2f18ba': 'main': use zsh/parameter to resolve alias driver: load zsh/parameter if available

danielshahaf · 2016-05-13T02:33:52Z

I've merged the first two commits in f146651. I haven't rebased them but I did change the "area:" leader of the log messages to match convention.

m0vie · 2016-05-13T18:02:29Z

Log message typo zsh/paramter (twice)

Done.

The test does not "issue a warning"

Done.

What is the purpose of the cache?

The cache is actually quite an improvement. Consider long lines such as:

somecommandthatdoesnotexist foo bar; you keep typing here...

That would result in an execution of

_zsh_highlight_main__type somecommandthatdoesnotexist

every time you type a letter, each time calling type -w, which does a full PATH search. I don't have exact numbers right now, but the cache was a HUGE improvement on cygwin, and also very noticeable on native Linux when the BUFFER was large.
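The effect can be illustrated with a small sketch. This is not the code from the commit: `type_cache`, `slow_lookup`, `cached_lookup`, and `slow_calls` are hypothetical names, and `command -v` merely stands in for the expensive lookup. It assumes a shell with associative arrays (zsh, or bash 4 and later).

```shell
typeset -A type_cache   # hypothetical cache: command word -> lookup result
slow_calls=0            # counts how often the expensive path actually runs

# Hypothetical stand-in for the expensive lookup. It only reports whether
# the shell can resolve the word at all; the real code distinguishes types.
slow_lookup() {
  slow_calls=$((slow_calls + 1))
  if command -v "$1" >/dev/null 2>&1; then
    REPLY=command
  else
    REPLY=none
  fi
}

# Answer from the cache when possible; otherwise do the slow lookup once
# and remember the result. The answer is returned in REPLY.
cached_lookup() {
  if [ -n "${type_cache[$1]+set}" ]; then
    REPLY=${type_cache[$1]}
    return
  fi
  slow_lookup "$1"
  type_cache[$1]=$REPLY
}

# Two highlighting passes over the same unknown command word:
cached_lookup somecommandthatdoesnotexist
cached_lookup somecommandthatdoesnotexist
printf 'result: %s, slow lookups: %s\n' "$REPLY" "$slow_calls"
```

The second call is answered from the cache, so the slow path runs once no matter how many further keystrokes re-highlight the line.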

Shouldn't the clear-screen widget clear the cache? (Suppose somebody types rsync foo and the command name shows in red, so they install rsync in a different terminal, and then press Ctrl+L with the expectation that rsync will be recolored green.)

Even zsh itself won't instantly notice that new commands are available:
Shell 1: type foo
Shell 2: create foo (+x) in PATH
Shell 1: hit RETURN (also after ^L):

zsh: correct 'foo' to 'fop' [nyae]?

So IMHO that's not a big deal since it never worked before either.

... and open two spin-off issues for (a) finalizing the fallback order ... (b) adding the command cache

Yes, sounds good to me. But I'd really like to get both commits merged first. The details we have to think about are very minor, and they are probably easier to work out on top of master.

danielshahaf · 2016-05-13T18:06:27Z

That would result in an execution of

_zsh_highlight_main__type somecommandthatdoesnotexist

every time you type a letter

Yes, I realized this was the use-case that would most benefit from the cache; I just wondered how often it occurred in practice.

Even zsh itself won't instantly notice that new commands are available:

Good point: there's no reason to clear the cache more often than every time rehash runs... which by default is never, so no need to hook to clear-screen.

Yes, sounds good to me. But I'd really like to get both commits merged. The details we have to think about are very minor, and they are probably easier to work out on top of master.

Okay, so let's merge it without any fallback now (just ..._add_highlight commandfromthefuture-$res). This doesn't actually matter, but it'll probably make the future discussions about fallback order easier.

Thanks!

m0vie · 2016-05-13T18:08:38Z

Okay, so let's merge it without any fallback now (just ..._add_highlight commandfromthefuture-$res)

Okay. changed.

If possible, try to use the zsh/parameter module to get information about shell words. This avoids subshells and is a huge speed improvement on systems such as Cygwin.
Note 1: $commands does not know about PATH_DIRS. So in case PATH_DIRS is set, 'type -w' is still used if nothing else matches.
Note 2: zsh/parameter can't distinguish between 'command' and 'hashed'. Adjusted the test for that case to XFAIL.
The ideal solution would be if whence had an option to put the result in REPLY instead of printing it to stdout.
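Why returning a result in REPLY avoids a subshell can be seen in a small sketch. The two classifier functions below are hypothetical and always answer `builtin`; the point is only where they run. A counter incremented inside `$(...)` is lost, because the command substitution executes in a subshell, while the REPLY variant runs in the current shell.

```shell
calls=0

# Prints its answer: callers must capture it with $(...), which runs the
# function in a subshell.
classify_print() {
  calls=$((calls + 1))
  printf '%s\n' builtin
}

# Stores its answer in REPLY: runs in the current shell, no subshell needed.
classify_reply() {
  calls=$((calls + 1))
  REPLY=builtin
}

res=$(classify_print)   # the increment happens in the subshell and is lost
after_print=$calls

classify_reply          # the increment happens in the current shell
after_reply=$calls

printf 'after $(...): %s, after REPLY call: %s\n' "$after_print" "$after_reply"
```

Both variants produce the same answer; only the first pays for a subshell on every call, which is the cost this change avoids on systems where process creation is slow.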

danielshahaf · 2016-05-13T18:42:11Z

Merged 12b879c and its parent. That left two outstanding questions:

1. Fallback order: tracked in #316 ("Forward compatibility: highlight unknown type -w outputs correctly") (comment)
2. The command type cache (currently 957a8b4) hasn't been merged yet

m0vie · 2016-05-13T19:04:43Z

command type cache is tracked in #320.

psprint · 2016-05-13T21:09:24Z

I've compared speeds:

./parse.zsh parse_bash.zsh > 2  2,10s user 0,58s system 92% cpu 2,905 total

vs

./parse.zsh parse_bash.zsh > 1  1,93s user 0,16s system 99% cpu 2,077 total

So almost 0.2 sec. Seems not much, but everything feels to be much faster now.

danielshahaf · 2016-05-14T02:13:09Z

So almost 0.2 sec.

Shouldn't you include system time in your count? The whole purpose of the PR was to eliminate syscalls, and users care about wallclock time, not about userland time (despite the name of the latter).

Seems not much, but everything feels to be much faster now.

:-)

psprint · 2016-05-14T05:53:41Z

Shouldn't you include system time in your count? The whole purpose of the PR was to eliminate syscalls, and users care about wallclock time, not about userland time (despite the name of the latter).

True, total time counts, so there is a significant gain: about 0.8 sec of total time (2.905 s vs. 2.077 s).

PS. As for why I looked at user time: it corresponds closely to what zprof reports. I once ran a large number of tests with zprof, and also with time ( ), and user time was nearly the same as the total running time of the main function in zprof.

m0vie mentioned this pull request Apr 22, 2016

Unusably slow on Cygwin #153

Closed

danielshahaf added the Feature label Apr 23, 2016

m0vie force-pushed the p_forks branch from 6c28ee7 to 4e35ed6 Compare April 24, 2016 15:08

danielshahaf changed the title ~~WIP: Avoid forks to improve performance (especially on Cygwin)~~ Avoid forks to improve performance (especially on Cygwin) Apr 24, 2016

danielshahaf mentioned this pull request Apr 25, 2016

Use ZSH_HIGHLIGHT_STYLES keys #301

Closed

m0vie force-pushed the p_forks branch from 4e35ed6 to 1e362f7 Compare April 29, 2016 21:34

This was referenced May 9, 2016

zsh-syntax-highlighting does not like the alias '=' #263

Closed

Optimization that in my tests shows 2.7 sec -> 2.2 sec #312

Closed

danielshahaf added the performance label May 12, 2016

danielshahaf mentioned this pull request May 12, 2016

Forward compatibility: highlight unknown type -w outputs correctly #316

Closed

m0vie force-pushed the p_forks branch 3 times, most recently from c0e9b2e to 4918091 Compare May 12, 2016 18:59

danielshahaf added a commit that referenced this pull request May 12, 2016

'main': Don't leak the PATH_DIRS options.

7b82b88

Found-by: m0viefreak (in #298)

m0vie force-pushed the p_forks branch 2 times, most recently from 2b628ec to cb99b23 Compare May 12, 2016 23:43

danielshahaf added a commit that referenced this pull request May 13, 2016

Merge part of PR #298

f146651

* commit '2f18ba': 'main': use zsh/parameter to resolve alias driver: load zsh/parameter if available

m0vie force-pushed the p_forks branch from cb99b23 to 4e94463 Compare May 13, 2016 18:02

m0vie force-pushed the p_forks branch from 4e94463 to f29b805 Compare May 13, 2016 18:08

m0vie force-pushed the p_forks branch from f29b805 to b0130ac Compare May 13, 2016 18:23

m0vie added 3 commits May 13, 2016 20:25

'main': Support multiple styles in _zsh_highlight_main_add_region_hig…

61f4325

…hlight

'main': Implement simple command type cache

957a8b4

m0vie force-pushed the p_forks branch from b0130ac to 957a8b4 Compare May 13, 2016 18:25

m0vie closed this May 13, 2016

m0vie deleted the p_forks branch May 14, 2016 13:52
