Skip to content

Conversation

psprint
Copy link
Contributor

@psprint psprint commented May 11, 2016

To play around with this, clone my repo with extracted parsers:

https://github.com/psprint/zsh-tools/

and compare files: parse_len.zsh (current code) and parse_new.zsh (the optimization from this PR). Example run:

time ./parse_len.zsh parse_bash.zsh > out_len.txt

My make test doesn't work, however I compared outputs of the extracted parsers and they are the same.

More, it didn't feel that the indexing is the problem when I coded. This speed up is for large file (I parse the parser itself), not sure if pasting a one liner will show any difference, but it should.

@psprint
Copy link
Contributor Author

psprint commented May 11, 2016

If someone would code $REPLY-using type for upstream (no fork), then we would gain 0.8 sec and go below 1.5 sec in the test.

PS. As for the one liners (I mean: pasting a screen wide, i.e. not very short command), the type fork might have more impact there. There are three loop runs for as simple command as "mplayer -fs 15.avi" meaning three forks.

danielshahaf added a commit that referenced this pull request May 11, 2016
@danielshahaf
Copy link
Member

Summary from IRC:

diff --git a/highlighters/main/main-highlighter.zsh b/highlighters/main/main-highlighter.zsh
index a0f8dba..b001422 100644
--- a/highlighters/main/main-highlighter.zsh
+++ b/highlighters/main/main-highlighter.zsh
@@ -199,6 +199,10 @@ _zsh_highlight_main_highlighter()
   #
   local this_word=':start:' next_word
   integer in_redirection
+  # Processing buffer
+  local proc_buf="$buf"
+  # Starting position in previous loop run
+  integer prev_end_pos=0
   for arg in ${interactive_comments-${(z)buf}} \
              ${interactive_comments+${(zZ+c+)buf}}; do
     if (( in_redirection )); then
@@ -234,11 +238,11 @@ _zsh_highlight_main_highlighter()
       # indistinguishable from 'echo foo echo bar' (one command with three
       # words for arguments).
       local needle=$'[;\n]'
-      integer offset=${${buf[start_pos+1,len]}[(i)$needle]}
+      integer offset=${proc_buf[(i)$needle]}
       (( start_pos += offset - 1 ))
       (( end_pos = start_pos + $#arg ))
     else
-      ((start_pos+=(len-start_pos)-${#${${buf[start_pos+1,len]}##([[:space:]]|\\[[:space:]])#}}))
+      ((start_pos+=(len-start_pos)-${#${proc_buf##([[:space:]]|\\[[:space:]])#}}))
       ((end_pos=$start_pos+${#arg}))
     fi

@@ -454,6 +458,9 @@ _zsh_highlight_main_highlighter()
       this_word=':start:'
     fi
     start_pos=$end_pos
+    # start_pos-prev_start_pos is: how many chars did we advance in last loop run
+    proc_buf="${proc_buf[end_pos - prev_end_pos + 1, -1]}"
+    prev_end_pos=end_pos
     (( in_redirection == 0 )) && this_word=$next_word
   done
 }

@danielshahaf
Copy link
Member

There are arrlen improvements discussed upstream , but they haven't been merged to master (let alone released) so we can't rely on them.

@psprint
Copy link
Contributor Author

psprint commented May 12, 2016

I don't think it's cleaner. Looking at start_pos and prev_start_pos is: how many characters did we advance. I'm not sure how to see this in end_pos and prev_end_pos. The goal might be to move the chop-off block to the end of main-highlighter function, that's fine, but I'm more puzzled by end_pos than by the start_pos. The code works correct, however it runs for 2.9s. Tested multiple times, it's 2.2s vs 2.9s. A puzzle, as the code should be equivalent. Maybe location at the end changes something in heap management, dunno (yet).

@psprint
Copy link
Contributor Author

psprint commented May 12, 2016

I've extended the comment on the chop-off line. That said, we might be able to better catch "what's already processed" and change the line, dunno

@danielshahaf
Copy link
Member

The goal might be to move the chop-off block to the end of main-highlighter function, that's fine, but I'm more puzzled by end_pos than by the start_pos.

The goal, as I said on IRC, is code clarity. And I'm surprised that you're confused by the use of $end_pos: your version of the patch simply does end_pos - start_pos + 1 using the values of these two parameters in the previous iteration. (This is due to

, which you yourself pointed out yesterday on IRC.)

Incidentally, simply pointing out the above (either in the code itself, or in code comments) would probably achieve the code clarity I'm seeking.

Anyway, let's hash out these micro refactorings on IRC, that's a better medium than issue comments.

No idea about the 2.2s v. 2.9s difference; it's probably worth investigating, although it may be an upstream issue.

I've extended the comment on the chop-off line. That said, we might be able to better catch "what's already processed" and change the line, dunno

I don't understand the last sentence?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants