Skip to content

Conversation

jww3
Copy link

@jww3 jww3 commented May 22, 2024

Added the following comment to clarify the rationale for using | over || in Char::IsAsciiLetterOrDigit.

// Use of the non-short-circuiting logical OR operator (|) here is an intentional optimization.
// It avoids a low-level branch instruction.  (In other words, the overhead of branching
// is greater than the cost of simply executing IsBetween.)

@ghost ghost added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 22, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 22, 2024
@huoyaoyuan
Copy link
Member

huoyaoyuan commented May 22, 2024

This can lead to the right-hand side of the expression being evaluated unnecessarily.

The JIT is able to optimize this case because the functions in two branches are trivial.

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunHjykDYBAgAbBgElcAQVxgAlvIAyMDBhhQAFGAAW2KAzABKBgF4AfA00BXeQDsMRzdoaEGABgSl3JuAwDk2P4mADymAQBe/gx+gf4A3Fw89EwCQiLiUgBCqgDuMDB22noGYGiGJQz49hJ2YKLWuPIAbjDluvpV2Ai19Y0tMCYWSdw29o4uftV2vQ1NraHhYw5O+N2z/a0xVTV1cwNGidQ8DCO8qYLCYpIycooqahoA8lAAIvIA5vIYxZ3GZpYpLIFMpVOotP83Nk8gUimUAu5/OV/ABOYJHE5nFL8S4ZG7A+5g55vT7fUi/UpDQG3EEPcHaEyEKG4HIYfKFbTIxHItGHGgAXyAA===

| and || provides the same assembly code, but || generates slightly longer IL.

@PaulusParssinen
Copy link
Contributor

This is intentional, see https://github.com/dotnet/runtime/pull/69318/files#r872595998

@jww3
Copy link
Author

jww3 commented May 22, 2024

Ah, I see. Thanks for clarifying! Maybe we should add a comment so that we don't keep revisiting this?

@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented May 22, 2024

In my opinion it does warrant a comment so people don't need to pull out sharplab/disasmo to check why it's like this 😆

I think the rationale here is that the input is not considered to be predictable so we don't try have our stupidly smart modern branch predictors predict it(?). If one case is more common the checks could be ordered and the other version could be tiny bit faster (I think this sorta cmov data-dependecy issue vs a branch in a loop.. but kinda not as the current version doesn't even need conditional mov so it's even better like this..)

Also the IL is much nicer for the importer in the branchless version.

@huoyaoyuan
Copy link
Member

Those operators evaluate the right-hand operand only if it's necessary.

In this case, branching itself has more overhead than evaluating either branch.
For short-circuiting, we usually care more about the side effect of evaluating a branch, including triggering NullReferenceException.

This can be a common sense of advanced optimization, but sometimes I can't notice it.

@jww3
Copy link
Author

jww3 commented May 22, 2024

Thanks! I'll convert this PR into just a PR to add an explanatory comment.

@jww3 jww3 changed the title Use short-circuiting || operator in Char::IsAsciiLetterOrDigit Document the use non-short-circuiting | operator in Char::IsAsciiLetterOrDigit May 22, 2024
@jww3 jww3 changed the title Document the use non-short-circuiting | operator in Char::IsAsciiLetterOrDigit Document the use of non-short-circuiting | operator in Char::IsAsciiLetterOrDigit May 22, 2024
Comment on lines 277 to 279
// Use of the non-short-circuiting logical OR operator (|) here is an intentional optimization.
// It avoids a low-level branch instruction. (In other words, the overhead of branching
// is greater than the cost of simply executing IsBetween.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment looks too verbose for users who can understand it. The position is also questionable with xml doc.

I'd suggest a trailing comment on the same line, just saying "use arithmetic or to avoid branching" or similar.

@jww3
Copy link
Author

jww3 commented May 23, 2024

Tests seem to be failing for an unrelated reason. (In the end, all I did was add a comment.) I don't seem to have permission to schedule a retry of the tests.

Any advice would be appreciated.

/// 'a' through 'z', inclusive, or '0' through '9', inclusive.
/// </remarks>
public static bool IsAsciiLetterOrDigit(char c) => IsAsciiLetter(c) | IsBetween(c, '0', '9');
public static bool IsAsciiLetterOrDigit(char c) => IsAsciiLetter(c) | IsBetween(c, '0', '9'); // Using | here is an optimization (avoids branching).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. There are a variety of places we use | like this, which is part of its raison d'etre. Why without a comment is there an assumption that it's a mistake and why is this particular use special so as to warrant a dedicated comment?

Copy link
Author

@jww3 jww3 Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I do a quick search through this repo for " | ", I don't see many instances of | being used for boolean operations.

If there were hundreds (or even dozens) of similar usage patterns, I'd agree that it's a common-enough practice as to not warrant any clarification.

It seems, however, there are only a few.

The net effect is that the codebase exhibits these occasional inconsistencies. Mentally, when you encounter one, it gives you pause: "Wait. That doesn't look right. Why's that?"

Maybe you remember why it's that way. Maybe you have to jog your memory. Maybe you're stumped. (Most people would be stumped.)

Either way, you've wasted time contemplating it and you're likely to do it all over again when you re-encounter it at some in the future.

A simple comment here short-circuits (pun intended) that whole mental exercise.

If there are indeed only a few such instances within this repo, I'd be happy to amend this PR to cover those as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Personally, I still don't believe this is valuable. The only reason to use a | here is for the exact reason the comment would state.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that most C# programmers aren't even aware that this is valid syntax for a boolean expression.

If you put yourself in their 👞👞 ...?

@am11 am11 added area-System.Runtime needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jul 19, 2024
@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Runtime community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants