@byroot commented Feb 26, 2025

In some cases you may want to escape a string in a different way than the default behavior.

For instance, if you are trying to make some JSON safe to include in a `<script>` tag, you may want to escape fewer characters and use JavaScript Unicode escapes instead of HTML entities:

```ruby
>> CGI.escapeHTML('Hello </script>', ">" => '\u003e', "<" => '\u003c', "&" => '\u0026')
=> "Hello \\u003c/script\\u003e"
```
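To illustrate why this mapping suits JSON inside a `<script>` tag, here is a minimal sketch that runs without this PR. It applies the same three replacements with `gsub` to the output of `JSON.generate` and checks that a JSON parser decodes the `\uXXXX` escapes back to the original characters. The names `SCRIPT_ESCAPES`, `SCRIPT_ESCAPE_REGEX` and `script_safe_json` are hypothetical and exist only for this example.

```ruby
require "json"

# Hypothetical names for illustration; the mapping is the one from the example above.
SCRIPT_ESCAPES = { ">" => '\u003e', "<" => '\u003c', "&" => '\u0026' }.freeze
SCRIPT_ESCAPE_REGEX = Regexp.union(SCRIPT_ESCAPES.keys)

# Serialize a value to JSON, then replace the characters an HTML parser
# could misread inside a <script> element with JavaScript escapes.
def script_safe_json(value)
  JSON.generate(value).gsub(SCRIPT_ESCAPE_REGEX, SCRIPT_ESCAPES)
end

payload = { "msg" => "Hello </script>" }
escaped = script_safe_json(payload)

puts escaped
# prints {"msg":"Hello \u003c/script\u003e"}

# The escapes are plain JSON string escapes, so parsing restores the payload.
puts JSON.parse(escaped) == payload
# prints true
```

The escaped text contains no literal `<`, `>` or `&`, so it cannot close the surrounding `<script>` element, yet it stays valid JSON.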

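Of course you can always use `gsub` for that, but because `CGI.escapeHTML` is specialized it is significantly faster: about 2.8x on a 1 KiB string with nothing to escape, and about 5x when the final character needs escaping: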

```
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                gsub    82.135k i/100ms
          escapeHTML   221.405k i/100ms
Calculating -------------------------------------
                gsub    821.890k (± 2.2%) i/s    (1.22 μs/i) -      4.189M in   5.099152s
          escapeHTML      2.330M (± 0.5%) i/s  (429.22 ns/i) -     11.734M in   5.036770s

Comparison:
          escapeHTML:  2329816.5 i/s
                gsub:   821889.7 i/s - 2.83x  slower

ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                gsub    36.235k i/100ms
          escapeHTML   171.347k i/100ms
Calculating -------------------------------------
                gsub    359.528k (± 1.5%) i/s    (2.78 μs/i) -      1.812M in   5.040422s
          escapeHTML      1.812M (± 0.7%) i/s  (551.84 ns/i) -      9.081M in   5.011762s

Comparison:
          escapeHTML:  1812105.3 i/s
                gsub:   359527.5 i/s - 5.04x  slower
```
```ruby
require "benchmark/ips"
require "cgi"

# Mapping passed as the second argument to CGI.escapeHTML.
ESCAPE = {
  ">" => '\u003e', "<" => '\u003c', "&" => '\u0026',
}

# Binary-encoded copy of the same mapping, used by the gsub baseline.
ESCAPE_B = {
  ">".b => '\u003e'.b,
  "<".b => '\u003c'.b,
  "&".b => '\u0026'.b,
}
ESCAPE_REGEX = Regexp.union(ESCAPE_B.keys)

# Case 1: 1024 bytes, nothing to escape.
str = ("a" * 1024).freeze
Benchmark.ips do |x|
  x.report("gsub") do
    b = str.b
    b.gsub!(ESCAPE_REGEX, ESCAPE_B)
    b.force_encoding(str.encoding)
  end

  x.report("escapeHTML") do
    CGI.escapeHTML(str, ESCAPE)
  end

  x.compare!
end

# Case 2: 1024 bytes, only the last character needs escaping.
str = (("a" * 1023) + "<").freeze
Benchmark.ips do |x|
  x.report("gsub") do
    b = str.b
    b.gsub!(ESCAPE_REGEX, ESCAPE_B)
    b.force_encoding(str.encoding)
  end

  x.report("escapeHTML") do
    CGI.escapeHTML(str, ESCAPE)
  end

  x.compare!
end
```
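The `gsub` baseline in the script above works on a binary copy of the string and restores the original encoding afterwards. A minimal sketch, assuming UTF-8 input, to check that this byte-level variant gives the same result as a plain `gsub`; the names `UTF8_MAP`, `BINARY_MAP`, `plain_gsub_escape` and `binary_gsub_escape` are hypothetical and not part of the PR.

```ruby
# Hypothetical names for illustration; the mapping mirrors ESCAPE / ESCAPE_B.
UTF8_MAP     = { ">" => '\u003e', "<" => '\u003c', "&" => '\u0026' }.freeze
BINARY_MAP   = UTF8_MAP.to_h { |k, v| [k.b, v.b] }.freeze
UTF8_REGEX   = Regexp.union(UTF8_MAP.keys)
BINARY_REGEX = Regexp.union(BINARY_MAP.keys)

# Character-level replacement on the string in its own encoding.
def plain_gsub_escape(str)
  str.gsub(UTF8_REGEX, UTF8_MAP)
end

# Byte-level replacement, same steps as the "gsub" report in the benchmark.
def binary_gsub_escape(str)
  b = str.b                        # binary (ASCII-8BIT) copy
  b.gsub!(BINARY_REGEX, BINARY_MAP)
  b.force_encoding(str.encoding)   # relabel the bytes with the original encoding
end

sample = "héllo <b> & ✓"
puts binary_gsub_escape(sample) == plain_gsub_escape(sample)
# prints true
puts binary_gsub_escape(sample).valid_encoding?
# prints true
```

The two agree for UTF-8 because the bytes for `<`, `>` and `&` never occur inside a multibyte UTF-8 sequence, so replacing them bytewise cannot split a character.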

NB: I haven't implemented the Java version, but can do it if there is interest in this feature.
