IOError when scanning file with odd chars #98

rob99 · 2012-11-07T04:33:19Z

I inadvertently pasted some text from Word, which contained inverted commas, into a Ruby comment. When I executed CodeRay.scan_file on that file, it complained with:

IOError (Cannot run program "file" (in directory "C:\tb\port_compare"): CreateProcess error=2, The system cannot find the file specified)

...which was thrown at lib/coderay/scanner.rb:120 (method guess_encoding). Further up the stack in normalize I could see where it was branching to encode_with_encoding (as opposed to to_unix) so I commented that out to force it to use to_unix.

Then I retried and received this error:

CodeRay::Scanners::Scanner::ScanError (

***ERROR in scanner.rb:200:in `tokenize': invalid byte sequence in UTF-8 (after 0 tokens)

tokens:


current line: 55  column: 89  pos: 1673
matched: "# WTF? AND data_srce_sys_cde / id_prod_cmpnt_cde_1 are in \x93Interest Only\x94 list"  state: "Error in CodeRay::Scanners::Ruby#scan_tokens, initial state was: :initial"
bol? = false,  eos? = false

surrounding code:
"_1 are in \u0093Interest Only\u0094 list"  ~~  "\n          return :bullet_inte"


***ERROR***

...which helped me diagnose the root problem.

It would be good if there was some error handling around the IO.popen call to help diagnose, or if the call to guess_encoding was stricter (assuming it was called in error). Not sure how to do this but thought I'd log it here anyway in case someone else has the same error...

Windows XP - Notepad ++ - ANSI file
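
A minimal sketch of the kind of error handling around IO.popen suggested above. It is not CodeRay's actual code: `guess_encoding_with_file` is a hypothetical helper, and the `--brief --mime-encoding` flags of the `file` utility are an assumption about how one might call it. The idea is only that a missing `file` program (as on Windows) should lead to a fallback instead of an exception.

```ruby
# Hypothetical helper, not part of CodeRay: ask the external `file` utility
# for a charset name and return nil when it cannot be run.
def guess_encoding_with_file(path, program = 'file')
  output = IO.popen([program, '--brief', '--mime-encoding', path], err: File::NULL) do |io|
    io.read
  end
  return nil unless $?.success?

  name = output.to_s.strip
  name.empty? ? nil : name
rescue SystemCallError, IOError
  # MRI raises Errno::ENOENT when the program is missing;
  # the report above shows JRuby raising IOError instead.
  nil
end
```

A caller could then treat nil as "unknown" and fall back to a default encoding rather than failing.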

korny · 2013-03-04T01:32:59Z

Is it still a problem? I tinkered around a lot in this part of the scanners…also, which Ruby version do you use?

korny · 2013-04-07T09:38:33Z

Sorry, I can't really replicate the problem. Can you send me the problematic input file?

korny · 2013-06-10T23:23:55Z

No answer, delaying this…

rob99 · 2013-06-11T08:04:36Z

Hi Kornelius,

Sorry for my delayed response.

I no longer have the problem file. I tried to re-create it, but could not. I got a different issue, but it is much less serious.

Here is my test script. Note the inverted quotes in the comment.

require 'rubygems'
require 'coderay'

puts "hello world"   # say “hello”

File.open('out.html', 'wb') do |file|
  file.puts CodeRay.scan(IO.read(__FILE__), :ruby).div(:line_numbers => :table)
end

Output:

<snip>
puts <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">&quot;</span><span style="color:#D20">hello world</span><span style="color:#710">&quot;</span></span>   <span style="color:#777"># say ÔÇ£helloÔÇØ</span>

It does not appear to have URL encoded the quotes properly. But at least it did not crash.

The problem originally occurred when I was probably using JRuby 1.6.?. Now I am using JRuby 1.7.4. I would not be surprised if some string encoding issues were part of the original problem, and perhaps part of this problem also.

Hope that helps...

Rob

korny · 2013-06-11T13:32:30Z

The main question would be: what encoding are you using for that file? It works for me, but I'm using UTF-8 and Ruby 2.0…

CodeRay uses UTF-8. You should probably convert the input before you send it to the scanner.
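
A minimal sketch of such a conversion, assuming the file was saved as Windows-1252 (which is what an "ANSI" file from Notepad++ usually is on a Western European system). `to_utf8` is a hypothetical helper, not part of CodeRay, and the fallback encoding is a guess that has to match the real source of the file.

```ruby
# Hypothetical helper, not part of CodeRay: return a valid UTF-8 copy of raw input.
def to_utf8(raw, fallback = Encoding::Windows_1252)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding? # already valid UTF-8, nothing to convert

  # Reinterpret the bytes in the assumed legacy encoding, then transcode.
  # Bytes with no mapping are replaced instead of raising.
  raw.dup.force_encoding(fallback)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# The bytes \x93 and \x94 are the Word-style quotes from the original report.
raw = "# in \x93Interest Only\x94 list".b
puts to_utf8(raw)
```

The result could then be passed to the scanner, for example `CodeRay.scan(to_utf8(File.binread(path)), :ruby)`, where `path` is whatever file is being highlighted.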

rob99 · 2013-06-14T12:43:18Z

Hi,

UTF-8, JRuby 1.7.4.

Maybe a JRuby issue?

korny · 2013-06-14T22:28:48Z

Possible, wouldn't be the first time. Can you send me the file? (murphy rubychan de) If we can produce a CodeRay-independent minimal failing test case that works on MRI, then we can file a bug report.
