Convert RTF to HTML respecting original code page #19

Open: wants to merge 4 commits into base: master
9 changes: 4 additions & 5 deletions lib/mapi/convert/note-mime.rb
@@ -35,7 +35,7 @@ def populate_headers
# headers, i think because one recipient was external. the only place the senders email address
# exists is in the transport headers. so its maybe not good to overwrite from.
# recipients however usually have smtp address available.
-# maybe we'll do it for all addresses that are smtp? (is that equivalent to
+# maybe we'll do it for all addresses that are smtp? (is that equivalent to
# sender_email_address !~ /^\//
name, email = props.sender_name, props.sender_email_address
if props.sender_addrtype == 'SMTP'
@@ -105,7 +105,7 @@ def populate_headers
# i have no timezone info anyway.
# in gmail, i see stuff like 15 Jan 2007 00:48:19 -0000, and it displays as 11:48.
# can also add .localtime here if desired. but that feels wrong.
-headers['Date'] = [Time.iso8601(time.to_s).rfc2822] if time
+headers['Date'] = [time.rfc2822] if time
end

# some very simplistic mapping between internet message headers and the
@@ -235,7 +235,7 @@ def to_mime
# hmmm, have to use read here. that assumes that the data isa stream.
# but if the attachment data is a string, then it won't work. possible?
data_str = if @embedded_msg
-mime.headers['Content-Type'] = 'message/rfc822'
+mime.headers['Content-Type'] = ['message/rfc822']
# lets try making it not base64 for now
mime.headers.delete 'Content-Transfer-Encoding'
# not filename. rather name, or something else right?
@@ -266,9 +266,8 @@ def populate_headers
# can employ other methods for getting a time. heres one in a similar vein to msgconvert.pl,
# ie taking the time from an ole object
time = @root.ole.dirents.map { |dirent| dirent.modify_time || dirent.create_time }.compact.sort.last
-headers['Date'] = [Time.iso8601(time.to_s).rfc2822] if time
+headers['Date'] = [time.rfc2822] if time
end
end
end
end
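
The switch from a bare string to a one-element array for Content-Type lines up with how Mime#to_s in lib/mapi/mime.rb serialises headers: it calls each on every header's list of values. A minimal sketch of that convention, assuming a plain Hash stands in for the library's header store (the hash, the sample values, and `render_headers` are hypothetical and not part of this PR):

```ruby
# Hypothetical stand-in for the header store: each key maps to an Array of
# values, so a header such as Received can appear more than once.
headers = Hash.new { |hash, key| hash[key] = [] }
headers['Content-Type'] = ['message/rfc822']
headers['Received'] << 'from a.example' << 'from b.example'

# Mirrors the serialisation loop in Mime#to_s: one output line per value.
def render_headers(headers)
  str = +''
  headers.each do |key, vals|
    vals.each { |val| str << "#{key}: #{val}\r\n" }
  end
  str
end

puts render_headers(headers)
```

With a bare String as the value, the inner `vals.each` call would raise NoMethodError, since String has not had an `each` method since Ruby 1.9.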

9 changes: 4 additions & 5 deletions lib/mapi/convert/note-tmail.rb
@@ -45,7 +45,7 @@ def populate_headers
# headers, i think because one recipient was external. the only place the senders email address
# exists is in the transport headers. so its maybe not good to overwrite from.
# recipients however usually have smtp address available.
-# maybe we'll do it for all addresses that are smtp? (is that equivalent to
+# maybe we'll do it for all addresses that are smtp? (is that equivalent to
# sender_email_address !~ /^\//
name, email = props.sender_name, props.sender_email_address
if props.sender_addrtype == 'SMTP'
@@ -115,7 +115,7 @@ def populate_headers
# i have no timezone info anyway.
# in gmail, i see stuff like 15 Jan 2007 00:48:19 -0000, and it displays as 11:48.
# can also add .localtime here if desired. but that feels wrong.
-headers['Date'] = [Time.iso8601(time.to_s).rfc2822] if time
+headers['Date'] = [time.rfc2822] if time
end

# some very simplistic mapping between internet message headers and the
@@ -248,7 +248,7 @@ def to_tmail
# but if the attachment data is a string, then it won't work. possible?
data_str = if @embedded_msg
raise NotImplementedError
-mime.headers['Content-Type'] = 'message/rfc822'
+mime.headers['Content-Type'] = ['message/rfc822']
# lets try making it not base64 for now
mime.headers.delete 'Content-Transfer-Encoding'
# not filename. rather name, or something else right?
@@ -279,9 +279,8 @@ def populate_headers
# can employ other methods for getting a time. heres one in a similar vein to msgconvert.pl,
# ie taking the time from an ole object
time = @root.ole.dirents.map { |dirent| dirent.modify_time || dirent.create_time }.compact.sort.last
-headers['Date'] = [Time.iso8601(time.to_s).rfc2822] if time
+headers['Date'] = [time.rfc2822] if time
end
end
end
end
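
For reference, a small sketch of what the Date header value looks like when formatted directly with rfc2822. It assumes `time` is a core Time object in UTC; the diff does not show which class the property or the OLE dirent timestamps actually return, so treat that as an assumption. The sample timestamp is the one quoted in the code comments.

```ruby
require 'time' # adds Time#rfc2822 to the core Time class

# Hypothetical timestamp, matching the example in the source comments.
time = Time.utc(2007, 1, 15, 0, 48, 19)

# Header values are stored as arrays, so the formatted date is wrapped in one.
date_header = [time.rfc2822]

puts date_header.first # Mon, 15 Jan 2007 00:48:19 -0000
```

For a UTC Time, rfc2822 emits the zone as -0000, which is the form the comment reports seeing in Gmail.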

10 changes: 7 additions & 3 deletions lib/mapi/mime.rb
@@ -69,6 +69,10 @@ def initialize str, ignore_body=false
end
end

+def encode(x)
+x.encoding == Encoding::UTF_8 ? x : x.encode('utf-8', 'cp1252')
+end

def multipart?
@content_type && @content_type =~ /^multipart/ ? true : false
end
@@ -97,7 +101,7 @@ def to_s opts={}
opts = {:boundary_counter => 0}.merge opts
if multipart?
boundary = Mime.make_boundary opts[:boundary_counter] += 1, self
-@body = [preamble, parts.map { |part| "\r\n" + part.to_s(opts) + "\r\n" }, "--\r\n" + epilogue].
+@body = [encode(preamble), parts.map { |part| "\r\n" + part.to_s(opts) + "\r\n" }, "--\r\n" + encode(epilogue)].
flatten.join("\r\n--" + boundary)
content_type, attrs = Mime.split_header @headers['Content-Type'][0]
attrs['boundary'] = boundary
@@ -106,9 +110,9 @@ def to_s opts={}

str = ''
@headers.each do |key, vals|
-vals.each { |val| str << "#{key}: #{val}\r\n" }
+vals.each { |val| str << "#{encode(key)}: #{encode(val)}\r\n" }
end
-str << "\r\n" + @body
+str << "\r\n" + encode(@body)
end

def self.split_header header
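
The encode helper added to Mime treats every string that is not tagged UTF-8 as cp1252 and transcodes it. A rough sketch of that behaviour outside the class, plus a more forgiving variant; `to_utf8`, `to_utf8_lenient`, and the sample bytes are hypothetical names and data used only for illustration:

```ruby
# Same logic as the helper in the diff, as a free-standing method.
def to_utf8(str)
  str.encoding == Encoding::UTF_8 ? str : str.encode('utf-8', 'cp1252')
end

# Hypothetical variant: substitute U+FFFD for bytes that Windows-1252
# leaves undefined (such as 0x81) instead of raising.
def to_utf8_lenient(str)
  return str if str.encoding == Encoding::UTF_8
  str.encode('utf-8', 'cp1252', invalid: :replace, undef: :replace)
end

raw = "caf\xE9".b            # "café" as cp1252 bytes, tagged as binary
converted = to_utf8(raw)
puts converted               # café
puts converted.encoding      # UTF-8

lenient = to_utf8_lenient("\x81".b)
puts lenient == "\uFFFD"     # true
```

The strict form raises Encoding::UndefinedConversionError on the few byte values Windows-1252 does not define, so whether to replace or raise is a choice for the maintainers; the diff as written raises.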
16 changes: 11 additions & 5 deletions lib/mapi/rtf.rb
@@ -122,7 +122,7 @@ def self.rtf2text str, format=:text
end
end

-RTF_PREBUF =
+RTF_PREBUF =
"{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}" \
"{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript " \
"\\fdecor MS Sans SerifSymbolArialTimes New RomanCourier" \
@@ -175,7 +175,7 @@ def rtfdecompr data
else # unknown magic number
raise "Unknown compression type (magic number 0x%08x)" % magic
end

# not sure if its due to a bug in the above code. doesn't seem to be
# in my tests, but sometimes there's a trailing null. we chomp it here,
# which actually makes the resultant rtf smaller than its advertised
@@ -189,7 +189,7 @@ def rtfdecompr data
#
# Returns +nil+ if it doesn't look like an rtf encapsulated rtf.
#
-# Some cases that the original didn't deal with have been patched up, eg from
+# Some cases that the original didn't deal with have been patched up, eg from
# this chunk, where there are tags outside of the htmlrtf ignore block.
#
# "{\\*\\htmltag116 <br />}\\htmlrtf \\line \\htmlrtf0 \\line {\\*\\htmltag84 <a href..."
@@ -229,8 +229,14 @@ def rtfdecompr data
def rtf2html rtf
scan = StringScanner.new rtf
# require \fromhtml. is this worth keeping? apparently you see \\fromtext if it
-# was converted from plain text.
+# was converted from plain text.
return nil unless rtf["\\fromhtml"]
+if scan.scan_until(/\\ansicpg/)
+code_page = "cp" + scan.scan(/\d+/)
+scan.pos = 0
+else
+code_page = 'ascii'
+end
html = ''
ignore_tag = nil
# skip up to the first htmltag. return nil if we don't ever find one
@@ -270,7 +276,7 @@ def rtf2html rtf
p :wtf
end
end
-html.strip.empty? ? nil : html
+html.strip.empty? ? nil : html.encode('utf-8', code_page)
end

module_function :rtf2html, :rtfdecompr
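
The code page handling in rtf2html reads the number after the \ansicpg control word and uses "cp" plus that number as the source encoding. A defensive sketch of the same idea, assuming it is acceptable to fall back to Windows-1252 when the control word is missing, has no digits, or names a code page Ruby does not know; the diff itself falls back to 'ascii' and does not guard the other two cases. `rtf_code_page` and the sample data are hypothetical:

```ruby
require 'strscan'

# Hypothetical helper: returns the Encoding named by \ansicpgN, or the
# fallback when no usable code page can be found.
def rtf_code_page(rtf, fallback = Encoding::Windows_1252)
  scan = StringScanner.new(rtf)
  return fallback unless scan.scan_until(/\\ansicpg/)
  digits = scan.scan(/\d+/)
  return fallback unless digits
  Encoding.find("cp#{digits}")
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not recognise.
  fallback
end

rtf = "{\\rtf1\\ansi\\ansicpg1251\\fromhtml1 ...}".b
code_page = rtf_code_page(rtf)
puts code_page                        # Windows-1251

# Extracted HTML bytes in that code page (sample data, not real output).
html_bytes = "\xCF\xF0\xE8\xE2\xE5\xF2".b
puts html_bytes.encode('utf-8', code_page) # Привет
```

Looking the name up through Encoding.find before calling encode keeps an unknown code page from surfacing as an exception at the very end of the conversion.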
2 changes: 1 addition & 1 deletion lib/mapi/version.rb
@@ -1,3 +1,3 @@
module Mapi
-VERSION = '1.5.2'
+VERSION = '1.5.3'
end