
Commit a2bf1be

✨🐛 Update FETCH BODYSTRUCTURE msg-att parser

✨ Add missing "location" extension data. This was missing from RFC2060 but is part of RFC3501. It was also missing from Net::IMAP... until now! 😄

🐛 Fix many bugs. Most importantly:
* Be more strict about where NIL is allowed, e.g. `number`, `envelope`, and `body`. Ignoring these rare server bugs made it difficult to work around much more common server bugs elsewhere.
* BodyTypeAttachment and BodyTypeExtension won't be returned any more, and the constants have been deprecated.
* Better workaround for multipart parts with... zero parts.

🚧 TODO: Although this will parse *most* strange BODYSTRUCTURE msg-att found in the wild, a future PR will backtrack on parse errors and try one or more "fool-proof" algorithms that partially parse *nearly* all invalid body structures sent by buggy servers... even in pathological cases, such as when servers send the message-id as a quoted string containing unescaped quotation marks!

♻️ Add lookahead and peek methods to def_char_matchers, plus peek_str? and peek_re, for matching without consuming input and for using MatchData.
♻️ Rename case_insensitive__string to match the new parser style.
♻️ Add number64 aliases (size is unenforced).
1 parent 179b102 commit a2bf1be

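The commit message mentions lookahead and peek helpers that match input without consuming it. As a rough illustration only, here is a minimal sketch built on Ruby's standard `StringScanner`. The class `PeekingParser` and the helper `accept_str` are hypothetical, and although `peek_str?` and `peek_re` reuse names from the commit message, their signatures and bodies here are assumptions, not Net::IMAP's actual ResponseParser code.

```ruby
require "strscan"

# Hypothetical parser: method names echo the commit message, but these
# bodies are illustrative assumptions, not Net::IMAP::ResponseParser code.
class PeekingParser
  def initialize(input)
    @str = StringScanner.new(input)
  end

  # Current scan position (a byte offset into the input).
  def pos
    @str.pos
  end

  # True if the unread input starts with +str+. Does not advance.
  def peek_str?(str)
    @str.peek(str.bytesize) == str
  end

  # MatchData for +re+ if it matches at the current position, else nil.
  # Does not advance. (It searches the remaining input, so a failed peek
  # may scan past the current position; acceptable for a sketch.)
  def peek_re(re)
    m = @str.rest.match(re)
    m if m && m.begin(0).zero?
  end

  # Hypothetical consuming counterpart: advances only when +str+ is next.
  def accept_str(str)
    return false unless peek_str?(str)
    @str.pos += str.bytesize
    true
  end
end

parser = PeekingParser.new('NIL ("attachment" NIL)')
parser.peek_str?("NIL")       # => true
parser.pos                    # => 0, nothing consumed
parser.peek_re(/[A-Z]+/)[0]   # => "NIL"
parser.peek_re(/\(/)          # => nil, "(" is not at the current position
parser.accept_str("NIL")      # => true
parser.pos                    # => 3
```

Separating "look" from "consume" is what lets a parser try one interpretation of the next token and fall back to another without rewinding.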

4 files changed (+383 / -376 lines)

lib/net/imap/response_data.rb

Lines changed: 46 additions & 41 deletions
@@ -891,13 +891,6 @@ class ThreadMember < Struct.new(:seqno, :children)
 # should use BodyTypeBasic.
 # BodyTypeMultipart:: for <tt>multipart/*</tt> parts
 #
-# ==== Deprecated BodyStructure classes
-# The following classes represent invalid server responses or parser bugs:
-# BodyTypeExtension:: parser bug: used for <tt>message/*</tt> where
-# BodyTypeBasic should have been used.
-# BodyTypeAttachment:: server bug: some servers sometimes return the
-# "Content-Disposition: attachment" data where the
-# entire body structure for a message part is expected.
 module BodyStructure
 end

@@ -914,6 +907,7 @@ class BodyTypeBasic < Struct.new(:media_type, :subtype,
 :param, :content_id,
 :description, :encoding, :size,
 :md5, :disposition, :language,
+:location,
 :extension)
 include BodyStructure

@@ -1049,6 +1043,7 @@ class BodyTypeText < Struct.new(:media_type, :subtype,
 :description, :encoding, :size,
 :lines,
 :md5, :disposition, :language,
+:location,
 :extension)
 include BodyStructure

@@ -1094,6 +1089,7 @@ class BodyTypeMessage < Struct.new(:media_type, :subtype,
 :description, :encoding, :size,
 :envelope, :body, :lines,
 :md5, :disposition, :language,
+:location,
 :extension)
 include BodyStructure

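The diff inserts a `:location` member immediately before `:extension` in each body structure `Struct`. Because `Struct` members are positional, that insertion shifts where later values land when a struct is built from an ordered list of fields. The sketch below assumes two hypothetical stand-in structs (`OldBasic`, `NewBasic`) whose members mirror the BodyTypeBasic diff; they are not the real Net::IMAP classes, and the field values are made up.

```ruby
# Hypothetical stand-ins whose members mirror BodyTypeBasic before and
# after this commit; they are not the real Net::IMAP classes.
OldBasic = Struct.new(:media_type, :subtype, :param, :content_id,
                      :description, :encoding, :size,
                      :md5, :disposition, :language,
                      :extension)
NewBasic = Struct.new(:media_type, :subtype, :param, :content_id,
                      :description, :encoding, :size,
                      :md5, :disposition, :language,
                      :location,
                      :extension)

# Made-up values in wire order: seven required fields, then md5,
# disposition, language, location, and one extension value.
fields = ["TEXT", "PLAIN", nil, nil, nil, "7BIT", 42,
          nil, nil, nil, "https://example.com/part", ["x-ext"]]

new_part = NewBasic.new(*fields)
new_part.location    # => "https://example.com/part"
new_part.extension   # => ["x-ext"]

# Without a :location member, the location value can only land in the
# next positional slot, which is :extension.
old_part = OldBasic.new(*fields.first(11))
old_part.extension   # => "https://example.com/part"

# Trailing optional members default to nil when they are not supplied.
NewBasic.new(*fields.first(7)).location   # => nil
```

Reading members by name (`part.location`) is unaffected by the reordering; only code that relies on member positions, such as `to_a` or positional `new`, sees the shift.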
@@ -1126,36 +1122,41 @@ def media_subtype
 end
 end

-# === WARNING
-# BodyTypeAttachment represents a <tt>body-fld-dsp</tt> that is
-# incorrectly in a position where the IMAP4rev1 grammar expects a nested
-# +body+ structure.
+# BodyTypeAttachment is not used and will be removed in an upcoming release.
 #
-# >>>
-# \IMAP body structures are parenthesized lists and assign their fields
-# positionally, so missing fields change the intepretation of all
-# following fields. Buggy \IMAP servers sometimes leave fields missing
-# rather than empty, which inevitably confuses parsers.
-# BodyTypeAttachment was an attempt to parse a common type of buggy body
-# structure without crashing.
-#
-# Currently, when Net::IMAP::ResponseParser sees "attachment" as the first
-# entry in a <tt>body-type-1part</tt>, which is where the MIME type should
-# be, it uses BodyTypeAttachment to capture the rest. "attachment" is not
-# a valid MIME type, but _is_ a common <tt>Content-Disposition</tt>. What
-# might have happened was that buggy server could not parse the message
-# (which might have been incorrectly formatted) and output a
-# <tt>body-type-dsp</tt> where a Net::IMAP::ResponseParser expected to see
-# a +body+.
-#
-# A future release will replace this, probably with a ContentDisposition
-# nested inside another body structure object, maybe BodyTypeBasic, or
-# perhaps a new body structure class that represents any unparsable body
-# structure.
+# === Bug Analysis
+#
+# \IMAP body structures are parenthesized lists and assign their fields
+# positionally, so missing fields change the interpretation of all
+# following fields. Additionally, different body types have a different
+# number of required fields, followed by optional "extension" fields.
+#
+# BodyTypeAttachment was previously returned when a "message/rfc822" part,
+# which should be sent as <tt>body-type-msg</tt> with ten required fields,
+# was actually sent as a <tt>body-type-basic</tt> with _seven_ required
+# fields.
+#
+# basic => type, subtype, param, id, desc, enc, octets, md5=nil, dsp=nil, lang=nil, loc=nil, *ext
+# msg => type, subtype, param, id, desc, enc, octets, envelope, body, lines, md5=nil, ...
+#
+# Normally, +envelope+ and +md5+ are incompatible, but Net::IMAP leniently
+# allowed buggy servers to send +NIL+ for +envelope+. As a result, when a
+# server sent a <tt>message/rfc822</tt> part with +NIL+ for +md5+ and a
+# non-<tt>NIL</tt> +dsp+, Net::IMAP misinterpreted the
+# <tt>Content-Disposition</tt> as if it were a strange body type. In all
+# reported cases, the <tt>Content-Disposition</tt> was "attachment", so
+# BodyTypeAttachment was created as the workaround.
+#
+# === Current behavior
+#
+# When interpreted strictly, +envelope+ and +md5+ are incompatible. So the
+# current parsing algorithm peeks ahead after it has received the seventh
+# body field. If the next token is not the start of an +envelope+, we assume
+# the server has incorrectly sent us a <tt>body-type-basic</tt> and return
+# BodyTypeBasic. As a result, what was previously BodyTypeMessage#body =>
+# BodyTypeAttachment is now BodyTypeBasic#disposition => ContentDisposition.
 #
 class BodyTypeAttachment < Struct.new(:dsp_type, :_unused_, :param)
-include BodyStructure
-
 # *invalid for BodyTypeAttachment*
 def media_type
 warn(<<~WARN, uplevel: 1)
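The "Current behavior" notes describe peeking at the token after the seventh body field: only a parenthesized list can start an `envelope`, so anything else means the server actually sent a `body-type-basic`. The sketch below is a simplified model of that decision, assuming a hypothetical pre-tokenized form (parenthesized lists as Ruby Arrays, `NIL` as `nil`). `classify_message_part` is not a Net::IMAP method, and the real parser works on raw response text rather than arrays.

```ruby
# Hypothetical, pre-tokenized body structure: a parenthesized list is a
# Ruby Array, NIL is nil, strings and numbers are themselves. "Peeking"
# here is just inspecting the eighth element (index 7).
def classify_message_part(fields)
  if fields[7].is_a?(Array)
    # Only a parenthesized list can begin an envelope: body-type-msg.
    envelope, body, lines, md5, disposition = fields[7, 5]
    { type: :message, envelope: envelope, body: body, lines: lines,
      md5: md5, disposition: disposition }
  else
    # NIL or a string cannot begin an envelope (when parsed strictly),
    # so it must be md5: treat the part as body-type-basic.
    md5, disposition = fields[7, 2]
    { type: :basic, md5: md5, disposition: disposition }
  end
end

# message/rfc822 sent correctly: envelope (10 fields), nested body, lines.
good = ["MESSAGE", "RFC822", nil, nil, nil, "7BIT", 1234,
        [nil, "Hello", nil, nil, nil, nil, nil, nil, nil, "<id@example>"],
        ["TEXT", "PLAIN", nil, nil, nil, "7BIT", 10, 1],
        20]

# message/rfc822 sent by a buggy server with only the basic fields:
# md5 is NIL and a disposition follows.
buggy = ["MESSAGE", "RFC822", nil, nil, nil, "7BIT", 1234,
         nil, ["attachment", ["FILENAME", "fwd.eml"]]]

classify_message_part(good)[:type]          # => :message
classify_message_part(buggy)[:type]         # => :basic
classify_message_part(buggy)[:disposition]  # => ["attachment", ["FILENAME", "fwd.eml"]]
```

Under the old lenient rule, the `NIL` at index 7 of `buggy` was accepted as the envelope, which pushed the disposition list at index 8 into the nested `body` slot; that misreading is what BodyTypeAttachment was created to absorb.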
@@ -1190,11 +1191,14 @@ def multipart?
 end
 end

+deprecate_constant :BodyTypeAttachment
+
 # Net::IMAP::BodyTypeMultipart represents body structures of messages and
 # message parts, when <tt>Content-Type</tt> is <tt>multipart/*</tt>.
 class BodyTypeMultipart < Struct.new(:media_type, :subtype,
 :parts,
 :param, :disposition, :language,
+:location,
 :extension)
 include BodyStructure

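The diff deprecates BodyTypeAttachment (and, further down, BodyTypeExtension) with Ruby's `Module#deprecate_constant`, so existing references keep resolving but emit a warning. A minimal sketch, assuming a hypothetical `Legacy` module in place of Net::IMAP, that captures the warning by temporarily swapping `$stderr` for a `StringIO`. Since Ruby 2.7.2 deprecation warnings are disabled by default, so the sketch enables the `:deprecated` category first.

```ruby
require "stringio"

# Hypothetical module standing in for Net::IMAP.
module Legacy
  class OldPart; end
  deprecate_constant :OldPart
end

# Deprecation warnings are off by default since Ruby 2.7.2. (This also
# assumes Ruby was not started with -W0, which silences all warnings.)
Warning[:deprecated] = true

captured = StringIO.new
original_stderr = $stderr
$stderr = captured
begin
  klass = Legacy::OldPart # still resolves, but emits a warning
ensure
  $stderr = original_stderr
end

klass             # => Legacy::OldPart
captured.string   # expected to contain a "... is deprecated" warning naming the constant
```

Deprecating rather than removing the constants lets callers that still reference them keep working for a release while they migrate.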
@@ -1265,23 +1269,24 @@ def media_subtype
 end
 end

-# === WARNING:
+# === Obsolete
+# BodyTypeExtension is not used and will be removed in an upcoming release.
+#
 # >>>
-# BodyTypeExtension is (incorrectly) used for <tt>message/*</tt> parts
+# BodyTypeExtension was (incorrectly) used for <tt>message/*</tt> parts
 # (besides <tt>message/rfc822</tt>, which correctly uses BodyTypeMessage).
 #
-# A future release will replace this class with:
-# * BodyTypeMessage for <tt>message/rfc822</tt> and <tt>message/global</tt>
-# * BodyTypeBasic for any other <tt>message/*</tt>
+# Net::IMAP now (correctly) parses all message types (other than
+# <tt>message/rfc822</tt> or <tt>message/global</tt>) as BodyTypeBasic.
 class BodyTypeExtension < Struct.new(:media_type, :subtype,
 :params, :content_id,
 :description, :encoding, :size)
-include BodyStructure
-
 def multipart?
 return false
 end
 end

+deprecate_constant :BodyTypeExtension
+
 end
 end
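After this change, the BodyTypeExtension docs state that every `message/*` type other than `message/rfc822` and `message/global` is parsed as BodyTypeBasic. The hypothetical helper below sketches that single-part dispatch by media type. It is not part of Net::IMAP, and `multipart/*` is left out because it uses a separate grammar production.

```ruby
# Hypothetical helper, not part of Net::IMAP: picks the kind of single-part
# body structure for a media type and subtype (compared case-insensitively).
def single_part_kind(media_type, subtype)
  type = media_type.upcase
  sub  = subtype.upcase
  if type == "TEXT"
    :text      # body-type-text  -> BodyTypeText
  elsif type == "MESSAGE" && %w[RFC822 GLOBAL].include?(sub)
    :message   # body-type-msg   -> BodyTypeMessage
  else
    :basic     # body-type-basic -> BodyTypeBasic, incl. other message/*
  end
end

single_part_kind("message", "rfc822")           # => :message
single_part_kind("MESSAGE", "GLOBAL")           # => :message
single_part_kind("message", "delivery-status")  # => :basic
single_part_kind("text", "plain")               # => :text
single_part_kind("image", "png")                # => :basic
```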
