support encoded filename in Content-Disposition for HTTP in cgi.FieldStorage #67622

Closed
MyroslavOpyr mannequin opened this issue Feb 10, 2015 · 12 comments
Labels
3.7 (EOL), 3.8 (EOL), stdlib, topic-email, topic-unicode, type-feature

Comments

@MyroslavOpyr
Mannequin

MyroslavOpyr mannequin commented Feb 10, 2015

BPO 23434
Nosy @warsaw, @ezio-melotti, @bobince, @bitdancer, @vadmium, @serhiy-storchaka, @demianbrecht, @pawciobiel
PRs
  • bpo-33027: Fix cgi.FieldStorage to handle Content-Disposition filename* with encoding according to RFC5987 #6027
Files
  • test_cgi.py-v2.7.5-rfc6266_filename.patch: test revealing the issue
  • cgi.py-v2.7.5-rfc6266_filename.patch: rfc6266 powered fix
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-02-10.14:54:47.750>
    labels = ['3.7', '3.8', 'expert-email', 'type-feature', 'library', 'expert-unicode']
    title = 'support encoded filename in Content-Disposition for HTTP in cgi.FieldStorage'
    updated_at = <Date 2020-01-08.10:45:02.342>
    user = 'https://bugs.python.org/MyroslavOpyr'

    bugs.python.org fields:

    activity = <Date 2020-01-08.10:45:02.342>
    actor = 'aclover'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Unicode', 'email']
    creation = <Date 2015-02-10.14:54:47.750>
    creator = 'Myroslav.Opyr'
    dependencies = []
    files = ['38092', '38094']
    hgrepos = []
    issue_num = 23434
    keywords = ['patch']
    message_count = 11.0
    messages = ['235688', '235732', '235734', '235909', '236101', '236365', '314265', '314297', '359224', '359533', '359577']
    nosy_count = 10.0
    nosy_names = ['barry', 'ezio.melotti', 'aclover', 'r.david.murray', 'Myroslav.Opyr', 'martin.panter', 'piotr.dobrogost', 'serhiy.storchaka', 'demian.brecht', 'pawciobiel']
    pr_nums = ['6027']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23434'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @MyroslavOpyr
    Mannequin Author

    MyroslavOpyr mannequin commented Feb 10, 2015

    cgi.FieldStorage has problems parsing multipart/form-data requests that contain file fields with non-Latin filenames. It drops the filename parameter when it is formatted according to RFC 6266 [1] (as most modern browsers do). There is already a Python implementation of that RFC in the rfc6266 module [2].

    Ref:
    [1] https://tools.ietf.org/html/rfc6266
    [2] https://pypi.python.org/pypi/rfc6266
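For readers unfamiliar with the encoded form in question: RFC 6266 borrows the `filename*` extended parameter from RFC 5987, whose value has the shape `charset'language'percent-encoded-bytes`. The following is a minimal sketch of that encoding, assuming UTF-8 and an empty language tag. The helper names `encode_ext_value` and `decode_ext_value` are hypothetical and are not part of `cgi` or the rfc6266 package.

```python
from urllib.parse import quote, unquote


def encode_ext_value(filename: str) -> str:
    """Build an RFC 5987 style ext-value (hypothetical helper).

    Assumes UTF-8 and an empty language tag. quote() with safe=""
    leaves only letters, digits and "_.-~" unescaped, all of which
    RFC 5987 allows unencoded.
    """
    return "UTF-8''" + quote(filename, safe="")


def decode_ext_value(ext_value: str) -> str:
    """Reverse of encode_ext_value: split charset'language'value and unquote."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)


ext = encode_ext_value("файл.txt")
print('Content-Disposition: attachment; filename*=' + ext)
print(decode_ext_value(ext))
```

A parser that only looks for a plain `filename` parameter will not see this value at all, which matches the dropped-filename symptom described above.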

    @MyroslavOpyr MyroslavOpyr mannequin added the stdlib and type-feature labels Feb 10, 2015
    @vadmium vadmium changed the title from "RFC6266 support" to "RFC6266 support (Content-Disposition for HTTP)" Feb 10, 2015
    @MyroslavOpyr
    Mannequin Author

    MyroslavOpyr mannequin commented Feb 11, 2015

    In test_cgi.py-v2.7.5-rfc6266_filename.patch there is a patch to test_cgi.py (Python 2.7.5) that reveals the issue.

    @MyroslavOpyr
    Mannequin Author

    MyroslavOpyr mannequin commented Feb 11, 2015

    As a proof of concept, there is a fix for the issue powered by the rfc6266 library [1]. See cgi.py-v2.7.5-rfc6266_filename.patch

    References:
    [1] https://pypi.python.org/pypi/rfc6266

    @bitdancer
    Member

    Since that library is not part of the stdlib, this is not an appropriate patch for CPython.

    Note that this issue is also relevant to the email library, which intends to support RFC2616 header parsing/generation, and therefore should also be enhanced to support RFC 6266.

    @MyroslavOpyr
    Mannequin Author

    MyroslavOpyr mannequin commented Feb 16, 2015

    Hi David,

    According to the "Test Cases for HTTP Content-Disposition header field" overview [1], this is not about email headers, but only about HTTP headers. It looks like the email standards and the HTTP standards differ in this area.

    I do know that my patch is poor. It is just a proof of concept, to show that there is an issue in the stdlib and one possible quick patch to get the needed functionality.

    Regards,

    Myroslav

    Ref:
    [1] http://greenbytes.de/tech/tc2231/

    @bitdancer
    Member

    I know it is called the 'email' package, but the intent is to support http header parsing as well (cf email.policy.HTTP).

    @pawciobiel
    Mannequin

    pawciobiel mannequin commented Mar 22, 2018

    I didn't find this and created a duplicate
    https://bugs.python.org/issue33027

    I've added similar/updated changes
    #6027

    @r.david.murray wouldn't it be wise to do one step at a time rather than implementing full support for RFC6266? Please tell me exactly what your expectations are so I can fix the patch if it needs to be fixed.

    This is also related to RFC5987
    https://tools.ietf.org/html/rfc5987
    https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition

    @pawciobiel pawciobiel mannequin added the topic-unicode, 3.7 (EOL) and 3.8 (EOL) labels Mar 22, 2018
    @pawciobiel pawciobiel mannequin changed the title from "RFC6266 support (Content-Disposition for HTTP)" to "support encoded filename in Content-Disposition for HTTP in cgi.FieldStorage" Mar 22, 2018
    @bitdancer
    Member

    I haven't read the http rfcs, but my understanding is that they follow the MIME standards, and the email library already has code to do proper parsing and decoding of encoded filenames in Content-Disposition headers. It should be possible to call that code for this use case (the http libraries already depend on the email libraries, although I'm not sure if cgi itself does currently). There may be additional considerations involved in fully supporting the http RFCs, but to determine that someone will need to read both and understand them, which is not a small undertaking :)

    In the meantime, I'm pretty sure that using the existing mime header parsing code in the email library (see email.headerregistry) will provide better parsing than the only-handles-simple-cases heuristic in your PR. Granted, I don't think you have to deal with multi-part headers in http, but I vaguely remember that there are other subtleties not handled by a simple split on '.
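As a rough illustration of the email package decoding an encoded filename parameter, here is a minimal sketch. It uses the legacy `email.message.Message` API and its `get_filename()` method rather than `email.headerregistry`, purely to keep the example short. The wrapper function name and the sample header value are assumptions made for illustration, and whether this decoding is appropriate for multipart/form-data is a separate question.

```python
from email.message import Message


def filename_from_content_disposition(value):
    """Return the decoded filename parameter of a Content-Disposition value.

    Hypothetical wrapper: the header value is stored on a throwaway
    Message so that the email package's RFC 2231 parameter decoding
    (the same charset'language'value syntax used by filename*) applies.
    """
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename()


# Sample value chosen for illustration only.
header_value = "form-data; name=\"file\"; filename*=UTF-8''%E2%82%AC%20rates.txt"
print(filename_from_content_disposition(header_value))
print(filename_from_content_disposition('attachment; filename="plain.txt"'))
```

The same call handles both the extended `filename*` form and a plain quoted `filename`, which is the kind of case coverage a hand-written split would have to reproduce.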

    @bobince
    Mannequin

    bobince mannequin commented Jan 2, 2020

    HTTP generally isn't an RFC 822-family standard. Its headers look a lot like it, but they have their own defined syntax that differs in niggling little details. Using mail parsing code for HTTP isn't usually the right thing.

    HTTP has always used its own syntax definitions for the headers on the main request/response entities, but it has traditionally partially deferred to RFC 822-family specs for the definitions of structured entity bodies. This is moot, however, as the reality of what browsers support has rarely coincided with those specs.

    Nowadays HTML5.2 explicitly defers to RFC 7578 for definition of multipart/form-data headers. (This RFC is a replacement for the vague and broken RFC 2388.) As is to be expected for an HTML5-related spec, RFC 7578 shrugs and documents existing browser behaviour [section 4.2]:

    • some browsers do UTF-8
    • some browsers do data mangling (IE's %-encoding sadness)
    • some browsers might do something else

    but it explicitly rules out the solution proposed here:

    "The encoding method described in [RFC5987], which would add a 'filename*' parameter to the Content-Disposition header field, MUST NOT be used."

    The introductions of both RFC 5987 and RFC 6266 explicitly exclude multipart/form-data headers from their remit.

    So in summary:

    • we shouldn't do anything
    • the situation with submitted filenames will continue to be broken for everyone indefinitely
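To make the "some browsers do UTF-8" case concrete, here is a minimal sketch of reading a plain `filename` parameter from the raw bytes of a part header and decoding it as UTF-8. The function `form_data_filename` and the sample header are hypothetical, the regular expression does not handle escaped quotes, and the choice of UTF-8 is an assumption that will be wrong for browsers that use a different encoding or mangle the data.

```python
import re

# Matches a plain quoted filename parameter in raw header bytes.
# Deliberately simple: no handling of escaped quotes or filename*.
_FILENAME_RE = re.compile(rb'filename="([^"]*)"')


def form_data_filename(raw_header, encoding="utf-8"):
    """Return the filename from a raw part header, or None if absent.

    Hypothetical helper, not part of cgi. Bytes that are not valid in
    the assumed encoding are replaced rather than raising an error.
    """
    match = _FILENAME_RE.search(raw_header)
    if match is None:
        return None
    return match.group(1).decode(encoding, errors="replace")


raw = 'Content-Disposition: form-data; name="upload"; filename="файл.txt"'.encode("utf-8")
print(form_data_filename(raw))
```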

    @bitdancer
    Member

    Are you saying there is no (http) RFC compliant way to fix this, or no way to fix it with the email library parsers? If the latter, the library is pretty flexible and for internal stdlib use it would probably be permissible to directly call methods in the internal parsing module, if those would be useful.

    I haven't re-read the issue to reload my brain, so this question may be off point (except for the first clause of the question).

    @bobince
    Mannequin

    bobince mannequin commented Jan 8, 2020

    > Are you saying there is no (http) RFC compliant way to fix this

    Sadly, yes.

    And though RFCs aren't always a fair representation of real-world use, RFC 7578 is informative as well as normative: at present nothing produces "filename*=" in multipart/form-data.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @hugovk
    Member

    hugovk commented May 9, 2023

    Closing, because the cgi module was deprecated in Python 3.11 and will be removed in 3.13:

    It has been moved to a separate package on PyPI, maintained by the community:

    @hugovk hugovk closed this as not planned May 9, 2023