DEFAULT_ENCODE_SET doesn't percent-encode / and % #154

Closed
@cmbrandenburg

I'm writing a CouchDB client crate. With CouchDB, unlike with many file-serving web servers, the / and % characters are often valid characters within a path component. For example, foo/bar is a valid database name, and qux%baz is a valid document name. An HTTP request line to get a document might look something like this:

GET /foo%2Fbar/qux%25baz HTTP/1.1

My crate starts with a base Url provided by the application (e.g., http://couchdb-server:1234/) and then manipulates the Url's path to add a database name, document name, etc. to form an HTTP request. The following code snippet demonstrates the path manipulation:

let app_input = "http://couchdb-server/";
let mut u = url::Url::parse(app_input).unwrap();
{
    let mut p = u.path_mut().unwrap();
    p.clear();
    p.push("foo/bar".to_string());
    p.push("qux%baz".to_string());
}
println!("database: {}", u.path().unwrap()[0]);
println!("document: {}", u.path().unwrap()[1]);
println!("URI     : {}", u);

Output I would like to see:

database: foo/bar
document: qux%baz
URI     : http://couchdb-server/foo%2Fbar/qux%25baz

Actual output:

database: foo/bar
document: qux%baz
URI     : http://couchdb-server/foo/bar/qux%baz

Fair enough. After reading through #149, I understand the url crate's goals aren't exactly aligned with my hopes. I only need to be explicit about the percent-encoding and everything will work, right? Wrong.

let app_input = "http://couchdb-server/";
let mut u = url::Url::parse(app_input).unwrap();
{
    use url::percent_encoding::{percent_encode, DEFAULT_ENCODE_SET};
    let mut p = u.path_mut().unwrap();
    p.clear();
    p.push(percent_encode("foo/bar".as_bytes(), DEFAULT_ENCODE_SET));
    p.push(percent_encode("qux%baz".as_bytes(), DEFAULT_ENCODE_SET));
}
println!("database: {}", u.path().unwrap()[0]);
println!("document: {}", u.path().unwrap()[1]);
println!("URI     : {}", u);

Output I expect:

database: foo/bar
document: qux%baz
URI     : http://couchdb-server/foo%2Fbar/qux%25baz

Actual output:

database: foo/bar
document: qux%baz
URI     : http://couchdb-server/foo/bar/qux%baz

WTF? Percent-encoding using the DEFAULT_ENCODE_SET, which is for path components, doesn't percent-encode / and % characters? This means percent-encoding and percent-decoding aren't inverses.

use url::percent_encoding::{percent_decode, percent_encode, DEFAULT_ENCODE_SET};
let a = "qux%baz";
let b = percent_encode(a.as_bytes(), DEFAULT_ENCODE_SET);
let c = String::from_utf8(percent_decode(b.as_bytes())).unwrap();
assert!(a == c);

Actual output:

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: FromUtf8Error { bytes: [113, 117, 120, 186, 122], error: Utf8Error { valid_up_to: 3 } }', ../src/libcore/result.rs:738
Process didn't exit successfully: `target/debug/rust-scratch` (exit code: 101)

I think this violates the Principle of Least Astonishment. Percent-decoding should undo percent-encoding.
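For anyone puzzled by the byte 186 in the panic message: it is what you get when the `%ba` inside the unescaped `qux%baz` is read as a hex escape (0xBA = 186), and a lone 0xBA is not valid UTF-8. Here is a minimal std-only sketch that reproduces those bytes. The helper names `hex_val` and `percent_decode_bytes` are hypothetical and written for this illustration; they are not the url crate's implementation.

```rust
/// Hypothetical helper: value of a single ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper: decode every well-formed %XX escape and copy
/// all other bytes (including malformed escapes) through unchanged.
fn percent_decode_bytes(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        // An escape needs two more bytes after the '%'.
        if input[i] == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex_val(input[i + 1]), hex_val(input[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}
```

Feeding it `b"qux%baz"` yields `[113, 117, 120, 186, 122]`, the same bytes as in the `FromUtf8Error` above, while `b"qux%25baz"` decodes back to `qux%baz`. So the round trip only works if the encoder escapes `%` itself.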

Anyway, for my CouchDB use case, the workaround is to explicitly use string methods to replace / and % characters with %2F and %25 before calling the url::percent_encode function. However, this workaround is tedious, and it requires a couple of extra allocations.
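One way to avoid the chained string replacements is a single pass over the bytes that escapes everything outside a small safe set, so that / and % are handled along with the rest. The sketch below uses only the standard library. The function name `encode_path_segment` is hypothetical, and the safe set is my own choice (the RFC 3986 unreserved characters), which is stricter than DEFAULT_ENCODE_SET.

```rust
/// Hypothetical helper: percent-encode one path segment in a single pass.
/// Only ASCII letters, digits, and - . _ ~ are left as-is; every other
/// byte, including '/' and '%', becomes %XX with uppercase hex digits.
fn encode_path_segment(segment: &str) -> String {
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    // One up-front allocation; it grows only if something needs escaping.
    let mut out = String::with_capacity(segment.len());
    for &b in segment.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => {
                out.push('%');
                out.push(HEX[(b >> 4) as usize] as char);
                out.push(HEX[(b & 0x0F) as usize] as char);
            }
        }
    }
    out
}
```

With this, `encode_path_segment("foo/bar")` gives `foo%2Fbar` and `encode_path_segment("qux%baz")` gives `qux%25baz`. The sketch only produces the encoded segment; whether the result survives being pushed into a Url's path untouched depends on the url crate and is not addressed here.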
