Description
I'm writing a CouchDB client crate. With CouchDB, unlike with many file-serving web servers, the / and % characters are often valid characters within a path component. For example, foo/bar is a valid database name, and qux%baz is a valid document name. An HTTP request line to get a document might look something like this:
GET /foo%2Fbar/qux%25baz HTTP/1.1
My crate starts with a base Url provided by the application (e.g., http://couchdb-server:1234/) and then manipulates the Url's path to add a database name, document name, etc. to form an HTTP request. The following code snippet demonstrates the path manipulation:
let app_input = "http://couchdb-server/";
let mut u = url::Url::parse(app_input).unwrap();
{
    let mut p = u.path_mut().unwrap();
    p.clear();
    p.push("foo/bar".to_string());
    p.push("qux%baz".to_string());
}
println!("database: {}", u.path().unwrap()[0]);
println!("document: {}", u.path().unwrap()[1]);
println!("URI : {}", u);
Output I would like to see:
database: foo/bar
document: qux%baz
URI : http://couchdb-server/foo%2Fbar/qux%25baz
Actual output:
database: foo/bar
document: qux%baz
URI : http://couchdb-server/foo/bar/qux%baz
Fair enough. After reading through #149, I understand the url crate's goals aren't exactly aligned with my hopes. I only need to be explicit about the percent-encoding and everything will work, right? Wrong.
let app_input = "http://couchdb-server/";
let mut u = url::Url::parse(app_input).unwrap();
{
    use url::percent_encoding::{percent_encode, DEFAULT_ENCODE_SET};
    let mut p = u.path_mut().unwrap();
    p.clear();
    p.push(percent_encode("foo/bar".as_bytes(), DEFAULT_ENCODE_SET));
    p.push(percent_encode("qux%baz".as_bytes(), DEFAULT_ENCODE_SET));
}
println!("database: {}", u.path().unwrap()[0]);
println!("document: {}", u.path().unwrap()[1]);
println!("URI : {}", u);
Output I expect:
database: foo/bar
document: qux%baz
URI : http://couchdb-server/foo%2Fbar/qux%25baz
Actual output:
database: foo/bar
document: qux%baz
URI : http://couchdb-server/foo/bar/qux%baz
WTF? Percent-encoding using the DEFAULT_ENCODE_SET, which is for path components, doesn't percent-encode / and % characters? This means percent-encoding and percent-decoding aren't inverses.
use url::percent_encoding::{percent_decode, percent_encode, DEFAULT_ENCODE_SET};
let a = "qux%baz";
let b = percent_encode(a.as_bytes(), DEFAULT_ENCODE_SET);
let c = String::from_utf8(percent_decode(b.as_bytes())).unwrap();
assert!(a == c);
Actual output:
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: FromUtf8Error { bytes: [113, 117, 120, 186, 122], error: Utf8Error { valid_up_to: 3 } }', ../src/libcore/result.rs:738
Process didn't exit successfully: `target/debug/rust-scratch` (exit code: 101)
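I think this violates the Principle of Least Astonishment: percent-decoding should undo percent-encoding. Here the encoder leaves the % untouched, so the decoder reads %ba as the single byte 0xBA (the 186 in the error above), which is not valid UTF-8.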
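To make the round-trip property concrete, here is a minimal std-only sketch of the behavior I'd expect. The names encode_segment and decode_segment are hypothetical and are not part of the url crate. The sketch assumes that every byte outside the RFC 3986 unreserved set gets escaped, which is stricter than DEFAULT_ENCODE_SET.

```rust
/// Hypothetical helper: percent-encode every byte that is not an RFC 3986
/// unreserved character, so '/' and '%' are always escaped.
fn encode_segment(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Convert one ASCII hex digit to its value, if it is one.
fn hex_value(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Hypothetical inverse: decode each %XX escape back to a byte. A '%' that is
/// not followed by two hex digits is passed through unchanged.
fn decode_segment(input: &str) -> Result<String, std::string::FromUtf8Error> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8(out)
}
```

With these definitions, decode_segment(&encode_segment("qux%baz")) yields the original string, because the encoder never emits a bare % that the decoder could misread as the start of an escape.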
Anyway, for my CouchDB use case, the workaround is to explicitly use string methods to replace / and % characters with %2F and %25 before calling the url::percent_encoding::percent_encode function. However, this is a tedious workaround, and it requires a couple of extra allocations.
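For reference, the replacement step of that workaround can be sketched with std alone. The helper names pre_escape and request_path are hypothetical and only illustrate the idea. The sketch covers only the string replacement, not the later call into the url crate.

```rust
/// Hypothetical helper: escape the two characters that the encode set leaves
/// alone. '%' must be replaced first; otherwise the '%' in a freshly written
/// "%2F" would itself be turned into "%252F".
fn pre_escape(component: &str) -> String {
    // Each `replace` call allocates a new String, hence the extra allocations.
    component.replace('%', "%25").replace('/', "%2F")
}

/// Hypothetical helper: build the request path for a document in a database.
fn request_path(database: &str, document: &str) -> String {
    format!("/{}/{}", pre_escape(database), pre_escape(document))
}
```

Calling request_path("foo/bar", "qux%baz") produces the path used in the request line at the top of this report, /foo%2Fbar/qux%25baz.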