Clarification of GET versus POST for information retrieval #277

tunetheweb · 2020-01-22T13:44:19Z

Am sure this has been discussed before, and I'm opening a can of worms but here goes (if nothing else would be nice to have this as an issue to be easily referenceable)...

The traditionally accepted use case for GET is information retrieval, and for POST it is uploading information to a server. This is reflected in the language of the definitions as linked in the previous sentence and does not seem to be materially changed in the new versions currently in draft.

However the sections on Disclosure of Personal Information and Disclosure of Sensitive Information in URIs recognise the privacy and security implications of putting sensitive information in the URL as Query Params and state that "Such services ought to use POST-based form submission instead." Again this language seems unchanged in the current drafts.

This seems out of sync with the definition of POST provided previously and leads to arguments as to whether information retrieval by sensitive params is an "acceptable" use of POST (particularly for those insisting on a rigid definition of REST). Note I'm not talking about authentication cookies here, but about other params which clarify the resource required, rather than authenticate access to such a resource. For example "download document id 1234" is often an information retrieval request, which is supplemented by cookies to confirm access to that document. In this case the document id may be considered sensitive (e.g. if it's a policy number, or a claim identifier) so should not be sent as a URL param. That would rule out GET, unless you want to get into providing the params as HTTP Headers instead, which just seems like a messy idea.

Of course you could use a GET with a body, but the RFC explicitly notes why this is a bad idea: "sending a payload body on a GET request might cause some existing implementations to reject the request." This has been strengthened in the latest draft to: "A client SHOULD NOT generate a body in a GET request" in #202.

The POST definition doesn't explicitly forbid its use for information retrieval, but all the examples suggest data is being sent to the server for storage (perhaps not the first example, but certainly the others) and there are many examples of this being debated.

While this has always been an issue, it is getting more and more important in this more privacy-aware world, with various legislation and security requirements. I feel the specs could be more explicit, and reflect what I imagine is a common use case for POST.

Therefore should text and/or examples be added to clarify the position with regard to retrieval via POST requests, and to resolve the apparent discrepancy with the Disclosure of Sensitive Information in URIs section?

asbjornu · 2020-01-22T22:03:22Z

Have you read the discussion in #250? It led to the creation of httpwg/http-extensions#943 and httpwg/http-extensions#942, which both attempt to tackle this problem from different angles.

tunetheweb · 2020-01-22T23:27:43Z

I saw #202 but missed the discussion in the subsequent pull request to resolve that (#250) so thanks @asbjornu for pointing me at that.

httpwg/http-extensions#942 and httpwg/http-extensions#943 both seem concerned with making payloads with bodies cacheable, which is actually the opposite of the use cases I'm talking about. In my use case, the fact these are fetched with private query params, which we don't want to put in the URL so they are not leaked, would also mean (in most cases) that you wouldn't want the resulting response cached either. This is why POST works fine for this use case, apart from the ideology behind it.

POST has the added benefit of being a well-known and well-supported method whereas introducing a new method like SEARCH will take time to filter through various tech stacks. That makes sense when you want to have a new method definition (explicitly make it cacheable which POST is not) so you must wait until that new scenario is supported in clients, but seems less necessary when we have an existing method that works fine, and you are just seeking clarification on whether this is a valid and acceptable use case.

Of course if this is a terrible use case for POST then that should be clarified in the spec IMHO, with reasons why. Currently it seems open for debate.

mnot · 2020-08-03T08:02:15Z

Perhaps just adding a simple sentence like this to the definition of POST would help:

Note that POST is sometimes used to retrieve information from a server when it is not desirable to expose information in the URI with GET; see [ref to security considerations].

royfielding · 2020-08-22T00:32:37Z

How about

Note that POST is sometimes used for information retrieval or discovery of an unknown resource wherein potentially sensitive data might be passed to the server as query parameters. In this case, POST is often preferred because it avoids placing user-provided data in the request target, where it might later be exposed via links to the resulting information. However, this can also be achieved with GET by redirecting to a stored query (hiding the user-provided data within a new resource) instead of providing the resulting information directly.

tunetheweb · 2020-08-22T06:10:51Z

Not loving that phrasing to be honest.

To give a real example, I work at a Health Insurance company and we have policies and claims online which we make available in an online "Member Area" portal. So the front end app knows the list of claim ids, for example, and there is more than one per customer. So these can be loaded in a number of ways:

GET /documents/claims/12345.pdf
GET /documents/claim.pdf?claimId=12345
POST /documents/claim/ (with the claim id provided as a param).
GET /document/claim/ (with the claim id provided as a custom HTTP Header).

The first two expose the claim id (which we consider sensitive data) to the URL, browser history, web server logs....etc. The last two do not.
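As a rough illustration of that difference, the four request shapes can be built (without sending anything) using Python's standard `urllib.request`. This is only a sketch: the host `example.invalid`, the form field name `claimId` in the POST body, and the `X-Claim-Id` header name are hypothetical choices made for the example; only the paths come from the list above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://example.invalid"  # hypothetical host, for illustration only
CLAIM_ID = "12345"


def build_requests(claim_id: str) -> dict:
    """Build (but do not send) the four request variants from the list."""
    return {
        # 1. Claim id in the path.
        "path": Request(f"{BASE}/documents/claims/{claim_id}.pdf"),
        # 2. Claim id in the query string.
        "query": Request(
            f"{BASE}/documents/claim.pdf?" + urlencode({"claimId": claim_id})
        ),
        # 3. Claim id in a form-encoded POST body.
        "post_body": Request(
            f"{BASE}/documents/claim/",
            data=urlencode({"claimId": claim_id}).encode("ascii"),
            method="POST",
        ),
        # 4. Claim id in a custom request header (hypothetical header name).
        "header": Request(
            f"{BASE}/document/claim/",
            headers={"X-Claim-Id": claim_id},
        ),
    }


def exposed_in_url(req: Request, claim_id: str) -> bool:
    """True if the claim id appears anywhere in the request URL."""
    return claim_id in req.full_url


requests_by_style = build_requests(CLAIM_ID)
for name, req in requests_by_style.items():
    print(name, req.get_method(), req.full_url, exposed_in_url(req, CLAIM_ID))
```

Only the URL is checked here. Whether a body or header value ends up in logs depends on how the server and any intermediaries are configured.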

In all cases access to the claim document is controlled via cookies, so the claim id merely provides clarification as to which claim id they want rather than access control (though obviously we check whether the claim id is part of that member's policy to prevent someone retrieving someone else's claim).

Note the URLs may (or may not) represent the actual path to the document - it could be mapped to a real file, or it could be dynamically generated from the database at time of access, where there is no path or document. This is also a slightly simplified example as in reality we provide the claim date and other information as other params to reduce the likelihood of guessing paths as additional protection to the cookie.

"Note that POST is sometimes used for information retrieval or discovery of an unknown resource"

These are not "unknown resources". I know exactly what list of resources I want to make available in a list to the user.

"However, this can also be achieved with GET by redirecting to a stored query (hiding the user-provided data within a new resource) instead of providing the resulting information directly."

I'm not sure how that works for my example? I can't use /document/claim because I have many claims per person. I could use /document/claim/1, /document/claim/2...etc. to obfuscate the real claim id number, but that just leads to messiness when claims are re-ordered, plus the fact you have 99 claims is something I'd consider sensitive information as it shows you are frequently ill. The sensitivity of these params is even more important when you consider the other params I have not shown in this simplified example.

Let me know if I'm misunderstanding your last sentence or if there's a better way for this example. And if it does cover this case, then the fact I (as the issue raiser) don't understand it, suggests it could be clearer.

royfielding · 2020-08-22T16:09:14Z

The spec doesn't prevent you from arranging that however you want. Neither does REST, for that matter.

Leaving user-provided data exposed in URLs is a concern, but most of those exposures are true of the original request regardless of method. (People log POST too, including the parameters, and the browser history contains the full query.) What we would try to avoid is exposure within the resulting content, which means redirecting to a different resource and/or changing the bookmarkable links so that they cannot contain that information. If that's not good enough, then use POST.
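One possible reading of "redirecting to a different resource" is sketched below with plain functions rather than a real web framework. Everything named here is an assumption made for illustration: the in-memory `_stored_queries` store, the `/queries/{token}` path, the `submit_query` and `fetch_stored_query` helpers, and the choice of a 303 status for the redirect.

```python
import secrets

# In-memory store standing in for a database table (assumption for this sketch).
_stored_queries: dict = {}


def submit_query(user_id: str, params: dict) -> tuple:
    """Handle the initial request that carries the user-provided parameters.

    The parameters are stored server-side and the client is redirected to an
    opaque URL that contains none of them.
    """
    token = secrets.token_urlsafe(16)  # random, unguessable identifier
    _stored_queries[token] = {"owner": user_id, "params": dict(params)}
    return 303, {"Location": f"/queries/{token}"}


def fetch_stored_query(user_id: str, token: str) -> tuple:
    """Handle GET on the redirect target.

    The stored parameters are returned only to the user who created the
    query; anyone else sees the same 404 as for an unknown token.
    """
    entry = _stored_queries.get(token)
    if entry is None or entry["owner"] != user_id:
        return 404, None
    return 200, entry["params"]
```

Links and bookmarks derived from the result then point at the opaque URL. As noted above, the original request still carries the parameters, so this only addresses exposure in what comes after it.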

I'll just assume that claim ID is sensitive because it uniquely identifies the user -- sending a date of birth would be a better example. The app knows the user and can construct a unique identifier even if the claim ID is universal: just hash it with a salt stored in the user account before sending the request. Likewise, there are plenty of data structures that can translate one index to another without suffering from reordering concerns. The remaining concern would be a user (or spider) manually entering the claim ID into a dialog or query form in which you can't run javascript, and there are ways to prevent users from doing that successfully.
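A minimal sketch of the "hash it with a salt stored in the user account" idea follows. It uses HMAC-SHA256 keyed with the per-user salt, which is one way to realise a salted hash and is a choice made for this example rather than something the comment specifies. The function names and the lookup by recomputing over the user's own claims are likewise hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def new_user_salt() -> bytes:
    """Generate a random per-user salt, to be stored in the user account."""
    return secrets.token_bytes(32)


def opaque_claim_ref(claim_id: str, user_salt: bytes) -> str:
    """Derive a per-user opaque reference for a claim id.

    The same claim id yields different references for different salts, so
    the reference is only meaningful together with the user account.
    """
    return hmac.new(user_salt, claim_id.encode("utf-8"), hashlib.sha256).hexdigest()


def resolve_claim_ref(ref: str, user_salt: bytes, user_claim_ids: list) -> Optional[str]:
    """Map an opaque reference back to one of this user's claim ids.

    Recomputes the reference for each claim the user owns and compares in
    constant time; returns None if the reference matches none of them.
    """
    for claim_id in user_claim_ids:
        if hmac.compare_digest(opaque_claim_ref(claim_id, user_salt), ref):
            return claim_id
    return None
```

The front end would then request something like `/document/claim/<ref>` instead of the raw claim id. Because resolution only ever searches the requesting user's own claims, it also doubles as the ownership check.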

There are plenty of other techniques that can be used to achieve the same purpose, which is why we don't make blanket claims about POST being better than a GET with parameters. But, yes, it is something to be considered when designing applications that use HTTP.

royfielding · 2020-08-25T17:05:39Z

After reading all three sections again, I think that it works best as an explanation in GET and a more nuanced warning in the security considerations.

tunetheweb · 2020-08-26T09:47:07Z

I think it's good to get that in there, thanks @royfielding

Though for completeness, perhaps it should also be mentioned under the POST section, maybe using @mnot's suggested wording or similar? Or alternatively included as another example use case at the beginning of the POST definition? But happy to leave it as is, if you think that's overkill.

mnot assigned royfielding Feb 2, 2020

mnot added the semantics label Feb 2, 2020

reschke self-assigned this Aug 20, 2020

mnot assigned mnot and unassigned reschke and royfielding Aug 20, 2020

mnot added this to the pre-reorg milestone Aug 20, 2020

mnot assigned royfielding and unassigned mnot Aug 24, 2020

royfielding mentioned this issue Aug 25, 2020

Explain why GET-based queries might transmit sensitive data, suggest … #441

Merged

royfielding added the has-proposal label Aug 25, 2020

royfielding closed this as completed in #441 Aug 27, 2020
