Clarification of GET versus POST for information retrieval #277
Have you read the discussion in #250? It led to the creation of httpwg/http-extensions#943 and httpwg/http-extensions#942, which both attempt to tackle this problem from different angles. |
I saw #202 but missed the discussion in the subsequent pull request that resolved it (#250), so thanks @asbjornu for pointing me at that. httpwg/http-extensions#942 and httpwg/http-extensions#943 both seem concerned with making requests with bodies cacheable, which is actually the opposite of the use cases I’m talking about. In my use case, the resources are fetched with private query params that we don’t want to put in the URL, so that they are not leaked; that also means (in most cases) that you wouldn’t want the resulting response cached either. This is why POST works fine for this use case, apart from the ideology behind it. POST has the added benefit of being a well-known and well-supported method, whereas introducing a new method like SEARCH will take time to filter through various tech stacks. Waiting makes sense when you want a new method definition (one that is explicitly cacheable, which POST is not) and so must wait until that method is supported in clients, but it seems less necessary when we have an existing method that works fine and you are just seeking clarification on whether this is a valid and acceptable use case. Of course, if this is a terrible use case for POST, then that should be clarified in the spec IMHO, with reasons why. Currently it seems open for debate. |
Perhaps just adding a simple sentence like this to the definition of POST would help:
|
How about
|
Not loving that phrasing to be honest. To give a real example, I work at a health insurance company and we make policies and claims available in an online "Member Area" portal. The front-end app knows the list of claim ids, for example, and there can be more than one per customer. These can be loaded in a number of ways:
The first two expose the claim id (which we consider sensitive data) to the URL, browser history, web server logs, etc. The last two do not. In all cases access to the claim document is controlled via cookies, so the claim id merely clarifies which claim they want rather than providing access control (though obviously we check whether the claim id is part of that member's policy to prevent someone retrieving someone else's claim). Note the URLs may (or may not) represent the actual path to the document: it could be mapped to a real file, or it could be dynamically generated from the database at time of access, in which case there is no path or document. This is also a slightly simplified example, as in reality we provide the claim date and other information as additional params to reduce the likelihood of guessing paths, as extra protection on top of the cookie.
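The difference between carrying the claim id in the URL and carrying it in the request body can be sketched as follows. This is only an illustration: the endpoint, the `claim_id` parameter name, and the id value are hypothetical, and the requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
BASE = "https://example.com/member-area/claim-document"


def build_get(claim_id: str) -> Request:
    # The identifier travels in the query string, so it is part of the URL
    # and can end up in browser history, bookmarks, and server logs.
    return Request(f"{BASE}?{urlencode({'claim_id': claim_id})}", method="GET")


def build_post(claim_id: str) -> Request:
    # The identifier travels in the request body; the URL stays generic.
    body = urlencode({"claim_id": claim_id}).encode("ascii")
    return Request(
        BASE,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


get_req = build_get("C-1234")
post_req = build_post("C-1234")

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

In both cases the session cookie would still be what authorizes access; the only thing that changes is where the identifier is visible.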
These are not "unknown resources". I know exactly what list of resources I want to make available in a list to the user.
I'm not sure how that works for my example? I can't use Let me know if I'm misunderstanding your last sentence or if there's a better way for this example. And if it does cover this case, then the fact I (as the issue raiser) don't understand it, suggests it could be clearer. |
The spec doesn't prevent you from arranging that however you want. Neither does REST, for that matter. Leaving user-provided data exposed in URLs is a concern, but most of those exposures are true of the original request regardless of method. (People log POST too, including the parameters, and the browser history contains the full query.) What we would try to avoid is exposure within the resulting content, which means redirecting to a different resource and/or changing the bookmarkable links so that they cannot contain that information. If that's not good enough, then use POST. I'll just assume that claim ID is sensitive because it uniquely identifies the user; sending a date of birth would be a better example. The app knows the user and can construct a unique identifier even if the claim ID is universal: just hash it with a salt stored in the user account before sending the request. Likewise, there are plenty of data structures that can translate one index to another without suffering from reordering concerns. The remaining concern would be a user (or spider) manually entering the claim ID into a dialog or query form in which you can't run JavaScript, and there are ways to prevent users from doing that successfully. There are plenty of other techniques that can be used to achieve the same purpose, which is why we don't make blanket claims about POST being better than a GET with parameters. But, yes, it is something to be considered when designing applications that use HTTP. |
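As a rough sketch of the "hash it with a salt stored in the user account" idea above: the comment does not name a construction, so the use of HMAC-SHA256 here is an assumption, as are the function names and the sample claim ids. The server resolves the opaque token by recomputing it for each claim the signed-in member owns.

```python
import hashlib
import hmac
import secrets
from typing import List, Optional


def opaque_claim_token(claim_id: str, user_salt: bytes) -> str:
    # Keyed hash of the claim id; the per-user salt acts as the key, so the
    # same claim id yields different tokens for different users.
    return hmac.new(user_salt, claim_id.encode("utf-8"), hashlib.sha256).hexdigest()


def resolve_claim(token: str, user_claim_ids: List[str], user_salt: bytes) -> Optional[str]:
    # Server side: recompute the token for each claim owned by the signed-in
    # member and compare in constant time. Returns None if nothing matches.
    for claim_id in user_claim_ids:
        if hmac.compare_digest(opaque_claim_token(claim_id, user_salt), token):
            return claim_id
    return None


# Hypothetical data: a salt stored with the user account and that user's claims.
salt = secrets.token_bytes(32)
claims = ["C-1234", "C-5678"]

token = opaque_claim_token("C-1234", salt)
print(token)                                # safe to place in a URL
print(resolve_claim(token, claims, salt))   # maps back to the real claim id
```

A token like this only hides the claim id from URLs and logs; access control would still rest on the session, as in the original example.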
After reading all three sections again, I think that it works best as an explanation in GET and a more nuanced warning in the security considerations. |
I think it's good to get that in there, thanks @royfielding. Though for completeness, perhaps it should also be mentioned under the POST section, maybe using @mnot's suggested wording or similar? Or alternatively, include it as another example use case at the beginning of the POST definition? But happy to leave it as is if you think that's overkill. |
I'm sure this has been discussed before, and I'm opening a can of worms, but here goes (if nothing else, it would be nice to have this as an issue that is easy to reference)...
The traditionally accepted use case for GET is information retrieval, and for POST it is uploading information to a server. This is reflected in the language of the definitions as linked in the previous sentences and does not seem to be materially changed in the new versions currently in draft.
However the sections on Disclosure of Personal Information and Disclosure of Sensitive Information in URIs recognise the privacy and security implications of putting sensitive information in the URL as Query Params and state that "Such services ought to use POST-based form submission instead." Again this language seems unchanged in the current drafts.
This seems out of sync with the definition of POST provided previously and leads to arguments as to whether information retrieval by sensitive params is an "acceptable" use of POST (particularly for those insisting on a rigid definition of REST). Note I'm not talking about authentication cookies here, but rather other params which clarify the resource required, as opposed to authenticating access to that resource. For example, "download document id 1234" is often an information retrieval request, which is supplemented by cookies to confirm access to that document. In this case the document id may be considered sensitive (e.g. if it's a policy number or a claim identifier), so it should not be sent as a URL param. That would rule out GET, unless you want to get into providing the params as HTTP headers instead, which just seems like a messy idea.
Of course you could use a GET with a body, but the RFC explicitly notes why this is a bad idea: "sending a payload body on a GET request might cause some existing implementations to reject the request." This has been strengthened in the latest draft to "A client SHOULD NOT generate a body in a GET request" in #202.
The POST definition doesn't explicitly forbid its use for information retrieval, but all the examples suggest data is being sent to the server for storage (perhaps not the first example, but certainly the others), and there are many examples of this being debated.
While this has always been an issue, it is becoming more important in this more privacy-aware world, with various legislation and security requirements, and I feel the specs could be more explicit and reflect what I imagine is a common use case for POST.
Therefore, should text and/or examples be added to clarify the position regarding retrieval via POST requests, and to resolve the apparent discrepancy with the Disclosure of Sensitive Information in URIs section?