Skip to content

CDP Cookies #777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

CDP Cookies #777

wants to merge 13 commits into from

Conversation

sjorsdonkers
Copy link
Contributor

@sjorsdonkers sjorsdonkers commented Jun 11, 2025

Network.setCookie
Network.setCookies
deprecated Network.getAllCookies
Network.deleteCookies
Network.getCookies
Network.clearBrowserCookies
deprecated Network.canClearBrowserCookies
deprecated Page.deleteCookie

Storage.getCookies
Storage.setCookies
Storage.clearCookies

NOTE:
We currently do not support all cookie parameters, this PR only sets and uses the existing ones.
Existing cookies behavior has been modified as follows:

  • Domains and paths are percent encoded in case they were not.
  • Domains are converted to lower case
  • HttpOnly cookies are no longer passed or accepted from JS (document.cookie)

Puppeteer test: lightpanda-io/demo#51

@sjorsdonkers sjorsdonkers changed the title [WIP] CDP Network.Cookies [WIP] CDP Cookies Jun 11, 2025
@sjorsdonkers sjorsdonkers force-pushed the cookies branch 2 times, most recently from a9b9b23 to 14c07ee Compare June 13, 2025 10:55
try w.write(cookie.value);

try w.objectField("domain");
try w.write(cookie.domain); // Should we hide a leading dot?
Copy link
Contributor Author

@sjorsdonkers sjorsdonkers Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably, should. I'm not sure if it matters.
To support it we probably need to store the domain with a flag whether it came from url or was set explicitly.

@sjorsdonkers sjorsdonkers changed the title [WIP] CDP Cookies CDP Cookies Jun 13, 2025
if (self.urls) |urls| {
for (self.cookies) |*cookie| {
for (urls) |*url| {
if (cookie.appliesTo(url, false, false)) { // TBD same_site, should we compare to the pages url?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When getting cookies we can only provide a list of urls, not other configuration (or get all cookies).
We probably should not filter out cookies that are not related to the current page, also because this is not part of the Page api.

Copy link
Contributor Author

@sjorsdonkers sjorsdonkers Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also that Network.getCookies will default to the Pages url if no urls are provided.
same_site and navigation are hard coded to true now

// ignore it and use the "default-path" algorithm
if (explicit_path) |path| {
if (path.len > 0 and path[0] == '/') {
return try arena.dupe(u8, path);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make sure it ends with a /?
Same as when we extract it from the URL
Else, should we take this difference into account when comparing cookie paths

@sjorsdonkers sjorsdonkers marked this pull request as ready for review June 13, 2025 15:38
Comment on lines +92 to +94
const c_no_dot = if (std.mem.startsWith(u8, cookie.domain, ".")) cookie.domain[1..] else cookie.domain;
const d_no_dot = if (std.mem.startsWith(u8, domain_, ".")) domain_[1..] else domain_;
if (!std.mem.eql(u8, c_no_dot, d_no_dot)) return false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the other domain . prefix comment.
Is this the right way to compare domains for the delete scenario?
What about for detection duplicate cookies?

}
}
bc.session.cookie_jar.removeExpired(null);
const writer = CookieWriter{ .cookies = bc.session.cookie_jar.cookies.items };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. Storge.getCookies has the unhelpful description of "Returns all browser cookies." I don't know what "all browser cookies" means, but this behavior is different than Chrome.

If you run:

await page.goto('https://www.google.com/');
console.log(await page.cookies());    // Network.getCookies
console.log(await browser.cookies()); // Storage.getCookies

Then in Chrome, this will return 2 cookies for the first one, and 0 for the second. But with this implementation, it returns 2 cookies for both. I wonder if "brower.cookies", aka Storage.getCookies is only those cookies set directly via Storage.setCookies? Either need to figure out how this is implemented in Chrome, or to play with more permutations to figure it...I tried to see if puppeteer had an explanation about the difference, but also nothing.

Copy link
Contributor Author

@sjorsdonkers sjorsdonkers Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

browser.cookies()

{"method":"Storage.getCookies","params":{},"id":29}
0 results

context.cookies()

{"method":"Storage.getCookies","params":{"browserContextId":"F84F263F9B18E886D63394D299BF5B59"},"id":30}
2 results

The problem is that when we call browser.createBrowserContext(); we set it as the default and only context.
Chrome however already has a default context, so this will create a 2nd one.

It you replace:
const context = await browser.createBrowserContext();
With:
const context = await browser.defaultBrowserContext();
(and not close that context!)
Storage.getCookies returns 2 results as expected.

This is an issue we need to solve, but not in this PR as the implementation of using the default context is correct. It's just that the default context is incorrect.
Issue created: #783

// NOTE: The param.url can affect the default domain, (NOT path), secure, source port, and source scheme.
const uri = if (param.url) |url| std.Uri.parse(url) catch return error.InvalidParams else null;
const uri_ptr = if (uri) |*u| u else null;
const domain = try Cookie.parseDomain(a, uri_ptr, param.domain);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't matter, since it's weird user-input, but if the domain value is 'htttps://lightpanda.io' (i.e. not a domain, which is why I think we can ignore it), this handles it differently than Chrome. Chrome seems like it'll extract the domain correctly, whereas we'll treat the entire input as a domain as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not able to reproduce this one, for me Chrome sets http as the domain, which I think we can consider equally useless. (regardless of how many ts)
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants