Add support for multiple sitemaps #1174

Closed
jsha opened this issue Nov 11, 2020 · 10 comments
Labels
A-backend Area: Webserver backend C-enhancement Category: This is a new feature

Comments

@jsha
Contributor

jsha commented Nov 11, 2020

Per https://support.google.com/webmasters/answer/183668?hl=en&ref_topic=4581190,

All formats limit a single sitemap to 50MB (uncompressed) and 50,000 URLs.

The current sitemap at https://docs.rs/sitemap.xml is 5.3MB (fine) and 41,000 URLs. Soon the size of the Rust ecosystem will grow to the point that additional sitemaps are needed.
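One way to stay under the URL limit is to split the list into several sitemap files and publish a sitemap index that points at them. The following is only a rough sketch of that idea, not how docs.rs implements it: the function names and the `sitemap-{n}.xml` naming scheme are hypothetical, URLs are assumed to be XML-escaped already, and only the 50,000-URL limit is enforced (the 50MB size limit is not checked).

```rust
/// Per-file URL limit quoted from the Google documentation above.
const MAX_URLS_PER_SITEMAP: usize = 50_000;

/// Split the full URL list into groups that each fit in one sitemap file.
fn split_into_sitemaps(urls: &[String]) -> Vec<&[String]> {
    urls.chunks(MAX_URLS_PER_SITEMAP).collect()
}

/// Render a single sitemap file for one group of URLs.
/// Assumption: every URL is already XML-escaped.
fn render_sitemap(urls: &[String]) -> String {
    let mut xml = String::from(
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
         <urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n",
    );
    for url in urls {
        xml.push_str(&format!("  <url><loc>{}</loc></url>\n", url));
    }
    xml.push_str("</urlset>\n");
    xml
}

/// Render the sitemap index that lists each child sitemap.
/// The `{base}/sitemap-{n}.xml` layout is a made-up naming scheme.
fn render_sitemap_index(base: &str, count: usize) -> String {
    let mut xml = String::from(
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
         <sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n",
    );
    for n in 0..count {
        xml.push_str(&format!(
            "  <sitemap><loc>{}/sitemap-{}.xml</loc></sitemap>\n",
            base, n
        ));
    }
    xml.push_str("</sitemapindex>\n");
    xml
}
```

With this sketch, 120,000 URLs would be split into three files (50,000, 50,000 and 20,000 entries), and the index would list `sitemap-0.xml` through `sitemap-2.xml`.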

@jyn514 jyn514 added A-backend Area: Webserver backend C-enhancement Category: This is a new feature labels Nov 11, 2020
@jsha
Contributor Author

jsha commented Nov 23, 2020

I saw an announcement today that crates.io had reached 50,000 crates: https://twitter.com/rustlang/status/1331020962628767748. That made me think of this issue again. The sitemap is still only at 41,617 URLs, though. Purely out of curiosity, any idea why the discrepancy?

@jyn514
Member

jyn514 commented Nov 24, 2020

Hmm, that sounds like the sitemap is buggy:

cratesfyi=> select count(*) from crates;
 count 
-------
 50938

@njasm

njasm commented Nov 24, 2020

Out of those 50k, how many of them have their latest version yanked?

(I would expect crates whose latest published version is yanked not to be in the sitemap.)

@jyn514
Member

jyn514 commented Nov 24, 2020

Rather than guessing, you can just read the source ;)

"SELECT DISTINCT ON (crates.name)
crates.name,
releases.release_time
FROM crates
INNER JOIN releases ON releases.crate_id = crates.id
WHERE rustdoc_status = true",

The difference is that docs.rs doesn't include crates that failed to build in the sitemap.

@njasm

njasm commented Nov 24, 2020

sweet! :)

@syphar
Member

syphar commented Dec 26, 2020

I created #1222 as a potential solution, open to discussion of course.

@camelid
Member

camelid commented Dec 28, 2020

Should this be closed since #1222 has been merged?

@jyn514
Member

jyn514 commented Dec 28, 2020

Let's wait until #1222 is deployed.

@Nemo157
Member

Nemo157 commented Dec 28, 2020

Is there some way to verify Google's view of the sitemaps after deployment?

@syphar
Member

syphar commented Dec 28, 2020

In Google Webmaster Tools, yes.

You can also ping them to load the sitemap directly and see the result, if I remember correctly.

@jyn514 jyn514 closed this as completed Dec 28, 2020