docs: Add guide for running crawler in web server #1174
base: master
Conversation
Pull Request Overview
This PR adds a guide for running the crawler in a web server by including new FastAPI server and crawler code examples along with configuration updates.
- Updated pyproject.toml to include new file paths and disable specific error codes for the web server examples.
- Added a FastAPI server example (server.py) to illustrate how to run the crawler from a web endpoint.
- Introduced an asynchronous crawler implementation (crawler.py) with lifecycle management using an async context manager, roughly as sketched below.
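For orientation, the pattern these bullets describe amounts to wiring a crawlee crawler into FastAPI's lifespan context manager. The following is a minimal sketch of that idea, not the PR's actual crawler.py; the `State` and `lifespan` names and the in-memory `results` dict are illustrative assumptions.

```python
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from typing import TypedDict

from fastapi import FastAPI

from crawlee.crawlers import ParselCrawler, ParselCrawlingContext


class State(TypedDict):
    """Objects shared with the endpoints through the FastAPI lifespan state."""

    crawler: ParselCrawler
    results: dict[str, str]


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[State]:
    # Build the crawler once at application startup and reuse it for all requests.
    results: dict[str, str] = {}
    crawler = ParselCrawler()

    @crawler.router.default_handler
    async def default_handler(context: ParselCrawlingContext) -> None:
        # Keep the scraped <title> in memory, keyed by the requested URL.
        results[context.request.url] = context.selector.css('title::text').get() or ''

    yield State(crawler=crawler, results=results)
```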
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pyproject.toml | Updated configuration to include new file mappings for docs examples and added mypy overrides. |
| docs/guides/code_examples/running_in_web_server/server.py | Introduces a FastAPI server with endpoints for running and interacting with a crawler. |
| docs/guides/code_examples/running_in_web_server/crawler.py | Adds an asynchronous crawler setup with a default request handler and lifecycle management. |
Files not reviewed (1)
- docs/guides/running_in_web_server.mdx: Language not supported
LGTM
- `/` - The index just gives a short description of the server with an example link to the second endpoint.
- `/scrape` - The endpoint that receives a `url` parameter and returns the page title scraped from that URL.

To run the example server, make sure that you have [fastapi[standard]](https://fastapi.tiangolo.com/#installation) installed, then use the command `fastapi dev server.py` from the directory where the example code is located.
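To illustrate the two endpoints described above, a server.py along these lines would work, assuming the `lifespan`/`State` wiring from the earlier sketch lives in a module named `crawler`; this is a hedged sketch, not the exact code added by the PR.

```python
from fastapi import FastAPI, Request

from crawler import lifespan  # the hypothetical module sketched earlier

# Run with `fastapi dev server.py` (requires the fastapi[standard] extra).
app = FastAPI(lifespan=lifespan)


@app.get('/')
def index() -> dict[str, str]:
    # Short description of the server plus an example link to the scrape endpoint.
    return {'hello': 'Call /scrape?url=https://example.com to get a page title.'}


@app.get('/scrape')
async def scrape(request: Request, url: str) -> dict[str, str]:
    # Run the shared crawler on the requested URL and return the stored title.
    await request.state.crawler.run([url])
    return {'url': url, 'title': request.state.results.get(url, '')}
```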
could we have a separate triple-backticks (```) command here for executing the server?
Ok. How is it different in this case? It seems to be rendered in the same way.
It's different - right now the command is "lost" within the paragraph. When I rendered the website locally, I couldn't find it for a while. Since it's an important command, having it separated in triple backticks would make it more visible - especially when you're rushing through the docs, copying the example, and looking for a command to try it out.
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext


class State(TypedDict):
Not really important, but in FastAPI, you usually use dependencies for this kind of business.
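For context, the dependency-based alternative being suggested might look roughly like the following; `get_crawler` is a hypothetical helper, not code from the PR, and it assumes the crawler is exposed on the lifespan state as in the sketches above.

```python
from typing import Annotated

from fastapi import Depends, FastAPI, Request

from crawlee.crawlers import ParselCrawler

from crawler import lifespan  # hypothetical module from the earlier sketch

app = FastAPI(lifespan=lifespan)


def get_crawler(request: Request) -> ParselCrawler:
    # Resolve the shared crawler from the lifespan state in one place,
    # instead of reaching into request.state inside every endpoint.
    return request.state.crawler


@app.get('/scrape')
async def scrape(url: str, crawler: Annotated[ParselCrawler, Depends(get_crawler)]) -> dict[str, str]:
    # The endpoint receives the crawler as an injected dependency.
    await crawler.run([url])
    return {'url': url}
```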
LGTM
Description
Add guide for running crawler in web server
Issues