Feat add searchscraper #7

Merged
merged 3 commits into from
Feb 3, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -41,7 +41,7 @@ The documentation will be available at `http://localhost:3000`.
├── introduction.mdx # Main introduction page
├── services/ # Core services documentation
│ ├── smartscraper.mdx # SmartScraper service
-│ ├── localscraper.mdx # LocalScraper service
+│ ├── searchscraper.mdx # SearchScraper service
│ ├── markdownify.mdx # Markdownify service
│ └── extensions/ # Browser extensions
│ └── firefox.mdx # Firefox extension
13 changes: 0 additions & 13 deletions api-reference/endpoint/localscraper/get-status.mdx

This file was deleted.

48 changes: 0 additions & 48 deletions api-reference/endpoint/localscraper/start.mdx

This file was deleted.

93 changes: 93 additions & 0 deletions api-reference/endpoint/searchscraper/get-status.mdx
@@ -0,0 +1,93 @@
---
title: 'Get SearchScraper Status'
api: 'GET /v1/searchscraper/{request_id}'
description: 'Get the status and results of a previous search request'
---

## Path Parameters

<ParamField path="request_id" type="string" required>
The unique identifier of the search request to retrieve.

Example: "123e4567-e89b-12d3-a456-426614174000"
</ParamField>

## Response

<ResponseField name="request_id" type="string">
The unique identifier of the search request.
</ResponseField>

<ResponseField name="status" type="string">
Status of the request. One of: "queued", "processing", "completed", "failed"
</ResponseField>

<ResponseField name="user_prompt" type="string">
The original search query that was submitted.
</ResponseField>

<ResponseField name="result" type="object">
The search results. If an output_schema was provided in the original request, this will be structured according to that schema.
</ResponseField>

<ResponseField name="reference_urls" type="array">
List of URLs that were used as references for the answer.
</ResponseField>

<ResponseField name="error" type="string">
Error message if the request failed. Empty string if successful.
</ResponseField>

## Example Request

```bash
curl 'https://api.scrapegraphai.com/v1/searchscraper/123e4567-e89b-12d3-a456-426614174000' \
-H 'SGAI-APIKEY: YOUR_API_KEY'
```

## Example Response

```json
{
  "request_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "user_prompt": "What is the latest version of Python and what are its main features?",
  "result": {
    "version": "3.12",
    "release_date": "October 2, 2023",
    "major_features": [
      "Improved error messages",
      "Per-interpreter GIL",
      "Support for the Linux perf profiler",
      "Faster startup time"
    ]
  },
  "reference_urls": [
    "https://www.python.org/downloads/",
    "https://docs.python.org/3.12/whatsnew/3.12.html"
  ],
  "error": ""
}
```
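
Because `status` can be "queued" or "processing" before it reaches "completed" or "failed", a client will usually poll this endpoint. The following is a minimal Python sketch of such a loop, not an official client: the function names, the 2-second interval, and the 30-attempt cap are all assumptions chosen for illustration, and only the URL, the `SGAI-APIKEY` header, and the status values come from this page.

```python
import json
import time
import urllib.request

API_BASE = "https://api.scrapegraphai.com/v1"  # base URL taken from the example request


def fetch_status(request_id, api_key):
    """GET /v1/searchscraper/{request_id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/searchscraper/{request_id}",
        headers={"SGAI-APIKEY": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(request_id, api_key, fetch=fetch_status,
                    interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the status is terminal ("completed" or "failed").

    interval and max_attempts are illustrative defaults, not documented limits.
    fetch and sleep are injectable so the loop can be exercised without a network.
    """
    for _ in range(max_attempts):
        body = fetch(request_id, api_key)
        if body["status"] in ("completed", "failed"):
            return body
        sleep(interval)
    raise TimeoutError(f"request {request_id} still pending after {max_attempts} polls")
```

A caller would typically inspect `body["error"]` when the returned status is "failed" and read `body["result"]` otherwise.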

## Error Responses

<ResponseField name="400" type="object">
Returned when the request_id is not a valid UUID.

```json
{
  "error": "request_id must be a valid UUID"
}
```
</ResponseField>

<ResponseField name="404" type="object">
Returned when the request_id is not found.

```json
{
  "error": "Request not found"
}
```
</ResponseField>
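
Since a malformed `request_id` yields a 400, a client can check the ID locally before sending the request. The sketch below uses Python's standard `uuid` module and accepts only the canonical hyphenated form shown in the examples; whether the server accepts other UUID spellings is not stated here, so that strictness is an assumption. The `explain_error` helper is hypothetical and simply maps the two documented status codes to readable messages.

```python
import uuid


def is_valid_request_id(value):
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and unhyphenated hex;
    # comparing against the canonical string rejects those variants.
    return str(parsed) == value.lower()


def explain_error(status_code, body):
    """Turn a documented error response into a short message (hypothetical helper)."""
    message = body.get("error", "")
    if status_code == 400:
        return f"bad request: {message}"
    if status_code == 404:
        return f"not found: {message}"
    return f"unexpected status {status_code}: {message}"
```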
111 changes: 111 additions & 0 deletions api-reference/endpoint/searchscraper/start.mdx
@@ -0,0 +1,111 @@
---
title: 'Start SearchScraper'
api: 'POST /v1/searchscraper'
description: 'Start a new AI-powered web search request'
---

## Request Body

<ParamField body="user_prompt" type="string" required>
The search query or question you want to ask. This should be a clear and specific prompt that will guide the AI in finding and extracting relevant information.

Example: "What is the latest version of Python and what are its main features?"
</ParamField>

<ParamField body="headers" type="object">
Optional headers to customize the search behavior. This can include user agent, cookies, or other HTTP headers.

Example:
```json
{
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "Cookie": "cookie1=value1; cookie2=value2"
}
```
</ParamField>

<ParamField body="output_schema" type="object">
Optional schema to structure the output. If provided, the AI will attempt to format the results according to this schema.

Example:
```json
{
  "properties": {
    "version": {"type": "string"},
    "release_date": {"type": "string"},
    "major_features": {"type": "array", "items": {"type": "string"}}
  },
  "required": ["version", "release_date", "major_features"]
}
```
</ParamField>
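
The description says the AI will *attempt* to follow `output_schema`, so a cautious client may want to check the returned `result` against the schema it sent. The following Python sketch compares key names only; `check_result` is a hypothetical helper, not part of any SDK, and it does not validate value types.

```python
def check_result(result, output_schema):
    """Return (missing, unexpected) key lists for a result object.

    missing: keys listed under "required" that are absent from result.
    unexpected: keys in result that are not declared under "properties".
    """
    properties = output_schema.get("properties", {})
    required = output_schema.get("required", [])
    missing = [key for key in required if key not in result]
    unexpected = [key for key in result if key not in properties]
    return missing, unexpected
```

Two empty lists mean the result has exactly the keys the schema declares as required and nothing undeclared.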

## Response

<ResponseField name="request_id" type="string">
Unique identifier for the search request. Use this ID to check the status and retrieve results.
</ResponseField>

<ResponseField name="status" type="string">
Status of the request. One of: "queued", "processing", "completed", "failed"
</ResponseField>

<ResponseField name="user_prompt" type="string">
The original search query that was submitted.
</ResponseField>

<ResponseField name="result" type="object">
The search results. If an output_schema was provided, this will be structured according to that schema.
</ResponseField>

<ResponseField name="reference_urls" type="array">
List of URLs that were used as references for the answer.
</ResponseField>

<ResponseField name="error" type="string">
Error message if the request failed. Empty string if successful.
</ResponseField>

## Example Request

```bash
curl -X POST 'https://api.scrapegraphai.com/v1/searchscraper' \
  -H 'SGAI-APIKEY: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_prompt": "What is the latest version of Python and what are its main features?",
    "output_schema": {
      "properties": {
        "version": {"type": "string"},
        "release_date": {"type": "string"},
        "major_features": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["version", "release_date", "major_features"]
    }
  }'
```
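
For readers who prefer Python to curl, the same request can be assembled with the standard library. This is a sketch under the assumption that the endpoint behaves as the curl example shows; `build_start_request` is an illustrative name, and the function only constructs the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request

START_URL = "https://api.scrapegraphai.com/v1/searchscraper"  # from the curl example


def build_start_request(api_key, user_prompt, output_schema=None):
    """Build (but do not send) the POST request for a new search."""
    payload = {"user_prompt": user_prompt}
    if output_schema is not None:
        payload["output_schema"] = output_schema  # optional, per the Request Body section
    return urllib.request.Request(
        START_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `with urllib.request.urlopen(req) as resp: body = json.load(resp)`, after which `body["request_id"]` can be passed to the status endpoint.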

## Example Response

```json
{
  "request_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "user_prompt": "What is the latest version of Python and what are its main features?",
  "result": {
    "version": "3.12",
    "release_date": "October 2, 2023",
    "major_features": [
      "Improved error messages",
      "Per-interpreter GIL",
      "Support for the Linux perf profiler",
      "Faster startup time"
    ]
  },
  "reference_urls": [
    "https://www.python.org/downloads/",
    "https://docs.python.org/3.12/whatsnew/3.12.html"
  ],
  "error": ""
}
```
6 changes: 3 additions & 3 deletions api-reference/endpoint/user/get-credits.mdx
@@ -5,9 +5,9 @@ description: 'Get the remaining credits and total credits used for your account.
---

This endpoint allows you to check your account's credit balance and usage. Each API request consumes a different number of credits:
-- Markdownify: 2 credits per webpage
-- SmartScraper: 5 credits per webpage
-- LocalScraper: 10 credits per webpage
+- Markdownify: 2 credits per request
+- SmartScraper: 10 credits per request
+- SearchScraper: 30 credits per request
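
As a worked example of the per-request pricing (2, 10, and 30 credits), the short Python sketch below estimates the cost of a batch of calls. The dictionary keys and the function name are illustrative assumptions; only the three credit figures come from the list above.

```python
# Credits per request, as listed for Markdownify, SmartScraper, and SearchScraper.
CREDITS_PER_REQUEST = {"markdownify": 2, "smartscraper": 10, "searchscraper": 30}


def estimate_credits(request_counts):
    """Sum the credits for a mapping of service name to number of requests."""
    return sum(CREDITS_PER_REQUEST[service] * count
               for service, count in request_counts.items())


# 5 Markdownify + 3 SmartScraper + 2 SearchScraper = 10 + 30 + 60 credits
total = estimate_credits({"markdownify": 5, "smartscraper": 3, "searchscraper": 2})
```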

The response shows:
- `remaining_credits`: Number of credits available for use
2 changes: 1 addition & 1 deletion api-reference/endpoint/user/submit-feedback.mdx
@@ -4,7 +4,7 @@ openapi: 'POST /v1/feedback'
description: 'Submit feedback for a specific request with rating and optional comments.'
---

-This endpoint allows you to submit feedback for any request you've made using our services (SmartScraper, LocalScraper, or Markdownify). Your feedback helps us improve our services.
+This endpoint allows you to submit feedback for any request you've made using our services (SmartScraper, SearchScraper, or Markdownify). Your feedback helps us improve our services.

### Rating System
- Rating scale: 0-5 stars
2 changes: 1 addition & 1 deletion api-reference/errors.mdx
@@ -43,7 +43,7 @@ Indicates that the request was malformed or invalid.
"error": "Invalid HTML content"
}
```
-Applies to LocalScraper when the provided HTML is invalid.
+Applies to SmartScraper when the provided HTML is invalid.
</Accordion>
</AccordionGroup>

6 changes: 3 additions & 3 deletions api-reference/introduction.mdx
@@ -5,7 +5,7 @@ description: 'Complete reference for the ScrapeGraphAI REST API'

## Overview

-The ScrapeGraphAI API provides powerful endpoints for AI-powered web scraping and content extraction. Our RESTful API allows you to extract structured data from any website, process local HTML content, and convert web pages to clean markdown.
+The ScrapeGraphAI API provides powerful endpoints for AI-powered web scraping and content extraction. Our RESTful API allows you to extract structured data from any website, perform AI-powered web searches, and convert web pages to clean markdown.

## Authentication

@@ -31,8 +31,8 @@ https://api.scrapegraphai.com/v1
<Card title="SmartScraper" icon="robot" href="/api-reference/endpoint/smartscraper/start">
Extract structured data from any website using AI
</Card>
-<Card title="LocalScraper" icon="file-code" href="/api-reference/endpoint/localscraper/start">
-Process local HTML content with AI extraction
+<Card title="SearchScraper" icon="magnifying-glass" href="/api-reference/endpoint/searchscraper/start">
+Perform AI-powered web searches with structured results
</Card>
<Card title="Markdownify" icon="markdown" href="/api-reference/endpoint/markdownify/start">
Convert web content to clean markdown