diff --git a/services/markdownify.mdx b/services/markdownify.mdx
index ccd9851..7c63272 100644
--- a/services/markdownify.mdx
+++ b/services/markdownify.mdx
@@ -153,7 +153,14 @@ try {
```
```bash cURL
-// TODO
+curl -X 'POST' \
+ 'https://api.scrapegraphai.com/v1/markdownify' \
+ -H 'accept: application/json' \
+ -H 'SGAI-APIKEY: sgai-********************' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "website_url": "https://example.com"
+}'
```
@@ -168,7 +175,7 @@ try {
### Async Support
-For applications requiring asynchronous execution, Markdownify provides async support through the `AsyncClient`:
+For applications requiring asynchronous execution, Markdownify provides async support through the `AsyncClient`. Here's a basic example:
```python
from scrapegraph_py import AsyncClient
@@ -185,6 +192,51 @@ async def main():
asyncio.run(main())
```
+For more advanced use cases, such as processing multiple URLs concurrently, you can combine the `AsyncClient` with `asyncio.gather`:
+
+```python
+import asyncio
+from scrapegraph_py import AsyncClient
+from scrapegraph_py.logger import sgai_logger
+
+sgai_logger.set_logging(level="INFO")
+
+async def main():
+ # Initialize async client
+ sgai_client = AsyncClient(api_key="your-api-key-here")
+
+ # Concurrent markdownify requests
+ urls = [
+ "https://scrapegraphai.com/",
+ "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
+ ]
+
+ tasks = [sgai_client.markdownify(website_url=url) for url in urls]
+
+ # Execute requests concurrently
+ responses = await asyncio.gather(*tasks, return_exceptions=True)
+
+ # Process results
+ for i, response in enumerate(responses):
+ if isinstance(response, Exception):
+ print(f"\nError for {urls[i]}: {response}")
+ else:
+ print(f"\nPage {i+1} Markdown:")
+ print(f"URL: {urls[i]}")
+ print(f"Result: {response['result']}")
+
+ await sgai_client.close()
+
+if __name__ == "__main__":
+ asyncio.run(main())
+```
+
+This advanced example demonstrates:
+- Concurrent processing of multiple URLs
+- Error handling for failed requests
+- Proper client cleanup
+- Logging configuration
+
## Integration Options
### Official SDKs
diff --git a/services/searchscraper.mdx b/services/searchscraper.mdx
index a6e871d..459c9fc 100644
--- a/services/searchscraper.mdx
+++ b/services/searchscraper.mdx
@@ -218,7 +218,14 @@ try {
```
```bash cURL
-// TODO
+curl -X 'POST' \
+ 'https://api.scrapegraphai.com/v1/searchscraper' \
+ -H 'accept: application/json' \
+ -H 'SGAI-APIKEY: sgai-********************' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "user_prompt": "Search for information"
+}'
```
@@ -291,31 +298,238 @@ try {
+### Advanced Schema Usage
+
+The schema system in SearchScraper is a powerful way to ensure you get exactly the data structure you need. Here are some advanced techniques for using schemas effectively:
+
+#### Nested Schemas
+
+You can create complex nested structures to capture hierarchical data:
+
+```python Python
+from pydantic import BaseModel, Field
+from typing import List, Optional
+
+class Author(BaseModel):
+ name: str = Field(description="Author's full name")
+ bio: Optional[str] = Field(description="Author's biography")
+ expertise: List[str] = Field(description="Areas of expertise")
+
+class Article(BaseModel):
+ title: str = Field(description="Article title")
+ content: str = Field(description="Main article content")
+ author: Author = Field(description="Article author information")
+ publication_date: str = Field(description="Date of publication")
+ tags: List[str] = Field(description="Article tags or categories")
+
+response = client.searchscraper(
+ user_prompt="Find the latest AI research articles",
+ output_schema=Article
+)
+```
+
+```typescript JavaScript
+import { z } from 'zod';
+
+const Author = z.object({
+ name: z.string().describe("Author's full name"),
+ bio: z.string().optional().describe("Author's biography"),
+ expertise: z.array(z.string()).describe("Areas of expertise")
+});
+
+const Article = z.object({
+ title: z.string().describe("Article title"),
+ content: z.string().describe("Main article content"),
+ author: Author.describe("Article author information"),
+ publicationDate: z.string().describe("Date of publication"),
+ tags: z.array(z.string()).describe("Article tags or categories")
+});
+
+const response = await searchScraper(apiKey, prompt, Article);
+```
+
+#### Schema Validation Rules
+
+Enhance data quality by adding validation rules to your schema:
+
+```python Python
+from pydantic import BaseModel, Field, validator
+from typing import List
+from datetime import datetime
+
+class ProductInfo(BaseModel):
+ name: str = Field(description="Product name")
+ price: float = Field(description="Product price", gt=0)
+ currency: str = Field(description="Currency code", max_length=3)
+ release_date: str = Field(description="Product release date")
+
+ @validator('currency')
+ def validate_currency(cls, v):
+ if len(v) != 3 or not v.isupper():
+ raise ValueError('Currency must be a 3-letter uppercase code')
+ return v
+
+ @validator('release_date')
+ def validate_date(cls, v):
+ try:
+ datetime.strptime(v, '%Y-%m-%d')
+ return v
+ except ValueError:
+ raise ValueError('Date must be in YYYY-MM-DD format')
+```
+
+```typescript JavaScript
+import { z } from 'zod';
+
+const ProductInfo = z.object({
+ name: z.string().min(1).describe("Product name"),
+ price: z.number().positive().describe("Product price"),
+  currency: z.string().regex(/^[A-Z]{3}$/)
+    .describe("Currency code (3-letter uppercase)"),
+ releaseDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/)
+ .describe("Product release date")
+});
+```
+
+### Quality Improvement Tips
+
+To get the highest quality results from SearchScraper, follow these best practices:
+
+#### 1. Detailed Field Descriptions
+
+Always provide clear, detailed descriptions for each field in your schema:
+
+```python
+class CompanyInfo(BaseModel):
+ revenue: str = Field(
+ description="Annual revenue in USD, including the year of reporting"
+ # Good: "Annual revenue in USD, including the year of reporting"
+ # Bad: "Revenue"
+ )
+ market_position: str = Field(
+ description="Company's market position including market share percentage and rank among competitors"
+ # Good: "Company's market position including market share percentage and rank among competitors"
+ # Bad: "Position"
+ )
+```
+
+#### 2. Structured Prompts
+
+Combine schemas with well-structured prompts for better results:
+
+```python
+response = client.searchscraper(
+ user_prompt="""
+ Find information about Tesla's electric vehicles with specific focus on:
+ - Latest Model 3 and Model Y specifications
+ - Current pricing structure
+ - Available customization options
+ - Delivery timeframes
+ Please include only verified information from official sources.
+ """,
+ output_schema=TeslaVehicleInfo
+)
+```
+
+#### 3. Data Validation
+
+Implement comprehensive validation to ensure data quality:
+
+```python
+from pydantic import BaseModel, Field, validator
+from typing import List, Optional
+from datetime import datetime
+
+class MarketData(BaseModel):
+ timestamp: str = Field(description="Data timestamp in ISO format")
+ value: float = Field(description="Market value")
+ confidence_score: float = Field(description="Confidence score between 0 and 1")
+
+ @validator('timestamp')
+ def validate_timestamp(cls, v):
+ try:
+ datetime.fromisoformat(v)
+ return v
+ except ValueError:
+ raise ValueError('Invalid ISO timestamp format')
+
+ @validator('confidence_score')
+ def validate_confidence(cls, v):
+ if not 0 <= v <= 1:
+ raise ValueError('Confidence score must be between 0 and 1')
+ return v
+```
+
+#### 4. Error Handling
+
+Implement robust error handling for schema validation:
+
+```python
+from pydantic import ValidationError
+
+try:
+ response = client.searchscraper(
+ user_prompt="Find market data for NASDAQ:AAPL",
+ output_schema=MarketData
+ )
+ validated_data = MarketData(**response.result)
+except ValidationError as e:
+ print(f"Data validation failed: {e.json()}")
+ # Implement fallback logic or error reporting
+except Exception as e:
+ print(f"An error occurred: {str(e)}")
+```
+
### Async Support
-For applications requiring asynchronous execution:
+For applications requiring asynchronous execution, the async SearchScraper client can run multiple search queries concurrently:
```python
-from scrapegraph_py import AsyncClient
import asyncio
+from scrapegraph_py import AsyncClient
+from scrapegraph_py.logger import sgai_logger
+
+sgai_logger.set_logging(level="INFO")
async def main():
- async with AsyncClient(api_key="your-api-key") as client:
-
- response = await client.searchscraper(
- user_prompt="Analyze the current AI chip market",
- )
-
- # Process the structured results
- market_data = response.result
- print(f"Market Size: {market_data['market_overview']['total_size']}")
- print(f"Growth Rate: {market_data['market_overview']['growth_rate']}")
- print("\nKey Players:")
- for player in market_data['market_overview']['key_players']:
- print(f"- {player}")
-
-# Run the async function
-asyncio.run(main())
+ # Initialize async client
+ sgai_client = AsyncClient(api_key="your-api-key-here")
+
+ # List of search queries
+ queries = [
+ "What is the latest version of Python and what are its main features?",
+ "What are the key differences between Python 2 and Python 3?",
+ "What is Python's GIL and how does it work?",
+ ]
+
+ # Create tasks for concurrent execution
+ tasks = [sgai_client.searchscraper(user_prompt=query) for query in queries]
+
+ # Execute requests concurrently
+ responses = await asyncio.gather(*tasks, return_exceptions=True)
+
+ # Process results
+ for i, response in enumerate(responses):
+ if isinstance(response, Exception):
+ print(f"\nError for query {i+1}: {response}")
+ else:
+ print(f"\nSearch {i+1}:")
+ print(f"Query: {queries[i]}")
+ print(f"Result: {response['result']}")
+ print("Reference URLs:")
+ for url in response["reference_urls"]:
+ print(f"- {url}")
+
+ await sgai_client.close()
+
+if __name__ == "__main__":
+ asyncio.run(main())
```
## Integration Options
diff --git a/services/smartscraper.mdx b/services/smartscraper.mdx
index 2e597a4..b9d9971 100644
--- a/services/smartscraper.mdx
+++ b/services/smartscraper.mdx
@@ -270,24 +270,160 @@ try {
### Async Support
-For applications requiring asynchronous execution, SmartScraper provides async support through the `AsyncClient`:
+For applications requiring asynchronous execution, SmartScraper provides comprehensive async support through the `AsyncClient`:
-```python
-from scrapegraph_py import AsyncClient
+
+```python Python
import asyncio
+from scrapegraph_py import AsyncClient
+from pydantic import BaseModel, Field
+
+# Define your schema
+class WebpageSchema(BaseModel):
+ title: str = Field(description="The title of the webpage")
+ description: str = Field(description="The description of the webpage")
+ summary: str = Field(description="A brief summary of the webpage")
async def main():
+ # Initialize the async client
async with AsyncClient(api_key="your-api-key") as client:
- response = await client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract the main content"
- )
- print(response)
+ # List of URLs to analyze
+ urls = [
+ "https://scrapegraphai.com/",
+ "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
+ ]
+
+ # Create scraping tasks for each URL
+ tasks = [
+ client.smartscraper(
+ website_url=url,
+ user_prompt="Summarize the main content",
+ output_schema=WebpageSchema
+ )
+ for url in urls
+ ]
+
+ # Execute requests concurrently
+ responses = await asyncio.gather(*tasks, return_exceptions=True)
+
+ # Process results
+ for i, response in enumerate(responses):
+ if isinstance(response, Exception):
+ print(f"Error for {urls[i]}: {response}")
+ else:
+ print(f"Result for {urls[i]}: {response['result']}")
# Run the async function
-asyncio.run(main())
+if __name__ == "__main__":
+ asyncio.run(main())
```
+```javascript JavaScript
+import { smartScraper } from 'scrapegraph-js';
+import { z } from 'zod';
+
+// Define schema using Zod
+const WebpageSchema = z.object({
+  title: z.string().describe('The title of the webpage'),
+  description: z.string().describe('The description of the webpage'),
+  summary: z.string().describe('A brief summary of the webpage')
+});
+
+async function scrapeMultiplePages() {
+  const apiKey = 'your-api-key';
+
+  const urls = [
+    'https://scrapegraphai.com/',
+    'https://github.com/ScrapeGraphAI/Scrapegraph-ai'
+  ];
+
+  try {
+    // Execute requests concurrently
+    const results = await Promise.all(
+      urls.map(url =>
+        smartScraper(apiKey, url, 'Summarize the main content', WebpageSchema)
+      )
+    );
+
+    results.forEach((result, index) => {
+      console.log(`Results for ${urls[index]}:`, result);
+    });
+  } catch (error) {
+    console.error('Error during scraping:', error);
+  }
+}
+
+scrapeMultiplePages();
+```
+
+### SmartScraper Endpoint
+
+The SmartScraper endpoint is our core service for extracting structured data from any webpage using advanced language models. It automatically adapts to different website layouts and content types, enabling quick and reliable data extraction.
+
+#### Key Capabilities
+
+- **Universal Compatibility**: Works with any website structure, including JavaScript-rendered content
+- **Schema Validation**: Supports both Pydantic (Python) and Zod (JavaScript) schemas
+- **Concurrent Processing**: Efficient handling of multiple URLs through async support
+- **Custom Extraction**: Flexible user prompts for targeted data extraction
+
+#### Endpoint Details
+
+```bash
+POST https://api.scrapegraphai.com/v1/smartscraper
+```
+
+##### Required Headers
+| Header | Description |
+|--------|-------------|
+| SGAI-APIKEY | Your API authentication key |
+| Content-Type | application/json |
+
+##### Request Body
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| website_url | string | Yes* | URL to scrape (*either this or website_html required) |
+| website_html | string | No | Raw HTML content to process |
+| user_prompt | string | Yes | Instructions for data extraction |
+| output_schema | object | No | Pydantic or Zod schema for response validation |
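+
+The endpoint can be called from any HTTP client. Below is a minimal sketch using Python's `requests` library; the API key and target URL are placeholders to replace with your own values:
+
+```python
+import requests
+
+API_URL = "https://api.scrapegraphai.com/v1/smartscraper"
+
+headers = {
+    "SGAI-APIKEY": "sgai-********************",  # your API key
+    "Content-Type": "application/json",
+}
+
+payload = {
+    "website_url": "https://example.com",       # page to scrape
+    "user_prompt": "Extract the main content",  # extraction instructions
+}
+
+response = requests.post(API_URL, headers=headers, json=payload)
+response.raise_for_status()
+print(response.json())
+```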
+
+##### Response Format
+```json
+{
+ "request_id": "sg-req-abc123",
+ "status": "completed",
+ "website_url": "https://example.com",
+ "result": {
+ // Structured data based on schema or extraction prompt
+ },
+ "error": null
+}
+```
+
+#### Best Practices
+
+1. **Schema Definition**:
+ - Define schemas to ensure consistent data structure
+ - Use descriptive field names and types
+ - Include field descriptions for better extraction accuracy
+
+2. **Async Processing**:
+ - Use async clients for concurrent requests
+ - Implement proper error handling
+ - Monitor rate limits and implement backoff strategies
+
+3. **Error Handling**:
+ - Always wrap requests in try-catch blocks
+ - Check response status before processing
+   - Implement retry logic for failed requests (see the retry-with-backoff sketch below)
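+
+Putting the async-processing and error-handling practices together, here is a minimal retry-with-backoff sketch. It assumes the synchronous `Client` from `scrapegraph_py` and the dictionary-style response shown in the Response Format above; the `smartscraper_with_retry` helper is illustrative, not part of the SDK:
+
+```python
+import time
+
+from scrapegraph_py import Client
+
+def smartscraper_with_retry(client, url, prompt, max_retries=3, base_delay=1.0):
+    """Call SmartScraper, retrying with exponential backoff on failure."""
+    for attempt in range(max_retries):
+        try:
+            response = client.smartscraper(
+                website_url=url,
+                user_prompt=prompt,
+            )
+            # Check the response status before using the result
+            if response.get("status") == "completed":
+                return response["result"]
+            raise RuntimeError(f"Request did not complete: {response.get('error')}")
+        except Exception as exc:
+            if attempt == max_retries - 1:
+                raise  # give up after the final attempt
+            delay = base_delay * (2 ** attempt)  # exponential backoff
+            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
+            time.sleep(delay)
+
+client = Client(api_key="your-api-key-here")
+result = smartscraper_with_retry(client, "https://example.com", "Extract the main content")
+print(result)
+```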
+
## Integration Options
### Official SDKs