
Commit ced2bbc

docs(concurrent): refactor theme and added benchmark searchgraph
1 parent c0d26d6 commit ced2bbc

File tree

12 files changed (+64, -11 lines)

docs/assets/searchgraph.png (3.14 KB)

docs/assets/smartscrapergraph.png (1.46 KB)

docs/assets/speechgraph.png (2.37 KB)

docs/source/conf.py

Lines changed: 22 additions & 5 deletions
@@ -14,19 +14,36 @@
 # import all the modules
 sys.path.insert(0, os.path.abspath('../../'))

-project = 'scrapegraphai'
-copyright = '2024, Marco Vinciguerra'
-author = 'Marco Vinciguerra'
+project = 'ScrapeGraphAI'
+copyright = '2024, ScrapeGraphAI'
+author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
+
+html_last_updated_fmt = "%b %d, %Y"

 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme']

 templates_path = ['_templates']
 exclude_patterns = []

 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

-html_theme = 'sphinx_rtd_theme'
+# html_theme = 'sphinx_rtd_theme'
+html_theme = 'sphinx_wagtail_theme'
+
+html_theme_options = dict(
+    project_name = "ScrapeGraphAI",
+    logo = "scrapegraphai_logo.png",
+    logo_alt = "ScrapeGraphAI",
+    logo_height = 59,
+    logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
+    logo_width = 45,
+    github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
+    footer_links = ",".join(
+        ["Landing Page|https://scrapegraphai.com/",
+         "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
+    ),
+)

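To check the new theme locally, building the docs is enough. Below is a minimal sketch (not part of the commit), assuming Sphinx and sphinx-wagtail-theme are installed and the script is run from the repository root; verify_theme.py and the docs/build/html output path are arbitrary choices.

# verify_theme.py (hypothetical helper): programmatic equivalent of
#   sphinx-build -b html docs/source docs/build/html
from sphinx.cmd.build import build_main

if __name__ == "__main__":
    # build_main returns a process-style exit code (0 on success).
    exit_code = build_main(["-b", "html", "docs/source", "docs/build/html"])
    raise SystemExit(exit_code)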
docs/source/getting_started/installation.rst

Lines changed: 3 additions & 1 deletion
@@ -21,7 +21,9 @@ The library is available on PyPI, so it can be installed using the following command

    pip install scrapegraphai

-**Note:** It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
+.. important::
+
+   It is highly recommended to install the library in a virtual environment (conda, venv, etc.)

 If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
    scrapers/graphs
    scrapers/llm
    scrapers/graph_config
+   scrapers/benchmarks

 .. toctree::
    :maxdepth: 2

docs/source/introduction/overview.rst

Lines changed: 4 additions & 2 deletions
@@ -24,12 +24,14 @@ This flexibility ensures that scrapers remain functional even when website layouts change.
 We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face etc.
 as well as local models which can run on your machine using Ollama.

-Diagram
-=======
+Library Diagram
+===============
+
 With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph.
 Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing etc...
 Finally the scraped and processed data gets fed to an LLM which generates a response.

 .. image:: ../../assets/project_overview_diagram.png
    :align: center
+   :width: 70%
    :alt: ScrapegraphAI Overview
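For context on the pipeline described in the overview, here is a minimal usage sketch in the style of the project's README examples (not part of this commit): the prompt, source URL, model name, and API key are placeholders, and the exact config keys can differ between ScrapeGraphAI versions.

from scrapegraphai.graphs import SmartScraperGraph

# Placeholder configuration: swap in a real API key and the model you use.
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "gpt-3.5-turbo",
    },
}

# Build the graph (fetch -> parse -> ... -> LLM answer) and run it.
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their descriptions",
    source="https://perinim.github.io/projects",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)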
docs/source/scrapers/benchmarks.rst

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+Benchmarks
+==========
+
+SearchGraph
+^^^^^^^^^^^
+
+`SearchGraph` instantiates multiple `SmartScraperGraph` objects, one for each URL, and extracts the data from the HTML.
+A concurrent approach is used to speed up the process, and the following table shows the time required for a scraping task with different **batch sizes**.
+Only two results are taken into account.
+
+.. list-table:: SearchGraph
+   :header-rows: 1
+
+   * - Batch Size
+     - Total Time (s)
+   * - 1
+     - 31.1
+   * - 2
+     - 33.52
+   * - 4
+     - 28.47
+   * - 16
+     - 21.80

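The batch sizes above bound how many scrapes run concurrently. Below is a rough sketch of that idea (not the library's implementation), using asyncio with a semaphore; fake_scrape is a hypothetical stand-in for a single SmartScraperGraph run, and the printed timings are synthetic.

import asyncio
import random
import time


async def fake_scrape(url: str) -> dict:
    # Simulate network + LLM latency for one URL.
    await asyncio.sleep(random.uniform(0.5, 1.5))
    return {"url": url, "data": "..."}


async def run_batched(urls: list[str], batch_size: int) -> list[dict]:
    # At most `batch_size` scrapes are in flight at any moment.
    semaphore = asyncio.Semaphore(batch_size)

    async def bounded(url: str) -> dict:
        async with semaphore:
            return await fake_scrape(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(16)]
    for batch_size in (1, 2, 4, 16):
        start = time.perf_counter()
        asyncio.run(run_batched(urls, batch_size))
        print(f"batch_size={batch_size:>2}  total={time.perf_counter() - start:.2f}s")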
docs/source/scrapers/graph_config.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _Configuration:
+
 Additional Parameters
 =====================


docs/source/scrapers/graphs.rst

Lines changed: 3 additions & 1 deletion
@@ -9,7 +9,9 @@ There are currently three types of graphs available in the library:
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).

-**Note:** they all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the `LLM`_ and `Configuration`_ sections.
+.. note::
+
+   They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.

 SmartScraperGraph
 ^^^^^^^^^^^^^^^^^

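As a sketch of the shared graph configuration the note refers to (illustrative only), the same kind of config dict is passed to any of the graph classes; the keys, prompt, and model name below are placeholders modeled on README-style examples and may differ between versions.

from scrapegraphai.graphs import SearchGraph

# The same kind of config dict is also accepted by SmartScraperGraph and SpeechGraph;
# api_key and model are placeholders.
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "gpt-3.5-turbo",
    },
}

# SearchGraph only needs a prompt; it finds its own pages via a search engine.
search_graph = SearchGraph(
    prompt="List me the top open-source projects related to web scraping with LLMs",
    config=graph_config,
)

result = search_graph.run()
print(result)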