Skip to content

Commit 2a57940

Browse files
authored
Merge pull request #234 from VinciGit00/pre/beta
Pre/beta
2 parents df918fa + 8727d03 commit 2a57940

File tree

81 files changed

+6490
-571
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

81 files changed

+6490
-571
lines changed

CHANGELOG.md

Lines changed: 154 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,61 +1,189 @@
1-
## [0.10.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.10.1) (2024-05-11)
1+
## [0.11.0-beta.11](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.10...v0.11.0-beta.11) (2024-05-14)
2+
3+
4+
### Features
5+
6+
* **omni-search:** added omni search graph and updated docs ([fcb3abb](https://github.com/VinciGit00/Scrapegraph-ai/commit/fcb3abb01d505f634309f9ae3c686bbcaab65107))
7+
* **gpt-4o:** image to text single node test ([90955ca](https://github.com/VinciGit00/Scrapegraph-ai/commit/90955ca52f1e3277072e843fb8d578deea27d09f))
8+
* **omni-scraper:** working OmniScraperGraph with images ([a296927](https://github.com/VinciGit00/Scrapegraph-ai/commit/a2969276245cbedb97741975ea707dab2695f71e))
9+
10+
11+
### Bug Fixes
12+
13+
* **fetch_node:** bug in handling local files ([a6e1813](https://github.com/VinciGit00/Scrapegraph-ai/commit/a6e1813ddd36cc8d7c915e6ea0525835d64d10a2))
14+
15+
## [0.11.0-beta.10](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.9...v0.11.0-beta.10) (2024-05-14)
16+
17+
18+
### Features
19+
20+
* add gpt-4o ([52a4a3b](https://github.com/VinciGit00/Scrapegraph-ai/commit/52a4a3b22d6871b14801a5edbd28aa32a1a2580d)), closes [#232](https://github.com/VinciGit00/Scrapegraph-ai/issues/232)
21+
22+
## [0.11.0-beta.9](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.8...v0.11.0-beta.9) (2024-05-14)
223

324

425
### Bug Fixes
526

627
* crash asyncio due dependency version ([2563773](https://github.com/VinciGit00/Scrapegraph-ai/commit/25637734479a0da293860cf404a618eb5f49c7e2))
728

829

30+
### chore
31+
32+
* update models_tokens.py with new model configurations ([d9752b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/d9752b1619c6f86fdc407c898c8c9b443a50cb07))
33+
34+
935
### Docs
1036

1137
* fixed speechgraphexample ([4bf90f3](https://github.com/VinciGit00/Scrapegraph-ai/commit/4bf90f32a8fbb5a06279ec3002200961458a1250))
1238
* fixed unused param and install ([cc28d5a](https://github.com/VinciGit00/Scrapegraph-ai/commit/cc28d5a64f6e0e061f697262302403db875bc6fe))
1339
* **readme:** improve main readme ([ae5655f](https://github.com/VinciGit00/Scrapegraph-ai/commit/ae5655fdde810e80d20d7918b0b2232e29ee3f56))
40+
* **concurrent:** refactor theme and added benchmarck searchgraph ([ced2bbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/ced2bbcdc9672396e3c8afdc1f7f65c4194d29fd))
1441
* update instructions to use with LocalAI ([198420c](https://github.com/VinciGit00/Scrapegraph-ai/commit/198420c505544c88805e719e2fc864f061c7de05))
1542
* Update README.md ([772e064](https://github.com/VinciGit00/Scrapegraph-ai/commit/772e064c55f38ea296511f737dec9a412e0dbf4e))
1643
* updated sponsor logo ([f8d8d71](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8d8d71589ffc9ccde13259b50d309c7949beeb8))
1744

18-
## [0.10.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0...v0.10.0) (2024-05-08)
45+
46+
### CI
47+
48+
* **release:** 0.10.1 [skip ci] ([d359814](https://github.com/VinciGit00/Scrapegraph-ai/commit/d359814c4a640aa1e3bcde3f3bb3688b03f608d9))
49+
50+
## [0.11.0-beta.8](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.7...v0.11.0-beta.8) (2024-05-13)
1951

2052

2153
### Features
2254

23-
* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
24-
* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
25-
* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
26-
* add new hugging_face models ([d5547a4](https://github.com/VinciGit00/Scrapegraph-ai/commit/d5547a450ccd8908f1cf73707142b3481fbc6baa))
27-
* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))
28-
* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
29-
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
30-
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
31-
* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))
55+
* **parallel-exeuction:** add asyncio event loop dispatcher with semaphore for parallel graph instances ([627cbee](https://github.com/VinciGit00/Scrapegraph-ai/commit/627cbeeb2096eb4cd5da45015d37fceb7fe7840a))
56+
* **batchsize:** tested different batch sizes and systems ([a8d5e7d](https://github.com/VinciGit00/Scrapegraph-ai/commit/a8d5e7db050e15306780ffca47f998ebaf5c1216))
3257

3358

3459
### Bug Fixes
3560

36-
* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))
37-
* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))
38-
* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](https://github.com/VinciGit00/Scrapegraph-ai/commit/6b71ec1d2be953220b6767bc429f4cf6529803fd))
39-
* **examples:** openai std examples ([186c0d0](https://github.com/VinciGit00/Scrapegraph-ai/commit/186c0d035d1d211aff33c38c449f2263d9716a07))
40-
* removed .lock file for deployment ([d4c7d4e](https://github.com/VinciGit00/Scrapegraph-ai/commit/d4c7d4e7fcc2110beadcb2fc91efc657ec6a485c))
61+
* **asyncio:** replaced deepcopy with copy due to serialization problems ([dedc733](https://github.com/VinciGit00/Scrapegraph-ai/commit/dedc73304755c2d540a121d143173f60fb448bbb))
62+
63+
## [0.11.0-beta.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.6...v0.11.0-beta.7) (2024-05-13)
64+
65+
66+
### Bug Fixes
67+
68+
* bug for claude ([d0167de](https://github.com/VinciGit00/Scrapegraph-ai/commit/d0167dee71779a3c1e1e042e17a41134b93b3c78))
4169

4270

4371
### Docs
4472

45-
* update README.md ([17ec992](https://github.com/VinciGit00/Scrapegraph-ai/commit/17ec992b498839e001277e7bc3f0ebea49fbd00d))
73+
* **refactor:** changed example ([c7ec114](https://github.com/VinciGit00/Scrapegraph-ai/commit/c7ec114274da64f0b61cee80afe908a36ad26b78))
74+
75+
## [0.11.0-beta.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.5...v0.11.0-beta.6) (2024-05-13)
76+
77+
78+
### Bug Fixes
79+
80+
* **fetch-node:** removed isSoup from default ([0c15947](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c1594737f878ed5672f4c889fdf9b4e0d7ec49a))
81+
82+
## [0.11.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.4...v0.11.0-beta.5) (2024-05-13)
83+
84+
85+
### Features
86+
87+
* **webdriver-backend:** add dynamic import scripts from module and file ([db2234b](https://github.com/VinciGit00/Scrapegraph-ai/commit/db2234bf5d2f2589b080cd4136f33c4f4443bdfb))
88+
* **proxy-rotation:** add parse (IP address) or search (from broker) functionality for proxy rotation ([2170131](https://github.com/VinciGit00/Scrapegraph-ai/commit/217013181da06abe8d71d9db70e809ea4ebd8236))
89+
* added proxy rotation ([0c36a7e](https://github.com/VinciGit00/Scrapegraph-ai/commit/0c36a7ec1f32ee073d9e0f534a2cb97aba3d7a1f))
90+
* **safe-web-driver:** enchanced the original `AsyncChromiumLoader` web driver with proxy protection and flexible kwargs and backend ([768719c](https://github.com/VinciGit00/Scrapegraph-ai/commit/768719cce80953fa6cbe283e442420116c438f16))
91+
92+
93+
### Bug Fixes
94+
95+
* **pytest:** add dependency for mocking testing functions ([2f4fd45](https://github.com/VinciGit00/Scrapegraph-ai/commit/2f4fd45700ebf1db0c429b5a6249386d1a111615))
96+
* **chromium-loader:** ensure it subclasses langchain's base loader ([b54d984](https://github.com/VinciGit00/Scrapegraph-ai/commit/b54d984c134c8cbc432fd111bb161d3d53cf4a85))
97+
* **proxy-rotation:** removed duplicated arg and passed the loader_kwarhs correctly to the node ([1e9a564](https://github.com/VinciGit00/Scrapegraph-ai/commit/1e9a56461632999c5dc09f5aa930c14c954025ad))
98+
* **proxy-rotation:** removed max_shape duplicate ([5d6d996](https://github.com/VinciGit00/Scrapegraph-ai/commit/5d6d996e8f6132101d4c3af835d74f0674baffa1))
99+
100+
101+
### Docs
102+
103+
* **refactor:** added proxy-rotation usage and refactor readthedocs ([e256b75](https://github.com/VinciGit00/Scrapegraph-ai/commit/e256b758b2ada641f97b23b1cf6c6b0174563d8a))
104+
105+
## [0.11.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.3...v0.11.0-beta.4) (2024-05-12)
106+
107+
108+
### Features
109+
110+
* add new prompt info ([e2350ed](https://github.com/VinciGit00/Scrapegraph-ai/commit/e2350eda6249d8e121344d12c92645a3887a5b76))
111+
112+
## [0.11.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.2...v0.11.0-beta.3) (2024-05-12)
113+
114+
115+
### Features
116+
117+
* add support for deepseek-chat ([156b67b](https://github.com/VinciGit00/Scrapegraph-ai/commit/156b67b91e1798f67082123e2c0087d358a32d4d)), closes [#222](https://github.com/VinciGit00/Scrapegraph-ai/issues/222)
118+
119+
120+
### Docs
121+
122+
* add diagram showing general structure/flow of the library ([13ae918](https://github.com/VinciGit00/Scrapegraph-ai/commit/13ae9180ac5e7ef11dad1a210cf8790e797397dd))
123+
* update overview diagram with more models ([b441b30](https://github.com/VinciGit00/Scrapegraph-ai/commit/b441b30a5c60dda105964f69bd4cef06825f5c74))
124+
125+
## [0.11.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.1...v0.11.0-beta.2) (2024-05-10)
126+
127+
128+
### Features
129+
130+
* revert fetch_node ([864aa91](https://github.com/VinciGit00/Scrapegraph-ai/commit/864aa91326c360992326e04811d272e55eac8355))
131+
132+
## [0.11.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.11.0-beta.1) (2024-05-10)
133+
134+
135+
### Features
136+
137+
* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))
138+
* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))
139+
140+
141+
### Bug Fixes
142+
143+
* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))
144+
* Augment the information getting fetched from a webpage ([f8ce3d5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8ce3d5916eab926275d59d4d48b0d89ec9cd43f))
145+
* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))
146+
* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))
46147

47148

48149
### CI
49150

50-
* **release:** 0.10.0-beta.1 [skip ci] ([c47a505](https://github.com/VinciGit00/Scrapegraph-ai/commit/c47a505750ee63e0220b339478953155ef1f1771))
51-
* **release:** 0.10.0-beta.2 [skip ci] ([3f0e069](https://github.com/VinciGit00/Scrapegraph-ai/commit/3f0e0694f3b08463f025586777f7c0594b5ecb14))
52-
* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](https://github.com/VinciGit00/Scrapegraph-ai/commit/5aa600cb0a85d320ad8dc786af26ffa46dd4d097))
53-
* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](https://github.com/VinciGit00/Scrapegraph-ai/commit/da8c72ce138bcfe2627924d25a67afcd22cfafd5))
54-
* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c5397f67a9f05e0c00f631dd297b5527263a888))
55-
* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](https://github.com/VinciGit00/Scrapegraph-ai/commit/532adb639d58640bc89e8b162903b2ed97be9853))
56-
* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c0b46eb40b446b270c665c11b2c6508f4d5f4be))
57-
* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](https://github.com/VinciGit00/Scrapegraph-ai/commit/6911e21584767460c59c5a563c3fd010857cbb67))
58-
* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](https://github.com/VinciGit00/Scrapegraph-ai/commit/739aaa33c39c12e7ab7df8a0656cad140b35c9db))
151+
* **release:** 0.10.0-beta.3 [skip ci] ([ad32298](https://github.com/VinciGit00/Scrapegraph-ai/commit/ad32298e70fc626fd62c897e153b806f79dba9b9))
152+
* **release:** 0.10.0-beta.4 [skip ci] ([548bff9](https://github.com/VinciGit00/Scrapegraph-ai/commit/548bff9d77c8b4d2aadee40e966a06cc9d7fd4ab))
153+
* **release:** 0.10.0-beta.5 [skip ci] ([28c9dce](https://github.com/VinciGit00/Scrapegraph-ai/commit/28c9dce7cbda49750172bafd7767fa48a0c33859))
154+
* **release:** 0.10.0-beta.6 [skip ci] ([460d292](https://github.com/VinciGit00/Scrapegraph-ai/commit/460d292af21fabad3fdd2b66110913ccee22ba91))
155+
156+
### Bug Fixes
157+
158+
* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))
159+
160+
## [0.10.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.4...v0.10.0-beta.5) (2024-05-09)
161+
162+
163+
164+
### Bug Fixes
165+
166+
167+
* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))
168+
169+
## [0.10.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.3...v0.10.0-beta.4) (2024-05-09)
170+
171+
172+
### Features
173+
174+
* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))
175+
176+
177+
### Bug Fixes
178+
179+
* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))
180+
181+
## [0.10.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.2...v0.10.0-beta.3) (2024-05-09)
182+
183+
184+
### Features
185+
186+
* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))
59187

60188
## [0.10.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.1...v0.10.0-beta.2) (2024-05-08)
61189

docs/assets/omniscrapergraph.png

72.2 KB
Loading

docs/assets/omnisearchgraph.png

56.7 KB
Loading
51.8 KB
Binary file not shown.
82 KB
Loading

docs/assets/searchgraph.png

53.3 KB
Loading

docs/assets/smartscrapergraph.png

59.7 KB
Loading

docs/assets/speechgraph.png

48.2 KB
Loading

docs/source/conf.py

Lines changed: 22 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,19 +14,36 @@
1414
# import all the modules
1515
sys.path.insert(0, os.path.abspath('../../'))
1616

17-
project = 'scrapegraphai'
18-
copyright = '2024, Marco Vinciguerra'
19-
author = 'Marco Vinciguerra'
17+
project = 'ScrapeGraphAI'
18+
copyright = '2024, ScrapeGraphAI'
19+
author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
20+
21+
html_last_updated_fmt = "%b %d, %Y"
2022

2123
# -- General configuration ---------------------------------------------------
2224
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
2325

24-
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
26+
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme']
2527

2628
templates_path = ['_templates']
2729
exclude_patterns = []
2830

2931
# -- Options for HTML output -------------------------------------------------
3032
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
3133

32-
html_theme = 'sphinx_rtd_theme'
34+
# html_theme = 'sphinx_rtd_theme'
35+
html_theme = 'sphinx_wagtail_theme'
36+
37+
html_theme_options = dict(
38+
project_name = "ScrapeGraphAI",
39+
logo = "scrapegraphai_logo.png",
40+
logo_alt = "ScrapeGraphAI",
41+
logo_height = 59,
42+
logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
43+
logo_width = 45,
44+
github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
45+
footer_links = ",".join(
46+
["Landing Page|https://scrapegraphai.com/",
47+
"Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
48+
),
49+
)

docs/source/getting_started/examples.rst

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,9 @@
11
Examples
22
========
33

4-
Here some example of the different ways to scrape with ScrapegraphAI
4+
Let's suppose you want to scrape a website to get a list of projects with their descriptions.
5+
You can use the `SmartScraperGraph` class to do that.
6+
The following examples show how to use the `SmartScraperGraph` class with OpenAI models and local models.
57

68
OpenAI models
79
^^^^^^^^^^^^^
@@ -78,7 +80,7 @@ After that, you can run the following code, using only your machine resources br
7880
# ************************************************
7981
8082
smart_scraper_graph = SmartScraperGraph(
81-
prompt="List me all the news with their description.",
83+
prompt="List me all the projects with their description.",
8284
# also accepts a string with the already downloaded HTML code
8385
source="https://perinim.github.io/projects",
8486
config=graph_config
@@ -87,3 +89,4 @@ After that, you can run the following code, using only your machine resources br
8789
result = smart_scraper_graph.run()
8890
print(result)
8991
92+
To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!

0 commit comments

Comments
 (0)