feat: Add SqlStorageClient based on sqlalchemy v2+ #1339
Status: Open. Mantisus wants to merge 80 commits into apify:master from Mantisus:sql-client.
Commits (80):
- a3c5fa0 base implementation sql client (Mantisus)
- 3142bdd resolve (Mantisus)
- b056505 add dataset tests (Mantisus)
- ae3bc3d add kvs tests (Mantisus)
- 49f2643 add rq tests (Mantisus)
- 35a27fc fix docs in tests (Mantisus)
- 52e1ad2 wrap `SQLStorageClient` in _try_import (Mantisus)
- df41c45 update db models (Mantisus)
- 342c65a dataset optimization (Mantisus)
- 61a2666 kvs optimization (Mantisus)
- 7055f7d optimization (Mantisus)
- 1884f7d reduce the refresh rate of `accessed_at` (Mantisus)
- a10e3cf up docs (Mantisus)
- f7ebbe5 Update src/crawlee/storage_clients/_sql/_request_queue_client.py (Mantisus)
- 83ca6d3 fix tests (Mantisus)
- 1e3474c Merge master (Mantisus)
- 8086ab2 same updates (Mantisus)
- 9ee93ab resolve (Mantisus)
- 2934836 Merge branch 'master' into sql-client (Mantisus)
- 6401b65 up pyproject (Mantisus)
- 1c11d97 Merge branch 'master' into sql-client (Mantisus)
- df927d1 refactor (Mantisus)
- 9f5e640 fix len strict for metadata_id in kvs_record (Mantisus)
- 77c1894 fix cache (Mantisus)
- b3c1aad update queue for support multi-clients (Mantisus)
- fb8ce7d fix metadata calculate (Mantisus)
- 63249bb Add experimental warning (Mantisus)
- 0d62dcf remove mysql (Mantisus)
- dffeb76 raise Error for unsupported dialects (Mantisus)
- 61ba512 optimize update timestamps in metadata (Mantisus)
- 46e12b4 add docs (Mantisus)
- 41fcb35 Merge branch 'master' into sql-client (Mantisus)
- b92e385 Update pyproject.toml (Mantisus)
- 045fe9c up docs (Mantisus)
- 1a7618e up database types (Mantisus)
- cf1f722 Up names (Mantisus)
- 9328d9d Update src/crawlee/storage_clients/_sql/_key_value_store_client.py (Mantisus)
- 9296d90 save session maker (Mantisus)
- bdc1258 some updates (Mantisus)
- 9d47cff Apply suggestion from @vdusek (Mantisus)
- f69771e Apply suggestion from @vdusek (Mantisus)
- 3d53ac2 Update docs/guides/storage_clients.mdx (Mantisus)
- c7e3f8c Update docs/guides/storage_clients.mdx (Mantisus)
- 5d05c06 Update src/crawlee/storage_clients/_sql/_client_mixin.py (Mantisus)
- 7a999a4 Update src/crawlee/storage_clients/_sql/_client_mixin.py (Mantisus)
- bfec174 Update src/crawlee/storage_clients/_sql/_db_models.py (Mantisus)
- a9b466f Update src/crawlee/storage_clients/_sql/_db_models.py (Mantisus)
- 4443e98 Update src/crawlee/storage_clients/_sql/_storage_client.py (Mantisus)
- 245a4f9 Update docs/guides/storage_clients.mdx (Mantisus)
- fb2937b Update src/crawlee/storage_clients/_sql/_storage_client.py (Mantisus)
- c3cc554 Update tests/unit/storages/test_request_queue.py (Mantisus)
- 05f59ca polish sql-client (Mantisus)
- 473610d Update docs/guides/storage_clients.mdx (Mantisus)
- f17f6ca Update docs/guides/storage_clients.mdx (Mantisus)
- 2ed4f06 Update docs/guides/storage_clients.mdx (Mantisus)
- 88a60f3 Update docs/guides/storage_clients.mdx (Mantisus)
- a9b9671 chore(deps): update typescript-eslint monorepo to v8.41.0 (#1375) (renovate[bot])
- f8b2879 docs: Update `RequestLoader.fetch_next_request` docblock (#1374) (janbuchar)
- 4ba3a2e chore(release): Update changelog and package version [skip ci]
- 1d0e531 chore(deps): update dependency types-cachetools to ~=6.2.0.20250827 (… (renovate[bot])
- 5ae2c38 chore(deps): update yarn to v4.9.4 (#1377) (renovate[bot])
- ceaa9b5 docs: Update Request loaders guide (#1376) (vdusek)
- 3f0bf8a chore: Fix accidentally missing name of the test (#1380) (Pijukatel)
- 3241785 feat: Persist the `SitemapRequestLoader` state (#1347) (Mantisus)
- caff701 chore(release): Update changelog and package version [skip ci]
- 29cf5af suppose warning (Mantisus)
- bf47625 up code block (Mantisus)
- b0e9f66 Merge branch 'master' into sql-client (Mantisus)
- 4d5ade3 up docs (Mantisus)
- 74f8825 drop cast (Mantisus)
- d3a2ebc fix docs (Mantisus)
- 7081fe4 clean docstrings (Mantisus)
- b1a877e extra optimization (Mantisus)
- 582adb0 Merge branch 'master' into sql-client (Mantisus)
- d14c43a handle create tables rom several parallel processes (Mantisus)
- f48887a add collumn client_key (Mantisus)
- cd44018 few updates (Mantisus)
- e740f21 Merge branch 'master' into sql-client (Mantisus)
- 6e66337 add support for NDU storages (Mantisus)
- 289ab8b fix (Mantisus)
docs/guides/code_examples/storage_clients/sql_storage_client_basic_example.py (12 changes: 12 additions, 0 deletions)
@@ -0,0 +1,12 @@

```python
from crawlee.crawlers import ParselCrawler
from crawlee.storage_clients import SqlStorageClient


async def main() -> None:
    # Create a new instance of storage client.
    # By default, this creates an SQLite database file `crawlee.db`; if you pass
    # `connection_string` or `engine`, it creates the tables in your database instead.
    # Use the context manager to ensure that connections are properly cleaned up.
    async with SqlStorageClient() as storage_client:
        # And pass it to the crawler.
        crawler = ParselCrawler(storage_client=storage_client)
```
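The basic example relies on the client creating its SQLite file or its tables on first use. The sketch below illustrates that general create-if-missing idea with Python's standard `sqlite3` module only. It is an assumption-laden illustration, not `SqlStorageClient`'s implementation: the `datasets` and `dataset_items` tables, their columns, and the `ensure_tables`/`table_names` helpers are all hypothetical. Using `CREATE TABLE IF NOT EXISTS` makes the step idempotent, so running it on every start does not fail once the tables exist.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the tables SqlStorageClient creates.
SCHEMA = (
    """CREATE TABLE IF NOT EXISTS datasets (
        id TEXT PRIMARY KEY,
        name TEXT UNIQUE,
        item_count INTEGER NOT NULL DEFAULT 0
    )""",
    """CREATE TABLE IF NOT EXISTS dataset_items (
        item_id INTEGER PRIMARY KEY AUTOINCREMENT,
        dataset_id TEXT NOT NULL REFERENCES datasets(id),
        data TEXT NOT NULL
    )""",
)


def ensure_tables(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) tables if missing; safe to call on every start."""
    # The connection context manager commits on success and rolls back on error.
    with conn:
        for statement in SCHEMA:
            conn.execute(statement)


def table_names(conn: sqlite3.Connection) -> list[str]:
    """Return user-created table names, skipping SQLite's internal tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == '__main__':
    # ':memory:' keeps the demo self-contained; a real client would use a file such as crawlee.db.
    conn = sqlite3.connect(':memory:')
    ensure_tables(conn)
    ensure_tables(conn)  # second call is a no-op thanks to IF NOT EXISTS
    print(table_names(conn))
```

Calling `ensure_tables` twice leaves exactly two user tables in place; `AUTOINCREMENT` also makes SQLite create its internal `sqlite_sequence` table, which the query filters out.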
docs/guides/code_examples/storage_clients/sql_storage_client_configuration_example.py (33 changes: 33 additions, 0 deletions)
@@ -0,0 +1,33 @@

```python
from sqlalchemy.ext.asyncio import create_async_engine

from crawlee.configuration import Configuration
from crawlee.crawlers import ParselCrawler
from crawlee.storage_clients import SqlStorageClient


async def main() -> None:
    # Create a new instance of storage client.
    # On first run, also creates tables in your PostgreSQL database.
    # Use the context manager to ensure that connections are properly cleaned up.
    async with SqlStorageClient(
        # Create an `engine` with the desired configuration.
        engine=create_async_engine(
            'postgresql+asyncpg://myuser:mypassword@localhost:5432/postgres',
            future=True,
            pool_size=5,
            max_overflow=10,
            pool_recycle=3600,
            pool_pre_ping=True,
            echo=False,
        )
    ) as storage_client:
        # Create a configuration with custom settings.
        configuration = Configuration(
            purge_on_start=False,
        )

        # And pass them to the crawler.
        crawler = ParselCrawler(
            storage_client=storage_client,
            configuration=configuration,
        )
```
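The configuration example embeds the user name and password directly in the connection URL. With real credentials, characters such as `@`, `:` or `/` must be percent-encoded before they go into the URL; SQLAlchemy's own documentation recommends `urllib.parse.quote_plus` for this. The sketch below is a minimal stdlib-only helper; `build_postgres_url` is a hypothetical name and is not part of Crawlee or SQLAlchemy.

```python
from urllib.parse import quote_plus


def build_postgres_url(
    user: str,
    password: str,
    host: str = 'localhost',
    port: int = 5432,
    database: str = 'postgres',
    driver: str = 'asyncpg',
) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL with percent-encoded credentials.

    Hypothetical helper for illustration; the defaults mirror the example above.
    """
    return (
        f'postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}'
        f'@{host}:{port}/{database}'
    )


if __name__ == '__main__':
    # A password containing '@', ':' and '/' would otherwise break URL parsing.
    print(build_postgres_url('myuser', 'p@ss:word/1'))
```

As an alternative, SQLAlchemy's `URL.create()` accepts the parts separately and handles the escaping itself. Regarding the pool settings in the example: with `pool_size=5` and `max_overflow=10`, SQLAlchemy's queue pool allows at most 15 connections to be open at the same time.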
@@ -1,9 +1,21 @@

```python
from crawlee._utils.try_import import install_import_hook as _install_import_hook
from crawlee._utils.try_import import try_import as _try_import

# These imports have only mandatory dependencies, so they are imported directly.
from ._base import StorageClient
from ._file_system import FileSystemStorageClient
from ._memory import MemoryStorageClient

_install_import_hook(__name__)

# The following imports are wrapped in try_import to handle optional dependencies,
# ensuring the module can still function even if these dependencies are missing.
with _try_import(__name__, 'SqlStorageClient'):
    from ._sql import SqlStorageClient

__all__ = [
    'FileSystemStorageClient',
    'MemoryStorageClient',
    'SqlStorageClient',
    'StorageClient',
]
```
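This change wraps the `SqlStorageClient` import in `_try_import`, so the package still imports when the optional SQL dependencies are missing. The sketch below shows one way such a guard can work using only the standard library: a context manager suppresses `ImportError` and puts a placeholder on the target module that raises a descriptive error when it is used. `optional_import` and `MissingDependency` are hypothetical names; this is not Crawlee's `try_import`/`install_import_hook` implementation, whose internals are not shown here.

```python
import sys
import types
from collections.abc import Iterator
from contextlib import contextmanager


class MissingDependency:
    """Placeholder that fails loudly when an optional feature is actually used."""

    def __init__(self, name: str, error: ImportError) -> None:
        self._name = name
        self._error = error

    def __call__(self, *args: object, **kwargs: object) -> None:
        raise ImportError(
            f'{self._name} requires an optional dependency that is not installed: {self._error}'
        ) from self._error


@contextmanager
def optional_import(module_name: str, *names: str) -> Iterator[None]:
    """Suppress ImportError raised inside the block and stub out `names` on the module."""
    try:
        yield
    except ImportError as exc:  # also catches ModuleNotFoundError
        module = sys.modules[module_name]
        for name in names:
            setattr(module, name, MissingDependency(name, exc))


if __name__ == '__main__':
    # Register a throwaway module so the placeholder has somewhere to live.
    demo = types.ModuleType('demo_optional')
    sys.modules['demo_optional'] = demo

    with optional_import('demo_optional', 'FancyClient'):
        import hypothetical_missing_package  # assumed not to be installed

    try:
        demo.FancyClient()
    except ImportError as exc:
        print(exc)
```

The design choice this illustrates: importing the package never fails, and the missing dependency is only reported at the point where the optional class is actually used.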
@@ -0,0 +1,6 @@

```python
from ._dataset_client import SqlDatasetClient
from ._key_value_store_client import SqlKeyValueStoreClient
from ._request_queue_client import SqlRequestQueueClient
from ._storage_client import SqlStorageClient

__all__ = ['SqlDatasetClient', 'SqlKeyValueStoreClient', 'SqlRequestQueueClient', 'SqlStorageClient']
```