docs: Upgrade upgrading guide #278

Open
wants to merge 2 commits into
base: master

janbuchar
Contributor

No description provided.

@janbuchar janbuchar added adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. labels Sep 18, 2024
@github-actions github-actions bot added this to the 98th sprint - Tooling team milestone Sep 18, 2024
@@ -28,6 +31,7 @@ Attributes suffixed with `_millis` were renamed to remove said suffix and have t
- `Actor.start`, `Actor.call`, `Actor.start_task`, `Actor.set_status_message` and `Actor.abort` return instances of the `ActorRun` model instead of an untyped `dict`.
- Upon entering the context manager (`async with Actor`), the `Actor` puts the default logging configuration in place. This can be disabled using the `configure_logging` parameter.
- The `config` parameter of `Actor` has been renamed to `configuration`.
- Event handlers registered via `Actor.on` will now receive Pydantic objects instead of untyped dicts. For example, where you previously used `event['isMigrating']`, you should now use `event.is_migrating`.
Contributor Author

This is intentionally vague - maybe we should expose the event models somehow so that we can link them from here. Currently, they are internal members of Crawlee.
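Since the event models are not publicly exposed, the handler change can only be sketched with a stand-in. The example below is a minimal illustration, assuming a hypothetical `MigratingEventSketch` dataclass in place of the real (internal) Pydantic model; only the `isMigrating` key and the `is_migrating` attribute come from the guide text.

```python
from dataclasses import dataclass


@dataclass
class MigratingEventSketch:
    """Hypothetical stand-in for the internal event model; not the real class."""

    is_migrating: bool


def old_handler(event: dict) -> bool:
    # Before the upgrade: the handler receives an untyped dict with camelCase keys.
    return event['isMigrating']


def new_handler(event: MigratingEventSketch) -> bool:
    # After the upgrade: the handler receives a typed object with snake_case attributes.
    return event.is_migrating
```

In a real Actor the handler would be registered via `Actor.on`; only the body of the handler needs to change from key access to attribute access.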

Contributor

@vdusek vdusek left a comment

just 2 comments

- The SDK now uses [crawlee](https://github.com/apify/crawlee-python) for local storage emulation. This change should not affect intended usage (working with `Dataset`, `KeyValueStore` and `RequestQueue` classes from the `apify.storages` module or using the shortcuts exposed by the `Actor` class) in any way.
- There is a difference in the `RequestQueue.add_request` method: it accepts an `apify.Request` object instead of a free-form dictionary.
    - A quick way to migrate from dict-based arguments is to wrap them in a `Request.model_validate()` call.
- The preferred way is to instantiate it directly, e.g., `Request(url='https://example.tld', ...)`, or using the `Request.from_url` helper which prefills the `unique_key` and `id` attributes.
Contributor

I think we should mention `Request.from_url` as the preferred way of creating new requests. The need to instantiate a `Request` directly should be quite rare.
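To make the difference between the two construction styles concrete, here is a minimal sketch. It uses a hypothetical `RequestSketch` dataclass rather than the real `apify.Request`, and the way `unique_key` and `id` are derived is an assumption made for illustration; the SDK's actual derivation may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RequestSketch:
    """Hypothetical stand-in for apify.Request; field names follow the guide."""

    url: str
    unique_key: str
    id: str

    @classmethod
    def from_url(cls, url: str) -> 'RequestSketch':
        # Assumption: unique_key is taken from the URL and id is a hash of it.
        unique_key = url
        request_id = hashlib.sha256(unique_key.encode()).hexdigest()[:15]
        return cls(url=url, unique_key=unique_key, id=request_id)


# Direct instantiation: the caller has to supply every required field.
explicit = RequestSketch(
    url='https://example.tld',
    unique_key='https://example.tld',
    id='manual-id',
)

# Helper: only the URL is needed, the remaining fields are prefilled.
prefilled = RequestSketch.from_url('https://example.tld')
```

The helper keeps call sites short and avoids hand-written identifiers, which is why it reads as the more natural default for most users.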


Removing the `StorageClientManager` class is a significant change. If you need to change the storage client, use `crawlee.service_container` instead.
- The SDK now uses [crawlee](https://github.com/apify/crawlee-python) for local storage emulation. This change should not affect intended usage (working with `Dataset`, `KeyValueStore` and `RequestQueue` classes from the `apify.storages` module or using the shortcuts exposed by the `Actor` class) in any way.
- There is a difference in the `RequestQueue.add_request` method: it accepts an `apify.Request` object instead of a free-form dictionary.
Contributor

Is it only `RequestQueue.add_request`? Do no other methods work with requests? We should also mention that users can provide either a plain URL string or an `apify.Request` object.

Contributor Author

Do the bullets below this one not satisfy this need?
