feat: add Session binding capability via session_id in Request #1086

Open
wants to merge 6 commits into master from

Mantisus
Collaborator

Description

  • Adds strict binding of a Request to a specific Session via session_id. If that Session is not available in the SessionPool, an error is raised for the Request, which can be handled in the failed_request_handler.
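
A minimal, self-contained sketch of the intended semantics: a bound request uses exactly its session or fails, with no fallback to another session, and the failure is routed to a failed-request handler. Every name here (Session, SessionPool.get_session_by_id, SessionError, resolve_session, process_request) is a hypothetical stand-in, not crawlee's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class SessionError(Exception):
    """Hypothetical error raised when a bound session is unavailable."""


@dataclass
class Session:
    id: str
    usable: bool = True


@dataclass
class SessionPool:
    sessions: dict[str, Session] = field(default_factory=dict)

    def get_session_by_id(self, session_id: str) -> Session | None:
        # Return the session only if it exists and is still usable.
        session = self.sessions.get(session_id)
        return session if session is not None and session.usable else None

    def get_session(self) -> Session:
        # Unbound requests may use any usable session.
        for session in self.sessions.values():
            if session.usable:
                return session
        raise SessionError('No usable session in the pool')


@dataclass
class Request:
    url: str
    session_id: str | None = None  # hypothetical binding field


def resolve_session(request: Request, pool: SessionPool) -> Session:
    if request.session_id is None:
        return pool.get_session()
    # Strict binding: never fall back to a different session.
    session = pool.get_session_by_id(request.session_id)
    if session is None:
        raise SessionError(f'Session {request.session_id!r} is not available')
    return session


def process_request(
    request: Request,
    pool: SessionPool,
    handler: Callable[[Request, Session], str],
    failed_request_handler: Callable[[Request, Exception], str],
) -> str:
    try:
        session = resolve_session(request, pool)
    except SessionError as exc:
        # The error goes to the failed-request handler instead of propagating.
        return failed_request_handler(request, exc)
    return handler(request, session)


pool = SessionPool({'a': Session('a'), 'b': Session('b', usable=False)})


def ok(request: Request, session: Session) -> str:
    return f'{request.url} via {session.id}'


def failed(request: Request, exc: Exception) -> str:
    return f'failed: {exc}'


print(process_request(Request('https://crawlee.dev', session_id='a'), pool, ok, failed))
print(process_request(Request('https://crawlee.dev', session_id='b'), pool, ok, failed))
```

Keeping the lookup in resolve_session separates "which session may this request use" from "what happens on failure", which mirrors the three test cases listed under Testing.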

Issues

Testing

Added tests to verify the functionality:

  • Binding to a valid session
  • Binding to a non-existent session
  • Catching the error in the failed_request_handler

Collaborator

Pijukatel left a comment

Nice, I have just two small comments.

    @property
    def session_id(self) -> str | None:
        """A string used to identify the bound session."""
        return cast(UserData, self.user_data).session_id
Collaborator

I see you are following the existing pattern, but I think the cast is unnecessary if we redefine the model like this:

    user_data: Annotated[
        UserData,
        Field(alias='userData', default_factory=UserData),
        PlainValidator(user_data_adapter.validate_python),
        PlainSerializer(
            lambda instance: user_data_adapter.dump_python(
                instance,
                by_alias=True,
                exclude_none=True,
                exclude_unset=True,
                exclude_defaults=True,
            )
        ),
    ] = UserData()

I ran the tests and the type check, and it seems to work fine like this.
@janbuchar, was there a specific reason, not covered by the unit tests, to define it as dict[str, JsonSerializable] rather than UserData?

Collaborator

janbuchar Mar 14, 2025

The reason for that is explained in the comment: it's for user convenience. With dict[str, JsonSerializable], you don't have to import UserData when constructing a Request; anything JSON-serializable will do.

I don't want to sacrifice that just to avoid some trivial casts.
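
To make the trade-off concrete, here is a stdlib-only sketch of the pattern under discussion: the attribute is annotated as a plain dict[str, JsonSerializable], so callers can pass any JSON-like dict, while the value stored at runtime is a richer UserData. That mismatch is why reading session_id needs typing.cast, which only informs the type checker and returns its argument unchanged at runtime. UserData, the JsonSerializable alias, and the '__session_id' key are hypothetical stand-ins; the real models use pydantic validation instead.

```python
from __future__ import annotations

from typing import Any, Union, cast

# Hypothetical alias; crawlee ships its own JsonSerializable type.
JsonSerializable = Union[str, int, float, bool, None, list[Any], dict[str, Any]]


class UserData(dict[str, JsonSerializable]):
    """Hypothetical stand-in: a dict with a typed accessor for a reserved key."""

    @property
    def session_id(self) -> str | None:
        value = self.get('__session_id')  # hypothetical reserved key
        return value if isinstance(value, str) else None


class Request:
    def __init__(self, url: str, user_data: dict[str, JsonSerializable] | None = None) -> None:
        self.url = url
        # Declared as a plain dict for callers, but always stored as UserData.
        self.user_data: dict[str, JsonSerializable] = UserData(user_data or {})

    @property
    def session_id(self) -> str | None:
        """A string used to identify the bound session."""
        # cast() only narrows the static type; at runtime it is a no-op.
        return cast(UserData, self.user_data).session_id


# A plain dict literal is enough; no UserData import is needed at the call site.
request = Request('https://crawlee.dev', user_data={'hello': 'darkness', '__session_id': 'abc'})
print(request.session_id)
```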

Collaborator

Pijukatel Mar 14, 2025

You do not have to import UserData even when you define your model with user_data: Annotated[UserData.... It still allows you to initialize with dict[str, JsonSerializable]. Mypy is fine with that as well.

So with our current project setup, this looks to me like an unnecessary, misleading type plus several casts, for no benefit.

Collaborator

Mypy being fine with that sounds like a bug to me; UserData is a "bigger" type than dict[str, JsonSerializable].

Collaborator

So what is the real benefit of this in the end? (The import argument does not apply.)

Collaborator

Pijukatel Mar 14, 2025

We do not use this mypy setting:

    [tool.pydantic-mypy]
    init_typed = true

Therefore, from mypy's perspective, the input to our models can be anything.

Collaborator

So either we use this setting, in which case there is some justification for this, or we do not, in which case this has no benefit at all.

Collaborator

You get precise type checking for user_data when creating a new Request object, and the required type is understandable without any further investigation (a dict of JSON-serializable values).

I don't understand why the import argument shouldn't apply btw.

Collaborator

> I don't understand why the import argument shouldn't apply btw.

Because there is no extra import needed in either case. Try to run it if you don't believe me.

Collaborator

I tried 🙂

  1. if you use Request.from_url(), it works OK because it has **kwargs: Any
  2. if you construct it directly (Request(url='https://crawlee.dev', user_data={'hello': 'darkness'}, unique_key="", id="")), it's tricky
    a. mypy with init_typed is probably fine with it
    b. pyright (what vscode uses) is not fine with it
    c. a different mypy configuration that our end users might use may or may not be fine with it
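
The difference between points 1 and 2 is purely static, which a stdlib-only sketch can show: a factory whose extra arguments are typed **kwargs: Any is accepted by any checker, while a constructor parameter annotated with a richer class is checked against that class. Model and UserData below are hypothetical stand-ins, not crawlee's Request; for pydantic models the outcome of case 2 additionally depends on the checker and plugin settings, as the list notes.

```python
from __future__ import annotations

from typing import Any


class UserData(dict[str, Any]):
    """Hypothetical richer type standing in for crawlee's UserData."""


class Model:
    def __init__(self, url: str, user_data: UserData | None = None) -> None:
        self.url = url
        # Runtime conversion, roughly what a pydantic validator would do.
        self.user_data = UserData(user_data or {})

    @classmethod
    def from_url(cls, url: str, **kwargs: Any) -> Model:
        # Every keyword is typed Any here, so checkers accept whatever is passed.
        return cls(url=url, **kwargs)


# Case 1: accepted by type checkers, because **kwargs is Any.
via_factory = Model.from_url('https://crawlee.dev', user_data={'hello': 'darkness'})

# Case 2: runs fine, but checkers such as mypy or pyright flag the dict literal,
# since dict[str, str] is not assignable to UserData | None.
direct = Model(url='https://crawlee.dev', user_data={'hello': 'darkness'})
```

At runtime both paths produce identical objects; only the static verdict differs.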

Mantisus self-assigned this Mar 14, 2025
Mantisus requested a review from janbuchar March 14, 2025 16:03
Collaborator

vdusek left a comment

Could we please cover this in the docs? 🙏 Maybe Session management? Or find a better place. Thanks.

Mantisus requested a review from vdusek March 20, 2025 01:36
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for request to use a specific session
4 participants