-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Session binding capability via session_id
in Request
#1086
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I have just two small comments.
src/crawlee/_request.py
Outdated
@property | ||
def session_id(self) -> str | None: | ||
"""A string used to identify the bound session.""" | ||
return cast(UserData, self.user_data).session_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you are following the same pattern as already exists, but I think the cast is not necessary if we redefine model like this:
user_data: Annotated[
UserData,
Field(alias='userData', default_factory=UserData),
PlainValidator(user_data_adapter.validate_python),
PlainSerializer(
lambda instance: user_data_adapter.dump_python(
instance,
by_alias=True,
exclude_none=True,
exclude_unset=True,
exclude_defaults=True,
)
),
] = UserData()
I tried tests and type check and it seems to work fine like this.
@janbuchar was there any specific reason not covered by unit tests to have it defined as dict[str, JsonSerializable]
and not as UserData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for that is explained in the comment - it's for user convenience. With the dict[str, JsonSerializable]
, you don't have to import UserData
when constructing a Request
- anything json-serializable will do.
I don't want to sacrifice that just to avoid some trivial casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not have to import UserData
even when you define your model with user_data: Annotated[UserData...
. It still allows you to initialize with dict[str, JsonSerializable]
. Mypy is fine with that as well.
So with our current project setup, to me this seems like unnecessary missleading type + several casts for no benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mypy being fine with that sounds like a bug to me, UserData
is a "bigger" type than dict[str, JsonSerializable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what is the real benefit of this in the end? (import argument not being applicable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not use this mypy settigns:
[tool.pydantic-mypy]
init_typed = true
therefore input to our models can be whatever from mypy perspective
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should either use this settings and then there is some justification for this. Or if we do not use the settings, then this has no benefit at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You get precise type checking for user_data
when creating a new Request
object and the required type is understandable without any further investigation (dict of json-serializable values).
I don't understand why the import argument shouldn't apply btw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the import argument shouldn't apply btw.
Because there is no extra import needed in either case. Try to run it if you don't believe me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried 🙂
- if you use
Request.from_url()
, it works OK because it has**kwargs: Any
- if you construct it directly (
Request(url='https://crawlee.dev', user_data={'hello': 'darkness'}, unique_key="", id="")
), it's tricky
a. mypy withinit_typed
is probably fine with it
b. pyright (what vscode uses) is not fine with it
c. a different mypy configuration that our end users might use may or may not be fine with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please cover this in the docs? 🙏 Maybe Session management? Or find a better place. Thanks.
Description
Request
to a specificSession
. If theSession
is not available in theSessionPool
, an error will be raised for theRequest
which can be handled in thefailed_request_handler
.Issues
Testing
Added tests to verify functionality:
failed_request_handler