-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aryn connectors for reading and writing docsets #1147
Conversation
def __init__(self, docs: list[dict[str, Any]]): | ||
self.docs = docs | ||
|
||
def to_docs(self, query_params: "BaseDBReader.QueryParams") -> list[Document]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some day I feel like we should turn these lists that we're passing around into iterators/generators (all the way to the ray ds construction) but not right now and that applies to all the readers
def create_target_idempotent(self, target_params: "BaseDBWriter.TargetParams"): | ||
pass | ||
|
||
def get_existing_target_params(self, target_params: "BaseDBWriter.TargetParams") -> "BaseDBWriter.TargetParams": | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a huge lift to add docset-creation functionality to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are multiple writer tasks, will each one attempt to create a docset? Since name
is not unique, creating a docset from multiple writer tasks will result in multiple docsets (with the same name) being created.
lib/sycamore/sycamore/reader.py
Outdated
api_key = ArynConfig.get_aryn_api_key() | ||
aryn_url = ArynConfig.get_aryn_url() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be able to pass these in as params here too I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lib/sycamore/sycamore/reader.py
Outdated
@context_params | ||
def aryn(self, docset_id: str, **kwargs) -> DocSet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What params come from the context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I need this. I'll remove it.
lib/sycamore/sycamore/writer.py
Outdated
api_key = ArynConfig.get_aryn_api_key() | ||
aryn_url = ArynConfig.get_aryn_url() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these guys too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lib/sycamore/sycamore/writer.py
Outdated
if docset_id is None and create_new_docset and name is not None: | ||
headers = { | ||
"Authorization": f"Bearer {api_key}" | ||
} | ||
res = requests.post(url=f"{aryn_url}/docsets", data={"name": name}, headers=headers) | ||
docset_id = res.json()["docset_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess docstore allows multiple docsets with the same name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, only IDs are unique.
No description provided.