refactor: refactor pipeline engine using ray data #110
Merged
Changes from 17 commits
40 commits
31c5a64 feat: add config and operator node types (ChenZiHong-Gavin)
8bcbe51 refactor: refactor readers with ray data (ChenZiHong-Gavin)
246348f fix: delete param parallelism for readers (ChenZiHong-Gavin)
319e1e7 fix: fix import error (ChenZiHong-Gavin)
42fcb09 refactor read and chunk operators with no side effects (ChenZiHong-Gavin)
b458e48 fix: fix import error (ChenZiHong-Gavin)
95c4783 fix: fix return logic (ChenZiHong-Gavin)
c844d65 refactor: rename operator split to chunk (ChenZiHong-Gavin)
c447936 refactor: refactor build_kg to accomodate ray data (ChenZiHong-Gavin)
3edbb81 feat: add StorageFactory & global params (ChenZiHong-Gavin)
ee0639d refactor: refactor quiz to accomodata ray data engine (ChenZiHong-Gavin)
157f0b0 fix: reload graph before quizzing (ChenZiHong-Gavin)
99a6e5f Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in… (ChenZiHong-Gavin)
ec2033b Potential fix for pull request finding 'Unreachable code' (ChenZiHong-Gavin)
bc07222 fix: fix quiz params (ChenZiHong-Gavin)
c9435d7 refactor: refactor quiz&judge to ray actors (ChenZiHong-Gavin)
c55fc09 Merge branch 'refactor/refactor-with-ray-data' of https://github.com/… (ChenZiHong-Gavin)
d7d6c2a fix: fix transferring quizzed data to JudgeService (ChenZiHong-Gavin)
a6aedaf refactor: refactor partition to accomodate ray data (ChenZiHong-Gavin)
ea1603b fix: fix lint problem (ChenZiHong-Gavin)
244deb4 refactor: refactor op generate (ChenZiHong-Gavin)
d460a2a feat: write results in output folder (ChenZiHong-Gavin)
cd011ad fix: raise error when no dataset is created (ChenZiHong-Gavin)
aab7438 fix: return generator in ece_partitioner (ChenZiHong-Gavin)
7643b9f fix: return generator in ece_partitioner (ChenZiHong-Gavin)
c42b604 refactor: refactor data format to support multi-modal input (ChenZiHong-Gavin)
42dc73e fix: delete fetching schema to avoid ray's duplicate execution (ChenZiHong-Gavin)
73f70a5 fix: fix operators' registry (ChenZiHong-Gavin)
37cbfcf feat: refactor schema_guided_extraction & add examples (ChenZiHong-Gavin)
b400d2e feat: seperate ray logs and service logs (ChenZiHong-Gavin)
0790ba4 feat: use storage actor (ChenZiHong-Gavin)
68e5191 feat: add kuzu graph database (ChenZiHong-Gavin)
0fbfcf2 feat: add llm as actors (ChenZiHong-Gavin)
c7e32b0 refactor: delete old runner (ChenZiHong-Gavin)
18a67be fix: fix vllm wrapper (ChenZiHong-Gavin)
b7d692a docs: update .env.example (ChenZiHong-Gavin)
52519e7 fix: use kuzudb in quiz_service (ChenZiHong-Gavin)
ee6a927 fix: update webui (ChenZiHong-Gavin)
86760e9 feat: make storage backend configuragble (ChenZiHong-Gavin)
9b700f5 docs: update README” (ChenZiHong-Gavin)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,2 @@` |
| | | `from .init_llm import init_llm` |
| | | `from .init_storage import init_storage` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,28 @@` |
| | | `from graphgen.models import JsonKVStorage, NetworkXStorage` |
| | | |
| | | |
| | | `class StorageFactory:` |
| | | `    """` |
| | | `    Factory class to create storage instances based on backend.` |
| | | `    Supported backends:` |
| | | `        kv_storage(key-value storage):` |
| | | `            - json_kv: JsonKVStorage` |
| | | `        graph_storage:` |
| | | `            - networkx: NetworkXStorage (graph storage)` |
| | | `    """` |
| | | |
| | | `    @staticmethod` |
| | | `    def create_storage(backend: str, working_dir: str, namespace: str):` |
| | | `        if backend == "json_kv":` |
| | | `            return JsonKVStorage(working_dir, namespace=namespace)` |
| | | |
| | | `        if backend == "networkx":` |
| | | `            return NetworkXStorage(working_dir, namespace=namespace)` |
| | | |
| | | `        raise NotImplementedError(` |
| | | `            f"Storage backend '{backend}' is not implemented yet."` |
| | | `        )` |
| | | |
| | | |
| | | `def init_storage(backend: str, working_dir: str, namespace: str):` |
| | | `    return StorageFactory.create_storage(backend, working_dir, namespace)` |
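For readers who want to try the dispatch logic without installing `graphgen`, here is a minimal sketch of the same factory. `StubKVStorage` and `StubGraphStorage` are hypothetical stand-ins for `JsonKVStorage` and `NetworkXStorage`; they only mirror the constructor shape visible in the diff. The working directory `"cache"`, the namespaces, and the `"redis"` backend name are illustrative values, not taken from the PR.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for graphgen.models.JsonKVStorage / NetworkXStorage.
# Only the constructor shape (working_dir positional, namespace keyword)
# is taken from the diff; everything else is assumed.
@dataclass
class StubKVStorage:
    working_dir: str
    namespace: str


@dataclass
class StubGraphStorage:
    working_dir: str
    namespace: str


class StorageFactory:
    @staticmethod
    def create_storage(backend: str, working_dir: str, namespace: str):
        if backend == "json_kv":
            return StubKVStorage(working_dir, namespace=namespace)
        if backend == "networkx":
            return StubGraphStorage(working_dir, namespace=namespace)
        # Unknown backends fail loudly instead of returning None.
        raise NotImplementedError(
            f"Storage backend '{backend}' is not implemented yet."
        )


def init_storage(backend: str, working_dir: str, namespace: str):
    return StorageFactory.create_storage(backend, working_dir, namespace)


# Illustrative calls; directory and namespace names are made up.
kv = init_storage("json_kv", "cache", "chunks")
graph = init_storage("networkx", "cache", "graph")

try:
    init_storage("redis", "cache", "chunks")
except NotImplementedError as exc:
    error_message = str(exc)
```

Callers select a backend by its string key, so adding a backend only requires a new branch in `create_storage`; an unrecognized key raises `NotImplementedError` at construction time.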
In `_should_keep_item`, using `assert` to validate `item_type` will crash the Ray worker on unsupported types. It's more robust to log a warning and return `False` to filter out invalid items without halting the pipeline.
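A rough sketch of what this suggestion could look like follows. The body of `_should_keep_item` is not shown in this view, so the function name `should_keep_item`, its signature, the `"type"` key, and the `SUPPORTED_ITEM_TYPES` set are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical set of supported types; the real list lives in the PR's code.
SUPPORTED_ITEM_TYPES = {"text", "image"}


def should_keep_item(item: dict) -> bool:
    """Return True for supported items; log and drop anything else."""
    item_type = item.get("type")
    if item_type not in SUPPORTED_ITEM_TYPES:
        # A warning keeps the worker alive; an assert would raise
        # AssertionError and abort the task that processes this item.
        logger.warning("Dropping item with unsupported type %r", item_type)
        return False
    return True


# Illustrative input: one valid item, one unsupported type, one missing type.
items = [{"type": "text"}, {"type": "audio"}, {}]
kept = [item for item in items if should_keep_item(item)]
```

A further reason to avoid `assert` for input validation is that Python removes assert statements when run with `-O`, so the check would silently disappear in optimized mode.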