-
Notifications
You must be signed in to change notification settings - Fork 15
Parse DataPack on backend #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still too many low-level calls and reliance on the internal serialization format. This does not utilize the power of parsing the data pack from backend, since at the back end we should use all Forte interfaces to make this task more standard. The usage of DataStore is making the problem even worse.
|
|
||
| link_id = uuid.uuid4().int | ||
| link = received_json_data.get('data') | ||
| link["py/state"]['_tid'] = link_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like here we use a certain data structure very similar to the Forte format, and need to use some internal understanding. I feel like this would make the system still depend on some certain Forte version.
| """ | ||
|
|
||
| ANNOTATION_LIST = "annotations" | ||
| LINK_LIST = "links" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why only Annotation and Link have the special class variable, how about Group or other top level ontologies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because Stave users can only add/edit/delete these two types of top ontologies. Looks like we haven't design any mechanism for users to manage Group or other types of ontologies through Stave UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we can, by enabling the Group extension
|
|
||
| # Clear the previous type info to avoid conflicts in onto | ||
| # definitions | ||
| DataStore._type_attributes = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to access the DataStore interface? I think this is a low level interface and if we access this that means we actually don't have backward incompatibility.
Btw, not having backward incompatibility should be fine, as long as we know which Forte-Stave version pairs work, but having access to the low level implementation details here looks problematic.
| """ | ||
| if self._pack is None: | ||
| # Add entry to DataPack with legacy format | ||
| if self._data_store._is_subclass( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many usages of low-level details here, can we use the public API of data pack only?
Note that using the public API we should know whether a type_name is a subclass of Annotation.
| if self._data_store._is_subclass( | ||
| type_name=entry_dict["py/object"], cls=Annotation | ||
| ): | ||
| self._pack_json['py/state']["annotations"].append(entry_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we have to keep the py/state structure here? These are stuff used by jsonpickle.
| def _transform_pack_json(self): | ||
| """ | ||
| Transform a DataPack dictionary with legacy format to a json for | ||
| frontend rendering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transform still looks manual to me. If we want to support legacy format we should directly use Forte data pack API. For example:
for annotation in pack.get("Annotation"):
annotations.append(
"span": {
"begin": annotation.begin,
"end": annotation.end
},
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes most of this part is still manual parsing. To support legacy format, we don't have access to the pack object since the it's not deserializable using the latest forte. We only have access to a json object like this which is derived from old version of forte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we can feel free to directly use Forte to parse the pack object. So for Stave to work, we simply need to install the correct Forte version. For example, we can update the example db with the new Forte format in the master branch now. To use the old forte format we simply downgrade Stave to a prior version.
Cross-version compatibility is another issue that we can deal with separately. My opinion is that that's not very important right now. Even if we want to deal with that, it probably should happen in Forte repo not Stave repo
|
Block by asyml/forte#881. |
This PR fixes #230 .
Description of changes
The
DataPackparsing is moved from frontend to backend.Possible influences of this PR
We still maintain support to the previous DataPack format so that the example projects won't break.