Parse DataPack on backend #232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

mylibrar wants to merge 1 commit into asyml:master from mylibrar:parse-datapack

Collaborator

mylibrar commented Jul 5, 2022

This PR fixes #230 .

Description of changes

The DataPack parsing is moved from frontend to backend.

Possible influences of this PR

We still maintain support to the previous DataPack format so that the example projects won't break.


          Parse DataPack on backend

c43b68c

mylibrar requested a review from hunterhector

July 7, 2022 16:10

hunterhector reviewed

View reviewed changes

Member

hunterhector left a comment •

edited

Loading

still too many low-level calls and reliance on the internal serialization format. This does not utilize the power of parsing the data pack from backend, since at the back end we should use all Forte interfaces to make this task more standard. The usage of DataStore is making the problem even worse.

simple-backend/stave_backend/handlers/document.py

    
                  link_id = uuid.uuid4().int

                  link = received_json_data.get('data')

                  link["py/state"]['_tid'] = link_id

Member

hunterhector Jul 7, 2022

Looks like here we use a certain data structure very similar to the Forte format, and need to use some internal understanding. I feel like this would make the system still depend on some certain Forte version.

simple-backend/stave_backend/lib/stave_pack_parser.py

    
                  """

                  ANNOTATION_LIST = "annotations"

                  LINK_LIST = "links"

Member

hunterhector Jul 7, 2022

curious why only Annotation and Link have the special class variable, how about Group or other top level ontologies?

Collaborator Author

mylibrar Jul 7, 2022

This is because Stave users can only add/edit/delete these two types of top ontologies. Looks like we haven't design any mechanism for users to manage Group or other types of ontologies through Stave UI.

Member

hunterhector Jul 7, 2022

actually we can, by enabling the Group extension

simple-backend/stave_backend/lib/stave_pack_parser.py

    
                          # Clear the previous type info to avoid conflicts in onto

                          # definitions

                          DataStore._type_attributes = {}

Member

hunterhector Jul 7, 2022

Do we have to access the DataStore interface? I think this is a low level interface and if we access this that means we actually don't have backward incompatibility.

Btw, not having backward incompatibility should be fine, as long as we know which Forte-Stave version pairs work, but having access to the low level implementation details here looks problematic.

simple-backend/stave_backend/lib/stave_pack_parser.py

    
                      """

                      if self._pack is None:

                          # Add entry to DataPack with legacy format

                          if self._data_store._is_subclass(

Member

hunterhector Jul 7, 2022

Too many usages of low-level details here, can we use the public API of data pack only?

Note that using the public API we should know whether a type_name is a subclass of Annotation.

simple-backend/stave_backend/lib/stave_pack_parser.py

    
                          if self._data_store._is_subclass(

                              type_name=entry_dict["py/object"], cls=Annotation

                          ):

                              self._pack_json['py/state']["annotations"].append(entry_dict)

Member

hunterhector Jul 7, 2022

Also, do we have to keep the py/state structure here? These are stuff used by jsonpickle.

simple-backend/stave_backend/lib/stave_pack_parser.py

    
                  def _transform_pack_json(self):

                      """

                      Transform a DataPack dictionary with legacy format to a json for

                      frontend rendering

Member

hunterhector Jul 7, 2022

This transform still looks manual to me. If we want to support legacy format we should directly use Forte data pack API. For example:

for annotation in pack.get("Annotation"):
  annotations.append(
     "span": {
            "begin": annotation.begin,
            "end": annotation.end
      },
  )

Collaborator Author

mylibrar Jul 7, 2022

Yes most of this part is still manual parsing. To support legacy format, we don't have access to the pack object since the it's not deserializable using the latest forte. We only have access to a json object like this which is derived from old version of forte.

Member

hunterhector Jul 7, 2022

Actually, I think we can feel free to directly use Forte to parse the pack object. So for Stave to work, we simply need to install the correct Forte version. For example, we can update the example db with the new Forte format in the master branch now. To use the old forte format we simply downgrade Stave to a prior version.

Cross-version compatibility is another issue that we can deal with separately. My opinion is that that's not very important right now. Even if we want to deal with that, it probably should happen in Forte repo not Stave repo

Collaborator Author

mylibrar commented Jul 14, 2022

Block by asyml/forte#881.
There is still too much manual parsing in this PR and we should not directly interact with low level data structures like DataStore. To achieve that, we need add new interfaces in DataPack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet