Skip to content

Conversation

alexfurmenkov
Copy link
Collaborator

No description provided.

"""

usdm_dtd = """\
<!ELEMENT ref EMPTY>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need a definition for the usdm:tag element (which must have a name attribute) as described here and mentioned here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

for ref in refs:
ref_copy = etree.Element("ref", attrib=ref.attrib)
if not dtd.validate(ref_copy):
return f"Invalid XML fragment: {dtd.error_log.filter_from_errors()}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will stop the checking of refs as soon as the first invalid one is found. However, it would be preferable if all invalid refs (and/or tags) are reported in one go - so maybe add the error for each invalid ref/tag to a list and return the list of errors if it's not empty.

dataset = self.evaluation_dataset
target = self.params.target
if target not in dataset:
error_list = [f"Target column '{target}' not found"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange to me - I don't think you should get an error reported for every record in the dataset if the specified column does not exist. I would expect this operation to behave in the same way as other operations - i.e., if the column specified in the name parameter is not present in the dataset, then this is dealt with by the error handling in BaseOperation.execute (e.g., the rule would be skipped because the specified variable does not exist).

return [html_error]

xml_text = text
if "usdm:ref" in text and "xmlns:usdm" not in text:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to handle usdm:tag too (here and in any other place that deals with usdm:ref)

html5lib.parse(text)
return None
except Exception as e:
return f"Invalid HTML fragment: {e}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's possible, it would be better to report all the HTML parsing issues in one go instead of stopping when the first is encountered. However, this may depend on how the parser works and whether the parser can differentiate between errors that prevent continued processing and those that don't.

root = etree.fromstring(xml_text.encode("utf-8"), parser=parser)
return root, None
except etree.XMLSyntaxError as e:
return None, f"Invalid XML fragment: {e}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's possible, it would be better to report all the XML parsing issues in one go instead of stopping when the first is encountered. However, this may depend on the type of error encountered and how the parser works - I think this particular parser stops when malformed XML is encountered.

pytest.fail(f"Unknown expected_type {expected_type}")


def test_get_xhtml_errors_missing_column(base_services):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary if a missing variable is handled in the usual way (i.e., leading to an "absent variable skip"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the test to make sure that ColumnNotFoundError is raised

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rule blocked: CORERULES-9628 - new operator to check that values of specified variable is valid XHTML

2 participants