-
Notifications
You must be signed in to change notification settings - Fork 23
897: check that values of specified variable is valid XHTML #1371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…into 897-validate-XHTML-values
""" | ||
|
||
usdm_dtd = """\ | ||
<!ELEMENT ref EMPTY> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
for ref in refs: | ||
ref_copy = etree.Element("ref", attrib=ref.attrib) | ||
if not dtd.validate(ref_copy): | ||
return f"Invalid XML fragment: {dtd.error_log.filter_from_errors()}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will stop the checking of refs as soon as the first invalid one is found. However, it would be preferable if all invalid refs (and/or tags) are reported in one go - so maybe add the error for each invalid ref/tag to a list and return the list of errors if it's not empty.
dataset = self.evaluation_dataset | ||
target = self.params.target | ||
if target not in dataset: | ||
error_list = [f"Target column '{target}' not found"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems strange to me - I don't think you should get an error reported for every record in the dataset if the specified column does not exist. I would expect this operation to behave in the same way as other operations - i.e., if the column specified in the name
parameter is not present in the dataset, then this is dealt with by the error handling in BaseOperation.execute
(e.g., the rule would be skipped because the specified variable does not exist).
return [html_error] | ||
|
||
xml_text = text | ||
if "usdm:ref" in text and "xmlns:usdm" not in text: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to handle usdm:tag
too (here and in any other place that deals with usdm:ref
)
html5lib.parse(text) | ||
return None | ||
except Exception as e: | ||
return f"Invalid HTML fragment: {e}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible, it would be better to report all the HTML parsing issues in one go instead of stopping when the first is encountered. However, this may depend on how the parser works and whether the parser can differentiate between errors that prevent continued processing and those that don't.
root = etree.fromstring(xml_text.encode("utf-8"), parser=parser) | ||
return root, None | ||
except etree.XMLSyntaxError as e: | ||
return None, f"Invalid XML fragment: {e}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible, it would be better to report all the XML parsing issues in one go instead of stopping when the first is encountered. However, this may depend on the type of error encountered and how the parser works - I think this particular parser stops when malformed XML is encountered.
pytest.fail(f"Unknown expected_type {expected_type}") | ||
|
||
|
||
def test_get_xhtml_errors_missing_column(base_services): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary if a missing variable is handled in the usual way (i.e., leading to an "absent variable skip"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the test to make sure that ColumnNotFoundError is raised
…into 897-validate-XHTML-values
…into 897-validate-XHTML-values # Conflicts: # pyproject.toml # requirements.txt
No description provided.