-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Lazily write the FST padding byte #12981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution! |
Egads, thank you bot! This one had already fallen past the event horizon of my email box. I'll try to review soon. |
// pad: ensure no node gets address 0 which is reserved to mean | ||
// the stop state w/ no arcs | ||
dataOutput.writeByte((byte) 0); | ||
// the stop state w/ no arcs. the actual byte will be written lazily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm could we instead add a boolean paddingBytePending
or so? Set it to true, here, then when the byte is lazily written, set it to false, write the byte, and increment numBytesWritten
at that point? Otherwise it's sort of weird to increment numBytesWritten
when we didn't actually write the byte yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numBytesWritten is used for determine the address of the to be written nodes, so if we don't increment it here, it would mess up the address.
freezeTail(0); | ||
if (root.numArcs == 0) { | ||
if (fst.metadata.emptyOutput == null) { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that this means a completely empty FST? Accepts nothing?
} | ||
|
||
private void writePaddingByte() throws IOException { | ||
assert numBytesWritten == 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment explaining what this padding byte even is for? I myself cannot remember :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dungba88 -- I left a few small comments.
How serious is this, during off-heap writing? Are we using off-heap writing anywhere in Lucene yet (e.g. block tree)?
This will break the off-heap writing, as we are writting byte which won't belong to any FST, and it would mess up subsequent reads (if multiple FST are written to the same file) We are not using off-heap, but I have 2 other PR to start doing that: |
Oh no, I see. Hmm, well could we increment the |
OK, phew, thanks. Those will be next up :) |
Thanks @mikemccand for reviewing, I've addressed those in the latest revision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it looks great @dungba88, I'll merge!
* lazily write the FST padding byte * Also write the pad byte when there is emptyOutput * add comment * Add more comments
* lazily write the FST padding byte * Also write the pad byte when there is emptyOutput * add comment * Add more comments
Description
Lazily write the FST padding byte, so that in case the FST is empty (no accepted nodes) nothing will be written. This is important for off-heap writing, as we don't want to add that extra byte when the FST would be thrown away. Found while working on #12980