In readers.py, the text extracted from multi-column PDF documents doesn't respect the columns: each line is read straight across the page, so text from adjacent columns gets interleaved. To fix this, the following line:
text = page.get_text("text", sort=True)
should be replaced by these lines:
# Extract text blocks from the page
blocks = page.get_text("blocks")
# Concatenate the block texts (index 4 of each tuple) into a single string; blocks are returned in document order, which usually follows the columns
text = "\n".join(block[4] for block in blocks)
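For anyone who wants to try the idea outside readers.py, here is a minimal sketch of the same concatenation as a standalone helper. It assumes the block tuple layout documented by PyMuPDF, (x0, y0, x1, y1, text, block_no, block_type), with block_type 0 meaning text. The names blocks_to_text and sample_blocks are hypothetical, and the sample data is hand-made to stand in for what page.get_text("blocks") might return on a two-column page. Skipping non-text blocks is my addition, since some PyMuPDF versions or flag settings may include image blocks in the list.

```python
def blocks_to_text(blocks):
    """Join text blocks in the order given, skipping non-text and empty blocks.

    Each block is assumed to be a tuple shaped like
    (x0, y0, x1, y1, text, block_no, block_type).
    """
    parts = []
    for block in blocks:
        text, block_type = block[4], block[6]
        if block_type != 0:  # assumption: 0 = text block, anything else is skipped
            continue
        text = text.strip()
        if text:
            parts.append(text)
    return "\n".join(parts)


# Hand-made sample (not real extractor output): two left-column blocks,
# one image block, then one right-column block, in document order.
sample_blocks = [
    (50, 100, 280, 200, "Left column, paragraph 1\n", 0, 0),
    (50, 210, 280, 300, "Left column, paragraph 2\n", 1, 0),
    (50, 310, 280, 400, "image placeholder", 2, 1),
    (320, 100, 550, 200, "Right column, paragraph 1\n", 3, 0),
]

print(blocks_to_text(sample_blocks))
```

In readers.py this would be called as text = blocks_to_text(page.get_text("blocks")). Note that it relies on the blocks already arriving column by column; if a particular PDF stores them in a different order, some explicit column-aware sorting would still be needed.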
I'd submit a pull request, but it seems I don't have sufficient permissions to do so.
Thanks!