Commit 956639e

PI: Don't load entire file into memory when passed file name
This halves allocated memory when doing a simple PdfWriter(clone_from=«str»).

I can't just close self.stream in `__del__` because, for some strange reason, the unit tests mark it as unflagged even after the test block ends. This seems to come from `__del__` finalizers being run on a second pass, while `weakref.finalize()` callbacks are run on the first pass.
1 parent 18519a0 commit 956639e
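
The cleanup approach described in the commit message can be illustrated outside pypdf. The following is a minimal sketch, assuming a hypothetical `LazyReader` class that stands in for the reader; it is not pypdf code. It registers the stream's `close` method with `weakref.finalize()` so that the file handle is closed once the owning object is garbage collected.

```python
import gc
import os
import tempfile
import weakref
from io import FileIO


class LazyReader:
    """Hypothetical stand-in for a reader that keeps its file open."""

    def __init__(self, path: str) -> None:
        self.stream = FileIO(path, "rb")
        # Register stream.close (not a method of self), so the finalizer
        # holds no strong reference back to the owning object.
        weakref.finalize(self, self.stream.close)

    def close(self) -> None:
        """Close the underlying file handle explicitly."""
        self.stream.close()


# Create a small throwaway file to open.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n")
    path = tmp.name

reader = LazyReader(path)
stream = reader.stream          # keep a handle so it can be inspected later
closed_before = stream.closed   # the file is still open here

del reader                      # drop the only reference to the owner
gc.collect()                    # for interpreters without reference counting
closed_after = stream.closed    # the finalizer has closed the file

os.remove(path)
```

Callers who want deterministic cleanup can still call `close()` themselves; closing an already closed `FileIO` is harmless, so the finalizer running later does no damage.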

1 file changed: 10 additions, 3 deletions

pypdf/_reader.py

Lines changed: 10 additions & 3 deletions
@@ -30,9 +30,10 @@
 import os
 import re
 import struct
+import weakref
 import zlib
 from datetime import datetime
-from io import BytesIO, UnsupportedOperation
+from io import BytesIO, FileIO, UnsupportedOperation
 from pathlib import Path
 from typing import (
     Any,
@@ -310,9 +311,11 @@ def __init__(
                 "It may not be read correctly.",
                 __name__,
             )
+
         if isinstance(stream, (str, Path)):
-            with open(stream, "rb") as fh:
-                stream = BytesIO(fh.read())
+            stream = FileIO(stream, "rb")
+            weakref.finalize(self, stream.close)
+
         self.read(stream)
         self.stream = stream
 
@@ -342,6 +345,10 @@ def __init__(
         elif password is not None:
             raise PdfReadError("Not encrypted file")
 
+    def close(self) -> None:
+        """Close the underlying file handle"""
+        self.stream.close()
+
     @property
     def root_object(self) -> DictionaryObject:
         """Provide access to "/Root". standardized with PdfWriter."""
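
To make the memory effect of the second hunk concrete, here is a rough sketch that compares the two ways of opening a file by name: copying it into a `BytesIO` (the removed lines) versus keeping a `FileIO` handle (the added lines). It does not use pypdf. The helper names `peak_bytes`, `eager`, and `lazy` and the 4 MiB file size are assumptions chosen for illustration, and the exact figures will vary by interpreter and platform.

```python
import os
import tempfile
import tracemalloc
from io import BytesIO, FileIO


def peak_bytes(open_stream, path: str) -> int:
    """Return the peak traced allocation while opening and reading 1 KiB."""
    tracemalloc.start()
    stream = open_stream(path)
    try:
        stream.read(1024)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        stream.close()
        tracemalloc.stop()
    return peak


def eager(path: str) -> BytesIO:
    # Behaviour of the removed lines: copy the whole file into memory.
    with open(path, "rb") as fh:
        return BytesIO(fh.read())


def lazy(path: str) -> FileIO:
    # Behaviour of the added lines: keep a raw handle on the file.
    return FileIO(path, "rb")


SIZE = 4 * 1024 * 1024  # illustrative file size (4 MiB)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * SIZE)
    path = tmp.name

eager_peak = peak_bytes(eager, path)
lazy_peak = peak_bytes(lazy, path)
os.remove(path)

print(f"eager peak: {eager_peak} bytes, lazy peak: {lazy_peak} bytes")
```

The eager variant has to allocate at least the full file size up front, while the lazy variant only allocates what each `read()` call returns. The trade-off is that the file must stay open for the lifetime of the reader, which is why the commit also adds `close()` and the `weakref.finalize()` hook.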
