Compare the actual etags instead of comparing md5sums
The server side currently takes a shortcut when calculating the value
of the `md5sum` field: it uses the Object Storage (S3) ETag value.
The ETag is usually the MD5 of the object. For multipart-uploaded files,
however, it is computed from the concatenation of the MD5s of the
uploaded parts.
Say you uploaded a 14MB file and your part size is 5MB.
Calculate 3 MD5 checksums corresponding to each part, i.e. the checksum
of the first 5MB, the second 5MB, and the last 4MB.
Then take the checksum of their concatenation.
Since MD5 checksums are hex representations of binary data, just make
sure you take the MD5 of the decoded binary concatenation, not of the
ASCII or UTF-8 encoded concatenation.
When that's done, add a hyphen and the number of parts to get the ETag.
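
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from this commit: the function name `s3_style_etag` is hypothetical, and it assumes (as the commit's own code does) that a file no larger than one part gets a plain MD5 ETag without a part-count suffix.

```python
import hashlib


def s3_style_etag(path, part_size=8 * 1024 * 1024):
    """Return an S3-style ETag for a local file (hypothetical helper)."""
    part_digests = []
    with open(path, "rb") as f:
        # Read the file one part at a time and keep each part's raw
        # (binary, not hex) MD5 digest.
        for part in iter(lambda: f.read(part_size), b""):
            part_digests.append(hashlib.md5(part).digest())

    if not part_digests:
        # Empty file: plain MD5 of zero bytes.
        return hashlib.md5(b"").hexdigest()
    if len(part_digests) == 1:
        # Assumption: a file that fits in one part was not uploaded
        # as multipart, so its ETag is the plain MD5.
        return part_digests[0].hex()

    # MD5 of the concatenated binary digests, then "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

For the 14MB file with 5MB parts described above, the result would end in `-3`. The value only matches the remote ETag if `part_size` equals the part size actually used for the upload.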
         show_progress (bool): Show progressbar in the console
-        remote_md5sum (str, optional): The md5sum of the remote file. If is None, the download of the file happens even if it already exists locally. Defaults to None.
+        remote_etag (str, optional): The ETag of the remote file. If is None, the download of the file happens even if it already exists locally. Defaults to `None`.
 
     Raises:
         NotImplementedError: Raised if unknown `download_type` is passed

+    """Calculate ETag as in Object Storage (S3) of a local file.
+
+    ETag is a MD5. But for the multipart uploaded files, the MD5 is computed from the concatenation of the MD5s of each uploaded part.
+
+    See the inspiration of this implementation here: https://stackoverflow.com/a/58239738/1226137
+
+    Args:
+        filename (str): the local filename
+        part_size (int): the size of the Object Storage part. Most Object Storages use 8MB. Defaults to 8*1024*1024.
+
+    Returns:
+        str: the calculated ETag value
+    """
+    with open(filename, "rb") as f:
+        file_size = os.fstat(f.fileno()).st_size
+
+        if file_size <= part_size:
+            BLOCKSIZE = 65536
+            hasher = hashlib.md5()
+
+            buf = f.read(BLOCKSIZE)
+            while len(buf) > 0:
+                hasher.update(buf)
+                buf = f.read(BLOCKSIZE)
+
+            return hasher.hexdigest()
+        else:
+            # Say you uploaded a 14MB file and your part size is 5MB.
+            # Calculate 3 MD5 checksums corresponding to each part, i.e. the checksum of the first 5MB, the second 5MB, and the last 4MB.
+            # Then take the checksum of their concatenation.
+            # Since MD5 checksums are hex representations of binary data, just make sure you take the MD5 of the decoded binary concatenation, not of the ASCII or UTF-8 encoded concatenation.
+            # When that's done, add a hyphen and the number of parts to get the ETag.
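
The `remote_etag` docstring in the first hunk implies a skip-if-unchanged check. One way that decision could look is sketched below. Every name here (`needs_download`, `etag_fn`) is hypothetical and not taken from this commit; `etag_fn` stands in for whatever function computes the local file's ETag. Stripping surrounding double quotes is an assumption, made because ETag values in HTTP headers are usually quoted.

```python
import os


def needs_download(local_path, remote_etag, etag_fn):
    """Decide whether the remote file has to be fetched (hypothetical helper).

    etag_fn is any callable that maps a local path to an ETag string.
    """
    if remote_etag is None:
        # No remote ETag known: download even if the file exists locally.
        return True
    if not os.path.exists(local_path):
        return True
    # Assumption: the remote value may arrive wrapped in double quotes.
    return etag_fn(local_path) != remote_etag.strip('"')
```

Passing the ETag function in as an argument keeps the comparison independent of how the local value is computed.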