Hi, we are using the cdx-indexer tool and found that when replaying our WACZ files in the ReplayWeb.page player, certain resources were sometimes not found even though they were present in the WARC files.
It turned out that our WARC files contain CONNECT requests, and the URLs of these records are not converted to a SURT. For example, url=distillery.wistia.com:443 is still distillery.wistia.com:443 after the surt.surt(url) call. The ReplayWeb.page player checks whether index.idx uses SURT keys with useSurt = prefix.indexOf(")/") > 0; in MultiWacz.js. If the last line of a block happens to be a CONNECT entry, that block is treated as surt = false in the CDX. Querying the browser DB with the upperBound method then does not work properly.
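To illustrate why a CONNECT entry trips the check, here is a minimal Python sketch that mirrors the JavaScript expression quoted above. The function name looks_like_surt and the two sample keys are only for illustration; this is not the player's actual code.

```python
def looks_like_surt(prefix: str) -> bool:
    """Python equivalent of the JavaScript check prefix.indexOf(")/") > 0."""
    # str.find returns -1 when the substring is absent, like indexOf in JavaScript.
    return prefix.find(")/") > 0


# A key in SURT form, as in the expected index line of this issue.
surt_key = "com,wistia,distillery)/ 20220914144501"
# A key produced from a CONNECT record, which keeps its host:port form.
connect_key = "distillery.wistia.com:443 20220914144501"

print(looks_like_surt(surt_key))     # True
print(looks_like_surt(connect_key))  # False
```

The CONNECT key contains no ")/" at all, so a block whose checked line is a CONNECT entry is classified as non-SURT.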
Given:
A WARC file with:
When running the cdxj_indexer with the following parameters:
main.py -p -o index.idx -c index.cdx.gz -s -d -l 1024 small.warc
Then the result in the index is:
Expected:
com,wistia,distillery)/ 20220914144501 {"offset": 0, "length": 377, "digest": "sha256:b75ede157ec02f31a25126270771b287d1ccc42554c9678ebc2c1446249a554d"}
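As a rough sketch of the conversion that would yield the expected key, the hypothetical helper below turns a CONNECT target of the form host:port into a SURT-style key. It is not part of the surt library or of cdxj_indexer, and dropping the port is an assumption taken from the expected line above.

```python
def connect_target_to_surt(target: str) -> str:
    """Hypothetical helper: convert a CONNECT target (host:port) to a SURT-style key."""
    host, sep, port = target.rpartition(":")
    if not sep or not port.isdigit():
        # No numeric port present, so treat the whole string as the host.
        host = target
    # Reverse the host labels and join them with commas, as SURT does for hosts.
    labels = host.lower().split(".")
    # Assumption: the port is dropped, matching the expected index line in this issue.
    return ",".join(reversed(labels)) + ")/"


print(connect_target_to_surt("distillery.wistia.com:443"))  # com,wistia,distillery)/
```

A real fix would probably belong in the indexer or in the SURT conversion itself; this only shows the shape of the key that the player's check expects.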