
[Bug]: Cannot properly save pages from web.archive.org - partial saves and blank page rendering #294


Closed
DevScholar opened this issue Apr 14, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@DevScholar

ArchiveWeb.page Version

v0.14.2

What did you expect to happen? What happened instead?

Description:
When using the extension to save pages from web.archive.org, I'm encountering several issues:

  1. Partial Saves: Only the initially loaded pages are saved.

  2. Blank Page Rendering:

    • When viewing saved pages through the extension's interface, the content briefly appears before turning completely blank (attached video: video_20250414_163526_edit.mp4).
    • Even when bypassing the iframe to view saved pages directly, images fail to display despite being present in the WACZ file. The image displays correctly when opened in a new tab.


Step-by-step reproduction instructions

Steps to Reproduce:

  1. Visit any archived page on web.archive.org (e.g., https://web.archive.org/web/19961020014044/http://www.microsoft.com/)
  2. Use archiveweb.page extension to save the page
  3. Attempt to view the saved content:
    • Observe momentary display followed by blank screen in extension's viewer
    • Check WACZ file directly - images exist but won't render

Additional details

Environment:

  • Browser: Kiwi Browser 137.0.7337.0 on Android 15

DevScholar added the bug label on Apr 14, 2025

@tw4l
Member

tw4l commented Apr 21, 2025

Hi @DevScholar, yes, our tools will largely have issues with archiving and replaying the replay of other web archives. This is a niche use case and not one that we explicitly try to support. It's especially tricky because page content and resources such as JavaScript are rewritten during replay of web archives.

@DevScholar
Author

I think this happens because the scripts of web archive websites intercept resources' URLs and use a hardcoded mechanism to rewrite them into new URLs containing the archive site's own domain. Additionally, this extension does not override properties like document.location or handle resource loading in a special way. The archived webpage itself appears to treat the extension's URL as the archive site's URL, resulting in the generation of invalid new URLs.
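
To make the suspected mechanism concrete, here is a minimal Python sketch of a rewriter that hardcodes the archive prefix. It is a toy model under stated assumptions, not the archive site's or the extension's actual code: the function names `origin_of` and `rewrite`, the resource path `/images/logo.gif`, and the `chrome-extension://hypothetical-id/...` viewer URL are all made up for illustration.

```python
# Toy model only: NOT the archive's or the extension's real rewriting code.
# The prefix is taken from the example URL in the reproduction steps.
ARCHIVE_PREFIX = "https://web.archive.org/web/19961020014044/"


def origin_of(url: str) -> str:
    """Return scheme://host for an absolute URL (minimal, no validation)."""
    scheme, rest = url.split("://", 1)
    host = rest.split("/", 1)[0]
    return f"{scheme}://{host}"


def rewrite(resource_path: str, page_location: str) -> str:
    """Rewrite a root-relative resource path the way a hardcoded rewriter might.

    The rewriter expects page_location to start with ARCHIVE_PREFIX. If it
    does not (for example, inside another replay viewer), the whole location
    is mistaken for the original page URL.
    """
    if page_location.startswith(ARCHIVE_PREFIX):
        original_page = page_location[len(ARCHIVE_PREFIX):]
    else:
        # Assumption broken: the viewer's own URL is treated as the original.
        original_page = page_location
    return ARCHIVE_PREFIX + origin_of(original_page) + resource_path


# Page served by the archive itself: the rewritten URL is valid.
on_archive = rewrite("/images/logo.gif", ARCHIVE_PREFIX + "http://www.microsoft.com/")

# Same page replayed from a hypothetical extension URL: the result is invalid.
in_viewer = rewrite("/images/logo.gif", "chrome-extension://hypothetical-id/replay/index.html")

print(on_archive)
print(in_viewer)
```

In this model the second call produces an archive URL that wraps the viewer's own origin, which the archive site cannot serve. That would be consistent with images existing in the WACZ file but failing to render during replay.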

As a side note, the extension also does not override the navigator.onLine property, which can cause minor behavioral differences in some webpages—though this is unrelated to the current issue.

DevScholar closed this as not planned on Apr 22, 2025