How should we handle playback of redirects to the web archive itself? #591

Open
@anjackson

Expected behavior

We've archived this page in the past: http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx

The 2008 copy works fine, but the live page has since been replaced with a redirect to us, the UK Web Archive: https://www.webarchive.org.uk/wayback/archive/20140613220103/http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx

And since then, we've archived the redirect, so now the archive points at itself. This ends with a blank page (at least when using a more recent pywb, here: https://beta.webarchive.org.uk/wayback/archive/20140613220103mp_/http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx)

It should ideally somehow know those are self-redirects and drop them, rolling back to the 2008 version: http://beta.webarchive.org.uk/wayback/archive/cdx?url=http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx
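
A minimal sketch of that rollback, not pywb's actual API. It assumes each capture is available as a dict with `timestamp`, `status`, and a `location` field holding the redirect target (a standard CDX line may not carry the target, so it might have to be read from the WARC record). `ARCHIVE_HOSTS`, `is_self_redirect`, and `pick_capture` are hypothetical names, and the 2008 timestamp in the usage lines is made up for illustration.

```python
from urllib.parse import urlsplit

# Assumption: the hostnames that serve this archive's own playback.
ARCHIVE_HOSTS = {"www.webarchive.org.uk", "beta.webarchive.org.uk"}


def is_self_redirect(capture):
    """True if the capture is a 3xx response whose Location targets the archive."""
    status = str(capture.get("status", ""))
    location = capture.get("location") or ""
    if not status.startswith("3") or not location:
        return False
    host = (urlsplit(location).hostname or "").lower()
    return host in ARCHIVE_HOSTS


def pick_capture(captures, requested_ts):
    """Return the latest usable capture at or before requested_ts.

    Self-redirects are dropped first. If nothing usable precedes the
    requested timestamp, fall back to the earliest usable capture.
    Timestamps are 14-digit strings, so string comparison orders them.
    """
    usable = sorted(
        (c for c in captures if not is_self_redirect(c)),
        key=lambda c: c["timestamp"],
    )
    if not usable:
        return None
    earlier = [c for c in usable if c["timestamp"] <= requested_ts]
    return earlier[-1] if earlier else usable[0]


# Usage with illustrative data (the 2008 timestamp is invented):
captures = [
    {"timestamp": "20080101000000", "status": "200", "location": None},
    {
        "timestamp": "20140613220103",
        "status": "301",
        "location": "https://www.webarchive.org.uk/wayback/archive/"
        "20140613220103/http://www.jisc.ac.uk/",
    },
]
chosen = pick_capture(captures, "20140613220103")
```

With this data, `chosen` is the 2008 capture rather than the 2014 self-redirect.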


EDIT to try to make clear what's going on: _The actual WARC response record has a Location header that points back to us, the UK Web Archive. That is, we indexed a redirect to ourselves: they put in redirects to us, but we kept archiving their pages._

Really, I guess we don't want to index responses that point to any web archive, so perhaps this is an indexing problem not a playback problem?
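
If this were handled at indexing time, the check might look roughly like the sketch below. It is not taken from pywb or any existing indexer. It assumes the indexer can see the record's target URI, HTTP status, and response headers as a plain dict. `KNOWN_ARCHIVE_HOSTS`, `points_to_archive`, and `should_index` are hypothetical names, and the host list is only illustrative.

```python
from urllib.parse import urljoin, urlsplit

# Illustrative list only; a real deployment would maintain its own.
KNOWN_ARCHIVE_HOSTS = ("webarchive.org.uk", "web.archive.org")


def points_to_archive(target_uri, location, archive_hosts=KNOWN_ARCHIVE_HOSTS):
    """True if the Location value resolves to a known web archive host."""
    # Resolve relative Location values against the record's target URI.
    absolute = urljoin(target_uri, location)
    host = (urlsplit(absolute).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in archive_hosts)


def should_index(target_uri, status, headers):
    """Return False for 3xx responses that redirect into a web archive."""
    if not 300 <= int(status) < 400:
        return True
    # HTTP header names are case-insensitive.
    location = next(
        (v for k, v in headers.items() if k.lower() == "location"), None
    )
    if not location:
        return True
    return not points_to_archive(target_uri, location)
```

Matching on host suffix means both `www.webarchive.org.uk` and `beta.webarchive.org.uk` are caught by one entry. Ordinary redirects within the original site are still indexed.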


What actually happened

Blank page instead of 2008 instance.

Browser

All.
