Python 3 support for dumpgenerator.py #331
Thanks for this patch :) It worked for me, too bad upstream is inactive... |
Doron Behar, 15/10/19 16:49:
Thanks for this patch :) It worked for me, too bad upstream is inactive...
Thank you for testing! Can you also test whether it makes some of the
Unicode bugs better or worse?
As long as the tests are broken I avoid merging things until the next
time I'm actively using dumpgenerator, but the bug reports offer plenty
of test cases. :)
|
I think your CI checks fail because of the Python version in Travis. And I'm not sure which Unicode bugs you are referring to. |
Now I see: I tried to resume a previous download session and the
@@ -1395,12 +1397,12 @@ def domain2prefix(config={}, session=None):
 def loadConfig(config={}, configfilename=''):
     """ Load config file """
-    try:
-        with open('%s/%s' % (config['path'], configfilename), 'r') as infile:
-            config = pickle.load(infile)
-    except:
-        print ('There is no config file. we can\'t resume. Start a new dump.')
-        sys.exit()
+    # try:
+    with open('%s/%s' % (config['path'], configfilename), 'r') as infile:
+        config = pickle.load(infile)
+    # except:
+    #     print ('There is no config file. we can\'t resume. Start a new dump.')
+    #     sys.exit()
     return config
And I got this error:
It took me a while to trace it down, naturally because a "catch all" |
This QA says to use |
Doron Behar, 15/10/19 19:01:
See this QA
Yes, clearly it's not ideal to catch all exceptions. It's just one of
many hacky shortcuts taken to be able to finish running dumpgenerator on
tens of thousands of wikis (<https://archive.org/details/wikiteam>). We
need help to fix, and most importantly test, the underlying issues on
thousands of wikis.
|
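For illustration, here is a minimal sketch of what a narrower handler could look like, so that the real cause of a failed resume is printed instead of hidden. It is not the code from this patch: `load_config` is a hypothetical stand-in for `loadConfig`, the choice of exceptions is an assumption, and opening the file in binary mode is also an assumption based on the pickle problems mentioned in this thread.

```python
import pickle
import sys


def load_config(config, configfilename):
    """Load a pickled config file, reporting why loading failed.

    Hypothetical variant of loadConfig: only the expected failures are caught.
    """
    path = '%s/%s' % (config['path'], configfilename)
    try:
        # Pickle data is bytes, so open in binary mode (works on Python 2 and 3).
        with open(path, 'rb') as infile:
            return pickle.load(infile)
    except IOError as e:
        # Missing or unreadable file; on Python 3, IOError is an alias of OSError.
        print('There is no config file (%s). We can\'t resume. Start a new dump.' % e)
        sys.exit(1)
    except (pickle.UnpicklingError, EOFError) as e:
        # The file exists but is truncated or is not valid pickle data.
        print('The config file could not be read (%s).' % e)
        sys.exit(1)
```

Any other exception, such as a `TypeError` from reading pickle data in text mode, would then propagate with a full traceback rather than being reported as a missing config file.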
I've started testing this, but it's a can of worms. We need to test various kinds of inputs, but a lot of failures surface even with a single wiki, on a single launch or XML/image resumption attempt. Also, wikitools and reverse_readlines don't like Python 3, while pickle doesn't like strings. Hmpf. I'm using Python 3.7.6, by the way. And yes, some files need to be opened in binary mode given the way this was written, and there are some errors from concatenating bytes with non-bytes. I'm not entirely sure what your intention was. |
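The two Python 3 issues named above (pickle wanting bytes, and bytes being concatenated with text) can be sketched in isolation as follows. This is only an illustration under assumptions, not code from dumpgenerator.py: the helper names `roundtrip_config` and `join_text` and the sample values are made up.

```python
import io
import pickle


def roundtrip_config(config):
    """Pickle and unpickle a dict through a binary stream."""
    buf = io.BytesIO()  # stands in for a file opened with 'wb' / 'rb'
    # Protocol 2 is the highest protocol that Python 2 can also read.
    pickle.dump(config, buf, protocol=2)
    buf.seek(0)
    return pickle.load(buf)


def join_text(raw, text, encoding='utf-8'):
    """Concatenate bytes with text by decoding the bytes explicitly first."""
    # On Python 3, bytes + str raises TypeError, so the conversion must be explicit.
    return raw.decode(encoding) + text
```

For example, `join_text(b'<title>', u'Caf\u00e9')` returns a text string, whereas `b'<title>' + u'Caf\u00e9'` raises `TypeError` on Python 3.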
On the other hand, this rather simplistic change mostly works for me: nemobis@bcecfa2 |
This should add python 3 support to dumpgenerator.py without breaking python 2 behavior.