forked from sbober/levitation-perl
This is a Perl port of scy's levitation. It reads MediaWiki dump files
revision by revision and writes a data stream to stdout suitable for
git fast-import.
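To give a rough idea of what such a stream looks like, here is a minimal sketch that emits one blob and one commit in git fast-import format. It is not taken from levitation-perl, whose actual output is richer and may differ in detail. The function name emit_revision, the committer identity, the timestamp and the page name are all hypothetical.

```shell
# Emit one blob plus one commit in git fast-import format.
# Arguments (all illustrative): file name, commit message, file content.
emit_revision() {
    name=$1
    msg=$2
    text=$3
    # Each "data" command needs the exact byte count of its payload.
    text_len=$(( $(printf '%s' "$text" | wc -c) ))
    msg_len=$(( $(printf '%s' "$msg" | wc -c) ))
    # The blob holds the page text and is labelled with mark :1.
    printf 'blob\nmark :1\ndata %d\n%s\n' "$text_len" "$text"
    # The commit refers back to the blob through that mark.
    printf 'commit refs/heads/master\n'
    printf 'committer Example <example@example.org> 1000000000 +0000\n'
    printf 'data %d\n%s\n' "$msg_len" "$msg"
    printf 'M 100644 :1 %s\n' "$name"
}

# Print the stream for a single made-up revision.
emit_revision Main_Page 'first revision' 'hello world'
```

Inside a git repository, piping that output into "git fast-import" should create the commit on the master branch. fast-import does not touch the working tree, so the file only appears after a checkout.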
The first 1000 pages of the German Wikipedia and all their revisions
(about 390,000) can be dumped in about 15 minutes on moderate
hardware.
INSTALL:
You need at least Perl 5.10. The Perl interpreter has to be compiled
with threads support.
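Both requirements can be checked from the shell before going further. This is a small sketch, not part of levitation-perl. The function name check_perl is made up, and it relies only on the core Config module and the $] version variable.

```shell
# Check that a perl binary is at least 5.10 and was built with threads.
# Optional argument: path to the perl binary (defaults to "perl" on PATH).
check_perl() {
    perl_bin=${1:-perl}
    "$perl_bin" -MConfig -e '
        die "Perl 5.10 or newer is required, found $]\n" if $] < 5.010;
        die "This perl was built without threads support\n"
            unless $Config{useithreads};
        print "perl $] with threads: ok\n";
    '
}

check_perl || echo "fix your Perl installation before continuing" >&2
```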
You also need a working C compiler for the inline SHA1 C function.
Although gcc 4.3 was specified by old versions of levitation-perl
(circa May 2010), gcc 4.5.2 also appears to work (at least for some
small wiki dumps).
You need zlib. Under Debian/Ubuntu, for example, the following should
install it:
sudo apt-get install zlib1g-dev
You need the following modules and their dependencies from CPAN:
- Regexp::Common
- Inline
- JSON::XS
- Compress::Raw::Zlib
- Carp::Assert
- Devel::Size
- CDB_File
- XML::Bare >= 0.44
- Deep::Hash::Utils
- File::Path (probably > 2.04; 2.08 is fine; it has to export make_path)
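To see which of these modules are still missing, something like the following sketch may help. The function name check_modules is made up. It only tests whether each module loads and does not verify the minimum versions noted above.

```shell
# Report which of the given modules the perl on PATH can load.
# Returns non-zero if at least one module is missing.
check_modules() {
    missing=0
    for mod in "$@"; do
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}

check_modules Regexp::Common Inline JSON::XS Compress::Raw::Zlib \
    Carp::Assert Devel::Size CDB_File XML::Bare Deep::Hash::Utils \
    File::Path || echo "install the modules marked MISSING" >&2
```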
Some Linux distributions already provide the first set (Regexp::Common
through Devel::Size). Under Debian/Ubuntu the following command should
set you up:
sudo apt-get install libregexp-common-perl \
libinline-perl libjson-xs-perl \
libcompress-raw-zlib-perl libcarp-assert-perl \
libdevel-size-perl
For Fedora, the equivalent is:
sudo yum install perl-Regexp-Common \
perl-Inline perl-JSON-XS \
perl-Compress-Raw-Zlib perl-Carp-Assert
For the second set (CDB_File, XML::Bare and Deep::Hash::Utils), you may
be able to just run:
sudo cpan -i CDB_File
sudo cpan -i XML::Bare
sudo cpan -i Deep::Hash::Utils
USAGE:
First, initialize a git repository:
cd /tmp
mkdir blawiki
cd blawiki
git init
(Alternatively, change to the working directory of an existing git
repository into which you want to import the dump, such as one from a
previous run of levitation-perl.)
Then, "levitate". This is a three-step process:
cat /path/to/blawiki-dump.xml | /path/to/levitation-perl/step1.pl
LC_ALL=C sort rev-table.txt > rev-sorted.txt
/path/to/levitation-perl/step2.pl | /path/to/levitation-perl/gfi.pl
Alternatively, you can just change to an empty directory and call the
"levitate" helper script with a path to a dump as parameter (may be
7z, bz2, gz or xml):
mkdir /tmp/blawiki
cd /tmp/blawiki
/path/to/levitation-perl/levitate /path/to/blawiki-dump...
Lots of progress information is printed to standard error, so it may be
best to redirect that to a file.
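One way to do that is a small wrapper that sends standard error to a log file and leaves standard output alone. This is only a sketch. The function name run_logged and the log file name levitate.log are made up.

```shell
# Run a command with its standard error redirected to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    "$@" 2> "$log"
}

# Example (the path to the dump is a placeholder):
# run_logged levitate.log /path/to/levitation-perl/levitate /path/to/dump
```

In another terminal, "tail -f levitate.log" then shows the progress while the import runs.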
Have fun.