@@ -7,7 +7,8 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
## Dependencies

- [`lxml`](http://lxml.de/installation.html)
- - [`psychopg2`](http://initd.org/psycopg/docs/install.html)
+ - [`psycopg2`](http://initd.org/psycopg/docs/install.html)
+ - [`libarchive-c`](https://pypi.org/project/libarchive-c/)

## Usage
@@ -18,14 +19,14 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
`Badges.xml`, `Votes.xml`, `Posts.xml`, `Users.xml`, `Tags.xml`.
- In some old dumps, the filenames are cased differently.
- Execute in the current folder (in parallel, if desired):
- - `python load_into_pg.py Badges`
- - `python load_into_pg.py Posts`
- - `python load_into_pg.py Tags` (not present in earliest dumps)
- - `python load_into_pg.py Users`
- - `python load_into_pg.py Votes`
- - `python load_into_pg.py PostLinks`
- - `python load_into_pg.py PostHistory`
- - `python load_into_pg.py Comments`
+ - `python load_into_pg.py -t Badges`
+ - `python load_into_pg.py -t Posts`
+ - `python load_into_pg.py -t Tags` (not present in earliest dumps)
+ - `python load_into_pg.py -t Users`
+ - `python load_into_pg.py -t Votes`
+ - `python load_into_pg.py -t PostLinks`
+ - `python load_into_pg.py -t PostHistory`
+ - `python load_into_pg.py -t Comments`
- Finally, after all the initial tables have been created:
- `psql stackoverflow < ./sql/final_post.sql`
- If you used a different database name, make sure to use that instead of
@@ -34,7 +35,25 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
- `psql stackoverflow < ./sql/optional_post.sql`
- Again, remember to use the correct database name here, if not `stackoverflow`.
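The loading steps above can also be scripted. The sketch below is a hypothetical helper, not part of this repository. It assumes that `load_into_pg.py` sits in the current folder and accepts the `-t` switch shown in the list, and that `psql` is on the `PATH`. It starts the per-table loads in parallel and runs `sql/final_post.sql` only after every load has succeeded.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Table names from the usage steps; drop "Tags" for the earliest dumps.
TABLES = ("Badges", "Posts", "Tags", "Users", "Votes",
          "PostLinks", "PostHistory", "Comments")


def load_command(table):
    """Build the per-table command (assumes the -t switch from the usage steps)."""
    # sys.executable reuses the interpreter running this helper instead of "python".
    return [sys.executable, "load_into_pg.py", "-t", table]


def run_load(table):
    """Run one table load and return (table, exit code)."""
    result = subprocess.run(load_command(table))
    return table, result.returncode


def load_all(tables=TABLES, dbname="stackoverflow", workers=4):
    """Load the tables in parallel, then run the final SQL script once."""
    # Threads suffice: each worker only waits on its child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_load, tables))
    failed = [table for table, code in results if code != 0]
    if failed:
        raise RuntimeError(f"load failed for: {', '.join(failed)}")
    # Only after every initial table exists: psql <dbname> < ./sql/final_post.sql
    with open("./sql/final_post.sql") as sql:
        subprocess.run(["psql", dbname], stdin=sql, check=True)
```

Call `load_all()` from the repository folder. Pass a `tables` tuple without `"Tags"` for the earliest dumps, and pass `dbname` if you did not use `stackoverflow`.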
- ## Caveats
+ ## Loading a complete StackExchange project
+
+ You can use the script to download a given StackExchange compressed file from
+ [archive.org](https://ia800107.us.archive.org/27/items/stackexchange/) and load
+ all the tables at once, using the `-s` switch.
+
+ You will need the `urllib` module (part of the Python standard library) and the
+ `libarchive` module (installed by the `libarchive-c` package).
+
+ If you give a schema name using the `-n` switch, all the tables will be moved
+ to the given schema. The script creates this schema.
+
+ To load the _dba.stackexchange.com_ project into the `dba` schema, you would execute:
+ `./load_into_pg.py -s dba -n dba`
+
+ The final scripts `sql/final_post.sql` and `sql/optional_post.sql` are not
+ adapted to the schema; they refer to the tables without a schema prefix. To run
+ them, first set the _search_path_ to your schema name: `SET search_path TO <myschema>;`
+
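Instead of typing `SET search_path` in an interactive session, the search path can be set for a whole `psql` run through libpq's `PGOPTIONS` environment variable. The helper names below are hypothetical. The sketch assumes a plain schema name (no spaces or quotes), `psql` on the `PATH`, and the README's default `stackoverflow` database name.

```python
import os
import subprocess


def search_path_env(schema, base=None):
    """Return a copy of the environment whose libpq sessions start with search_path=schema."""
    env = dict(os.environ if base is None else base)
    # libpq reads PGOPTIONS when it connects; this replaces any existing value.
    # Assumes a simple schema name: spaces would need backslash-escaping here.
    env["PGOPTIONS"] = f"-c search_path={schema}"
    return env


def run_post_scripts(schema, dbname="stackoverflow",
                     scripts=("./sql/final_post.sql",)):
    """Run each SQL file through psql with the schema on the search path."""
    env = search_path_env(schema)
    for path in scripts:
        with open(path) as sql:
            # Shell equivalent: PGOPTIONS='-c search_path=<schema>' psql <dbname> < <path>
            subprocess.run(["psql", dbname], stdin=sql, env=env, check=True)
```

For the `dba` example, `run_post_scripts("dba")` runs the final script; add `"./sql/optional_post.sql"` to `scripts` to run the optional one as well.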
+ ## Caveats and TODOs

- It prepares some indexes and views that may not be necessary for your analysis.
- The `Body` field in the `Posts` table is NOT populated by default. You have to use the `--with-post-body` argument to include it.