download and load a complete stackexchange project
Using the '-s' switch, download the compressed file from _https://ia800107.us.archive.org/27/items/stackexchange/_, then uncompress it and load all the files into the database. Add a '-n' switch to move the tables to a given schema.
WARNING: the script now uses the urllib.request module, so it must be run with python3.
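The download step behind '-s' can be illustrated with a minimal sketch. It assumes the archive for a project is named `<project>.stackexchange.com.7z` under the base URL above; that naming, and the helper names `archive_url` and `download_archive`, are illustrative assumptions rather than the script's actual code.

```python
import urllib.request
from urllib.parse import urljoin

# Base URL quoted in the commit message.
ARCHIVE_BASE = "https://ia800107.us.archive.org/27/items/stackexchange/"


def archive_url(project, base=ARCHIVE_BASE):
    """Build the download URL for a project.

    ASSUMPTION: archives are named <project>.stackexchange.com.7z.
    """
    return urljoin(base, "{}.stackexchange.com.7z".format(project))


def download_archive(project, target_path):
    """Fetch the compressed dump to target_path (hypothetical helper).

    urllib.request exists only in Python 3, which is why the script
    has to run under python3.
    """
    urllib.request.urlretrieve(archive_url(project), target_path)
    return target_path
```

Calling `download_archive("dba", "dba.7z")` would fetch the dump for _dba.stackexchange.com_; uncompressing it and loading the tables are separate steps not shown here.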
- In some old dumps, the cases in the filenames are different.
 - Execute in the current folder (in parallel, if desired):
-  - `python load_into_pg.py Badges`
-  - `python load_into_pg.py Posts`
-  - `python load_into_pg.py Tags` (not present in earliest dumps)
-  - `python load_into_pg.py Users`
-  - `python load_into_pg.py Votes`
-  - `python load_into_pg.py PostLinks`
-  - `python load_into_pg.py PostHistory`
-  - `python load_into_pg.py Comments`
+  - `python load_into_pg.py -t Badges`
+  - `python load_into_pg.py -t Posts`
+  - `python load_into_pg.py -t Tags` (not present in earliest dumps)
+  - `python load_into_pg.py -t Users`
+  - `python load_into_pg.py -t Votes`
+  - `python load_into_pg.py -t PostLinks`
+  - `python load_into_pg.py -t PostHistory`
+  - `python load_into_pg.py -t Comments`
 - Finally, after all the initial tables have been created:
   - `psql stackoverflow < ./sql/final_post.sql`
 - If you used a different database name, make sure to use that instead of
@@ -34,6 +34,20 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
   - `psql stackoverflow < ./sql/optional_post.sql`
 - Again, remember to use the correct database name here, if not `stackoverflow`.
 
+## Loading a complete stackexchange project
+
+You can use the script to download a given stackexchange compressed file from [archive.org](https://ia800107.us.archive.org/27/items/stackexchange/) and load all the tables at once, using the `-s` switch.
+
+You will need the `urllib` and `libarchive` modules.
+
+If you give a schema name using the `-n` switch, all the tables will be moved to the given schema. This schema will be created in the script.
+
+To load the _dba.stackexchange.com_ project in the `dba` schema, you would execute:
+`./load_into_pg.py -s dba -n dba`
+
+The paths are not changed in the final scripts `sql/final_post.sql` and `sql/optional_post.sql`. To run them, first set the _search_path_ to your schema name:
+`SET search_path TO <myschema>;`
+
 ## Caveats and TODOs
 
 - It prepares some indexes and views which may not be necessary for your analysis.
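
For the `-n` switch described in the added README section, the schema handling could look roughly like the sketch below. It only builds SQL strings. The table list is taken from the per-table load commands in the README, and the helper names `schema_move_statements` and `search_path_statement` are hypothetical, not functions of `load_into_pg.py`.

```python
# Table names taken from the README's per-table load commands.
TABLES = ["Badges", "Posts", "Tags", "Users", "Votes",
          "PostLinks", "PostHistory", "Comments"]


def schema_move_statements(schema, tables=TABLES):
    """Return SQL that creates the schema and moves each table into it.

    The schema name is interpolated unquoted, so it must be a trusted,
    plain identifier (fine for a sketch, not for untrusted input).
    """
    statements = ["CREATE SCHEMA IF NOT EXISTS {};".format(schema)]
    statements += ["ALTER TABLE {} SET SCHEMA {};".format(table, schema)
                   for table in tables]
    return statements


def search_path_statement(schema):
    """Statement to run before sql/final_post.sql and sql/optional_post.sql."""
    return "SET search_path TO {};".format(schema)


if __name__ == "__main__":
    for statement in schema_move_statements("dba"):
        print(statement)
    print(search_path_statement("dba"))
```

Because the post scripts use unqualified table names, setting `search_path` first lets them resolve to the tables in the chosen schema.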