
Commit 846e9ab

Merge pull request #1008 from dondi/beta
v6.0.4
2 parents 4105a20 + ca79174 commit 846e9ab

32 files changed (+1711 −788 lines)

.eslintrc.yml (+2)

@@ -44,6 +44,8 @@ rules:
     - error
   brace-style:
     - error
+    - 1tbs
+    - allowSingleLine: true
   comma-spacing:
     - error
   max-len:

.gitignore (+7)

@@ -1,3 +1,7 @@
+# dotenv environment variables file
+.env
+.env.test
+
 lib-cov
 *.seed
 *.log
@@ -12,13 +16,16 @@ lib-cov
 documents/developer_documents/testing_script_generator/GRNsightTestingDocument.pdf
 web-client/public/js/grnsight.min.js

+
 pids
 logs
 results
 /.idea

 database/network-database/script-results
 database/network-database/source-files
+database/expression-database/script-results
+database/expression-database/source-files

 npm-debug.log
 node_modules

database/README.md (+87 −1)

# GRNsight Database

Here are the files pertaining to both the network and expression databases. Look within the README.md files of both folders for information pertinent to the schema that you intend to be using.

## Setting up a local postgres GRNsight Database

1. Installing PostgreSQL on your computer
   - MacOS and Windows users can follow these [instructions](https://dondi.lmu.build/share/db/postgresql-setup-day.pdf) on how to install PostgreSQL.
     - Step 1 tells you how to install PostgreSQL on your local machine, initialize a database, and start and stop your database instance.
     - If your terminal emits a message that looks like `initdb --locale=C -E UTF-8 location-of-cluster` from Step 1B, then your installer has initialized a database for you.
     - Additionally, your installer may start the server for you upon installation. To start the server yourself, run `pg_ctl start -D location-of-cluster`. To stop the server, run `pg_ctl stop -D location-of-cluster`.
   - Linux users
     - The MacOS and Windows instructions will _probably_ not work for you. You can try them at your own risk.
     - Linux users can try these [instructions](https://www.geeksforgeeks.org/install-postgresql-on-linux/), which should work for you (...maybe...). If they don't, try searching for instructions specific to your operating system. Sorry!
2. Loading data to your database
   1. Adding the schemas to your database.
      1. Go into your database using the following command:

         ```
         psql postgresql://localhost/postgres
         ```

         From there, create the schemas using the following commands:

         ```
         CREATE SCHEMA spring2022_network;
         ```

         ```
         CREATE SCHEMA fall2021;
         ```

         Once they are created, you can exit your database using the command `\q`.
      2. Once your schemas are created, you can add the table specifications using the following commands:

         ```
         psql postgresql://localhost/postgres -f <path to GRNsight/database/network-database>/schema.sql
         ```

         ```
         psql postgresql://localhost/postgres -f <path to GRNsight/database/expression-database>/schema.sql
         ```

         Your database is now ready to accept expression and network data!

   2. Loading the GRNsight Network Data to your local database
      1. GRNsight generates network data from SGD through YeastMine. In order to run the script that generates these network files, you must `pip3 install` the dependencies it uses. If you get an error saying that a module doesn't exist, just run `pip3 install <Module Name>` and it should fix the error. If the error persists and is found in a specific file on your machine, you might have to manually go into that file and alter the naming conventions of the dependencies that are used. _Note: So far this issue has only occurred on Ubuntu 22.04.1, so you might be lucky and not have to do it!_

         ```
         pip3 install pandas requests intermine tzlocal
         ```

         Once the dependencies have been installed, you can run

         ```
         python3 <path to GRNsight/database/network-database/scripts>/generate_network.py
         ```

         This will take a while to fetch all of the network data and generate all of the files. It will create a folder full of the processed files in `database/network-database/script-results`.

      2. Load the processed files into your database.

         ```
         python3 <path to GRNsight/database/network-database/scripts>/loader.py | psql postgresql://localhost/postgres
         ```

         This should output a series of COPY statements to your terminal. Once complete, your database is loaded with the network data.

   3. Loading the GRNsight Expression Data to your local database
      1. Create a directory (aka folder) in the `database/expression-database` folder called `source-files`.

         ```
         mkdir <path to GRNsight/database/expression-database>/source-files
         ```

      2. Download the _"Expression 2020"_ folder from Box, located in `GRNsight > GRNsight Expression > Expression 2020`, to your newly created `source-files` folder.
      3. Run the pre-processing script on the data. This will create a folder full of the processed files in `database/expression-database/script-results`.

         ```
         python3 <path to GRNsight/database/expression-database/scripts>/preprocessing.py
         ```

      4. Load the processed files into your database.

         ```
         python3 <path to GRNsight/database/expression-database/scripts>/loader.py | psql postgresql://localhost/postgres
         ```

         This should output a series of COPY statements to your terminal. Once complete, your database is loaded with the expression data. A quick way to sanity-check both loads is sketched below.
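Optionally, you can verify the loads from Python. This is a minimal sketch, assuming the `psycopg2` package (not part of the GRNsight toolchain; install it separately, e.g. `pip3 install psycopg2-binary`) and the `fall2021` table names defined in `database/expression-database/schema.sql`:

```
import psycopg2  # assumed extra dependency, used only for this check

conn = psycopg2.connect("postgresql://localhost/postgres")
with conn, conn.cursor() as cur:
    # Count rows in a couple of the fall2021 tables to confirm the load ran.
    for table in ("fall2021.gene", "fall2021.expression"):
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        print(table, cur.fetchone()[0], "rows")
conn.close()
```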
database/expression-database/README.md (+60)

# Expression Database

All files pertaining to the expression database live within this directory.

## The basics

#### Schema

All expression data is stored within the fall2021 schema on our Postgres database.

The schema is located within this directory at the top level, in the file `schema.sql`. It defines the tables located within the fall2021 schema.

Usage:

To load to a local database:
```
psql postgresql://localhost/postgres -f schema.sql
```
To load to the production database:
```
psql <address to database> -f schema.sql
```

### Scripts

All scripts live within the subdirectory `scripts`, located at the top level of the expression database directory.

Any source files required to run the scripts live within the subdirectory `source-files`, located at the top level of the expression database directory. As source files may be large, you must create this directory yourself and add any source files you need to use there.

All generated results of the scripts live in the subdirectory `script-results`, located at the top level of the expression database directory. Currently, all scripts that generate output create this directory if it does not already exist. When adding a new script that generates output, best practice is to create the `script-results` directory and any subdirectories if they do not exist, in order to prevent errors and snafus for freshly cloned repositories; a sketch of this pattern follows.
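A minimal sketch of that directory-creation pattern (the path below is illustrative, not taken from the existing scripts):

```
import os

# Create the output directory (and any parents) if it is missing; exist_ok=True
# makes this a no-op on repeat runs and on freshly cloned repositories.
output_dir = os.path.join("script-results", "processed-expression")
os.makedirs(output_dir, exist_ok=True)
```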
Within the scripts directory, there are the following files:

- `preprocessing.py`
- `loader.py`

#### Data Preprocessor(s)

*Note: Data preprocessing is always specific to each dataset that you obtain. `preprocessing.py` is capable of preprocessing the specific Expression data files located in `source-files/Expression 2020`. Because these files are too large to be stored on GitHub, access the direct source files on Box and move them into this directory. If more data sources are to be added to the database, create a new directory in `source-files` for each one, note it in this `README.md` file, and create a new preprocessing script for that data source (if required). Please document the changes in this section so that future developers may use your work to recreate the database if ever required.*

* The script (`preprocessing.py`) is used to preprocess the data in `source-files/Expression 2020`. It parses through each file to construct the processed loader files, so that they are ready to load using `loader.py` (a simplified illustration of this kind of reshaping appears after the usage example below). Please read through the code, as there are instructions on what to add within the comments. Good luck!
* The resulting processed expression files are located in `script-results/processed-expression`, and the resulting processed loader files are located within `script-results/processed-loader-files`.

Usage:
```
python3 preprocessing.py
```
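Because the real source files live on Box and their exact layout is not documented here, the following is only a hypothetical illustration of the wide-to-long reshaping a preprocessing step like this typically performs; the file names and column names are made up, and only `pandas` (already in the dependency list above) is assumed:

```
import pandas as pd

# Hypothetical raw file: one row per gene, one column per sample/timepoint.
raw = pd.read_csv("source-files/Expression 2020/example_dataset.csv")

# Reshape to long form: one (gene, sample, expression value) row per measurement.
long_form = raw.melt(id_vars=["gene_id"], var_name="sample_id", value_name="expression")

# Write the processed file into the script-results area (illustrative path).
long_form.to_csv("script-results/processed-expression/example_dataset_processed.csv", index=False)
```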
#### Database Loader

This script (`loader.py`) is to be used to load your preprocessed expression data into the database.

This program generates direct SQL statements from the source files generated by the data preprocessor in order to populate a relational database with those files' data. (The overall print-SQL-and-pipe-to-psql pattern is sketched after the usage examples below.)

Usage:

To load to a local database:
```
python3 loader.py | psql postgresql://localhost/postgres
```
To load to the production database:
```
python3 loader.py | psql <path to database>
```
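The piping design above (the script prints SQL, psql executes it) can be sketched roughly as follows. This is not the actual `loader.py`; the file path is hypothetical, and the column list is borrowed from the `fall2021.expression` table in `schema.sql`:

```
import csv

# Hypothetical processed loader file produced by the preprocessing step.
PROCESSED_FILE = "script-results/processed-loader-files/expression.csv"

# Emit a COPY statement followed by tab-separated rows; psql reads the rows
# from stdin until the terminating "\." line.
print("COPY fall2021.expression (gene_id, taxon_id, sort_index, sample_id, expression, time_point, dataset) FROM stdin;")
with open(PROCESSED_FILE, newline="") as f:
    for row in csv.reader(f):
        print("\t".join(row))
print("\\.")
```

A sketch like this would be run the same way as the real loader, e.g. `python3 loader_sketch.py | psql postgresql://localhost/postgres` (the script name here is hypothetical).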
database/expression-database/schema.sql (+71)

CREATE TABLE fall2021.ref (
    pubmed_id VARCHAR,
    authors VARCHAR,
    publication_year VARCHAR,
    title VARCHAR,
    doi VARCHAR,
    ncbi_geo_id VARCHAR,
    PRIMARY KEY(ncbi_geo_id, pubmed_id)
);

CREATE TABLE fall2021.gene (
    gene_id VARCHAR, -- systematic-like name
    display_gene_id VARCHAR, -- standard-like name
    species VARCHAR,
    taxon_id VARCHAR,
    PRIMARY KEY(gene_id, taxon_id)
);

CREATE TABLE fall2021.expression_metadata (
    ncbi_geo_id VARCHAR,
    pubmed_id VARCHAR,
    FOREIGN KEY (ncbi_geo_id, pubmed_id) REFERENCES fall2021.ref(ncbi_geo_id, pubmed_id),
    control_yeast_strain VARCHAR,
    treatment_yeast_strain VARCHAR,
    control VARCHAR,
    treatment VARCHAR,
    concentration_value FLOAT,
    concentration_unit VARCHAR,
    time_value FLOAT,
    time_unit VARCHAR,
    number_of_replicates INT,
    expression_table VARCHAR,
    display_expression_table VARCHAR,
    PRIMARY KEY(ncbi_geo_id, pubmed_id, time_value)
);

CREATE TABLE fall2021.expression (
    gene_id VARCHAR,
    taxon_id VARCHAR,
    FOREIGN KEY (gene_id, taxon_id) REFERENCES fall2021.gene(gene_id, taxon_id),
    -- ncbi_geo_id VARCHAR,
    -- pubmed_id VARCHAR,
    sort_index INT,
    sample_id VARCHAR,
    expression FLOAT,
    time_point FLOAT,
    dataset VARCHAR,
    PRIMARY KEY(gene_id, sample_id)
    -- FOREIGN KEY (ncbi_geo_id, pubmed_id, time_point) REFERENCES fall2021.expression_metadata(ncbi_geo_id, pubmed_id, time_value)
);

CREATE TABLE fall2021.degradation_rate (
    gene_id VARCHAR,
    taxon_id VARCHAR,
    FOREIGN KEY (gene_id, taxon_id) REFERENCES fall2021.gene(gene_id, taxon_id),
    ncbi_geo_id VARCHAR,
    pubmed_id VARCHAR,
    FOREIGN KEY (ncbi_geo_id, pubmed_id) REFERENCES fall2021.ref(ncbi_geo_id, pubmed_id),
    PRIMARY KEY(gene_id, ncbi_geo_id, pubmed_id),
    degradation_rate FLOAT
);

CREATE TABLE fall2021.production_rate (
    gene_id VARCHAR,
    taxon_id VARCHAR,
    FOREIGN KEY (gene_id, taxon_id) REFERENCES fall2021.gene(gene_id, taxon_id),
    ncbi_geo_id VARCHAR,
    pubmed_id VARCHAR,
    FOREIGN KEY (ncbi_geo_id, pubmed_id) REFERENCES fall2021.ref(ncbi_geo_id, pubmed_id),
    PRIMARY KEY(gene_id, ncbi_geo_id, pubmed_id),
    production_rate FLOAT
    -- FOREIGN KEY (gene_id, ncbi_geo_id, pubmed_id) REFERENCES fall2021.degradation_rate(gene_id, ncbi_geo_id, pubmed_id) -- not sure if we want to link the generated production rate to its original degradation rate
);
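For orientation, here is a small, hypothetical query sketch showing how these tables relate: expression rows join back to gene on the composite key (gene_id, taxon_id). It assumes the `psycopg2` package (not part of this repository) and a locally loaded database; the gene symbol is only an example:

```
import psycopg2

conn = psycopg2.connect("postgresql://localhost/postgres")
with conn, conn.cursor() as cur:
    # Fetch one gene's expression time course, joining on the composite gene key.
    cur.execute(
        """
        SELECT g.display_gene_id, e.sample_id, e.time_point, e.expression
        FROM fall2021.expression AS e
        JOIN fall2021.gene AS g
          ON g.gene_id = e.gene_id AND g.taxon_id = e.taxon_id
        WHERE g.display_gene_id = %s
        ORDER BY e.sort_index
        """,
        ("CIN5",),  # example standard gene name; any loaded gene works
    )
    for display_gene_id, sample_id, time_point, expression in cur.fetchall():
        print(display_gene_id, sample_id, time_point, expression)
conn.close()
```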
