* Added support for MinIO and B2 buckets
  - Refactored SilNlpEnv in silnlp/common/environment.py to support connecting to either MinIO or B2
  - Kept AWS support temporarily
  - Updated the README and other documentation with MinIO and B2 bucket setup instructions
* Updated clean_s3 to support MinIO
* Made 'minio' the default bucket_service
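To illustrate the "minio" default described in the commit message, here is a minimal sketch of resolving a bucket service name. The function `resolve_bucket_service` and the `SUPPORTED_SERVICES` tuple are hypothetical names for illustration; they are not taken from `silnlp/common/environment.py`.

```python
from typing import Optional

# Assumed set of service names, based on the commit message
# (MinIO and B2 are new; AWS is kept temporarily).
SUPPORTED_SERVICES = ("minio", "b2", "aws")


def resolve_bucket_service(requested: Optional[str] = None) -> str:
    """Return the bucket service to use, falling back to 'minio' when none is given."""
    service = (requested or "minio").strip().lower()
    if service not in SUPPORTED_SERVICES:
        raise ValueError(f"Unsupported bucket service: {requested!r}")
    return service
```

Calling `resolve_bucket_service()` with no argument yields the default service, while an explicit name such as `"B2"` is normalized to lowercase.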
-
* If you do not intend to use SILNLP with ClearML and/or AWS, you can leave out the respective variables. If you need to generate ClearML credentials, see [ClearML setup](clear_ml_setup.md).
-
* Note that this does not give you direct access to an AWS S3 bucket from within the Docker container, it only allows you to run scripts referencing files in the bucket.
+
* Include SIL_NLP_DATA_PATH="/silnlp" if you are not using MinIO or B2 and will be storing files locally.
+
* If you do not intend to use SILNLP with ClearML, MinIO, and/or B2, you can leave out the respective variables. If you need to generate ClearML credentials, see [ClearML setup](clear_ml_setup.md).
+
* Note that this does not give you direct access to a MinIO or B2 bucket from within the Docker container; it only allows you to run scripts referencing files in the bucket.
6. Start container
@@ -129,22 +132,25 @@ These are the main requirements for the SILNLP code to run on a local machine. S
```
poetry install
```
-
10. If using ClearMLand/or AWS, set the following environment variables:
+
10. If using ClearML, MinIO, and/or B2, set the following environment variables:
+
* Include SIL_NLP_DATA_PATH="/silnlp" if you are not using MinIO or B2 and will be storing files locally.
* If you need to generate ClearML credentials, see [ClearML setup](clear_ml_setup.md).
-
* Note that this does not give you direct access to an AWS S3 bucket from within the Docker container, it only allows you to run scripts referencing files in the bucket.
+
* Note that this does not give you direct access to a MinIO or B2 bucket from within the Docker container; it only allows you to run scripts referencing files in the bucket.
* For instructions on how to permanently set up environment variables for your operating system, see the corresponding section under the Development Environment Setup header below.
-
11. If using AWS, there are two options:
-
* Option 1: Mount the bucket to your filesystem following the instructions under [Install and Configure Rclone](https://github.com/sillsdev/silnlp/blob/master/s3_bucket_setup.md#install-and-configure-rclone).
+
11. If using MinIO or B2, there are two options:
+
* Option 1: Mount the bucket to your filesystem following the instructions under [Install and Configure Rclone](https://github.com/sillsdev/silnlp/blob/master/bucket_setup.md#install-and-configure-rclone).
* Option 2: Create a local cache for the bucket following the instructions under [Create SILNLP cache](https://github.com/sillsdev/silnlp/blob/master/manual_setup.md#create-silnlp-cache).
## Development Environment Setup
@@ -177,7 +183,7 @@ Follow the instructions below to set up a Dev Container in VS Code. This is the
4. Define environment variables.
-
Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. Additionally, set AWS_REGION. The typical value is "us-east-1".
+
Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, B2_KEY_ID, and B2_APPLICATION_KEY. Also set MINIO_ENDPOINT_URL to https://truenas.psonet.languagetechnology.org:9000 and B2_ENDPOINT_URL to https://s3.us-east-005.backblazeb2.com, without quotation marks.
* Linux / macOS users: To set environment variables permanently, add each variable as a new line to the `.bashrc` file (Linux) or `.profile` file (macOS) in your home directory with the format
```
export VAR="VAL"
```
@@ -210,7 +216,7 @@ Follow the instructions below to set up a Dev Container in VS Code. This is the
10. Install and activate Poetry environment.
* In the VS Code terminal, run `poetry install` to install the necessary Python libraries, and then run `poetry shell` to enter the environment in the terminal.
-
11. (Optional) Locally mount the S3 bucket. This will allow you to interact directly with the S3 bucket from your local terminal (outside of the dev container). See instructions [here](s3_bucket_setup.md).
+
11. (Optional) Locally mount the MinIO and/or B2 bucket(s). This will allow you to interact directly with the bucket(s) from your local terminal (outside of the dev container). See instructions [here](bucket_setup.md).
To get back into the dev container and poetry environment each subsequent time, open the silnlp folder in VS Code, select the "Reopen in Container" option from the Remote Connection menu (bottom left corner), and use the `poetry shell` command in the terminal.
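Before reopening the container, it can help to confirm that the variables from step 4 are actually set. The sketch below is one way to do that, assuming the variable names listed in the instructions above; the `REQUIRED_VARS` grouping and the `find_problems` helper are hypothetical and not part of SILNLP.

```python
import os
from typing import Dict, List, Mapping

# Variable names are copied from the setup instructions; the grouping by
# service is an assumption made for this illustration.
REQUIRED_VARS: Dict[str, List[str]] = {
    "clearml": ["CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY"],
    "minio": ["MINIO_ACCESS_KEY", "MINIO_SECRET_KEY", "MINIO_ENDPOINT_URL"],
    "b2": ["B2_KEY_ID", "B2_APPLICATION_KEY", "B2_ENDPOINT_URL"],
}


def find_problems(services: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Report variables that are unset or whose value still contains literal quotes."""
    problems = []
    for service in services:
        for name in REQUIRED_VARS[service]:
            value = env.get(name, "")
            if not value:
                problems.append(f"{name} is not set")
            elif value != value.strip("\"'"):
                # The instructions ask for the endpoint URLs without quotation marks.
                problems.append(f"{name} is wrapped in quotation marks")
    return problems
```

For example, `find_problems(["clearml", "minio"])` returns an empty list when every ClearML and MinIO variable is set cleanly in the current environment.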
We use MinIO and Backblaze B2 storage for storing our experiment data. Here is some workspace setup to enable a decent workflow.
+
### Note For MinIO setup
+
In order to access the MinIO bucket locally, you must have a VPN connected to its network. If you need VPN access, please reach out to an SILNLP dev team member.
+
### Note For Backblaze B2 usage
+
Backblaze B2 is only used as a backup storage option when the MinIO bucket is unavailable or when running experiments from the ORU Titan Server.
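The fallback described here could be sketched as a simple reachability probe: try the MinIO endpoint first and fall back to B2 if it cannot be reached (for instance, when the VPN is not connected). The helpers `is_reachable` and `choose_service` are hypothetical, and the sketch does not cover the ORU Titan Server case.

```python
import socket
from typing import Callable
from urllib.parse import urlsplit

# Endpoint URL as given in the environment variable instructions.
MINIO_ENDPOINT_URL = "https://truenas.psonet.languagetechnology.org:9000"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_service(probe: Callable[[str], bool] = is_reachable) -> str:
    """Prefer MinIO; use B2 only when the MinIO endpoint cannot be reached."""
    return "minio" if probe(MINIO_ENDPOINT_URL) else "b2"
```

Passing the probe as a parameter keeps the decision logic testable without network access.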
+
### Install and configure rclone
+
**Windows**
+
The following will mount /silnlp on your B drive or /nlp-research on your M drive and allow you to explore, read and write.
+
* Install WinFsp: http://www.secfs.net/winfsp/rel/ (Click the button to "Download WinFsp Installer" not the "SSHFS-Win (x64)" installer)
* Unzip to your desktop (or some convenient location).
+
* Add the folder that contains rclone.exe to your PATH environment variable.
+
* Take the `scripts/rclone/rclone.conf` file from this SILNLP repo and copy it to `~\AppData\Roaming\rclone` (creating folders if necessary)
+
* Add your credentials in the appropriate fields of the copied `rclone.conf` file in `~\AppData\Roaming\rclone`
+
* Take the `scripts/rclone/mount_minio_to_m.bat` and `scripts/rclone/mount_b2_to_b.bat` files from this SILNLP repo and copy them to the folder that contains the unzipped rclone.
+
* Double-click either bat file. A command window should open and remain open. If you run mount_minio_to_m.bat, you should see something like:
+
```
C:\Users\David\Software\rclone>call rclone mount --vfs-cache-mode full --use-server-modtime miniosilnlp:nlp-research M:
The service rclone has been started.
```
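Filling in the credentials in `rclone.conf` can also be scripted. The sketch below uses Python's `configparser`, since rclone's config file is INI-style. It assumes the remote sections are named `miniosilnlp` and `b2silnlp` (as in the mount commands) and that the credential fields are `access_key_id` and `secret_access_key`, which are rclone's S3 backend option names. The actual field names in the repo's `rclone.conf` may differ, so check the file first. The helper name `set_rclone_credentials` is hypothetical.

```python
import configparser
from pathlib import Path


def set_rclone_credentials(conf_path: Path, remote: str, key_id: str, secret: str) -> None:
    """Write the given credentials into one remote section of an rclone.conf file."""
    # interpolation=None so that '%' characters in secrets are stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read(conf_path)
    if not config.has_section(remote):
        raise KeyError(f"Remote {remote!r} not found in {conf_path}")
    # Assumed field names (rclone S3 backend options); adjust to match the file.
    config[remote]["access_key_id"] = key_id
    config[remote]["secret_access_key"] = secret
    with conf_path.open("w") as handle:
        config.write(handle)
```

Note that `configparser` drops comments when it rewrites the file, so editing by hand is preferable if the comments in `rclone.conf` matter to you.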
+
**Linux / macOS**
+
The following will mount /nlp-research to an M folder or /silnlp to a B folder in your home directory and allow you to explore, read and write.
+
* For macOS, first download and install macFUSE: https://osxfuse.github.io/
* Take the `scripts/rclone/rclone.conf` file from this SILNLP repo and copy it to `~/.config/rclone/rclone.conf` (creating folders if necessary)
+
* Add your credentials in the appropriate fields in `~/.config/rclone/rclone.conf`
+
* Create a folder called "M" or "B" in your user directory
+
* Run the following command for MinIO:
+
```
rclone mount --vfs-cache-mode full --use-server-modtime miniosilnlp:nlp-research ~/M
```
+
* OR run the following command for B2:
+
```
rclone mount --vfs-cache-mode full --use-server-modtime b2silnlp:silnlp ~/B
```
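The folder creation and mount steps above can be combined in a small helper. This is a minimal sketch: it only builds the same `rclone mount` command shown in the instructions and creates the mount folder first. The function names `ensure_mount_point` and `build_mount_command` are hypothetical.

```python
from pathlib import Path
from typing import List


def ensure_mount_point(mount_point: str) -> Path:
    """Create the mount folder (for example ~/M) if it does not exist yet."""
    target = Path(mount_point).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    return target


def build_mount_command(remote: str, bucket: str, mount_point: str) -> List[str]:
    """Return the rclone mount command from the instructions as an argument list."""
    target = ensure_mount_point(mount_point)
    return [
        "rclone", "mount",
        "--vfs-cache-mode", "full",
        "--use-server-modtime",
        f"{remote}:{bucket}",
        str(target),
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, for example with `build_mount_command("miniosilnlp", "nlp-research", "~/M")` for MinIO or `build_mount_command("b2silnlp", "silnlp", "~/B")` for B2.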
+
### To start the M: and/or B: drive on startup
+
**Windows**
+
Put a shortcut to the mount_minio_to_m.bat and/or mount_b2_to_b.bat file in the Startup folder.
+
* In Windows Explorer put `shell:startup` in the address bar or open `C:\Users\<Username>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup`
+
* Right click to add a new shortcut. Choose `mount_minio_to_m.bat` and/or `mount_b2_to_b.bat` as the target; you can leave the name as the default.
+
Now your MinIO or B2 bucket should be mounted as M: or B: drive, respectively, when you start Windows.
+
**Linux / macOS**
+
* Run `crontab -e`
+
* For MinIO, paste `@reboot rclone mount --vfs-cache-mode full --use-server-modtime miniosilnlp:nlp-research ~/M` into the file, save and exit
+
* For B2, paste `@reboot rclone mount --vfs-cache-mode full --use-server-modtime b2silnlp:silnlp ~/B` into the file, save and exit
+
* Reboot Linux / macOS
+
Now your MinIO or B2 bucket should be mounted as ~/M or ~/B respectively when you start Linux / macOS.
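If you prefer to generate the crontab lines rather than paste them, the entries above can be produced and merged into existing crontab text as sketched below. The helper names `reboot_entry` and `add_entry` are hypothetical; the sketch only manipulates text and leaves installing the result (for example via `crontab -e`) to you.

```python
def reboot_entry(remote: str, bucket: str, mount_point: str) -> str:
    """Build the @reboot line from the instructions for one remote."""
    # The mount point is left unquoted so the shell can still expand '~'.
    return (
        "@reboot rclone mount --vfs-cache-mode full --use-server-modtime "
        f"{remote}:{bucket} {mount_point}"
    )


def add_entry(crontab_text: str, entry: str) -> str:
    """Append the entry to the crontab text unless an identical line already exists."""
    lines = crontab_text.splitlines()
    if entry not in lines:
        lines.append(entry)
    return "\n".join(lines) + "\n"
```

Because `add_entry` skips lines that are already present, running it repeatedly does not create duplicate mounts at boot.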
manual_setup.md (+7 -4: 7 additions & 4 deletions)
@@ -73,9 +73,9 @@ __Download and install__ the following before creating any projects or starting
```
"editor.formatOnSave": true,
```
-
### S3 bucket setup
+
### MinIO and/or B2 bucket(s) setup
-
See [S3 bucket setup](s3_bucket_setup.md).
+
See [Bucket setup](bucket_setup.md).
### ClearML setup
@@ -88,8 +88,11 @@ See [ClearML setup](clear_ml_setup.md).
* Create the directory "$HOME/.cache/silnlp/projects" and set the environment variable SIL_NLP_CACHE_PROJECT_DIR to that path.
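The cache directory step can be sketched as follows. The function name `create_project_cache` is hypothetical. Setting `os.environ` only affects the current Python process, so the variable still needs to be set permanently in your shell profile or system settings.

```python
import os
from pathlib import Path
from typing import Optional


def create_project_cache(home: Optional[Path] = None) -> Path:
    """Create $HOME/.cache/silnlp/projects and point SIL_NLP_CACHE_PROJECT_DIR at it."""
    base = home if home is not None else Path.home()
    cache_dir = base / ".cache" / "silnlp" / "projects"
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Only visible to this process and its children.
    os.environ["SIL_NLP_CACHE_PROJECT_DIR"] = str(cache_dir)
    return cache_dir
```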
### Additional Environment Variables
-
* Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY.
-
* Set SIL_NLP_DATA_PATH to "/silnlp" and CLEARML_API_HOST to "https://api.sil.hosted.allegro.ai".
+
* Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, B2_KEY_ID, and B2_APPLICATION_KEY.
+
* Set SIL_NLP_DATA_PATH to "/silnlp" if you are not using MinIO or B2 and will be storing files locally.
+
* Set CLEARML_API_HOST to "https://api.sil.hosted.allegro.ai".
+
* Set MINIO_ENDPOINT_URL to https://truenas.psonet.languagetechnology.org:9000
+
* Set B2_ENDPOINT_URL to https://s3.us-east-005.backblazeb2.com