First commit with setup and DVC files #23

Open: wants to merge 4 commits into master
3 changes: 3 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
5 changes: 5 additions & 0 deletions .dvc/config
@@ -0,0 +1,5 @@
[core]
    analytics = false
    remote = remote_storage
['remote "remote_storage"']
    url = /home/mlops/dvc_remote
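This config makes remote_storage the default remote and points it at a local directory, so a plain dvc push or dvc pull without -r would use /home/mlops/dvc_remote. To show how the two settings link together, here is a minimal sketch that reads the same text with Python's configparser and follows core.remote to that remote's url. It assumes the file stays INI-compatible; DVC uses its own config loader, and default_remote_url is a hypothetical helper, not part of DVC's API.

```python
import configparser

# Same content as .dvc/config in this PR.
DVC_CONFIG = """\
[core]
    analytics = false
    remote = remote_storage
['remote "remote_storage"']
    url = /home/mlops/dvc_remote
"""


def default_remote_url(config_text: str) -> str:
    """Return the url of the remote named by core.remote (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser.get("core", "remote")
    # DVC writes remote sections as ['remote "<name>"']; configparser keeps
    # the outer single quotes as part of the section name.
    section = f"'remote \"{name}\"'"
    return parser.get(section, "url")
```

For example, default_remote_url(DVC_CONFIG) returns "/home/mlops/dvc_remote". Reading the name from core.remote, rather than hard-coding the section, keeps the lookup correct if the default remote is later switched.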
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
2 changes: 2 additions & 0 deletions data/prepared/.gitignore
@@ -0,0 +1,2 @@
/train.csv
/test.csv
4 changes: 4 additions & 0 deletions data/prepared/test.csv.dvc
@@ -0,0 +1,4 @@
outs:
- md5: cbd4ba69ced15e40820a635e7e741627
  size: 71491
  path: test.csv
4 changes: 4 additions & 0 deletions data/prepared/train.csv.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 7f79cf9a4ab1316f7d7246d0b92ea16c
  size: 178060
  path: train.csv
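Each single-file .dvc entry records an MD5 digest and a byte size for the tracked output. A minimal sketch for sanity-checking a local copy against those two fields is below. It assumes the recorded md5 is a plain MD5 of the file's bytes; DVC releases before 3.0 normalize Windows (CRLF) line endings in text files before hashing, so a CSV with CRLF endings may not match. Both function names are hypothetical, and dvc status remains the built-in way to detect changed outputs.

```python
import hashlib


def md5_and_size(path: str, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Compute the MD5 hex digest and byte size of a file, reading it in chunks."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_dvc_entry(path: str, md5: str, size: int) -> bool:
    """Compare a local file with the md5/size fields of a .dvc 'outs' entry."""
    actual_md5, actual_size = md5_and_size(path)
    return actual_md5 == md5 and actual_size == size
```

Under that assumption, matches_dvc_entry("data/prepared/test.csv", "cbd4ba69ced15e40820a635e7e741627", 71491) should return True on a checkout where dvc pull has fetched the file. Reading in fixed-size chunks keeps memory flat even for large data files.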
2 changes: 2 additions & 0 deletions data/raw/.gitignore
@@ -0,0 +1,2 @@
/train
/val
5 changes: 5 additions & 0 deletions data/raw/train.dvc
@@ -0,0 +1,5 @@
outs:
- md5: 7adc7abb69056f4d7afb512c78f2fce9.dir
  size: 75309082
  nfiles: 9470
  path: train
5 changes: 5 additions & 0 deletions data/raw/val.dvc
@@ -0,0 +1,5 @@
outs:
- md5: 0ad4dcf197b452735726bf8d8777201d.dir
  size: 31248080
  nfiles: 3925
  path: val
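For directory outputs, the .dir suffix marks the md5 as the hash of a directory listing rather than of a single file, and nfiles records how many files the directory holds. The sketch below recomputes only the two easy fields, nfiles and size, by walking the tree. It assumes size is the sum of the contained files' sizes and that nothing in the tree is excluded by .dvcignore; it makes no attempt to reproduce the .dir hash, and dir_stats is a hypothetical helper.

```python
import os


def dir_stats(root: str) -> tuple[int, int]:
    """Return (nfiles, total_bytes) for every regular file under root."""
    nfiles = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip sockets, broken links, etc.
                nfiles += 1
                total += os.path.getsize(path)
    return nfiles, total
```

If those assumptions hold, dir_stats("data/raw/train") would be expected to return (9470, 75309082) and dir_stats("data/raw/val") (3925, 31248080) once the data has been pulled.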
1 change: 1 addition & 0 deletions metrics/accuracy.json
@@ -0,0 +1 @@
{"accuracy": 0.7490494296577946}
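metrics/accuracy.json holds a single top-level accuracy value as a fraction between 0 and 1 (about 74.9% here). A minimal stdlib sketch for loading it and reporting the change against an earlier run is below, for example when judging a hyperparameter change such as the max_iter edit in src/train.py. The function names and any baseline value you pass in are hypothetical; the PR does not record a previous accuracy.

```python
import json


def load_accuracy(path: str) -> float:
    """Read the 'accuracy' field from a metrics file like metrics/accuracy.json."""
    with open(path, encoding="utf-8") as fh:
        return float(json.load(fh)["accuracy"])


def accuracy_delta(current: float, baseline: float) -> str:
    """Format the change between two accuracy fractions in percentage points."""
    delta_pp = (current - baseline) * 100
    return f"{current:.2%} ({delta_pp:+.2f} pp vs {baseline:.2%})"
```

For instance, accuracy_delta(0.75, 0.5) returns "75.00% (+25.00 pp vs 50.00%)". Reporting percentage points rather than a relative change avoids ambiguity when two accuracies are close.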
1 change: 1 addition & 0 deletions model/.gitignore
@@ -0,0 +1 @@
/model.joblib
4 changes: 4 additions & 0 deletions model/model.joblib.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 22490d1b369e3f7423c5b6ebd4db4234
  size: 241075
  path: model.joblib
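The paired .gitignore and .dvc files follow the usual dvc add pattern: the data or model file is listed in a .gitignore next to it so Git never commits it, while the small .dvc pointer file is committed instead. A hypothetical consistency check for that pattern is sketched below. It assumes each .dvc file has a single path: entry and that the matching .gitignore line is written as /<path>, as in every file of this PR.

```python
from pathlib import Path


def dvc_output_path(dvc_file: Path) -> str:
    """Extract the 'path:' value from a simple single-output .dvc file."""
    for line in dvc_file.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.strip().partition(":")
        # Strip a leading YAML list marker so "- path: x" also matches.
        if sep and key.lstrip("- ") == "path":
            return value.strip()
    raise ValueError(f"no path: entry in {dvc_file}")


def ignored_by_git(dvc_file: Path) -> bool:
    """True if the output named in dvc_file is listed in the sibling .gitignore."""
    gitignore = dvc_file.parent / ".gitignore"
    if not gitignore.exists():
        return False
    entries = {line.strip() for line in gitignore.read_text(encoding="utf-8").splitlines()}
    return "/" + dvc_output_path(dvc_file) in entries
```

Applied to this PR, ignored_by_git(Path("model/model.joblib.dvc")) should return True. To check every pointer file at once, filter with is_file() so the .dvc directory itself is skipped: all(ignored_by_git(p) for p in Path(".").rglob("*.dvc") if p.is_file()).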
2 changes: 1 addition & 1 deletion src/train.py
@@ -37,7 +37,7 @@ def load_data(data_path):
 def main(repo_path):
     train_csv_path = repo_path / "data/prepared/train.csv"
     train_data, labels = load_data(train_csv_path)
-    sgd = SGDClassifier(max_iter=10)
+    sgd = SGDClassifier(max_iter=100)
     trained_model = sgd.fit(train_data, labels)
     dump(trained_model, repo_path / "model/model.joblib")

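The only code change raises SGDClassifier's max_iter from 10 to 100. In scikit-learn, max_iter caps the number of passes (epochs) over the training data; with the default tol=1e-3 and n_iter_no_change=5 the solver can also stop earlier once the training loss stops improving, and it emits a ConvergenceWarning if it reaches max_iter first. The toy, stdlib-only sketch below mimics that stopping rule with plain hinge-loss SGD on a tiny made-up dataset, to show why a small epoch cap can cut training short. It is a simplified stand-in, not scikit-learn's implementation (no regularization, shuffling, or learning-rate schedule), and train_sgd and its data are hypothetical.

```python
# Tiny, linearly separable toy data (hypothetical): labels are +1 / -1.
EXAMPLE_X = [[1.0], [2.0], [-1.0], [-2.0]]
EXAMPLE_Y = [1, 1, -1, -1]


def train_sgd(xs, ys, max_iter, lr=0.1, tol=1e-3, n_iter_no_change=5):
    """Toy hinge-loss SGD; returns (weights, bias, epochs_run, converged)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    best_loss = float("inf")
    no_improve = 0
    for epoch in range(1, max_iter + 1):
        total_loss = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active: take a subgradient step
                total_loss += 1 - margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
        # Roughly sklearn-like rule: count epochs that fail to beat best_loss - tol.
        if total_loss > best_loss - tol:
            no_improve += 1
        else:
            no_improve = 0
        best_loss = min(best_loss, total_loss)
        if no_improve >= n_iter_no_change:
            return w, b, epoch, True
    # Hit the epoch cap first: the analogue of sklearn's ConvergenceWarning.
    return w, b, max_iter, False
```

On this data, train_sgd(EXAMPLE_X, EXAMPLE_Y, max_iter=3) stops at the cap with converged=False, while max_iter=100 lets the loop stop on its own once the loss plateaus. Whether 10 epochs was actually too few for the real training set is not shown in this PR; a ConvergenceWarning during training would be the signal to look for.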