Kaggle hosts many datasets, most of them in CSV format.
Would it make sense to add support for downloading Kaggle datasets directly in MLDatasets.jl?
For example, to download the House Prices 2023 dataset:
Step 1: Get the kaggle.json file, or set the username and key manually.
username = "neroblackstone"
key = "key"
Or download kaggle.json to ~/.kaggle/
Step 2: Download
# Download the dataset to the default path and extract the CSV files.
files_path = kaggle_download("howisusmanali/house-prices-2023-dataset")
Step 3: Processing
using CSV
using DataFrames
file_path = joinpath(files_path, "csv_we_want.csv")
# Pass the path directly so CSV.jl opens and closes the file itself.
data = CSV.read(file_path, DataFrame)
Implementation:
- Call the Python Kaggle API through PyCall.jl, which is a somewhat heavy dependency.
- Or call the Kaggle REST API directly from Julia, which is more lightweight but a bit harder to implement.
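To give an idea of what the second option could look like, here is a rough sketch using only the Julia standard library. The function names (`kaggle_dataset_url`, `basic_auth_header`, `kaggle_download_zip`) are hypothetical, and the endpoint path and the use of HTTP basic auth with the kaggle.json username and key are my assumptions about the Kaggle REST API, not something I have verified against MLDatasets.jl or confirmed in this issue.

```julia
using Base64: base64encode
import Downloads

# Assumed base URL of Kaggle's public REST API (an assumption, not confirmed here).
const KAGGLE_API_BASE = "https://www.kaggle.com/api/v1"

# Build the assumed download URL for a dataset slug such as "owner/dataset-name".
function kaggle_dataset_url(slug::AbstractString)
    parts = split(slug, '/')
    length(parts) == 2 && all(!isempty, parts) ||
        throw(ArgumentError("expected \"owner/dataset\", got \"$slug\""))
    return string(KAGGLE_API_BASE, "/datasets/download/", parts[1], "/", parts[2])
end

# HTTP basic auth header built from the username and key found in kaggle.json.
basic_auth_header(username, key) =
    "Authorization" => "Basic " * base64encode(string(username, ":", key))

# Hypothetical helper: download the dataset archive into `dir` and return its path.
# The archive is assumed to be a zip file; extraction is left out of this sketch.
function kaggle_download_zip(slug, username, key; dir = mktempdir())
    output = joinpath(dir, "dataset.zip")
    Downloads.download(kaggle_dataset_url(slug), output;
                       headers = [basic_auth_header(username, key)])
    return output
end
```

Calling `kaggle_download_zip("howisusmanali/house-prices-2023-dataset", username, key)` would then return the path of the downloaded archive. Extracting it and reading kaggle.json would need extra packages (for example ZipFile.jl and a JSON parser), which is part of why this route is more work than the PyCall one.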
What are your thoughts? Do you think this feature makes sense?
I can implement this myself and open a PR.