custom dataset
hervevillard committed Sep 11, 2024
1 parent ba9c505 commit 774181e
Showing 3 changed files with 585 additions and 5 deletions.
116 changes: 111 additions & 5 deletions 03_pytorch_computer_vision.ipynb

Large diffs are not rendered by default.

229 changes: 229 additions & 0 deletions 04_custom_data_creation.ipynb
@@ -0,0 +1,229 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyTorch Custom Data Creation"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torchvision\n",
"import torchvision.datasets as datasets\n",
"import torchvision.transforms as transforms\n",
"\n",
"# Set up the data directory\n",
"import pathlib\n",
"data_dir = pathlib.Path(\"../data\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Download data\n",
"\n",
"Get the Food101 dataset from `torchvision.datasets`.\n",
"\n",
"* Food101 in torchvision.datasets - https://pytorch.org/vision/stable/generated/torchvision.datasets.Food101.html\n",
"* Original Food101 dataset - https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/\n",
"\n",
"Note: Downloading the dataset may take ~10-15 minutes depending on your internet speed. It downloads ~5GB of data to the specified root directory.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading https://data.vision.ee.ethz.ch/cvl/food-101.tar.gz to ..\\data\\food-101.tar.gz\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" 47%|████▋ | 2346024960/4996278331 [03:34<04:53, 9022497.18it/s] "
]
}
],
"source": [
"# Get training data (samples stay as PIL images while the transform is commented out)\n",
"train_data = datasets.Food101(root=data_dir,\n",
"                              split=\"train\",\n",
"                              # transform=transforms.ToTensor(),\n",
"                              download=True)\n",
"# Get testing data\n",
"test_data = datasets.Food101(root=data_dir,\n",
"                             split=\"test\",\n",
"                             # transform=transforms.ToTensor(),\n",
"                             download=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class_names = train_data.classes\n",
"class_names[:10]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# View first sample (PIL image format)\n",
"print(class_names[train_data[0][1]])\n",
"train_data[0][0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Find subset of appropriate classes\n",
"\n",
"Target classes: steak, pizza, sushi.\n",
"\n",
"Current path setup:\n",
"\n",
"`../data/food-101/images/CLASS_NAME/IMAGES.jpg`\n",
"\n",
"The plan is to collect the image filenames for each target class (pizza, steak, sushi) and then copy those images into separate folders.\n",
"\n",
"The goal is a random 10% of the images for the target classes, taken from both the training and test sets.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
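
The "Find subset of appropriate classes" notes in `04_custom_data_creation.ipynb` describe the plan, but the code cells after them are still empty. Below is a minimal sketch of one way that step could look. It is not the notebook's actual implementation. It assumes the extracted archive contains split files (for example `food-101/meta/train.txt` and `food-101/meta/test.txt`) whose lines have the form `class_name/image_id` without a file extension. The helper names `sample_split` and `copy_images`, the seed, and the destination folder are all hypothetical.

```python
import random
import shutil
from pathlib import Path


def sample_split(meta_file, image_dir, target_classes, fraction=0.1, seed=42):
    """Return a random fraction of image paths for the target classes in one split.

    Assumption: each whitespace-separated entry in meta_file looks like
    "class_name/image_id" (no extension), and images live at
    image_dir/class_name/image_id.jpg.
    """
    entries = Path(meta_file).read_text().split()
    # Keep only entries whose class is one of the targets; sort for reproducibility.
    wanted = sorted(e for e in entries if e.split("/")[0] in target_classes)
    k = round(len(wanted) * fraction)
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    picked = rng.sample(wanted, k)
    return [Path(image_dir) / f"{entry}.jpg" for entry in picked]


def copy_images(image_paths, dest_dir):
    """Copy each image into dest_dir/<class_name>/, creating folders as needed."""
    copied = []
    for src in image_paths:
        class_dir = Path(dest_dir) / src.parent.name
        class_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(src, class_dir)))
    return copied


# Hypothetical usage with the notebook's data_dir (paths are assumptions):
# targets = {"pizza", "steak", "sushi"}
# image_dir = data_dir / "food-101" / "images"
# for split in ("train", "test"):
#     meta_file = data_dir / "food-101" / "meta" / f"{split}.txt"
#     paths = sample_split(meta_file, image_dir, targets, fraction=0.1)
#     copy_images(paths, data_dir / "pizza_steak_sushi" / split)
```

Note that the sample is drawn from the pooled target classes, so the per-class counts can differ slightly. Sampling each class separately would keep them balanced.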