@@ -28,8 +28,8 @@ the background, and there's no \index{Google Sheets}Google Sheet,
\index{csv}CSV file, or half-baked database query in sight.

But that's a myth. If you're a data scientist putting your work in front
- of someone else's eyes, you are in production. And, I believe, if
- you're in production, this book is for you.
+ of someone else's eyes, you are in production. And, I believe, if you're
+ in production, this book is for you.

You may sensibly ask who I am to make such a proclamation.
@@ -45,8 +45,8 @@ science products more robust with open-source tooling and
\index{Posit}Posit's Professional Products.

I've seen organizations at every level of data science maturity. For
- some organizations, in production means a report that gets rendered
- and emailed around. For others, it means hosting a live app or dashboard
+ some organizations, in production means a report that gets rendered and
+ emailed around. For others, it means hosting a live app or dashboard
that people visit. For the most sophisticated, it means serving live
predictions to another service from a machine learning model via an
application programming interface (\index{API}API).
@@ -230,14 +230,14 @@ skills to get them into production when it's time.
To that end, this book is divided into three parts.

- [Part 1](@sec-1-intro) is about applying DevOps best practices to a data
+ [Part 1](sec1/1-0-sec-intro.qmd#sec-1-intro) is about applying DevOps best practices to a data
science context. Adhering to these best practices will make it easier to
take projects into production and ensure their security and stability
once they're there. While these best practices are inspired by DevOps,
data science and data science projects are different enough from
general-purpose software engineering that some re-thinking is required.

- [Part 2](@sec-2-intro) is a walkthrough of basic concepts in IT
+ [Part 2](sec2/2-0-sec-intro.qmd#@sec-2-intro) is a walkthrough of basic concepts in IT
Administration that will get you to the point of being able to host and
manage a basic data science environment. If you are a hobbyist or have
only a small data science team, this might make you able to operate
@@ -246,7 +246,7 @@ significant IT/Admin support, it will equip you with the vocabulary to
talk to the IT/Admins at your organization and some basic skills of how
to do IT/Admin tasks yourself.

- [Part 3](@sec-3-intro) is about how everything you learned in Part 2 is
+ [Part 3](sec3/3-0-sec-intro.qmd#@sec-3-intro) is about how everything you learned in Part 2 is
inadequate at organizations that operate at enterprise scale. If Part 2
explains how to do IT/Admin tasks yourself, Part 3 is my attempt to
explain why you shouldn't.
@@ -298,10 +298,10 @@ length and mass, and we're going to build up an entire data science
environment dedicated to exploring that relationship.

The front end of this environment will be a website built with the
- \index{Quarto}Quarto publishing system. It will include an app for fetching penguin mass
- predictions from a machine learning model based on bill length and other
- features. The website will also have pages dedicated to exploratory data
- analysis and model building.
+ \index{Quarto}Quarto publishing system. It will include an app for
+ fetching penguin mass predictions from a machine learning model based on
+ bill length and other features. The website will also have pages
+ dedicated to exploratory data analysis and model building.

On the backend, we will build a data science workbench on an AWS
\index{EC2}EC2 instance where we can do this work. It will include