diff --git a/_freeze/python/accessing-and-managing-financial-data/execute-results/html.json b/_freeze/python/accessing-and-managing-financial-data/execute-results/html.json index 1e822669..4978b5db 100644 --- a/_freeze/python/accessing-and-managing-financial-data/execute-results/html.json +++ b/_freeze/python/accessing-and-managing-financial-data/execute-results/html.json @@ -1,15 +1,15 @@ { - "hash": "9fd328b50bbd06951670e009c36376c7", + "hash": "7f6920c8504bdf5ab2869b676468fe9a", "result": { "engine": "jupyter", - "markdown": "---\ntitle: Accessing and Managing Financial Data\nmetadata:\n pagetitle: Accessing and Managing Financial Data with Python\n description-meta: Download and organize open-source financial data using the programming language Python. \n---\n\n\n\n::: callout-note\nYou are reading **Tidy Finance with Python**. You can find the equivalent chapter for the sibling **Tidy Finance with R** [here](../r/accessing-and-managing-financial-data.qmd).\n:::\n\nIn this chapter, we suggest a way to organize your financial data. Everybody who has experience with data is also familiar with storing data in various formats like CSV, XLS, XLSX, or other delimited value storage. Reading and saving data can become very cumbersome when using different data formats and across different projects. Moreover, storing data in delimited files often leads to problems with respect to column type consistency. For instance, date-type columns frequently lead to inconsistencies across different data formats and programming languages. \n\nThis chapter shows how to import different open-source datasets. Specifically, our data comes from the application programming interface (API) of Yahoo Finance, a downloaded standard CSV file, an XLSX file stored in a public Google Drive repository, and other macroeconomic time series.\\index{API} We store all the data in a *single* database, which serves as the only source of data in subsequent chapters. We conclude the chapter by providing some tips on managing databases.\\index{Database}\n\nFirst, we load the Python packages that we use throughout this chapter. Later on, we load more packages in the sections where we need them. \n\n::: {#52142987 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nimport tidyfinance as tf\n```\n:::\n\n\nMoreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. Our data starts with 1960 since most asset pricing studies use data from 1962 on.\n\n::: {#dd94c04a .cell execution_count=3}\n``` {.python .cell-code}\nstart_date = \"1960-01-01\"\nend_date = \"2024-12-31\"\n```\n:::\n\n\n## Fama-French Data\n\nWe start by downloading some famous Fama-French factors [e.g., @Fama1993] and portfolio returns commonly used in empirical asset pricing. Fortunately, the `pandas-datareader` package provides a simple interface to read data from Kenneth French's Data Library.\\index{Data!Fama-French factors}\\index{Kenneth French homepage}\n\n::: {#4b8d1ab2 .cell execution_count=4}\n``` {.python .cell-code}\nimport pandas_datareader as pdr\n```\n:::\n\n\nWe can use the `pdr.DataReader()` function of the package to download monthly Fama-French factors. The set *Fama/French 3 Factors* contains the return time series of the market (`mkt_excess`), size (`smb`), and value (`hml`) factors alongside the risk-free rates (`rf`). 
Note that we have to do some manual work to parse all the columns correctly and scale them appropriately, as the raw Fama-French data comes in a unique data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French's finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to`pandas_datareader`.\\index{Factor!Market}\\index{Factor!Size}\\index{Factor!Value}\\index{Factor!Profitability}\\index{Factor!Investment}\\index{Risk-free rate}\n\n::: {#dbee5fef .cell execution_count=5}\n``` {.python .cell-code}\nfactors_ff3_monthly_raw = pdr.DataReader(\n name=\"F-F_Research_Data_Factors\",\n data_source=\"famafrench\", \n start=start_date, \n end=end_date)[0]\n\nfactors_ff3_monthly = (factors_ff3_monthly_raw\n .divide(100)\n .reset_index(names=\"date\")\n .assign(date=lambda x: pd.to_datetime(x[\"date\"].astype(str)))\n .rename(str.lower, axis=\"columns\")\n .rename(columns={\"mkt-rf\": \"mkt_excess\"})\n)\n```\n:::\n\n\nWe also download the set *5 Factors (2x3)*, which additionally includes the return time series of the profitability (`rmw`) and investment (`cma`) factors. We demonstrate how the monthly factors are constructed in [Replicating Fama and French Factors](replicating-fama-and-french-factors.qmd).\n\n::: {#9e9e1781 .cell execution_count=6}\n``` {.python .cell-code}\nfactors_ff5_monthly_raw = pdr.DataReader(\n name=\"F-F_Research_Data_5_Factors_2x3\",\n data_source=\"famafrench\", \n start=start_date, \n end=end_date)[0]\n\nfactors_ff5_monthly = (factors_ff5_monthly_raw\n .divide(100)\n .reset_index(names=\"date\")\n .assign(date=lambda x: pd.to_datetime(x[\"date\"].astype(str)))\n .rename(str.lower, axis=\"columns\")\n .rename(columns={\"mkt-rf\": \"mkt_excess\"})\n)\n```\n:::\n\n\nIt is straightforward to download the corresponding *daily* Fama-French factors with the same function. \n\n::: {#f0848293 .cell execution_count=7}\n``` {.python .cell-code}\nfactors_ff3_daily_raw = pdr.DataReader(\n name=\"F-F_Research_Data_Factors_daily\",\n data_source=\"famafrench\", \n start=start_date, \n end=end_date)[0]\n\nfactors_ff3_daily = (factors_ff3_daily_raw\n .divide(100)\n .reset_index(names=\"date\")\n .rename(str.lower, axis=\"columns\")\n .rename(columns={\"mkt-rf\": \"mkt_excess\"})\n)\n```\n:::\n\n\nIn a subsequent chapter, we also use the monthly returns from ten industry portfolios, so let us fetch that data, too.\\index{Data!Industry portfolios}\n\n::: {#cbe8cb33 .cell execution_count=8}\n``` {.python .cell-code}\nindustries_ff_monthly_raw = pdr.DataReader(\n name=\"10_Industry_Portfolios\",\n data_source=\"famafrench\", \n start=start_date, \n end=end_date)[0]\n\nindustries_ff_monthly = (industries_ff_monthly_raw\n .divide(100)\n .reset_index(names=\"date\")\n .assign(date=lambda x: pd.to_datetime(x[\"date\"].astype(str)))\n .rename(str.lower, axis=\"columns\")\n)\n```\n:::\n\n\nIt is worth taking a look at all available portfolio return time series from Kenneth French's homepage. 
You should check out the other sets by calling `pdr.famafrench.get_available_datasets()`.\n\nTo automatically download and process Fama-French data, you can also use the `tidyfinance` package with `domain=\"factors_ff\"` and the corresponding dataset, e.g.:\n\n::: {#557adcfb .cell execution_count=9}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"factors_ff\",\n dataset=\"F-F_Research_Data_Factors\", \n start_date=start_date, \n end_date=end_date\n)\n```\n:::\n\n\nThe `tidyfinance` package implements the processing steps as above and returns the same cleaned data frame. \n\n## q-Factors\n\nIn recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the @Hou2015 *q*-factor model. We refer to the [extended background](http://global-q.org/background.html) information provided by the original authors for further information. The *q*-factors can be downloaded directly from the authors' homepage from within `pd.read_csv()`. \\index{Data!q-factors}\\index{Factor!q-factors}\n\nWe also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we rename the columns with the \"R_\"-prescript using regular expressions and write all column names in lowercase. We then query the data to select observations between the start and end dates. Finally, we use the double asterisk (`**`) notation in the `assign` function to apply the same transform of dividing by 100 to all four factors by iterating through them. You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on *try*. You can check out style guides available online, e.g., [Hadley Wickham's `tidyverse` style guide.](https://style.tidyverse.org/index.html)\\index{Style guide} note that we temporarily adjust the SSL certificate handling behavior in Python’s \n`ssl` module when retrieving the $q$-factors directly from the web, as demonstrated in [Working with Stock Returns](working-with-stock-returns.qmd). This method should be used with caution, which is why we restore the default settings immediately after successfully downloading the data.\n\n::: {#1bcb6c34 .cell execution_count=10}\n``` {.python .cell-code}\nimport ssl\nssl._create_default_https_context = ssl._create_unverified_context\n\nfactors_q_monthly_link = (\n \"https://global-q.org/uploads/1/2/2/6/122679606/\"\n \"q5_factors_monthly_2024.csv\"\n)\n\nfactors_q_monthly = (pd.read_csv(factors_q_monthly_link)\n .assign(\n date=lambda x: (\n pd.to_datetime(x[\"year\"].astype(str) + \"-\" +\n x[\"month\"].astype(str) + \"-01\"))\n )\n .drop(columns=[\"R_F\", \"R_MKT\", \"year\"])\n .rename(columns=lambda x: x.replace(\"R_\", \"\").lower())\n .query(f\"date >= '{start_date}' and date <= '{end_date}'\")\n .assign(\n **{col: lambda x: x[col]/100 for col in [\"me\", \"ia\", \"roe\", \"eg\"]}\n )\n)\n\nssl._create_default_https_context = ssl.create_default_context\n```\n:::\n\n\nAgain, you can use the `tidyfinance` package for a shortcut:\n\n::: {#5cc217ab .cell execution_count=11}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"factors_q\",\n dataset=\"q5_factors_monthly\", \n start_date=start_date, \n end_date=end_date\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
|     | date       | risk_free | mkt_excess | me        | ia        | roe       | eg        |
|-----|------------|-----------|------------|-----------|-----------|-----------|-----------|
| 0   | 1967-01-01 | 0.003927  | 0.081852   | 0.068122  | -0.029263 | 0.018813  | -0.025511 |
| 1   | 1967-02-01 | 0.003743  | 0.007557   | 0.016235  | -0.002915 | 0.035399  | 0.021792  |
| 2   | 1967-03-01 | 0.003693  | 0.040169   | 0.019836  | -0.016772 | 0.018417  | -0.011192 |
| 3   | 1967-04-01 | 0.003344  | 0.038786   | -0.006700 | -0.028972 | 0.010253  | -0.016371 |
| 4   | 1967-05-01 | 0.003126  | -0.042807  | 0.027457  | 0.021864  | 0.005901  | 0.001191  |
| ... | ...        | ...       | ...        | ...       | ...       | ...       | ...       |
| 691 | 2024-08-01 | 0.004419  | 0.016518   | -0.040817 | 0.004687  | 0.018369  | 0.008116  |
| 692 | 2024-09-01 | 0.004619  | 0.016806   | -0.011967 | -0.000010 | 0.007408  | -0.032810 |
| 693 | 2024-10-01 | 0.003907  | -0.009701  | -0.011261 | -0.011676 | -0.002314 | -0.008335 |
| 694 | 2024-11-01 | 0.003955  | 0.065002   | 0.043985  | -0.049491 | -0.015370 | -0.021420 |
| 695 | 2024-12-01 | 0.003663  | -0.031637  | -0.051564 | -0.003684 | -0.021442 | 0.049624  |

696 rows × 7 columns
\n```\n:::\n:::\n\n\n## Macroeconomic Predictors\n\nOur next data source is a set of macroeconomic variables often used as predictors for the equity premium. @Goyal2008 comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. The authors host the data on [Amit Goyal's website.](https://sites.google.com/view/agoyal145) Since the data is an XLSX-file stored on a public Google Drive location, we need additional packages to access the data directly from our Python session. Usually, you need to authenticate if you interact with Google drive directly in Python. Since the data is stored via a public link, we can proceed without any authentication.\\index{Google Drive}\n\n::: {#7bd33a2b .cell execution_count=12}\n``` {.python .cell-code}\nsheet_id = \"1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG\"\nsheet_name = \"macro_predictors.xlsx\"\nmacro_predictors_link = (\n f\"https://docs.google.com/spreadsheets/d/{sheet_id}\" \n f\"/gviz/tq?tqx=out:csv&sheet={sheet_name}\"\n)\n```\n:::\n\n\nNext, we read in the new data and transform the columns into the variables that we later use:\n\n1. The dividend price ratio (`dp`), the difference between the log of dividends and the log of prices, where dividends are 12-month moving sums of dividends paid on the S&P 500 index, and prices are monthly averages of daily closing prices [@Campbell1988; @Campbell2006]. \n1. Dividend yield (`dy`), the difference between the log of dividends and the log of lagged prices [@Ball1978]. \n1. Earnings price ratio (`ep`), the difference between the log of earnings and the log of prices, where earnings are 12-month moving sums of earnings on the S&P 500 index [@Campbell1988]. \n1. Dividend payout ratio (`de`), the difference between the log of dividends and the log of earnings [@Lamont1998]. \n1. Stock variance (`svar`), the sum of squared daily returns on the S&P 500 index [@Guo2006].\n1. Book-to-market ratio (`bm`), the ratio of book value to market value for the Dow Jones Industrial Average [@Kothari1997].\n1. Net equity expansion (`ntis`), the ratio of 12-month moving sums of net issues by NYSE listed stocks divided by the total end-of-year market capitalization of NYSE stocks [@Campbell2008].\n1. Treasury bills (`tbl`), the 3-Month Treasury Bill: Secondary Market Rate from the economic research database at the Federal Reserve Bank at St. Louis [@Campbell1987].\n1. Long-term yield (`lty`), the long-term government bond yield from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n1. Long-term rate of returns (`ltr`), the long-term government bond returns from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n1. Term spread (`tms`), the difference between the long-term yield on government bonds and the Treasury bill [@Campbell1987].\n1. Default yield spread (`dfy`), the difference between BAA and AAA-rated corporate bond yields [@Fama1989]. \n1. 
Inflation (`infl`), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics [@Campbell2004].\n\t\t\t\nFor variable definitions and the required data transformations, you can consult the material on [Amit Goyal's website.](https://sites.google.com/view/agoyal145)\n\n::: {#3fc6df88 .cell execution_count=13}\n``` {.python .cell-code}\nssl._create_default_https_context = ssl._create_unverified_context\n\nmacro_predictors = (\n pd.read_csv(macro_predictors_link, thousands=\",\")\n .assign(\n date=lambda x: pd.to_datetime(x[\"yyyymm\"], format=\"%Y%m\"),\n dp=lambda x: np.log(x[\"D12\"])-np.log(x[\"Index\"]),\n dy=lambda x: np.log(x[\"D12\"])-np.log(x[\"Index\"].shift(1)),\n ep=lambda x: np.log(x[\"E12\"])-np.log(x[\"Index\"]),\n de=lambda x: np.log(x[\"D12\"])-np.log(x[\"E12\"]),\n tms=lambda x: x[\"lty\"]-x[\"tbl\"],\n dfy=lambda x: x[\"BAA\"]-x[\"AAA\"]\n )\n .rename(columns={\"b/m\": \"bm\"})\n .get([\"date\", \"dp\", \"dy\", \"ep\", \"de\", \"svar\", \"bm\", \n \"ntis\", \"tbl\", \"lty\", \"ltr\", \"tms\", \"dfy\", \"infl\"])\n .query(\"date >= @start_date and date <= @end_date\")\n .dropna()\n)\n\nssl._create_default_https_context = ssl.create_default_context\n```\n:::\n\n\nTo get the equivalent data through `tidyfinance`, you can call:\n\n::: {#5f267096 .cell execution_count=14}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"macro_predictors\",\n dataset=\"monthly\",\n start_date=start_date, \n end_date=end_date\n)\n```\n:::\n\n\n## Other Macroeconomic Data\n\nThe Federal Reserve bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. As an illustration, we use the already familiar `pandas-datareader` package to fetch consumer price index (CPI) data that can be found under the [CPIAUCNS](https://fred.stlouisfed.org/series/CPIAUCNS) key.\\index{Data!FRED}\\index{Data!CPI}\n\n::: {#b0515f3b .cell execution_count=15}\n``` {.python .cell-code}\ncpi_monthly = (pdr.DataReader(\n name=\"CPIAUCNS\", \n data_source=\"fred\", \n start=start_date, \n end=end_date\n )\n .reset_index(names=\"date\")\n .rename(columns={\"CPIAUCNS\": \"cpi\"})\n .assign(cpi=lambda x: x[\"cpi\"] / x[\"cpi\"].iloc[-1])\n)\n```\n:::\n\n\nNote that we use the `assign()` in the last line to set the current (latest) price level as the reference inflation level. To download other time series, we just have to look it up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the [PCU2122212122210](https://fred.stlouisfed.org/series/PCU2122212122210) key.\n\nThe `tidyfinance` package can, of course, also fetch the same daily data and many more data series:\n\n::: {#ceef174a .cell execution_count=16}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"fred\",\n series=\"CPIAUCNS\", \n start_date=start_date, \n end_date=end_date\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFailed to retrieve data for series CPIAUCNS: Failed to perform, curl: (6) Could not resolve host: https. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.\nFailed to retrieve data for series CPIAUCNS: 'date'\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
| date | value | series |
|------|-------|--------|
\n```\n:::\n:::\n\n\nTo download other time series, we just have to look it up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the [PCU2122212122210](https://fred.stlouisfed.org/series/PCU2122212122210) key. If your desired time series is not supported through tidyfinance, we recommend working with the `fredapi` package. Note that you need to get an API key to use its functionality. We refer to the package documentation for details.\n\n## Setting Up a Database\n\nNow that we have downloaded some (freely available) data from the web into the memory of our Python session, let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code. \n\nThere are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use an [SQLite](https://SQLite.org/)-database, which is the C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine. Note that [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language) is a standard language for accessing and manipulating databases.\\index{Database!SQLite}\n\n::: {#a10081e3 .cell execution_count=17}\n``` {.python .cell-code}\nimport sqlite3\n```\n:::\n\n\nAn SQLite-database is easily created - the code below is really all there is. You do not need any external software. Otherwise, date columns are stored and retrieved as integers.\\index{Database!Creation} We will use the file `tidy_finance_r.sqlite`, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.\n\n::: {#4b49f781 .cell execution_count=18}\n``` {.python .cell-code}\nimport os\n\nif not os.path.exists(\"data\"):\n os.makedirs(\"data\")\n \ntidy_finance = sqlite3.connect(database=\"data/tidy_finance_python.sqlite\")\n```\n:::\n\n\nNext, we create a remote table with the monthly Fama-French factor data. We do so with the `pandas` function `to_sql()`, which copies the data to our SQLite-database.\n\n::: {#c2800478 .cell execution_count=19}\n``` {.python .cell-code}\n(factors_ff3_monthly\n .to_sql(name=\"factors_ff3_monthly\", \n con=tidy_finance, \n if_exists=\"replace\",\n index=False)\n)\n```\n:::\n\n\nNow, if we want to have the whole table in memory, we need to call `pd.read_sql_query()` with the corresponding query. You will see that we regularly load the data into the memory in the next chapters.\\index{Database!Read}\n\n::: {#dbe240b7 .cell execution_count=20}\n``` {.python .cell-code}\npd.read_sql_query(\n sql=\"SELECT date, rf FROM factors_ff3_monthly\",\n con=tidy_finance,\n parse_dates={\"date\"}\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n
|     | date       | rf     |
|-----|------------|--------|
| 0   | 1960-01-01 | 0.0033 |
| 1   | 1960-02-01 | 0.0029 |
| 2   | 1960-03-01 | 0.0035 |
| 3   | 1960-04-01 | 0.0019 |
| 4   | 1960-05-01 | 0.0027 |
| ... | ...        | ...    |
| 775 | 2024-08-01 | 0.0048 |
| 776 | 2024-09-01 | 0.0040 |
| 777 | 2024-10-01 | 0.0039 |
| 778 | 2024-11-01 | 0.0040 |
| 779 | 2024-12-01 | 0.0037 |

780 rows × 2 columns
\n```\n:::\n:::\n\n\nThe last couple of code chunks are really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages. \n\nBefore we move on to the next data source, let us also store the other six tables in our new SQLite database. \n\n::: {#4a6705c7 .cell execution_count=21}\n``` {.python .cell-code}\ndata_dict = {\n \"factors_ff5_monthly\": factors_ff5_monthly,\n \"factors_ff3_daily\": factors_ff3_daily,\n \"industries_ff_monthly\": industries_ff_monthly, \n \"factors_q_monthly\": factors_q_monthly,\n \"macro_predictors\": macro_predictors,\n \"cpi_monthly\": cpi_monthly\n}\n\nfor key, value in data_dict.items():\n value.to_sql(name=key,\n con=tidy_finance, \n if_exists=\"replace\",\n index=False)\n```\n:::\n\n\nFrom now on, all you need to do to access data that is stored in the database is to follow two steps: (i) Establish the connection to the SQLite-database and (ii) execute the query to fetch the data. For your convenience, the following steps show all you need in a compact fashion.\\index{Database!Connection}\n\n::: {#045cddbc .cell message='false' results='false' execution_count=22}\n``` {.python .cell-code}\nimport pandas as pd\nimport sqlite3\n\ntidy_finance = sqlite3.connect(database=\"data/tidy_finance_python.sqlite\")\n\nfactors_q_monthly = pd.read_sql_query(\n sql=\"SELECT * FROM factors_q_monthly\",\n con=tidy_finance,\n parse_dates={\"date\"}\n)\n```\n:::\n\n\n## Managing SQLite Databases\n\nFinally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.\\index{Database!Management}\n\nTo optimize the database file, you can run the `VACUUM` command in the database, which rebuilds the database and frees up unused space. You can execute the command in the database using the `execute()` function. \n\n::: {#b530109e .cell execution_count=23}\n``` {.python .cell-code}\ntidy_finance.execute(\"VACUUM\")\n```\n:::\n\n\nThe `VACUUM` command actually performs a couple of additional cleaning steps, which you can read about in [this tutorial.](https://SQLite.org/docs/sql/statements/vacuum.html) \\index{Database!Cleaning}\n\n## Key Takeaways\n\n- Importing Fama-French factors, q-factors, macroeconomic indicators, and CPI data is simplified through API calls, CSV parsing, and web scraping techniques.\n- The `tidyfinance` Python package offers pre-processed access to financial datasets, reducing manual data cleaning and saving valuable time.\n- Creating a centralized SQLite database helps manage and organize data efficiently across projects, while maintaining reproducibility.\n- Structured database storage supports scalable data access, which is essential for long-term academic projects and collaborative work in finance.\n\n## Exercises\n\n1. Download the monthly Fama-French factors manually from [Kenneth French's data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and read them in via `pd.read_csv()`. Validate that you get the same data as via the `pandas-datareader` package. \n1. Download the daily Fama-French 5 factors using the `pdr.DataReader()` package. 
After the successful download and conversion to the column format that we used above, compare the `rf`, `mkt_excess`, `smb`, and `hml` columns of `factors_ff3_daily` to `factors_ff5_daily`. Discuss any differences you might find. \n\n", + "markdown": "---\ntitle: Accessing and Managing Financial Data\nmetadata:\n pagetitle: Accessing and Managing Financial Data with Python\n description-meta: Download and organize open-source financial data using the programming language Python. \n---\n\n\n\n::: callout-note\nYou are reading **Tidy Finance with Python**. You can find the equivalent chapter for the sibling **Tidy Finance with R** [here](../r/accessing-and-managing-financial-data.qmd).\n:::\n\nIn this chapter, we suggest a way to organize your financial data. Everybody who has experience with data is also familiar with storing data in various formats like CSV, XLS, XLSX, or other delimited value storage. Reading and saving data can become very cumbersome when using different data formats and across different projects. Moreover, storing data in delimited files often leads to problems with respect to column type consistency. For instance, date-type columns frequently lead to inconsistencies across different data formats and programming languages. \n\nThis chapter shows how to import different open-source datasets. Specifically, our data comes from the application programming interface (API) of Yahoo Finance, a downloaded standard CSV file, an XLSX file stored in a public Google Drive repository, and other macroeconomic time series.\\index{API} We store all the data in a *single* database, which serves as the only source of data in subsequent chapters. We conclude the chapter by providing some tips on managing databases.\\index{Database}\n\nFirst, we load the Python packages that we use throughout this chapter. Later on, we load more packages in the sections where we need them. \n\n::: {#2068e538 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nimport io\nimport re\nimport zipfile\nfrom curl_cffi import requests\n```\n:::\n\n\nMoreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. Our data starts with 1960 since most asset pricing studies use data from 1962 on.\n\n::: {#c7108e44 .cell execution_count=3}\n``` {.python .cell-code}\nstart_date = \"1960-01-01\"\nend_date = \"2024-12-31\"\n```\n:::\n\n\n## Fama-French Data\n\nWe start by downloading some famous Fama-French factors [e.g., @Fama1993] and portfolio returns commonly used in empirical asset pricing. The data are freely available from Kenneth French’s Data Library, but the raw files come in a rather idiosyncratic format. If you access the data via the website, the manual *raw* workflow looks like this:\n\n1. Go to the website\n1. Find the right dataset\n1. Download a ZIP file\n1. Extract the CSV inside\n1. Select the right data table from the file and import the table into Python\n1. Clean the dates, scale the returns, fix column names, handle missing values, etc.\n\nDoing this once is fine; doing it repeatedly across projects is exactly the type of boilerplate that’s easy to mess up and annoying to maintain. It is therefore natural to automate these steps in Python.\n\n### From Manual Steps to a Download Script\n\nA minimal download script mirrors the manual steps one by one. 
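The script relies on the `requests` interface of the `curl_cffi` package imported above, presumably because it can impersonate a real browser, which helps with servers that reject plain Python HTTP clients; the standard `requests` package exposes the same `get()` call if you prefer it. 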
For example, to fetch a Fama–French dataset you first construct the URL:\n\n::: {#9ab8e5db .cell execution_count=4}\n``` {.python .cell-code}\ndataset = \"F-F_Research_Data_Factors\"\nbase_url = \"http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/\"\nurl = f\"{base_url}{dataset}_CSV.zip\"\n```\n:::\n\n\nNext, you replace the browser download with an HTTP request and extract the ZIP in memory:\n\n::: {#aafb772e .cell execution_count=5}\n``` {.python .cell-code}\nresp = requests.get(url)\nresp.raise_for_status()\n\nwith zipfile.ZipFile(io.BytesIO(resp.content)) as zf:\n file_name = zf.namelist()[0] # Ken French ZIPs contain one file\n raw_text = zf.read(file_name).decode(\"latin1\")\n```\n:::\n\n\nThe most important part of this chunk is the `requests.get()` call. This is the moment where we replace all the manual browser work (open the website, click download, save the file) with a single, reproducible line of code. Then, calling `raise_for_status()` ensures we stop immediately if the server returns an error (e.g. HTTP 404 or 500) instead of quietly handling a broken file. Once this succeeds, `resp.content` is guaranteed to contain valid ZIP bytes that we can open in memory.\n\nThe raw file contains documentation text followed by the actual data table(s). To emulate *scrolling down until the numbers start*, you can split the file into blocks and keep the long one that contains the table:\n\n::: {#61d53ef8 .cell execution_count=6}\n``` {.python .cell-code}\nchunks = raw_text.split(\"\\r\\n\\r\\n\")\ntable_text = max(chunks, key=len) \n```\n:::\n\n\nWithin this block, the first CSV header line starts at the first line beginning with a comma. We add a “Date” label for the index and pass everything to `read_csv`:\n\n::: {#d72929a3 .cell execution_count=7}\n``` {.python .cell-code}\nmatch = re.search(r\"^\\s*,\", table_text, flags=re.M)\nstart = match.start()\ncsv_text = \"Date\" + table_text[start:]\n\nfactors_ff_raw = pd.read_csv(io.StringIO(csv_text), index_col=0)\n```\n:::\n\n\nAt this point, the index still consists of integer date codes with different lengths depending on the frequency. 
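A quick way to check which format you are dealing with is to peek at the first few index values; the output shown in the comments is illustrative, not something the authors print:\n\n``` {.python .cell-code}\n# Inspect the raw index parsed from the CSV; monthly files yield\n# six-digit codes, daily files eight digits, annual files four\nprint(factors_ff_raw.index[:3])\n# e.g., Index([196001, 196002, 196003], dtype='int64', name='Date')\n```\n\n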
We need a bit of logic to convert them into a proper `DatetimeIndex`:\n\n::: {#7a97a04a .cell execution_count=8}\n``` {.python .cell-code}\ns = factors_ff_raw.index.astype(str)\n\nif (s.str.len() == 8).all(): # daily: YYYYMMDD\n dt = pd.to_datetime(s, format=\"%Y%m%d\")\nelif (s.str.len() == 6).all(): # monthly: YYYYMM\n dt = pd.to_datetime(s + \"01\", format=\"%Y%m%d\")\nelif (s.str.len() == 4).all(): # annual: YYYY\n dt = pd.to_datetime(s + \"0101\", format=\"%Y%m%d\")\n # pd.to_datetime() returns a DatetimeIndex here, which has no .dt\n # accessor, so we call .to_period() and .to_timestamp() directly\n dt = dt.to_period(\"A-DEC\").to_timestamp(how=\"end\")\nelse:\n raise ValueError(\"Unknown date format in Fama–French index.\")\n\nfactors_ff_raw = factors_ff_raw.set_index(dt)\nfactors_ff_raw.index.name = \"date\"\n```\n:::\n\n\nFinally, we still have to clean the data:\n\n- Replace the special missing-value codes (-99.99 and -999) with actual missing values.\n- Convert returns from percent to decimal.\n- Standardize column names (e.g., all lowercase, `Mkt-RF` to `mkt_excess`, and `RF` to `risk_free`).\n- Filter the data to the desired start and end dates.\n\nPutting it all together could look like this:\n\n::: {#77d01e70 .cell execution_count=9}\n``` {.python .cell-code}\n# Restrict the sample to the requested date range\nif start_date:\n factors_ff_raw = factors_ff_raw[factors_ff_raw.index >= pd.to_datetime(start_date)]\nif end_date:\n factors_ff_raw = factors_ff_raw[factors_ff_raw.index <= pd.to_datetime(end_date)]\n\nfactors_ff3_monthly = (factors_ff_raw\n # Replace missing-value codes before scaling; otherwise -99.99\n # would already have been turned into -0.9999 by the division\n .replace([-99.99, -999], np.nan)\n .div(100)\n .reset_index(names=\"date\")\n .rename(columns=str.lower)\n .rename(columns={\"mkt-rf\": \"mkt_excess\", \"rf\": \"risk_free\"})\n)\nfactors_ff3_monthly\n```\n\n::: {.cell-output .cell-output-display execution_count=37}\n```{=html}\n
|     | date       | mkt_excess | smb     | hml     | risk_free |
|-----|------------|------------|---------|---------|-----------|
| 0   | 1960-01-01 | -0.0698    | 0.0212  | 0.0265  | 0.0033    |
| 1   | 1960-02-01 | 0.0116     | 0.0060  | -0.0197 | 0.0029    |
| 2   | 1960-03-01 | -0.0163    | -0.0055 | -0.0275 | 0.0035    |
| 3   | 1960-04-01 | -0.0171    | 0.0022  | -0.0214 | 0.0019    |
| 4   | 1960-05-01 | 0.0312     | 0.0129  | -0.0373 | 0.0027    |
| ... | ...        | ...        | ...     | ...     | ...       |
| 775 | 2024-08-01 | 0.0160     | -0.0349 | -0.0110 | 0.0048    |
| 776 | 2024-09-01 | 0.0172     | -0.0013 | -0.0277 | 0.0040    |
| 777 | 2024-10-01 | -0.0100    | -0.0099 | 0.0086  | 0.0039    |
| 778 | 2024-11-01 | 0.0649     | 0.0446  | 0.0015  | 0.0040    |
| 779 | 2024-12-01 | -0.0317    | -0.0271 | -0.0300 | 0.0037    |

780 rows × 5 columns
\n```\n:::\n:::\n\n\nAll of these steps are doable, but none of them are really about finance - they are just the technical scaffolding required before you can work with the actual factor returns. That’s where a dedicated helper or package becomes invaluable. The `tidyfinance` package performs this entire workflow under the hood: you request a Fama–French dataset and receive a clean, consistently formatted data table from Kenneth French's Data Library.\\index{Data!Fama-French factors}\\index{Kenneth French homepage} This avoids repetitive boilerplate, reduces errors, and lets you focus on modeling and analysis rather than on data plumbing.\n\n### Using `tidyfinance` Instead of Reimplementing the Plumbing\n\n::: {#18f92154 .cell execution_count=10}\n``` {.python .cell-code}\nimport tidyfinance as tf\n```\n:::\n\n\nFor example, we can use the `tf.download_data()` function of the package to download monthly Fama-French factors. The set *Fama/French 3 Factors* contains the return time series of the market (`mkt_excess`), size (`smb`), and value (`hml`) factors alongside the risk-free rates (`risk_free`). Note that the `tf.download_data()` function parses all the columns correctly and already scales them appropriately, as the raw Fama-French data comes in a rather idiosyncratic format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French's finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to the `tidyfinance` package.\\index{Factor!Market}\\index{Factor!Size}\\index{Factor!Value}\\index{Factor!Profitability}\\index{Factor!Investment}\\index{Risk-free rate}\n\n::: {#fd60bf92 .cell execution_count=11}\n``` {.python .cell-code}\nfactors_ff3_monthly = tf.download_data(\n domain=\"famafrench\",\n dataset=\"F-F_Research_Data_Factors\",\n start_date=start_date,\n end_date=end_date,\n)\n```\n:::\n\n\nWe also download the set *5 Factors (2x3)*, which additionally includes the return time series of the profitability (`rmw`) and investment (`cma`) factors. We demonstrate how the monthly factors are constructed in [Replicating Fama and French Factors](replicating-fama-and-french-factors.qmd).\n\n::: {#bf347fbd .cell execution_count=12}\n``` {.python .cell-code}\nfactors_ff5_monthly = tf.download_data(\n domain=\"famafrench\",\n dataset=\"F-F_Research_Data_5_Factors_2x3\",\n start_date=start_date,\n end_date=end_date,\n)\n```\n:::\n\n\nIt is straightforward to download the corresponding *daily* Fama-French factors with the same function. \n\n::: {#7708ed92 .cell execution_count=13}\n``` {.python .cell-code}\nfactors_ff3_daily = tf.download_data(\n domain=\"famafrench\",\n dataset=\"F-F_Research_Data_Factors_daily\",\n start_date=start_date,\n end_date=end_date,\n)\n```\n:::\n\n\nIn a subsequent chapter, we also use the monthly returns from ten industry portfolios, so let us fetch that data, too.\\index{Data!Industry portfolios}\n\n::: {#6a04aed2 .cell execution_count=14}\n``` {.python .cell-code}\nindustries_ff_monthly = tf.download_data(\n domain=\"famafrench\",\n dataset=\"10_Industry_Portfolios\",\n start_date=start_date,\n end_date=end_date,\n)\n```\n:::\n\n\nIt is worth taking a look at all available portfolio return time series from Kenneth French's homepage. You should check out the other sets by calling `tf.get_available_famafrench_datasets()`.\n\n## q-Factors\n\nIn recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the @Hou2015 *q*-factor model. 
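Its q5 specification prices returns with five factors: the market excess return, a size factor (`me`), an investment factor (`ia`), a return-on-equity factor (`roe`), and an expected growth factor (`eg`); these correspond to the columns of the data frame we download below. 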
We refer to the [extended background](http://global-q.org/background.html) information provided by the original authors for further details. The *q*-factors can be downloaded directly from the authors' homepage from within `pd.read_csv()`.\\index{Data!q-factors}\\index{Factor!q-factors}\n\nWe also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we strip the \"R_\" prefix from the remaining column names and write all column names in lowercase. We then query the data to select observations between the start and end dates. Finally, we use the double asterisk (`**`) notation in the `assign` function to apply the same transform of dividing by 100 to all four factors by iterating through them. You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on *try*. You can check out style guides available online, e.g., [Hadley Wickham's `tidyverse` style guide.](https://style.tidyverse.org/index.html)\\index{Style guide} Note that we temporarily adjust the SSL certificate handling behavior in Python’s `ssl` module when retrieving the $q$-factors directly from the web, as demonstrated in [Working with Stock Returns](working-with-stock-returns.qmd). This method should be used with caution, which is why we restore the default settings immediately after successfully downloading the data.\n\n::: {#4bf4afe7 .cell execution_count=15}\n``` {.python .cell-code}\nimport ssl\nssl._create_default_https_context = ssl._create_unverified_context\n\nfactors_q_monthly_link = (\n \"https://global-q.org/uploads/1/2/2/6/122679606/\"\n \"q5_factors_monthly_2024.csv\"\n)\n\nfactors_q_monthly = (pd.read_csv(factors_q_monthly_link)\n .assign(\n date=lambda x: (\n pd.to_datetime(x[\"year\"].astype(str) + \"-\" +\n x[\"month\"].astype(str) + \"-01\"))\n )\n .drop(columns=[\"R_F\", \"R_MKT\", \"year\", \"month\"])\n .rename(columns=lambda x: x.replace(\"R_\", \"\").lower())\n .query(f\"date >= '{start_date}' and date <= '{end_date}'\")\n .assign(\n # Bind col as a default argument; a plain lambda would otherwise\n # evaluate all four columns with the last value of col\n **{col: (lambda x, col=col: x[col] / 100)\n for col in [\"me\", \"ia\", \"roe\", \"eg\"]}\n )\n)\n\nssl._create_default_https_context = ssl.create_default_context\n```\n:::\n\n\nAgain, you can use the `tidyfinance` package for a shortcut:\n\n::: {#87c32f1a .cell execution_count=16}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"factors_q\",\n dataset=\"q5_factors_monthly\", \n start_date=start_date, \n end_date=end_date\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=44}\n```{=html}\n
|     | date       | risk_free | mkt_excess | me        | ia        | roe       | eg        |
|-----|------------|-----------|------------|-----------|-----------|-----------|-----------|
| 0   | 1967-01-01 | 0.003927  | 0.081852   | 0.068122  | -0.029263 | 0.018813  | -0.025511 |
| 1   | 1967-02-01 | 0.003743  | 0.007557   | 0.016235  | -0.002915 | 0.035399  | 0.021792  |
| 2   | 1967-03-01 | 0.003693  | 0.040169   | 0.019836  | -0.016772 | 0.018417  | -0.011192 |
| 3   | 1967-04-01 | 0.003344  | 0.038786   | -0.006700 | -0.028972 | 0.010253  | -0.016371 |
| 4   | 1967-05-01 | 0.003126  | -0.042807  | 0.027457  | 0.021864  | 0.005901  | 0.001191  |
| ... | ...        | ...       | ...        | ...       | ...       | ...       | ...       |
| 691 | 2024-08-01 | 0.004419  | 0.016518   | -0.040817 | 0.004687  | 0.018369  | 0.008116  |
| 692 | 2024-09-01 | 0.004619  | 0.016806   | -0.011967 | -0.000010 | 0.007408  | -0.032810 |
| 693 | 2024-10-01 | 0.003907  | -0.009701  | -0.011261 | -0.011676 | -0.002314 | -0.008335 |
| 694 | 2024-11-01 | 0.003955  | 0.065002   | 0.043985  | -0.049491 | -0.015370 | -0.021420 |
| 695 | 2024-12-01 | 0.003663  | -0.031637  | -0.051564 | -0.003684 | -0.021442 | 0.049624  |

696 rows × 7 columns
\n```\n:::\n:::\n\n\n## Macroeconomic Predictors\n\nOur next data source is a set of macroeconomic variables often used as predictors for the equity premium. @Goyal2008 comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. The authors host the data on [Amit Goyal's website.](https://sites.google.com/view/agoyal145) Since the data is an XLSX-file stored on a public Google Drive location, we need additional packages to access the data directly from our Python session. Usually, you need to authenticate if you interact with Google drive directly in Python. Since the data is stored via a public link, we can proceed without any authentication.\\index{Google Drive}\n\n::: {#6ed1395b .cell execution_count=17}\n``` {.python .cell-code}\nsheet_id = \"1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG\"\nsheet_name = \"macro_predictors.xlsx\"\nmacro_predictors_link = (\n f\"https://docs.google.com/spreadsheets/d/{sheet_id}\" \n f\"/gviz/tq?tqx=out:csv&sheet={sheet_name}\"\n)\n```\n:::\n\n\nNext, we read in the new data and transform the columns into the variables that we later use:\n\n1. The dividend price ratio (`dp`), the difference between the log of dividends and the log of prices, where dividends are 12-month moving sums of dividends paid on the S&P 500 index, and prices are monthly averages of daily closing prices [@Campbell1988; @Campbell2006]. \n1. Dividend yield (`dy`), the difference between the log of dividends and the log of lagged prices [@Ball1978]. \n1. Earnings price ratio (`ep`), the difference between the log of earnings and the log of prices, where earnings are 12-month moving sums of earnings on the S&P 500 index [@Campbell1988]. \n1. Dividend payout ratio (`de`), the difference between the log of dividends and the log of earnings [@Lamont1998]. \n1. Stock variance (`svar`), the sum of squared daily returns on the S&P 500 index [@Guo2006].\n1. Book-to-market ratio (`bm`), the ratio of book value to market value for the Dow Jones Industrial Average [@Kothari1997].\n1. Net equity expansion (`ntis`), the ratio of 12-month moving sums of net issues by NYSE listed stocks divided by the total end-of-year market capitalization of NYSE stocks [@Campbell2008].\n1. Treasury bills (`tbl`), the 3-Month Treasury Bill: Secondary Market Rate from the economic research database at the Federal Reserve Bank at St. Louis [@Campbell1987].\n1. Long-term yield (`lty`), the long-term government bond yield from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n1. Long-term rate of returns (`ltr`), the long-term government bond returns from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n1. Term spread (`tms`), the difference between the long-term yield on government bonds and the Treasury bill [@Campbell1987].\n1. Default yield spread (`dfy`), the difference between BAA and AAA-rated corporate bond yields [@Fama1989]. \n1. 
Inflation (`infl`), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics [@Campbell2004].\n\t\t\t\nFor variable definitions and the required data transformations, you can consult the material on [Amit Goyal's website.](https://sites.google.com/view/agoyal145)\n\n::: {#af3f3685 .cell execution_count=18}\n``` {.python .cell-code}\nssl._create_default_https_context = ssl._create_unverified_context\n\nmacro_predictors = (\n pd.read_csv(macro_predictors_link, thousands=\",\")\n .assign(\n date=lambda x: pd.to_datetime(x[\"yyyymm\"], format=\"%Y%m\"),\n dp=lambda x: np.log(x[\"D12\"])-np.log(x[\"Index\"]),\n dy=lambda x: np.log(x[\"D12\"])-np.log(x[\"Index\"].shift(1)),\n ep=lambda x: np.log(x[\"E12\"])-np.log(x[\"Index\"]),\n de=lambda x: np.log(x[\"D12\"])-np.log(x[\"E12\"]),\n tms=lambda x: x[\"lty\"]-x[\"tbl\"],\n dfy=lambda x: x[\"BAA\"]-x[\"AAA\"]\n )\n .rename(columns={\"b/m\": \"bm\"})\n .get([\"date\", \"dp\", \"dy\", \"ep\", \"de\", \"svar\", \"bm\", \n \"ntis\", \"tbl\", \"lty\", \"ltr\", \"tms\", \"dfy\", \"infl\"])\n .query(\"date >= @start_date and date <= @end_date\")\n .dropna()\n)\n\nssl._create_default_https_context = ssl.create_default_context\n```\n:::\n\n\nTo get the equivalent data through `tidyfinance`, you can call:\n\n::: {#fa0b3e29 .cell execution_count=19}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"macro_predictors\",\n dataset=\"monthly\",\n start_date=start_date, \n end_date=end_date\n)\n```\n:::\n\n\n## Other Macroeconomic Data\n\nThe Federal Reserve bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. As an illustration, we use the `tidyfinance` package to fetch consumer price index (CPI) data that can be found under the [CPIAUCNS](https://fred.stlouisfed.org/series/CPIAUCNS) key.\\index{Data!FRED}\\index{Data!CPI}\n\n::: {#3d801a71 .cell execution_count=20}\n``` {.python .cell-code}\nseries = \"CPIAUCNS\"\nurl = f\"https://fred.stlouisfed.org/graph/fredgraph.csv?id={series}\"\n```\n:::\n\n\nWe can then use the `requests` module to request the CSV, extract the data from the response body, and convert the columns to a tidy format:\n\n::: {#80fe0fbe .cell execution_count=21}\n``` {.python .cell-code}\nresp = requests.get(url)\nresp_csv = pd.io.common.StringIO(resp.text)\n\ncpi_monthly = (pd.read_csv(resp_csv)\n .assign(\n date=lambda x: pd.to_datetime(x[\"observation_date\"]),\n value=lambda x: pd.to_numeric(\n x[series], errors=\"coerce\"\n ),\n series=series,\n )\n .get([\"date\", \"series\", \"value\"])\n .query(\"date >= @start_date & date <= @end_date\")\n .assign(cpi=lambda x: x[\"value\"] / x[\"value\"].iloc[-1])\n)\n```\n:::\n\n\nThe last line sets the current (latest) price level as the reference price level.\n\nThe `tidyfinance` package can, of course, also fetch the same index data and many more data series:\n\n::: {#b94f5bd0 .cell execution_count=22}\n``` {.python .cell-code}\ntf.download_data(\n domain=\"fred\",\n series = \"CPIAUCNS\",\n start_date = start_date,\n end_date = end_date\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=50}\n```{=html}\n
|     | date       | series   | value   |
|-----|------------|----------|---------|
| 0   | 1960-01-01 | CPIAUCNS | 29.300  |
| 1   | 1960-02-01 | CPIAUCNS | 29.400  |
| 2   | 1960-03-01 | CPIAUCNS | 29.400  |
| 3   | 1960-04-01 | CPIAUCNS | 29.500  |
| 4   | 1960-05-01 | CPIAUCNS | 29.500  |
| ... | ...        | ...      | ...     |
| 775 | 2024-08-01 | CPIAUCNS | 314.796 |
| 776 | 2024-09-01 | CPIAUCNS | 315.301 |
| 777 | 2024-10-01 | CPIAUCNS | 315.664 |
| 778 | 2024-11-01 | CPIAUCNS | 315.493 |
| 779 | 2024-12-01 | CPIAUCNS | 315.605 |

780 rows × 3 columns
\n```\n:::\n:::\n\n\nTo download other time series, we just have to look them up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the [PCU2122212122210](https://fred.stlouisfed.org/series/PCU2122212122210) key. If your desired time series is not supported through `tidyfinance`, we recommend working with the `fredapi` package. Note that you need to get an API key to use its functionality. We refer to the package documentation for details.\n\n## Setting Up a Database\n\nNow that we have downloaded some (freely available) data from the web into the memory of our Python session, let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code. \n\nThere are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use an [SQLite](https://SQLite.org/) database, which is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine. Note that [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language) is a standard language for accessing and manipulating databases.\\index{Database!SQLite}\n\n::: {#b825745c .cell execution_count=23}\n``` {.python .cell-code}\nimport sqlite3\n```\n:::\n\n\nAn SQLite database is easily created - the code below is really all there is. You do not need any external software. Note that SQLite has no native date type, so date columns are stored as text or integers, which is why we parse them explicitly (via `parse_dates`) when reading tables back into pandas.\\index{Database!Creation} We will use the file `tidy_finance_python.sqlite`, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.\n\n::: {#ac03dbae .cell execution_count=24}\n``` {.python .cell-code}\nimport os\n\nif not os.path.exists(\"data\"):\n os.makedirs(\"data\")\n \ntidy_finance = sqlite3.connect(database=\"data/tidy_finance_python.sqlite\")\n```\n:::\n\n\nNext, we create a database table with the monthly Fama-French factor data. We do so with the `pandas` function `to_sql()`, which copies the data to our SQLite database.\n\n::: {#244fccf8 .cell execution_count=25}\n``` {.python .cell-code}\n(factors_ff3_monthly\n .to_sql(name=\"factors_ff3_monthly\", \n con=tidy_finance, \n if_exists=\"replace\",\n index=False)\n)\n```\n:::\n\n\nNow, if we want to have the whole table in memory, we need to call `pd.read_sql_query()` with the corresponding query. You will see that we regularly load the data into the memory in the next chapters.\\index{Database!Read}\n\n::: {#dcab3728 .cell execution_count=26}\n``` {.python .cell-code}\npd.read_sql_query(\n sql=\"SELECT date, risk_free FROM factors_ff3_monthly\",\n con=tidy_finance,\n parse_dates={\"date\"}\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=54}\n```{=html}\n
|     | date       | risk_free |
|-----|------------|-----------|
| 0   | 1960-01-01 | 0.0033    |
| 1   | 1960-02-01 | 0.0029    |
| 2   | 1960-03-01 | 0.0035    |
| 3   | 1960-04-01 | 0.0019    |
| 4   | 1960-05-01 | 0.0027    |
| ... | ...        | ...       |
| 775 | 2024-08-01 | 0.0048    |
| 776 | 2024-09-01 | 0.0040    |
| 777 | 2024-10-01 | 0.0039    |
| 778 | 2024-11-01 | 0.0040    |
| 779 | 2024-12-01 | 0.0037    |

780 rows × 2 columns
\n```\n:::\n:::\n\n\nThe last couple of code chunks are really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages. \n\nBefore we move on to the next data source, let us also store the other six tables in our new SQLite database. \n\n::: {#adf2106d .cell execution_count=27}\n``` {.python .cell-code}\ndata_dict = {\n \"factors_ff5_monthly\": factors_ff5_monthly,\n \"factors_ff3_daily\": factors_ff3_daily,\n \"industries_ff_monthly\": industries_ff_monthly, \n \"factors_q_monthly\": factors_q_monthly,\n \"macro_predictors\": macro_predictors,\n \"cpi_monthly\": cpi_monthly\n}\n\nfor key, value in data_dict.items():\n value.to_sql(name=key,\n con=tidy_finance, \n if_exists=\"replace\",\n index=False)\n```\n:::\n\n\nFrom now on, all you need to do to access data that is stored in the database is to follow two steps: (i) Establish the connection to the SQLite-database and (ii) execute the query to fetch the data. For your convenience, the following steps show all you need in a compact fashion.\\index{Database!Connection}\n\n::: {#6487b384 .cell message='false' results='false' execution_count=28}\n``` {.python .cell-code}\nimport pandas as pd\nimport sqlite3\n\ntidy_finance = sqlite3.connect(database=\"data/tidy_finance_python.sqlite\")\n\nfactors_q_monthly = pd.read_sql_query(\n sql=\"SELECT * FROM factors_q_monthly\",\n con=tidy_finance,\n parse_dates={\"date\"}\n)\n```\n:::\n\n\n## Managing SQLite Databases\n\nFinally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.\\index{Database!Management}\n\nTo optimize the database file, you can run the `VACUUM` command in the database, which rebuilds the database and frees up unused space. You can execute the command in the database using the `execute()` function. \n\n::: {#28341992 .cell execution_count=29}\n``` {.python .cell-code}\ntidy_finance.execute(\"VACUUM\")\n```\n:::\n\n\nThe `VACUUM` command actually performs a couple of additional cleaning steps, which you can read about in [this tutorial.](https://SQLite.org/docs/sql/statements/vacuum.html) \\index{Database!Cleaning}\n\n## Key Takeaways\n\n- Importing Fama-French factors, q-factors, macroeconomic indicators, and CPI data is simplified through API calls, CSV parsing, and web scraping techniques.\n- The `tidyfinance` Python package offers pre-processed access to financial datasets, reducing manual data cleaning and saving valuable time.\n- Creating a centralized SQLite database helps manage and organize data efficiently across projects, while maintaining reproducibility.\n- Structured database storage supports scalable data access, which is essential for long-term academic projects and collaborative work in finance.\n\n## Exercises\n\n1. Download the monthly Fama-French factors manually from [Kenneth French's data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and read them in via `pd.read_csv()`. Validate that you get the same data as via the `tf.download_data()` package. \n1. Download the daily Fama-French 5 factors using the `tf.download_data()` function. 
After the successful download and conversion to the column format that we used above, compare the `risk_free`, `mkt_excess`, `smb`, and `hml` columns of `factors_ff3_daily` to `factors_ff5_daily`. Discuss any differences you might find. \n\n", "supporting": [ "accessing-and-managing-financial-data_files" ], "filters": [], "includes": { "include-in-header": [ - "\n\n\n" + "\n\n\n" ] } } diff --git a/_freeze/r/accessing-and-managing-financial-data/execute-results/html.json b/_freeze/r/accessing-and-managing-financial-data/execute-results/html.json index 6a81122a..646eed21 100644 --- a/_freeze/r/accessing-and-managing-financial-data/execute-results/html.json +++ b/_freeze/r/accessing-and-managing-financial-data/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "58cf68fe24e6b0c8b028f74da0c95bf6", + "hash": "20e6e3325026e6ce999b7fe854cd3f10", "result": { "engine": "knitr", - "markdown": "---\ntitle: Accessing and Managing Financial Data\naliases: \n - ../accessing-and-managing-financial-data.html\nmetadata:\n pagetitle: Accessing and Managing Financial Data with R\n description-meta: Download and organize open-source financial data using the programming language R. \n---\n\n::: callout-note\nYou are reading **Tidy Finance with R**. You can find the equivalent chapter for the sibling **Tidy Finance with Python** [here](../python/accessing-and-managing-financial-data.qmd).\n:::\n\nIn this chapter, we suggest a way to organize your financial data. Everybody who has experience with data is also familiar with storing data in various formats like CSV, XLS, XLSX, or other delimited value storage. Reading and saving data can become very cumbersome in the case of using different data formats, both across different projects and across different programming languages. Moreover, storing data in delimited files often leads to problems with respect to column type consistency. For instance, date-type columns frequently lead to inconsistencies across different data formats and programming languages.\n\nThis chapter shows how to import different open source data sets. Specifically, our data comes from the application programming interface (API) of Yahoo Finance, a downloaded standard CSV file, an XLSX file stored in a public Google Drive repository, and other macroeconomic time series that can be scraped directly from a website.\\index{API}\\index{Web scraping} We show how to process these raw data, as well as how to take a shortcut using the `tidyfinance` package, which provides a consistent interface to tidy financial data. We store all the data in a *single* database, which serves as the only source of data in subsequent chapters. We conclude the chapter by providing some tips on managing databases.\\index{Database}\n\nFirst, we load the global R packages that we use throughout this chapter. Later on, we load more packages in the sections where we need them.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\n```\n:::\n\n\nMoreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. 
Our data starts with 1960 since most asset pricing studies use data from 1962 on.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstart_date <- ymd(\"1960-01-01\")\nend_date <- ymd(\"2024-12-31\")\n```\n:::\n\n\n## Fama-French Data\n\nWe start by downloading some famous Fama-French factors [e.g., @Fama1993] and portfolio returns commonly used in empirical asset pricing. Fortunately, there is a neat package by [Nelson Areal](https://github.com/nareal/frenchdata/) that allows us to access the data easily: the `frenchdata` package provides functions to download and read data sets from [Prof. Kenneth French finance data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) [@frenchdata].\\index{Data!Fama-French factors} \\index{Kenneth French homepage}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(frenchdata)\n```\n:::\n\n\nWe can use the `download_french_data()` function of the package to download monthly Fama-French factors. The set *Fama/French 3 Factors* contains the return time series of the market `mkt_excess`, size `smb` and value `hml` alongside the risk-free rates `rf`. Note that we have to do some manual work to correctly parse all the columns and scale them appropriately, as the raw Fama-French data comes in a very unpractical data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French's finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to `frenchdata`.\\index{Factor!Market}\\index{Factor!Size}\\index{Factor!Value}\\index{Factor!Profitability}\\index{Factor!Investment}\\index{Risk-free rate}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_raw <- download_french_data(\"Fama/French 3 Factors\")\nfactors_ff3_monthly <- factors_ff3_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nWe also download the set *5 Factors (2x3)*, which additionally includes the return time series of the profitability `rmw` and investment `cma` factors. We demonstrate how the monthly factors are constructed in the chapter [Replicating Fama and French Factors](replicating-fama-and-french-factors.qmd).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff5_monthly_raw <- download_french_data(\"Fama/French 5 Factors (2x3)\")\n\nfactors_ff5_monthly <- factors_ff5_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML, RMW, CMA), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nIt is straightforward to download the corresponding *daily* Fama-French factors with the same function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_daily_raw <- download_french_data(\"Fama/French 3 Factors [Daily]\")\n\nfactors_ff3_daily <- factors_ff3_daily_raw$subsets$data[[1]] |>\n mutate(\n date = ymd(date),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) 
/ 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |>\n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nIn a subsequent chapter, we also use the 10 monthly industry portfolios, so let us fetch that data, too.\\index{Data!Industry portfolios}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nindustries_ff_monthly_raw <- download_french_data(\"10 Industry Portfolios\")\n\nindustries_ff_monthly <- industries_ff_monthly_raw$subsets$data[[1]] |>\n mutate(date = floor_date(ymd(str_c(date, \"01\")), \"month\")) |>\n mutate(across(where(is.numeric), ~ . / 100)) |>\n select(date, everything()) |>\n filter(date >= start_date & date <= end_date) |> \n rename_with(str_to_lower)\n```\n:::\n\n\nIt is worth taking a look at all available portfolio return time series from Kenneth French's homepage. You should check out the other sets by calling `get_french_data_list()`. \n\nTo automatically download and process Fama-French data, you can also use the `tidyfinance` package with `type = \"factors_ff_3_monthly\"` or similar, e.g.:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"factors_ff_3_monthly\", \n start_date = start_date, \n end_date = end_date\n)\n```\n:::\n\n\nThe `tidyfinance` package implements the processing steps as above and returns the same cleaned data frame. The list of supported Fama-French data types can be called as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_supported_types(domain = \"Fama-French\")\n```\n:::\n\n\n## q-Factors\n\nIn recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the @Hou2015 *q*-factor model. We refer to the [extended background](http://global-q.org/background.html) information provided by the original authors for further information. The *q* factors can be downloaded directly from the authors' homepage from within `read_csv()`.\\index{Data!q-factors}\\index{Factor!q-factors}\n\nWe also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we rename the columns with the \"R\\_\"-prescript using regular expressions and write all column names in lowercase. You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on *try*. You can check out style guides available online, e.g., [Hadley Wickham's `tidyverse` style guide.](https://style.tidyverse.org/index.html)\\index{Style guide}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_q_monthly_link <-\n \"https://global-q.org/uploads/1/2/2/6/122679606/q5_factors_monthly_2023.csv\"\n\nfactors_q_monthly <- read_csv(factors_q_monthly_link) |>\n mutate(date = ymd(str_c(year, month, \"01\", sep = \"-\"))) |>\n rename_with(~str_remove(., \"R_\")) |>\n rename_with(str_to_lower) |>\n mutate(across(-date, ~. / 100)) |>\n select(date, risk_free = f, mkt_excess = mkt, everything()) |>\n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nAgain, you can use the `tidyfinance` package for a shortcut:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"factors_q5_monthly\", \n start_date = start_date, \n end_date = end_date\n)\n```\n:::\n\n\n## Macroeconomic Predictors\n\nOur next data source is a set of macroeconomic variables often used as predictors for the equity premium. @Goyal2008 comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. 
The authors host the data updated to 2022 on [Amit Goyal's website.](https://sites.google.com/view/agoyal145) The data is an XLSX-file stored on a public Google drive location and we directly export a CSV file.\\index{Data!Macro predictors}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsheet_id <- \"1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG\"\nsheet_name <- \"Monthly\"\nmacro_predictors_url <- paste0(\n \"https://docs.google.com/spreadsheets/d/\", sheet_id,\n \"/gviz/tq?tqx=out:csv&sheet=\", sheet_name\n)\nmacro_predictors_raw <- read_csv(macro_predictors_url)\n```\n:::\n\n\nNext, we transform the columns into the variables that we later use:\n\n1. The dividend price ratio (`dp`), the difference between the log of dividends and the log of prices, where dividends are 12-month moving sums of dividends paid on the S&P 500 index, and prices are monthly averages of daily closing prices [@Campbell1988; @Campbell2006].\n2. Dividend yield (`dy`), the difference between the log of dividends and the log of lagged prices [@Ball1978].\n3. Earnings price ratio (`ep`), the difference between the log of earnings and the log of prices, where earnings are 12-month moving sums of earnings on the S&P 500 index [@Campbell1988].\n4. Dividend payout ratio (`de`), the difference between the log of dividends and the log of earnings [@Lamont1998].\n5. Stock variance (`svar`), the sum of squared daily returns on the S&P 500 index [@Guo2006].\n6. Book-to-market ratio (`bm`), the ratio of book value to market value for the Dow Jones Industrial Average [@Kothari1997].\n7. Net equity expansion (`ntis`), the ratio of 12-month moving sums of net issues by NYSE listed stocks divided by the total end-of-year market capitalization of NYSE stocks [@Campbell2008].\n8. Treasury bills (`tbl`), the 3-Month Treasury Bill: Secondary Market Rate from the economic research database at the Federal Reserve Bank at St. Louis [@Campbell1987].\n9. Long-term yield (`lty`), the long-term government bond yield from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n10. Long-term rate of returns (`ltr`), the long-term government bond returns from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n11. Term spread (`tms`), the difference between the long-term yield on government bonds and the Treasury bill [@Campbell1987].\n12. Default yield spread (`dfy`), the difference between BAA and AAA-rated corporate bond yields [@Fama1989].\n13. 
Inflation (`infl`), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics [@Campbell2004].\n\nFor variable definitions and the required data transformations, you can consult the material on [Amit Goyal's website](https://sites.google.com/view/agoyal145).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmacro_predictors <- macro_predictors_raw |>\n mutate(date = ym(yyyymm)) |>\n mutate(across(where(is.character), as.numeric)) |>\n mutate(\n IndexDiv = Index + D12,\n logret = log(IndexDiv) - log(lag(IndexDiv)),\n Rfree = log(Rfree + 1),\n rp_div = lead(logret - Rfree, 1), # Future excess market return\n dp = log(D12) - log(Index), # Dividend Price ratio\n dy = log(D12) - log(lag(Index)), # Dividend yield\n ep = log(E12) - log(Index), # Earnings price ratio\n de = log(D12) - log(E12), # Dividend payout ratio\n tms = lty - tbl, # Term spread\n dfy = BAA - AAA # Default yield spread\n ) |>\n select(\n date, rp_div, dp, dy, ep, de, svar,\n bm = `b/m`, ntis, tbl, lty, ltr,\n tms, dfy, infl\n ) |>\n filter(date >= start_date & date <= end_date) |>\n drop_na()\n```\n:::\n\n\nTo get the equivalent data through `tidyfinance`, you can call:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"macro_predictors_monthly\",\n start_date = start_date,\n end_date = end_date\n)\n```\n:::\n\n\n## Other Macroeconomic Data\n\nThe Federal Reserve bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. The data can be downloaded directly from FRED by constructing the appropriate URL. For instance, let us consider the consumer price index (CPI) data that can be found under the [CPIAUCNS](https://fred.stlouisfed.org/series/CPIAUCNS):\\index{Data!FRED}\\index{Data!CPI}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseries <- \"CPIAUCNS\"\ncpi_url <- paste0(\n \"https://fred.stlouisfed.org/graph/fredgraph.csv?id=\", series\n)\n```\n:::\n\n\nWe can then use the `httr2` [@httr2] package to request the CSV, extract the data from the response body, and convert the columns to a tidy format:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(httr2)\n\ncpi_daily <- request(cpi_url) |>\n req_perform() |>\n resp_body_string() |>\n read_csv() |>\n mutate(\n date = as.Date(observation_date),\n value = as.numeric(.data[[series]]),\n series = series,\n .keep = \"none\"\n )\n```\n:::\n\n\nWe convert the daily CPI data to monthly because we use the latter in later chapters. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ncpi_monthly <- cpi_daily |>\n mutate(\n date = floor_date(date, \"month\"),\n cpi = value / value[date == max(date)],\n .keep = \"none\"\n )\n```\n:::\n\n\nThe `tidyfinance` package can, of course, also fetch the same daily data and many more data series:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"fred\",\n series = \"CPIAUCNS\",\n start_date = start_date,\n end_date = end_date\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 0 × 3\n# ℹ 3 variables: date , value , series \n```\n\n\n:::\n:::\n\n\nTo download other time series, we just have to look it up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the [PCU2122212122210](https://fred.stlouisfed.org/series/PCU2122212122210) key. If your desired time series is not supported through `tidyfinance`, we recommend working with the `fredr` package [@fredr]. 
Note that you need to get an API key to use its functionality. We refer to the package documentation for details.\n\n## Setting Up a Database\n\nNow that we have downloaded some (freely available) data from the web into the memory of our R session let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code.\n\nThere are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use an [SQLite](https://www.sqlite.org/index.html) database, which is the C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. Note that [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language) is a standard language for accessing and manipulating databases and heavily inspired the `dplyr` functions. We refer to [this tutorial](https://www.w3schools.com/sql/sql_intro.asp) for more information on SQL.\\index{Database!SQLite}\n\nThere are two packages that make working with SQLite in R very simple: `RSQLite` [@RSQLite] embeds the SQLite database engine in R, and `dbplyr` [@dbplyr] is the database back-end for `dplyr`. These packages allow to set up a database to remotely store tables and use these remote database tables as if they are in-memory data frames by automatically converting `dplyr` into SQL. Check out the [`RSQLite`](https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html) and [`dbplyr`](https://db.rstudio.com/databases/sqlite/) vignettes for more information.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(RSQLite)\nlibrary(dbplyr)\n```\n:::\n\n\nAn SQLite database is easily created - the code below is really all there is. You do not need any external software. Note that we use the `extended_types = TRUE` option to enable date types when storing and fetching data. Otherwise, date columns are stored and retrieved as integers.\\index{Database!Creation} We will use the file `tidy_finance_r.sqlite`, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (!dir.exists(\"data\")) {\n dir.create(\"data\")\n}\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n```\n:::\n\n\nNext, we create a remote table with the monthly Fama-French factor data. We do so with the function `dbWriteTable()`, which copies the data to our SQLite-database.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_monthly\",\n value = factors_ff3_monthly,\n overwrite = TRUE\n)\n```\n:::\n\n\nWe can use the remote table as an in-memory data frame by building a connection via `tbl()`.\\index{Database!Remote connection}\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_db <- tbl(tidy_finance, \"factors_ff3_monthly\")\n```\n:::\n\n\nAll `dplyr` calls are evaluated lazily, i.e., the data is not in our R session's memory, and the database does most of the work. You can see that by noticing that the output below does not show the number of rows. 
In fact, the following code chunk only fetches the top 10 rows from the database for printing.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_db |>\n select(date, rf)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# Source: SQL [?? x 2]\n# Database: sqlite 3.47.1 [data/tidy_finance_r.sqlite]\n date rf\n \n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ more rows\n```\n\n\n:::\n:::\n\n\nIf we want to have the whole table in memory, we need to `collect()` it. You will see that we regularly load the data into the memory in the next chapters.\\index{Database!Fetch}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_db |>\n select(date, rf) |>\n collect()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 780 × 2\n date rf\n \n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ 775 more rows\n```\n\n\n:::\n:::\n\n\nThe last couple of code chunks is really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages.\n\nBefore we move on to the next data source, let us also store the other five tables in our new SQLite database.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbWriteTable(\n tidy_finance,\n \"factors_ff5_monthly\",\n value = factors_ff5_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_daily\",\n value = factors_ff3_daily,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"industries_ff_monthly\",\n value = industries_ff_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_q_monthly\",\n value = factors_q_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"macro_predictors\",\n value = macro_predictors,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"cpi_monthly\",\n value = cpi_monthly,\n overwrite = TRUE\n)\n```\n:::\n\n\nFrom now on, all you need to do to access data that is stored in the database is to follow three steps: (i) Establish the connection to the SQLite database, (ii) call the table you want to extract, and (iii) collect the data. For your convenience, the following steps show all you need in a compact fashion.\\index{Database!Connection}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(RSQLite)\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nfactors_q_monthly <- tbl(tidy_finance, \"factors_q_monthly\")\nfactors_q_monthly <- factors_q_monthly |> collect()\n```\n:::\n\n\n## Managing SQLite Databases\n\nFinally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.\\index{Database!Management}\n\nTo optimize the database file, you can run the `VACUUM` command in the database, which rebuilds the database and frees up unused space. 
You can execute the command in the database using the `dbSendQuery()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nres <- dbSendQuery(tidy_finance, \"VACUUM\")\nres\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\n SQL VACUUM\n ROWS Fetched: 0 [complete]\n Changed: 0\n```\n\n\n:::\n:::\n\n\nThe `VACUUM` command actually performs a couple of additional cleaning steps, which you can read about in [this tutorial.](https://www.sqlitetutorial.net/sqlite-vacuum/) \\index{Database!Cleaning}\n\nWe store the result of the above query in `res` because the database keeps the result set open. To close open results and avoid warnings going forward, we can use `dbClearResult()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbClearResult(res)\n```\n:::\n\n\nApart from cleaning up, you might be interested in listing all the tables that are currently in your database. You can do this via the `dbListTables()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbListTables(tidy_finance)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"cpi_monthly\" \"factors_ff3_daily\" \n[3] \"factors_ff3_monthly\" \"factors_ff5_monthly\" \n[5] \"factors_q_monthly\" \"industries_ff_monthly\"\n[7] \"macro_predictors\" \n```\n\n\n:::\n:::\n\n\nThis function comes in handy if you are unsure about the correct naming of the tables in your database.\n\n## Key Takeaways\n\n- Importing Fama-French factors, q-factors, macroeconomic indicators, and CPI data is simplified through API calls, CSV parsing, and web scraping techniques.\n- The `tidyfinance` R package offers pre-processed access to financial datasets, reducing manual data cleaning and saving valuable time.\n- Creating a centralized SQLite database helps manage and organize data efficiently across projects, while maintaining reproducibility.\n- Structured database storage supports scalable data access, which is essential for long-term academic projects and collaborative work in finance.\n\n## Exercises\n\n1. Download the monthly Fama-French factors manually from [Ken French's data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and read them in via `read_csv()`. Validate that you get the same data as via the `frenchdata` package.\n2. Download the daily Fama-French 5 factors using the `frenchdata` package. Use `get_french_data_list()` to find the corresponding table name. After the successful download and conversion to the column format that we used above, compare the `rf`, `mkt_excess`, `smb`, and `hml` columns of `factors_ff3_daily` to `factors_ff5_daily`. Discuss any differences you might find.\n", + "markdown": "---\ntitle: Accessing and Managing Financial Data\naliases: \n - ../accessing-and-managing-financial-data.html\nmetadata:\n pagetitle: Accessing and Managing Financial Data with R\n description-meta: Download and organize open-source financial data using the programming language R. \n---\n\n::: callout-note\nYou are reading **Tidy Finance with R**. You can find the equivalent chapter for the sibling **Tidy Finance with Python** [here](../python/accessing-and-managing-financial-data.qmd).\n:::\n\nIn this chapter, we suggest a way to organize your financial data. Everybody who has experience with data is also familiar with storing data in various formats like CSV, XLS, XLSX, or other delimited value storage. Reading and saving data can become very cumbersome in the case of using different data formats, both across different projects and across different programming languages. 
Moreover, storing data in delimited files often leads to problems with respect to column type consistency. For instance, date-type columns frequently lead to inconsistencies across different data formats and programming languages.\n\nThis chapter shows how to import different open source data sets. Specifically, our data comes from the application programming interface (API) of Yahoo Finance, a downloaded standard CSV file, an XLSX file stored in a public Google Drive repository, and other macroeconomic time series that can be scraped directly from a website.\\index{API}\\index{Web scraping} We show how to process these raw data, as well as how to take a shortcut using the `tidyfinance` package, which provides a consistent interface to tidy financial data. We store all the data in a *single* database, which serves as the only source of data in subsequent chapters. We conclude the chapter by providing some tips on managing databases.\\index{Database}\n\nFirst, we load the global R packages that we use throughout this chapter. Later on, we load more packages in the sections where we need them.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(tidyfinance)\nlibrary(scales)\n```\n:::\n\n\nMoreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. Our data starts with 1960 since most asset pricing studies use data from 1962 on.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstart_date <- ymd(\"1960-01-01\")\nend_date <- ymd(\"2024-12-31\")\n```\n:::\n\n\n## Fama-French Data\n\nWe start by downloading some famous Fama-French factors [e.g., @Fama1993] and portfolio returns commonly used in empirical asset pricing. Fortunately, there is a neat package by [Nelson Areal](https://github.com/nareal/frenchdata/) that allows us to access the data easily: the `frenchdata` package provides functions to download and read data sets from [Prof. Kenneth French finance data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) [@frenchdata].\\index{Data!Fama-French factors} \\index{Kenneth French homepage}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(frenchdata)\n```\n:::\n\n\nWe can use the `download_french_data()` function of the package to download monthly Fama-French factors. The set *Fama/French 3 Factors* contains the return time series of the market `mkt_excess`, size `smb` and value `hml` alongside the risk-free rates `rf`. Note that we have to do some manual work to correctly parse all the columns and scale them appropriately, as the raw Fama-French data comes in a very unpractical data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French's finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to `frenchdata`.\\index{Factor!Market}\\index{Factor!Size}\\index{Factor!Value}\\index{Factor!Profitability}\\index{Factor!Investment}\\index{Risk-free rate}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_raw <- download_french_data(\"Fama/French 3 Factors\")\nfactors_ff3_monthly <- factors_ff3_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) 
/ 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nWe also download the set *5 Factors (2x3)*, which additionally includes the return time series of the profitability `rmw` and investment `cma` factors. We demonstrate how the monthly factors are constructed in the chapter [Replicating Fama and French Factors](replicating-fama-and-french-factors.qmd).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff5_monthly_raw <- download_french_data(\"Fama/French 5 Factors (2x3)\")\n\nfactors_ff5_monthly <- factors_ff5_monthly_raw$subsets$data[[1]] |>\n mutate(\n date = floor_date(ymd(str_c(date, \"01\")), \"month\"),\n across(c(RF, `Mkt-RF`, SMB, HML, RMW, CMA), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |> \n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nIt is straightforward to download the corresponding *daily* Fama-French factors with the same function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_daily_raw <- download_french_data(\"Fama/French 3 Factors [Daily]\")\n\nfactors_ff3_daily <- factors_ff3_daily_raw$subsets$data[[1]] |>\n mutate(\n date = ymd(date),\n across(c(RF, `Mkt-RF`, SMB, HML), ~as.numeric(.) / 100),\n .keep = \"none\"\n ) |>\n rename_with(str_to_lower) |>\n rename(mkt_excess = `mkt-rf`) |>\n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nIn a subsequent chapter, we also use the 10 monthly industry portfolios, so let us fetch that data, too.\\index{Data!Industry portfolios}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nindustries_ff_monthly_raw <- download_french_data(\"10 Industry Portfolios\")\n\nindustries_ff_monthly <- industries_ff_monthly_raw$subsets$data[[1]] |>\n mutate(date = floor_date(ymd(str_c(date, \"01\")), \"month\")) |>\n mutate(across(where(is.numeric), ~ . / 100)) |>\n select(date, everything()) |>\n filter(date >= start_date & date <= end_date) |> \n rename_with(str_to_lower)\n```\n:::\n\n\nIt is worth taking a look at all available portfolio return time series from Kenneth French's homepage. You should check out the other sets by calling `get_french_data_list()`. \n\nTo automatically download and process Fama-French data, you can also use the `tidyfinance` package with `type = \"factors_ff_3_monthly\"` or similar, e.g.:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"factors_ff_3_monthly\", \n start_date = start_date, \n end_date = end_date\n)\n```\n:::\n\n\nThe `tidyfinance` package implements the processing steps as above and returns the same cleaned data frame. The list of supported Fama-French data types can be called as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_supported_types(domain = \"Fama-French\")\n```\n:::\n\n\n## q-Factors\n\nIn recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the @Hou2015 *q*-factor model. We refer to the [extended background](http://global-q.org/background.html) information provided by the original authors for further information. The *q* factors can be downloaded directly from the authors' homepage from within `read_csv()`.\\index{Data!q-factors}\\index{Factor!q-factors}\n\nWe also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we rename the columns with the \"R\\_\"-prescript using regular expressions and write all column names in lowercase. 
You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on *try*. You can check out style guides available online, e.g., [Hadley Wickham's `tidyverse` style guide.](https://style.tidyverse.org/index.html)\\index{Style guide}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_q_monthly_link <-\n \"https://global-q.org/uploads/1/2/2/6/122679606/q5_factors_monthly_2023.csv\"\n\nfactors_q_monthly <- read_csv(factors_q_monthly_link) |>\n mutate(date = ymd(str_c(year, month, \"01\", sep = \"-\"))) |>\n rename_with(~str_remove(., \"R_\")) |>\n rename_with(str_to_lower) |>\n mutate(across(-date, ~. / 100)) |>\n select(date, risk_free = f, mkt_excess = mkt, everything()) |>\n filter(date >= start_date & date <= end_date)\n```\n:::\n\n\nAgain, you can use the `tidyfinance` package for a shortcut:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"factors_q5_monthly\", \n start_date = start_date, \n end_date = end_date\n)\n```\n:::\n\n\n## Macroeconomic Predictors\n\nOur next data source is a set of macroeconomic variables often used as predictors for the equity premium. @Goyal2008 comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. The authors host the data updated to 2022 on [Amit Goyal's website.](https://sites.google.com/view/agoyal145) The data is an XLSX-file stored on a public Google drive location and we directly export a CSV file.\\index{Data!Macro predictors}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsheet_id <- \"1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG\"\nsheet_name <- \"Monthly\"\nmacro_predictors_url <- paste0(\n \"https://docs.google.com/spreadsheets/d/\", sheet_id,\n \"/gviz/tq?tqx=out:csv&sheet=\", sheet_name\n)\nmacro_predictors_raw <- read_csv(macro_predictors_url)\n```\n:::\n\n\nNext, we transform the columns into the variables that we later use:\n\n1. The dividend price ratio (`dp`), the difference between the log of dividends and the log of prices, where dividends are 12-month moving sums of dividends paid on the S&P 500 index, and prices are monthly averages of daily closing prices [@Campbell1988; @Campbell2006].\n2. Dividend yield (`dy`), the difference between the log of dividends and the log of lagged prices [@Ball1978].\n3. Earnings price ratio (`ep`), the difference between the log of earnings and the log of prices, where earnings are 12-month moving sums of earnings on the S&P 500 index [@Campbell1988].\n4. Dividend payout ratio (`de`), the difference between the log of dividends and the log of earnings [@Lamont1998].\n5. Stock variance (`svar`), the sum of squared daily returns on the S&P 500 index [@Guo2006].\n6. Book-to-market ratio (`bm`), the ratio of book value to market value for the Dow Jones Industrial Average [@Kothari1997].\n7. Net equity expansion (`ntis`), the ratio of 12-month moving sums of net issues by NYSE listed stocks divided by the total end-of-year market capitalization of NYSE stocks [@Campbell2008].\n8. Treasury bills (`tbl`), the 3-Month Treasury Bill: Secondary Market Rate from the economic research database at the Federal Reserve Bank at St. Louis [@Campbell1987].\n9. Long-term yield (`lty`), the long-term government bond yield from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n10. Long-term rate of returns (`ltr`), the long-term government bond returns from Ibbotson's Stocks, Bonds, Bills, and Inflation Yearbook [@Goyal2008].\n11. 
Term spread (`tms`), the difference between the long-term yield on government bonds and the Treasury bill [@Campbell1987].\n12. Default yield spread (`dfy`), the difference between BAA and AAA-rated corporate bond yields [@Fama1989].\n13. Inflation (`infl`), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics [@Campbell2004].\n\nFor variable definitions and the required data transformations, you can consult the material on [Amit Goyal's website](https://sites.google.com/view/agoyal145).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmacro_predictors <- macro_predictors_raw |>\n mutate(date = ym(yyyymm)) |>\n mutate(across(where(is.character), as.numeric)) |>\n mutate(\n IndexDiv = Index + D12,\n logret = log(IndexDiv) - log(lag(IndexDiv)),\n Rfree = log(Rfree + 1),\n rp_div = lead(logret - Rfree, 1), # Future excess market return\n dp = log(D12) - log(Index), # Dividend Price ratio\n dy = log(D12) - log(lag(Index)), # Dividend yield\n ep = log(E12) - log(Index), # Earnings price ratio\n de = log(D12) - log(E12), # Dividend payout ratio\n tms = lty - tbl, # Term spread\n dfy = BAA - AAA # Default yield spread\n ) |>\n select(\n date, rp_div, dp, dy, ep, de, svar,\n bm = `b/m`, ntis, tbl, lty, ltr,\n tms, dfy, infl\n ) |>\n filter(date >= start_date & date <= end_date) |>\n drop_na()\n```\n:::\n\n\nTo get the equivalent data through `tidyfinance`, you can call:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"macro_predictors_monthly\",\n start_date = start_date,\n end_date = end_date\n)\n```\n:::\n\n\n## Other Macroeconomic Data\n\nThe Federal Reserve bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. The data can be downloaded directly from FRED by constructing the appropriate URL. For instance, let us consider the consumer price index (CPI) data that can be found under the [CPIAUCNS](https://fred.stlouisfed.org/series/CPIAUCNS):\\index{Data!FRED}\\index{Data!CPI}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseries <- \"CPIAUCNS\"\ncpi_url <- paste0(\n \"https://fred.stlouisfed.org/graph/fredgraph.csv?id=\", series\n)\n```\n:::\n\n\nWe can then use the `httr2` [@httr2] package to request the CSV, extract the data from the response body, and convert the columns to a tidy format:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(httr2)\n\nresp <- request(cpi_url) |> \n req_perform()\nresp_csv <- resp |> \n resp_body_string() \n\ncpi_monthly <- resp_csv |> \n read_csv() |>\n mutate(\n date = as.Date(observation_date),\n value = as.numeric(.data[[series]]),\n series = series,\n .keep = \"none\"\n ) |>\n filter(date >= start_date & date <= end_date) |> \n mutate(\n cpi = value / value[date == max(date)]\n )\n```\n:::\n\n\nThe last line sets the current (latest) price level as the reference price level.\n\nThe `tidyfinance` package can, of course, also fetch the same index data and many more data series:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndownload_data(\n type = \"fred\",\n series = \"CPIAUCNS\",\n start_date = start_date,\n end_date = end_date\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 0 × 3\n# ℹ 3 variables: date , value , series \n```\n\n\n:::\n:::\n\n\nTo download other time series, we just have to look it up on the FRED website and extract the corresponding key from the address. 
For instance, the producer price index for gold ores can be found under the [PCU2122212122210](https://fred.stlouisfed.org/series/PCU2122212122210) key. If your desired time series is not supported through `tidyfinance`, we recommend working with the `fredr` package [@fredr]. Note that you need to get an API key to use its functionality. We refer to the package documentation for details.\n\n## Setting Up a Database\n\nNow that we have downloaded some (freely available) data from the web into the memory of our R session let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code.\n\nThere are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use an [SQLite](https://www.sqlite.org/index.html) database, which is the C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. Note that [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language) is a standard language for accessing and manipulating databases and heavily inspired the `dplyr` functions. We refer to [this tutorial](https://www.w3schools.com/sql/sql_intro.asp) for more information on SQL.\\index{Database!SQLite}\n\nThere are two packages that make working with SQLite in R very simple: `RSQLite` [@RSQLite] embeds the SQLite database engine in R, and `dbplyr` [@dbplyr] is the database back-end for `dplyr`. These packages allow to set up a database to remotely store tables and use these remote database tables as if they are in-memory data frames by automatically converting `dplyr` into SQL. Check out the [`RSQLite`](https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html) and [`dbplyr`](https://db.rstudio.com/databases/sqlite/) vignettes for more information.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(RSQLite)\nlibrary(dbplyr)\n```\n:::\n\n\nAn SQLite database is easily created - the code below is really all there is. You do not need any external software. Note that we use the `extended_types = TRUE` option to enable date types when storing and fetching data. Otherwise, date columns are stored and retrieved as integers.\\index{Database!Creation} We will use the file `tidy_finance_r.sqlite`, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (!dir.exists(\"data\")) {\n dir.create(\"data\")\n}\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n```\n:::\n\n\nNext, we create a remote table with the monthly Fama-French factor data. We do so with the function `dbWriteTable()`, which copies the data to our SQLite-database.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_monthly\",\n value = factors_ff3_monthly,\n overwrite = TRUE\n)\n```\n:::\n\n\nWe can use the remote table as an in-memory data frame by building a connection via `tbl()`.\\index{Database!Remote connection}\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_db <- tbl(tidy_finance, \"factors_ff3_monthly\")\n```\n:::\n\n\nAll `dplyr` calls are evaluated lazily, i.e., the data is not in our R session's memory, and the database does most of the work. 
You can see that by noticing that the output below does not show the number of rows. In fact, the following code chunk only fetches the top 10 rows from the database for printing.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_db |>\n select(date, rf)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# Source: SQL [?? x 2]\n# Database: sqlite 3.47.1 [data/tidy_finance_r.sqlite]\n date rf\n \n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ more rows\n```\n\n\n:::\n:::\n\n\nIf we want to have the whole table in memory, we need to `collect()` it. You will see that we regularly load the data into the memory in the next chapters.\\index{Database!Fetch}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactors_ff3_monthly_db |>\n select(date, rf) |>\n collect()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 780 × 2\n date rf\n \n1 1960-01-01 0.0033\n2 1960-02-01 0.0029\n3 1960-03-01 0.0035\n4 1960-04-01 0.0019\n5 1960-05-01 0.0027\n# ℹ 775 more rows\n```\n\n\n:::\n:::\n\n\nThe last couple of code chunks is really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages.\n\nBefore we move on to the next data source, let us also store the other five tables in our new SQLite database.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbWriteTable(\n tidy_finance,\n \"factors_ff5_monthly\",\n value = factors_ff5_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_ff3_daily\",\n value = factors_ff3_daily,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"industries_ff_monthly\",\n value = industries_ff_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"factors_q_monthly\",\n value = factors_q_monthly,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"macro_predictors\",\n value = macro_predictors,\n overwrite = TRUE\n)\n\ndbWriteTable(\n tidy_finance,\n \"cpi_monthly\",\n value = cpi_monthly,\n overwrite = TRUE\n)\n```\n:::\n\n\nFrom now on, all you need to do to access data that is stored in the database is to follow three steps: (i) Establish the connection to the SQLite database, (ii) call the table you want to extract, and (iii) collect the data. For your convenience, the following steps show all you need in a compact fashion.\\index{Database!Connection}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(RSQLite)\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nfactors_q_monthly <- tbl(tidy_finance, \"factors_q_monthly\")\nfactors_q_monthly <- factors_q_monthly |> collect()\n```\n:::\n\n\n## Managing SQLite Databases\n\nFinally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.\\index{Database!Management}\n\nTo optimize the database file, you can run the `VACUUM` command in the database, which rebuilds the database and frees up unused space. 
You can execute the command in the database using the `dbSendQuery()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nres <- dbSendQuery(tidy_finance, \"VACUUM\")\nres\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\n SQL VACUUM\n ROWS Fetched: 0 [complete]\n Changed: 0\n```\n\n\n:::\n:::\n\n\nThe `VACUUM` command actually performs a couple of additional cleaning steps, which you can read about in [this tutorial.](https://www.sqlitetutorial.net/sqlite-vacuum/) \\index{Database!Cleaning}\n\nWe store the result of the above query in `res` because the database keeps the result set open. To close open results and avoid warnings going forward, we can use `dbClearResult()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbClearResult(res)\n```\n:::\n\n\nApart from cleaning up, you might be interested in listing all the tables that are currently in your database. You can do this via the `dbListTables()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndbListTables(tidy_finance)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [1] \"beta\" \"compustat\" \n [3] \"cpi_monthly\" \"crsp_daily\" \n [5] \"crsp_monthly\" \"factors_ff3_daily\" \n [7] \"factors_ff3_monthly\" \"factors_ff5_monthly\" \n [9] \"factors_q_monthly\" \"fisd\" \n[11] \"industries_ff_monthly\" \"macro_predictors\" \n[13] \"trace_enhanced\" \n```\n\n\n:::\n:::\n\n\nThis function comes in handy if you are unsure about the correct naming of the tables in your database.\n\n## Key Takeaways\n\n- Importing Fama-French factors, q-factors, macroeconomic indicators, and CPI data is simplified through API calls, CSV parsing, and web scraping techniques.\n- The `tidyfinance` R package offers pre-processed access to financial datasets, reducing manual data cleaning and saving valuable time.\n- Creating a centralized SQLite database helps manage and organize data efficiently across projects, while maintaining reproducibility.\n- Structured database storage supports scalable data access, which is essential for long-term academic projects and collaborative work in finance.\n\n## Exercises\n\n1. Download the monthly Fama-French factors manually from [Ken French's data library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and read them in via `read_csv()`. Validate that you get the same data as via the `frenchdata` package.\n2. Download the daily Fama-French 5 factors using the `frenchdata` package. Use `get_french_data_list()` to find the corresponding table name. After the successful download and conversion to the column format that we used above, compare the `rf`, `mkt_excess`, `smb`, and `hml` columns of `factors_ff3_daily` to `factors_ff5_daily`. Discuss any differences you might find.\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/docs/accessing-and-managing-financial-data.html b/docs/accessing-and-managing-financial-data.html index 0aa4e119..2c36535a 100644 --- a/docs/accessing-and-managing-financial-data.html +++ b/docs/accessing-and-managing-financial-data.html @@ -6,6 +6,10 @@ var hash = window.location.hash.startsWith('#') ? 
window.location.hash.slice(1) : window.location.hash; var redirect = redirects[hash] || redirects[""] || "/"; window.document.title = 'Redirect to ' + redirect; + if (!redirects[hash]) { + redirect = redirect + window.location.hash; + } + redirect = redirect + window.location.search; window.location.replace(redirect); diff --git a/docs/python/accessing-and-managing-financial-data.html b/docs/python/accessing-and-managing-financial-data.html index 681f17c8..8e787a63 100644 --- a/docs/python/accessing-and-managing-financial-data.html +++ b/docs/python/accessing-and-managing-financial-data.html @@ -2,7 +2,7 @@ - + @@ -79,7 +79,7 @@ } - + @@ -92,14 +92,15 @@ + - + - + - + @@ -173,7 +174,7 @@ var macros = []; for (var i = 0; i < mathElements.length; i++) { var texText = mathElements[i].firstChild; - if (mathElements[i].tagName == "SPAN") { + if (mathElements[i].tagName == "SPAN" && texText && texText.data) { window.katex.render(texText.data, mathElements[i], { displayMode: mathElements[i].classList.contains('display'), throwOnError: false, @@ -204,7 +205,8 @@

Moreover, we initially define the date range for which we fetch and store the financial data, making future data updates tractable. In case you need another time frame, you can adjust the dates below. Our data starts with 1960 since most asset pricing studies use data from 1962 on.

-
-
start_date = "1960-01-01"
-end_date = "2024-12-31"
+
+
start_date = "1960-01-01"
+end_date = "2024-12-31"

Fama-French Data

-

We start by downloading some famous Fama-French factors (e.g., Fama and French 1993) and portfolio returns commonly used in empirical asset pricing. Fortunately, the pandas-datareader package provides a simple interface to read data from Kenneth French’s Data Library.

-
-
import pandas_datareader as pdr
+

We start by downloading some famous Fama-French factors (e.g., Fama and French 1993) and portfolio returns commonly used in empirical asset pricing. The data are freely available from Kenneth French’s Data Library, but the raw files come in a rather idiosyncratic format. If you access the data via the website, the manual workflow looks like this:

+
    +
  1. Go to the website
  2. Find the right dataset
  3. Download a ZIP file
  4. Extract the CSV inside
  5. Select the right data table from the file and import the table into Python
  6. Clean the dates, scale the returns, fix column names, handle missing values, etc.
+

Doing this once is fine; doing it repeatedly across projects is exactly the type of boilerplate that’s easy to mess up and annoying to maintain. It is therefore natural to automate these steps in Python.

+
+
+

From manual steps to a download script

+

A minimal download script mirrors the manual steps one by one. For example, to fetch a Fama–French dataset you first construct the URL:

+
+
+# Imports used throughout this download script
+import io
+import re
+import zipfile
+
+import requests
+
dataset = "F-F_Research_Data_Factors"
+base_url = "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
+url = f"{base_url}{dataset}_CSV.zip"
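Since the URL is built by plain string concatenation, you can sanity-check it immediately; for the three-factor set it resolves to:

print(url)
# http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip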
+
+

Next, you replace the browser download with an HTTP request and extract the ZIP in memory:

+
+
resp = requests.get(url)
+resp.raise_for_status()
+
+with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
+    file_name = zf.namelist()[0]  # Ken French ZIPs contain one file
+    raw_text = zf.read(file_name).decode("latin1")
+
+

The most important part of this chunk is the requests.get() call. This is the moment where we replace all the manual browser work (open the website, click download, save the file) with a single, reproducible line of code. Then, calling raise_for_status() ensures we stop immediately if the server returns an error (e.g., HTTP 404 or 500) instead of quietly processing a broken download. Once this succeeds, resp.content holds the raw bytes of the downloaded archive, which we can open in memory.
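If you prefer a friendlier failure mode, you can wrap the call and catch the exception explicitly; a minimal sketch, not part of the chapter's script:

import requests

try:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
except requests.HTTPError as err:
    raise SystemExit(f"Download failed for {url}: {err}")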

+

The raw file contains documentation text followed by the actual data table(s). To emulate scrolling down until the numbers start, you can split the file into blocks and keep the longest one, which contains the data table:

+
+
chunks = raw_text.split("\r\n\r\n")
+table_text = max(chunks, key=len)  # the data table is by far the longest block
+
+

Within this block, the first CSV header line starts at the first line beginning with a comma. We add a “Date” label for the index and pass everything to read_csv:

+
+
match = re.search(r"^\s*,", table_text, flags=re.M)
+start = match.start()
+csv_text = "Date" + table_text[start:]
+
+factors_ff_raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
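To see what the regular expression does, you can try it on a tiny illustrative snippet of such a file (the header and values below are made up):

import re

sample = "Some documentation text\n,Mkt-RF,SMB,HML,RF\n192607,2.96,-2.56,-2.43,0.22"
m = re.search(r"^\s*,", sample, flags=re.M)
sample[m.start():].splitlines()[0]  # ',Mkt-RF,SMB,HML,RF'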
-

We can use the pdr.DataReader() function of the package to download monthly Fama-French factors. The set Fama/French 3 Factors contains the return time series of the market (mkt_excess), size (smb), and value (hml) factors alongside the risk-free rates (rf). Note that we have to do some manual work to parse all the columns correctly and scale them appropriately, as the raw Fama-French data comes in a unique data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French’s finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks topandas_datareader.

-
-
factors_ff3_monthly_raw = pdr.DataReader(
-  name="F-F_Research_Data_Factors",
-  data_source="famafrench", 
-  start=start_date, 
-  end=end_date)[0]
-
-factors_ff3_monthly = (factors_ff3_monthly_raw
-  .divide(100)
-  .reset_index(names="date")
-  .assign(date=lambda x: pd.to_datetime(x["date"].astype(str)))
-  .rename(str.lower, axis="columns")
-  .rename(columns={"mkt-rf": "mkt_excess"})
-)
+

At this point, the index still consists of integer date codes with different lengths depending on the frequency. We need a bit of logic to convert them into a proper DatetimeIndex:

+
+
s = factors_ff_raw.index.astype(str)
+
+if (s.str.len() == 8).all():  # daily: YYYYMMDD
+    dt = pd.to_datetime(s, format="%Y%m%d")
+elif (s.str.len() == 6).all():  # monthly: YYYYMM
+    dt = pd.to_datetime(s + "01", format="%Y%m%d")
+elif (s.str.len() == 4).all():  # annual: YYYY
+    dt = pd.to_datetime(s + "0101", format="%Y%m%d")
+    dt = dt.to_period("A-DEC").to_timestamp(how="end")  # DatetimeIndex has no .dt accessor
+else:
+    raise ValueError("Unknown date format in Fama–French index.")
+
+factors_ff_raw = factors_ff_raw.set_index(dt)
+factors_ff_raw.index.name = "date"
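To convince yourself that, for example, the monthly branch behaves as intended, you can run it on a couple of hypothetical codes:

import pandas as pd

codes = pd.Index(["196001", "202412"])  # hypothetical YYYYMM values
pd.to_datetime(codes + "01", format="%Y%m%d")
# DatetimeIndex(['1960-01-01', '2024-12-01'], dtype='datetime64[ns]', freq=None)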
+
+

Finally, we still have to clean the data:

+
    +
  • Convert returns from percent to decimal.
  • Standardize column names (e.g., all lowercase, Mkt-RF to mkt_excess, RF to risk_free).
  • Replace special missing-value codes (-99.99, -999) with actual missing values.
  • Filter the data by a start and end date.
+

Put together, the cleaning step could look like this:

+
+
# Restrict the sample to the requested date range
+if start_date:
+    factors_ff_raw = factors_ff_raw[factors_ff_raw.index >= pd.to_datetime(start_date)]
+if end_date:
+    factors_ff_raw = factors_ff_raw[factors_ff_raw.index <= pd.to_datetime(end_date)]
+
+factors_ff3_monthly = (factors_ff_raw
+    # Replace the missing-value codes *before* scaling; after .div(100)
+    # they would no longer equal -99.99 or -999 and the replace would miss them
+    .replace({"-99.99": pd.NA, -99.99: pd.NA, -999: pd.NA})
+    .div(100)
+    .reset_index(names="date")
+    .rename(columns=str.lower)
+    .rename(columns={"mkt-rf": "mkt_excess", "rf": "risk_free"})
+)
+factors_ff3_monthly
+
+
           date  mkt_excess     smb     hml  risk_free
0    1960-01-01     -0.0698  0.0212  0.0265     0.0033
1    1960-02-01      0.0116  0.0060 -0.0197     0.0029
2    1960-03-01     -0.0163 -0.0055 -0.0275     0.0035
3    1960-04-01     -0.0171  0.0022 -0.0214     0.0019
4    1960-05-01      0.0312  0.0129 -0.0373     0.0027
..          ...         ...     ...     ...        ...
775  2024-08-01      0.0160 -0.0349 -0.0110     0.0048
776  2024-09-01      0.0172 -0.0013 -0.0277     0.0040
777  2024-10-01     -0.0100 -0.0099  0.0086     0.0039
778  2024-11-01      0.0649  0.0446  0.0015     0.0040
779  2024-12-01     -0.0317 -0.0271 -0.0300     0.0037

780 rows × 5 columns

All of these steps are doable, but none of them are really about finance - they are just the technical scaffolding required before you can work with the actual factor returns. That’s where a dedicated helper or package becomes invaluable. The tidyfinance package performs this entire workflow under the hood: you request a Fama–French dataset and receive a clean, consistently formatted data table from Kenneth French’s Data Library. This avoids repetitive boilerplate, reduces errors, and lets you focus on modeling and analysis rather than on data plumbing.

+
+
+

Using tidyfinance instead of reimplementing the plumbing

+
+
import tidyfinance as tf
+
+

For example, we can use the tf.download_data() function of the package to download monthly Fama-French factors. The set Fama/French 3 Factors contains the return time series of the market (mkt_excess), size (smb), and value (hml) factors alongside the risk-free rates (risk_free). Note that the tf.download_data() function parses all the columns correctly and already scales them appropriately, as the raw Fama-French data comes in a unique data format. For precise descriptions of the variables, we suggest consulting Prof. Kenneth French’s finance data library directly. If you are on the website, check the raw data files to appreciate the time you can save thanks to the tidyfinance package.

+
+
factors_ff3_monthly = tf.download_data(
+  domain="famafrench",
+  dataset="F-F_Research_Data_Factors",
+  start_date=start_date,
+  end_date=end_date,
+)

We also download the set 5 Factors (2x3), which additionally includes the return time series of the profitability (rmw) and investment (cma) factors. We demonstrate how the monthly factors are constructed in Replicating Fama and French Factors.

-
-
factors_ff5_monthly_raw = pdr.DataReader(
-  name="F-F_Research_Data_5_Factors_2x3",
-  data_source="famafrench", 
-  start=start_date, 
-  end=end_date)[0]
-
-factors_ff5_monthly = (factors_ff5_monthly_raw
-  .divide(100)
-  .reset_index(names="date")
-  .assign(date=lambda x: pd.to_datetime(x["date"].astype(str)))
-  .rename(str.lower, axis="columns")
-  .rename(columns={"mkt-rf": "mkt_excess"})
-)
+
+
factors_ff5_monthly = tf.download_data(
+  domain="famafrench",
+  dataset="F-F_Research_Data_5_Factors_2x3",
+  start_date=start_date,
+  end_date=end_date,
+)

It is straightforward to download the corresponding daily Fama-French factors with the same function.

-
-
factors_ff3_daily_raw = pdr.DataReader(
-  name="F-F_Research_Data_Factors_daily",
-  data_source="famafrench", 
-  start=start_date, 
-  end=end_date)[0]
-
-factors_ff3_daily = (factors_ff3_daily_raw
-  .divide(100)
-  .reset_index(names="date")
-  .rename(str.lower, axis="columns")
-  .rename(columns={"mkt-rf": "mkt_excess"})
-)
+
+
factors_ff3_daily = tf.download_data(
+  domain="famafrench",
+  dataset="F-F_Research_Data_Factors_daily",
+  start_date=start_date,
+  end_date=end_date,
+)

In a subsequent chapter, we also use the monthly returns from ten industry portfolios, so let us fetch that data, too.

-
-
industries_ff_monthly_raw = pdr.DataReader(
-  name="10_Industry_Portfolios",
-  data_source="famafrench", 
-  start=start_date, 
-  end=end_date)[0]
-
-industries_ff_monthly = (industries_ff_monthly_raw
-  .divide(100)
-  .reset_index(names="date")
-  .assign(date=lambda x: pd.to_datetime(x["date"].astype(str)))
-  .rename(str.lower, axis="columns")
-)
-
-

It is worth taking a look at all available portfolio return time series from Kenneth French’s homepage. You should check out the other sets by calling pdr.famafrench.get_available_datasets().

-

To automatically download and process Fama-French data, you can also use the tidyfinance package with domain="factors_ff" and the corresponding dataset, e.g.:

-
-
tf.download_data(
-  domain="factors_ff",
-  dataset="F-F_Research_Data_Factors", 
-  start_date=start_date, 
-  end_date=end_date
-)
+
+
industries_ff_monthly = tf.download_data(
+  domain="famafrench",
+  dataset="10_Industry_Portfolios",
+  start_date=start_date,
+  end_date=end_date,
+)
-

The tidyfinance package implements the processing steps as above and returns the same cleaned data frame.

-
+

It is worth taking a look at all available portfolio return time series from Kenneth French’s homepage. You should check out the other sets by calling tf.get_available_famafrench_datasets().
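For example, you could skim the returned names for further portfolio sets (a sketch, assuming the function returns a plain list of dataset names):

datasets = tf.get_available_famafrench_datasets()
[name for name in datasets if "Portfolios" in name][:5]  # peek at a few portfolio sets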

q-Factors

In recent years, the academic discourse experienced the rise of alternative factor models, e.g., in the form of the Hou, Xue, and Zhang (2014) q-factor model. We refer to the extended background information provided by the original authors for further information. The q-factors can be downloaded directly from the authors’ homepage from within pd.read_csv().

We also need to adjust this data. First, we discard information we will not use in the remainder of the book. Then, we rename the columns with the "R_"-prefix using regular expressions and write all column names in lowercase. We then query the data to select observations between the start and end dates. Finally, we use the double asterisk (**) notation in the assign function to apply the same transform of dividing by 100 to all four factors by iterating through them. You should always try sticking to a consistent style for naming objects, which we try to illustrate here - the emphasis is on try. You can check out style guides available online, e.g., Hadley Wickham’s tidyverse style guide. Note that we temporarily adjust the SSL certificate handling behavior in Python’s ssl module when retrieving the q-factors directly from the web, as demonstrated in Working with Stock Returns. This method should be used with caution, which is why we restore the default settings immediately after successfully downloading the data.

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

factors_q_monthly_link = (
  "https://global-q.org/uploads/1/2/2/6/122679606/"
  "q5_factors_monthly_2024.csv"
)

factors_q_monthly = (pd.read_csv(factors_q_monthly_link)
  .assign(
    date=lambda x: (
      pd.to_datetime(x["year"].astype(str) + "-" +
        x["month"].astype(str) + "-01"))
  )
  .drop(columns=["R_F", "R_MKT", "year"])
  .rename(columns=lambda x: x.replace("R_", "").lower())
  .query(f"date >= '{start_date}' and date <= '{end_date}'")
  .assign(
    # Bind col as a default argument; a plain closure would late-bind
    # and divide every column by the last loop value instead.
    **{col: (lambda x, col=col: x[col]/100)
       for col in ["me", "ia", "roe", "eg"]}
  )
)

ssl._create_default_https_context = ssl.create_default_context
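
A quick aside on why the comprehension above binds col as a default argument: Python closures capture variables, not values, so all four lambdas would otherwise share the final loop value once assign() eventually calls them. A self-contained illustration:

funcs = {col: (lambda x: x[col]) for col in ["a", "b"]}
funcs["a"]({"a": 1, "b": 2})  # returns 2: both lambdas see col == "b"

funcs_fixed = {col: (lambda x, col=col: x[col]) for col in ["a", "b"]}
funcs_fixed["a"]({"a": 1, "b": 2})  # returns 1, as intended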

Again, you can use the tidyfinance package for a shortcut:

tf.download_data(
  domain="factors_q",
  dataset="q5_factors_monthly",
  start_date=start_date,
  end_date=end_date
)

Macroeconomic Predictors

Our next data source is a set of macroeconomic variables that are often used as predictors for the equity premium. Welch and Goyal (2008) comprehensively reexamine the performance of variables suggested by the academic literature to be good predictors of the equity premium. The authors host the data on Amit Goyal's website. Since the data is an XLSX file stored in a public Google Drive location, we need a small workaround to access the data directly from our Python session. Usually, you need to authenticate if you interact with Google Drive directly in Python. Since the data is stored via a public link, we can proceed without any authentication.

sheet_id = "1bM7vCWd3WOt95Sf9qjLPZjoiafgF_8EG"
sheet_name = "macro_predictors.xlsx"
macro_predictors_link = (
  f"https://docs.google.com/spreadsheets/d/{sheet_id}"
  f"/gviz/tq?tqx=out:csv&sheet={sheet_name}"
)
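
The same export trick works for any publicly shared Google Sheet. A hypothetical helper (the function name is ours, purely for illustration) that encapsulates the URL scheme used above:

def gsheet_csv_url(sheet_id, sheet_name):
    """Return a CSV export link for a publicly shared Google Sheet."""
    return (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
            f"/gviz/tq?tqx=out:csv&sheet={sheet_name}")

macro_predictors_link = gsheet_csv_url(sheet_id, sheet_name)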

Next, we read in the new data and transform the columns into the variables that we later use:

Among the variables is, for example, inflation (infl), the Consumer Price Index (All Urban Consumers) from the Bureau of Labor Statistics (Campbell and Vuolteenaho 2004).

For variable definitions and the required data transformations, you can consult the material on Amit Goyal’s website.
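
In formulas, the valuation ratios constructed in the code below read as follows, where Index denotes the S&P 500 index level and D12 and E12 are twelve-month moving sums of dividends and earnings (our notation, following the raw column names):

$$dp_t = \log(D12_t) - \log(Index_t), \quad dy_t = \log(D12_t) - \log(Index_{t-1}),$$
$$ep_t = \log(E12_t) - \log(Index_t), \quad de_t = \log(D12_t) - \log(E12_t),$$

while the term spread and default yield spread are simple differences, $tms_t = lty_t - tbl_t$ and $dfy_t = BAA_t - AAA_t$.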

ssl._create_default_https_context = ssl._create_unverified_context

macro_predictors = (
  pd.read_csv(macro_predictors_link, thousands=",")
  .assign(
    date=lambda x: pd.to_datetime(x["yyyymm"], format="%Y%m"),
    dp=lambda x: np.log(x["D12"])-np.log(x["Index"]),
    dy=lambda x: np.log(x["D12"])-np.log(x["Index"].shift(1)),
    ep=lambda x: np.log(x["E12"])-np.log(x["Index"]),
    de=lambda x: np.log(x["D12"])-np.log(x["E12"]),
    tms=lambda x: x["lty"]-x["tbl"],
    dfy=lambda x: x["BAA"]-x["AAA"]
  )
  .rename(columns={"b/m": "bm"})
  .get(["date", "dp", "dy", "ep", "de", "svar", "bm",
        "ntis", "tbl", "lty", "ltr", "tms", "dfy", "infl"])
  .query("date >= @start_date and date <= @end_date")
  .dropna()
)

ssl._create_default_https_context = ssl.create_default_context

To get the equivalent data through tidyfinance, you can call:

tf.download_data(
  domain="macro_predictors",
  dataset="monthly",
  start_date=start_date,
  end_date=end_date
)

Other Macroeconomic Data

The Federal Reserve Bank of St. Louis provides the Federal Reserve Economic Data (FRED), an extensive database for macroeconomic data. In total, there are 817,000 US and international time series from 108 different sources. As an illustration, we fetch consumer price index (CPI) data, which can be found under the CPIAUCNS key, and construct the corresponding download link:

series = "CPIAUCNS"
url = f"https://fred.stlouisfed.org/graph/fredgraph.csv?id={series}"

We can then use the requests module to request the CSV, extract the data from the response body, and convert the columns to a tidy format:

import requests

resp = requests.get(url)
resp_csv = pd.io.common.StringIO(resp.text)

cpi_monthly = (pd.read_csv(resp_csv)
  .assign(
    date=lambda x: pd.to_datetime(x["observation_date"]),
    value=lambda x: pd.to_numeric(x[series], errors="coerce"),
    series=series,
  )
  .get(["date", "series", "value"])
  .query("date >= @start_date & date <= @end_date")
  .assign(cpi=lambda x: x["value"] / x["value"].iloc[-1])
)

The last line sets the current (latest) price level as the reference price level, so the cpi column measures each month's price level relative to December 2024. To download other time series, we just have to look them up on the FRED website and extract the corresponding key from the address. For instance, the producer price index for gold ores can be found under the PCU2122212122210 key.
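
Following the URL pattern from above, a small helper (hypothetical; the function name is ours) makes swapping series keys explicit:

def fred_csv_url(series):
    """Return the FRED CSV download link for a given series key."""
    return f"https://fred.stlouisfed.org/graph/fredgraph.csv?id={series}"

gold_ppi_url = fred_csv_url("PCU2122212122210")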


The tidyfinance package can, of course, also fetch the same index data and many more data series:

tf.download_data(
  domain="fred",
  series="CPIAUCNS",
  start_date=start_date,
  end_date=end_date
)

          date    series    value
0   1960-01-01  CPIAUCNS   29.300
1   1960-02-01  CPIAUCNS   29.400
2   1960-03-01  CPIAUCNS   29.400
3   1960-04-01  CPIAUCNS   29.500
4   1960-05-01  CPIAUCNS   29.500
..         ...       ...      ...
775 2024-08-01  CPIAUCNS  314.796
776 2024-09-01  CPIAUCNS  315.301
777 2024-10-01  CPIAUCNS  315.664
778 2024-11-01  CPIAUCNS  315.493
779 2024-12-01  CPIAUCNS  315.605

780 rows × 3 columns


Setting Up a Database

Now that we have downloaded some (freely available) data from the web into the memory of our Python session, let us set up a database to store that information for future use. We will use the data stored in this database throughout the following chapters, but you could alternatively implement a different strategy and replace the respective code.

There are many ways to set up and organize a database, depending on the use case. For our purpose, the most efficient way is to use SQLite, a C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine. Note that SQL (Structured Query Language) is a standard language for accessing and manipulating databases.

import sqlite3

An SQLite database is easily created; the code below is really all there is, and you do not need any external software. Note that SQLite has no dedicated date type, so date columns are stored as plain text or integers, which is why we parse them explicitly whenever we read tables back in. We will use the file tidy_finance_python.sqlite, located in the data subfolder, to retrieve data for all subsequent chapters. The initial part of the code ensures that the directory is created if it does not already exist.

import os

if not os.path.exists("data"):
  os.makedirs("data")

tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

Next, we create a database table with the monthly Fama-French factor data. We do so with the pandas method to_sql(), which copies the data to our SQLite database.

(factors_ff3_monthly
  .to_sql(name="factors_ff3_monthly",
          con=tidy_finance,
          if_exists="replace",
          index=False)
)

Now, if we want to have the whole table in memory, we need to call pd.read_sql_query() with the corresponding query. You will see that we regularly load data into memory in the next chapters.

pd.read_sql_query(
  sql="SELECT date, risk_free FROM factors_ff3_monthly",
  con=tidy_finance,
  parse_dates={"date"}
)

          date  risk_free
0   1960-01-01     0.0033
1   1960-02-01     0.0029
2   1960-03-01     0.0035
3   1960-04-01     0.0019
4   1960-05-01     0.0027
..         ...        ...
775 2024-08-01     0.0048
776 2024-09-01     0.0040
777 2024-10-01     0.0039
778 2024-11-01     0.0040
779 2024-12-01     0.0037

780 rows × 2 columns

The last couple of code chunks are really all there is to organizing a simple database! You can also share the SQLite database across devices and programming languages.
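
If you share the database file, you might want to guard against accidental writes. A minimal sketch that opens the file read-only via SQLite's URI syntax, which Python's sqlite3 module supports:

import sqlite3

tidy_finance_ro = sqlite3.connect(
  "file:data/tidy_finance_python.sqlite?mode=ro", uri=True
)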

Before we move on to the next data source, let us also store the other six tables in our new SQLite database.

data_dict = {
  "factors_ff5_monthly": factors_ff5_monthly,
  "factors_ff3_daily": factors_ff3_daily,
  "industries_ff_monthly": industries_ff_monthly,
  "factors_q_monthly": factors_q_monthly,
  "macro_predictors": macro_predictors,
  "cpi_monthly": cpi_monthly
}

for key, value in data_dict.items():
    value.to_sql(name=key,
                 con=tidy_finance,
                 if_exists="replace",
                 index=False)

From now on, all you need to do to access data that is stored in the database is to follow two steps: (i) establish the connection to the SQLite database, and (ii) execute the query to fetch the data. For your convenience, the following steps show all you need in a compact fashion.

import pandas as pd
import sqlite3

tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

factors_q_monthly = pd.read_sql_query(
  sql="SELECT * FROM factors_q_monthly",
  con=tidy_finance,
  parse_dates={"date"}
)
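
If you ever forget which tables the database contains, you can query SQLite's built-in catalog (a quick sketch):

pd.read_sql_query(
  sql="SELECT name FROM sqlite_master WHERE type = 'table'",
  con=tidy_finance
)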

Managing SQLite Databases

Finally, at the end of our data chapter, we revisit the SQLite database itself. When you drop database objects such as tables or delete data from tables, the database file size remains unchanged because SQLite just marks the deleted objects as free and reserves their space for future uses. As a result, the database file always grows in size.

To optimize the database file, you can run the VACUUM command, which rebuilds the database and frees up unused space. You can execute this command via the connection's execute() function.

tidy_finance.execute("VACUUM")

The VACUUM command actually performs a couple of additional cleaning steps, which you can read about in this tutorial.
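
To see the effect on disk, you can compare the file size before and after vacuuming (a sketch; os.path.getsize() reports the size in bytes):

import os

size_before = os.path.getsize("data/tidy_finance_python.sqlite")
tidy_finance.execute("VACUUM")
size_after = os.path.getsize("data/tidy_finance_python.sqlite")
print(f"{size_before:,} -> {size_after:,} bytes")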


Exercises

  1. Download the monthly Fama-French factors manually from Kenneth French’s data library and read them in via pd.read_csv(). Validate that you get the same data as via the tf.download_data() function.
  2. Download the daily Fama-French 5 factors using the tf.download_data() function. After the successful download and conversion to the column format that we used above, compare the risk_free, mkt_excess, smb, and hml columns of factors_ff3_daily to factors_ff5_daily. Discuss any differences you might find.

References
