diff --git a/biblio.bib b/biblio.bib
new file mode 100644
index 0000000..e22441f
--- /dev/null
+++ b/biblio.bib
@@ -0,0 +1,7 @@
+@article{Culkin_Das_2017,
+ title={Machine Learning in Finance: The Case of Deep Learning for Option Pricing},
+ url={https://srdas.github.io/Papers/BlackScholesNN.pdf},
+ author={Culkin, Robert and Das, Sanjiv},
+ year={2017},
+ month={August}
+ }
\ No newline at end of file
diff --git a/docs/BS_Model_draft/BS_Model_draft.md b/docs/BS_Model_draft/BS_Model_draft.md
index f0a52a5..243b872 100644
--- a/docs/BS_Model_draft/BS_Model_draft.md
+++ b/docs/BS_Model_draft/BS_Model_draft.md
@@ -12,13 +12,13 @@ The **Black-Scholes formula for European call and put options** are:
$$C(S_0,t)=S_0N(d_1)-Ke^{-r(T-t)}N(d_2)$$
$$P(S_0,t)=Ke^{-r(T-t)}N(-d_2)-S_0N(-d_1)$$
-where \
-- $S_0$: Stock Price \
-- $C(S_0,t)$: Price of the Call Option \
-- $K$: Exercise Price \
-- $(T-t)$: Time to Maturity, where T is Exercise Date \
-- $\sigma$: Underlying Volatility (a standard deviation of log returns) \
-- $r$: Risk-free Interest Rate (i.e., T-bill Rate) \
+where
+- $S_0$: Stock Price
+- $C(S_0,t)$: Price of the Call Option
+- $K$: Exercise Price
+- $(T-t)$: Time to Maturity, where $T$ is the Exercise Date
+- $\sigma$: Underlying Volatility (the standard deviation of log returns)
+- $r$: Risk-free Interest Rate (i.e., T-bill Rate)
The $d_i$ variables are defined as:
$$d_1=\frac{\ln\frac{S_0}{K}+(r+\frac{\sigma^2}{2})(T-t)}{\sigma\sqrt{T-t}}$$
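Read directly off the formulas above, a minimal Python sketch of the two pricing equations (using `scipy.stats.norm.cdf` for $N(x)$; the helper name and the sanity-check numbers are our own, and the standard identity $d_2 = d_1 - \sigma\sqrt{T-t}$ is used for $d_2$):

```python
import numpy as np
from scipy.stats import norm

def black_scholes(S0, K, tau, r, sigma):
    """European call/put prices from the Black-Scholes formulas above;
    tau is the time to maturity (T - t) in years."""
    d1 = (np.log(S0 / K) + (r + sigma**2 / 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)  # standard identity for d_2
    call = S0 * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)
    put = K * np.exp(-r * tau) * norm.cdf(-d2) - S0 * norm.cdf(-d1)
    return call, put

# Sanity check: an at-the-money, one-year call at 20% vol and 1% rates.
print(black_scholes(100, 100, 1.0, 0.01, 0.20))
```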
@@ -29,10 +29,10 @@ Finally, $N(x)$ is cumulative distribution function for the standard normal dist
### Project Objectives
-In this project, we aim to do the following:\
-1. Recreate Culkin and Das' work\
-2. See whether fitted simulated model performs well on actual data \
-3. Observe if the model can perform better based on different datasets
+In this project, we aim to do the following:
+1. Recreate Culkin and Das' work
+2. See whether the model fitted on simulated data performs well on actual data
+3. Observe whether the model performs better when trained on different datasets
## Methodologies
### Data
@@ -47,13 +47,13 @@ To train a neural network to learn the call option pricing equation, Culkin and
| Parameter | Range |
|:-----------------------|:------------------|
-| Stock Price $(S)$ | $10 — $50 |
-| Strike Price $(K)$ | $7 — $650 |
+| Stock Price $(S)$      | \$10 — \$500      |
+| Strike Price $(K)$     | \$7 — \$650       |
| Maturity $(T-t)$ | 1 day to 3 years |
| Dividend Rate $(q)$ | 0\% — 3\% |
| Risk Free Rate $(r)$ | 1\% — 3\% |
| Volatility $(\sigma)$ | 5\% — 90\% |
-| Call Price $(C)$ | $0 — $328 |
+| Call Price $(C)$       | \$0 — \$328       |
In total, the dataset contains 300,000 observations.
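The excerpt does not spell out the sampling scheme, but a plausible sketch is to draw each parameter uniformly from its range and price the calls with the `black_scholes` helper sketched earlier (the uniform sampling and the column names are our assumptions). A continuous dividend yield $q$ folds into the no-dividend formula by discounting the stock price by $e^{-q(T-t)}$:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(32)
n = 300_000

S     = rng.uniform(10, 500, n)      # stock price
K     = rng.uniform(7, 650, n)       # strike price
tau   = rng.uniform(1 / 365, 3, n)   # maturity: 1 day to 3 years
q     = rng.uniform(0.00, 0.03, n)   # dividend rate
r     = rng.uniform(0.01, 0.03, n)   # risk-free rate
sigma = rng.uniform(0.05, 0.90, n)   # volatility

# Discounting S by exp(-q * tau) is equivalent to the
# dividend-adjusted Black-Scholes formula.
call, _ = black_scholes(S * np.exp(-q * tau), K, tau, r, sigma)
sim = pd.DataFrame({'Stock Price': S, 'Strike Price': K, 'Maturity': tau,
                    'Dividends': q, 'Risk Free Rate': r, 'Volatility': sigma,
                    'Call Price': call})
```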
@@ -359,20 +359,14 @@ np.random.seed(32)
mlp.fit(X_train, y_train)
```
- Iteration 1, loss = 0.00035534
- Iteration 2, loss = 0.00009519
- Iteration 3, loss = 0.00006493
- Iteration 4, loss = 0.00004772
- Iteration 5, loss = 0.00003906
- Iteration 6, loss = 0.00003374
- Iteration 7, loss = 0.00002936
- Iteration 8, loss = 0.00002706
- Iteration 9, loss = 0.00002523
- Iteration 10, loss = 0.00002390
+ Iteration 1, loss = 0.00035527
+ Iteration 2, loss = 0.00009529
+ Iteration 3, loss = 0.00006484
+ Iteration 4, loss = 0.00004754
- /opt/conda/envs/rapids/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py:585: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (10) reached and the optimization hasn't converged yet.
- % self.max_iter, ConvergenceWarning)
+ D:\Programs\anaconda3\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:587: UserWarning: Training interrupted by user.
+ warnings.warn("Training interrupted by user.")
@@ -383,7 +377,7 @@ mlp.fit(X_train, y_train)
-Since it is important to save model for reproducibility, we will be saving the model in every phase:
+Since it is important to save the model for reproducibility, we will save it at every phase:
```python
@@ -402,6 +396,10 @@ filename = 'models/BS_model.sav'
mlp = pickle.load(open(filename, 'rb'))
```
+ D:\Programs\anaconda3\lib\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator MLPRegressor from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
+ warnings.warn(
+
+
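The warning above comes from unpickling an estimator saved under a different scikit-learn version (0.23.1 vs. 0.23.2). A lightweight safeguard, our own addition rather than part of the original notebook, is to check the running version before loading:

```python
import pickle
import sklearn

# The saved estimator was trained under scikit-learn 0.23.1 (see warning above).
TRAINED_WITH = "0.23.1"
if sklearn.__version__ != TRAINED_WITH:
    print(f"Warning: model trained with scikit-learn {TRAINED_WITH}, "
          f"running {sklearn.__version__}; results may differ.")

mlp = pickle.load(open('models/BS_model.sav', 'rb'))
```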
```python
print("Training set score: %f" % mlp.score(X_train, y_train))
@@ -427,9 +425,7 @@ plt.show()
```
-

-
We can also explore the distribution of both the in-sample and out-of-sample error:
@@ -446,9 +442,7 @@ plt.show()
```
-

-
@@ -462,9 +456,7 @@ plt.show()
```
-

-
@@ -487,14 +479,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
-Descriptive Statistics of Pricing Error in Training Set - Simulated nobs | minmax | mean | variance | skewness | kurtosis |
- 240000 |
- (-0.04503973715887655, 0.038556769019683426) |
- -0.003111 |
- 0.000007 |
- -0.400349 |
- 12.083641 |
+Descriptive Statistics of Pricing Error in Training Set - Simulated
+
+| nobs   | minmax                                        | mean      | variance | skewness  | kurtosis  |
+|-------:|:----------------------------------------------|----------:|---------:|----------:|----------:|
+| 240000 | (-0.04503973715887655, 0.038556769019683426)  | -0.003111 | 0.000007 | -0.400349 | 12.083641 |
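The columns of this table match the fields returned by `scipy.stats.describe`, so the figures were presumably produced along these lines (a sketch; the error sign convention and variable names are our assumptions):

```python
import pandas as pd
from scipy import stats

# Pricing error on the training set: predicted minus actual call price.
err_train = mlp.predict(X_train) - y_train

# DescribeResult is a namedtuple, so _asdict() yields the table's columns:
# nobs, minmax, mean, variance, skewness, kurtosis.
a = pd.DataFrame([stats.describe(err_train)._asdict()])
a.style.hide_index().set_caption(
    "Descriptive Statistics of Pricing Error in Training Set - Simulated")
```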
@@ -510,14 +502,14 @@ b.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
-Descriptive Statistics of Pricing Error in Test Set - Simulated nobs | minmax | mean | variance | skewness | kurtosis |
- 11841 |
- (-1.406634871550537, 204.1845735730094) |
- 0.091134 |
- 10.956834 |
- 54.035579 |
- 3074.578127 |
+Descriptive Statistics of Pricing Error in Test Set - Simulated
+
+| nobs  | minmax                                        | mean      | variance | skewness  | kurtosis  |
+|------:|:----------------------------------------------|----------:|---------:|----------:|----------:|
+| 60000 | (-0.03991259932903224, 0.031677805872430874)  | -0.003110 | 0.000007 | -0.529404 | 12.293611 |
@@ -655,9 +647,7 @@ plt.show()
```
-

-
From a quick glance, there seem to be some minor deviations. Let's look at the $R^2$ for this regression:
@@ -682,9 +672,7 @@ plt.show()
```
-

-
While the model performed worse relative to the previous sample, it still achieved a high R-squared value, considering that the training and test data came from different sources. The graph above is summarized below:
@@ -699,14 +687,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
-Descriptive Statistics: Simulation Model on UKX nobs | minmax | mean | variance | skewness | kurtosis |
- 1685 |
- (-3088.616090144955, 0.35170913484653754) |
- -28.827895 |
- 44608.131476 |
- -10.456920 |
- 123.587743 |
+Descriptive Statistics: Simulation Model on UKX
+
+| nobs | minmax                                     | mean       | variance     | skewness   | kurtosis   |
+|-----:|:-------------------------------------------|-----------:|-------------:|-----------:|-----------:|
+| 1685 | (-3088.616090144955, 0.35170913484653743)  | -28.827895 | 44608.131476 | -10.456920 | 123.587743 |
@@ -721,7 +709,10 @@ To ameliorate the effect of having less data, we increased the number of epochs
np.random.seed(32)
X_train_ukx, X_test_ukx, y_train_ukx, y_test_ukx = train_test_split(ukx.drop('Call Price', axis=1),
ukx['Call Price'], test_size=0.2)
+```
+
+```python
mlp_u = MLPRegressor(hidden_layer_sizes=(100,100,100,100),
solver='adam', shuffle=False, batch_size=64, verbose=True,
max_iter=20
@@ -755,6 +746,10 @@ filename = 'models/BS_ukx_model.sav'
mlp_u = pickle.load(open(filename, 'rb'))
```
+ D:\Programs\anaconda3\lib\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator MLPRegressor from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
+ warnings.warn(
+
+
```python
print("Training Set Score: %f" % mlp_u.score(X_train_ukx, y_train_ukx))
@@ -776,9 +771,7 @@ plt.show()
```
-
-
-
+
@@ -791,14 +784,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
-Descriptive Statistics: UKX Model nobs | minmax | mean | variance | skewness | kurtosis |
- 337 |
- (-0.31623338202718, 9.835618886098018) |
- 0.163965 |
- 0.804381 |
- 7.900739 |
- 69.225675 |
+Descriptive Statistics: UKX Model
+
+| nobs | minmax                                  | mean     | variance | skewness | kurtosis  |
+|-----:|:----------------------------------------|---------:|---------:|---------:|----------:|
+| 337  | (-0.31623338202718, 9.835618886098018)  | 0.163965 | 0.804381 | 7.900739 | 69.225675 |
@@ -927,12 +920,10 @@ plt.show()
```
-
-
-
+
-From the above, we can see that there seems to be more deviations than predictions made on previous data. To assess a relationship between the In-The-Money (ITM) call options and the Out-of-The-Money (OTM) call options, we plotted a new graph that is focused on a bottom-left cluster. In usual cases, OTM call options would have higher prices than the predicted points.
+From the above, we can see that there appear to be more deviations than in the predictions made on the previous data. To assess the relationship between In-The-Money (ITM) and Out-of-The-Money (OTM) call options, we plotted a new graph focused on the bottom-left cluster. In the current environment, Black-Scholes tends to misprice calls that are deeply ITM or deeply OTM; for more on this phenomenon, see the [implied volatility smile](https://www.investopedia.com/terms/v/volatilitysmile.asp).
```python
@@ -940,21 +931,19 @@ X_snp_itm = snp[snp['Strike Price'] < snp['Stock Price']]
X_snp_otm = snp[snp['Strike Price'] >= snp['Stock Price']]
Y_snp_itm = X_snp_itm['Call Price']
Y_snp_otm = X_snp_otm['Call Price']
-plt.scatter(Y_snp_otm, mlp.predict(X_snp_otm.drop('Call Price', axis=1)), c='r', s=2)
plt.scatter(Y_snp_itm, mlp.predict(X_snp_itm.drop('Call Price', axis=1)), s=2)
+plt.scatter(Y_snp_otm, mlp.predict(X_snp_otm.drop('Call Price', axis=1)), c='r', s=2)
plt.ylabel("Predicted Price")
plt.xlabel("Actual Price")
plt.title("Actual vs Predicted Price")
-plt.xlim(0, 5)
-plt.ylim(0, 5)
-plt.legend(['OTM', 'ITM'])
+plt.xlim(0, 2)
+plt.ylim(0, 2)
+plt.legend(['ITM', 'OTM'])
plt.show()
```
-
-
-
+
The graph exhibits the behavior discussed above. Furthermore, while the variance is relatively higher, the model seems to find some success. In fact, the $R^2$ value is:
@@ -981,9 +970,7 @@ plt.show()
```
-
-
-
+
@@ -1022,7 +1009,10 @@ np.random.seed(32)
df2 = pd.concat([ukx, snp])
X_train2, X_test2, y_train2, y_test2 = train_test_split(df2.drop('Call Price', axis=1),
df2['Call Price'], test_size=0.2)
+```
+
+```python
mlp2 = MLPRegressor(hidden_layer_sizes=(100,100,100,100),
solver='adam', shuffle=False, batch_size=64, verbose=False,
max_iter=20
@@ -1034,14 +1024,23 @@ filename = 'models/BS_final_model.sav'
pickle.dump(mlp2, open(filename, 'wb'))
```
+ D:\Programs\anaconda3\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:582: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (20) reached and the optimization hasn't converged yet.
+ warnings.warn(
+
+
+
+```python
+mlp2 = pickle.load(open(filename, 'rb'))
+```
+
```python
print("Training set score: %f" % mlp2.score(X_train2, y_train2))
print("Test set score: %f" % mlp2.score(X_test2, y_test2))
```
- Training set score: 0.999435
- Test set score: 0.999439
+ Training set score: 0.999941
+ Test set score: 0.999943
Surprisingly, we observed from the above that there was an insignificant difference in performance between the two models. We then analyzed the pricing error of the real-data model:
@@ -1057,9 +1056,7 @@ plt.show()
```
-
-
-
+
@@ -1072,14 +1069,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
-Descriptive Statistics: Real Data Error nobs | minmax | mean | variance | skewness | kurtosis |
- 11841 |
- (-1.406634871550537, 204.1845735730094) |
- 0.091134 |
- 10.956834 |
- 54.035579 |
- 3074.578127 |
+Descriptive Statistics: Real Data Error
+
+| nobs  | minmax                                    | mean     | variance | skewness  | kurtosis    |
+|------:|:------------------------------------------|---------:|---------:|----------:|------------:|
+| 11841 | (-1.4149300780995988, 65.29475394211659)  | 0.041074 | 1.112086 | 53.563095 | 3044.092604 |
@@ -1095,7 +1092,7 @@ preds_synthetic= mlp2.predict(X_test)
print("R-Squared Value: %.4f" % r2_score(y_test, preds_synthetic))
```
- R-Squared Value: 0.8296
+ R-Squared Value: 0.7786
We observed that the model performed slightly worse, a result similar to what we had seen in the predictions above.
@@ -1110,9 +1107,7 @@ plt.show()
```
-
-
-
+
@@ -1121,19 +1116,38 @@ X_df2_itm = df2[df2['Strike Price'] < df2['Stock Price']]
X_df2_otm = df2[df2['Strike Price'] >= df2['Stock Price']]
Y_df2_itm = X_df2_itm['Call Price']
Y_df2_otm = X_df2_otm['Call Price']
+plt.scatter(Y_df2_itm, mlp2.predict(X_df2_itm.drop('Call Price', axis=1)), s=3)
plt.scatter(Y_df2_otm, mlp2.predict(X_df2_otm.drop('Call Price', axis=1)), c='r', s=3)
+plt.ylabel("Predicted Price")
+plt.xlabel("Actual Price")
+plt.title("Actual vs Predicted Price")
+plt.legend(['ITM', 'OTM'])
+plt.show()
+```
+
+
+
+
+
+
+```python
+X_df2_itm = df2[df2['Strike Price'] < df2['Stock Price']]
+X_df2_otm = df2[df2['Strike Price'] >= df2['Stock Price']]
+Y_df2_itm = X_df2_itm['Call Price']
+Y_df2_otm = X_df2_otm['Call Price']
plt.scatter(Y_df2_itm, mlp2.predict(X_df2_itm.drop('Call Price', axis=1)), s=3)
+plt.scatter(Y_df2_otm, mlp2.predict(X_df2_otm.drop('Call Price', axis=1)), c='r', s=3)
plt.ylabel("Predicted Price")
plt.xlabel("Actual Price")
plt.title("Actual vs Predicted Price")
+plt.xlim(0, 2)
+plt.ylim(0, 2)
plt.legend(['ITM', 'OTM'])
plt.show()
```
-
-
-
+
Interestingly, we see quite similar behavior to that of the model trained on synthetic (yet less noisy) data. Since we are only looking at a portion of the S&P and UKX data, something worth exploring would be to scrape all of the S&P data and test the model again.
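The report does not name the scraping tooling. As one possible starting point for covering the full S&P universe, the `yfinance` package (our suggestion, not part of the original pipeline) exposes complete option chains per ticker:

```python
import yfinance as yf

# Hypothetical example: fetch the call chain for one S&P 500 constituent.
ticker = yf.Ticker("AAPL")
expiry = ticker.options[0]                 # nearest listed expiry date
calls = ticker.option_chain(expiry).calls  # DataFrame of call contracts
print(calls[['strike', 'lastPrice', 'impliedVolatility']].head())
```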
diff --git a/docs/biblio.bib b/docs/biblio.bib
new file mode 100644
index 0000000..e22441f
--- /dev/null
+++ b/docs/biblio.bib
@@ -0,0 +1,7 @@
+@article{Culkin_Das_2017,
+ title={Machine Learning in Finance: The Case of Deep Learning for Option Pricing},
+ url={https://srdas.github.io/Papers/BlackScholesNN.pdf},
+ author={Culkin, Robert and Das, Sanjiv},
+ year={2017},
+ month={August}
+ }
\ No newline at end of file
diff --git a/docs/index.html b/docs/index.html
index f353ebf..22f1004 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -19,7 +19,7 @@
- CSCI145 - Option Pricing: Recreation of Professor Das' Deep Learning Application on the Black-Scholes Model
+ CSCI145 - Option Pricing: Deep Learning Application on the Black-Scholes Model
@@ -31,7 +31,7 @@
-
+
@@ -39,14 +39,16 @@
-
+
+
+
@@ -1284,7 +1286,7 @@
@@ -1309,7 +1311,7 @@
@@ -1346,7 +1348,7 @@ Data
To recreate Culkin and Das’ work, we utilized the same simulated data used in the paper to train and validate the neural network.
Additionally, we queried UKX options data and the options’ underlying stock information from Bloomberg (see Bloomberg Query File). We also created another dataset by scraping information for S&P500 companies from Yahoo Finance and AlphaQuery.
1. Culkin and Das (2017)
-To train a neural network to learn the call option pricing equation, Culkin and Das (2017) simulated a range of call option prices with ranges of different parameters:
+To train a neural network to learn the call option pricing equation, Culkin and Das simulated a range of call option prices across different parameter ranges (Culkin and Das 2017):