From 95bc59a2b05a40906229b19593cc21ca0ca23ed8 Mon Sep 17 00:00:00 2001
From: "Seungho (Samuel) Lee"
Date: Tue, 24 Nov 2020 03:41:47 +0900
Subject: [PATCH] Citation Commit

---
 biblio.bib                            |   7 +
 docs/BS_Model_draft/BS_Model_draft.md | 216 ++++++++++++++------------
 docs/biblio.bib                       |   7 +
 docs/index.html                       |  30 +++-
 index.Rmd                             |   5 +-
 5 files changed, 155 insertions(+), 110 deletions(-)
 create mode 100644 biblio.bib
 create mode 100644 docs/biblio.bib

diff --git a/biblio.bib b/biblio.bib
new file mode 100644
index 0000000..e22441f
--- /dev/null
+++ b/biblio.bib
@@ -0,0 +1,7 @@
+@article{Culkin_Das_2017,
+  title={Machine Learning in Finance: The Case of Deep Learning for Option Pricing},
+  url={https://srdas.github.io/Papers/BlackScholesNN.pdf},
+  author={Culkin, Robert and Das, Sanjiv},
+  year={2017},
+  month={August}
+}
\ No newline at end of file
diff --git a/docs/BS_Model_draft/BS_Model_draft.md b/docs/BS_Model_draft/BS_Model_draft.md
index f0a52a5..243b872 100644
--- a/docs/BS_Model_draft/BS_Model_draft.md
+++ b/docs/BS_Model_draft/BS_Model_draft.md
@@ -12,13 +12,13 @@ The **Black-Scholes formulas for European call and put options** are:
 $$C(S_0,t)=S_0N(d_1)-Ke^{-r(T-t)}N(d_2)$$
 $$P(S_0,t)=Ke^{-r(T-t)}N(-d_2)-S_0N(-d_1)$$
 
-where \
-- $S_0$: Stock Price \
-- $C(S_0,t)$: Price of the Call Option \
-- $K$: Exercise Price \
-- $(T-t)$: Time to Maturity, where T is Exercise Date \
-- $\sigma$: Underlying Volatility (a standard deviation of log returns) \
-- $r$: Risk-free Interest Rate (i.e., T-bill Rate) \
+where
+- $S_0$: Stock Price
+- $C(S_0,t)$: Price of the Call Option
+- $K$: Exercise Price
+- $(T-t)$: Time to Maturity, where $T$ is the Exercise Date
+- $\sigma$: Underlying Volatility (the standard deviation of log returns)
+- $r$: Risk-free Interest Rate (e.g., the T-bill Rate)
 
 The $d_i$ variables are defined as:
 $$d_1=\frac{\ln\frac{S_0}{K}+(r+\frac{\sigma^2}{2})(T-t)}{\sigma\sqrt{T-t}}$$
@@ -29,10 +29,10 @@ Finally, $N(x)$ is the cumulative distribution function for the standard normal distribution
 
 ### Project Objectives
 
-In this project, we aim to do the following:\
-1. Recreate Culkin and Das' work\
-2. See whether fitted simulated model performs well on actual data \
-3. Observe if the model can perform better based on different datasets
+In this project, we aim to do the following:
+1) Recreate Culkin and Das' work
+2) See whether the model fitted on simulated data performs well on actual data
+3) Observe whether the model can perform better based on different datasets
 
 ## Methodologies
 ### Data
@@ -47,13 +47,13 @@ To train a neural network to learn the call option pricing equation, Culkin and
 
 | Parameter               | Range              |
 |:-----------------------|:------------------|
-| Stock Price $(S)$       | $10 — $50          |
-| Strike Price $(K)$      | $7 — $650          |
+| Stock Price $(S)$       | \\$10 — \\$500     |
+| Strike Price $(K)$      | \\$7 — \\$650      |
 | Maturity $(T-t)$        | 1 day to 3 years   |
 | Dividend Rate $(q)$     | 0\% — 3\%          |
 | Risk Free Rate $(r)$    | 1\% — 3\%          |
 | Volatility $(\sigma)$   | 5\% — 90\%         |
-| Call Price $(C)$        | $0 — $328          |
+| Call Price $(C)$        | \\$0 — \\$328      |
 
 In total, the dataset contains 300,000 observations. 
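As a sanity check on the target function the network is asked to learn, the pricing rule and the sampling scheme above can be written out directly. The sketch below is illustrative only and not part of the patch; `bs_call` and `sample_row` are our own names, and the dividend yield enters through the standard dividend-adjusted form of the formulas above.

```python
# Illustrative sketch (not project code): Black-Scholes call price and
# one simulated training row drawn from the parameter ranges in the table.
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma, q=0.0):
    """European call price; q is a continuous dividend yield (assumption)."""
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * np.exp(-q * T) * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

rng = np.random.default_rng(32)

def sample_row():
    """Draw one (inputs, call price) pair from the ranges quoted above."""
    S = rng.uniform(10, 500)         # stock price, $10 - $500
    K = rng.uniform(7, 650)          # strike price, $7 - $650
    T = rng.uniform(1 / 252, 3)      # maturity, 1 trading day to 3 years
    q = rng.uniform(0.00, 0.03)      # dividend rate, 0% - 3%
    r = rng.uniform(0.01, 0.03)      # risk-free rate, 1% - 3%
    sigma = rng.uniform(0.05, 0.90)  # volatility, 5% - 90%
    return (S, K, T, q, r, sigma), bs_call(S, K, T, r, sigma, q)
```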
@@ -359,20 +359,14 @@ np.random.seed(32)
 mlp.fit(X_train, y_train)
 ```
 
-    Iteration 1, loss = 0.00035534
-    Iteration 2, loss = 0.00009519
-    Iteration 3, loss = 0.00006493
-    Iteration 4, loss = 0.00004772
-    Iteration 5, loss = 0.00003906
-    Iteration 6, loss = 0.00003374
-    Iteration 7, loss = 0.00002936
-    Iteration 8, loss = 0.00002706
-    Iteration 9, loss = 0.00002523
-    Iteration 10, loss = 0.00002390
+    Iteration 1, loss = 0.00035527
+    Iteration 2, loss = 0.00009529
+    Iteration 3, loss = 0.00006484
+    Iteration 4, loss = 0.00004754
 
-    /opt/conda/envs/rapids/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py:585: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (10) reached and the optimization hasn't converged yet.
-      % self.max_iter, ConvergenceWarning)
+    D:\Programs\anaconda3\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:587: UserWarning: Training interrupted by user.
+      warnings.warn("Training interrupted by user.")
 
@@ -383,7 +377,7 @@ mlp.fit(X_train, y_train)
 
-Since it is important to save model for reproducibility, we will be saving the model in every phase:
+Since it is important to save the model for reproducibility, we will save it at every phase:
 
 ```python
@@ -402,6 +396,10 @@ filename = 'models/BS_model.sav'
 mlp = pickle.load(open(filename, 'rb'))
 ```
 
+    D:\Programs\anaconda3\lib\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator MLPRegressor from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
+      warnings.warn(
+
+
 
 ```python
 print("Training set score: %f" % mlp.score(X_train, y_train))
@@ -427,9 +425,7 @@ plt.show()
 ```
 
-
 ![png](output_27_0.png)
-
 
 We can also explore the distributions of both the in-sample and out-of-sample errors:
@@ -446,9 +442,7 @@ plt.show()
 ```
 
-
 ![png](output_29_0.png)
-
 
@@ -462,9 +456,7 @@ plt.show()
 ```
 
-
 ![png](output_30_0.png)
-
 
@@ -487,14 +479,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
     Descriptive Statistics of Pricing Error in Training Set - Simulated

     | nobs   | minmax                                       | mean      | variance | skewness  | kurtosis  |
     |--------|----------------------------------------------|-----------|----------|-----------|-----------|
     | 240000 | (-0.04503973715887655, 0.038556769019683426) | -0.003111 | 0.000007 | -0.400349 | 12.083641 |
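The `a.style.hide_index()` calls visible in the hunks above suggest these tables are one-row pandas DataFrames built from `scipy.stats.describe`. A minimal sketch under that assumption follows; `preds_train` is an assumed name for the model's training-set predictions, not a variable from the patch.

```python
# Illustrative sketch (assumed reconstruction): one-row descriptive-
# statistics table for the training-set pricing errors.
import pandas as pd
from scipy import stats

errors = y_train - preds_train   # pricing error per option (assumed names)
d = stats.describe(errors)
a = pd.DataFrame([{
    "nobs": d.nobs, "minmax": d.minmax, "mean": d.mean,
    "variance": d.variance, "skewness": d.skewness, "kurtosis": d.kurtosis,
}])
print(a.to_string(index=False))
```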
@@ -510,14 +502,14 @@ b.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
     Descriptive Statistics of Pricing Error in Test Set - Simulated

     | nobs  | minmax                                       | mean      | variance  | skewness  | kurtosis    |
     |-------|----------------------------------------------|-----------|-----------|-----------|-------------|
-    | 11841 | (-1.406634871550537, 204.1845735730094)      | 0.091134  | 10.956834 | 54.035579 | 3074.578127 |
+    | 60000 | (-0.03991259932903224, 0.031677805872430874) | -0.003110 | 0.000007  | -0.529404 | 12.293611   |
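The `nobs` values in these two tables (240,000 and 60,000) line up with an 80/20 split of the 300,000 simulated observations. A sketch of such a split, assuming the same `train_test_split(..., test_size=0.2)` pattern used elsewhere in the notebook and a DataFrame `df` holding the simulated data:

```python
# Illustrative sketch: 80/20 split matching the nobs above.
# `df` is an assumed name for the 300,000-row simulated dataset.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df.drop('Call Price', axis=1), df['Call Price'], test_size=0.2)
print(len(X_train), len(X_test))  # expected: 240000 60000
```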
@@ -655,9 +647,7 @@ plt.show()
 ```
 
-
 ![png](output_40_0.png)
-
 
 From a quick glance, there seem to be some minor deviations. Let's see the $R^2$ for this regression:
@@ -682,9 +672,7 @@ plt.show()
 ```
 
-
 ![png](output_44_0.png)
-
 
 While the model performed worse relative to the previous sample, it still achieved a high R-squared value considering that the training data and the test data came from different sources. Hence, the graph above is summarized in the table below:
@@ -699,14 +687,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
     Descriptive Statistics: Simulation Model on UKX

     | nobs | minmax                                     | mean       | variance     | skewness   | kurtosis   |
     |------|--------------------------------------------|------------|--------------|------------|------------|
-    | 1685 | (-3088.616090144955, 0.35170913484653754)  | -28.827895 | 44608.131476 | -10.456920 | 123.587743 |
+    | 1685 | (-3088.616090144955, 0.35170913484653743)  | -28.827895 | 44608.131476 | -10.456920 | 123.587743 |
@@ -721,7 +709,10 @@ To ameliorate the effect of having less data, we increased the number of epochs
 np.random.seed(32)
 X_train_ukx, X_test_ukx, y_train_ukx, y_test_ukx = train_test_split(ukx.drop('Call Price', axis=1), ukx['Call Price'], test_size=0.2)
+```
+
 
+```python
 mlp_u = MLPRegressor(hidden_layer_sizes=(100,100,100,100), solver='adam', shuffle = False,
                     batch_size=64, verbose=True, max_iter= 20
@@ -755,6 +746,10 @@ filename = 'models/BS_ukx_model.sav'
 mlp_u = pickle.load(open(filename, 'rb'))
 ```
 
+    D:\Programs\anaconda3\lib\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator MLPRegressor from version 0.23.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
+      warnings.warn(
+
+
 
 ```python
 print("Training Set Score: %f" % mlp_u.score(X_train_ukx, y_train_ukx))
@@ -776,9 +771,7 @@ plt.show()
 ```
 
-
-![png](output_51_0.png)
-
+![png](output_52_0.png)
 
@@ -791,14 +784,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
     Descriptive Statistics: UKX Model

     | nobs | minmax                                  | mean     | variance | skewness | kurtosis  |
     |------|-----------------------------------------|----------|----------|----------|-----------|
     | 337  | (-0.31623338202718, 9.835618886098018)  | 0.163965 | 0.804381 | 7.900739 | 69.225675 |
@@ -927,12 +920,10 @@ plt.show()
 ```
 
-
-![png](output_57_0.png)
-
+![png](output_58_0.png)
 
 
 From the above, we can see that there seem to be more deviations than in the predictions made on the previous data. To assess the relationship between In-The-Money (ITM) and Out-of-The-Money (OTM) call options, we plotted a new graph focused on the bottom-left cluster. In the current environment, Black-Scholes tends to misprice calls that are deeply ITM or deeply OTM; for more information about this phenomenon, see the [implied volatility smile](https://www.investopedia.com/terms/v/volatilitysmile.asp).
 
 
 ```python
@@ -940,21 +931,19 @@ X_snp_itm = snp[snp['Strike Price'] < snp['Stock Price']]
 X_snp_otm = snp[snp['Strike Price'] >= snp['Stock Price']]
 Y_snp_itm = X_snp_itm['Call Price']
 Y_snp_otm = X_snp_otm['Call Price']
-plt.scatter(Y_snp_otm, mlp.predict(X_snp_otm.drop('Call Price', axis=1)), c='r', s=2)
 plt.scatter(Y_snp_itm, mlp.predict(X_snp_itm.drop('Call Price', axis=1)), s=2)
+plt.scatter(Y_snp_otm, mlp.predict(X_snp_otm.drop('Call Price', axis=1)), c='r', s=2)
 plt.ylabel("Predicted Price")
 plt.xlabel("Actual Price")
 plt.title("Actual vs Predicted Price")
-plt.xlim(0, 5)
-plt.ylim(0, 5)
-plt.legend(['OTM', 'ITM'])
+plt.xlim(0, 2)
+plt.ylim(0, 2)
+plt.legend(['ITM', 'OTM'])
 plt.show()
 ```
 
-
-![png](output_59_0.png)
-
+![png](output_60_0.png)
 
 The graph above exhibits what we discussed earlier. Furthermore, while the variance is relatively higher, the model still finds some success. In fact, we can see that the $R^2$ value is:
@@ -981,9 +970,7 @@ plt.show()
 ```
 
-
-![png](output_63_0.png)
-
+![png](output_64_0.png)
 
@@ -1022,7 +1009,10 @@ np.random.seed(32)
 df2 = pd.concat([ukx, snp])
 X_train2, X_test2, y_train2, y_test2 = train_test_split(df2.drop('Call Price', axis=1), df2['Call Price'], test_size=0.2)
+```
+
 
+```python
 mlp2 = MLPRegressor(hidden_layer_sizes=(100,100,100,100), solver='adam', shuffle = False,
                     batch_size=64, verbose=False, max_iter= 20
@@ -1034,14 +1024,23 @@ filename = 'models/BS_final_model.sav'
 pickle.dump(mlp2, open(filename, 'wb'))
 ```
 
+    D:\Programs\anaconda3\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:582: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (20) reached and the optimization hasn't converged yet.
+      warnings.warn(
+
+
+
+```python
+mlp2 = pickle.load(open(filename, 'rb'))
+```
+
 
 ```python
 print("Training set score: %f" % mlp2.score(X_train2, y_train2))
 print("Test set score: %f" % mlp2.score(X_test2, y_test2))
 ```
 
-    Training set score: 0.999435
-    Test set score: 0.999439
+    Training set score: 0.999941
+    Test set score: 0.999943
 
 Surprisingly, we observed that there was an insignificant difference in performance between the two models. Furthermore, we analyzed the pricing error of the real-data model:
@@ -1057,9 +1056,7 @@ plt.show()
 ```
 
-
-![png](output_70_0.png)
-
+![png](output_73_0.png)
 
@@ -1072,14 +1069,14 @@ a.style.hide_index().set_table_attributes("style='display:inline'").set_caption(
     Descriptive Statistics: Real Data Error

     | nobs  | minmax                                    | mean     | variance  | skewness  | kurtosis    |
     |-------|-------------------------------------------|----------|-----------|-----------|-------------|
-    | 11841 | (-1.406634871550537, 204.1845735730094)   | 0.091134 | 10.956834 | 54.035579 | 3074.578127 |
+    | 11841 | (-1.4149300780995988, 65.29475394211659)  | 0.041074 | 1.112086  | 53.563095 | 3044.092604 |
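The ITM/OTM mispricing examined in the surrounding cells can also be quantified rather than only plotted. A sketch, assuming `mlp2` and `df2` as defined above; the mean-absolute-error comparison is our own illustration:

```python
# Illustrative sketch: mean absolute pricing error, ITM vs OTM calls.
# Assumes mlp2 (fitted model) and df2 (UKX + SNP data) from the cells above.
X = df2.drop('Call Price', axis=1)
errors = (mlp2.predict(X) - df2['Call Price']).abs()
itm = df2['Strike Price'] < df2['Stock Price']  # in-the-money calls
print("ITM mean absolute error: %.4f" % errors[itm].mean())
print("OTM mean absolute error: %.4f" % errors[~itm].mean())
```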
@@ -1095,7 +1092,7 @@ preds_synthetic= mlp2.predict(X_test)
 print("R-Squared Value: %.4f" % r2_score(y_test, preds_synthetic))
 ```
 
-    R-Squared Value: 0.8296
+    R-Squared Value: 0.7786
 
 We observed that the model performed slightly worse, a similar result to what we had seen in the predictions above.
@@ -1110,9 +1107,7 @@ plt.show()
 ```
 
-
-![png](output_75_0.png)
-
+![png](output_78_0.png)
 
@@ -1121,19 +1116,38 @@ X_df2_itm = df2[df2['Strike Price'] < df2['Stock Price']]
 X_df2_otm = df2[df2['Strike Price'] >= df2['Stock Price']]
 Y_df2_itm = X_df2_itm['Call Price']
 Y_df2_otm = X_df2_otm['Call Price']
+plt.scatter(Y_df2_itm, mlp2.predict(X_df2_itm.drop('Call Price', axis=1)), s=3)
 plt.scatter(Y_df2_otm, mlp2.predict(X_df2_otm.drop('Call Price', axis=1)), c='r', s=3)
+plt.ylabel("Predicted Price")
+plt.xlabel("Actual Price")
+plt.title("Actual vs Predicted Price")
+plt.legend(['ITM', 'OTM'])
+plt.show()
+```
+
+
+![png](output_79_0.png)
+
+
+
+```python
+X_df2_itm = df2[df2['Strike Price'] < df2['Stock Price']]
+X_df2_otm = df2[df2['Strike Price'] >= df2['Stock Price']]
+Y_df2_itm = X_df2_itm['Call Price']
+Y_df2_otm = X_df2_otm['Call Price']
 plt.scatter(Y_df2_itm, mlp2.predict(X_df2_itm.drop('Call Price', axis=1)), s=3)
+plt.scatter(Y_df2_otm, mlp2.predict(X_df2_otm.drop('Call Price', axis=1)), c='r', s=3)
 plt.ylabel("Predicted Price")
 plt.xlabel("Actual Price")
 plt.title("Actual vs Predicted Price")
+plt.xlim(0, 2)
+plt.ylim(0, 2)
 plt.legend(['ITM', 'OTM'])
 plt.show()
 ```
 
-
-![png](output_76_0.png)
-
+![png](output_80_0.png)
 
 Interestingly, we see quite similar behavior to the model trained on synthetic (yet less noisy) data. Since we are only looking at a portion of the SNP and UKX data, something worth exploring would be to scrape all of the SNP data and test the model again.

diff --git a/docs/biblio.bib b/docs/biblio.bib
new file mode 100644
index 0000000..e22441f
--- /dev/null
+++ b/docs/biblio.bib
@@ -0,0 +1,7 @@
+@article{Culkin_Das_2017,
+  title={Machine Learning in Finance: The Case of Deep Learning for Option Pricing},
+  url={https://srdas.github.io/Papers/BlackScholesNN.pdf},
+  author={Culkin, Robert and Das, Sanjiv},
+  year={2017},
+  month={August}
+}
\ No newline at end of file
diff --git a/docs/index.html b/docs/index.html
index f353ebf..22f1004 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -19,7 +19,7 @@
 
-  CSCI145 - Option Pricing: Recreation of Professor Das' Deep Learning Application on the Black-Scholes Model
+  CSCI145 - Option Pricing: Deep Learning Application on the Black-Scholes Model
 
@@ -31,7 +31,7 @@
 
-
+
@@ -39,14 +39,16 @@
 
-
+
+
+
@@ -1284,7 +1286,7 @@
@@ -1309,7 +1311,7 @@
-Recreation of Professor Das’ Deep Learning Application on the Black-Scholes Model
+Deep Learning Application on the Black-Scholes Model
This project is conducted by Juan Diego Herrera and Seungho (Samuel) Lee. Here, we expand upon Robert Culkin’s and Sanjiv R. Das’ effort to recreate the Black-Scholes option pricing model using neural networks.

@@ -1346,7 +1348,7 @@

Data

To recreate Culkin and Das’ work, we utilized the same simulated data used in the paper to train and validate the neural network.

Additionally, we queried UKX options data and the options’ underlying stock information from Bloomberg (see Bloomberg Query File). We also created another dataset by scraping information for S&P 500 companies from Yahoo Finance and AlphaQuery.

1. Culkin and Das (2017)

-To train a neural network to learn the call option pricing equation, Culkin and Das (2017) simulated a range of call option prices with ranges of different parameters:
+To train a neural network to learn the call option pricing equation, Culkin and Das simulated a range of call option prices over the following parameter ranges (Culkin and Das 2017):

@@ -1966,6 +1968,11 @@

Conclusion

In this research project, we validated Culkin and Das’ conclusion: the machine learning field is experiencing a rebirth and has found applications in highly dynamic fields such as finance. Certain financial pricing schemes (e.g., Black-Scholes) are non-linear and require a significant amount of calculation to reach a price. However, neural networks can learn and predict such non-linear behavior fairly accurately, as can be seen from the relatively high \(R^2\) values of the above models.

However, we also saw that the fitted models produced drastically different test accuracies depending on the random seed, and that the error distributions of those models exhibited high skewness and kurtosis. Therefore, more consistent models could be generated by optimizing hyperparameters and conducting a comprehensive data transformation process. It is also important to note that the above models failed to fit all options with different maturities and strikes.

In theory, it is well regarded that it is nearly impossible to predict stock prices in a consistent manner, as the actual market is highly efficient (i.e., the Efficient Market Hypothesis). Moreover, a high level of noise exists in real data. Therefore, a model that predicts stock prices consistently would be extremely difficult to create, yet would yield immense monetary benefits.
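One concrete route to the more consistent models suggested above is a small hyperparameter search over the network. The following is an illustrative sketch only, not project code; the grid values are assumptions, and `X_train`, `y_train` are assumed to be the training arrays from the notebook.

```python
# Illustrative sketch: hyperparameter search for a more consistent pricer.
# Grid values are assumptions, not settings from the project.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

param_grid = {
    "hidden_layer_sizes": [(100,) * 2, (100,) * 4],
    "batch_size": [64, 256],
    "learning_rate_init": [1e-3, 1e-4],
}
search = GridSearchCV(MLPRegressor(solver="adam", max_iter=50),
                      param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```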

+Culkin, Robert, and Sanjiv Das. 2017. “Machine Learning in Finance: The Case of Deep Learning for Option Pricing.” https://srdas.github.io/Papers/BlackScholesNN.pdf.
@@ -1987,6 +1994,15 @@

Conclusion

+
diff --git a/index.Rmd b/index.Rmd
index bf1b80e..2e6131c 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "Recreation of Professor Das' Deep Learning Application on the Black-Scholes Model"
+title: "Deep Learning Application on the Black-Scholes Model"
 description: |
   This project is conducted by [Juan Diego Herrera](https://github.com/jknaudt21) and [Seungho (Samuel) Lee](https://github.com/samuellee19). Here, we expand upon Robert Culkin's and Sanjiv R. Das' effort to recreate [the Black-Scholes option pricing model using neural networks](https://srdas.github.io/Papers/BlackScholesNN.pdf).
 author:
@@ -10,6 +10,7 @@ author:
   - name: Juan Diego Herrera 
     url: mailto:jknaudt21@students.cmc.edu
     affiliation: Claremont McKenna College
 date: "`r Sys.Date()`"
+bibliography: biblio.bib
 output:
   distill::distill_website
 ---
@@ -58,7 +59,7 @@ Aditionally, we queried UKX options data and the options' underlying stock infro
 
 #### 1. Culkin and Das (2017)
 
-To train a neural network to learn the call option pricing equation, Culkin and Das (2017) simulated a range of call option prices with ranges of different parameters:
+To train a neural network to learn the call option pricing equation, Culkin and Das simulated a range of call option prices over the following parameter ranges [@Culkin_Das_2017]:
 
 | Parameter               | Range              |
 |:-----------------------|:------------------|