Fix: Add README.md and processing scripts
Data processing script for original dataset from Fannie Mae and a
project README.md file are added to the repo.

* Dataset processing files are now under /processing folder
* A new README.md file is added in the root folder.
DigitalPig committed May 22, 2016
1 parent 0bcac6c commit 4ca1de9
Showing 10 changed files with 895 additions and 36 deletions.
57 changes: 57 additions & 0 deletions README.md
@@ -0,0 +1,57 @@
# Smart Underwriter

## Background

The underwriting process is becoming increasingly automated. Fannie Mae and
Freddie Mac, two major US housing GSEs (government-sponsored enterprises),
each have their own AUS (automated underwriting system): [Desktop
Underwriter](https://www.fanniemae.com/singlefamily/desktop-underwriter "Desktop
Underwriter") and [Loan Prospector](http://www.loanprospector.com/ "Loan
Prospector"). An AUS is valuable because it can provide a fast, objective
decision based on the mortgage data.

Fannie Mae is the largest mortgage backer in the US housing market. It has
released "a subset of Fannie Mae’s 30-year, fully amortizing, full
documentation, single-family, conventional fixed-rate mortgages" on its website
to "promote better understanding of the credit performance of Fannie Mae
mortgage loans". This data is also a good source for building our own mortgage
risk assessment model.

In this project, the Fannie Mae data was downloaded, compiled, and aggregated,
then fed into a machine learning model to build a credit risk prediction
model. You can find the demo site [here]().

## Workflow

To recreate the data processing, modeling, and web development, follow these
steps:

1. Download the dataset from
[Fannie Mae's website](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html
"Download the data"). You can use the `download.sh` script in the `processing`
folder, but keep in mind that you will need to supply a separate cookie file
in order to download the data. If you are using Firefox, you can install the
[Export Cookie](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/)
extension to create one.

2. Aggregate the loan performance data. At this time, I am only focusing on the
terminal status of each loan. `data_process.py` keeps the last status of
each loan and discards any intermediate status. In the future, a time-series
based model will be developed to predict the time-dependent loan status.
**CAUTION:** The data was processed on a DigitalOcean droplet with 16 GB of
memory. The current script uses Python pandas to process the data. My plan is
to rewrite the whole thing in Spark.

3. Further aggregate the quarterly data into yearly and then multi-year
data using `merged-quarter.py` and `merged-year.py`.

4. Use the `learning.py` script to do the machine learning. Currently, a logistic
regression model and a stochastic gradient descent (SGD) based support vector
machine are used. The SGD model gives a better AUC-ROC value, so it is picked.

5. Run the Flask web server with `python3 run.py`.
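
The terminal-status aggregation in step 2 can be sketched roughly as follows. This is not the code in `data_process.py`; it is a minimal pandas illustration, and the column names (`loan_id`, `period`, `delinquency_status`) and the toy records are assumptions made up for the example.

```python
import pandas as pd


def terminal_status(performance: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent monthly record of each loan."""
    # Sort so that the latest reporting period of every loan comes last.
    ordered = performance.sort_values(["loan_id", "period"])
    # keep="last" retains the final row per loan_id and drops the
    # intermediate statuses.
    return ordered.drop_duplicates("loan_id", keep="last").reset_index(drop=True)


# Toy monthly performance records (hypothetical column names and values).
records = pd.DataFrame(
    {
        "loan_id": [1, 1, 1, 2, 2],
        "period": pd.to_datetime(
            ["2010-01-01", "2010-02-01", "2010-03-01", "2010-01-01", "2010-02-01"]
        ),
        "delinquency_status": ["0", "1", "3", "0", "0"],
    }
)

last = terminal_status(records)
print(last)
```

Sorting before dropping duplicates means the result does not depend on the row order of the raw file, at the cost of holding the whole table in memory.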
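
The model comparison in step 4 might look something like the sketch below. It is not taken from `learning.py`: the synthetic dataset from `make_classification`, the scaling step, and all hyperparameters are assumptions chosen only to show how a logistic regression and an SGD-trained linear SVM can be compared by AUC-ROC with scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the loan data (assumption: the real
# features come from the aggregated Fannie Mae files).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # loss="hinge" makes SGDClassifier a linear SVM trained by SGD.
    "sgd_svm": make_pipeline(
        StandardScaler(), SGDClassifier(loss="hinge", random_state=0)
    ),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # decision_function returns a continuous score, which ROC AUC needs.
    scores = model.decision_function(X_test)
    auc[name] = roc_auc_score(y_test, scores)

best = max(auc, key=auc.get)
print(auc, "-> picked:", best)
```

Which model wins on this toy data says nothing about the real dataset; the point is only the selection rule of picking the model with the higher AUC-ROC on held-out data.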

## License

(C) copyright by Zhenqing Li. GPL v3

46 changes: 45 additions & 1 deletion app/templates/base.html
@@ -8,11 +8,55 @@
{% endif %}
<meta name="viewport" content="width=device-width, initial-scale = 1.0">
<meta name="author" content="Zhenqing (ZQ) Li">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">
<link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap-theme.min.css"
integrity="sha384-fLW2N01lMqjakBkx3l/M9EahuwpSfeNvV63J5ezn3uZzapT0u7EYsXMjQV+0En5r"
crossorigin="anonymous">
<style>
.flash {background:#cee5F5; padding:0.5em;border:1px solid #aacbe2;}
</style>
</head>
<body>
<div>Smart Underwriter by Zhenqing (ZQ) Li</div>
<hr>
<!-- Fixed navbar -->
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/index">Smart Underwriter</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li class="active"><a href="#">Home</a></li>
<li><a href="#about">About</a></li>
<li><a href="#contact">Contact</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Model<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="/about#data">Data Acquisition</a></li>
<li><a href="#">Data Cleaning</a></li>
<li><a href="#">Machine Learning</a></li>
<li><a href="#">Prediction</a></li>
<!-- <li role="separator" class="divider"></li> -->
<!-- <li class="dropdown-header">Nav header</li> -->
<!-- <li><a href="#">Separated link</a></li> -->
<!-- <li><a href="#">One more separated link</a></li> -->
</ul>
</li>
</ul>
</div><!--/.nav-collapse -->
</div>
</nav>
<script src="http://code.jquery.com/jquery.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
87 changes: 53 additions & 34 deletions app/templates/index.html
@@ -1,38 +1,57 @@
{% extends "base.html" %}
{% block content %}
<h1>Prediction of Default of Home Mortgage</h1>
<form action="/index" method="post" name="input_morgage_data">
<!-- {{ form.csrf_token }} -->
<p>
{{form.loan_amount.label}}: {{form.loan_amount(size=10)}} <br>
{% for error in form.loan_amount.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.buyer_credit.label}}: {{form.buyer_credit()}} <br>
{% for error in form.buyer_credit.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.cobuyer_credit.label}} (if available): {{form.cobuyer_credit()}} <br>
{% for error in form.cobuyer_credit.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.loan_to_value.label}}: {{form.loan_to_value()}} <br>
{% for error in form.loan_to_value.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.debt_to_income.label}}: {{form.debt_to_income()}} <br>
{% for error in form.debt_to_income.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
</p>
<p>
{{form.loan_state.label}}: {{form.loan_state()}} <br>
{{form.loan_purpose.label}}: {{form.loan_purpose()}} <br>
{{form.property_type.label}}: {{form.property_type()}} <br>
{{form.occupancy_type.label}}: {{form.occupancy_type()}} <br>
</p>
<p><input type="submit" value="Get Loan Prediction"></p>
</form>
<p> This mortgage application is {{result}} </p>

<div class="container">
<div class="jumbotron">
<h2>Prediction of Default of Home Mortgage</h2>
<form action="/index" method="post" name="input_morgage_data">
<!-- {{ form.csrf_token }} -->
<p>
{{form.loan_amount.label}}: {{form.loan_amount(size=10)}} <br>
{% for error in form.loan_amount.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.buyer_credit.label}}: {{form.buyer_credit()}} <br>
{% for error in form.buyer_credit.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.cobuyer_credit.label}} (if available): {{form.cobuyer_credit()}} <br>
{% for error in form.cobuyer_credit.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.loan_to_value.label}}: {{form.loan_to_value()}} <br>
{% for error in form.loan_to_value.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
{{form.debt_to_income.label}}: {{form.debt_to_income()}} <br>
{% for error in form.debt_to_income.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}<br>
</p>
<p>
{{form.loan_state.label}}: {{form.loan_state()}} <br>
{{form.loan_purpose.label}}: {{form.loan_purpose()}} <br>
{{form.property_type.label}}: {{form.property_type()}} <br>
{{form.occupancy_type.label}}: {{form.occupancy_type()}} <br>
</p>
<p><input type="submit" value="Get Loan Prediction"></p>
</form>
<p> This mortgage application is {{result}} </p>
</div>
</div>

<div class="container">
<div class="jumbotron">
<h4 id="about">About</h4>
<p>This model is built on Fannie Mae's data from 2000 to 2012,
downloaded
from <a href="http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html">Fannie
Mae Single Family Housing Data</a>. The data was processed in
Python with the help of numpy/scipy, pandas and scikit-learn. You
can find the source code
in <a href="https://github.com/DigitalPig/SmartUnderwriter">this</a>
GitHub repo.<br></p>
<p>&copy; 2016 Zhenqing Li</p>
</div>
</div>
{% endblock %}