# Smart Underwriter

## Background

The mortgage underwriting process is becoming increasingly automated. Fannie Mae and Freddie Mac, the two major US housing GSEs (government-sponsored enterprises), each have their own AUS (automated underwriting system): Desktop Underwriter and Loan Prospector, respectively. An AUS is valuable because it can deliver an objective, fast decision based on the mortgage data.

Fannie Mae is the largest mortgage backer in the US housing market. It has released "a subset of Fannie Mae’s 30-year, fully amortizing, full documentation, single-family, conventional fixed-rate mortgages" on its website to "promote better understanding of the credit performance of Fannie Mae mortgage loans". This dataset is also a perfect source for building our own mortgage risk assessment model.

In this project, the Fannie Mae data were downloaded, compiled, and aggregated, then fed into a machine learning pipeline to build a credit risk prediction model. You can find the demo site here.

## Workflow

To recreate the data processing, modeling, and web app, follow these steps:

  1. Download the dataset from Fannie Mae's website. You can use the `download.sh` script in the `processing` folder, but keep in mind that you will need to supply a separate cookie file in order to download the data. If you are using Firefox, you can install the Export Cookie extension to create one. (A Python sketch of this step is shown after the list.)

  2. Aggregate the loan performance data. For now, only the terminal status of each loan is used: `data_process.py` takes the last reported status of each loan and discards all intermediate statuses (see the sketch after the list). In the future, a time-series model will be developed to predict the time-dependent loan status. CAUTION: the data were processed on a DigitalOcean droplet with 16 GB of memory. The current script uses Python pandas; the plan is to rewrite the whole pipeline in Spark.

  3. Further aggregate the quarterly data into yearly files and then a multi-year dataset, using `merged-quarter.py` and `merged-year.py` (see the sketch after the list).

  4. Use the `learning.py` script to train the models. Currently, a logistic regression model and a stochastic gradient descent (SGD) based support vector machine are used; the SGD model gives a better AUC-ROC, so it was picked (a simplified comparison appears after the list).

  5. Run the Flask web server with `python3 run.py` (a minimal skeleton is shown below).
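
The sketch below shows, in Python, roughly what the cookie-based download does; `download.sh` presumably achieves the equivalent with a command-line tool. The URL is a placeholder, not the real file path; only the cookie-file mechanics are the point.

```python
# Hedged sketch of the cookie-based download step.
# The URL below is a placeholder; real file paths come from Fannie Mae's site.
import http.cookiejar

import requests

jar = http.cookiejar.MozillaCookieJar("cookies.txt")  # Netscape-format export
jar.load()

url = "https://example.com/Performance_2015Q4.zip"  # placeholder URL
resp = requests.get(url, cookies=jar, stream=True)
resp.raise_for_status()
with open("Performance_2015Q4.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)
```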
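
Next, an illustrative version of the terminal-status extraction in `data_process.py`. The column names and the pipe-delimited, header-less layout are assumptions; the real performance files carry many more fields.

```python
# Illustrative terminal-status extraction (column names are assumptions).
import pandas as pd

cols = ["LOAN_ID", "REPORT_PERIOD", "STATUS"]  # assumed names
perf = pd.read_csv("Performance_2015Q4.txt", sep="|", header=None,
                   names=cols, usecols=[0, 1, 2])
perf["REPORT_PERIOD"] = pd.to_datetime(perf["REPORT_PERIOD"])

# Keep only each loan's last reported record; intermediate statuses are dropped.
terminal = (perf.sort_values(["LOAN_ID", "REPORT_PERIOD"])
                .drop_duplicates(subset="LOAN_ID", keep="last"))
terminal.to_csv("terminal_2015Q4.csv", index=False)
```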
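
The quarterly-to-yearly roll-up in `merged-quarter.py` / `merged-year.py` amounts to concatenation. A minimal sketch, with file names assumed to match the previous sketch's output:

```python
# Minimal sketch of the quarter -> year -> multi-year roll-up.
import glob

import pandas as pd

# One year: concatenate its quarterly terminal-status files.
year = 2015
quarterly = sorted(glob.glob(f"terminal_{year}Q*.csv"))
pd.concat(map(pd.read_csv, quarterly), ignore_index=True) \
  .to_csv(f"terminal_{year}.csv", index=False)

# Multi-year: concatenate the yearly files.
yearly = sorted(glob.glob("terminal_20??.csv"))
pd.concat(map(pd.read_csv, yearly), ignore_index=True) \
  .to_csv("terminal_all.csv", index=False)
```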
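
A simplified version of the comparison in `learning.py`: logistic regression versus an SGD-trained linear SVM, ranked by AUC-ROC. The `DEFAULT` label column and numeric feature matrix are placeholders; the real script builds its features from the origination data.

```python
# Simplified model comparison; DEFAULT and the numeric feature columns
# are assumed stand-ins for the real label/features.
import pandas as pd
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("terminal_all.csv")
X = data.drop(columns=["DEFAULT"]).to_numpy(dtype=float)  # assumes numeric features
y = data["DEFAULT"].to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
svm = SGDClassifier(loss="hinge", random_state=0).fit(X_tr, y_tr)  # linear SVM via SGD

# The hinge-loss SVM has no predict_proba, so rank by its decision function.
print("logit AUC:", roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]))
print("SGD-SVM AUC:", roc_auc_score(y_te, svm.decision_function(X_te)))
```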
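
Finally, `run.py` is a standard Flask entry point. The real app also loads the trained model and serves the demo pages; the skeleton below only shows the shape.

```python
# Bare-bones Flask entry point in the shape of run.py.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Smart Underwriter"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```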

## License

Copyright (C) Zhenqing Li. Licensed under GPL v3.
