# Smart Underwriter

## Background

The mortgage underwriting process is becoming increasingly automated. Fannie Mae and Freddie Mac, the two major US housing GSEs (government-sponsored enterprises), each have their own AUS (automated underwriting system): Desktop Underwriter and Loan Prospector, respectively. An AUS is valuable because it can deliver an objective, fast decision based on the mortgage data.

Fannie Mae is the largest mortgage backer in the US housing market. It has released "a subset of Fannie Mae’s 30-year, fully amortizing, full documentation, single-family, conventional fixed-rate mortgages" on its website to "promote better understanding of the credit performance of Fannie Mae mortgage loans". This dataset is also a perfect source for building our own mortgage risk assessment model.

In this project, the Fannie Mae data were downloaded, compiled, and aggregated, then fed into a machine learning pipeline to build a credit risk prediction model. You can find the demo site here.

## Workflow

To recreate the data processing, modeling, and web app, follow these steps:

  1. Download the dataset from Fannie Mae's website. You can use the `download.sh` script in the `processing` folder, but keep in mind that you will need to supply a separate cookie file in order to download the data. If you are using Firefox, you can install the Export Cookie extension to create one. (A Python sketch of this step is shown after the list.)

  2. Aggregate the loan performance data. For now, only the terminal status of each loan is used: `data_process.py` takes the last reported status of each loan and discards all intermediate statuses (see the sketch after the list). In the future, a time-series model will be developed to predict the time-dependent loan status. CAUTION: the data were processed on a DigitalOcean droplet with 16 GB of memory. The current script uses Python pandas; the plan is to rewrite the whole pipeline in Spark.

  3. Further aggregate the quarterly data into yearly files and then a multi-year dataset, using `merged-quarter.py` and `merged-year.py` (see the sketch after the list).

  4. Use the `learning.py` script to train the models. Currently, a logistic regression model and a stochastic gradient descent (SGD) based support vector machine are used; the SGD model gives a better AUC-ROC, so it was picked (a simplified comparison appears after the list).

  5. Run the Flask web server with `python3 run.py` (a minimal skeleton is shown below).
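
The sketch below shows, in Python, roughly what the cookie-based download does; `download.sh` presumably achieves the equivalent with a command-line tool. The URL is a placeholder, not the real file path; only the cookie-file mechanics are the point.

```python
# Hedged sketch of the cookie-based download step.
# The URL below is a placeholder; real file paths come from Fannie Mae's site.
import http.cookiejar

import requests

jar = http.cookiejar.MozillaCookieJar("cookies.txt")  # Netscape-format export
jar.load()

url = "https://example.com/Performance_2015Q4.zip"  # placeholder URL
resp = requests.get(url, cookies=jar, stream=True)
resp.raise_for_status()
with open("Performance_2015Q4.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)
```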
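
Next, an illustrative version of the terminal-status extraction in `data_process.py`. The column names and the pipe-delimited, header-less layout are assumptions; the real performance files carry many more fields.

```python
# Illustrative terminal-status extraction (column names are assumptions).
import pandas as pd

cols = ["LOAN_ID", "REPORT_PERIOD", "STATUS"]  # assumed names
perf = pd.read_csv("Performance_2015Q4.txt", sep="|", header=None,
                   names=cols, usecols=[0, 1, 2])
perf["REPORT_PERIOD"] = pd.to_datetime(perf["REPORT_PERIOD"])

# Keep only each loan's last reported record; intermediate statuses are dropped.
terminal = (perf.sort_values(["LOAN_ID", "REPORT_PERIOD"])
                .drop_duplicates(subset="LOAN_ID", keep="last"))
terminal.to_csv("terminal_2015Q4.csv", index=False)
```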
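
The quarterly-to-yearly roll-up in `merged-quarter.py` / `merged-year.py` amounts to concatenation. A minimal sketch, with file names assumed to match the previous sketch's output:

```python
# Minimal sketch of the quarter -> year -> multi-year roll-up.
import glob

import pandas as pd

# One year: concatenate its quarterly terminal-status files.
year = 2015
quarterly = sorted(glob.glob(f"terminal_{year}Q*.csv"))
pd.concat(map(pd.read_csv, quarterly), ignore_index=True) \
  .to_csv(f"terminal_{year}.csv", index=False)

# Multi-year: concatenate the yearly files.
yearly = sorted(glob.glob("terminal_20??.csv"))
pd.concat(map(pd.read_csv, yearly), ignore_index=True) \
  .to_csv("terminal_all.csv", index=False)
```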
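
A simplified version of the comparison in `learning.py`: logistic regression versus an SGD-trained linear SVM, ranked by AUC-ROC. The `DEFAULT` label column and numeric feature matrix are placeholders; the real script builds its features from the origination data.

```python
# Simplified model comparison; DEFAULT and the numeric feature columns
# are assumed stand-ins for the real label/features.
import pandas as pd
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("terminal_all.csv")
X = data.drop(columns=["DEFAULT"]).to_numpy(dtype=float)  # assumes numeric features
y = data["DEFAULT"].to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
svm = SGDClassifier(loss="hinge", random_state=0).fit(X_tr, y_tr)  # linear SVM via SGD

# The hinge-loss SVM has no predict_proba, so rank by its decision function.
print("logit AUC:", roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]))
print("SGD-SVM AUC:", roc_auc_score(y_te, svm.decision_function(X_te)))
```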
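
Finally, `run.py` is a standard Flask entry point. The real app also loads the trained model and serves the demo pages; the skeleton below only shows the shape.

```python
# Bare-bones Flask entry point in the shape of run.py.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Smart Underwriter"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```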

## License

Copyright (C) Zhenqing Li. Licensed under GPL v3.
