Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 7a496a7, commit 6848a99. Showing 270 changed files with 4,239 additions and 28,184 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
.DS_Store | ||
.DS_Store | ||
*.csv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
*.xml | ||
*.db | ||
*.db-journal | ||
*.pyc | ||
*.pyo | ||
*.csv | ||
*.zip | ||
*.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# Craigslist Filter | ||
#### In Development... | ||
|
||
Craigslist Filter (working title) is a web application built with Python and Flask that scrapes vehicle listings from every Craigslist region in the United States and lets users filter them by criteria such as city, price, manufacturer, and odometer reading. The project is still in development. | ||
|
||
## Installing | ||
|
||
This application requires Python 3.6 or greater and several installable modules. | ||
|
||
You will need to install Flask, setuptools, pysqlite3, WTForms, lxml, requests_html, Flask-WTF, Flask-Bootstrap, and geopy: | ||
|
||
``` | ||
pip3 install Flask | ||
``` | ||
``` | ||
pip3 install setuptools | ||
``` | ||
``` | ||
pip3 install pysqlite3 | ||
``` | ||
``` | ||
pip3 install wtforms | ||
``` | ||
``` | ||
pip3 install lxml | ||
``` | ||
``` | ||
pip3 install requests_html | ||
``` | ||
``` | ||
pip3 install flask_wtf | ||
``` | ||
``` | ||
pip3 install flask_bootstrap | ||
``` | ||
``` | ||
pip3 install geopy | ||
``` | ||
|
||
## Deploying | ||
|
||
To run this application locally you will first need to run both crawlCities.py and scrapeVehicles.py (or download a cached version [here](https://files.fm/u/p5z4fbkn)) in order to generate the databases used by the application. | ||
|
||
Once these scripts have completed, run app.py and paste the address printed in the terminal into your browser. | ||
|
||
## Specific Future Implementations | ||
|
||
* Remain on the form page when a search yields no results. | ||
|
||
* Allow users to specify a search radius and return more specific results when searching by location. | ||
|
||
* Add Google maps API feature to allow users to browse sales in specific areas. | ||
|
||
## Broad Future Implementations | ||
|
||
* Integrate the visualization project with the filter so users can generate graphs to help narrow decisions when purchasing cars, e.g. a line graph of the average price of Ford pickups against odometer reading (this code has already been written with pandas; rewriting it with SQL will take some time). | ||
|
||
* Login/Logout functionality which allows users to save certain filter combinations and search results. | ||
|
||
* Better site layout, less bootstrap-esque and more creative. | ||
|
||
* Improved security. | ||
|
||
* Frequent automated database updates. | ||
|
||
* User-specific sale tracking (price has changed, listing has been removed, etc.). | ||
|
||
## Blocked | ||
|
||
* Pivot to multiprocessing so that many requests can be made at once, which should speed up the scraper considerably (I am worried about Craigslist blacklisting IPs). | ||
|
||
## Completed Tasks | ||
|
||
* Filter implemented. | ||
|
||
* Improved filter form including dropdown lists automatically generated by column entries in the database. | ||
|
||
* Scraped the map on the listing page to extract more specific location (lat/long) instead of just the region. | ||
|
||
* Added a message that alerts a user when their search yields no results. | ||
|
||
* Allowed for filtering between two values for fields such as price and odometer. | ||
|
||
* Added photos to results page. | ||
|
||
* Allowed users to search by any city using latitude and longitude instead of specific Craigslist regions. | ||
|
||
* Track which cities have been scraped recently to add order to the scraping process. | ||
|
||
## Contributors | ||
|
||
This application is being developed by Austin Reese. |
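The README lists a search radius as a planned feature and notes that listings already carry latitude and longitude. A minimal sketch of how such a filter might work is shown here, using only the standard library (the project installs geopy, which may well be the author's intended tool). The function names, the `lat`/`long` keys, and the mean Earth radius constant are assumptions for illustration, not part of the commit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(listings, center, radius_km):
    """Keep listings whose (lat, long) lies within radius_km of center.

    `listings` is assumed to be an iterable of dicts with "lat" and "long"
    keys; the real row shape in the project may differ.
    """
    lat0, lon0 = center
    return [row for row in listings
            if haversine_km(lat0, lon0, row["lat"], row["long"]) <= radius_km]
```

Filtering in Python like this is simple but scans every row; for a large table, a bounding-box pre-filter in SQL before the exact distance check would likely be worthwhile.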
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
from errHandle import errHandle | ||
|
||
errHandle() | ||
|
||
#this is where the magic happens, we currently have only one route but that will change | ||
|
||
from flask import Flask, render_template | ||
from flask_wtf import FlaskForm | ||
from flask_bootstrap import Bootstrap | ||
from wtforms import Form, BooleanField, StringField, IntegerField, SelectField, validators | ||
from wtforms.validators import Length, ValidationError, DataRequired | ||
from queryForm import queryForm | ||
from queryDropdowns import queryDropdowns | ||
from datetime import datetime | ||
|
||
app = Flask(__name__) | ||
app.config['SECRET_KEY'] = "CraigsistFilter" | ||
bootstrap = Bootstrap(app) | ||
|
||
|
||
class FilterForm(FlaskForm): | ||
#set up the form and grabbing dropdowns, a dictionary of unique values to populate select fields | ||
dropdowns = queryDropdowns() | ||
|
||
year = datetime.now().year | ||
|
||
city = StringField("City", validators = [Length(max=40)]) | ||
state = SelectField("State", choices = dropdowns["states"], validators = [validators.optional()]) | ||
manufacturer = SelectField("Manufacturer", choices = dropdowns["manufacturer"], validators = [validators.optional()]) | ||
make = StringField("Make", validators = [Length(max=40)]) | ||
condition = SelectField("Condition", choices = dropdowns["condition"], validators = [validators.optional()]) | ||
cylinders = SelectField("Cylinders", choices = dropdowns["cylinders"], validators = [validators.optional()]) | ||
fuel = SelectField("Fuel", choices = dropdowns["fuel"], validators = [validators.optional()]) | ||
transmission = SelectField("Transmission", choices = dropdowns["transmission"], validators = [validators.optional()]) | ||
titleStatus = SelectField("Title Status", choices = dropdowns["titleStatus"], validators = [validators.optional()]) | ||
vin = StringField("VIN", validators = [Length(max=40)]) | ||
drive = SelectField("Drive", choices = dropdowns["drive"], validators = [validators.optional()]) | ||
size = SelectField("Size", choices = dropdowns["size"], validators = [validators.optional()]) | ||
vehicleType = SelectField("Vehicle Type", choices = dropdowns["vehicleType"], validators = [validators.optional()]) | ||
paintColor = SelectField("Paint Color", choices = dropdowns["paintColor"], validators = [validators.optional()]) | ||
priceStart = IntegerField("Minimum Price", validators=[validators.optional(), validators.NumberRange(min=0, max=10000000, message="Please enter a value between 0 and 10,000,000")]) | ||
priceEnd = IntegerField("Maximum Price", validators=[validators.optional(), validators.NumberRange(min=0, max=10000000, message="Please enter a value between 0 and 10,000,000")]) | ||
yearStart = IntegerField("Minimum Year", validators=[validators.optional(), validators.NumberRange(min=1880, max=year + 1, message="Please enter a year between 1880 and {}".format(year + 1))]) | ||
yearEnd = IntegerField("Maximum Year", validators=[validators.optional(), validators.NumberRange(min=1880, max=year + 1, message="Please enter a year between 1880 and {}".format(year + 1))]) | ||
odometerStart = IntegerField("Minimum Odometer", validators=[validators.optional(), validators.NumberRange(min=0, max=10000000, message="Please enter a value between 0 and 10,000,000")]) | ||
odometerEnd = IntegerField("Maximum Odometer", validators=[validators.optional(), validators.NumberRange(min=0, max=10000000, message="Please enter a value between 0 and 10,000,000")]) | ||
|
||
|
||
@app.route('/', methods=['GET', 'POST']) | ||
def index(): | ||
#render index.html with form passed through as a variable | ||
form = FilterForm() | ||
#is_submitted() is true once the form has been POSTed; unlike validate_on_submit() it does not run the validators. we then render search.html with the data fetched from queryForm.py | ||
if form.is_submitted(): | ||
data = queryForm(form) | ||
return render_template("search.html", data = data) | ||
return render_template("index.html", form = form) | ||
|
||
if __name__ == '__main__': | ||
app.run(debug=True) |
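The form above collects optional minimum and maximum values for price, year, and odometer, and hands them to `queryForm`, which is not shown in this part of the diff. As a rough sketch of how optional range fields might be turned into a parameterized query, consider the following. The `price` and `odometer` column names, the `RANGE_FIELDS` mapping, and the function name are assumptions, not the author's implementation.

```python
import sqlite3

# Hypothetical mapping from column name to the form's (min, max) field names.
# "price" and "odometer" are assumed column names, not confirmed by the diff.
RANGE_FIELDS = {
    "price": ("priceStart", "priceEnd"),
    "year": ("yearStart", "yearEnd"),
    "odometer": ("odometerStart", "odometerEnd"),
}


def build_range_query(values):
    """Return (sql, params) covering only the range fields that were filled in."""
    clauses, params = [], []
    for column, (low_key, high_key) in RANGE_FIELDS.items():
        low, high = values.get(low_key), values.get(high_key)
        if low is not None:
            clauses.append("{} >= ?".format(column))
            params.append(low)
        if high is not None:
            clauses.append("{} <= ?".format(column))
            params.append(high)
    sql = "SELECT * FROM vehicles"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

The column names come from a fixed dictionary rather than user input, and the user-supplied numbers are passed as bound parameters, so nothing typed into the form is interpolated into the SQL text.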
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
#crawlCities grabs every city on Craigslist | ||
|
||
from lxml import html | ||
from datetime import datetime | ||
import requests | ||
import sqlite3 | ||
|
||
db = sqlite3.connect("cities.db") | ||
curs = db.cursor() | ||
curs.execute("DROP TABLE IF EXISTS cities") | ||
curs.execute("CREATE TABLE IF NOT EXISTS cities(cityId TEXT PRIMARY KEY, cityTitle TEXT)") | ||
|
||
s = requests.Session() | ||
|
||
def cityLooper(baseCase): | ||
start = datetime.now() | ||
try: | ||
origin = s.get("https://{}.craigslist.com".format(baseCase)) | ||
except requests.exceptions.RequestException: | ||
print("Could not reach {}.craigslist.com, is this link broken?".format(baseCase)) | ||
return None | ||
|
||
tree = (html.fromstring(origin.content)) | ||
#each city page on Craigslist has a list of recommended cities; we grab each recommended city from the current city | ||
#and store them in the cityQueue (which is a set, so we can't have duplicates) | ||
cityQueue = set(tree.xpath('//li[@class="s"]//a')) | ||
crawled = set() | ||
newEntry = True | ||
|
||
while len(cityQueue) != 0: | ||
city = cityQueue.pop() | ||
moreCities, crawled, updated = cityCrawler(city, crawled) | ||
if updated: | ||
cityQueue.update(moreCities) | ||
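#difference_update is meant to drop already-crawled entries; note that cityQueue holds link elements while crawled holds city-code strings, so already-crawled cities are really skipped by the check in cityCrawler | ||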
cityQueue.difference_update(crawled) | ||
print("Added {}. {} regions crawled through, {} regions in the queue.".format(city.text.title(), len(crawled), len(cityQueue))) | ||
db.commit() | ||
db.close() | ||
end = datetime.now() | ||
print("Program complete. Run time: {} seconds. File cities.db contains entries for {} regions on craigslist.com".format(int((end - start).total_seconds()), len(crawled))) | ||
|
||
def cityCrawler(city, crawled): | ||
cityCode = city.attrib["href"][2:city.attrib["href"].index(".")] | ||
|
||
if cityCode in crawled: | ||
#this means we've already checked it out, no need to execute anything | ||
return set(), crawled, False | ||
else: | ||
#otherwise put the city in the db and fetch the 'recommended cities' from the current target | ||
curs.execute("INSERT INTO cities(cityId, cityTitle) VALUES(?,?)", (cityCode, city.text)) | ||
|
||
try: | ||
newOrigin = s.get("https://{}.craigslist.com".format(cityCode)) | ||
except requests.exceptions.RequestException: | ||
print("Could not reach {}.craigslist.com, is this link broken?".format(cityCode)) | ||
return set(), crawled, False | ||
|
||
crawled.add(cityCode) | ||
tree = (html.fromstring(newOrigin.content)) | ||
newCities = set(tree.xpath('//li[@class="s"]//a')) | ||
#newCities is a set of the recommended cities featured on the current city | ||
return newCities, crawled, True | ||
|
||
|
||
def main(): | ||
cityLooper("kansascity") | ||
|
||
if __name__ == "__main__": | ||
main() |
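The crawler above keeps a queue of cities still to visit and a set of city codes already crawled, following each page's recommended-city links until the queue is empty. The same traversal can be sketched as a breadth-first walk over city codes. Here `fetch_neighbors` is a hypothetical caller-supplied function standing in for the HTTP request and XPath lookup, which keeps the sketch runnable without network access; unlike the script above, this version also counts the starting city as crawled.

```python
from collections import deque


def crawl_regions(start, fetch_neighbors):
    """Breadth-first walk over region codes.

    fetch_neighbors(code) is a hypothetical callable returning the region
    codes linked from that region's page.
    """
    crawled = set()
    queue = deque([start])
    while queue:
        code = queue.popleft()
        if code in crawled:
            continue  # already visited via another region's links
        crawled.add(code)
        for neighbor in fetch_neighbors(code):
            if neighbor not in crawled:
                queue.append(neighbor)
    return crawled
```

Queuing plain strings rather than link elements means set membership compares the codes themselves, so a region reachable from several pages is still fetched only once.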
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
#confirm the necessary tables exist to run this app | ||
|
||
import sqlite3 | ||
|
||
def errHandle(): | ||
try: | ||
db = sqlite3.connect("cities.db") | ||
curs = db.cursor() | ||
curs.execute("SELECT 1 FROM vehicles LIMIT 1") | ||
db.close() | ||
except sqlite3.Error: | ||
raise EnvironmentError("Please download cities.db from https://files.fm/u/yw247cuc and place it in the current directory") |
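The check above probes for the `vehicles` table by running a query and treating any failure as "table missing". An alternative sketch, assuming one would rather ask SQLite's schema table directly, is shown here; `table_exists` is a hypothetical helper name, not something defined in the commit.

```python
import sqlite3


def table_exists(db, table_name):
    """Return True if table_name is listed in SQLite's schema table."""
    row = db.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None
```

A caller could then raise the same `EnvironmentError` when `table_exists(db, "vehicles")` is false, without relying on an exception from a failed `SELECT`.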
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
#this is a work in progress that will grab unique column values to allow for dropdown menus instead of text boxes in the search form | ||
|
||
import sqlite3 | ||
|
||
def queryDropdowns(): | ||
dropdowns = {} | ||
db = sqlite3.connect("cities.db") | ||
curs = db.cursor() | ||
curs.execute("SELECT DISTINCT cylinders FROM vehicles") | ||
cylinders = curs.fetchall() | ||
curs.execute("SELECT DISTINCT fuel FROM vehicles") | ||
fuel = curs.fetchall() | ||
curs.execute("SELECT DISTINCT title_status FROM vehicles") | ||
titleStatus = curs.fetchall() | ||
curs.execute("SELECT DISTINCT drive FROM vehicles") | ||
drive = curs.fetchall() | ||
curs.execute("SELECT DISTINCT type FROM vehicles") | ||
vehicleType = curs.fetchall() | ||
curs.execute("SELECT DISTINCT paint_color FROM vehicles") | ||
paintColor = curs.fetchall() | ||
curs.execute("SELECT DISTINCT year FROM vehicles") | ||
year = curs.fetchall() | ||
curs.execute("SELECT DISTINCT manufacturer FROM vehicles") | ||
manufacturer = curs.fetchall() | ||
curs.execute("SELECT DISTINCT condition FROM vehicles") | ||
condition = curs.fetchall() | ||
curs.execute("SELECT DISTINCT size FROM vehicles") | ||
size = curs.fetchall() | ||
curs.execute("SELECT DISTINCT transmission FROM vehicles") | ||
transmission = curs.fetchall() | ||
db.close() | ||
transmissions = [] | ||
for item in transmission: | ||
item = item[0] | ||
if item is not None: | ||
transmissions.append((item, item)) | ||
transmissions.append(("", "")) | ||
transmissions.sort() | ||
dropdowns["transmission"] = transmissions | ||
sizes = [] | ||
for item in size: | ||
item = item[0] | ||
if item is not None: | ||
sizes.append((item, item)) | ||
sizes.append(("", "")) | ||
sizes.sort() | ||
dropdowns["size"] = sizes | ||
cyls = [] | ||
for item in cylinders: | ||
item = item[0] | ||
if item is not None: | ||
cyls.append((item, item)) | ||
cyls.append(("", "")) | ||
cyls.sort() | ||
dropdowns["cylinders"] = cyls | ||
fuels = [] | ||
for item in fuel: | ||
item = item[0] | ||
if item is not None: | ||
fuels.append((item, item)) | ||
fuels.append(("", "")) | ||
fuels.sort() | ||
dropdowns["fuel"] = fuels | ||
titleStatusList = [] | ||
for item in titleStatus: | ||
item = item[0] | ||
if item is not None: | ||
titleStatusList.append((item, item)) | ||
titleStatusList.append(("", "")) | ||
titleStatusList.sort() | ||
dropdowns["titleStatus"] = titleStatusList | ||
drives = [] | ||
for item in drive: | ||
item = item[0] | ||
if item is not None: | ||
drives.append((item, item)) | ||
drives.append(("", "")) | ||
drives.sort() | ||
dropdowns["drive"] = drives | ||
vehicleTypes = [] | ||
for item in vehicleType: | ||
item = item[0] | ||
if item is not None: | ||
vehicleTypes.append((item, item)) | ||
vehicleTypes.append(("", "")) | ||
vehicleTypes.sort() | ||
dropdowns["vehicleType"] = vehicleTypes | ||
paintColors = [] | ||
for item in paintColor: | ||
item = item[0] | ||
if item is not None: | ||
paintColors.append((item, item)) | ||
paintColors.append(("", "")) | ||
paintColors.sort() | ||
dropdowns["paintColor"] = paintColors | ||
manufacturers = [] | ||
for item in manufacturer: | ||
item = item[0] | ||
if item is not None: | ||
manufacturers.append((item, item)) | ||
manufacturers.append(("", "")) | ||
manufacturers.sort() | ||
refinedManufacturers = [] | ||
dropdowns["manufacturer"] = manufacturers | ||
years = [] | ||
for item in year: | ||
item = item[0] | ||
if item is not None: | ||
years.append((item, item)) | ||
years.sort() | ||
years = [("", "")] + years | ||
dropdowns["year"] = years | ||
conditions = [] | ||
for item in condition: | ||
item = item[0] | ||
if item is not None: | ||
conditions.append((item, item)) | ||
conditions.append(("", "")) | ||
conditions.sort() | ||
dropdowns["condition"] = conditions | ||
states = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", | ||
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", | ||
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", | ||
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", | ||
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"] | ||
stateTuples = [] | ||
stateTuples.append(("", "")) | ||
for item in states: | ||
stateTuples.append((item, item)) | ||
dropdowns["states"] = stateTuples | ||
|
||
return dropdowns | ||
queryDropdowns() | ||
|
||
|
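`queryDropdowns` repeats the same steps for each column: select the distinct values, drop `None`, add a blank choice, and sort. One way this could be folded into a single helper is sketched here. The helper name and the `ALLOWED_COLUMNS` set are assumptions; the column names themselves are taken from the queries in the diff above.

```python
import sqlite3

# Column names cannot be bound as SQL parameters, so restrict them to a
# known set before formatting them into the query text.
ALLOWED_COLUMNS = {
    "cylinders", "fuel", "title_status", "drive", "type", "paint_color",
    "manufacturer", "condition", "size", "transmission",
}


def distinct_choices(curs, column):
    """Return sorted (value, label) pairs for a column, with a blank choice."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: {!r}".format(column))
    curs.execute("SELECT DISTINCT {} FROM vehicles".format(column))
    values = [row[0] for row in curs.fetchall() if row[0] is not None]
    # The empty string sorts before any other text, so the blank choice
    # ends up first, as in the original per-column loops.
    return sorted([("", "")] + [(value, value) for value in values])
```

With a helper like this, each dropdown becomes a one-line call such as `dropdowns["fuel"] = distinct_choices(curs, "fuel")`. The `year` column is left out on purpose, since its values are presumably integers and cannot be sorted together with the blank string.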