This repository was archived by the owner on Sep 21, 2021. It is now read-only.

adding deliverable 0 #68

Open
wants to merge 4 commits into
base: master
Binary file added Civera/Civera_Revised_SOW.pdf
27 changes: 27 additions & 0 deletions Civera/Deliverables/deliverable_0.txt
@@ -0,0 +1,27 @@
Civera Software
Project Deliverable 0
Teams should have set up weekly meetings with their client for the remainder of the semester, reviewed the project scope, and submitted a pull request with the revised and final project description. Project descriptions should include the data sources that your team will collect (including any additional datasets you identify that you think would enhance the project), the specific questions that will be answered, and the step-by-step approach you will take for transforming the data (cleaning) and answering strategic questions.

Checklist
1. Reviewed all previous material.
Each team member has reviewed Mr. Friedman's specifications.
We had some difficulty accessing the SQL files, but Civera resolved this issue.
We have access to the datasets and the legacy code on GitHub, as well as write access to a database for our output.

2. Revised scope of the project if needed.
We’ve revised the SOW (attached) to address the client’s priorities.

3. Identify / list limitations with data and potential risks of achieving project goal.
Mr. Friedman has acknowledged that Civera's scraping of this data produces many duplicate and blank fields, which makes rigorous data cleaning a priority.
Two parts may be challenging: finding a good method for the cleaning (n-grams, spaCy), and building pipelines that distribute the data to each team member for parallel processing.
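
The cleaning and distribution steps named above could be sketched roughly as follows. This is a minimal illustration, not the team's pipeline: the field names `id` and `action`, the helper names `clean_rows` and `shard_rows`, and the choice to shard by id modulo the number of workers are all assumptions made for the example.

```python
from collections import defaultdict


def clean_rows(rows, key_fields):
    """Drop rows whose key fields are all blank, plus duplicates on those fields.

    `rows` is a list of dicts; `key_fields` names the (hypothetical) columns
    that decide whether two rows count as duplicates.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Treat None and whitespace-only strings as blank; compare case-insensitively.
        key = tuple(str(row.get(f) or "").strip().lower() for f in key_fields)
        if not any(key):
            continue  # every key field is blank
        if key in seen:
            continue  # duplicate of an earlier row
        seen.add(key)
        cleaned.append(row)
    return cleaned


def shard_rows(rows, n_workers, id_field="id"):
    """Split rows into n_workers lists by id modulo n_workers (assumes integer ids)."""
    shards = defaultdict(list)
    for row in rows:
        shards[row[id_field] % n_workers].append(row)
    return [shards[i] for i in range(n_workers)]
```

Sharding by id keeps the assignment deterministic, so a row always goes to the same team member on a re-run; other partitioning schemes (for example, contiguous id ranges) would work equally well.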

4. Meet with client to review the project.
Our team has met with Mr. Friedman several times to specify priorities.
A good working relationship has been established.

5. Schedule weekly meetings with PMs and bi-weekly with client.
We have regular meetings set as follows:
PM, Rishab Nayak: weekly, Thursday, 03:00 – 03:30 EST.
Client, Adam Friedman: biweekly, Thursday, 03:00 – 03:30 EST.

6. Submit a PR with the revised project proposal including list of limitations.
23 changes: 23 additions & 0 deletions Civera/Deliverables/deliverable_1.txt
@@ -0,0 +1,23 @@
Civera
Project Deliverable 1
Sufficient data should have been collected to perform a preliminary analysis of the data and attempt to answer one question relevant to your project proposal which you will submit as a pull request. If data has already been collected for your project you must answer two questions.

Checklist
1. Collect and pre-process a preliminary batch of data.
We have familiarized ourselves with the datasets.
The data is stored in four SQL tables with millions of rows and 5 to 13 fields per table.
There are many duplicates.
Many fields are missing.
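
One way to quantify the duplicates and missing fields noted above is a small profiling pass over a sample of rows. The sketch below is illustrative only: the function name `profile` and the example field names are assumptions, and it treats `None` and whitespace-only strings as missing.

```python
from collections import Counter


def profile(rows, fields):
    """Count exact duplicate rows and blank values per field.

    `rows` is a list of dicts; `fields` lists the (hypothetical) column names
    to compare and inspect.
    """
    blanks = Counter()
    row_counts = Counter()
    for row in rows:
        values = tuple(row.get(f) for f in fields)
        row_counts[values] += 1
        for field, value in zip(fields, values):
            if value is None or str(value).strip() == "":
                blanks[field] += 1
    # Each group of identical rows contributes (size - 1) duplicates.
    duplicates = sum(count - 1 for count in row_counts.values() if count > 1)
    return {
        "rows": len(rows),
        "duplicate_rows": duplicates,
        "blank_counts": dict(blanks),
    }
```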

2. Perform a preliminary analysis of the data.
After reviewing the data, we decided to:
Write scripts to pull rows from the case_action_index table incrementally.
Replace the client’s legacy “brute-force” PHP regex with spaCy.
Key the output on the same primary id so that it matches the source table one-to-one, where we’ll add the critical actor and actions fields.
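
The incremental pull could look something like the sketch below, which pages through the table by primary id (keyset pagination) instead of loading millions of rows at once. It uses Python's built-in `sqlite3` module purely as a stand-in, since the client's database engine is not stated here. The column names `id` and `action`, the function name `iter_batches`, and the assumption of positive integer ids are hypothetical.

```python
import sqlite3


def iter_batches(conn, batch_size=1000):
    """Yield lists of (id, action) rows from case_action_index, batch by batch.

    Assumes a positive integer primary key named `id` (hypothetical column name).
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, action FROM case_action_index "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        # Resume after the highest id seen so far.
        last_id = rows[-1][0]
```

Because each batch carries the source row's id, any output derived from it can be written back under the same id, which preserves the one-to-one match with the source table.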

3. Answer one key question.

4. Refine project scope and list of limitations with data and potential risks of achieving project goal.
Given the difficulty and importance of the tasks defined above, we met with the client and amended the SOW. The amendment reflects the garbage-in, garbage-out principle of data cleaning: any later analysis is only as reliable as the cleaned data it is built on.

5. Submit a PR with the above report and modifications to original proposal.
Binary file added Civera/SCRUM_3_5_2021.docx
180,480 changes: 0 additions & 180,480 deletions Datasets/State Data/All_House_Reps_Contributions 2010-2020.csv

This file was deleted.

35,036 changes: 0 additions & 35,036 deletions Police_Budget_Overtime_Project/data/Court Overtime 2014 - 2019 - 2014.csv

This file was deleted.

21,859 changes: 0 additions & 21,859 deletions Police_Budget_Overtime_Project/data/Employee Earnings Report 2011-2019.csv

This file was deleted.
