Anh vu-assignment 3 #428

aidanvu1992 · 2019-09-11T01:07:09Z

No description provided.

Assignment 1

llpk79 · 2019-09-11T02:12:16Z

Load a dataset from Github (via its RAW URL)

Nice job here. Not much to get excited about.

Load a dataset from your local machine

Using the colab library, nice.

Load a dataset from UCI using !wget
- Good job here as well. It would be interesting to see where you can take the poker data.

llpk79 · 2019-09-16T18:33:01Z

Sprint challenge code review:

Part 1 - Load and validate the data

Load the data as a pandas data frame.
- You've lost a row because the first row is read as the header by default.
- You've added an unnecessary step. df = pd.read_csv(cancer_survival_url, names=['list', 'of', 'columns']) is sufficient: passing names tells pandas the file has no header row, so the first line is kept as data.
Validate that it has the appropriate number of observations (you can check the raw file, and also read the dataset description from UCI).
- Use df.shape to view the number of rows in the dataframe. Run !cat <file.name> | wc -l to count the lines in the raw file, or check the dataset's UCI page to confirm the expected row count.
Validate that you have no missing values.
- Use df.isnull().sum() to see an easy-to-read count of null values per column.
Add informative names to the features.
- Complete.
The survival variable is encoded as 1 for surviving >5 years and 2 for not - change this to be 0 for not surviving and 1 for surviving >5 years (0/1 is a more traditional encoding of binary variables)
- Nicely done.
At the end, print the first five rows of the dataset to demonstrate the above.
- Complete.
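
To make the Part 1 notes concrete, here is a minimal sketch of loading a headerless file, validating it, and recoding the survival column. It reads from an in-memory string instead of the real URL so it runs anywhere; the sample rows are illustrative only, and the column names are hypothetical choices, not required ones.

```python
import io
import pandas as pd

# A few rows in the same headerless, comma-separated layout as the UCI file.
# Illustrative sample only; swap in the real URL or file path.
raw = io.StringIO(
    "30,64,1,1\n"
    "30,62,3,1\n"
    "34,59,0,2\n"
    "38,69,21,2\n"
    "42,58,0,1\n"
)

# Hypothetical column names; pick whatever is informative.
columns = ["age", "year_of_operation", "nodes", "survived"]

# Passing names= means pandas treats the file as having no header row,
# so the first line is kept as data instead of becoming column labels.
df = pd.read_csv(raw, names=columns)

# Row count should match the line count of the raw file.
print(df.shape)

# Count of missing values per column.
print(df.isnull().sum())

# Recode survival: 1 (survived) stays 1, 2 (did not survive) becomes 0.
df["survived"] = df["survived"].map({1: 1, 2: 0})

print(df.head())
```

With the real file, compare the first element of df.shape against the wc -l count to confirm no row was consumed as a header.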

Part 2 - Examine the distribution and relationships of the features

Explore the data - create at least 2 tables (can be summary statistics or crosstabulations) and 2 plots illustrating the nature of the data.
- Good job exploring several cross-tabs. Consider binning the 'Number of positive axillary nodes detected' column as well.
- Check out pd.qcut()
- Try bar graphs instead of line charts when comparing discrete values. Line graphs are usually reserved for time series.
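
A rough sketch of the binning idea, using made-up rows and hypothetical column names (nodes, survived). It bins the node count with pd.qcut and cross-tabs the bins against survival.

```python
import pandas as pd

# Made-up rows for illustration; column names are hypothetical.
df = pd.DataFrame({
    "nodes":    [0, 0, 0, 1, 1, 2, 3, 4, 7, 9, 13, 22],
    "survived": [1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0],
})

# Split nodes into up to 3 roughly equal-sized groups. duplicates="drop"
# guards against repeated bin edges, which can happen when many rows
# share the same value (for example, lots of zeros).
df["nodes_bin"] = pd.qcut(df["nodes"], q=3, duplicates="drop")

# Share of each survival outcome within every node bin (each row sums to 1).
table = pd.crosstab(df["nodes_bin"], df["survived"], normalize="index")
print(table)
```

Calling table.plot(kind="bar") on the result draws one group of bars per bin, which suits discrete categories better than a line chart.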

Part 3 - DataFrame Filtering

Use DataFrame filtering to subset the data into two smaller dataframes. You should make one dataframe for individuals who survived >5 years and a second dataframe for individuals who did not.
- It would be more informative to cross-tab age vs. 'nodes' or year vs. 'nodes' rather than with survival, because within each subset we already know every row is either 'survived' or 'not_survived'.
Create a graph with each of the dataframes (can be the same graph type) to show the differences in Age and Number of Positive Axillary Nodes Detected between the two groups.
- Try bar graphs and heat-maps with cross-tabs as above. Try plotting on the same bar graph with similar cross-tabs for both 'survived' and 'not_survived' populations.
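
One possible way to set this up, sketched with made-up data and hypothetical column names. The bin edges are assumptions chosen only so that both subsets share the same buckets.

```python
import pandas as pd

# Made-up rows for illustration; column names are hypothetical.
df = pd.DataFrame({
    "age":      [31, 35, 42, 47, 52, 55, 61, 63, 66, 70],
    "nodes":    [0, 2, 0, 11, 1, 8, 0, 15, 3, 20],
    "survived": [1, 1, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Boolean masks split the data into the two populations.
survived = df[df["survived"] == 1]
not_survived = df[df["survived"] == 0]

# Shared, explicit bin edges so both groups land in comparable buckets.
age_edges = [30, 50, 80]
node_edges = [-1, 0, 5, 60]


def age_by_nodes(frame):
    """Cross-tab binned age against binned node count for one subset."""
    return pd.crosstab(
        pd.cut(frame["age"], bins=age_edges),
        pd.cut(frame["nodes"], bins=node_edges),
    )


print(age_by_nodes(survived))
print(age_by_nodes(not_survived))

# Mean node count per age bin, with one column per survival group.
age_bin = pd.cut(df["age"], bins=age_edges)
comparison = (
    df.groupby([age_bin, "survived"], observed=True)["nodes"].mean().unstack()
)
print(comparison)
```

Because comparison has one column per group, comparison.plot(kind="bar") would put both populations on the same bar graph.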

Part 4 - Analysis and Interpretation

What is at least one feature that looks to have a positive relationship with survival? (As that feature goes up in value rate of survival increases)
- 👎
- year_of_operation and survival have a positive relationship.
What is at least one feature that looks to have a negative relationship with survival? (As that feature goes down in value rate of survival increases)
- 👍
- Age and number of nodes are each negatively correlated with survival.
How are those two features related with each other, and what might that mean?
- 👍
- Age and year_of_operation are positively correlated.
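
A quick way to check the direction of these relationships is a correlation matrix. The sketch below uses made-up rows and hypothetical column names, so the numbers it prints say nothing about the real dataset; run the same calls on the actual dataframe to check the signs.

```python
import pandas as pd

# Made-up rows for illustration; column names are hypothetical.
df = pd.DataFrame({
    "age":               [31, 35, 42, 47, 52, 55, 61, 63, 66, 70],
    "year_of_operation": [58, 60, 61, 59, 64, 62, 66, 63, 68, 65],
    "nodes":             [0, 2, 0, 11, 1, 8, 0, 15, 3, 20],
    "survived":          [1, 1, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Correlation of each feature with survival, most negative first.
print(corr["survived"].drop("survived").sort_values())

# Correlation between two features, e.g. age and year_of_operation.
print(corr.loc["age", "year_of_operation"])
```

A positive value means the two columns tend to rise together, and a negative value means one tends to fall as the other rises.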

Not bad, Anh. I'm going to give this a 2. Do go back and review some of this material, as we will continue to build on it toward rapidly more complex applications.

alex000kim and others added 6 commits September 3, 2019 22:20

Created using Colaboratory

ead06aa

Created using Colaboratory

eb33944

Created using Colaboratory

4964c83

Assignment complete

cdc35dd

Merge pull request #1 from aidanvu1992/master

109d5cc

Assignment 1

Created using Colaboratory

617e8a6

Created using Colaboratory

551e925
