NBM materialize the dev on NBM #13

Merged
merged 9 commits into from
Oct 4, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ docs
data-raw/fcc_staff.R
data_swamp/
*.parquet
inst/doc
7 changes: 5 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: cori.data.fcc
Title: Process FCC data
Version: 0.0.1
Version: 0.1.0
Authors@R:
person(given="Olivier", family="Leroy", email="[email protected]", role = c("aut", "cre"))
Description: Functions to get and process FCC data.
@@ -15,7 +15,9 @@ Suggests:
testthat (>= 3.0.0),
pkgdown,
dplyr,
DT
DT,
knitr,
rmarkdown
Config/testthat/edition: 3
Imports:
curl,
@@ -26,3 +28,4 @@ Imports:
stringi
URL: https://ruralinnovation.github.io/cori.data.fcc/
Config/Needs/website: rmarkdown
VignetteBuilder: knitr
10 changes: 8 additions & 2 deletions NEWS.md
@@ -1,4 +1,4 @@
# cori.data.fcc (development version)
# cori.data.fcc 0.1.0

## Major Changes

@@ -12,11 +12,17 @@

* `get_frn_nbm_bl()` allows you to get all block where this FRN reported had services (minus satellite BSL and 0/0 speeds services)

* `get_nbm_bl()`allows you to get all block from one county

* `get_county_nbm_raws()` allows you to get raws NBM data for a specific county and for a release, by default the last one.

### Updated functions

* update to `get_fcc_dictionary.R` description for new data set and their fields
* update to `get_fcc_dictionary.R` description for new data set ("nbm_block", "nbm_raw") and their fields

### Removed functions

* `fcc_to_parquet()` not needed and/or too opinionated to be useful

# cori.data.fcc 0.0.1

6 changes: 3 additions & 3 deletions R/dl_nbm.R
@@ -1,7 +1,7 @@
#' Download NBM data
#'
#' Just a draft of a function that download all NBM data related to CORI works
#' It motsly works in my setup (download in ~/data_swamp)
#' Function that download all NBM data related to CORI works.
#' It takes a path to download the zipped csv.
#'
#' @param path_to_dl a string by default "~/data_swamp"
#' @param release_date a string can be "December 31, 2023" or "June 30, 2023"
@@ -10,7 +10,7 @@
#' @param user_agent a string set up by default
#' @param ... additional parameters for download.file()
#'
#' @return A lot of zipped file
#' @return Zipped csv
#' @export
#'
#' @examples
2 changes: 1 addition & 1 deletion R/get_county_nbm_raw.R
@@ -22,7 +22,7 @@
#'
#'@examples
#'\dontrun{
#'guilford_cty <- get_nbm_raw(geoid_co = "37081")
#'guilford_cty <- get_county_nbm_raw(geoid_co = "37081")
#'}

get_county_nbm_raw <- function(geoid_co, frn = "all", release = "2023-12-01") {
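Note on `geoid_co`: the corrected example passes the county as a five-character string ("37081"). The sketch below shows how that county code relates to the 15-character block identifiers (`geoid_bl`) that appear in the README output. `county_from_block()` is a hypothetical helper written for illustration; it is not part of cori.data.fcc.

```r
# Hypothetical helper, not part of cori.data.fcc: derive the county GEOID
# from a 15-character census block GEOID.
county_from_block <- function(geoid_bl) {
  geoid_bl <- as.character(geoid_bl)
  # A block GEOID is state (2) + county (3) + tract (6) + block (4) digits
  stopifnot(all(grepl("^[0-9]{15}$", geoid_bl)))
  # The first five characters are the state + county code
  substr(geoid_bl, 1, 5)
}

# Block GEOIDs taken from the README output in this PR
county_from_block(c("370810161022008", "370810168003003"))
```

Keeping the identifiers as character strings matters here: codes for states such as Alabama ("01") would lose their leading zero if stored as numbers.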
15 changes: 11 additions & 4 deletions R/get_fcc_dictionary.R
@@ -1,6 +1,8 @@
#' Display FCC variable and descriptions
#' Display FCC variables and associated descriptions
#'
#' @param dataset a string matching a dataset
#' Return dictionary for a all datasets. Available dataset are "f477", "nbm_raw" and "nbm_block".
#'
#' @param dataset a string matching a dataset, default is "all"
#'
#' @return a data frame
#'
@@ -9,7 +11,12 @@
#' @examples
#' get_fcc_dictionary("nbm_block")

get_fcc_dictionary <- function(dataset) {
get_fcc_dictionary <- function(dataset = "all") {
dict <- cori.data.fcc::fcc_dictionary
dict[dict[["dataset"]] == dataset, ]
if (dataset == "all") {
return(dict)
} else {
filter <- dict[dict[["dataset"]] == dataset, ]
return(filter)
}
}
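Note on the new `dataset = "all"` default: the behaviour can be tried without the packaged data. The sketch below is an illustration only. `toy_dict` is a made-up stand-in for `cori.data.fcc::fcc_dictionary`, and `get_dictionary_sketch()` is a hypothetical name. The check for unknown dataset names is an assumption added here, not something this PR implements.

```r
# Made-up stand-in for cori.data.fcc::fcc_dictionary (illustrative rows only)
toy_dict <- data.frame(
  dataset  = c("f477", "f477", "nbm_raw", "nbm_block"),
  var_name = c("FRN", "BlockCode", "location_id", "geoid_bl"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of get_fcc_dictionary() working on the toy data
get_dictionary_sketch <- function(dataset = "all", dict = toy_dict) {
  valid <- c("all", unique(dict[["dataset"]]))
  # Assumption: fail early instead of silently returning zero rows
  if (!dataset %in% valid) {
    stop("dataset must be one of: ", paste(valid, collapse = ", "))
  }
  if (dataset == "all") {
    return(dict)
  }
  # drop = FALSE keeps a data frame even if only one column remains
  dict[dict[["dataset"]] == dataset, , drop = FALSE]
}

nrow(get_dictionary_sketch())        # every row of the toy dictionary
nrow(get_dictionary_sketch("f477"))  # only the rows for one dataset
```

Without such a check, a misspelled dataset name yields an empty data frame, which is easy to miss in interactive use.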
5 changes: 2 additions & 3 deletions R/get_frn_nbm_bl.R
@@ -1,12 +1,11 @@
#' Load part of NBM at Census Block from CORI s3 bucket
#'
#' Get all the data related to a FRN.
#' Get all the data related to a FRN.
#' A row in this data represent a census block (2020 vintage).
#'
#'
#' IMPORTANT: We are not counting blocks:
#' * when covered only by satellite servives
#' * and discarding a location when a service of 0/0 download/uploads speeds.
#' * and discarding a location when a service of 0/0 download/uploads speeds.
#'
#' Use `get_fcc_dictionary("nbm_block")` to get a description of the date.
#' A FRN is a 10 number strings, ie "0007435902" can also be used to be more specific.
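Note on the FRN format: the documentation describes an FRN as a 10-digit string such as "0007435902". A value read as a number loses its leading zeros, so a small guard can help before calling the function. `normalize_frn()` below is a hypothetical helper written for illustration; it is not part of cori.data.fcc.

```r
# Hypothetical helper, not part of cori.data.fcc: return FRNs as
# 10-character, zero-padded strings, or fail if that is not possible.
normalize_frn <- function(frn) {
  if (is.numeric(frn)) {
    # "%010.0f" pads with leading zeros and also handles values
    # larger than the 32-bit integer range
    frn <- sprintf("%010.0f", frn)
  }
  frn <- as.character(frn)
  if (!all(grepl("^[0-9]{10}$", frn))) {
    stop("an FRN must be a string of exactly 10 digits")
  }
  frn
}

normalize_frn(7435902)       # numeric input is padded back to 10 digits
normalize_frn("0027136753")  # well-formed input is returned unchanged
```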
4 changes: 2 additions & 2 deletions R/get_nbm_available.R
@@ -1,6 +1,6 @@
#' Get release available in FCC NBM
#' Get a list of files availables in FCC servers
#'
#' NBM's API:
#' NBM's API:
#' ```
#' paste0("https://broadbandmap.fcc.gov/nbm/",
#' "map/api/national_map_process/nbm_get_data_download/")
1 change: 0 additions & 1 deletion R/get_nbm_bl.R
@@ -2,7 +2,6 @@
#'
#' Get all the data related to a states or county.
#'
#'
#' A row in this data represent a census block (2020 vintage).
#' Use `get_fcc_dictionary("nbm_block")` to get a description of the date.
#'
2 changes: 1 addition & 1 deletion R/get_nbm_release.R
@@ -1,4 +1,4 @@
#' Get release available in FCC NBM
#' Get a list of release available in FCC NBM
#'
#' @param filing_url a string providing NBM filing API. Default is "https://broadbandmap.fcc.gov/nbm/map/api/published/filing"
#' @param user_agent set a default user agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:128.0) Gecko/20100101 Firefox/128.0"
60 changes: 40 additions & 20 deletions README.Rmd
@@ -38,34 +38,54 @@ You can install the development version of cori.data.fcc from [GitHub](https://g
devtools::install_github("ruralinnovation/cori.data.fcc")
```

## Example
## Examples

This is a basic example which shows some basic workflow:

```{r example-release}
```{r load}
library(cori.data.fcc)
```

### National Broadband Map

- The package is providing you a way to download zipped `csv` see the vignette "Check and download NBM data"

- Access a parquet files stored in CORI s3 bucket per county:

```{r nbm_raw}
guilford_cty <- get_county_nbm_raw(geoid_co = "37081")
head(guilford_cty)
```

- Use the CORI opinionated version at the Census block level for the **last NBM's release**:

```{r nbm_block}
# get a county
nbm_bl <- get_nbm_bl(geoid_co = "47051")
dim(nbm_bl)

# get census block covered by an ISP identified by their FRN
skymesh <- get_frn_nbm_bl("0027136753")
dim(skymesh)

release <- get_nbm_release() # get the available releases
release
```

You can also inspect what is available:
### Form 477

Sadly automating the download of some of the source data is harder for Form 477.
We are not providing that functionality.

You can get all data (multiple years) covering a State from Form 477:

```{r get_f477_example}
f477_vt <- get_f477("VT")
head(f477_vt)
```

```{r example-available}
nbm <- get_nbm_available() # get what data is available
# if we are intrested in "Fixed Broadband" / "Nationwide" / released "June 30, 2023"
nbm_filter <- nbm[which(nbm$release == "June 30, 2023" &
nbm$data_type == "Fixed Broadband" &
nbm$data_category == "Nationwide"), ]
rownames(nbm_filter) <- NULL
### Utilities

Getting the dictionnary for each dataset:

# or
nbm_dplyr_filter <- nbm |> dplyr::filter(release == "June 30, 2023" &
data_type == "Fixed Broadband" &
data_category == "Nationwide")
all.equal(nbm_filter, nbm_dplyr_filter)
head(nbm_filter)
```{r get_fcc_dictionary_ex}
head(get_fcc_dictionary())
```


163 changes: 113 additions & 50 deletions README.md
@@ -31,64 +31,127 @@ You can install the development version of cori.data.fcc from
devtools::install_github("ruralinnovation/cori.data.fcc")
```

## Example

This is a basic example which shows some basic workflow:
## Examples

``` r
library(cori.data.fcc)
```

### National Broadband Map

- The package is providing you a way to download zipped `csv` see the
vignette “Check and download NBM data”

- Access a parquet files stored in CORI s3 bucket per county:

release <- get_nbm_release() # get the available releases
release
#> filing_type_id filing_type filing_subtype
#> 1 100006 Biannual December 31, 2022
#> 2 100000 Biannual June 30, 2022
#> 3 100007 Biannual June 30, 2023
#> 4 100011 Biannual December 31, 2023
#> process_uuid enable_bfm_link
#> 1 bbfba324-616d-4247-ab49-933fdd97ff12 TRUE
#> 2 7b81911a-c0cb-4be6-8e6c-63a32e8bf917 TRUE
#> 3 59f1d8d7-e532-468a-b68f-826c7945f918 TRUE
#> 4 dc5111bf-7169-40bf-bb23-f0827250cc04 TRUE
#> enable_challenge_download
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 TRUE
``` r
guilford_cty <- get_county_nbm_raw(geoid_co = "37081")
head(guilford_cty)
#> frn provider_id brand_name location_id technology
#> 1 0001857952 130077 AT&T 1344960789 10
#> 2 0001857952 130077 AT&T 1344965855 10
#> 3 0001857952 130077 AT&T 1344971572 10
#> 4 0001857952 130077 AT&T 1344982708 10
#> 5 0001857952 130077 AT&T 1344991329 10
#> 6 0001857952 130077 AT&T 1344996969 10
#> max_advertised_download_speed max_advertised_upload_speed low_latency
#> 1 10 1 TRUE
#> 2 0 0 TRUE
#> 3 10 1 TRUE
#> 4 50 10 TRUE
#> 5 50 10 TRUE
#> 6 75 20 TRUE
#> business_residential_code state_usps geoid_bl geoid_co file_time_stamp
#> 1 X NC 370810161022008 37081 2024-09-03
#> 2 X NC 370810168003003 37081 2024-09-03
#> 3 X NC 370810125051020 37081 2024-09-03
#> 4 X NC 370810171011021 37081 2024-09-03
#> 5 X NC 370810157042006 37081 2024-09-03
#> 6 X NC 370810127052022 37081 2024-09-03
#> release
#> 1 2023-12-01
#> 2 2023-12-01
#> 3 2023-12-01
#> 4 2023-12-01
#> 5 2023-12-01
#> 6 2023-12-01
```

You can also inspect what is available:
- Use the CORI opinionated version at the Census block level for the
**last NBM’s release**:

``` r
# get a county
nbm_bl <- get_nbm_bl(geoid_co = "47051")
dim(nbm_bl)
#> [1] 2146 21

# get census block covered by an ISP identified by their FRN
skymesh <- get_frn_nbm_bl("0027136753")
dim(skymesh)
#> [1] 3 21
```

### Form 477

Sadly automating the download of some of the source data is harder for
Form 477. We are not providing that functionality.

You can get all data (multiple years) covering a State from Form 477:

``` r
f477_vt <- get_f477("VT")
head(f477_vt)
#> Provider_Id FRN ProviderName DBAName
#> 1 9395 0021002092 Stowe Cablevision, Inc. Stowe Access, LLC
#> 2 9395 0021002092 Stowe Cablevision, Inc. Stowe Access, LLC
#> 3 9395 0021002092 Stowe Cablevision, Inc. Stowe Access, LLC
#> 4 9395 0021002092 Stowe Cablevision, Inc. Stowe Access, LLC
#> 5 9395 0021002092 Stowe Cablevision, Inc. Stowe Access, LLC
#> 6 9395 0021002092 Stowe Cablevision, Inc. Stowe Access, LLC
#> HoldingCompanyName HocoNum HocoFinal StateAbbr
#> 1 Stowe Cablevision, Inc. 240090 Stowe Cablevision, Inc. VT
#> 2 Stowe Cablevision, Inc. 240090 Stowe Cablevision, Inc. VT
#> 3 Stowe Cablevision, Inc. 240090 Stowe Cablevision, Inc. VT
#> 4 Stowe Cablevision, Inc. 240090 Stowe Cablevision, Inc. VT
#> 5 Stowe Cablevision, Inc. 240090 Stowe Cablevision, Inc. VT
#> 6 Stowe Cablevision, Inc. 240090 Stowe Cablevision, Inc. VT
#> BlockCode TechCode Consumer MaxAdDown MaxAdUp Business Date
#> 1 500159531001026 42 TRUE 25 5 TRUE 2014-12-01
#> 2 500159531001026 41 TRUE 25 5 TRUE 2014-12-01
#> 3 500159531001026 50 FALSE 0 0 TRUE 2014-12-01
#> 4 500159531001027 42 TRUE 25 5 TRUE 2014-12-01
#> 5 500159531001027 41 TRUE 25 5 TRUE 2014-12-01
#> 6 500159531001027 50 FALSE 0 0 TRUE 2014-12-01
```

### Utilities

Getting the dictionnary for each dataset:

``` r
nbm <- get_nbm_available() # get what data is available
# if we are intrested in "Fixed Broadband" / "Nationwide" / released "June 30, 2023"
nbm_filter <- nbm[which(nbm$release == "June 30, 2023" &
nbm$data_type == "Fixed Broadband" &
nbm$data_category == "Nationwide"), ]
rownames(nbm_filter) <- NULL


# or
nbm_dplyr_filter <- nbm |> dplyr::filter(release == "June 30, 2023" &
data_type == "Fixed Broadband" &
data_category == "Nationwide")
all.equal(nbm_filter, nbm_dplyr_filter)
#> [1] TRUE
head(nbm_filter)
#> id release data_type technology_code state_fips provider_id
#> 1 689598 June 30, 2023 Fixed Broadband 0 01 <NA>
#> 2 689599 June 30, 2023 Fixed Broadband 0 04 <NA>
#> 3 689600 June 30, 2023 Fixed Broadband 0 06 <NA>
#> 4 689601 June 30, 2023 Fixed Broadband 0 12 <NA>
#> 5 689602 June 30, 2023 Fixed Broadband 0 17 <NA>
#> 6 689603 June 30, 2023 Fixed Broadband 0 18 <NA>
#> file_name file_type data_category
#> 1 bdc_01_Other_fixed_broadband_J23_01sep2024 csv Nationwide
#> 2 bdc_04_Other_fixed_broadband_J23_01sep2024 csv Nationwide
#> 3 bdc_06_Other_fixed_broadband_J23_01sep2024 csv Nationwide
#> 4 bdc_12_Other_fixed_broadband_J23_01sep2024 csv Nationwide
#> 5 bdc_17_Other_fixed_broadband_J23_01sep2024 csv Nationwide
#> 6 bdc_18_Other_fixed_broadband_J23_01sep2024 csv Nationwide
head(get_fcc_dictionary())
#> dataset var_name var_type
#> 1 f477 Provider_Id TEXT
#> 2 f477 FRN TEXT
#> 3 f477 ProviderName VARCHAR
#> 4 f477 DBAName VARCHAR
#> 5 f477 HoldingCompanyName VARCHAR
#> 6 f477 HocoNum TEXT
#> var_description
#> 1 filing number (assigned by FCC)
#> 2 FCC registration number
#> 3 Provider name
#> 4 'Doing business as' name
#> 5 Holding company name (as filed on Form 477)
#> 6 Holding company number (assigned by FCC)
#> var_example
#> 1 8026
#> 2 0001570936
#> 3 Arctic Slope Telephone Association Cooperative, Inc.
#> 4 ASTAC
#> 5 Arctic Slope Telephone Association Cooperative, Inc.
#> 6 130067
```

The package also provide the list of Provider ID and FRN