2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: cfbfastR
Title: Access College Football Play by Play Data
-Version: 2.0.0
+Version: 2.1.0
Authors@R: c(
person("Saiem", "Gilani", , "[email protected]", role = c("cre", "aut")),
person("Akshay", "Easwaran", , "[email protected]", role = "aut"),
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
# **cfbfastR v2.1.0**

* Fixes a bug in `cfbd_pbp_data()` where play-by-play data for some games were not as expected.
* Improves `add_yardage()` where plays with missing yardage values were not being handled correctly.


# **cfbfastR v2.0.0**
### Breaking Changes to Loading Functions

13 changes: 10 additions & 3 deletions R/cfbd_pbp_data.R
@@ -586,8 +586,15 @@ cfbd_pbp_data <- function(year,
play_df <- purrr::map_dfr(
g_ids,
function(x){
-play_df <- play_df %>%
-dplyr::filter(.data$game_id == x) %>%
+# Note: this should be changed to a complete data validation test in the future
+# filter out games with less than 10 plays to avoid issues with EPA/WPA models
+game_plays <- play_df %>%
+dplyr::filter(.data$game_id == x)
+if (nrow(game_plays) < 20) {
+cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays"))
+return(NULL)
+}
Comment on lines +589 to +596
Contributor

@coderabbitai coderabbitai bot Oct 13, 2025

⚠️ Potential issue | 🟡 Minor

Fix comment inconsistency with validation threshold.

Line 590 states "less than 10 plays" but line 593 checks for < 20. Update the comment to match the code.

Apply this diff:

-        # filter out games with less than 10 plays to avoid issues with EPA/WPA models
+        # filter out games with less than 20 plays to avoid issues with EPA/WPA models

Consider extracting the magic number.

The threshold of 20 plays is hardcoded. While the comment acknowledges this should be improved with a complete validation test, consider extracting it as a named constant for maintainability.

Example:

MIN_PLAYS_FOR_EPA_WPA <- 20

Then use:

if (nrow(game_plays) < MIN_PLAYS_FOR_EPA_WPA) {
  cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays"))
  return(NULL)
}
🤖 Prompt for AI Agents
In R/cfbd_pbp_data.R around lines 589 to 596, the inline comment incorrectly
says "less than 10 plays" while the code checks for fewer than 20 plays; update
the comment to reflect the actual threshold (20) and replace the hardcoded magic
number by introducing a named constant (e.g., MIN_PLAYS_FOR_EPA_WPA <- 20) at
top of the function or file, then use that constant in the if check and the
alert message to improve clarity and maintainability.

Collaborator Author

^ this is fair -- @saiemgilani was this minimum supposed to be 20 or 10?


+game_plays <- game_plays %>%
penalty_detection() %>%
add_play_counts() %>%
clean_pbp_dat() %>%
@@ -599,7 +606,7 @@ cfbd_pbp_data <- function(year,
# create_wpa_betting() %>%
create_wpa_naive(wp_model = wp_model)
p(sprintf("x=%s", as.integer(x)))
-return(play_df)
+return(game_plays)
}, ...)
# } else{
# play_df <- purrr::map_dfr(
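
The guard added in this hunk returns `NULL` for short games so that the row-binding step drops them from the combined result. A minimal base R sketch of that pattern follows. It assumes a hypothetical `process_games()` wrapper and borrows the `MIN_PLAYS_FOR_EPA_WPA` name suggested in the review comment; neither exists in cfbfastR. The threshold of 20 mirrors the code in the diff, not the inline comment, and the EPA/WPA pipeline is replaced by a pass-through.

```r
# Hypothetical constant name taken from the review suggestion (not in cfbfastR).
MIN_PLAYS_FOR_EPA_WPA <- 20

# Hypothetical stand-in for the per-game loop in cfbd_pbp_data().
process_games <- function(play_df, min_plays = MIN_PLAYS_FOR_EPA_WPA) {
  per_game <- lapply(unique(play_df$game_id), function(x) {
    game_plays <- play_df[play_df$game_id == x, , drop = FALSE]
    if (nrow(game_plays) < min_plays) {
      message(sprintf("Skipping game_id %s with only %d plays", x, nrow(game_plays)))
      return(NULL)
    }
    # The real function would run the EPA/WPA pipeline here.
    game_plays
  })
  # Drop skipped games explicitly before binding the rest together.
  kept <- Filter(Negate(is.null), per_game)
  do.call(rbind, kept)
}

# Made-up data: game 1 has 25 plays, game 2 has only 5.
plays <- data.frame(game_id = rep(c(1, 2), c(25, 5)), play = seq_len(30))
result <- process_games(plays)
```

In the diff itself, `purrr::map_dfr()` does the binding and skips `NULL` results, so no explicit filter is needed there. Passing the threshold as an argument keeps the 10-versus-20 question in one place once it is settled.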
2 changes: 2 additions & 0 deletions R/cfbd_stats.R
@@ -266,6 +266,8 @@ cfbd_stats_game_advanced <- function(year,
colnames(df) <- gsub("_Start", "_start", colnames(df))
colnames(df) <- gsub(".db", "_db", colnames(df))
colnames(df) <- gsub("Id", "_id", colnames(df))
colnames(df) <- gsub("seasonType", "season_type", colnames(df))



df <- df %>%
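
The added `gsub()` line extends a chain that converts API column names to snake_case. A small base R sketch of how such a chain behaves is below; the column names are made up for illustration and are not the actual API response. It also shows one thing worth keeping in mind about the existing `".db"` pattern: without `fixed = TRUE`, the dot is a regex wildcard.

```r
# Hypothetical column names, for illustration only.
cols <- c("gameId", "seasonType", "team")

cols <- gsub("Id", "_id", cols)
cols <- gsub("seasonType", "season_type", cols, fixed = TRUE)

# "." matches any character in a regex, so ".db" can hit unintended names.
regex_result   <- gsub(".db", "_db", "oddball_rate")
literal_result <- gsub(".db", "_db", "oddball_rate", fixed = TRUE)
```

Here `cols` ends up as `"game_id"`, `"season_type"`, `"team"`; `regex_result` is `"o_dball_rate"`, while `literal_result` leaves the name unchanged. Whether any real column name triggers the wildcard case is not something the diff shows.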
38 changes: 38 additions & 0 deletions R/helper_pbp_add_yardage.R
@@ -85,12 +85,24 @@ add_yardage <- function(play_df) {
-1 * as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for a loss of)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("pass to", ignore_case = TRUE)) &
stringr::str_detect(.data$play_text, regex("for a loss of", ignore_case = TRUE)) ~
-1 * as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for a loss of)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("pass complete to", ignore_case = TRUE)) &
stringr::str_detect(.data$play_text, regex(" for \\d+ y\\w*ds?", ignore_case = TRUE)) ~
as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("pass to", ignore_case = TRUE)) &
stringr::str_detect(.data$play_text, regex(" for \\d+ y\\w*ds?", ignore_case = TRUE)) ~
as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("Yd pass", ignore_case = TRUE)) ~
as.numeric(stringr::str_extract(
@@ -99,6 +111,32 @@ add_yardage <- function(play_df) {
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("pass complete to", ignore_case = TRUE)) ~
yards_gained, # 2024 has games that don't have yards in the PBP text but do have them in the yards_gained field.

# 2025 has some plays list "PASSER pass" at the very end of the play_text
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("pass \\(\\w", ignore_case = TRUE)) &
stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("pass$", ignore_case = TRUE)) &
stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
)),
# 2025 has some plays that have yards in the PBP but no listed passer. the format is the same though
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) &
stringr::str_detect(.data$play_text, regex("for a loss of", ignore_case = TRUE)) ~
-1 * as.numeric(stringr::str_extract(
stringi::stri_extract_first_regex(.data$play_text, "(?<= for a loss of)[^,]+"), "\\d+"
)),
.data$pass == 1 &
stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) &
stringr::str_detect(.data$play_text, regex("for no gain", ignore_case = TRUE)) ~ 0,
TRUE ~ NA_real_
)
)
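
Because `dplyr::case_when()` returns the value of the first condition that matches, the order of these branches matters. The sketch below is a base R illustration of the same extraction idea with the specific cases (loss, no gain) checked before the general " for N yards" case. `first_number_after()` and `extract_pass_yards()` are hypothetical helpers, not cfbfastR functions, and the play texts are invented.

```r
# Hypothetical helper: first run of digits after `anchor`, before any comma.
first_number_after <- function(text, anchor) {
  pattern <- paste0(anchor, "[^0-9,]*([0-9]+)")
  m <- regmatches(text, regexec(pattern, text, ignore.case = TRUE))
  vapply(m, function(x) if (length(x) >= 2) as.numeric(x[2]) else NA_real_, numeric(1))
}

# Hypothetical helper: specific cases first, general case last.
extract_pass_yards <- function(play_text) {
  is_loss    <- grepl("for a loss of", play_text, ignore.case = TRUE)
  is_no_gain <- grepl("for no gain", play_text, ignore.case = TRUE)
  is_gain    <- grepl(" for [0-9]+ y[a-z]*ds?", play_text, ignore.case = TRUE)

  ifelse(is_loss, -1 * first_number_after(play_text, " for a loss of"),
    ifelse(is_no_gain, 0,
      ifelse(is_gain, first_number_after(play_text, " for"), NA_real_)))
}

plays <- c(
  "to J. Smith for 12 yds",
  "to J. Smith for a loss of 3 yards",
  "to J. Smith for no gain",
  "Timeout"
)
yards <- extract_pass_yards(plays)  # 12, -3, 0, NA

# The general " for" extraction alone reads a loss as a positive number.
general_only <- first_number_after("to J. Smith for a loss of 3 yards", " for")  # 3
```

In the hunk above, the unconditional `^to ` branch appears before the `^to ` branches for "for a loss of" and "for no gain". If that reading of the order is right, plays reaching that point would take the general branch first, so it may be worth double-checking whether the two later branches can ever fire.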
30 changes: 5 additions & 25 deletions cran-comments.md
@@ -1,30 +1,10 @@
## Release summary

-This is a major release that:
-
-* Addresses the noted documentation issues which caused the previous release to be archived by CRAN.
-* Addresses the error from the previous CRAN submission today though does not change official version number.
-* Addresses the missing documentation for the `update_cfb_pbp()` function noted in the previous CRAN response.
-* Addresses minor functionality issues in `cfbd_play_stats_player()` function.
-* Updates the `cfbd_*()` functions to use the new College Football Data API v2.
-* Addresses the most recent CRAN comments from the previous submission.
-
-The following functions were added:
-* `cfbd_metrics_fg_ep()`
-* `cfbd_metrics_wepa_team_season()`
-* `cfbd_metrics_wepa_players_passing()`
-* `cfbd_metrics_wepa_players_rushing()`
-* `cfbd_metrics_wepa_players_kicking()`
-* `cfbd_ratings_fpi()`
-* `cfbd_live_scoreboard()`
-* `cfbd_live_plays()`
-* `cfbd_api_key_info()`
-
-There are minor changes to the existing `cfbd_*()` functions under the hood. See `NEWS.md` for more details.
-
-While I believe I updated all twitter links in the `README.md` to non-redirecting links, they do give status 403
-when you try to access them without authentication. If this behavior is too problematic and against policy, please let me know and I will
-make the changes to the `README.md`.
+This is a minor release that:
+
+* Fixes a bug in `cfbd_pbp_data()` where play-by-play data for some games were not as expected.
+* Improves `add_yardage()` where plays with missing yardage values were not being handled correctly.
+

## R CMD check results

8 changes: 4 additions & 4 deletions tests/testthat/test-cfbd_betting_lines.R
@@ -19,16 +19,16 @@ test_that("CFB Betting Lines", {
cols <- c(
"game_id", "season", "season_type", "week",
"start_date",
-"home_team", "home_conference", "home_classification", "home_score",
-"away_team", "away_conference", "away_classification", "away_score",
+"home_team_id", "home_team", "home_conference", "home_classification", "home_score",
+"away_team_id", "away_team", "away_conference", "away_classification", "away_score",
"provider", "spread", "formatted_spread",
"spread_open", "over_under", "over_under_open",
"home_moneyline", "away_moneyline"
)
expect_equal(nrow(x), 4)
expect_equal(nrow(y), 4)
-expect_setequal(colnames(x), cols)
-expect_setequal(colnames(y), cols)
+expect_in(cols, colnames(x))
+expect_in(cols, colnames(y))
expect_s3_class(x, "data.frame")
expect_s3_class(y, "data.frame")
})
4 changes: 2 additions & 2 deletions tests/testthat/test-cfbd_live_scoreboard.R
@@ -51,8 +51,8 @@ test_that("CFB Live Scoreboard", {
x <- cfbd_live_scoreboard(division='fbs', conference = "B12")

y <- cfbd_live_scoreboard(division='fbs')
-expect_in(colnames(x), cols)
-expect_in(colnames(y), cols)
+expect_in(cols, colnames(x))
+expect_in(cols, colnames(y))
expect_s3_class(x, "data.frame")
expect_s3_class(y, "data.frame")
})
6 changes: 3 additions & 3 deletions tests/testthat/test-cfbd_stats_game_advanced.R
@@ -33,9 +33,9 @@ test_that("CFB Stats Game - Advanced", {
y <- cfbd_stats_game_advanced(2019, team = "LSU")

z <- cfbd_stats_game_advanced(2013, team = "Florida State")
-expect_setequal(colnames(x), cols)
-expect_setequal(colnames(y), cols)
-expect_setequal(colnames(z), cols)
+expect_in(cols, colnames(x))
+expect_in(cols, colnames(y))
+expect_in(cols, colnames(z))
expect_s3_class(x, "data.frame")
expect_s3_class(y, "data.frame")
expect_s3_class(z, "data.frame")
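
The three test changes loosen the column checks in the same direction. `expect_setequal(colnames(x), cols)` requires the returned and expected names to match exactly, `expect_in(colnames(x), cols)` fails on any unexpected column, and `expect_in(cols, colnames(x))` only requires every expected column to be present. A base R sketch of the three conditions, using made-up column vectors:

```r
# Made-up column names: the API response has gained two extra id columns.
expected <- c("game_id", "home_team", "away_team")
returned <- c("game_id", "home_team_id", "home_team", "away_team_id", "away_team")

exact_match   <- setequal(returned, expected)   # what expect_setequal() checks
has_expected  <- all(expected %in% returned)    # expect_in(cols, colnames(x))
no_unexpected <- all(returned %in% expected)    # expect_in(colnames(x), cols)
```

With these vectors only `has_expected` is `TRUE`, so the new form keeps passing when the upstream API adds columns, at the cost of no longer flagging unexpected ones.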