
Calculations for delta.pitch and delta.field differ from previous calculations #108

@davidbmitchell


I ran makeWAR() on the May data set and compared the result with the original MayProcessed data set. The delta.field and delta.pitch columns in the new output differ from those in the original MayProcessed. In rows 2 and 6 the two values look transposed (approximately swapped), while row 4 agrees up to rounding; in both versions delta.field + delta.pitch equals delta. You can see this in the output below. I reproduced it with dplyr 0.5.0, but I first noticed it while testing makeWAR() after refactoring for dplyr 0.7.0.

> NewMayProcessed <- makeWAR(May)
> head(NewMayProcessed$openWARPlays[,c(1:5,16, 19:23)])
  batterId start1B start2B start3B pitcherId                         gameId      delta delta.field delta.pitch    delta.br  delta.bat
1   476704    <NA>    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.3789624          NA  0.37896244          NA  0.3789624
2   519083  476704    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.2055008 -0.04671768 -0.15878313  0.03238909 -0.2378899
3   452234    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.3296470          NA -0.32964703  0.04026076 -0.3699078
4   493316    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.2032371  0.11692098  0.08631608 -0.53123407  0.7344711
5   518626  493316    <NA>  476704    450351 gid_2013_05_01_anamlb_oakmlb_1  0.1956572          NA  0.19565721 -0.01790497  0.2135622
6   474384  518626  493316  476704    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.7097701 -0.36090191 -0.34886821  0.01234560 -0.7221157

> head(MayProcessed$openWARPlays[,c(1:5,16, 19:23)])
  batterId start1B start2B start3B pitcherId                         gameId      delta delta.field delta.pitch    delta.br  delta.bat
1   476704    <NA>    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.3789624          NA   0.3789624          NA  0.3789624
2   519083  476704    <NA>    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.2055008  -0.1588469  -0.0466539  0.03238909 -0.2378899
3   452234    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.3296470          NA  -0.3296470  0.04026076 -0.3699078
4   493316    <NA>  476704    <NA>    450351 gid_2013_05_01_anamlb_oakmlb_1  0.2032371   0.1169279   0.0863092 -0.53123407  0.7344711
5   518626  493316    <NA>  476704    450351 gid_2013_05_01_anamlb_oakmlb_1  0.1956572          NA   0.1956572 -0.01790497  0.2135622
6   474384  518626  493316  476704    450351 gid_2013_05_01_anamlb_oakmlb_1 -0.7097701  -0.3487953  -0.3609748  0.01234560 -0.7221157

The original MayProcessed data set was added over 2 years ago, and makeWAR() has changed quite a bit since then. I suspect this happened when openWAR was dplyrized. I'm pretty sure it has to do with [line 140](https://github.com/beanumber/openWAR/blob/master/R/makeWAR.R#L140) of makeWAR.R:

x$data <- mutate_(x$data, delta.pitch = ~ifelse(is.na(delta.field), delta, delta - delta.field))
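As a quick sanity check, the R snippet below copies rows 2, 4 and 6 from the two printouts above into small data frames. It confirms that, in both versions, delta.pitch is the residual delta - delta.field that line 140 produces, and it flags which rows look swapped between the two versions. The helper `residual_ok()` and the tolerances (`1e-5` for the residual, `1e-3` for "approximately equal") are hypothetical choices for this sketch, not part of openWAR.

```r
# Values copied from rows 2, 4 and 6 of the two printouts above.
new <- data.frame(
  delta       = c(-0.2055008, 0.2032371, -0.7097701),
  delta.field = c(-0.04671768, 0.11692098, -0.36090191),
  delta.pitch = c(-0.15878313, 0.08631608, -0.34886821)
)
old <- data.frame(
  delta       = c(-0.2055008, 0.2032371, -0.7097701),
  delta.field = c(-0.1588469, 0.1169279, -0.3487953),
  delta.pitch = c(-0.0466539, 0.0863092, -0.3609748)
)

# Hypothetical helper: line 140 makes delta.pitch the residual, so
# delta.field + delta.pitch should equal delta (up to printing precision)
# whenever delta.field is not NA.
residual_ok <- function(d, tol = 1e-5) {
  is.na(d$delta.field) | abs(d$delta.field + d$delta.pitch - d$delta) < tol
}

# A row looks "transposed" if the new field value matches the old pitch
# value and vice versa, while the two columns themselves are distinct.
tol <- 1e-3
swapped <- abs(new$delta.field - old$delta.pitch) < tol &
  abs(new$delta.pitch - old$delta.field) < tol &
  abs(new$delta.field - new$delta.pitch) > tol

print(residual_ok(new))  # TRUE TRUE TRUE
print(residual_ok(old))  # TRUE TRUE TRUE
print(swapped)           # TRUE FALSE TRUE
```

Because both versions satisfy the residual identity, the disagreement comes from how the delta between fielding and pitching is split, not from delta itself.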

So it boils down to which data set is correct. Is it the original MayProcessed data set?
