Commit 136e8f8 (merge of parents e3b16d5 and 861fd1a), committed by jfjelstul on May 26, 2024.
Showing 1 changed file (`README.md`) with 8 additions and 6 deletions.
@@ -1,14 +1,16 @@
# The Fjelstul English Football Database

The Fjelstul English Football Database is a comprehensive database of football matches played in the Premier League and the English Football League from the inaugural season of the Football League in 1888-89 through the 2023-24 season. The database was created by Joshua C. Fjelstul, Ph.D.

The database contains `5` datasets: `seasons`, `teams`, `matches`, `appearances` (one observation per team per match), and `standings` (end-of-season league tables). The `matches` dataset includes `203956` matches.
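
Because `appearances` has one observation per team per match, each row of `matches` corresponds to two rows of `appearances`, one from each team's perspective. The Python sketch below illustrates that reshape. It is not the code used to build the database. The `home_team_id` and `away_team_id` names come from this README. The `match_id` field, the score fields, and the `result` labels are hypothetical, so check the codebook for the real variable names.

```python
from dataclasses import dataclass


# Hypothetical fields: home_team_id and away_team_id appear in the README,
# but match_id and the score fields are assumptions for this sketch.
@dataclass
class Match:
    match_id: str
    home_team_id: str
    away_team_id: str
    home_team_score: int
    away_team_score: int


@dataclass
class Appearance:
    match_id: str
    team_id: str
    opponent_id: str
    home_team: bool
    goals_for: int
    goals_against: int
    result: str  # "win", "draw", or "loss" (hypothetical labels)


def outcome(goals_for: int, goals_against: int) -> str:
    """Classify a result from one team's point of view."""
    if goals_for > goals_against:
        return "win"
    if goals_for < goals_against:
        return "loss"
    return "draw"


def to_appearances(matches: list[Match]) -> list[Appearance]:
    """Expand each match into two rows: one for the home team, one for the away team."""
    rows: list[Appearance] = []
    for m in matches:
        rows.append(Appearance(m.match_id, m.home_team_id, m.away_team_id, True,
                               m.home_team_score, m.away_team_score,
                               outcome(m.home_team_score, m.away_team_score)))
        rows.append(Appearance(m.match_id, m.away_team_id, m.home_team_id, False,
                               m.away_team_score, m.home_team_score,
                               outcome(m.away_team_score, m.home_team_score)))
    return rows
```

For example, `to_appearances([Match("M-1", "T-1", "T-2", 3, 1)])` returns two rows: a home win for `T-1` and an away loss for `T-2`. This structure implies that `appearances` should have twice as many rows as `matches`.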

If you use data from this database in a project, please let me know so I can feature your work!

## Downloading the data

The Fjelstul English Football Database is available via the `R` package `englishfootball`, which you can install from this repository (instructions below). Note that this repository is structured as a repository for an `R` package. You can also download the database directly from this repository in `3` formats: an `.RData` version of the database is available in the `data/` folder, a `.csv` version is available in the `data-csv/` folder, and a relational database version (`SQLite`) is available in the `data-sqlite/` folder.
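
If you use the `.csv` version outside of `R`, you could load a single table with Python's standard-library `csv` module, as in the sketch below. It assumes that each dataset is stored as `<dataset>.csv` with a header row (for example, `data-csv/matches.csv`). That file naming is an assumption, so check the folder contents first.

```python
import csv
from pathlib import Path


def load_table(csv_dir: str, table: str) -> list[dict[str, str]]:
    """Read one dataset from the CSV folder into a list of row dictionaries.

    Assumes the file is named '<table>.csv' and has a header row. Every value
    is returned as a string, so numeric columns need explicit conversion.
    """
    path = Path(csv_dir) / f"{table}.csv"
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Usage, with the hypothetical file name: `matches = load_table("data-csv", "matches")`.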

The `.RData` and `.csv` versions of the database are identical except for the file format. These versions of the database are not technically relational because some tables already include variables that have been merged in from other tables for convenience (i.e., some data exists in multiple tables). The `SQLite` version includes all of the same variables, but variables from other tables are not already merged in. Dummy variables that are coded `0` or `1` are converted to `FALSE` and `TRUE`. Users can use the primary and foreign keys in the tables to merge in data from other tables. See the `SQL-schema.txt` file in the `data-sqlite/` folder for more details.
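
Merging through keys in the `SQLite` version might look like the Python sketch below, which uses the standard-library `sqlite3` module. It builds a small in-memory stand-in instead of opening the real file, because the database file name and the exact columns are not given here. The `team_id`, `home_team_id`, and `away_team_id` keys come from this README. The team and match rows are made up, and the remaining schema details are assumptions, so compare them with `SQL-schema.txt` before adapting the query.

```python
import sqlite3

# Small in-memory stand-in for the SQLite version of the database. The table
# names match the README; the columns are assumptions, so compare them with
# SQL-schema.txt before running this against the real file.
SCHEMA = """
CREATE TABLE teams (
    team_id   TEXT PRIMARY KEY,
    team_name TEXT NOT NULL
);
CREATE TABLE matches (
    match_id     TEXT PRIMARY KEY,
    home_team_id TEXT NOT NULL REFERENCES teams (team_id),
    away_team_id TEXT NOT NULL REFERENCES teams (team_id)
);
"""

# Join `teams` twice, once through each foreign key, to attach both team names.
QUERY = """
SELECT m.match_id,
       h.team_name AS home_team_name,
       a.team_name AS away_team_name
FROM matches AS m
JOIN teams AS h ON h.team_id = m.home_team_id
JOIN teams AS a ON a.team_id = m.away_team_id
ORDER BY m.match_id
"""


def matches_with_team_names(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (match_id, home_team_name, away_team_name) for every match."""
    return conn.execute(QUERY).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build the stand-in database with two teams and one made-up match."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO teams (team_id, team_name) VALUES (?, ?)",
        [("T-1", "Arsenal"), ("T-2", "Manchester United")],
    )
    conn.execute(
        "INSERT INTO matches (match_id, home_team_id, away_team_id) VALUES (?, ?, ?)",
        ("M-1", "T-1", "T-2"),
    )
    return conn
```

The `teams` table holds each team's current name, so a join like this returns current names (for example, `Manchester United` rather than `Newton Heath`). Keep that in mind if you need the name a team used at the time.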

## Downloading the codebook

@@ -22,7 +24,7 @@ The copyright for the original structure and organization of the Fjelstul Englis

The Fjelstul English Football Database and the `englishfootball` package are both published under a [CC-BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/legalcode). This means that you can distribute, modify, and use all or part of the database for commercial or non-commercial purposes as long as (1) you provide proper attribution and (2) any new works you produce based on this database also carry the CC-BY-SA 4.0 license.

To provide proper attribution, according to the CC-BY-SA 4.0 license, you must provide the name of the author ("Joshua C. Fjelstul, Ph.D."), a notice that the database is copyrighted ("© 2024 Joshua C. Fjelstul, Ph.D."), a link to the CC-BY-SA 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/legalcode), and a link to this repository (https://www.github.com/jfjelstul/englishfootball). You must also indicate any modifications you have made to the database.

Consistent with the CC-BY-SA 4.0 license, I provide this database as-is and as-available, and make no representations or warranties of any kind concerning the database, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable.

@@ -38,11 +40,11 @@ The data in the Fjelstul English Football Database is coded based on information

- **Team names.** Many team names end in `Football Club`, usually abbreviated as `F.C.`, and a few start with `AFC` (Athletic Football Club). I standardize team names throughout the database by removing these abbreviations. Some teams have changed their names over time. For example, Manchester United started out as Newton Heath and Arsenal started out as Woolwich Arsenal. The `matches`, `appearances`, and `standings` datasets always use the name of the team at the time. The `team_name` variable in the `teams` dataset is the current name of the team, and the `former_team_names` variable in the `teams` dataset lists any previous names. The `team_id` variable and its extensions, such as `home_team_id` and `away_team_id`, allow you to track teams across name changes in the `matches`, `appearances`, and `standings` datasets. For example, in the `matches` dataset, `team_name` will be coded `Newton Heath` before the name change and `Manchester United` after the name change, but `team_id` will have the same value for both.

- **Defunct teams.** Some teams that have been in the English Football League have been relegated and are currently playing in lower divisions. There are also some teams that have become defunct. The `defunct` variable in the `teams` dataset indicates teams that have become defunct and no longer exist. I do not code teams that have since been revived as defunct, regardless of whether they are current members of the English Football League. There are `27` defunct teams that have not been revived.

- **Phoenix teams.** Sometimes a team is dissolved, and a new team with the same name is later created as a revival of the original team. These are called phoenix teams, and I code them as a continuation of the original team, even though they are legally new entities. For example, I code the current Accrington Stanley as a continuation of the Accrington Stanley that was founded in 1891 and was later dissolved. Similarly, Bradford Park Avenue was dissolved and later revived. One unusual case is Wimbledon. Wimbledon F.C. was relocated and became Milton Keynes Dons F.C., which I code as a separate team. Then, a protest club called AFC Wimbledon was founded to replace the original Wimbledon F.C. I code the new Wimbledon as a revival of the original Wimbledon. Accounting for phoenix teams, there have been `144` unique teams in the Premier League and English Football League.

- **Current members.** There are currently `92` members of the Premier League and the English Football League. The `current` variable in the `teams` dataset indicates which teams were members of the Premier League or the English Football League during the most recent season in the database, which is the 2023-24 season. This variable doesn't reflect relegation from League Two or promotion from the National League following the conclusion of the 2023-24 season.
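
Because names change but `team_id` does not, grouping by `team_id` rather than by team name is the safer way to follow a club over time. The Python sketch below uses hypothetical rows to show the difference. The IDs are made up. The names are taken from the Newton Heath and Woolwich Arsenal examples above. The rows are only loosely shaped like the `standings` dataset.

```python
from collections import Counter, defaultdict

# Hypothetical rows: the name a team used at the time plus a stable team_id.
# The IDs are made up for illustration.
ROWS = [
    {"team_id": "T-1", "team_name": "Newton Heath"},
    {"team_id": "T-1", "team_name": "Newton Heath"},
    {"team_id": "T-1", "team_name": "Manchester United"},
    {"team_id": "T-2", "team_name": "Woolwich Arsenal"},
    {"team_id": "T-2", "team_name": "Arsenal"},
]


def names_by_team(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """List every name each team_id has used, in order of first appearance."""
    names: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        if row["team_name"] not in names[row["team_id"]]:
            names[row["team_id"]].append(row["team_name"])
    return dict(names)


def rows_per_team(rows: list[dict[str, str]]) -> Counter:
    """Count rows by team_id, so a renamed team is still counted as one team."""
    return Counter(row["team_id"] for row in rows)
```

Here `names_by_team(ROWS)` maps `T-1` to `["Newton Heath", "Manchester United"]`, and `rows_per_team(ROWS)` counts three rows for `T-1`. Counting by name instead would split that team into two entries (two rows and one row).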

## Installing the R package

@@ -86,4 +88,4 @@ The `BibTeX` entry for the `R` package is:

## Reporting problems

If you notice an error in the data or a bug in the `R` package, please report it [here](https://github.com/jfjelstul/englishfootball/issues).
