Maps article metadata and relationships to schema.org entities and stores them in a RedisGraph database.
- Midfielders in Premier League (with transfers)
- German players in Premier League (with transfers)
- Liverpool's current squad
- Liverpool's squad from 2017
This repository contains structured data for 3133 football teams and 4760 football players (with a focus on the Premier League and Serie A). Football Wiki infoboxes (and other templates) were used as the source.
We store basic information about each player and manager (birth date, nationality, height) and each team (league membership, foundation year, stadium) as nodes. Both types of nodes, Person (a player or a manager) and SportsTeam, are connected with relations. A person can be a coach or an athlete for a club.
- a player's career is described as an `[:athlete]` relation (with `since` and `until` properties) to a `SportsTeam` node.
- a team's current squad is described as an `[:athlete]` relation to a `Person` node. The player's position on the field and shirt number are stored in the relation's properties.
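To make the two edge directions concrete, here is a minimal sketch in Python. The `Edge` class and the helper names are hypothetical and are not taken from the `grapher` code; the sketch only illustrates that the same `athlete` relation carries `since`/`until` when it points from a person to a team, and `number`/`position` when it points from a team to a person.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str      # identifier of the node the edge starts from
    target: str      # identifier of the node the edge points to
    relation: str    # "athlete" or "coach"
    properties: dict = field(default_factory=dict)


def career_edge(person, team, since, until=None):
    """Person -> SportsTeam: one spell of a player's career."""
    return Edge(person, team, "athlete", {"since": since, "until": until})


def squad_edge(team, person, number, position):
    """SportsTeam -> Person: membership in the team's current squad."""
    return Edge(team, person, "athlete", {"number": number, "position": position})


def _literal(value):
    # Quote strings, leave numbers as they are
    if isinstance(value, str):
        return "'" + value.replace("'", "\\'") + "'"
    return str(value)


def to_cypher(edge, source_label, target_label):
    """Render the edge as a Cypher-style pattern; unset (None) properties are skipped."""
    props = ", ".join(
        f"{key}: {_literal(value)}"
        for key, value in edge.properties.items()
        if value is not None
    )
    rel = f":{edge.relation} {{{props}}}" if props else f":{edge.relation}"
    return f"({edge.source}:{source_label})-[{rel}]->({edge.target}:{target_label})"
```

For example, `to_cypher(career_edge("Zlatan_Ibrahimovi", "LA_Galaxy", 2018), "Person", "SportsTeam")` renders an open-ended career spell, with `until` left out because it is not known yet.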
2019-01-09 12:50:59 RedisGraph INFO Committing graph with 7893 nodes and 18398 edges
See the `FootballWikiSource` Python class for implementation details.
The mwclient Python module was used to get a list of articles from the provided categories (players and teams). For each article we queried our custom API, which returns a list of all templates (with the parameters passed) that were used in that article.
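The category walk could look roughly like the sketch below. It assumes `site` behaves like an `mwclient.Site` object, where `site.categories[name]` iterates over pages that have `name` and `namespace` attributes. The function name is hypothetical, and the wiki host and category names are left to the caller because they are not stated here. The custom templates API is not covered.

```python
from typing import Iterable, Iterator

MAIN_NAMESPACE = 0  # MediaWiki namespace number for regular articles


def iter_article_titles(site, categories: Iterable[str]) -> Iterator[str]:
    """Yield each article title found in the given categories exactly once."""
    seen = set()
    for category in categories:
        # mwclient exposes category members by indexing site.categories
        for page in site.categories[category]:
            # Skip subcategories, files and other non-article pages
            if page.namespace != MAIN_NAMESPACE:
                continue
            if page.name not in seen:
                seen.add(page.name)
                yield page.name
```

Deduplicating by title matters because an article can belong to more than one of the listed categories.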
We assumed the following:
- `Infobox Biography` template describes a person (either a player or a manager, or even both)
- `Infobox Club` template describes a team
- `Fs player` template is used to connect teams with players (i.e. set up a relation for the current team squad)
As we parse the list of parameters passed to templates we assume that:
- parameters with links form a relation (e.g. player played in this club, club has this person as a manager)
- parameters with plain values are source of data for properties (club foundation year, birth date, player's height)
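The two rules above can be sketched as a small classifier. This is an illustration under assumptions, not the project's parser: the function name is hypothetical, and links are assumed to use the usual `[[Target]]` or `[[Target|label]]` wikitext syntax.

```python
import re

# Matches the target of a wiki link such as [[Old Trafford]] or [[Molde FK|Molde]]
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def split_parameters(params):
    """Split template parameters into plain properties and link relations."""
    properties, relations = {}, {}
    for key, value in params.items():
        value = value.strip()
        if not value:
            continue  # ignore parameters passed without a value
        targets = [target.strip() for target in WIKI_LINK.findall(value)]
        if targets:
            relations[key] = targets   # linked articles become edges
        else:
            properties[key] = value    # plain values become node properties
    return properties, relations
```

With made-up parameter names, `split_parameters({"founded": "1878", "manager": "[[Ole Gunnar Solskjær]]"})` puts `founded` among the properties and `manager` among the relations.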
We model the collected data using schema.org types and properties. Each model is then represented as a node in the graph database, while each relation is an edge.
This approach will later on allow us to expose structured data and links between entities in HTML of rendered wiki articles.
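One way such markup could be produced is JSON-LD embedded in a script element. The sketch below is only an assumption about how this might be done later; the function names are hypothetical and nothing like it is part of this repository yet.

```python
import json


def to_json_ld(schema_type, properties):
    """Build a schema.org JSON-LD document from a node's type and properties."""
    doc = {"@context": "https://schema.org", "@type": schema_type}
    # Drop empty values so the markup only carries known facts
    doc.update({k: v for k, v in properties.items() if v not in (None, "")})
    return doc


def to_script_tag(doc):
    """Wrap the document in the script element used to embed JSON-LD in HTML."""
    payload = json.dumps(doc, ensure_ascii=False, indent=2)
    # Avoid closing the script element early if a value contains "</"
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Calling `to_script_tag(to_json_ld("SportsTeam", {"name": "Manchester United F.C.", "foundingDate": "1878"}))` returns a snippet that could be placed in the head of the rendered article.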
docker-compose up
RedisGraph binds to local port 56379:
redis-cli -p 56379
- `grapher` is a Python 3.x project used to collect structured data from Football Wiki. Please set up a virtual env and run the `index_football_wiki` script there.
- Alternatively, you can take the graph dump stored in the `grapher/output/football.graph` file and load it into a RedisGraph instance.
127.0.0.1:56379> GRAPH.QUERY football "MATCH (t:SportsTeam) RETURN count(t) AS teams"
1) 1) 1) "teams"
2) 1) "3133.000000"
2) 1) "Query internal execution time: 2.656609 milliseconds"
127.0.0.1:56379> GRAPH.QUERY football "MATCH (p:Person) RETURN count(p) as players"
1) 1) 1) "players"
2) 1) "4760.000000"
2) 1) "Query internal execution time: 3.306746 milliseconds"
<PersonModel https://schema.org/Person (Ole_Gunnar_Solskjr:Person) name = "Ole Gunnar Solskjær", birthDate = "1973", birthPlace = "Kristiansund", nationality = "Norway", height = "1.78">
--[:athlete {"until": 1994, "since": 1990}]->(Clausenengen_FK:SportsTeam)
--[:athlete {"until": 1996, "since": 1994}]->(Molde_FK:SportsTeam)
--[:athlete {"until": 2007, "since": 1996}]->(Manchester_United_F_C:SportsTeam)
--[:coach {"until": 2011, "since": 2008}]->(Manchester_United_F_C_Reserves_and_Academy:SportsTeam)
--[:coach {"until": 2014, "since": 2011}]->(Molde_FK:SportsTeam)
--[:coach {"until": 2014, "since": 2014}]->(Cardiff_City_F_C:SportsTeam)
--[:coach {"until": 2016, "since": 2014}]->(Clausenengen_FK:SportsTeam)
--[:coach {"until": 2018, "since": 2015}]->(Molde_FK:SportsTeam)
--[:coach {"until": 2019, "since": 2018}]->(Manchester_United_F_C:SportsTeam)
<PersonModel https://schema.org/Person (Zlatan_Ibrahimovi:Person) name = "Zlatan Ibrahimović", birthDate = "1981", birthPlace = "Malmö", nationality = "Sweden", height = "1.95">
--[:athlete {"until": 2001, "since": 1999}]->(Malm_FF:SportsTeam)
--[:athlete {"until": 2004, "since": 2001}]->(AFC_Ajax:SportsTeam)
--[:athlete {"until": 2006, "since": 2004}]->(Juventus_F_C:SportsTeam)
--[:athlete {"until": 2009, "since": 2006}]->(Inter_Milan:SportsTeam)
--[:athlete {"until": 2011, "since": 2009}]->(FC_Barcelona:SportsTeam)
--[:athlete {"until": 2011, "since": 2010}]->(A_C_Milan:SportsTeam)
--[:athlete {"until": 2012, "since": 2011}]->(A_C_Milan:SportsTeam)
--[:athlete {"until": 2016, "since": 2012}]->(Paris_Saint_Germain_F_C:SportsTeam)
--[:athlete {"until": 2018, "since": 2016}]->(Manchester_United:SportsTeam)
--[:athlete {"until": 2018, "since": 2017}]->(Manchester_United:SportsTeam)
--[:athlete {"until": null, "since": 2018}]->(LA_Galaxy:SportsTeam)
<SportsTeamModel https://schema.org/SportsTeam (Manchester_United_F_C:SportsTeam) name = "Manchester United F.C.", sport = "Football", foundingDate = "1878", ground = "Old Trafford", memberOf = "Premier League", url = "http://www.manutd.com/">
--[:coach ]->(Ole_Gunnar_Solskjr:Person)
--[:athlete {"number": 1, "position": "GK"}]->(David_de_Gea:Person)
--[:athlete {"number": 2, "position": "DF"}]->(Victor_Lindelf:Person)
--[:athlete {"number": 3, "position": "DF"}]->(Eric_Bailly:Person)
--[:athlete {"number": 4, "position": "DF"}]->(Phil_Jones_born_1992:Person)
--[:athlete {"number": 6, "position": "MF"}]->(Paul_Pogba:Person)
--[:athlete {"number": 7, "position": "FW"}]->(Alexis_Snchez:Person)
--[:athlete {"number": 8, "position": "MF"}]->(Juan_Mata:Person)
--[:athlete {"number": 9, "position": "FW"}]->(Romelu_Lukaku:Person)
--[:athlete {"number": 10, "position": "FW"}]->(Marcus_Rashford:Person)
--[:athlete {"number": 11, "position": "FW"}]->(Anthony_Martial:Person)
--[:athlete {"number": 12, "position": "DF"}]->(Chris_Smalling:Person)
--[:athlete {"number": 13, "position": "GK"}]->(Lee_Grant_born_1983:Person)
--[:athlete {"number": 14, "position": "MF"}]->(Jesse_Lingard:Person)
--[:athlete {"number": 15, "position": "MF"}]->(Andreas_Pereira:Person)
--[:athlete {"number": 16, "position": "DF"}]->(Marcos_Rojo:Person)
--[:athlete {"number": 17, "position": "MF"}]->(Fred_born_1993:Person)
--[:athlete {"number": 18, "position": "DF"}]->(Ashley_Young:Person)
--[:athlete {"number": 20, "position": "DF"}]->(Diogo_Dalot:Person)
--[:athlete {"number": 21, "position": "MF"}]->(Ander_Herrera:Person)
--[:athlete {"number": 22, "position": "GK"}]->(Sergio_Romero:Person)
--[:athlete {"number": 23, "position": "DF"}]->(Luke_Shaw:Person)
--[:athlete {"number": 25, "position": "DF"}]->(Antonio_Valencia:Person)
--[:athlete {"number": 27, "position": "MF"}]->(Marouane_Fellaini:Person)
--[:athlete {"number": 31, "position": "MF"}]->(Nemanja_Mati:Person)
--[:athlete {"number": 36, "position": "DF"}]->(Matteo_Darmian:Person)
--[:athlete {"number": 39, "position": "MF"}]->(Scott_McTominay:Person)
--[:athlete {"number": 24, "position": "DF"}]->(Timothy_Fosu_Mensah:Person)
--[:athlete {"number": 38, "position": "DF"}]->(Axel_Tuanzebe:Person)
--[:athlete {"number": 40, "position": "GK"}]->(Joel_Castro_Pereira:Person)
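The node identifiers in the dumps above (for example `Ole_Gunnar_Solskjr` for "Ole Gunnar Solskjær") look as if non-ASCII characters are dropped and every other run of non-alphanumeric characters becomes an underscore. The following sketch reproduces the identifiers shown here under that assumption; it is inferred from the output, not taken from the `grapher` code, and `node_id` is a hypothetical name.

```python
import re


def node_id(name):
    """Derive an identifier such as Manchester_United_F_C from a display name."""
    # Drop characters outside ASCII (æ, ć, ö, ...)
    ascii_only = name.encode("ascii", "ignore").decode("ascii")
    # Collapse spaces, dots, hyphens and brackets into single underscores
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_only).strip("_")
```

Keep this in mind when matching on identifiers: two different names could map to the same identifier, so queries are safer when they filter on the `name` property.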
Use `make redis` to access redis-cli and perform queries.
127.0.0.1:56379> GRAPH.QUERY football "MATCH (t:SportsTeam)<-[a:athlete]-(p:Person) WHERE t.memberOf = 'Premier League' AND p.nationality = 'Iceland' RETURN t.name,p.name,a.since,a.until"
1) 1) 1) "t.name"
2) "p.name"
3) "a.since"
4) "a.until"
2) 1) "Arsenal F.C."
2) "\xc3\x93lafur Ingi Sk\xc3\xbalason"
3) "2001.000000"
4) "2005.000000"
3) 1) "Tottenham Hotspur F.C."
2) "Gylfi \xc3\x9e\xc3\xb3r Sigur\xc3\xb0sson"
3) "2012.000000"
4) "2014.000000"
4) 1) "Everton F.C."
2) "Gylfi \xc3\x9e\xc3\xb3r Sigur\xc3\xb0sson"
3) "2017.000000"
4) "NULL"
5) 1) "Burnley F.C."
2) "J\xc3\xb3hann Berg Gu\xc3\xb0mundsson"
3) "2016.000000"
4) "NULL"
2) 1) "Query internal execution time: 11.935366 milliseconds"
127.0.0.1:56379> GRAPH.QUERY football "MATCH (t:SportsTeam)<-[a:athlete]-(p:Person) WHERE p.name = 'Łukasz Fabiański' RETURN t.name,a.since,a.until"
1) 1) 1) "t.name"
2) "a.since"
3) "a.until"
2) 1) "West Ham United F.C."
2) "2018.000000"
3) "NULL"
3) 1) "Swansea_City_A_F_C"
2) "2014.000000"
3) "2018.000000"
4) 1) "Arsenal F.C."
2) "2007.000000"
3) "2014.000000"
5) 1) "Legia_Warsaw"
2) "2005.000000"
3) "2007.000000"
127.0.0.1:56379> GRAPH.QUERY football "MATCH (t:SportsTeam)<-[:athlete]-(c:Person)-[:athlete]->(t2:SportsTeam) WHERE t.name = 'Liverpool F.C.' and t2.name = 'Manchester United F.C.' RETURN c.name,t.name,t2.name"
1) 1) 1) "c.name"
2) "t.name"
3) "t2.name"
2) 1) "Peter Andrew Beardsley"
2) "Liverpool F.C."
3) "Manchester United F.C."
3) 1) "Paul Emerson Carlyle Ince"
2) "Liverpool F.C."
3) "Manchester United F.C."
4) 1) "Michael James Owen"
2) "Liverpool F.C."
3) "Manchester United F.C."
127.0.0.1:56379> GRAPH.QUERY football "MATCH (t:SportsTeam)-[a:athlete]->(p:Person) WHERE t.memberOf = 'Premier League' AND a.position = 'MF' AND p.nationality = 'Germany' RETURN t.name,p.name,a.number"
1) 1) 1) "t.name"
2) "p.name"
3) "a.number"
2) 1) "Arsenal F.C."
2) "Mesut \xc3\x96zil"
3) "10.000000"
3) 1) "Manchester City F.C."
2) "\xc4\xb0lkay G\xc3\xbcndo\xc4\x9fan"
3) "8.000000"
4) 1) "Manchester City F.C."
2) "Leroy San\xc3\xa9"
3) "19.000000"
5) 1) "Crystal Palace F.C."
2) "Jeffrey Schlupp"
3) "15.000000"
6) 1) "Everton F.C."
2) "Muhamed Be\xc5\xa1i\xc4\x87"
3) "21.000000"
2) 1) "Query internal execution time: 4.901690 milliseconds"
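The same queries can be sent from Python. The sketch below assumes a client object with an `execute_command` method (as provided by the redis-py `redis.Redis` class) and a reply shaped like the listings above: a result set whose first row is the header, followed by a list of statistics. The function names are hypothetical.

```python
def _text(value):
    # redis-py returns bytes unless the client was created with decode_responses=True
    return value.decode("utf-8") if isinstance(value, bytes) else value


def parse_reply(reply):
    """Turn a GRAPH.QUERY reply shaped like the listings above into a list of dicts."""
    result_set = reply[0]
    header = [_text(column) for column in result_set[0]]
    rows = []
    for raw_row in result_set[1:]:
        values = [_text(value) for value in raw_row]
        # Missing values show up as the string "NULL" in the output above
        rows.append({k: (None if v == "NULL" else v) for k, v in zip(header, values)})
    return rows


def run_query(client, graph, query):
    """Send a query through any client exposing execute_command (e.g. redis.Redis)."""
    return parse_reply(client.execute_command("GRAPH.QUERY", graph, query))
```

With the redis package installed, `run_query(redis.Redis(port=56379), "football", "MATCH (t:SportsTeam) RETURN count(t) AS teams")` should return a single row. Decoding the bytes as UTF-8 also turns escaped names such as `\xc3\x93lafur Ingi Sk\xc3\xbalason` back into readable text. Numbers arrive as strings like `2018.000000` and are left unconverted here.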
