Construction of a large painter network with ~3000 painters using the PainterPalette dataset, connecting painters if they (approximately) lived in the same place at the same time for long enough.
The project combines many classical steps of network science:
- Construction of the network, cleaning
- Network backbone extraction: disparity filter and thresholding (comparative analysis)
- Macroscopic network analysis:
  - degree distribution(s), clustering coefficients
  - centrality measures, and why they do not work on this longitudinal network
  - assortativity, rich-club coefficients
  - painter attribute statistics (e.g. percentage of movements and nationalities), attribute-attribute relationships
- Community detection: (nested) stochastic block model (SBM)
- Mesoscopic analysis: community-attribute statistics
- Visualizations
To my understanding, this is the largest "painter network from historical geographical data" constructed and analyzed so far. Artists / painters are typically connected via coexhibitions, such as in the paper of Barabási and Fraiberger et al. (2018), or in joint work that I led using e-flux coexhibition data. (Some papers study Wikipedia page descriptions instead, e.g. the paper by Kitromilidis and Evans.) This project rather tries to reconstruct historical connections from the temporal and location information of painters, which is extracted from Wikidata into my dataset (PainterPalette).
The PainterPalette dataset provides information on ~10000 painters from antiquity to the present day, with various biographical, geographical and stylistic data. From this, a network of painters is constructed, cleaned, and analyzed. Statistical measures of painter attributes are also computed. Then, communities in the network are detected using a (nested) stochastic block model (SBM), and attributes across the communities are compared (such as differences in female representation).
The results show that, due to the historical / longitudinal nature of the network, certain measures, such as the power-law degree distribution, give different results compared to typical social networks.
The network is assortative, and the rich-club effect becomes more prominent with increasing degree. Attribute statistics show, for example, that female representation increases over time and that Russian artists have the most paintings on WikiArt on average in the dataset.
Comparing the communities of painters, we see great differences in attributes such as female representation, and there is one standout community with a very high average number of WikiArt paintings (consisting of French and American late impressionists and realists).
Scholars can take inspiration from this project:
- use the resulting network, or the dataset and methods to analyze a network of painters
- grasp methodology to construct a similar network, and especially to filter the network
- generally understand various network science methods and motivations
- see how a nested SBM can be used to detect communities in a network, and their higher-level hierarchy
- improve Python data visualization skills
- possibly: realize the limitations of network science methods (e.g. centrality measures) on historical, "longitudinal" networks
Similar, less extensive works can be found in the repo and subrepos of ArtProjects.
The data collection, network construction and analysis are all done in one notebook, painter_network.ipynb; you only need to run that notebook.
To use the correct environment, you can create a new conda environment from the environment.yml file:

    conda env create -f environment.yml

or use the requirements.txt file with pip:

    pip install -r requirements.txt

There is one caveat: the community detection is done with the graph-tool library (as it has SBM implemented), which is not natively available on Windows (it is a C++ library with compiled dependencies and no official Windows build). Thus, I did not include it in the requirements.
The community detection is done separately in the sbm.ipynb notebook, and the results (communities) are loaded into the main notebook. If you would like to run it on Windows, I recommend running a Docker container. My setup for this is available as a GitHub repository for running graph-tool in a VSCode Docker container with extensions.
The third notebook, infomap_louvain_jaccard.ipynb, compares community detection methods outside the main analysis. It uses the infomap package, which is otherwise not used in the project. The package can be installed with pip, but it is easiest to add these lines to the dependencies in the environment file:

    - pip
    - pip:
      - infomap

(Currently, Infomap only runs on Python 3.10 at the latest.)
Painter data is taken from PainterPalette, and is available in the data folder as artists.csv. The instances make up our nodes:
| artist | Nationality | citizenship | gender | styles | movement (Wiki) | Movements (Art500k) | birth place | death place | birth year | death_year | FirstYear | LastYear | wikiart_pictures_count | locations | locations_with_years | styles_extended | StylesCount | StylesYears | occupations | PaintingsExhibitedAt | PaintingsExhibitedAtCount | PaintingSchool | Influencedby | Influencedon | Pupils | Teachers | FriendsandCoworkers | Contemporary | ArtMovement | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bracha L. Ettinger | French,Jewish,Israeli | Israel | female | New European Painting | New European Painting | {New European Painting:21} | Tel Aviv | NaN | 1948 | NaN | 1991 | 2009 | 21 | [] | [] | {New European Painting:21} | {New European Painting:21} | New European Painting:1991-2009 | philosopher, psychoanalyst, painter, photographer | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | NaN | NaN | |
| Friedrich Schroder-Sonnenstern | German | Germany | male | Art Brut | Outsider art (Art brut) | {Outsider art (Art brut):10} | Sovetsk | Berlin | 1892 | 1982 | 1948 | 1965 | 10 | ['Berlin'] | ['Berlin:1949-1982'] | {Art Brut:10} | {Art Brut:10} | Art Brut:1948-1965 | painter, drawer | NaN | NaN | NaN | NaN | NaN | NaN | NaN | No | NaN | NaN | |
| Nuno Gonçalves | Portuguese | Kingdom of Portugal | male | Early Renaissance | Early Renaissance | {Early Renaissance:10} | NaN | NaN | 1425 | 1492 | 1450 | 1490 | 10 | [] | [] | {Early Renaissance:10} | {Early Renaissance:10} | Early Renaissance:1450-1490 | portraitist, painter, musician, researcher | Lisbon, Portugal | {Lisbon:1},{Portugal:1} | NaN | National Museum of Ancient Art (MNAA), Lisbon,... | NaN | NaN | NaN | NaN | No | NaN | NaN |
| Jose Ferraz de Almeida Junior | Brazilian | Brazil | male | Academicism, Realism | Realism | {Realism:64} | Itu | Piracicaba | 1850 | 1899 | 1850 | 1899 | 65 | [] | [] | {Academicism:13},{Realism:52} | {Realism:51}, {Academicism:15} | Realism:1850-1899,Academicism:1850-1895 | painter | Brazil, Rio de Janeiro, Sao Paulo | {Sao Paulo:30},{Brazil:36},{Rio de Janeiro:4} | NaN | NaN | NaN | NaN | NaN | NaN | No | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
There is no connection (edge) data, so edges have to be constructed manually: the "location" attribute is used to connect the painters. The logic is that if two painters lived in the same location at the same time, they are connected.
We gather temporal data from the birth and death years of the painter; however, since it is generally not known when a painter moved to a location, we assume for simplicity that the painter lived in each location for the same amount of time, dividing the lifespan evenly. This inspired the following measure for how connected two painters are based on time and location:
(The dimension of this index is supposed to be time. The longer people live at the same place, the more likely they are to meet. On the other hand, living at more places does not necessarily mean two painters are more likely to meet - if they both live 5 and 5 years at two different places, the probability that they meet should be equal or similar to the probability in the case where they lived 10 years at one place. Locations are included in the formula in a normalized way, so that the value is still boosted if two painters lived at the same places.)
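To make the properties described above concrete, here is a minimal sketch of one possible index with the same behaviour. It is an assumption for illustration, not the formula used in the project: the function name, the dictionary layout and the choice of a Jaccard share of common locations are all hypothetical. The result has the dimension of time (years) and the location term is normalized to [0, 1].

```python
def time_location_index(a, b):
    """Illustrative time-location index in years (NOT the project's formula).

    a, b: dicts with 'birth' and 'death' (years) and 'locations' (set of place names).
    """
    # Number of years during which both painters were alive.
    overlap = min(a["death"], b["death"]) - max(a["birth"], b["birth"])
    if overlap <= 0:
        return 0.0
    shared = a["locations"] & b["locations"]
    if not shared:
        return 0.0
    # Assumption: weight the overlap by the Jaccard share of common places,
    # so living in many places does not inflate the index by itself.
    union = a["locations"] | b["locations"]
    return overlap * len(shared) / len(union)


# Hypothetical painters, used only to show the call.
painter_a = {"birth": 1840, "death": 1926, "locations": {"Paris", "Giverny"}}
painter_b = {"birth": 1841, "death": 1919, "locations": {"Paris"}}
print(time_location_index(painter_a, painter_b))  # 78 shared years * 1/2 = 39.0
```

With this choice, two painters who shared both of two locations for 10 overlapping years get the same value as two painters who shared a single location for 10 years, matching the "5 and 5 years" argument above.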
The initial network is constructed in NetworkX. It is a generally slow library and nested iterations can take long, so I improved performance by doing computations in numpy before iterating wherever possible.
Nodes with sufficient location and temporal data are added while constructing the network, then edges are added based on the above formula (computed only for pairs of nodes that can potentially have an edge based on their lifetimes): if the index is above 0, an edge is added, with the index as its weight. We will later filter the network based on the edge weight.
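The construction step can be sketched as follows. This is a simplified illustration under stated assumptions, not the notebook's code: `build_painter_graph`, the record layout and the stand-in weight function `shared_years` are hypothetical. It shows the idea of using numpy broadcasting to find pairs with overlapping lifetimes first, so the Python loop only visits candidate pairs.

```python
import networkx as nx
import numpy as np


def shared_years(a, b):
    """Stand-in weight (assumption): overlapping years if any location is shared."""
    overlap = min(a["death"], b["death"]) - max(a["birth"], b["birth"])
    return float(overlap) if overlap > 0 and a["locations"] & b["locations"] else 0.0


def build_painter_graph(painters, index_fn=shared_years):
    """painters: list of dicts with 'name', 'birth', 'death', 'locations'."""
    birth = np.array([p["birth"] for p in painters])
    death = np.array([p["death"] for p in painters])
    # Pairwise lifetime overlap in years, computed for all pairs at once.
    overlap = np.minimum.outer(death, death) - np.maximum.outer(birth, birth)
    # Keep each unordered pair once (i < j), and only if the lifetimes overlap.
    i_idx, j_idx = np.nonzero(np.triu(overlap > 0, k=1))

    G = nx.Graph()
    G.add_nodes_from(p["name"] for p in painters)
    for i, j in zip(i_idx, j_idx):
        w = index_fn(painters[i], painters[j])
        if w > 0:  # an edge is added only for a positive index
            G.add_edge(painters[i]["name"], painters[j]["name"], weight=w)
    return G


# Hypothetical records, used only to show the call.
records = [
    {"name": "A", "birth": 1840, "death": 1900, "locations": {"Paris"}},
    {"name": "B", "birth": 1860, "death": 1920, "locations": {"Paris"}},
    {"name": "C", "birth": 1500, "death": 1560, "locations": {"Paris"}},
]
G = build_painter_graph(records)
print(G.edges(data=True))
```

The pairwise matrix needs memory quadratic in the number of painters, which is manageable for a few thousand nodes but would need chunking for much larger datasets.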
The filtered network is stored in the data\painters.graphml file (see the filtering for backbone extraction below).
Incorrect entries are cleaned for consistency.
To get rid of unrealistic edges, we need to filter the network. I test and compare the results of two methods: thresholding weights, and disparity filter.
For the subsequent analysis, I select the network filtered with the disparity filter method, stored in the data\painters.graphml file.
From the plots below, we can see that the best values for thresholding are in the interval [5, 20]:
The fractions of nodes and edges kept for different thresholds:
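For reference, the fractions of nodes and edges kept by weight thresholding can be computed with a short sketch like the one below. The helper names are hypothetical, and two choices are assumptions rather than details confirmed by the project: the threshold is inclusive (`>=`), and nodes left without any edge are dropped.

```python
import networkx as nx


def threshold_filter(G, min_weight, weight="weight"):
    """Keep only edges whose weight is at least min_weight (isolated nodes are dropped)."""
    kept = [(u, v) for u, v, w in G.edges(data=weight) if w >= min_weight]
    return G.edge_subgraph(kept).copy()


def kept_fractions(G, thresholds):
    """Return (threshold, fraction of nodes kept, fraction of edges kept) tuples."""
    rows = []
    for t in thresholds:
        H = threshold_filter(G, t)
        rows.append((t,
                     H.number_of_nodes() / G.number_of_nodes(),
                     H.number_of_edges() / G.number_of_edges()))
    return rows


# Toy graph, used only to show the call.
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 3), ("b", "c", 10), ("c", "d", 25)])
for t, node_frac, edge_frac in kept_fractions(G, [5, 20]):
    print(t, node_frac, round(edge_frac, 3))
```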
Every node has a local (asymmetric) measure for the weights of its edges. This helps to keep more edges with low weight, but with higher local importance - the expected range of weights could differ greatly throughout the network, e.g. artists living in Paris will always have many and strong connections, but other artists may not have so many high weight connections.
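The following measure is used, composed of the node's strength (sum of weights) and degree, and the weight of the edge. For each edge, we define two disparity values - one for each start node, i.e. direction (the network is undirected, but the measure can still be asymmetric). In the standard form of the disparity filter, the probability that an edge from $i$ to $j$ carries a weight this large by chance is

$$\alpha_{ij} = \left(1 - \frac{w_{ij}}{s_i^{\text{out}}}\right)^{k_i^{\text{out}} - 1}$$

where $w_{ij}$ is the weight of the edge, and $s_i^{\text{out}}$ and $k_i^{\text{out}}$ are the out-strength and out-degree of node $i$ (simply its strength and degree in an undirected network).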
(One explanation for using out-degree and out-strength in the general formula could be that hubs typically have many incoming edges but few outgoing edges, and the incoming edges may carry little importance - as with hyperlinks on the WWW.)
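A minimal sketch of a disparity filter for an undirected NetworkX graph is shown below. It follows the standard formulation, where the disparity value of an edge seen from node $i$ is $(1 - w_{ij}/s_i)^{k_i - 1}$, with $s_i$ the strength and $k_i$ the degree of $i$. Two conventions here are assumptions and may differ from the notebook: an edge is kept if it is significant from at least one of its endpoints, and endpoints with degree 1 are not evaluated.

```python
import networkx as nx


def disparity_filter(G, alpha, weight="weight"):
    """Return the backbone of G: edges whose disparity value is below alpha
    for at least one endpoint (isolated nodes are dropped)."""
    strength = dict(G.degree(weight=weight))  # sum of incident edge weights
    keep = []
    for u, v, w in G.edges(data=weight):
        for node in (u, v):
            k = G.degree(node)
            if k <= 1:
                continue  # the measure is not informative for degree-1 nodes
            # Chance of a weight share this large under a uniform null model.
            p = (1 - w / strength[node]) ** (k - 1)
            if p < alpha:
                keep.append((u, v))
                break
    return G.edge_subgraph(keep).copy()


# Toy star graph: one strong edge and four weak ones around a hub.
G = nx.Graph()
G.add_weighted_edges_from(
    [("hub", "big", 100)] + [("hub", f"leaf{i}", 1) for i in range(4)]
)
backbone = disparity_filter(G, alpha=0.1)
print(list(backbone.edges(data="weight")))
```

A smaller `alpha` is stricter and keeps fewer edges, while a value close to 1 keeps almost everything.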
We set the disparity threshold to 0.7 and 0.1 to get roughly the same number of edges as with the thresholding method.
With the disparity filter, the decrease in edges and nodes is much straighter (i.e. more evenly balanced).
There is a phenomenon that many nodes have either 0 or 1 as clustering coefficient - for the disparity filter method, there is a higher concentration of nodes with clustering coefficient 0, especially with
The decrease in the number of nodes with clustering coefficient 1 is much more significant. This might be due to groups of artists that are connected only to each other because they lived at a single place: such artists are linked through their one and only location and hence have a clustering coefficient of 1. The thresholding method keeps these connections because they are strong, but the disparity method has less of a tendency to do so, as connections are evaluated locally.
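The share of nodes at the two extremes of the clustering coefficient can be checked with a small helper like this one (a sketch with a hypothetical function name; it uses the unweighted local clustering coefficient from NetworkX, which is an assumption about how the notebook measures it).

```python
import networkx as nx


def clustering_extremes(G):
    """Return the fractions of nodes with local clustering coefficient 0 and 1."""
    cc = nx.clustering(G)  # dict: node -> unweighted local clustering coefficient
    n = len(cc)
    zero = sum(1 for c in cc.values() if c == 0) / n
    one = sum(1 for c in cc.values() if c == 1) / n
    return zero, one


# Toy graph: a triangle a-b-c with one pendant node d attached to a.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
print(clustering_extremes(G))  # d has coefficient 0, b and c have coefficient 1
```

Running it on each filtered network allows the two filtering methods to be compared on the same footing.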
The development of the network's clustering coefficient (CC) distribution via filtering is interesting. The first method with threshold 5 gives a similar CC distribution as the disparity method with
When increasing the threshold and decreasing the
For the analysis, I used the network generated by the disparity filter with
The network is highly assortative, and exhibits rich-club behaviour at high degrees.
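Both quantities are available in NetworkX. The sketch below uses a hypothetical wrapper and the built-in karate club graph as a stand-in for the painter network. It requests the unnormalized rich-club coefficient, which is an assumption: normalizing against randomized networks is also possible but relies on random rewiring.

```python
import networkx as nx


def assortativity_and_rich_club(G):
    """Return the degree assortativity and the unnormalized rich-club coefficients."""
    # Pearson correlation of the degrees at the two ends of each edge.
    r = nx.degree_assortativity_coefficient(G)
    # phi(k): density of links among nodes with degree larger than k.
    phi = nx.rich_club_coefficient(G, normalized=False)
    return r, phi


# Stand-in graph; replace with the filtered painter network.
G = nx.karate_club_graph()
r, phi = assortativity_and_rich_club(G)
print(round(r, 3))
print({k: round(v, 3) for k, v in sorted(phi.items())})
```

A positive `r` means that high-degree nodes tend to connect to other high-degree nodes, and values of `phi` that grow with `k` indicate a rich-club effect that strengthens with degree.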
Alpha values for power-law fits are rather high: these two plots are for the thresholds at 5 and 20, having alpha values 12 and 5, respectively, much higher than the typical [2,3] range.
The power-law fit is much more realistic here:
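As a rough cross-check of such fits, the exponent can be estimated directly from the degree sequence. The sketch below uses the common approximate maximum-likelihood estimator for discrete data, $\hat{\alpha} \approx 1 + n / \sum_i \ln\frac{x_i}{x_{\min} - 1/2}$. This is an assumption about the method; the notebook may use a dedicated fitting package and a different choice of $x_{\min}$.

```python
import math


def powerlaw_alpha(degrees, xmin=1):
    """Approximate MLE of the power-law exponent for integer data >= xmin."""
    tail = [d for d in degrees if d >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(d / (xmin - 0.5)) for d in tail)
    return 1 + len(tail) / log_sum


# A degree sequence concentrated just above xmin gives a large exponent.
narrow = [10, 10, 11, 11, 12, 12, 13]
# A broad, heavy-tailed sequence gives a much smaller one.
broad = [10, 12, 15, 20, 40, 90, 300]
print(round(powerlaw_alpha(narrow, xmin=10), 2))
print(round(powerlaw_alpha(broad, xmin=10), 2))
```

This illustrates why a network whose degrees are squeezed into a narrow range can produce very high fitted exponents.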
| Nationality | Female representation (%) |
|---|---|
| American | 15.92% |
| British | 14.13% |
| Italian | 9.27% |
| German | 8.62% |
| Polish | 6.67% |
| French | 6.00% |
| Dutch | 5.76% |
| Flemish | 5.00% |
| Russian | 2.85% |
| Hungarian | 0.0% |
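A table like the one above can be derived from artists.csv with pandas. The sketch below assumes the `Nationality` and `gender` columns shown in the data sample. Splitting multi-valued nationalities on commas and counting a painter once per nationality is an assumption, as are the function name and the `min_count` parameter.

```python
import pandas as pd


def female_share_by_nationality(df, min_count=1):
    """Percentage of female painters per nationality, sorted in descending order."""
    rows = df.dropna(subset=["Nationality", "gender"]).copy()
    # Assumption: "French,Jewish,Israeli" counts once for each listed nationality.
    rows["Nationality"] = rows["Nationality"].str.split(",")
    rows = rows.explode("Nationality")
    rows["Nationality"] = rows["Nationality"].str.strip()
    rows["is_female"] = rows["gender"].eq("female")

    grouped = rows.groupby("Nationality")["is_female"]
    share = grouped.mean() * 100
    # Optionally ignore nationalities with too few painters.
    share = share[grouped.size() >= min_count]
    return share.sort_values(ascending=False).round(2)


# Toy data, used only to show the call; load data/artists.csv for real values.
toy = pd.DataFrame({
    "Nationality": ["American", "American", "French", "French,Jewish", None],
    "gender": ["female", "male", "male", "female", "male"],
})
print(female_share_by_nationality(toy))
```

Replacing the `is_female` column with `wikiart_pictures_count` and keeping the group mean would give the average number of WikiArt paintings per nationality in the same way.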
| Nationality | Average no. WikiArt paintings |
|---|---|
| Russian | 198.66 |
| Spanish | 190.48 |
| Italian School of Paintings | 185.00 |
| Colombian | 177.00 |
| French | 148.51 |
| Austrian | 130.25 |
| Hungarian | 72.57 |
| German | 69.22 |
| British | 67.04 |
The method used is the nested stochastic block model (nested SBM), a hierarchical version of the SBM that also finds connections at higher levels, between communities (this can be nicely visualized on the plot).
As the implementation is only available in the graph-tool library, which is not available in this environment or on Windows in general, I ran the community detection in the separate sbm.ipynb notebook.
To analyze the network of painters, I created a network from painter location data, defining a time-location similarity index based on the time and location data of painters, and adding an edge between any two painters with a similarity above 0. Then I tried and compared two different network filtering methods to get rid of unrealistic edges.
Selecting the thinner backbone of the network for analysis, I first checked typical network measures, compared them to expectations, and explained possible causes for deviations.
I then ran a thorough global analysis of painter attributes, such as female representation and nationality, and checked assortativity and rich-club behavior in the network.
Finally, I used a community detection algorithm to extract communities in the network, and looked at the differences among the groups.
Some of the findings:
- The complete network is a longitudinal chain of networks, which makes it structurally different from typical social networks. This is because it is a historical network, and entities (artists) can only gain connections in a tight window of the network. The degree distribution still resembles a power law, however with a very high $\alpha$ value. Basic centrality measures in this network are biased. Taking a snapshot of the network at a certain time would give more typical results for various measures.
- Women are on average underrepresented in the network, but their representation has increased over time.
- Among nationalities, Russian artists have the highest average picture count on WikiArt. Among communities, a group of French and American late impressionists, French academic art painters, and realists, have the highest average WikiArt picture count.
- Females are most common in a 20th-century community consisting of American and Japanese artists, and the dataset has a high representation of female artists among US artists.
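The snapshot idea from the first finding can be sketched as follows: keep only the painters who were alive in a given year and analyze the induced subgraph. The node attribute names `birth_year` and `death_year` are hypothetical and may not match the attributes stored in the GraphML file.

```python
import networkx as nx


def snapshot(G, year, birth="birth_year", death="death_year"):
    """Induced subgraph of painters alive in the given year."""
    alive = [
        n for n, d in G.nodes(data=True)
        if d.get(birth) is not None and d.get(death) is not None
        and d[birth] <= year <= d[death]
    ]
    return G.subgraph(alive).copy()


# Toy graph, used only to show the call.
G = nx.Graph()
G.add_node("A", birth_year=1840, death_year=1900)
G.add_node("B", birth_year=1860, death_year=1920)
G.add_node("C", birth_year=1905, death_year=1980)
G.add_edges_from([("A", "B"), ("B", "C")])

S = snapshot(G, 1880)
print(sorted(S.nodes()), list(S.edges()))
```

Centrality measures computed on such a snapshot compare only contemporaries, which avoids the bias from painters who could never have met.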
The methods we used helped to gain insights about the network of painters, and can be inspiring for further research in art history, or for different network science projects.











