jupyter

jupytext

kernelspec

language_info

plotly

notebook_metadata_filter

text_representation

all

extension	format_name	format_version	jupytext_version
.md	markdown	1.2	1.3.0

display_name	language	name
Python 3	python	python3

codemirror_mode

file_extension

mimetype

name

nbconvert_exporter

pygments_lexer

version

name	version
ipython	3

.py

text/x-python

python

ipython3

3.7.3

description	display_as	language	layout	name	order	page_type	permalink	thumbnail
How to use datashader to rasterize large datasets, and visualize the generated raster data with plotly.	scientific	python	base	Plotly and Datashader	20	u-guide	python/datashader/	thumbnail/datashader.jpg

datashader creates rasterized representations of large datasets for easier visualization, with a pipeline approach consisting of several steps: projecting the data on a regular grid, creating a color representation of the grid, etc.

Passing datashader rasters as a mapbox image layer

We visualize here the spatial distribution of taxi rides in New York City. A higher density is observed on major avenues. For more details about mapbox charts, see the mapbox layers tutorial. No mapbox token is needed here.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/uber-rides-data1.csv')
dff = df.query('Lat < 40.82').query('Lat > 40.70').query('Lon > -74.02').query('Lon < -73.91')

import datashader as ds
cvs = ds.Canvas(plot_width=1000, plot_height=1000)
agg = cvs.points(dff, x='Lon', y='Lat')
# agg is an xarray object, see http://xarray.pydata.org/en/stable/ for more details
coords_lat, coords_lon = agg.coords['Lat'].values, agg.coords['Lon'].values
# Corners of the image, which need to be passed to mapbox
coordinates = [[coords_lon[0], coords_lat[0]],
               [coords_lon[-1], coords_lat[0]],
               [coords_lon[-1], coords_lat[-1]],
               [coords_lon[0], coords_lat[-1]]]

from colorcet import fire
import datashader.transfer_functions as tf
img = tf.shade(agg, cmap=fire)[::-1].to_pil()

import plotly.express as px
# Trick to create rapidly a figure with mapbox axes
fig = px.scatter_mapbox(dff[:1], lat='Lat', lon='Lon', zoom=12)
# Add the datashader image as a mapbox layer image
fig.update_layout(mapbox_style="carto-darkmatter",
                 mapbox_layers = [
                {
                    "sourcetype": "image",
                    "source": img,
                    "coordinates": coordinates
                }]
)
fig.show()

Exploring correlations of a large dataset

Here we explore the flight delay dataset from https://www.kaggle.com/usdot/flight-delays. In order to get a visual impression of the correlation between features, we generate a datashader rasterized array which we plot using a Heatmap trace. It creates a much clearer visualization than a scatter plot of (even a fraction of) the data points, as shown below.

Note that instead of datashader it would theoretically be possible to create a 2d histogram with plotly but this is not recommended here because you would need to load the whole dataset (5M rows !) in the browser for plotly.js to compute the heatmap, which is practically not tractable. Datashader offers the possibility to reduce the size of the dataset before passing it to the browser.

import plotly.graph_objects as go
import pandas as pd
import numpy as np
import datashader as ds
df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet')
fig = go.Figure(go.Scattergl(x=df['SCHEDULED_DEPARTURE'][::200],
                             y=df['DEPARTURE_DELAY'][::200],
                             mode='markers')
)
fig.update_layout(title_text='A busy plot')
fig.show()

import plotly.express as px
import pandas as pd
import numpy as np
import datashader as ds
df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet')

cvs = ds.Canvas(plot_width=100, plot_height=100)
agg = cvs.points(df, 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY')
zero_mask = agg.values == 0
agg.values = np.log10(agg.values, where=np.logical_not(zero_mask))
agg.values[zero_mask] = np.nan
fig = px.imshow(agg, origin='lower', labels={'color':'Log10(count)'})
fig.update_traces(hoverongaps=False)
fig.update_layout(coloraxis_colorbar=dict(title='Count', tickprefix='1.e'))
fig.show()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datashader.md

datashader.md

Passing datashader rasters as a mapbox image layer

Exploring correlations of a large dataset

Files

datashader.md

Latest commit

History

datashader.md

File metadata and controls

Passing datashader rasters as a mapbox image layer

Exploring correlations of a large dataset