@@ -156,21 +156,21 @@ pd.set_option("plotting.backend", "plotly")

## Domain specific pandas extensions

- ### [Geopandas](https://github.com/geopandas/geopandas)
+ #### [Geopandas](https://github.com/geopandas/geopandas)

Geopandas extends pandas data objects to include geographic information,
which supports geometric operations. If your work entails maps and
geographical coordinates, and you love pandas, you should take a close
look at Geopandas.
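A minimal sketch of the idea (the city names and coordinates below are invented for illustration, assuming `geopandas` and `shapely` are installed):

```python
import geopandas as gpd
from shapely.geometry import Point

# A GeoDataFrame is a pandas DataFrame plus a geometry column
gdf = gpd.GeoDataFrame(
    {"city": ["A", "B"]},
    geometry=[Point(0, 0), Point(3, 4)],
)

# Geometric operations apply column-wise, like ordinary pandas ops
gdf["dist_from_origin"] = gdf.distance(Point(0, 0))
```

The result is still a pandas-style object, so the usual indexing, grouping, and plotting workflows carry over.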
- ### [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas)
+ #### [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas)

gurobipy-pandas provides a convenient accessor API to connect pandas with
gurobipy. It enables users to more easily and efficiently build mathematical
optimization models from data stored in DataFrames and Series, and to read
solutions back directly as pandas objects.
- ### [Hail Query](https://hail.is/)
+ #### [Hail Query](https://hail.is/)

An out-of-core, preemptible-safe, distributed dataframe library serving
the genetics community. Hail Query ships with on-disk data formats,
@@ -185,14 +185,14 @@ native import to and export from pandas DataFrames:

- [`Table.from_pandas`](https://hail.is/docs/latest/hail.Table.html#hail.Table.from_pandas)
- [`Table.to_pandas`](https://hail.is/docs/latest/hail.Table.html#hail.Table.to_pandas)
- ### [staircase](https://github.com/staircase-dev/staircase)
+ #### [staircase](https://github.com/staircase-dev/staircase)

staircase is a data analysis package, built upon pandas and numpy, for modelling and
manipulation of mathematical step functions. It provides a rich variety of arithmetic
operations, relational operations, logical operations, statistical operations and
aggregations for step functions defined over real numbers, datetime and timedelta domains.
- ### [xarray](https://github.com/pydata/xarray)
+ #### [xarray](https://github.com/pydata/xarray)

xarray brings the labeled data power of pandas to the physical sciences
by providing N-dimensional variants of the core pandas data structures.
@@ -203,7 +203,7 @@ which pandas excels.

## Data IO for pandas
- ### [ArcticDB](https://github.com/man-group/ArcticDB)
+ #### [ArcticDB](https://github.com/man-group/ArcticDB)

ArcticDB is a serverless DataFrame database engine designed for the Python Data Science ecosystem.
ArcticDB enables you to store, retrieve, and process pandas DataFrames at scale.
@@ -213,21 +213,21 @@ to object storage and can be installed in seconds.

Please find full documentation [here](https://docs.arcticdb.io/latest/).
- ### [BCPandas](https://github.com/yehoshuadimarsky/bcpandas)
+ #### [BCPandas](https://github.com/yehoshuadimarsky/bcpandas)

BCPandas provides high performance writes from pandas to Microsoft SQL Server,
far exceeding the performance of the native `df.to_sql` method. Internally, it uses
Microsoft's BCP utility, but the complexity is fully abstracted away from the end user.
Rigorously tested, it is a complete replacement for `df.to_sql`.
- ### [Deltalake](https://pypi.org/project/deltalake)
+ #### [Deltalake](https://pypi.org/project/deltalake)

The Deltalake Python package lets you access tables stored in
[Delta Lake](https://delta.io/) natively in Python without the need to use Spark or
the JVM. It provides the `delta_table.to_pyarrow_table().to_pandas()` method to convert
any Delta table into a pandas DataFrame.
- ### [fredapi](https://github.com/mortada/fredapi)
+ #### [fredapi](https://github.com/mortada/fredapi)

fredapi is a Python interface to the [Federal Reserve Economic Data
(FRED)](https://fred.stlouisfed.org/) provided by the Federal Reserve
@@ -239,7 +239,7 @@ point-in-time data from ALFRED. fredapi makes use of pandas and returns
data in a Series or DataFrame. This module requires a FRED API key that
you can obtain for free on the FRED website.
- ### [Hugging Face](https://huggingface.co/datasets)
+ #### [Hugging Face](https://huggingface.co/datasets)

The Hugging Face Dataset Hub provides a large collection of ready-to-use
datasets for machine learning shared by the community. The platform offers
@@ -274,7 +274,7 @@ df.to_parquet("hf://datasets/username/dataset_name/train.parquet")

You can find more information about the Hugging Face Dataset Hub in the [documentation](https://huggingface.co/docs/hub/en/datasets).
- ### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)
+ #### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.
@@ -297,7 +297,7 @@ df = npd.read_json(jsn)  # load a JSON-value as a `DataFrame`

df.equals(npd.read_json(df.npd.to_json(df)))  # `True` in any case, whether `table=True` or not
```
- ### [pandas-datareader](https://github.com/pydata/pandas-datareader)
+ #### [pandas-datareader](https://github.com/pydata/pandas-datareader)

`pandas-datareader` is a remote data access library for pandas
(PyPI: `pandas-datareader`). It is based on functionality that was
@@ -324,14 +324,14 @@ The following data feeds are available:

- Stooq Index Data
- MOEX Data
- ### [pandas-gbq](https://github.com/googleapis/python-bigquery-pandas)
+ #### [pandas-gbq](https://github.com/googleapis/python-bigquery-pandas)

pandas-gbq provides high performance reads and writes to and from
[Google BigQuery](https://cloud.google.com/bigquery/). Previously (before version 2.2.0),
these methods were exposed as `pandas.read_gbq` and `DataFrame.to_gbq`.
Use `pandas_gbq.read_gbq` and `pandas_gbq.to_gbq` instead.
- ### [pandaSDMX](https://pandasdmx.readthedocs.io)
+ #### [pandaSDMX](https://pandasdmx.readthedocs.io)

pandaSDMX is a library to retrieve and acquire statistical data and
metadata disseminated in [SDMX](https://sdmx.org) 2.1, an
@@ -344,7 +344,7 @@ MultiIndexed DataFrames.

## Scaling pandas
- ### [Bodo](https://github.com/bodo-ai/Bodo)
+ #### [Bodo](https://github.com/bodo-ai/Bodo)

Bodo is a high-performance compute engine for Python data processing.
Using an auto-parallelizing just-in-time (JIT) compiler, Bodo simplifies scaling Pandas
@@ -366,26 +366,26 @@ def process_data():

process_data()
```
- ### [Dask](https://docs.dask.org)
+ #### [Dask](https://docs.dask.org)

Dask is a flexible parallel computing library for analytics. Dask
provides a familiar `DataFrame` interface for out-of-core, parallel and
distributed computing.
- ### [Ibis](https://ibis-project.org/docs/)
+ #### [Ibis](https://ibis-project.org/docs/)

Ibis offers a standard way to write analytics code that can be run on
multiple engines. It helps in bridging the gap between local Python environments
(like pandas) and remote storage and execution systems like Hadoop components
(like HDFS, Impala, Hive, Spark) and SQL databases (Postgres, etc.).
- ### [Koalas](https://koalas.readthedocs.io/en/latest/)
+ #### [Koalas](https://koalas.readthedocs.io/en/latest/)

Koalas provides a familiar pandas DataFrame interface on top of Apache
Spark. It enables users to leverage multi-cores on one machine or a
cluster of machines to speed up or scale their DataFrame code.
- ### [Modin](https://github.com/modin-project/modin)
+ #### [Modin](https://github.com/modin-project/modin)

The `modin.pandas` DataFrame is a parallel and distributed drop-in replacement
for pandas. This means that you can use Modin with existing pandas code or write
@@ -404,21 +404,21 @@ df = pd.read_csv("big.csv") # use all your cores!

## Data cleaning and validation for pandas
- ### [Pandera](https://pandera.readthedocs.io/en/stable/)
+ #### [Pandera](https://pandera.readthedocs.io/en/stable/)

Pandera provides a flexible and expressive API for performing data validation on dataframes
to make data processing pipelines more readable and robust.
Dataframes contain information that pandera explicitly validates at runtime. This is useful in
production-critical data pipelines or reproducible research settings.
- ### [pyjanitor](https://github.com/pyjanitor-devs/pyjanitor)
+ #### [pyjanitor](https://github.com/pyjanitor-devs/pyjanitor)

Pyjanitor provides a clean API for cleaning data, using method chaining.
## Development tools for pandas

- ### [Hamilton](https://github.com/dagworks-inc/hamilton)
+ #### [Hamilton](https://github.com/dagworks-inc/hamilton)

Hamilton is a declarative dataflow framework that came out of Stitch Fix. It was
designed to help one manage a Pandas code base, specifically with respect to
@@ -436,13 +436,13 @@ This helps one to scale your pandas code base, at the same time, keeping mainten

For more information, see the [documentation](https://hamilton.readthedocs.io/).
- ### [IPython](https://ipython.org/documentation.html)
+ #### [IPython](https://ipython.org/documentation.html)

IPython is an interactive command shell and distributed computing
environment. IPython tab completion works with pandas methods and also
attributes like DataFrame columns.
- ### [Jupyter Notebook / Jupyter Lab](https://jupyter.org)
+ #### [Jupyter Notebook / Jupyter Lab](https://jupyter.org)

Jupyter Notebook is a web application for creating Jupyter notebooks. A
Jupyter notebook is a JSON document containing an ordered list of
@@ -460,7 +460,7 @@ or may not be compatible with non-HTML Jupyter output formats.)

See [Options and Settings](https://pandas.pydata.org/docs/user_guide/options.html)
for pandas `display.` settings.
- ### [marimo](https://marimo.io)
+ #### [marimo](https://marimo.io)

marimo is a reactive notebook for Python and SQL that enhances productivity
when working with dataframes. It provides several features to make data
@@ -479,7 +479,7 @@ manipulation and visualization more interactive and fun:

6. SQL integration: marimo allows users to write SQL queries against any
   pandas dataframes existing in memory.
- ### [pandas-stubs](https://github.com/VirtusLab/pandas-stubs)
+ #### [pandas-stubs](https://github.com/VirtusLab/pandas-stubs)

While the pandas repository is partially typed, the package itself doesn't expose this information for external use.
Install pandas-stubs to enable basic type coverage of the pandas API.
@@ -489,7 +489,7 @@ Learn more by reading through these issues [14468](https://github.com/pandas-dev

See installation and usage instructions on the [GitHub page](https://github.com/VirtusLab/pandas-stubs).
- ### [Spyder](https://www.spyder-ide.org/)
+ #### [Spyder](https://www.spyder-ide.org/)

Spyder is a cross-platform PyQt-based IDE combining the editing,
analysis, debugging and profiling functionality of a software
@@ -518,14 +518,14 @@ both automatically and on-demand.

## Other related libraries
- ### [Compose](https://github.com/alteryx/compose)
+ #### [Compose](https://github.com/alteryx/compose)

Compose is a machine learning tool for labeling data and prediction engineering.
It allows you to structure the labeling process by parameterizing
prediction problems and transforming time-driven relational data into
target values with cutoff times that can be used for supervised learning.
- ### [D-Tale](https://github.com/man-group/dtale)
+ #### [D-Tale](https://github.com/man-group/dtale)

D-Tale is a lightweight web client for visualizing pandas data structures. It
provides a rich spreadsheet-style grid which acts as a wrapper for a lot of
@@ -544,20 +544,20 @@ D-Tale integrates seamlessly with Jupyter notebooks, Python terminals, Kaggle
& Google Colab. Here are some demos of the
[grid](http://alphatechadmin.pythonanywhere.com/dtale/main/1).
- ### [Featuretools](https://github.com/alteryx/featuretools/)
+ #### [Featuretools](https://github.com/alteryx/featuretools/)

Featuretools is a Python library for automated feature engineering built
on top of pandas. It excels at transforming temporal and relational
datasets into feature matrices for machine learning using reusable
feature engineering "primitives". Users can contribute their own
primitives in Python and share them with the rest of the community.
- ### [IPython Vega](https://github.com/vega/ipyvega)
+ #### [IPython Vega](https://github.com/vega/ipyvega)

[IPython Vega](https://github.com/vega/ipyvega) leverages
[Vega](https://github.com/vega/vega) to create plots within Jupyter Notebook.
- ### [plotnine](https://github.com/has2k1/plotnine/)
+ #### [plotnine](https://github.com/has2k1/plotnine/)

Hadley Wickham's [ggplot2](https://ggplot2.tidyverse.org/) is a
foundational exploratory visualization package for the R language. Based
@@ -568,7 +568,7 @@ generate bespoke plots of any kind of data.

Implementations in various other languages are available.
A good implementation for Python users is [has2k1/plotnine](https://github.com/has2k1/plotnine/).
- ### [pygwalker](https://github.com/Kanaries/pygwalker)
+ #### [pygwalker](https://github.com/Kanaries/pygwalker)

PyGWalker is an interactive data visualization and
exploratory data analysis tool built upon Graphic Walker
@@ -582,7 +582,7 @@ import pygwalker as pyg

pyg.walk(df)
```
- ### [seaborn](https://seaborn.pydata.org)
+ #### [seaborn](https://seaborn.pydata.org)

Seaborn is a Python visualization library based on
[matplotlib](https://matplotlib.org). It provides a high-level,
@@ -599,13 +599,13 @@ import seaborn as sns

sns.set_theme()
```
- ### [skrub](https://skrub-data.org)
+ #### [skrub](https://skrub-data.org)

Skrub facilitates machine learning on dataframes. It bridges pandas
to scikit-learn and related libraries. In particular, it facilitates building
features from dataframes.
- ### [Statsmodels](https://www.statsmodels.org/)
+ #### [Statsmodels](https://www.statsmodels.org/)

Statsmodels is the prominent Python "statistics and econometrics
library" and it has a long-standing special relationship with pandas.
@@ -614,7 +614,7 @@ modeling functionality that is out of pandas' scope. Statsmodels
leverages pandas objects as the underlying data container for
computation.
- ### [STUMPY](https://github.com/TDAmeritrade/stumpy)
+ #### [STUMPY](https://github.com/TDAmeritrade/stumpy)

STUMPY is a powerful and scalable Python library for modern time series analysis.
At its core, STUMPY efficiently computes something called a