crate · amotl · Sep 14, 2025
diff --git a/docs/connect/df/index.md b/docs/connect/df/index.md
@@ -40,50 +40,10 @@ the Python libraries that you know and love, like NumPy, pandas, and scikit-lear
 - [Dask code examples]
 
 
-(pandas)=
 ## pandas
-
-:::{rubric} About
-:::
-
-```{div}
-:style: "float: right"
-[![](https://pandas.pydata.org/static/img/pandas.svg){w=180px}](https://pandas.pydata.org/)
-```
-
-[pandas] is a fast, powerful, flexible, and easy-to-use open-source data analysis
-and manipulation tool, built on top of the Python programming language. 
-
-Pandas (stylized as pandas) is a software library written for the Python programming
-language for data manipulation and analysis. In particular, it offers data structures
-and operations for manipulating numerical tables and time series.
-
-:::{rubric} Data Model
-:::
-- Pandas is built around data structures called Series and DataFrames. Data for these
-  collections can be imported from various file formats such as comma-separated values,
-  JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
-- A Series is a 1-dimensional data structure built on top of NumPy's array.
-- Pandas includes support for time series, such as the ability to interpolate values
-  and filter using a range of timestamps.
-- By default, a Pandas index is a series of integers ascending from 0, similar to the
-  indices of Python arrays. However, indices can use any NumPy data type, including
-  floating point, timestamps, or strings.
-- Pandas supports hierarchical indices with multiple values per data point. An index
-  with this structure, called a "MultiIndex", allows a single DataFrame to represent
-  multiple dimensions, similar to a pivot table in Microsoft Excel. Each level of a
-  MultiIndex can be given a unique name.
-
-```{div}
-:style: "clear: both"
-```
-
-:::{rubric} Learn
+:::{seealso}
+Please navigate to the dedicated page about {ref}`pandas`.
 :::
-- [Guide to efficient data ingestion to CrateDB with pandas]
-- [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]
-- [pandas code examples]
-- [From data storage to data analysis: Tutorial on CrateDB and pandas]
 
 
 ## Polars
@@ -96,13 +56,11 @@ Please navigate to the dedicated page about {ref}`polars`.
 [Dask]: https://www.dask.org/
 [Dask DataFrames]: https://docs.dask.org/en/latest/dataframe.html
 [Dask Futures]: https://docs.dask.org/en/latest/futures.html
-[pandas]: https://pandas.pydata.org/
+[Polars]: https://pola.rs/
 
 [Dask code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/dask
 [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html
-[From data storage to data analysis: Tutorial on CrateDB and pandas]: https://community.cratedb.com/t/from-data-storage-to-data-analysis-tutorial-on-cratedb-and-pandas/1440
-[Guide to efficient data ingestion to CrateDB with pandas]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas/1541
 [Guide to efficient data ingestion to CrateDB with pandas and Dask]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas-and-dask/1482
 [Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
 [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]: https://community.cratedb.com/t/importing-parquet-files-into-cratedb-using-apache-arrow-and-sqlalchemy/1161
-[pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas
+[Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars
diff --git a/docs/integrate/index.md b/docs/integrate/index.md
@@ -56,6 +56,7 @@ n8n/index
 nifi/index
 node-red/index
 oracle/index
+pandas/index
 plotly/index
 polars/index
 postgresql/index

diff --git a/docs/integrate/pandas/efficient-ingest.md b/docs/integrate/pandas/efficient-ingest.md
@@ -0,0 +1,57 @@
+(pandas-efficient-ingest)=
+# Guide to efficient data ingestion to CrateDB with pandas
+
+## Introduction
+Bulk insert is a technique for efficiently inserting large amounts of data into a database by submitting multiple rows in a single operation. Instead of executing a separate SQL `INSERT` statement for each row, a bulk insert lets the database process and store a whole batch of rows at once. This approach can significantly improve insert performance, especially for large datasets.
+
+In this tutorial, you will learn how to efficiently perform [bulk inserts](https://crate.io/docs/python/en/latest/by-example/sqlalchemy/dataframe.html) into CrateDB with [pandas](https://pandas.pydata.org/) using the `insert_bulk` method, available in the `crate` Python library. To follow along with this tutorial, you should have the following:
+
+* A working installation of CrateDB. To get started with CrateDB, check [this link](https://crate.io/lp-free-trial?hsCtaTracking=c2099713-cafa-4de6-a97e-2f86d80a788f%7C3a12b78e-e605-461c-9bd8-628d0d9e2522).
+* Python, pandas, SQLAlchemy, and the [crate driver](https://pypi.org/project/crate/) installed on your machine
+* Basic familiarity with pandas and SQL
+
+## Bulk insert to CrateDB
+
+The following example illustrates how to implement a batch insert with pandas by using the `insert_bulk` method available in the `crate` driver.
+
+```python
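+import sqlalchemy as sa
+from crate.client.sqlalchemy.support import insert_bulk
+# Private pandas testing helper; it may be unavailable in newer pandas releases.
+from pandas._testing import makeTimeDataFrame
+
+INSERT_RECORDS = 5_000_000  # total number of rows to generate
+CHUNK_SIZE = 50_000  # number of rows submitted per batch
+
+# Generate a DataFrame with a one-second time index (`freq="S"`).
+df = makeTimeDataFrame(nper=INSERT_RECORDS, freq="S")
+# Connect to CrateDB on localhost, port 4200.
+engine = sa.create_engine("crate://localhost:4200")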
+
+df.to_sql(
+    name="cratedb-demo",
+    con=engine,
+    if_exists="replace",
+    index=False,
+    chunksize=CHUNK_SIZE,
+    method=insert_bulk,
+)
+```
+
+Running this code generates a DataFrame with a time-based index and 5,000,000 rows, one row per second (`freq="S"`). The `to_sql()` method then inserts the DataFrame into the `cratedb-demo` table in CrateDB. Because `index=False`, the timestamp index itself is not written to the table. If the table already exists, it is replaced with the new data. The insert runs in batches of 50,000 records each; setting the `chunksize` parameter helps manage memory and improves performance during insertion.
+
+The above code runs in approximately 14 seconds on a local Mac M1 machine with 16 GiB RAM. By comparison, setting the `method` parameter to `None` (one insert per row) increases the execution time to 27 seconds.
+
+## How to find the right chunksize
+
+Determining the right chunksize depends on several factors, such as the size of your data, the number of columns in your data set, and the available memory of your machine.
+
+The `chunksize` parameter in the `to_sql()` method controls the number of rows inserted in each batch. By default, `chunksize=None`, which means the entire DataFrame will be written to the database at once. However, when working with large datasets, it is recommended to set a smaller `chunksize` value to avoid memory issues and to improve the performance of the data insertion.
+
+To determine the right `chunksize` value, you can try different values and observe the memory usage and the time it takes to complete the data insertion. A good starting point is to set the `chunksize` value to a fraction of the total number of rows in your DataFrame. For example, you can start with a `chunksize` value of 10,000 or 50,000 rows and see how it performs. If the data insertion is slow, you can try increasing the `chunksize` value to reduce the number of batches. On the other hand, if you encounter memory issues, you can try reducing the `chunksize` value.
+
+## Conclusion
+
+Congratulations! You have learned how to implement an efficient data insert into CrateDB using pandas and the `insert_bulk` method. This method allows for efficient and fast data insertion, making it suitable for handling large datasets.
+
+If you liked this tutorial and want to explore more CrateDB functionality, please visit our [documentation](https://crate.io/docs) and join our [community](https://community.cratedb.com/).
diff --git a/docs/integrate/pandas/index.md b/docs/integrate/pandas/index.md
@@ -0,0 +1,59 @@
+(pandas)=
+# pandas
+
+```{div}
+:style: "float: right"
+[![](https://pandas.pydata.org/static/img/pandas.svg){w=180px}](https://pandas.pydata.org/)
+```
+```{div} .clearfix
+```
+
+:::{rubric} About
+:::
+
+[pandas] is a fast, powerful, flexible, and easy-to-use open-source data analysis
+and manipulation tool, built on top of the Python programming language. 
+
+Pandas (stylized as pandas) is a software library written for the Python programming
+language for data manipulation and analysis. In particular, it offers data structures
+and operations for manipulating numerical tables and time series.
+
+:::{rubric} Data Model
+:::
+- Pandas is built around data structures called Series and DataFrames. Data for these
+  collections can be imported from various file formats such as comma-separated values,
+  JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
+- A Series is a 1-dimensional data structure built on top of NumPy's array.
+- Pandas includes support for time series, such as the ability to interpolate values
+  and filter using a range of timestamps.
+- By default, a Pandas index is a series of integers ascending from 0, similar to the
+  indices of Python arrays. However, indices can use any NumPy data type, including
+  floating point, timestamps, or strings.
+- Pandas supports hierarchical indices with multiple values per data point. An index
+  with this structure, called a "MultiIndex", allows a single DataFrame to represent
+  multiple dimensions, similar to a pivot table in Microsoft Excel. Each level of a
+  MultiIndex can be given a unique name.
+
+
+:::{rubric} Learn
+:::
+- {ref}`pandas-tutorial-start`
+- {ref}`pandas-tutorial-jupyter`
+- {ref}`arrow-import-parquet`
+- {ref}`pandas-efficient-ingest`
+- [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]
+- [pandas code examples]
+
+
+:::{toctree}
+:maxdepth: 1
+:hidden:
+Starter tutorial <tutorial-start>
+Jupyter tutorial <tutorial-jupyter>
+Efficient ingest <efficient-ingest>
+:::
+
+
+[Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html
+[pandas]: https://pandas.pydata.org/
+[pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas