Closed
29 changes: 25 additions & 4 deletions .github/workflows/main.yaml
@@ -23,8 +23,13 @@ jobs:
with:
fetch-depth: 0

- name: Fetch upstream tags
run: |
git remote add upstream https://github.com/dask/fastparquet.git
git fetch upstream --tags

- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
uses: conda-incubator/setup-miniconda@v3
with:
environment-file: ci/environment-${{ matrix.CONDA_ENV }}.yml

@@ -53,8 +58,13 @@ jobs:
with:
fetch-depth: 0

- name: Fetch upstream tags
run: |
git remote add upstream https://github.com/dask/fastparquet.git
git fetch upstream --tags

- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
uses: conda-incubator/setup-miniconda@v3
with:
environment-file: ci/environment-${{ matrix.CONDA_ENV }}.yml

@@ -82,8 +92,13 @@ jobs:
with:
fetch-depth: 0

- name: Fetch upstream tags
run: |
git remote add upstream https://github.com/dask/fastparquet.git
git fetch upstream --tags

- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
uses: conda-incubator/setup-miniconda@v3
with:
environment-file: ci/environment-py310.yml

@@ -94,6 +109,7 @@ jobs:
pip install hypothesis
pip install pytest-localserver pytest-xdist pytest-asyncio
pip install -e . --no-deps # Install fastparquet
pip install versioneer # Needed for pandas build
git clone https://github.com/pandas-dev/pandas
cd pandas
python setup.py build_ext -j 4
@@ -117,8 +133,13 @@ jobs:
with:
fetch-depth: 0

- name: Fetch upstream tags
run: |
git remote add upstream https://github.com/dask/fastparquet.git
git fetch upstream --tags

- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
uses: conda-incubator/setup-miniconda@v3
with:
environment-file: ci/environment-py310win.yml

2 changes: 1 addition & 1 deletion .github/workflows/test_wheel.yaml
@@ -58,7 +58,7 @@ jobs:
python -m pip install delvewheel cython

- name: Build wheels
uses: joerick/cibuildwheel@v2.16.5
uses: joerick/cibuildwheel@v2.21.3

- name: Install wheels
shell: bash -l {0}
12 changes: 6 additions & 6 deletions .github/workflows/wheel.yml
@@ -50,7 +50,7 @@ jobs:
python -m pip install delvewheel cython

- name: Build wheels
uses: joerick/cibuildwheel@v2.18.1
uses: joerick/cibuildwheel@v2.21.3

- uses: actions/upload-artifact@v3
with:
@@ -105,7 +105,7 @@ jobs:
python -m pip install delvewheel cython

- name: Build wheels
uses: joerick/cibuildwheel@v2.18.1
uses: joerick/cibuildwheel@v2.21.3

- uses: actions/upload-artifact@v3
with:
@@ -160,7 +160,7 @@ jobs:
python -m pip install delvewheel cython

- name: Build wheels
uses: joerick/cibuildwheel@v2.18.1
uses: joerick/cibuildwheel@v2.21.3

- uses: actions/upload-artifact@v3
with:
@@ -215,7 +215,7 @@ jobs:
python -m pip install delvewheel cython

- name: Build wheels
uses: joerick/cibuildwheel@v2.18.1
uses: joerick/cibuildwheel@v2.21.3

- uses: actions/upload-artifact@v3
with:
@@ -246,10 +246,10 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version: "3.13"

- name: Build wheels
uses: joerick/cibuildwheel@v2.18.1
uses: joerick/cibuildwheel@v2.21.3

- uses: actions/upload-artifact@v3
with:
1 change: 0 additions & 1 deletion ci/environment-py310.yml
@@ -18,6 +18,5 @@ dependencies:
- orjson
- ujson
- python-rapidjson
- versioneer
- meson-python
- pyarrow
4 changes: 2 additions & 2 deletions docs/source/details.rst
@@ -194,7 +194,7 @@ split data data on the values of those columns. This is done by writing a
directory structure with *key=value* names. Multiple partition columns can
be chosen, leading to a multi-level directory tree.

Consider the following directory tree from this `Spark example <http://Spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery>`_:
Consider the following directory tree from this `Spark example <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery>`_:

table/
gender=male/
@@ -261,4 +261,4 @@ the file system implementation.
.. raw:: html

<script data-goatcounter="https://fastparquet.goatcounter.com/count"
async src="//gc.zgo.at/count.js"></script>
async src="//gc.zgo.at/count.js"></script>
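The *key=value* directory layout described in the `details.rst` context above can be decoded with a few lines of Python. This is only a minimal sketch: `parse_partitions` is a hypothetical helper, the example path is invented for illustration, and none of it is fastparquet's own partition-discovery code.

```python
def parse_partitions(path):
    """Return {key: value} for every 'key=value' directory segment in a path.

    Hypothetical helper for illustration; values are kept as strings.
    """
    parts = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        # Only segments that contain '=' and have a non-empty key count
        # as partition directories; plain names such as 'table' are skipped.
        if sep and key:
            parts[key] = value
    return parts


# Invented path, not taken from the Spark documentation.
example = parse_partitions("table/gender=male/year=2024/part.0.parquet")
```

Each additional partition column adds one more directory level, so the depth of the tree equals the number of keys in the returned mapping.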
10 changes: 6 additions & 4 deletions fastparquet/api.py
@@ -196,6 +196,7 @@ def __init__(self, fn, verify=False, open_with=default_open, root=False,
"a filesystem compatible with fsspec") from e
self.open = open_with
self._statistics = None
self.global_cats = {}

def _parse_header(self, f, verify=True):
if self.fn and self.fn.endswith("_metadata"):
@@ -260,7 +261,7 @@ def columns(self):

@property
def statistics(self):
if self._statistics is None:
if not hasattr(self, '_statistics') or self._statistics is None:
self._statistics = statistics(self)
return self._statistics

@@ -318,7 +319,8 @@ def __getitem__(self, item):
new_pf.__setstate__(
{"fn": self.fn, "open": self.open, "fmd": fmd,
"pandas_nulls": self.pandas_nulls, "_base_dtype": self._base_dtype,
"tz": self.tz, "_columns_dtype": self._columns_dtype}
"tz": self.tz, "_columns_dtype": self._columns_dtype,
"global_cats": {}} # fresh empty dict for the slice
)
new_pf._set_attrs()
return new_pf
@@ -389,7 +391,7 @@ def read_row_group_file(self, rg, columns, categories, index=None,
f, rg, columns, categories, self.schema, self.cats,
selfmade=self.selfmade, index=index,
assign=assign, scheme=self.file_scheme, partition_meta=partition_meta,
row_filter=row_filter
row_filter=row_filter, global_cats=self.global_cats
)
if ret:
return df
@@ -1011,7 +1013,7 @@ def __getstate__(self):
self.fmd.row_groups = []
return {"fn": self.fn, "open": self.open, "fmd": self.fmd,
"pandas_nulls": self.pandas_nulls, "_base_dtype": self._base_dtype,
"tz": self.tz}
"tz": self.tz, "global_cats": self.global_cats}

def __setstate__(self, state):
self.__dict__.update(state)
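The `api.py` hunks above fit together as follows: an object rebuilt through `__setstate__` never runs `__init__`, so any attribute missing from the state dict simply does not exist afterwards. A minimal sketch of that behaviour, assuming a hypothetical `Dataset` class (not fastparquet's `ParquetFile`) and using `getattr` with a default in place of the `hasattr` check in the diff:

```python
class Dataset:
    """Hypothetical stand-in for a class that serialises part of its state."""

    def __init__(self, fn):
        self.fn = fn
        self._statistics = None   # lazily computed cache
        self.global_cats = {}     # per-instance cache, carried in the state

    @property
    def statistics(self):
        # _statistics may be absent after __setstate__, so guard the lookup
        # instead of assuming __init__ has run.
        if getattr(self, "_statistics", None) is None:
            self._statistics = {"computed_for": self.fn}
        return self._statistics

    def __getstate__(self):
        # _statistics is deliberately left out; it can be rebuilt on demand.
        return {"fn": self.fn, "global_cats": self.global_cats}

    def __setstate__(self, state):
        self.__dict__.update(state)


original = Dataset("data.parquet")
original.global_cats["col"] = ["a", "b"]

# Mimic what unpickling does: allocate without calling __init__,
# then restore the saved state.
restored = Dataset.__new__(Dataset)
restored.__setstate__(original.__getstate__())
```

Without the guard in `statistics`, the first access on `restored` would raise `AttributeError`, because `_statistics` is not part of the state. Adding `global_cats` to the state dict avoids the same problem for that attribute.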