Description
This issue occurs when passing a dtype argument to wr.s3.to_parquet() to coerce one of the columns to a map containing decimals, e.g. map<int, decimal(12,2)>. See the reproduction below.
The helper function that splits the map fields doesn't account for parentheses and splits on every comma. This produces the three field types "int", "decimal(12" and "2)" instead of the expected two, "int" and "decimal(12,2)", so _split_map rejects the type with the RuntimeError shown in the output below.
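To illustrate the failure mode in isolation, here is a minimal sketch that splits the map body on every comma, as described above. The function name naive_split is hypothetical and is not awswrangler's actual helper; the slice [4:-1] mirrors the orig_dtype[4:-1] call visible in the traceback, which strips the leading "map<" and the trailing ">".

```python
from typing import List


def naive_split(s: str) -> List[str]:
    # Hypothetical helper: split on every comma, ignoring any nesting of () or <>.
    return [part.strip() for part in s.split(",")]


# Strip "map<" and ">" the same way the traceback's orig_dtype[4:-1] does.
body = "map<int, decimal(12,2)>"[4:-1]
fields = naive_split(body)
print(fields)  # ['int', 'decimal(12', '2)']
```

A map type needs exactly two fields (key and value), so three parts means the type string is rejected even though it is valid.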
Environment
awswrangler 2.8.0
Reproduction
import awswrangler as wr
import pandas as pd
import decimal
df = pd.DataFrame({"map_col": [{"a": decimal.Decimal("1.23")}]})
wr.s3.to_parquet(
    df=df,
    dataset=True,
    path="dummy-location",
    database="dummy-db",
    table="dummy-table",
    dtype={"map_col": "map<int, decimal(12,2)>"},
)
Output:
Traceback (most recent call last):
  File "awswrangler_map_decimal.py", line 14, in <module>
    dtype={"map_col": "map<int, decimal(12,2)>"},
  File "/home/laspj/.local/lib/python3.6/site-packages/awswrangler/_config.py", line 417, in wrapper
    return function(**args)
  File "/home/laspj/.local/lib/python3.6/site-packages/awswrangler/s3/_write_parquet.py", line 537, in to_parquet
    df=df, index=index, ignore_cols=partition_cols, dtype=dtype
  File "/home/laspj/.local/lib/python3.6/site-packages/awswrangler/_data_types.py", line 581, in pyarrow_schema_from_pandas
    columns_types[k] = athena2pyarrow(dtype=v)
  File "/home/laspj/.local/lib/python3.6/site-packages/awswrangler/_data_types.py", line 291, in athena2pyarrow
    parts: List[str] = _split_map(s=orig_dtype[4:-1])
  File "/home/laspj/.local/lib/python3.6/site-packages/awswrangler/_data_types.py", line 250, in _split_map
    raise RuntimeError(f"Invalid map fields: {s}")
RuntimeError: Invalid map fields: int, decimal(12,2)
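One possible fix is to track parenthesis nesting as well as angle-bracket nesting, and split only on commas at the top level. The following is a minimal, stdlib-only sketch under that assumption; split_top_level and split_map are hypothetical names, not awswrangler's actual helpers, and the error message simply mirrors the one in the traceback.

```python
from typing import List


def split_top_level(s: str) -> List[str]:
    """Hypothetical helper: split s on commas not nested inside <...> or (...)."""
    parts: List[str] = []
    depth = 0  # one shared counter for both bracket kinds (no matching check)
    start = 0
    for i, ch in enumerate(s):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts


def split_map(dtype: str) -> List[str]:
    """Hypothetical helper: return [key_type, value_type] for a 'map<k, v>' string."""
    if not (dtype.startswith("map<") and dtype.endswith(">")):
        raise ValueError(f"Not a map type: {dtype}")
    parts = split_top_level(dtype[4:-1])  # same slice as orig_dtype[4:-1]
    if len(parts) != 2:
        raise RuntimeError(f"Invalid map fields: {dtype[4:-1]}")
    return parts


print(split_map("map<int, decimal(12,2)>"))  # ['int', 'decimal(12,2)']
print(split_map("map<string, map<int, decimal(10,4)>>"))  # ['string', 'map<int, decimal(10,4)>']
```

For simplicity the sketch uses a single depth counter and doesn't verify that each "(" is closed by ")" and each "<" by ">"; that is enough to keep the comma inside decimal(p,s), and the commas inside nested map<...> types, from being treated as field separators.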