-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geopandas #9
Conversation
Important Auto Review SkippedReview was skipped due to path filters Files ignored due to path filters (2)
You can disable this status message by setting the ウォークスループロジェクト全体の変更は、 変更点
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9 +/- ##
==========================================
- Coverage 82.50% 81.35% -1.15%
==========================================
Files 8 8
Lines 280 295 +15
==========================================
+ Hits 231 240 +9
- Misses 49 55 +6 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave the exception.
if len(tables) == 0:
raise FileNotFoundError(
"txt files must reside at the root level directly, not in a sub folder."
)
This is so popular error. MIERUNE/GTFS-GO#75
It is difficult for users to determine the cause from the message of missing table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GeoDataFrame.dissolve() is very slow and it is a method for different purposes.
Please generate a LineString for each stop_id pattern, not for each trip_id.
This process was 13 times faster than dissolve for GTFS of Saga Bus.
f4100001_佐賀県の路線バス情報のデータ.zip
Even the process for GTFS of Toei Bus takes only 4 seconds. (dissolve() take some minutes)
if gtfs.shapes is None or ignore_shapes:
sorted_stop_times = gtfs.stop_times.sort_values(["trip_id", "stop_sequence"])
trip_stop_pattern = sorted_stop_times.groupby("trip_id")["stop_id"].apply(tuple)
route_trip_stop_pattern = pd.merge(
trip_stop_pattern,
gtfs.trips[["trip_id", "route_id"]],
on='trip_id'
)
route_stop_patterns = route_trip_stop_pattern[["route_id", "stop_id"]].drop_duplicates()
route_stop_patterns["stop_pattern"] = route_stop_patterns["stop_id"]
route_stop_ids = route_stop_patterns.explode("stop_id")
route_stop_geoms = pd.merge(
route_stop_ids,
gtfs.stops[["stop_id", "geometry"]],
on="stop_id"
)
# Point -> LineString: group by route_id and trip_id
line_df = route_stop_geoms.groupby(["route_id", "stop_pattern"])["geometry"].apply(
lambda x: shapely.geometry.LineString(x)
)
# group by route_id into MultiLineString
multilines = line_df.groupby(['route_id']).apply(
lambda x: shapely.geometry.MultiLineString(x.to_list())
)
# join route_id and route_name
multiline_df = pd.merge(
gpd.GeoSeries(multilines),
gtfs.routes[["route_id", "route_long_name", "route_short_name"]],
on="route_id",
)
# # trip-route-merge:A
# trips_routes = pd.merge(
# gtfs.trips[["trip_id", "route_id"]],
# gtfs.routes[["route_id", "route_long_name", "route_short_name"]],
# on="route_id",
# )
#
# # stop_times-stops-merge:B
# stop_times_stop = pd.merge(
# gtfs.stop_times[["stop_id", "trip_id", "stop_sequence"]],
# gtfs.stops[["stop_id", "geometry"]],
# on="stop_id",
# )
#
# # A-B-merge
# merged = pd.merge(stop_times_stop, trips_routes, on="trip_id")
# # sort by route_id, trip_id, stop_sequence
# merged.sort_values(["route_id", "trip_id", "stop_sequence"], inplace=True)
#
# # Point -> LineString: group by route_id and trip_id
# line_df = merged.groupby(["route_id", "trip_id"])["geometry"].apply(
# lambda x: shapely.geometry.LineString(x)
# )
# line_df = line_df.reset_index()
#
# # group by route_id into MultiLineString
# multiline_df = gpd.GeoDataFrame(line_df).dissolve(by="route_id")
# multiline_df.reset_index(inplace=True)
#
# # join route_id and route_name
# multiline_df = pd.merge(
# multiline_df[["route_id", "geometry"]],
# gtfs.routes[["route_id", "route_long_name", "route_short_name"]],
# on="route_id",
# )
multiline_df["route_name"] = multiline_df["route_long_name"].fillna(
""
) + multiline_df["route_short_name"].fillna("")
# to GeoJSON-Feature
features = list(
multiline_df[["geometry", "route_id", "route_name"]].iterfeatures()
)
return features
@Kanahiro 見た感じ、ソートがおかしい気がします。 参考に、手元のparse.pyを添付します。(.pyを添付できなかったので、拡張子をtxtにしています) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Out of diff range and nitpick comments (1)
gtfs_parser/aggregate.py (1)
Line range hint
110-110
: 型の比較にはisinstance()
を使用してください。- val if type(val) == str else val.stop_name + val if isinstance(val, str) else val.stop_name
すぐに解決しなさそうだったので、単にdissolveを使わない実装にしました。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
# rename column: route_id -> route_ids | ||
route_stop.rename(columns={"route_id": "route_ids"}, inplace=True) | ||
# fill na with empty list | ||
route_stop["route_ids"] = route_stop["route_ids"].fillna("").apply(list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
空のリストの処理を簡素化することを提案します。
- route_stop["route_ids"] = route_stop["route_ids"].fillna("").apply(list)
+ route_stop["route_ids"] = route_stop["route_ids"].apply(lambda x: x if isinstance(x, list) else [])
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
route_stop["route_ids"] = route_stop["route_ids"].fillna("").apply(list) | |
route_stop["route_ids"] = route_stop["route_ids"].apply(lambda x: x if isinstance(x, list) else []) |
gtfs_parser/parse.py
Outdated
unloaded_shapes = unloaded_shapes.groupby("shape_id")["geometry"].apply( | ||
lambda x: shapely.geometry.LineString(x) | ||
) | ||
unloaded_shapes = unloaded_shapes.reset_index() | ||
|
||
return features | ||
# group by route_id into MultiLineString | ||
unloaded_shapes = unloaded_shapes.groupby(["shape_id"])["geometry"].apply( | ||
lambda x: shapely.geometry.MultiLineString(x.to_list()) | ||
) | ||
unloaded_shapes = unloaded_shapes.reset_index() | ||
|
||
# fill id, name with shape_id | ||
unloaded_shapes["route_id"] = None | ||
unloaded_shapes["route_name"] = unloaded_shapes["shape_id"] | ||
|
||
# to GeoJSON-Feature | ||
unloaded_features = list( | ||
gpd.GeoDataFrame(unloaded_shapes.reset_index()).iterfeatures() | ||
) | ||
features.extend(unloaded_features) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行がいくつかあります。これらの行のテストを追加することをお勧めします。
QGISだとエラーになるコードがあるので調査中 |
おそらくGeoPandasじゃなくてshapely側の問題 |
たぶんgroupbyで各行のpointの配列は、ndarrayになるはず |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
|
||
if len(unloaded_shapes) > 0: | ||
# group by shape_id into LineString | ||
unloaded_shapes = unloaded_shapes.groupby("shape_id")["geometry"].apply( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
) | ||
unloaded_shapes = unloaded_shapes.reset_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
|
||
return features | ||
# group by route_id into MultiLineString | ||
unloaded_shapes = unloaded_shapes.groupby(["shape_id"])["geometry"].apply( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
unloaded_shapes = unloaded_shapes.groupby(["shape_id"])["geometry"].apply( | ||
lambda x: shapely.geometry.MultiLineString(x.to_list()) | ||
) | ||
unloaded_shapes = unloaded_shapes.reset_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
unloaded_shapes["route_id"] = None | ||
unloaded_shapes["route_name"] = unloaded_shapes["shape_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
unloaded_shapes["route_name"] = unloaded_shapes["shape_id"] | ||
|
||
# to GeoJSON-Feature | ||
unloaded_features = list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
unloaded_features = list( | ||
gpd.GeoDataFrame(unloaded_shapes.reset_index()).iterfeatures() | ||
) | ||
features.extend(unloaded_features) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
テストによってカバーされていない行があります。これらの行のテストを追加することをお勧めします。
Windowsだと正常な速度(都営バス、no-shapes, 10秒程度)で処理が完了したので、macOSだけが異常に遅くなるようだ |
#10 にて同一の処理をgeopandasなしに置き換えましたので、クローズ |
Close #0
Description(変更内容)
Manual Testing(手動テスト)
Not required / 不要