Trim whitespace from CSV output values #64

Merged: senko merged 4 commits into senko:main from iugrina:trim-values on Dec 10, 2025

@iugrina (Contributor) commented Dec 9, 2025

Summary

Normalizes whitespace in CSV output values by collapsing multiple whitespace characters (spaces, tabs, newlines, etc.) into a single space.

Changes

  • Added normalize_whitespace() helper function using regex to replace sequences of whitespace with a single space
  • Updated save_csv() to apply normalization to all CSV values before writing
  • Added re import
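
As a rough illustration of how these changes could fit together, here is a minimal sketch. It is not the merged code: write_rows() is a hypothetical stand-in for save_csv(), whose real signature is not shown in this PR, and passing non-string values through unchanged is an assumption.

```python
import csv
import io
import re


def normalize_whitespace(value):
    """Collapse each run of whitespace into a single space (strings only)."""
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value)
    return value  # assumption: non-string values pass through unchanged


def write_rows(out, fieldnames, rows):
    """Hypothetical stand-in for save_csv(): normalize every value, then write."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {key: normalize_whitespace(row.get(key)) for key in fieldnames}
        )


# Example rows with an embedded newline and repeated tabs (made-up data).
buffer = io.StringIO()
write_rows(
    buffer,
    ["product", "brand"],
    [{"product": "Milk\n 1L", "brand": "Dairy\t\tCo"}],
)
csv_text = buffer.getvalue()
```

Without the normalization step, the csv module would have to quote the first value and emit a record that spans two physical lines, which line-oriented downstream tools tend to handle poorly.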

Technical Details

The normalization uses re.sub(r'\s+', ' ', value) which:

  • Matches any sequence of whitespace characters (\s+ includes spaces, tabs, newlines, carriage returns, etc.)
  • Replaces them with a single space character
  • Applies to all CSV output files (stores.csv, products.csv, prices.csv)
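
The behavior of that substitution can be checked in isolation. The sketch below uses a hypothetical collapse() wrapper and made-up sample strings. It also shows that the regex by itself leaves one leading or trailing space in place; whether the merged helper additionally strips the ends is not stated in this description, so the .strip() call here is an assumption about one way to get fully trimmed values.

```python
import re


def collapse(value: str) -> str:
    """Replace every run of whitespace with one space, as described above."""
    return re.sub(r"\s+", " ", value)


# Mixed spaces, tabs and newlines between words become a single space.
inner = collapse("Fresh \t\n milk")

# For str patterns, \s also matches Unicode whitespace such as U+00A0.
nbsp = collapse("1\u00a0kg")

# Leading and trailing runs are reduced to one space, not removed.
padded = collapse("  Zagreb \r\n")

# Assumption: an extra strip() removes the remaining edge spaces.
trimmed = collapse("  Zagreb \r\n").strip()
```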

Files Changed

  • crawler/store/output.py

Impact

All CSV files generated by the crawler will have consistent whitespace formatting: values no longer carry tabs, newlines, or repeated spaces into downstream processing.

@senko senko merged commit 10fbb94 into senko:main Dec 10, 2025
3 checks passed
@senko (Owner) commented Dec 10, 2025

Thanks!

@iugrina iugrina deleted the trim-values branch December 11, 2025 08:10