Add a PolarsDataFeed and deprecate the CSVDataFeed #43
Conversation
Pull request overview
This PR introduces a new PolarsDataFeed class that replaces and extends the functionality of the existing CSVDataFeed. The CSVDataFeed is now deprecated and reimplemented as a thin wrapper around PolarsDataFeed to maintain backward compatibility.
- Adds PolarsDataFeed with support for CSV files, Parquet files, and direct Polars DataFrame/LazyFrame inputs
- Deprecates CSVDataFeed by converting it to a subclass of PolarsDataFeed with a deprecation warning (see the sketch after this list)
- Updates all tests and documentation to use the new PolarsDataFeed
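For context, a minimal sketch of what such a deprecated subclass wrapper could look like, assuming the typing-extensions `deprecated` decorator (PEP 702) mentioned in the file summary below; the PR's actual wrapper and message may differ:

```python
from typing_extensions import deprecated

from alphaflow.data_feeds.polars_data_feed import PolarsDataFeed


@deprecated("CSVDataFeed is deprecated; use PolarsDataFeed instead.")
class CSVDataFeed(PolarsDataFeed):
    """Deprecated thin wrapper: instantiating it emits a DeprecationWarning."""
```

With `@deprecated` applied to the class, instantiation emits a DeprecationWarning at runtime and type checkers flag usages statically, which is what makes the typing-extensions dependency worthwhile over a hand-rolled `warnings.warn` call.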
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyproject.toml | Adds typing-extensions dependency for deprecation decorator |
| uv.lock | Updates lock file with typing-extensions dependency |
| alphaflow/data_feeds/polars_data_feed.py | New data feed implementation supporting multiple input formats (CSV, Parquet, DataFrame, LazyFrame) |
| alphaflow/data_feeds/csv_data_feed.py | Refactored as deprecated wrapper around PolarsDataFeed |
| alphaflow/data_feeds/__init__.py | Exports PolarsDataFeed alongside CSVDataFeed |
| alphaflow/tests/test_polars_data_feed.py | Comprehensive test suite for new PolarsDataFeed |
| alphaflow/tests/test_csv_data_feed.py | Adds deprecation warning test |
| alphaflow/tests/test_*.py | Updates all test files to use PolarsDataFeed instead of CSVDataFeed |
| docs/getting_started.md | Updates documentation to reference PolarsDataFeed |
| docs/api/data_feeds.md | Updates API docs to document PolarsDataFeed |
| README.md | Updates example code to use PolarsDataFeed |
| CHANGELOG.md | Documents the addition and deprecation |
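For orientation, usage looks roughly like the following. The CSV-path form mirrors the test suite quoted later in this review; the in-memory DataFrame form and the column names are assumptions based on the PR description:

```python
from datetime import datetime

import polars as pl

from alphaflow.data_feeds import PolarsDataFeed

# From a CSV file path, as exercised by the tests below.
feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

# From an in-memory DataFrame (a LazyFrame also works per the PR
# description); column names here are assumed, not confirmed by the diff.
df = pl.DataFrame(
    {
        "timestamp": [datetime(1980, 12, 26), datetime(1980, 12, 29)],
        "open": [25.0, 25.5],
        "high": [26.0, 26.5],
        "low": [24.5, 25.0],
        "close": [25.5, 26.0],
        "volume": [1_000_000, 900_000],
    }
)
feed_from_df = PolarsDataFeed(df)

# Both feeds stream MarketDataEvent objects over a date range.
events = list(
    feed.run(
        symbol="AAPL",
        start_timestamp=datetime(1980, 12, 25),
        end_timestamp=datetime(1980, 12, 31),
    )
)
```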
| """Initialize the Polars data feed. | ||
|
|
||
| Args: | ||
| df_or_file_path: Polars dataframe or path to the Polars dataframe containing market data. |
Copilot AI commented Dec 24, 2025
The docstring incorrectly refers to "path to the Polars dataframe" when it should say "path to a CSV or Parquet file". Polars DataFrames are in-memory data structures, not files on disk. The parameter can accept either a DataFrame/LazyFrame OR a file path to CSV/Parquet files.
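A revised docstring along the lines the comment suggests (illustrative wording, not necessarily the PR's eventual fix):

```python
"""Initialize the Polars data feed.

Args:
    df_or_file_path: Polars DataFrame/LazyFrame containing market data,
        or a path to a CSV or Parquet file to load it from.
"""
```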
```python
df_path = Path(self.df_or_file_path) if isinstance(self.df_or_file_path, str) else self.df_or_file_path
if df_path.suffix in {".parquet", ".parq"}:
    df = pl.read_parquet(df_path)
    df = df.with_columns(pl.col(self._col_timestamp).cast(pl.Datetime))
```
Copilot AI commented Dec 24, 2025
The duplicate datetime casting for parquet files is unnecessary. Line 77 casts the timestamp column to Datetime after reading parquet, but lines 101-102 already handle this case for all file types when the dtype is pl.Date. The first cast should be removed to avoid redundant operations.
```python
df = df.with_columns(pl.col(self._col_timestamp).cast(pl.Datetime))
```
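A sketch of the consolidated flow the comment implies, with the Datetime cast applied once for all input formats. The helper name `_load_frame` and the CSV branch are hypothetical reconstructions; only the Parquet branch and the cast expression appear in the diff:

```python
from pathlib import Path

import polars as pl


def _load_frame(df_or_file_path: str | Path, col_timestamp: str = "timestamp") -> pl.DataFrame:
    """Hypothetical reconstruction of the feed's file-loading logic after the fix."""
    df_path = Path(df_or_file_path)
    if df_path.suffix in {".parquet", ".parq"}:
        df = pl.read_parquet(df_path)  # no per-format cast here anymore
    else:
        df = pl.read_csv(df_path, try_parse_dates=True)
    # Single normalization point for all file types, as the comment suggests:
    # promote a plain Date column to Datetime.
    if df.schema[col_timestamp] == pl.Date:
        df = df.with_columns(pl.col(col_timestamp).cast(pl.Datetime))
    return df
```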
| """Tests for Polars data feeds.""" | ||
|
|
||
| from datetime import datetime | ||
|
|
||
| from alphaflow.data_feeds import PolarsDataFeed | ||
| from alphaflow.events import MarketDataEvent | ||
|
|
||
|
|
||
| def test_polars_data_feed_initialization() -> None: | ||
| """Test PolarsDataFeed initialization.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| assert isinstance(data_feed.df_or_file_path, str) | ||
| assert data_feed.df_or_file_path == "alphaflow/tests/data/AAPL.csv" | ||
|
|
||
|
|
||
| def test_polars_data_feed_run_yields_market_data_events() -> None: | ||
| """Test PolarsDataFeed yields MarketDataEvent objects.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert len(events) > 0 | ||
| assert all(isinstance(event, MarketDataEvent) for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_events_have_correct_symbol() -> None: | ||
| """Test all events have the requested symbol.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="TEST_SYMBOL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert all(event.symbol == "TEST_SYMBOL" for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_events_sorted_by_timestamp() -> None: | ||
| """Test events are yielded in chronological order.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1981, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| timestamps = [event.timestamp for event in events] | ||
| assert timestamps == sorted(timestamps) | ||
|
|
||
|
|
||
| def test_polars_data_feed_respects_start_timestamp() -> None: | ||
| """Test data feed only yields events after start timestamp.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
| start_timestamp = datetime(1981, 1, 1) | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=start_timestamp, | ||
| end_timestamp=datetime(1981, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert all(event.timestamp >= start_timestamp for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_respects_end_timestamp() -> None: | ||
| """Test data feed only yields events before end timestamp.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
| end_timestamp = datetime(1981, 1, 15) | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=end_timestamp, | ||
| ) | ||
| ) | ||
|
|
||
| assert all(event.timestamp <= end_timestamp for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_event_has_all_ohlcv_fields() -> None: | ||
| """Test MarketDataEvent has open, high, low, close, volume.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| # Check first event has all required fields | ||
| event = events[0] | ||
| assert hasattr(event, "open") | ||
| assert hasattr(event, "high") | ||
| assert hasattr(event, "low") | ||
| assert hasattr(event, "close") | ||
| assert hasattr(event, "volume") | ||
| assert hasattr(event, "timestamp") | ||
| assert hasattr(event, "symbol") | ||
|
|
||
|
|
||
| def test_polars_data_feed_prices_are_positive() -> None: | ||
| """Test all OHLC prices are positive.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| for event in events: | ||
| assert event.open > 0 | ||
| assert event.high > 0 | ||
| assert event.low > 0 | ||
| assert event.close > 0 | ||
|
|
||
|
|
||
| def test_polars_data_feed_high_low_relationship() -> None: | ||
| """Test high >= low for all events.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1981, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| for event in events: | ||
| assert event.high >= event.low | ||
|
|
||
|
|
||
| def test_polars_data_feed_empty_range() -> None: | ||
| """Test data feed with date range that has no data.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| # Use a date range before any data exists | ||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1970, 1, 1), | ||
| end_timestamp=datetime(1970, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert len(events) == 0 |
Copilot AI commented Dec 24, 2025
The test coverage for PolarsDataFeed is incomplete. The implementation supports multiple input types (pl.DataFrame, pl.LazyFrame, parquet files) and various edge cases, but there are no tests covering:
- Direct DataFrame/LazyFrame inputs (only CSV file path is tested)
- Parquet file loading
- Custom column name configuration
- Error handling for unsupported file formats
- Error handling for missing required columns
These scenarios should be tested to ensure the new functionality works as intended.
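For illustration, one of the missing cases could be covered with a test along these lines; the constructor's acceptance of a bare DataFrame and the default column names are assumptions based on the PR description:

```python
from datetime import datetime

import polars as pl

from alphaflow.data_feeds import PolarsDataFeed
from alphaflow.events import MarketDataEvent


def test_polars_data_feed_accepts_dataframe() -> None:
    """Test PolarsDataFeed yields events from an in-memory DataFrame."""
    # Column names are assumed to match the feed's defaults.
    df = pl.DataFrame(
        {
            "timestamp": [datetime(1980, 12, 26), datetime(1980, 12, 29)],
            "open": [25.0, 25.5],
            "high": [26.0, 26.5],
            "low": [24.5, 25.0],
            "close": [25.5, 26.0],
            "volume": [1_000_000, 900_000],
        }
    )
    data_feed = PolarsDataFeed(df)

    events = list(
        data_feed.run(
            symbol="AAPL",
            start_timestamp=datetime(1980, 12, 25),
            end_timestamp=datetime(1980, 12, 31),
        )
    )

    assert len(events) == 2
    assert all(isinstance(event, MarketDataEvent) for event in events)
```

Parquet loading, custom column names, and the error paths would follow the same pattern with parametrized inputs.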
Force-pushed from 6a36b31 to 2ae949a