Add a PolarsDataFeed and deprecate the CSVDataFeed #43
Conversation
Pull request overview
This PR introduces a new PolarsDataFeed class that replaces and extends the functionality of the existing CSVDataFeed. The CSVDataFeed is now deprecated and reimplemented as a thin wrapper around PolarsDataFeed to maintain backward compatibility.
- Adds PolarsDataFeed with support for CSV files, Parquet files, and direct Polars DataFrame/LazyFrame inputs
- Deprecates CSVDataFeed by converting it to a subclass of PolarsDataFeed with a deprecation warning (see the sketch after this list)
- Updates all tests and documentation to use the new PolarsDataFeed
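For context, a minimal sketch of what such a deprecated subclass wrapper could look like, assuming the typing-extensions `deprecated` decorator (PEP 702) mentioned in the file summary below; the PR's actual wrapper and message may differ:

```python
from typing_extensions import deprecated

from alphaflow.data_feeds.polars_data_feed import PolarsDataFeed


@deprecated("CSVDataFeed is deprecated; use PolarsDataFeed instead.")
class CSVDataFeed(PolarsDataFeed):
    """Deprecated thin wrapper: instantiating it emits a DeprecationWarning."""
```

With `@deprecated` applied to the class, instantiation emits a DeprecationWarning at runtime and type checkers flag usages statically, which is what makes the typing-extensions dependency worthwhile over a hand-rolled `warnings.warn` call.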
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyproject.toml | Adds typing-extensions dependency for deprecation decorator |
| uv.lock | Updates lock file with typing-extensions dependency |
| alphaflow/data_feeds/polars_data_feed.py | New data feed implementation supporting multiple input formats (CSV, Parquet, DataFrame, LazyFrame) |
| alphaflow/data_feeds/csv_data_feed.py | Refactored as deprecated wrapper around PolarsDataFeed |
| alphaflow/data_feeds/__init__.py | Exports PolarsDataFeed alongside CSVDataFeed |
| alphaflow/tests/test_polars_data_feed.py | Comprehensive test suite for new PolarsDataFeed |
| alphaflow/tests/test_csv_data_feed.py | Adds deprecation warning test |
| alphaflow/tests/test_*.py | Updates all test files to use PolarsDataFeed instead of CSVDataFeed |
| docs/getting_started.md | Updates documentation to reference PolarsDataFeed |
| docs/api/data_feeds.md | Updates API docs to document PolarsDataFeed |
| README.md | Updates example code to use PolarsDataFeed |
| CHANGELOG.md | Documents the addition and deprecation |
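For orientation, usage looks roughly like the following. The CSV-path form mirrors the test suite quoted later in this review; the in-memory DataFrame form and the column names are assumptions based on the PR description:

```python
from datetime import datetime

import polars as pl

from alphaflow.data_feeds import PolarsDataFeed

# From a CSV file path, as exercised by the tests below.
feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

# From an in-memory DataFrame (a LazyFrame also works per the PR
# description); column names here are assumed, not confirmed by the diff.
df = pl.DataFrame(
    {
        "timestamp": [datetime(1980, 12, 26), datetime(1980, 12, 29)],
        "open": [25.0, 25.5],
        "high": [26.0, 26.5],
        "low": [24.5, 25.0],
        "close": [25.5, 26.0],
        "volume": [1_000_000, 900_000],
    }
)
feed_from_df = PolarsDataFeed(df)

# Both feeds stream MarketDataEvent objects over a date range.
events = list(
    feed.run(
        symbol="AAPL",
        start_timestamp=datetime(1980, 12, 25),
        end_timestamp=datetime(1980, 12, 31),
    )
)
```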
| """Initialize the Polars data feed. | ||
|
|
||
| Args: | ||
| df_or_file_path: Polars dataframe or path to the Polars dataframe containing market data. |
Copilot AI commented Dec 24, 2025
The docstring incorrectly refers to "path to the Polars dataframe" when it should say "path to a CSV or Parquet file". Polars DataFrames are in-memory data structures, not files on disk. The parameter can accept either a DataFrame/LazyFrame OR a file path to CSV/Parquet files.
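A revised docstring along the lines the comment suggests (illustrative wording, not necessarily the PR's eventual fix):

```python
"""Initialize the Polars data feed.

Args:
    df_or_file_path: Polars DataFrame/LazyFrame containing market data,
        or a path to a CSV or Parquet file to load it from.
"""
```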
```python
df_path = Path(self.df_or_file_path) if isinstance(self.df_or_file_path, str) else self.df_or_file_path
if df_path.suffix in {".parquet", ".parq"}:
    df = pl.read_parquet(df_path)
    df = df.with_columns(pl.col(self._col_timestamp).cast(pl.Datetime))
```
Copilot AI commented Dec 24, 2025
The duplicate datetime casting for parquet files is unnecessary. Line 77 casts the timestamp column to Datetime after reading parquet, but lines 101-102 already handle this case for all file types when the dtype is pl.Date. The first cast should be removed to avoid redundant operations.
```python
df = df.with_columns(pl.col(self._col_timestamp).cast(pl.Datetime))
```
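A sketch of the consolidated flow the comment implies, with the Datetime cast applied once for all input formats. The helper name `_load_frame` and the CSV branch are hypothetical reconstructions; only the Parquet branch and the cast expression appear in the diff:

```python
from pathlib import Path

import polars as pl


def _load_frame(df_or_file_path: str | Path, col_timestamp: str = "timestamp") -> pl.DataFrame:
    """Hypothetical reconstruction of the feed's file-loading logic after the fix."""
    df_path = Path(df_or_file_path)
    if df_path.suffix in {".parquet", ".parq"}:
        df = pl.read_parquet(df_path)  # no per-format cast here anymore
    else:
        df = pl.read_csv(df_path, try_parse_dates=True)
    # Single normalization point for all file types, as the comment suggests:
    # promote a plain Date column to Datetime.
    if df.schema[col_timestamp] == pl.Date:
        df = df.with_columns(pl.col(col_timestamp).cast(pl.Datetime))
    return df
```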
| """Tests for Polars data feeds.""" | ||
|
|
||
| from datetime import datetime | ||
|
|
||
| from alphaflow.data_feeds import PolarsDataFeed | ||
| from alphaflow.events import MarketDataEvent | ||
|
|
||
|
|
||
| def test_polars_data_feed_initialization() -> None: | ||
| """Test PolarsDataFeed initialization.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| assert isinstance(data_feed.df_or_file_path, str) | ||
| assert data_feed.df_or_file_path == "alphaflow/tests/data/AAPL.csv" | ||
|
|
||
|
|
||
| def test_polars_data_feed_run_yields_market_data_events() -> None: | ||
| """Test PolarsDataFeed yields MarketDataEvent objects.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert len(events) > 0 | ||
| assert all(isinstance(event, MarketDataEvent) for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_events_have_correct_symbol() -> None: | ||
| """Test all events have the requested symbol.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="TEST_SYMBOL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert all(event.symbol == "TEST_SYMBOL" for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_events_sorted_by_timestamp() -> None: | ||
| """Test events are yielded in chronological order.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1981, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| timestamps = [event.timestamp for event in events] | ||
| assert timestamps == sorted(timestamps) | ||
|
|
||
|
|
||
| def test_polars_data_feed_respects_start_timestamp() -> None: | ||
| """Test data feed only yields events after start timestamp.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
| start_timestamp = datetime(1981, 1, 1) | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=start_timestamp, | ||
| end_timestamp=datetime(1981, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert all(event.timestamp >= start_timestamp for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_respects_end_timestamp() -> None: | ||
| """Test data feed only yields events before end timestamp.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
| end_timestamp = datetime(1981, 1, 15) | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=end_timestamp, | ||
| ) | ||
| ) | ||
|
|
||
| assert all(event.timestamp <= end_timestamp for event in events) | ||
|
|
||
|
|
||
| def test_polars_data_feed_event_has_all_ohlcv_fields() -> None: | ||
| """Test MarketDataEvent has open, high, low, close, volume.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| # Check first event has all required fields | ||
| event = events[0] | ||
| assert hasattr(event, "open") | ||
| assert hasattr(event, "high") | ||
| assert hasattr(event, "low") | ||
| assert hasattr(event, "close") | ||
| assert hasattr(event, "volume") | ||
| assert hasattr(event, "timestamp") | ||
| assert hasattr(event, "symbol") | ||
|
|
||
|
|
||
| def test_polars_data_feed_prices_are_positive() -> None: | ||
| """Test all OHLC prices are positive.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1980, 12, 31), | ||
| ) | ||
| ) | ||
|
|
||
| for event in events: | ||
| assert event.open > 0 | ||
| assert event.high > 0 | ||
| assert event.low > 0 | ||
| assert event.close > 0 | ||
|
|
||
|
|
||
| def test_polars_data_feed_high_low_relationship() -> None: | ||
| """Test high >= low for all events.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1980, 12, 25), | ||
| end_timestamp=datetime(1981, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| for event in events: | ||
| assert event.high >= event.low | ||
|
|
||
|
|
||
| def test_polars_data_feed_empty_range() -> None: | ||
| """Test data feed with date range that has no data.""" | ||
| data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv") | ||
|
|
||
| # Use a date range before any data exists | ||
| events = list( | ||
| data_feed.run( | ||
| symbol="AAPL", | ||
| start_timestamp=datetime(1970, 1, 1), | ||
| end_timestamp=datetime(1970, 1, 31), | ||
| ) | ||
| ) | ||
|
|
||
| assert len(events) == 0 |
Copilot AI commented Dec 24, 2025
The test coverage for PolarsDataFeed is incomplete. The implementation supports multiple input types (pl.DataFrame, pl.LazyFrame, parquet files) and various edge cases, but there are no tests covering:
- Direct DataFrame/LazyFrame inputs (only CSV file path is tested)
- Parquet file loading
- Custom column name configuration
- Error handling for unsupported file formats
- Error handling for missing required columns
These scenarios should be tested to ensure the new functionality works as intended.
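For illustration, one of the missing cases could be covered with a test along these lines; the constructor's acceptance of a bare DataFrame and the default column names are assumptions based on the PR description:

```python
from datetime import datetime

import polars as pl

from alphaflow.data_feeds import PolarsDataFeed
from alphaflow.events import MarketDataEvent


def test_polars_data_feed_accepts_dataframe() -> None:
    """Test PolarsDataFeed yields events from an in-memory DataFrame."""
    # Column names are assumed to match the feed's defaults.
    df = pl.DataFrame(
        {
            "timestamp": [datetime(1980, 12, 26), datetime(1980, 12, 29)],
            "open": [25.0, 25.5],
            "high": [26.0, 26.5],
            "low": [24.5, 25.0],
            "close": [25.5, 26.0],
            "volume": [1_000_000, 900_000],
        }
    )
    data_feed = PolarsDataFeed(df)

    events = list(
        data_feed.run(
            symbol="AAPL",
            start_timestamp=datetime(1980, 12, 25),
            end_timestamp=datetime(1980, 12, 31),
        )
    )

    assert len(events) == 2
    assert all(isinstance(event, MarketDataEvent) for event in events)
```

Parquet loading, custom column names, and the error paths would follow the same pattern with parametrized inputs.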
Force-pushed from 6a36b31 to 2ae949a