Conversation

brandonschabell (Owner)

No description provided.

Copilot AI review requested due to automatic review settings December 24, 2025 04:16

Copilot AI left a comment


Pull request overview

This PR introduces a new PolarsDataFeed class that replaces and extends the functionality of the existing CSVDataFeed. The CSVDataFeed is now deprecated and reimplemented as a thin wrapper around PolarsDataFeed to maintain backward compatibility.

  • Adds PolarsDataFeed with support for CSV files, Parquet files, and direct Polars DataFrame/LazyFrame inputs (a usage sketch follows this list)
  • Deprecates CSVDataFeed by converting it to a subclass of PolarsDataFeed with a deprecation warning
  • Updates all tests and documentation to use the new PolarsDataFeed
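
For orientation, here is a minimal usage sketch of the new feed. The constructor argument `df_or_file_path` and the `run()` signature are taken from the diff excerpts quoted later in this review; the file paths and the printed fields are purely illustrative:

```python
from datetime import datetime

import polars as pl

from alphaflow.data_feeds import PolarsDataFeed

# A CSV or Parquet path works much like the old CSVDataFeed path did.
csv_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

# An in-memory DataFrame (or LazyFrame) can be passed directly.
df = pl.read_parquet("data/AAPL.parquet")  # hypothetical file
frame_feed = PolarsDataFeed(df)

# The feed yields MarketDataEvent objects over the requested window.
for event in csv_feed.run(
    symbol="AAPL",
    start_timestamp=datetime(1980, 12, 25),
    end_timestamp=datetime(1980, 12, 31),
):
    print(event.timestamp, event.close)
```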

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Adds the typing-extensions dependency for the deprecation decorator |
| uv.lock | Updates the lock file with the typing-extensions dependency |
| alphaflow/data_feeds/polars_data_feed.py | New data feed implementation supporting multiple input formats (CSV, Parquet, DataFrame, LazyFrame) |
| alphaflow/data_feeds/csv_data_feed.py | Refactored as a deprecated wrapper around PolarsDataFeed (see the sketch after this table) |
| alphaflow/data_feeds/__init__.py | Exports PolarsDataFeed alongside CSVDataFeed |
| alphaflow/tests/test_polars_data_feed.py | Comprehensive test suite for the new PolarsDataFeed |
| alphaflow/tests/test_csv_data_feed.py | Adds a deprecation warning test |
| alphaflow/tests/test_*.py | Updates all test files to use PolarsDataFeed instead of CSVDataFeed |
| docs/getting_started.md | Updates documentation to reference PolarsDataFeed |
| docs/api/data_feeds.md | Updates API docs to document PolarsDataFeed |
| README.md | Updates example code to use PolarsDataFeed |
| CHANGELOG.md | Documents the addition and deprecation |
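
Based on the descriptions above, the deprecation in csv_data_feed.py presumably looks something like this minimal sketch using typing_extensions.deprecated; the actual warning message and any extra compatibility code may differ:

```python
from typing_extensions import deprecated

from alphaflow.data_feeds.polars_data_feed import PolarsDataFeed


# Thin subclass kept only for backward compatibility; the decorator makes
# instantiation emit a DeprecationWarning (sketch, not the PR's exact wording).
@deprecated("CSVDataFeed is deprecated; use PolarsDataFeed instead.")
class CSVDataFeed(PolarsDataFeed):
    """Deprecated wrapper around PolarsDataFeed."""
```

Instantiating CSVDataFeed would then emit a DeprecationWarning, which is presumably what the new test in alphaflow/tests/test_csv_data_feed.py asserts.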

"""Initialize the Polars data feed.

Args:
df_or_file_path: Polars dataframe or path to the Polars dataframe containing market data.

Copilot AI Dec 24, 2025

The docstring incorrectly refers to "path to the Polars dataframe" when it should say "path to a CSV or Parquet file". Polars DataFrames are in-memory data structures, not files on disk. The parameter can accept either a DataFrame/LazyFrame OR a file path to CSV/Parquet files.
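
A corrected docstring along the lines the reviewer describes might read as follows (a sketch, not necessarily the wording adopted in the PR):

```python
"""Initialize the Polars data feed.

Args:
    df_or_file_path: A Polars DataFrame/LazyFrame, or a path to a CSV or
        Parquet file containing market data.
"""
```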

```python
df_path = Path(self.df_or_file_path) if isinstance(self.df_or_file_path, str) else self.df_or_file_path
if df_path.suffix in {".parquet", ".parq"}:
    df = pl.read_parquet(df_path)
    df = df.with_columns(pl.col(self._col_timestamp).cast(pl.Datetime))
```

Copilot AI Dec 24, 2025

The duplicate datetime casting for parquet files is unnecessary. Line 77 casts the timestamp column to Datetime after reading parquet, but lines 101-102 already handle this case for all file types when the dtype is pl.Date. The first cast should be removed to avoid redundant operations.

Suggested change:

```diff
-    df = df.with_columns(pl.col(self._col_timestamp).cast(pl.Datetime))
```

Comment on lines 1 to 170 of alphaflow/tests/test_polars_data_feed.py
"""Tests for Polars data feeds."""

from datetime import datetime

from alphaflow.data_feeds import PolarsDataFeed
from alphaflow.events import MarketDataEvent


def test_polars_data_feed_initialization() -> None:
"""Test PolarsDataFeed initialization."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

assert isinstance(data_feed.df_or_file_path, str)
assert data_feed.df_or_file_path == "alphaflow/tests/data/AAPL.csv"


def test_polars_data_feed_run_yields_market_data_events() -> None:
"""Test PolarsDataFeed yields MarketDataEvent objects."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=datetime(1980, 12, 31),
)
)

assert len(events) > 0
assert all(isinstance(event, MarketDataEvent) for event in events)


def test_polars_data_feed_events_have_correct_symbol() -> None:
"""Test all events have the requested symbol."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

events = list(
data_feed.run(
symbol="TEST_SYMBOL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=datetime(1980, 12, 31),
)
)

assert all(event.symbol == "TEST_SYMBOL" for event in events)


def test_polars_data_feed_events_sorted_by_timestamp() -> None:
"""Test events are yielded in chronological order."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=datetime(1981, 1, 31),
)
)

timestamps = [event.timestamp for event in events]
assert timestamps == sorted(timestamps)


def test_polars_data_feed_respects_start_timestamp() -> None:
"""Test data feed only yields events after start timestamp."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")
start_timestamp = datetime(1981, 1, 1)

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=start_timestamp,
end_timestamp=datetime(1981, 1, 31),
)
)

assert all(event.timestamp >= start_timestamp for event in events)


def test_polars_data_feed_respects_end_timestamp() -> None:
"""Test data feed only yields events before end timestamp."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")
end_timestamp = datetime(1981, 1, 15)

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=end_timestamp,
)
)

assert all(event.timestamp <= end_timestamp for event in events)


def test_polars_data_feed_event_has_all_ohlcv_fields() -> None:
"""Test MarketDataEvent has open, high, low, close, volume."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=datetime(1980, 12, 31),
)
)

# Check first event has all required fields
event = events[0]
assert hasattr(event, "open")
assert hasattr(event, "high")
assert hasattr(event, "low")
assert hasattr(event, "close")
assert hasattr(event, "volume")
assert hasattr(event, "timestamp")
assert hasattr(event, "symbol")


def test_polars_data_feed_prices_are_positive() -> None:
"""Test all OHLC prices are positive."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=datetime(1980, 12, 31),
)
)

for event in events:
assert event.open > 0
assert event.high > 0
assert event.low > 0
assert event.close > 0


def test_polars_data_feed_high_low_relationship() -> None:
"""Test high >= low for all events."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1980, 12, 25),
end_timestamp=datetime(1981, 1, 31),
)
)

for event in events:
assert event.high >= event.low


def test_polars_data_feed_empty_range() -> None:
"""Test data feed with date range that has no data."""
data_feed = PolarsDataFeed("alphaflow/tests/data/AAPL.csv")

# Use a date range before any data exists
events = list(
data_feed.run(
symbol="AAPL",
start_timestamp=datetime(1970, 1, 1),
end_timestamp=datetime(1970, 1, 31),
)
)

assert len(events) == 0

Copilot AI Dec 24, 2025

The test coverage for PolarsDataFeed is incomplete. The implementation supports multiple input types (pl.DataFrame, pl.LazyFrame, parquet files) and various edge cases, but there are no tests covering:

  1. Direct DataFrame/LazyFrame inputs (only CSV file path is tested)
  2. Parquet file loading
  3. Custom column name configuration
  4. Error handling for unsupported file formats
  5. Error handling for missing required columns

These scenarios should be tested to ensure the new functionality works as intended; a sketch of one such test follows.
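
As one illustration of point 1, a direct-DataFrame input test might look roughly like this sketch. The column names and sample values are assumptions, not taken from the PR, and the imports mirror those at the top of the existing test module:

```python
from datetime import datetime

import polars as pl

from alphaflow.data_feeds import PolarsDataFeed
from alphaflow.events import MarketDataEvent


def test_polars_data_feed_accepts_dataframe_input() -> None:
    """Sketch: a feed built from an in-memory DataFrame yields events."""
    # Column names here are assumed defaults; adjust to the feed's actual configuration.
    df = pl.DataFrame(
        {
            "timestamp": [datetime(1980, 12, 26), datetime(1980, 12, 29)],
            "open": [0.12, 0.13],
            "high": [0.13, 0.14],
            "low": [0.11, 0.12],
            "close": [0.125, 0.135],
            "volume": [1_000, 2_000],
        }
    )
    data_feed = PolarsDataFeed(df)

    events = list(
        data_feed.run(
            symbol="AAPL",
            start_timestamp=datetime(1980, 12, 25),
            end_timestamp=datetime(1980, 12, 31),
        )
    )

    assert len(events) == 2
    assert all(isinstance(event, MarketDataEvent) for event in events)
```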

brandonschabell merged commit b92afd3 into main on December 24, 2025
6 checks passed
brandonschabell deleted the polars-data-feed branch on December 24, 2025 at 04:36