Add fake streaming source #8

wengh · 2025-03-11T23:40:31Z

Implement streamReader for FakeDataSource.

This pull request adds streaming support to the FakeDataSource in the pyspark_datasources module and refactors its schema validation into a shared function. The main changes are a streaming reader, a schema validation function, and updated documentation and tests.

Enhancements to FakeDataSource:

Added a streaming reader with the FakeDataSourceStreamReader class to support streaming fake data (pyspark_datasources/fake.py).
Introduced the _validate_faker_schema function to validate the schema and ensure that the faker library is correctly installed and used (pyspark_datasources/fake.py).
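
For readers unfamiliar with how a Python streaming reader tracks progress, here is a minimal sketch of the offset flow. It is not the code in this PR: SketchFakeStreamReader and RangePartition are hypothetical names, the class is plain Python that only mirrors the method names of PySpark's DataSourceStreamReader (initialOffset, latestOffset, partitions, read) without importing pyspark, and random strings from the standard library stand in for faker output. The rows_per_batch and seed parameters are also assumptions.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class RangePartition:
    """Hypothetical partition covering the offsets [start, end)."""
    start: int
    end: int


class SketchFakeStreamReader:
    """Plain-Python stand-in that mirrors DataSourceStreamReader method names."""

    def __init__(self, columns, rows_per_batch=2, seed=0):
        self.columns = columns                # one fake value per column
        self.rows_per_batch = rows_per_batch  # assumption: fixed batch size
        self.seed = seed
        self.current = 0

    def initialOffset(self):
        # Offset the stream starts from on its first run.
        return {"offset": 0}

    def latestOffset(self):
        # Each call advances the stream by one micro-batch worth of rows.
        self.current += self.rows_per_batch
        return {"offset": self.current}

    def partitions(self, start, end):
        # A single partition covering everything between the two offsets.
        return [RangePartition(start["offset"], end["offset"])]

    def read(self, partition):
        # Yield one tuple per offset; seeding per row keeps output repeatable.
        for i in range(partition.start, partition.end):
            rng = random.Random(self.seed + i)
            yield tuple(
                "".join(rng.choices(string.ascii_lowercase, k=8))
                for _ in self.columns
            )


reader = SketchFakeStreamReader(["name", "city"])
start = reader.initialOffset()
end = reader.latestOffset()
rows = [row for p in reader.partitions(start, end) for row in reader.read(p)]
```

In a real data source the engine, not user code, calls these methods once per micro-batch; the last four lines only simulate a single batch by hand.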

Documentation updates:

Updated the documentation to describe the streaming reader and added an example of how to use it (pyspark_datasources/fake.py).

Tests:

Added a new test test_fake_datasource_stream to verify the functionality of the streaming reader (tests/test_data_sources.py).

Refactoring:

Refactored the reader method to use the _validate_faker_schema function and added the streamReader method to the FakeDataSource class (pyspark_datasources/fake.py).
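
The refactoring pattern described above, one validation helper shared by the batch and streaming entry points, might look roughly like the sketch below. Everything here is an assumption made for illustration: SketchGenerator stands in for a faker instance, validate_schema for _validate_faker_schema, and the rules it enforces (each column name must be a callable on the generator, and each column must be a string) are guesses, not the checks in fake.py.

```python
class SketchGenerator:
    """Hypothetical stand-in for a faker instance with two providers."""

    def name(self):
        return "Ada Lovelace"

    def city(self):
        return "London"


def validate_schema(fields, generator):
    """Check (column_name, type_name) pairs against the generator.

    Assumed rules: every column name must be a callable attribute of the
    generator, and every column must be declared as a string.
    """
    for column, type_name in fields:
        if not callable(getattr(generator, column, None)):
            raise ValueError(f"no generator method named '{column}'")
        if type_name != "string":
            raise ValueError(f"column '{column}' must be a string")


class SketchFakeSource:
    """Both entry points run the same validation before building a reader."""

    def __init__(self, generator):
        self.generator = generator

    def reader(self, fields):
        validate_schema(fields, self.generator)
        return ("batch", fields)   # placeholder for a batch reader object

    def streamReader(self, fields):
        validate_schema(fields, self.generator)
        return ("stream", fields)  # placeholder for a streaming reader object


source = SketchFakeSource(SketchGenerator())
batch = source.reader([("name", "string")])
stream = source.streamReader([("name", "string"), ("city", "string")])
```

Keeping the check in one function means a schema rejected for batch reads is rejected for streaming reads with the same error.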

Add fake streaming source

7924e47

allisonwang-db approved these changes Mar 27, 2025
