New MongoDB/DocumentDB driver with Motor #1374
Conversation
Reviewer's Guide by Sourcery

This pull request introduces a new MongoDB/DocumentDB driver using the Motor library. It replaces the existing …

Sequence diagram for MongoDB connection and operation flow

sequenceDiagram
participant Client
participant MongoDriver as mongo
participant MongoDB
Client->>MongoDriver: __init__(dsn, params)
MongoDriver->>MongoDriver: _construct_dsn(params)
Client->>MongoDriver: connection()
MongoDriver->>MongoDB: connect with Motor client
MongoDB-->>MongoDriver: connection established
MongoDriver-->>Client: driver instance
Client->>MongoDriver: use(database)
MongoDriver->>MongoDB: select database
MongoDB-->>MongoDriver: database selected
MongoDriver-->>Client: database instance
Client->>MongoDriver: execute(collection, operation)
MongoDriver->>MongoDriver: _select_database()
MongoDriver->>MongoDB: perform operation
MongoDB-->>MongoDriver: operation result
MongoDriver-->>Client: result
Class diagram for the updated MongoDB driver

classDiagram
class mongo {
+str _provider
+str _syntax
+str _dsn_template
+AsyncIOMotorClient _connection
+AsyncIOMotorDatabase _database
+str _database_name
+list _databases
+int _timeout
+__init__(dsn, loop, params, kwargs)
+_construct_dsn(params)
+_select_database()
+connection()
+close()
+test_connection(use_ping)
+use(database)
+execute(collection_name, operation, args, kwargs)
+execute_many(collection_name, operation, documents)
+query(collection_name, filter, args, kwargs)
+write(data, collection, database, use_pandas, if_exists)
+truncate_table(collection_name)
+delete(collection_name, filter, many)
+drop_collection(collection_name)
+drop_database(database_name)
}
class BaseDriver {
<<interface>>
+connection()
+close()
}
mongo --|> BaseDriver : implements
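To make the diagrams concrete, here is a minimal usage sketch based only on the methods shown above (connection(), use(), execute(), close()). The connection parameters, collection name, and document are illustrative, and the exact constructor signature should be verified against asyncdb/drivers/mongo.py.

```python
import asyncio
from asyncdb.drivers.mongo import mongo  # driver class introduced in this PR

async def main():
    # Illustrative parameters; adjust to your MongoDB/DocumentDB deployment.
    params = {
        "host": "localhost",
        "port": 27017,
        "username": "user",
        "password": "secret",
        "database": "test_db",
    }
    driver = mongo(params=params)
    await driver.connection()      # connect using the Motor client
    await driver.use("test_db")    # select the working database
    # execute() forwards the operation name to the Motor collection and
    # returns a (result, error) tuple, as shown in the review excerpts below.
    result, error = await driver.execute(
        "my_collection", "insert_one", {"name": "alice"}
    )
    if error is None:
        print("inserted id:", result.inserted_id)
    await driver.close()

if __name__ == "__main__":
    asyncio.run(main())
```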
File-Level Changes
Hey @phenobarbital - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
    write_success = await db_driver.write(data=sample_dataframe, collection=collection_name)
    assert write_success is True, "Failed to write DataFrame to MongoDB collection."
suggestion (testing): Test edge cases for the write operation with DataFrames
Consider adding tests for empty DataFrames, DataFrames with null values, and different data types in the DataFrame (e.g., nested dictionaries, lists). Also test the if_exists='replace' scenario with DataFrame input.
Suggested implementation:
@pytest.fixture
def empty_dataframe():
    """Fixture that returns an empty DataFrame."""
    return pd.DataFrame()


@pytest.fixture
def null_dataframe():
    """Fixture that returns a DataFrame with null values."""
    return pd.DataFrame({
        'col1': [1, None, 3],
        'col2': ['a', None, 'c'],
        'col3': [None, 2.5, 3.5]
    })


@pytest.fixture
def complex_dataframe():
    """Fixture that returns a DataFrame with complex data types."""
    return pd.DataFrame({
        'nested_dict': [{'key1': 'value1'}, {'key2': 'value2'}],
        'list_col': [[1, 2, 3], [4, 5, 6]],
        'mixed_types': [1, 'string'],
        'basic_col': [1, 2]
    })


@pytest.mark.asyncio
async def test_write_dataframe_basic(sample_collection, sample_dataframe):
    # Verify that the data was written correctly
    ...  # body elided in the original suggestion


@pytest.mark.asyncio
async def test_write_empty_dataframe(sample_collection, empty_dataframe):
    """Test writing an empty DataFrame to MongoDB."""
    db_driver, collection_name = sample_collection
    write_success = await db_driver.write(data=empty_dataframe, collection=collection_name)
    assert write_success is True, "Failed to write empty DataFrame to MongoDB collection."
    # Verify that no documents were written
    documents, error = await db_driver.fetch(collection_name)
    assert error is None, f"Error during fetch: {error}"
    assert len(documents) == 0, "Collection should be empty"


@pytest.mark.asyncio
async def test_write_null_dataframe(sample_collection, null_dataframe):
    """Test writing a DataFrame with null values to MongoDB."""
    db_driver, collection_name = sample_collection
    write_success = await db_driver.write(data=null_dataframe, collection=collection_name)
    assert write_success is True, "Failed to write DataFrame with null values to MongoDB collection."
    # Verify the data including null values
    documents, error = await db_driver.fetch(collection_name)
    assert error is None, f"Error during fetch: {error}"
    assert len(documents) == 3, "Number of documents doesn't match"
    # Check that null values are preserved
    fetched_data = [{k: v for k, v in doc.items() if k != '_id'} for doc in documents]
    assert any(None in doc.values() for doc in fetched_data), "Null values were not preserved"


@pytest.mark.asyncio
async def test_write_complex_dataframe(sample_collection, complex_dataframe):
    """Test writing a DataFrame with complex data types to MongoDB."""
    db_driver, collection_name = sample_collection
    write_success = await db_driver.write(data=complex_dataframe, collection=collection_name)
    assert write_success is True, "Failed to write DataFrame with complex types to MongoDB collection."
    # Verify the complex data types
    documents, error = await db_driver.fetch(collection_name)
    assert error is None, f"Error during fetch: {error}"
    fetched_data = [{k: v for k, v in doc.items() if k != '_id'} for doc in documents]
    assert isinstance(fetched_data[0]['nested_dict'], dict), "Nested dict not preserved"
    assert isinstance(fetched_data[0]['list_col'], list), "List not preserved"


@pytest.mark.asyncio
async def test_write_dataframe_replace(sample_collection, sample_dataframe, null_dataframe):
    """Test writing a DataFrame with if_exists='replace' option."""
    db_driver, collection_name = sample_collection
    # First write the sample dataframe
    write_success = await db_driver.write(data=sample_dataframe, collection=collection_name)
    assert write_success is True, "Failed to write initial DataFrame"
    # Then replace it with the null dataframe
    write_success = await db_driver.write(
        data=null_dataframe,
        collection=collection_name,
        if_exists='replace'
    )
    assert write_success is True, "Failed to replace existing data with new DataFrame"
    # Verify that only the new data exists
    documents, error = await db_driver.fetch(collection_name)
    assert error is None, f"Error during fetch: {error}"
    assert len(documents) == 3, "Collection should only contain the replacement data"
You'll need to:
- Import pandas as pd at the top of the file if not already imported
- Ensure the MongoDB driver class supports the if_exists='replace' parameter in its write method (an illustrative sketch follows this list)
- Update any existing documentation to reflect the new test cases
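Regarding the second point, the write method's if_exists='replace' handling is not visible in this excerpt. Purely as an illustration (not necessarily how this PR implements it), 'replace' semantics for a DataFrame could be achieved with Motor roughly like this:

```python
import pandas as pd
from motor.motor_asyncio import AsyncIOMotorDatabase

async def write_dataframe_replace(
    database: AsyncIOMotorDatabase,
    collection_name: str,
    df: pd.DataFrame,
) -> bool:
    """Illustrative sketch: drop the collection, then insert the DataFrame rows."""
    collection = database[collection_name]
    await collection.drop()  # 'replace': discard any existing documents first
    records = df.to_dict(orient="records")
    if records:  # insert_many raises on an empty document list
        await collection.insert_many(records)
    return True
```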
asyncdb/drivers/mongo.py (Outdated)
            method = getattr(collection, operation)
            result = await method(*args, **kwargs)
        except Exception as err:
            error = err
        return (result, error)

    async def execute_many(self, collection_name: str, operation: str, documents: list) -> Optional[Any]:
    async def execute_many(
issue (complexity): Consider implementing batching in execute_many to handle large operations efficiently and prevent potential memory issues.
The execute_many method currently just forwards to execute, creating an unnecessary abstraction layer. To justify this separate method, implement proper bulk operation handling:
async def execute_many(
    self,
    collection_name: str,
    operation: str,
    documents: list,
    batch_size: int = 1000
) -> Optional[Any]:
    """Execute bulk operations with batching for better performance."""
    results = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        result = await self.execute(collection_name, operation, batch)
        results.append(result)
    return results
This adds value by:
- Breaking large operations into manageable batches
- Aggregating results from multiple batches
- Preventing memory issues with large document sets
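For reference, the batched execute_many sketched above could be used like this (the collection name and documents are hypothetical):

```python
async def bulk_insert_example(driver):
    # `driver` is assumed to be a connected mongo driver instance.
    documents = [{"seq": i} for i in range(5000)]
    # Inserts the documents in batches of 1000; each batch goes through execute(),
    # which returns a (result, error) tuple, so this yields a list of such tuples.
    return await driver.execute_many(
        "my_collection", "insert_many", documents, batch_size=1000
    )
```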
asyncdb/drivers/bigquery.py (Outdated)
        try:
            table = self._connection.get_table(table_ref)
        except NotFound:
            raise DriverError(
                f"BigQuery: Table `{dataset_id}.{table_id}` does not exist."
            )
issue (code-quality): Explicitly raise from a previous error (raise-from-previous-error)
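Applied to the excerpt above, the fix is to chain the original exception so the NotFound traceback is preserved (sketch; the surrounding names are as in the snippet):

```python
try:
    table = self._connection.get_table(table_ref)
except NotFound as exc:
    raise DriverError(
        f"BigQuery: Table `{dataset_id}.{table_id}` does not exist."
    ) from exc
```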
asyncdb/drivers/mongo.py (Outdated)
        except Exception as err:
            raise DriverError(f"No row to be returned {err}")
        return result
        return await self.queryrow(collection_name, filter, *args, **kwargs)

    fetchrow = fetch_one
    fetchone = fetch_one

    async def write(
issue (code-quality): We've found these issues:
- Simplify conditional into switch-like form (switch)
- Low code quality found in mongo.write - 24% (low-code-quality)
Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into their own functions. This is the most important thing you can do - ideally a function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts sits together within the function rather than being scattered.
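As a generic illustration of the guard-clause and helper-extraction advice (a pattern sketch only, not code from this PR or from mongo.write; all names and the in-memory storage are made up):

```python
# Pattern sketch: guard clauses plus small extracted helpers keep the
# main method short and flat.
class ExampleWriter:
    def __init__(self) -> None:
        self.store: dict[str, list] = {}

    def _normalize(self, data) -> list:
        # Extracted helper: coerce input into a list of records.
        return data if isinstance(data, list) else list(data)

    def _replace(self, collection: str) -> None:
        # Extracted helper: discard existing records ('replace' semantics).
        self.store[collection] = []

    def write(self, data, collection: str, if_exists: str = "append") -> bool:
        # Guard clauses: handle trivial or invalid input first and return early.
        if not data:
            return False
        if if_exists not in ("append", "replace"):
            raise ValueError(f"Unsupported if_exists value: {if_exists}")
        if if_exists == "replace":
            self._replace(collection)
        self.store.setdefault(collection, []).extend(self._normalize(data))
        return True
```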
        df = pd.DataFrame(data)
        return df
suggestion (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)
-        df = pd.DataFrame(data)
-        return df
+        return pd.DataFrame(data)
Summary by Sourcery
Implement a new MongoDB/DocumentDB driver using Motor.
New Features:
Tests: