[c] Workflow configuration #17

Merged
merged 3 commits into from
Jan 20, 2025

lamle-ea (Contributor) commented Jan 20, 2025

What does this PR do?

  • Add environment variables to the archive.yml and cron.yml workflows.
  • Upgrade Python version to 3.11.

Summary by CodeRabbit

  • Chores

    • Updated Python version from 3.9 to 3.11 across GitHub Actions workflows and project configuration
    • Updated Scrapy package to version 2.11.2
    • Configured Azure cloud storage integration
    • Removed previous cloud storage configurations for S3 and Google Cloud
  • Infrastructure

    • Added Azure-related environment variables for cloud storage and monitoring
    • Enabled Azure Blob storage and status extensions for data processing


coderabbitai bot commented Jan 20, 2025

Walkthrough

The pull request upgrades Python from 3.9 to 3.11 across the project's configuration: the GitHub Actions workflows, the Pipfile, and the production settings. It also moves storage and monitoring to Azure, enabling the Azure Blob storage feed and status extension alongside Sentry integration, and removes the previous S3 and Google Cloud configurations.

Changes

  • .github/workflows/archive.yml
    • Updated Python version from 3.9 to 3.11
  • .github/workflows/ci.yml
    • Updated Python version in the workflow matrix from 3.9 to 3.11
  • .github/workflows/cron.yml
    • Updated Python version from 3.9 to 3.11
    • Added Azure and Sentry environment variables
    • Removed commented-out AWS and Google Cloud credentials
  • Pipfile
    • Pinned the scrapy package to version 2.11.2
    • Updated Python version from 3.9 to 3.11
  • city_scrapers/settings/prod.py
    • Enabled Azure-related pipelines and extensions
    • Added Azure credential environment variables
    • Configured the Azure Blob storage feed URI
    • Removed S3 and GCS configurations

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant GHA as GitHub Actions
    participant Azure as Azure Services
    participant Sentry as Sentry Monitoring

    Dev->>GHA: Push code changes
    GHA->>GHA: Setup Python 3.11
    GHA->>Azure: Configure Blob Storage
    GHA->>Sentry: Configure Error Tracking
    GHA->>GHA: Run CI/CD Workflows

Poem

🐰 A Rabbit's Ode to Python's Leap

From version nine to eleven we soar,
Azure clouds now open their door
Scrapy spins with newfound might
Workflows dance in version's light
Hop, hop, hurray! Code's fresh and bright! 🚀



lamle-ea changed the title from "Setup workflow" to "[c] Workflow configuration" on Jan 20, 2025

coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)

18-18: Consider maintaining compatibility with Python 3.9.

Upgrading to Python 3.11 is a good change, but consider keeping Python 3.9 in the matrix to preserve backward compatibility and catch version-specific issues early. This matters most if the project is a library that others depend on.

     strategy:
       max-parallel: 4
       matrix:
-        python-version: [3.11]
+        python-version: [3.9, 3.11]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between 0e330df and 4435ccb.

⛔ Files ignored due to path filters (1)
  • Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • .github/workflows/archive.yml (1 hunk)
  • .github/workflows/ci.yml (1 hunk)
  • .github/workflows/cron.yml (1 hunk)
  • Pipfile (2 hunks)
  • city_scrapers/settings/prod.py (2 hunks)
🔇 Additional comments (10)
city_scrapers/settings/prod.py (4)

17-19: Confirm Sentry integration with Azure extension.

The combination of the Azure status extension and Sentry error tracking looks good. With order values of 10 for Sentry and 100 for the Azure extension, Scrapy loads the Sentry extension first, because extensions are loaded in ascending order of these values.


33-36: Ensure all Azure environment variables are properly configured in GitHub secrets.

These environment variables are security-sensitive. Verify they are properly set in the GitHub repository secrets.

✅ Verification successful

Azure environment variables are referenced through GitHub secrets

All four Azure environment variables are referenced in the workflow files through GitHub secrets and passed to the application. This check covers the workflow configuration only; it cannot confirm that the secrets themselves are set in the repository settings.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if all Azure environment variables are used consistently
rg "AZURE_(ACCOUNT_NAME|ACCOUNT_KEY|CONTAINER|STATUS_CONTAINER)" -g "*.yml"

Length of output: 226


Script:

#!/bin/bash
# Check workflow files for Azure environment variables and secrets
fd . .github/workflows/ -e yml -e yaml -x cat {} \; | grep -i "azure"

# Also check if there are any workflow files
fd . .github/workflows/ -e yml -e yaml

Length of output: 419
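A fail-fast check in the settings module would surface a missing or empty secret at startup rather than at upload time. A minimal sketch, assuming the four variable names matched by the search above are read straight from the environment; `load_azure_settings` is a hypothetical helper, not existing project code.

```python
import os
from collections.abc import Mapping

# Names taken from the review's search pattern; how prod.py reads them is assumed.
REQUIRED_AZURE_VARS = (
    "AZURE_ACCOUNT_NAME",
    "AZURE_ACCOUNT_KEY",
    "AZURE_CONTAINER",
    "AZURE_STATUS_CONTAINER",
)


def load_azure_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required Azure variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_AZURE_VARS if not env.get(name)]
    if missing:
        # Report only the names, never the values (one of them is the account key).
        raise RuntimeError("Missing Azure settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_AZURE_VARS}
```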


38-45: Review Azure feed URI format for security concerns.

The feed URI embeds sensitive credentials (the storage account key). Ensure that:

  1. The account key is properly masked in logs
  2. The container path format is secure against path traversal
✅ Verification successful

Azure feed URI implementation follows security best practices

The current implementation is secure because:

  • Credentials are properly managed through environment variables
  • No logging of sensitive information detected
  • Container path format uses safe, system-controlled variables preventing path traversal
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any logging of the feed URI
rg "FEED_URI" -A 2 -B 2

Length of output: 377


Script:

#!/bin/bash
# Check for logging configuration files and FEED_URI usage
echo "=== Checking for logging configurations ==="
fd -e py -e yaml -e ini -e conf -e json | grep -i "log" || true

echo -e "\n=== Checking for FEED_URI usage in code ==="
rg -l "FEED_URI"

echo -e "\n=== Checking for any print/debug statements with FEED_URI ==="
ast-grep --pattern 'print($$$FEED_URI$$$)'

echo -e "\n=== Looking for credential handling patterns ==="
rg "AZURE_(ACCOUNT_KEY|ACCOUNT_NAME)" -A 2 -B 2

Length of output: 1431
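If the feed URI ever needs to appear in a log message, it can be redacted first. A sketch assuming the URI embeds credentials as `azure://<account_name>:<account_key>@<host>/...` (the exact format in `prod.py` is an assumption here); `redact_feed_uri` is a hypothetical helper.

```python
import re

# Matches "azure://<account>:<key>@" and captures the text around the key.
# Base64 account keys may contain "/", "+" and "=" but never "@", so the key
# is taken to run up to the first "@".
_AZURE_KEY = re.compile(r"^(azure://[^:/@]+:)[^@]+(@)")


def redact_feed_uri(uri: str) -> str:
    """Return the URI with the account key replaced by '***'; other URIs pass through."""
    return _AZURE_KEY.sub(r"\1***\2", uri, count=1)
```

Redacting at the point of logging keeps the full URI available to the feed exporter while ensuring only the masked form reaches log output.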


9-11: Verify pipeline configuration completeness.

The pipeline configuration looks correct, but ensure that every pipeline required for the Azure integration is included. The order values (200, 300, 400) determine the sequence in which items pass through the pipelines.

✅ Verification successful

Pipeline configuration is correctly structured and complete

The pipeline setup in prod.py is properly configured with all necessary components for Azure integration. The ordering (200, 300, 400) ensures correct data flow from diff processing through meeting data handling to final civic data formatting.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if any other Azure-related pipelines might be needed
rg "Pipeline.*Azure" -g "*.py"

Length of output: 32


Script:

#!/bin/bash
# Search for Azure-related imports and pipeline configurations
rg "from.*azure|import.*azure" -g "*.py"
rg "class.*Pipeline" -g "*.py"

# Search specifically in city_scrapers_core for pipeline definitions
fd . -e py --exec grep -l "Pipeline" {} \; | grep "city_scrapers_core"

Length of output: 147


Script:

#!/bin/bash
# Search for settings files and pipeline configurations
fd settings -e py
rg "AzureDiffPipeline|MeetingPipeline|OpenCivicDataPipeline" -g "*.py"

Length of output: 543
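A small simulation of Scrapy's item-pipeline ordering rule: each item passes through the enabled pipelines from the lowest order value to the highest. `TraceStage` is a hypothetical stand-in that only records its name, and the `city_scrapers_core.pipelines.` module path is assumed from the class names found by the search above.

```python
class TraceStage:
    """Hypothetical stand-in pipeline that records its name on the item."""

    def __init__(self, name: str) -> None:
        self.name = name

    def process_item(self, item: dict, spider) -> dict:
        item.setdefault("trace", []).append(self.name)
        return item


# Order values from the review; module paths are assumed.
ITEM_PIPELINES = {
    "city_scrapers_core.pipelines.OpenCivicDataPipeline": 400,
    "city_scrapers_core.pipelines.AzureDiffPipeline": 200,
    "city_scrapers_core.pipelines.MeetingPipeline": 300,
}


def run_pipelines(item: dict, pipelines: dict[str, int], registry: dict, spider=None) -> dict:
    """Pass the item through each pipeline in ascending order value."""
    for path in sorted(pipelines, key=pipelines.__getitem__):
        item = registry[path].process_item(item, spider)
    return item


registry = {path: TraceStage(path.rsplit(".", 1)[-1]) for path in ITEM_PIPELINES}
print(run_pipelines({}, ITEM_PIPELINES, registry)["trace"])
# ['AzureDiffPipeline', 'MeetingPipeline', 'OpenCivicDataPipeline']
```

The dictionary is deliberately declared out of order to show that only the values, not the declaration order, decide the sequence.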

Pipfile (2)

7-7: Verify Scrapy version compatibility with Python 3.11.

The pinned version of Scrapy (2.11.2) is compatible with Python 3.11, but ensure all extensions and middleware support this version.

Consider adding a comment explaining why this specific version was chosen to help with future maintenance.


20-20: Confirm all dependencies support Python 3.11.

The upgrade to Python 3.11 is significant. Verify that all dependencies, especially city-scrapers-core and scrapy-wayback-middleware, are compatible with Python 3.11.

✅ Verification successful

All dependencies are compatible with Python 3.11

Both city-scrapers-core and scrapy-wayback-middleware, along with all other dependencies, are compatible with Python 3.11. The most restrictive constraint found is "python_version < '4.0'", which allows Python 3.11.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any Python version constraints in dependencies
rg "python_version" -g "*.txt" -g "*.toml" -g "*.lock"

Length of output: 4058


Script:

#!/bin/bash
# Check version constraints for specific packages
rg "city-scrapers-core|scrapy-wayback-middleware" -A 5 Pipfile.lock

Length of output: 729

.github/workflows/archive.yml (1)

10-10: Verify Python 3.11 compatibility with all workflow steps.

The Python version upgrade looks good, but ensure that every action and step in the workflow works with Python 3.11.

.github/workflows/cron.yml (2)

18-22: Review security of environment variable usage.

The Azure and Sentry credentials are properly configured using GitHub secrets. However:

  1. Ensure these secrets are properly set in the repository settings
  2. Consider adding secret masking in the workflow logs

Consider documenting the required secrets in the repository's README.md to help with setup.


11-17: Verify autothrottle settings with new Python version.

The autothrottle settings look reasonable, but with the Python 3.11 upgrade, consider:

  1. Testing these values with the new runtime
  2. Monitoring initial runs for any performance changes
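To reason about how the autothrottle values behave, here is a sketch of the delay update rule as described in Scrapy's AutoThrottle documentation. It is a simplified model, not the extension's actual code, and the numbers in the example call are placeholders rather than the values set in `cron.yml`.

```python
def next_delay(
    prev_delay: float,
    latency: float,
    status: int,
    *,
    target_concurrency: float,
    min_delay: float,
    max_delay: float,
) -> float:
    """Return the download delay for the next request (simplified AutoThrottle)."""
    # Target delay: keeps about `target_concurrency` requests in flight
    # for the observed response latency.
    target = latency / target_concurrency
    # Move halfway from the previous delay toward the target.
    new = (prev_delay + target) / 2.0
    # Clamp between DOWNLOAD_DELAY (min_delay) and AUTOTHROTTLE_MAX_DELAY (max_delay).
    new = min(max(min_delay, new), max_delay)
    # Non-200 responses may raise the delay but never lower it.
    if status != 200 and new < prev_delay:
        return prev_delay
    return new


# Placeholder values: 5 s starting delay, 1 s latency, target concurrency 1.0.
print(next_delay(5.0, 1.0, 200, target_concurrency=1.0, min_delay=0.0, max_delay=60.0))  # 3.0
```

Feeding a few observed latencies through this rule gives a rough idea of how quickly the delay settles for a given `AUTOTHROTTLE_TARGET_CONCURRENCY`.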
.github/workflows/ci.yml (1)

18-18: Verify dependency compatibility with Python 3.11.

The upgrade to Python 3.11 aligns with the PR objectives. However, let's verify that all dependencies in the project support Python 3.11 to prevent potential compatibility issues.

✅ Verification successful

Python 3.11 upgrade is compatible with all project dependencies

All project dependencies in Pipfile.lock support Python 3.11, with no conflicting version constraints detected. The upgrade is safe to proceed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if any dependencies in Pipfile.lock have Python version constraints that might conflict with Python 3.11

# Extract and analyze package metadata from Pipfile.lock
jq -r '.default + ."develop" | to_entries[] | select(.value.markers != null) | "\(.key): \(.value.markers)"' Pipfile.lock

# Look for any explicit Python version constraints in Pipfile
grep -A 5 "python_version" Pipfile || true

Length of output: 2670
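For environments without `jq`, the same listing can be produced in Python. This sketch mirrors the command above: it merges the `default` and `develop` sections (develop entries win on name clashes, as with jq's object addition) and reports each package that carries a `markers` field. It only lists the markers; evaluating them against Python 3.11 would need something like the `packaging` library's `Marker.evaluate`.

```python
import json
import sys
from pathlib import Path


def packages_with_markers(lock: dict) -> list[str]:
    """Return 'name: markers' lines for locked packages that declare markers."""
    # Same merge as jq's `.default + .develop`: develop wins on duplicate names.
    merged = {**lock.get("default", {}), **lock.get("develop", {})}
    return [
        f"{name}: {meta['markers']}"
        for name, meta in merged.items()
        if meta.get("markers") is not None
    ]


if __name__ == "__main__":
    lock_path = Path(sys.argv[1] if len(sys.argv) > 1 else "Pipfile.lock")
    if lock_path.exists():
        for line in packages_with_markers(json.loads(lock_path.read_text(encoding="utf-8"))):
            print(line)
    else:
        print(f"{lock_path} not found", file=sys.stderr)
```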

lamle-ea merged commit 74b7b49 into main on Jan 20, 2025
3 checks passed
lamle-ea deleted the setup-workflow branch on January 20, 2025, 08:30