chore(source-s3): update base image to 4.0.0 and use caret dependencies (do not merge) #55202

Merged
merged 9 commits into from
Mar 12, 2025

Conversation

devin-ai-integration[bot]
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 5, 2025

Update source-s3 to:

  • Use new base image (4.0.0)
  • Replace dependency declarations from specific versions to use carets
  • Bump dependencies by running poetry lock
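
For readers unfamiliar with caret constraints, here is a minimal sketch of how a caret requirement expands into a version range, based on my understanding of Poetry's documented caret semantics. The helper name `caret_range` is hypothetical, and the sketch only handles plain three-part `MAJOR.MINOR.PATCH` versions (no pre-releases or dev builds).

```python
def caret_range(version: str) -> tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for ^version.

    Simplified: expects exactly three numeric components.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    # A caret allows changes that do not modify the leftmost non-zero component.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return version, ".".join(str(part) for part in upper)


for spec in ("5.0.3", "6.38.5", "0.2.3"):
    low, high = caret_range(spec)
    print(f"^{spec}  ->  >={low},<{high}")
```

Under these assumptions, `^5.0.3` accepts any 5.x release at or above 5.0.3, which is why running `poetry lock` afterwards can pull in newer versions than the previously pinned ones.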

Link to Devin run: https://app.devin.ai/sessions/38e801d31cf94b62ad7bc5f7577bfd2e
Requested by: User

Resolves: https://github.com/airbytehq/airbyte-internal-issues/issues/11890

vercel bot commented Mar 5, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 12, 2025 6:49pm

Contributor Author

🤖 Devin AI Engineer

Original prompt from [email protected]:

Received message in Slack channel #dev-devin-ai:

@Devin please update source-s3 to:
• Use new base image (4.0.0, lookup correct SHA in other python sources)
• Replace dependency declarations from specific versions (`== XYZ`) to use carets (` ^XYZ`) instead. For example, smart-open ^5.0.3 or something
• bump deps (run poetry lock, then commit the result)

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@natikgadzhi
Contributor

@aldogonzalez8 this one is interesting — it's failing with the following error in pypi smoke test:

TypeError: Can't instantiate abstract class SourceS3StreamReader with abstract methods file_permissions_schema, get_file_acl_permissions, identities_schema, load_identity_groups

And I think what that means is that file-transfer mode now requires those methods to be defined, but they are not really relevant to S3. What's the path forward?

@aldogonzalez8
Contributor

I see. Can we ask Devin to implement them as noops?
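
The failure mode and the noop workaround discussed above can be reproduced in isolation. This is a minimal sketch, not the CDK's actual code: `BaseReader`, `ReaderWithoutStubs`, and `ReaderWithNoOps` are hypothetical stand-ins, only two of the four methods are shown, and the real signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable


class BaseReader(ABC):
    """Hypothetical stand-in for an abstract stream reader base class."""

    @abstractmethod
    def open_file(self, path: str) -> str: ...

    @abstractmethod
    def file_permissions_schema(self) -> Dict[str, Any]: ...

    @abstractmethod
    def load_identity_groups(self) -> Iterable[Dict[str, Any]]: ...


class ReaderWithoutStubs(BaseReader):
    """Implements only the method that matters for this source."""

    def open_file(self, path: str) -> str:
        return f"contents of {path}"


error_message = ""
try:
    ReaderWithoutStubs()  # abstract methods are still missing
except TypeError as exc:
    error_message = str(exc)
print(error_message)


class ReaderWithNoOps(ReaderWithoutStubs):
    """Adds empty implementations so the class can be instantiated."""

    def file_permissions_schema(self) -> Dict[str, Any]:
        return {}

    def load_identity_groups(self) -> Iterable[Dict[str, Any]]:
        return []


reader = ReaderWithNoOps()
print(reader.open_file("data.csv"))
```

The exact wording of the `TypeError` varies between Python versions, but it always names the unimplemented abstract methods.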

@aldogonzalez8
Contributor

That was fast.

@aldogonzalez8
Contributor

We may need to ask Devin to create some unit tests for the noop methods. It seems awkward, but I think they are pulling down coverage for stream_reader.

---------- coverage: platform linux, python 3.11.11-final-0 ----------
Name                                                     Stmts   Miss  Cover
----------------------------------------------------------------------------
source_s3/__init__.py                                        0      0   100%
source_s3/source.py                                         19      0   100%
source_s3/source_files_abstract/__init__.py                  0      0   100%
source_s3/source_files_abstract/formats/__init__.py          0      0   100%
source_s3/source_files_abstract/formats/avro_spec.py         5      0   100%
source_s3/source_files_abstract/formats/csv_spec.py         16      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py       13      0   100%
source_s3/source_files_abstract/formats/parquet_spec.py      9      0   100%
source_s3/source_files_abstract/spec.py                     55      1    98%
source_s3/v4/__init__.py                                     6      0   100%
source_s3/v4/config.py                                      35      1    97%
source_s3/v4/cursor.py                                      82      1    99%
source_s3/v4/legacy_config_transformer.py                   86      3    97%
source_s3/v4/source.py                                      65      1    98%
source_s3/v4/stream_reader.py                              184     37    80%
source_s3/v4/zip_reader.py                                 183     33    82%
----------------------------------------------------------------------------
TOTAL                                                      758     77    90%

FAIL Required test coverage of 90% not reached. Total coverage: 89.84%
======================= 103 passed, 19 warnings in 6.83s =======================
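
The TOTAL row shows a rounded 90%, yet the gate fails on the unrounded 89.84%. A quick check of the arithmetic, using only the totals from the report above (variable names are illustrative):

```python
statements = 758      # TOTAL Stmts from the report
missed = 77           # TOTAL Miss from the report
required_percent = 90

covered = statements - missed
coverage = 100 * covered / statements
print(f"covered {covered} of {statements}: {coverage:.2f}%")  # 89.84%

# Smallest number of covered statements that reaches the threshold
# (integer ceiling division, to avoid floating-point edge cases).
needed = -(-(required_percent * statements) // 100)
print(f"statements still to cover: {needed - covered}")  # 2
```

So, if the statement count stays the same, covering just two more statements would clear the threshold, which is consistent with a handful of untested noop methods tipping the result.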

@aldogonzalez8
Contributor

aldogonzalez8 commented Mar 5, 2025

/format-fix

Format-fix job started... Check job output.

✅ Changes applied successfully. (167156b)

@aldogonzalez8
Contributor

I will decouple these methods into a new class so we don't need to implement noops. I will circle back soon.
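
The decoupling idea can be sketched as follows. This is only an illustration of the design, not the actual CDK change: all class names here (`StreamReader`, `PermissionsReader`, `S3LikeReader`, `DriveLikeReader`) are hypothetical, and the method signatures are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class StreamReader(ABC):
    """Core contract that every file-based source must implement."""

    @abstractmethod
    def list_files(self) -> List[str]: ...


class PermissionsReader(ABC):
    """Optional contract, only for sources that expose ACLs and identities."""

    @abstractmethod
    def get_file_acl_permissions(self, path: str) -> Dict[str, Any]: ...

    @abstractmethod
    def load_identity_groups(self) -> Iterable[Dict[str, Any]]: ...


class S3LikeReader(StreamReader):
    """No permissions support, so no noop methods are needed."""

    def list_files(self) -> List[str]:
        return ["a.csv", "b.csv"]


class DriveLikeReader(StreamReader, PermissionsReader):
    """A source that opts into the permissions contract."""

    def list_files(self) -> List[str]:
        return ["doc.txt"]

    def get_file_acl_permissions(self, path: str) -> Dict[str, Any]:
        return {"path": path, "allowed": ["group:everyone"]}

    def load_identity_groups(self) -> Iterable[Dict[str, Any]]:
        return [{"id": "group:everyone"}]


s3_reader = S3LikeReader()        # instantiates without any stubs
drive_reader = DriveLikeReader()
print(isinstance(s3_reader, PermissionsReader))     # False
print(isinstance(drive_reader, PermissionsReader))  # True
```

With the abstract permissions methods living in their own class, a source such as S3 simply does not inherit them, so neither stubs nor tests for stubs are required.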

@aldogonzalez8
Contributor

I have a fix; I will update this PR once the CDK PR is merged.

airbytehq/airbyte-python-cdk#402

@aldogonzalez8
Contributor

Devin, can you do the following:

  1. In airbyte-integrations/connectors/source-s3/pyproject.toml, change

airbyte-cdk = {extras = ["file-based"], version = "^6.18.2"}

to

airbyte-cdk = {extras = ["file-based"], version = "6.38.3.dev04101"}

  2. Run poetry lock to update airbyte-integrations/connectors/source-s3/poetry.lock

  3. Remove the noop methods:

  • file_permissions_schema
  • get_file_acl_permissions
  • identities_schema
  • load_identity_groups

  4. Also remove the unit tests related to these noop methods (test_file_permissions_and_identity_methods). If the CDK changes are OK, everything will pass.

@aldogonzalez8
Contributor

aldogonzalez8 commented Mar 10, 2025

/format-fix

Format-fix job started... Check job output.

✅ Changes applied successfully. (ee35ae4)

@aldogonzalez8
Contributor

Devin, can you do the following:

Change in airbyte-integrations/connectors/source-s3/pyproject.toml

airbyte-cdk = {extras = ["file-based"], version = "6.38.3.dev04101"}

to

airbyte-cdk = {extras = ["file-based"], version = "^6.38.5"}

Do poetry lock to update airbyte-integrations/connectors/source-s3/poetry.lock

@aldogonzalez8 aldogonzalez8 marked this pull request as ready for review March 12, 2025 17:09
@aldogonzalez8
Contributor

Devin, in airbyte-integrations/connectors/source-s3/source_s3/v4/stream_reader.py

We have "from typing import Any, Dict, Iterable, List, Optional, Set, cast", which includes "Any", but I think "Any" is not used. Can we remove it from the import? It was probably missed when we removed the unit tests that were unnecessary.
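
A suspicion like this can be checked mechanically. Below is a rough sketch using the standard-library `ast` module; the helper `unused_imported_names` and the `SAMPLE` source are made up for illustration and are not the real `stream_reader.py`. It ignores names that appear only inside string annotations or `__all__`, so a real linter (for example pyflakes, or Ruff's F401 rule) is the more reliable tool.

```python
import ast


def unused_imported_names(source: str) -> set:
    """Return imported names that never appear as a plain name in the module."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import x as y" binds "y".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return imported - used


# Illustrative module text, not the actual connector source.
SAMPLE = '''
from typing import Any, Dict, Iterable, List, Optional, Set, cast

def unique_ids(config: Optional[Dict[str, List[int]]]) -> Set[int]:
    ids: Iterable[int] = cast(List[int], config["ids"]) if config else []
    return set(ids)
'''

print(unused_imported_names(SAMPLE))  # {'Any'}
```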

Contributor

@natikgadzhi natikgadzhi left a comment

The changes look good — merge when you're happy with it.

@aldogonzalez8
Contributor

aldogonzalez8 commented Mar 12, 2025

/approve-regression-tests

Check job output.

✅ Approving regression tests

@aldogonzalez8 aldogonzalez8 merged commit 2da3ed7 into master Mar 12, 2025
27 checks passed
@aldogonzalez8 aldogonzalez8 deleted the devin/1741137980-update-source-s3 branch March 12, 2025 19:54
sc250072 added a commit to Teradata/airbyte that referenced this pull request Mar 14, 2025