Use curl as optional client v1.4 #96

Open
wants to merge 64 commits into
base: v1.4-andium

Tmonster
Contributor

Follow-up to #86, except merging with v1.4-duckdb to see if CI resolves. Otherwise, the duckdb submodule pointer will need intervention as well.

Picking up #77 to merge #58 and add a way to optionally set the client_implementation.

This also integrates #76, since the changes have some interactions that needed some care.

After duckdb/duckdb#18107 landed in duckdb/duckdb, and with the duckdb submodule moved to a recent commit on v1.3-ossivalis, this PR allows switching the HTTP client at runtime via the newly added httpfs config option httpfs_client_implementation:

D SET logging_storage=stdout;
D PRAGMA enable_logging('HTTP');
D SET httpfs_client_implementation='default';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:18.479, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537178.169255,VS0,VE1', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290029-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 1', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:06:18 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=806, Accept-Ranges=bytes}}}, CONNECTION, 2, 11, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='curl';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:30.247, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': '', 'headers': {content-type=application/octet-stream, x-ms-lease-state=available, last-modified='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, accept-ranges=bytes, x-ms-version=2025-05-05, server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', x-cache='HIT, HIT', __RESPONSE_STATUS__='HTTP/2 200 ', etag='"0x8DAF8D1CD43CA79"', x-ms-blob-type=BlockBlob, x-ms-server-encrypted=true, age=818, x-ms-lease-status=unlocked, x-served-by='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290021-RTM', fastly-restarts=1, via='1.1 varnish, 1.1 varnish', date='Thu, 03 Jul 2025 10:06:30 GMT', x-cache-hits='3730, 1', content-disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-timer='S1751537190.940711,VS0,VE1', content-length=21916382}}}, CONNECTION, 2, 13, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='httplib';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:07:45.552, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537265.144944,VS0,VE0', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290047-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 0', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:07:45 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=893, Accept-Ranges=bytes}}}, CONNECTION, 2, 15, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='something_else';
Invalid Input Error:
Unsupported option for httpfs_client_implementation, only `curl`, `httplib` and `default` are currently supported
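The logged headers confirm that different implementations are in use: the default and httplib runs report mixed-case header names such as ETag and a reason of OK, while the curl run reports lowercase names such as etag, an empty reason, and an extra __RESPONSE_STATUS__ entry.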
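HTTP header field names are case-insensitive, so ETag and etag in the logs above refer to the same header. The following is a minimal sketch of a case-insensitive header map that would treat both spellings as one key. The names `CaseInsensitiveLess` and `HeaderMap` are hypothetical and chosen for illustration only; this is not how httpfs, httplib, or curl store headers.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparator: orders header names while ignoring ASCII case,
// so "ETag", "etag" and "ETAG" all compare as equivalent keys.
struct CaseInsensitiveLess {
    bool operator()(const std::string &a, const std::string &b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Hypothetical header container keyed by case-insensitive names.
using HeaderMap = std::map<std::string, std::string, CaseInsensitiveLess>;
```

With such a map, a value stored under `"ETag"` (httplib style) can be looked up with `"etag"` (curl style), which makes it easier to compare the responses logged by the two clients.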

Please check the original PR from @Tmonster, #58, for all relevant details; this PR only adds a setting and resolves conflicts with ongoing work.
Probably the best path is cherry-picking the commit back into the original PR; in any case, this can be discussed on the side.

hannes added a commit to duckdb/duckdb that referenced this pull request Aug 18, 2025
Yes, this will fix my build errors at
duckdb/duckdb-httpfs#96. This CI link has a
passing build status:
https://github.com/duckdb/duckdb-httpfs/actions/runs/16991311944/job/48171130825

~~[DO NOT MERGE]: 
I am testing to see if this is the correct fix with [this
PR](duckdb/duckdb-httpfs#96) first. I am just
updating the duckdb submodule pointer for the httpfs fork to the branch
here. If those tests pass then I know what the correct fix is. (don't
know how to trigger it otherwise yet)~~

This is prompted by this PR
duckdb/duckdb-httpfs#96. Related [CI
failure](https://github.com/duckdb/duckdb-httpfs/actions/runs/16932639981/job/47981684022?pr=96#step:26:597)

It seems that httplib conflicts with `max()` in the Windows build. I've
searched for other instances of `::max()` and `::min()` in httplib.hpp
and didn't find any.

It seems like the proper fix is to use
`(std::numeric_limits<size_t>::max)()`, as seen on line [96 of
httplib.hpp](https://github.com/duckdb/duckdb/blob/1f0de28806a8915c8203dd060dad549f28f5539b/third_party/httplib/httplib.hpp#L96);
that did not fail the Windows build.
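
To illustrate why the parenthesized form helps, here is a minimal sketch. It assumes the conflict comes from a function-like `max` macro, such as the one Windows headers define when `NOMINMAX` is not set; the thread does not confirm this as the cause. The macro below is a stand-in defined by hand, and `largest_size` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <limits>

// Stand-in for a function-like macro such as the one <windows.h> provides
// (assumption: this is the kind of macro that broke the build).
// It is defined after the standard headers so they are parsed without it.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// With the macro active, `std::numeric_limits<std::size_t>::max()` would not
// compile: the preprocessor sees `max(` and tries to expand it with no arguments.
// Wrapping the name in parentheses means `max` is not directly followed by `(`,
// so the function-like macro is not expanded and the member function is called.
inline std::size_t largest_size() {
    return (std::numeric_limits<std::size_t>::max)();
}

// Remove the stand-in macro so it does not leak into later code.
#undef max
```

Calling `largest_size()` returns the largest value representable by `std::size_t`. Defining `NOMINMAX` before including `<windows.h>` is another common way to keep these macros out of a translation unit, but the parenthesized call works regardless of how the header was included.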