diff --git a/HISTORY.md b/HISTORY.md
index cfacb629..dc733aad 100644
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -3,6 +3,7 @@
## Unreleased
- Fixed `rmtree` fail on Azure with no `hns` and more than 256 blobs to drop (Issue [#509](https://github.com/drivendataorg/cloudpathlib/issues/509), PR [#508](https://github.com/drivendataorg/cloudpathlib/pull/508), thanks @alikefia)
+- Added support for http(s) URLs with `HttpClient`, `HttpPath`, `HttpsClient`, and `HttpsPath`. (Issue [#455](https://github.com/drivendataorg/cloudpathlib/issues/455), PR [#468](https://github.com/drivendataorg/cloudpathlib/pull/468))
## v0.21.0 (2025-03-03)
diff --git a/README.md b/README.md
index 5ca8ef50..f76eb223 100644
--- a/README.md
+++ b/README.md
@@ -124,88 +124,97 @@ list(root_dir.glob('**/*.txt'))
Most methods and properties from `pathlib.Path` are supported except for the ones that don't make sense in a cloud context. There are a few additional methods or properties that relate to specific cloud services or are specific to cloud paths.
-| Methods + properties | `AzureBlobPath` | `S3Path` | `GSPath` |
-|:-----------------------|:------------------|:-----------|:-----------|
-| `absolute` | ✅ | ✅ | ✅ |
-| `anchor` | ✅ | ✅ | ✅ |
-| `as_uri` | ✅ | ✅ | ✅ |
-| `drive` | ✅ | ✅ | ✅ |
-| `exists` | ✅ | ✅ | ✅ |
-| `glob` | ✅ | ✅ | ✅ |
-| `is_absolute` | ✅ | ✅ | ✅ |
-| `is_dir` | ✅ | ✅ | ✅ |
-| `is_file` | ✅ | ✅ | ✅ |
-| `is_relative_to` | ✅ | ✅ | ✅ |
-| `iterdir` | ✅ | ✅ | ✅ |
-| `joinpath` | ✅ | ✅ | ✅ |
-| `match` | ✅ | ✅ | ✅ |
-| `mkdir` | ✅ | ✅ | ✅ |
-| `name` | ✅ | ✅ | ✅ |
-| `open` | ✅ | ✅ | ✅ |
-| `parent` | ✅ | ✅ | ✅ |
-| `parents` | ✅ | ✅ | ✅ |
-| `parts` | ✅ | ✅ | ✅ |
-| `read_bytes` | ✅ | ✅ | ✅ |
-| `read_text` | ✅ | ✅ | ✅ |
-| `relative_to` | ✅ | ✅ | ✅ |
-| `rename` | ✅ | ✅ | ✅ |
-| `replace` | ✅ | ✅ | ✅ |
-| `resolve` | ✅ | ✅ | ✅ |
-| `rglob` | ✅ | ✅ | ✅ |
-| `rmdir` | ✅ | ✅ | ✅ |
-| `samefile` | ✅ | ✅ | ✅ |
-| `stat` | ✅ | ✅ | ✅ |
-| `stem` | ✅ | ✅ | ✅ |
-| `suffix` | ✅ | ✅ | ✅ |
-| `suffixes` | ✅ | ✅ | ✅ |
-| `touch` | ✅ | ✅ | ✅ |
-| `unlink` | ✅ | ✅ | ✅ |
-| `with_name` | ✅ | ✅ | ✅ |
-| `with_stem` | ✅ | ✅ | ✅ |
-| `with_suffix` | ✅ | ✅ | ✅ |
-| `write_bytes` | ✅ | ✅ | ✅ |
-| `write_text` | ✅ | ✅ | ✅ |
-| `as_posix` | ❌ | ❌ | ❌ |
-| `chmod` | ❌ | ❌ | ❌ |
-| `cwd` | ❌ | ❌ | ❌ |
-| `expanduser` | ❌ | ❌ | ❌ |
-| `group` | ❌ | ❌ | ❌ |
-| `hardlink_to` | ❌ | ❌ | ❌ |
-| `home` | ❌ | ❌ | ❌ |
-| `is_block_device` | ❌ | ❌ | ❌ |
-| `is_char_device` | ❌ | ❌ | ❌ |
-| `is_fifo` | ❌ | ❌ | ❌ |
-| `is_mount` | ❌ | ❌ | ❌ |
-| `is_reserved` | ❌ | ❌ | ❌ |
-| `is_socket` | ❌ | ❌ | ❌ |
-| `is_symlink` | ❌ | ❌ | ❌ |
-| `lchmod` | ❌ | ❌ | ❌ |
-| `link_to` | ❌ | ❌ | ❌ |
-| `lstat` | ❌ | ❌ | ❌ |
-| `owner` | ❌ | ❌ | ❌ |
-| `readlink` | ❌ | ❌ | ❌ |
-| `root` | ❌ | ❌ | ❌ |
-| `symlink_to` | ❌ | ❌ | ❌ |
-| `as_url` | ✅ | ✅ | ✅ |
-| `clear_cache` | ✅ | ✅ | ✅ |
-| `cloud_prefix` | ✅ | ✅ | ✅ |
-| `copy` | ✅ | ✅ | ✅ |
-| `copytree` | ✅ | ✅ | ✅ |
-| `download_to` | ✅ | ✅ | ✅ |
-| `etag` | ✅ | ✅ | ✅ |
-| `fspath` | ✅ | ✅ | ✅ |
-| `is_junction` | ✅ | ✅ | ✅ |
-| `is_valid_cloudpath` | ✅ | ✅ | ✅ |
-| `rmtree` | ✅ | ✅ | ✅ |
-| `upload_from` | ✅ | ✅ | ✅ |
-| `validate` | ✅ | ✅ | ✅ |
-| `walk` | ✅ | ✅ | ✅ |
-| `with_segments` | ✅ | ✅ | ✅ |
-| `blob` | ✅ | ❌ | ✅ |
-| `bucket` | ❌ | ✅ | ✅ |
-| `container` | ✅ | ❌ | ❌ |
-| `key` | ❌ | ✅ | ❌ |
-| `md5` | ✅ | ❌ | ✅ |
+| Methods + properties | `AzureBlobPath` | `GSPath` | `HttpsPath` | `S3Path` |
+|:-----------------------|:------------------|:-----------|:--------------|:-----------|
+| `absolute` | ✅ | ✅ | ✅ | ✅ |
+| `anchor` | ✅ | ✅ | ✅ | ✅ |
+| `as_uri` | ✅ | ✅ | ✅ | ✅ |
+| `drive` | ✅ | ✅ | ✅ | ✅ |
+| `exists` | ✅ | ✅ | ✅ | ✅ |
+| `glob` | ✅ | ✅ | ✅ | ✅ |
+| `is_absolute` | ✅ | ✅ | ✅ | ✅ |
+| `is_dir` | ✅ | ✅ | ✅ | ✅ |
+| `is_file` | ✅ | ✅ | ✅ | ✅ |
+| `is_junction` | ✅ | ✅ | ✅ | ✅ |
+| `is_relative_to` | ✅ | ✅ | ✅ | ✅ |
+| `iterdir` | ✅ | ✅ | ✅ | ✅ |
+| `joinpath` | ✅ | ✅ | ✅ | ✅ |
+| `match` | ✅ | ✅ | ✅ | ✅ |
+| `mkdir` | ✅ | ✅ | ✅ | ✅ |
+| `name` | ✅ | ✅ | ✅ | ✅ |
+| `open` | ✅ | ✅ | ✅ | ✅ |
+| `parent` | ✅ | ✅ | ✅ | ✅ |
+| `parents` | ✅ | ✅ | ✅ | ✅ |
+| `parts` | ✅ | ✅ | ✅ | ✅ |
+| `read_bytes` | ✅ | ✅ | ✅ | ✅ |
+| `read_text` | ✅ | ✅ | ✅ | ✅ |
+| `relative_to` | ✅ | ✅ | ✅ | ✅ |
+| `rename` | ✅ | ✅ | ✅ | ✅ |
+| `replace` | ✅ | ✅ | ✅ | ✅ |
+| `resolve` | ✅ | ✅ | ✅ | ✅ |
+| `rglob` | ✅ | ✅ | ✅ | ✅ |
+| `rmdir` | ✅ | ✅ | ✅ | ✅ |
+| `samefile` | ✅ | ✅ | ✅ | ✅ |
+| `stat` | ✅ | ✅ | ✅ | ✅ |
+| `stem` | ✅ | ✅ | ✅ | ✅ |
+| `suffix` | ✅ | ✅ | ✅ | ✅ |
+| `suffixes` | ✅ | ✅ | ✅ | ✅ |
+| `touch` | ✅ | ✅ | ✅ | ✅ |
+| `unlink` | ✅ | ✅ | ✅ | ✅ |
+| `walk` | ✅ | ✅ | ✅ | ✅ |
+| `with_name` | ✅ | ✅ | ✅ | ✅ |
+| `with_segments` | ✅ | ✅ | ✅ | ✅ |
+| `with_stem` | ✅ | ✅ | ✅ | ✅ |
+| `with_suffix` | ✅ | ✅ | ✅ | ✅ |
+| `write_bytes` | ✅ | ✅ | ✅ | ✅ |
+| `write_text` | ✅ | ✅ | ✅ | ✅ |
+| `as_posix` | ❌ | ❌ | ❌ | ❌ |
+| `chmod` | ❌ | ❌ | ❌ | ❌ |
+| `cwd` | ❌ | ❌ | ❌ | ❌ |
+| `expanduser` | ❌ | ❌ | ❌ | ❌ |
+| `group` | ❌ | ❌ | ❌ | ❌ |
+| `hardlink_to` | ❌ | ❌ | ❌ | ❌ |
+| `home` | ❌ | ❌ | ❌ | ❌ |
+| `is_block_device` | ❌ | ❌ | ❌ | ❌ |
+| `is_char_device` | ❌ | ❌ | ❌ | ❌ |
+| `is_fifo` | ❌ | ❌ | ❌ | ❌ |
+| `is_mount` | ❌ | ❌ | ❌ | ❌ |
+| `is_reserved` | ❌ | ❌ | ❌ | ❌ |
+| `is_socket` | ❌ | ❌ | ❌ | ❌ |
+| `is_symlink` | ❌ | ❌ | ❌ | ❌ |
+| `lchmod` | ❌ | ❌ | ❌ | ❌ |
+| `lstat` | ❌ | ❌ | ❌ | ❌ |
+| `owner` | ❌ | ❌ | ❌ | ❌ |
+| `readlink` | ❌ | ❌ | ❌ | ❌ |
+| `root` | ❌ | ❌ | ❌ | ❌ |
+| `symlink_to` | ❌ | ❌ | ❌ | ❌ |
+| `as_url` | ✅ | ✅ | ✅ | ✅ |
+| `clear_cache` | ✅ | ✅ | ✅ | ✅ |
+| `client` | ✅ | ✅ | ✅ | ✅ |
+| `cloud_prefix` | ✅ | ✅ | ✅ | ✅ |
+| `copy` | ✅ | ✅ | ✅ | ✅ |
+| `copytree` | ✅ | ✅ | ✅ | ✅ |
+| `download_to` | ✅ | ✅ | ✅ | ✅ |
+| `from_uri` | ✅ | ✅ | ✅ | ✅ |
+| `fspath` | ✅ | ✅ | ✅ | ✅ |
+| `full_match` | ✅ | ✅ | ✅ | ✅ |
+| `is_valid_cloudpath` | ✅ | ✅ | ✅ | ✅ |
+| `parser` | ✅ | ✅ | ✅ | ✅ |
+| `rmtree` | ✅ | ✅ | ✅ | ✅ |
+| `upload_from` | ✅ | ✅ | ✅ | ✅ |
+| `validate` | ✅ | ✅ | ✅ | ✅ |
+| `etag` | ✅ | ✅ | ❌ | ✅ |
+| `blob` | ✅ | ✅ | ❌ | ❌ |
+| `bucket` | ❌ | ✅ | ❌ | ✅ |
+| `md5` | ✅ | ✅ | ❌ | ❌ |
+| `container` | ✅ | ❌ | ❌ | ❌ |
+| `delete` | ❌ | ❌ | ✅ | ❌ |
+| `get` | ❌ | ❌ | ✅ | ❌ |
+| `head` | ❌ | ❌ | ✅ | ❌ |
+| `key` | ❌ | ❌ | ❌ | ✅ |
+| `parsed_url` | ❌ | ❌ | ✅ | ❌ |
+| `post` | ❌ | ❌ | ✅ | ❌ |
+| `put` | ❌ | ❌ | ✅ | ❌ |
diff --git a/cloudpathlib/__init__.py b/cloudpathlib/__init__.py
index da4fe28e..84ed31b2 100644
--- a/cloudpathlib/__init__.py
+++ b/cloudpathlib/__init__.py
@@ -4,9 +4,11 @@
from .azure.azblobclient import AzureBlobClient
from .azure.azblobpath import AzureBlobPath
from .cloudpath import CloudPath, implementation_registry
-from .s3.s3client import S3Client
-from .gs.gspath import GSPath
from .gs.gsclient import GSClient
+from .gs.gspath import GSPath
+from .http.httpclient import HttpClient, HttpsClient
+from .http.httppath import HttpPath, HttpsPath
+from .s3.s3client import S3Client
from .s3.s3path import S3Path
@@ -27,6 +29,10 @@
"implementation_registry",
"GSClient",
"GSPath",
+ "HttpClient",
+ "HttpsClient",
+ "HttpPath",
+ "HttpsPath",
"S3Client",
"S3Path",
]
diff --git a/cloudpathlib/cloudpath.py b/cloudpathlib/cloudpath.py
index 5845e929..f7621c5b 100644
--- a/cloudpathlib/cloudpath.py
+++ b/cloudpathlib/cloudpath.py
@@ -27,7 +27,6 @@
Generator,
List,
Optional,
- Sequence,
Tuple,
Type,
TYPE_CHECKING,
@@ -299,11 +298,11 @@ def __setstate__(self, state: Dict[str, Any]) -> None:
@property
def _no_prefix(self) -> str:
- return self._str[len(self.cloud_prefix) :]
+ return self._str[len(self.anchor) :]
@property
def _no_prefix_no_drive(self) -> str:
- return self._str[len(self.cloud_prefix) + len(self.drive) :]
+ return self._str[len(self.anchor) + len(self.drive) :]
@overload
@classmethod
@@ -909,9 +908,9 @@ def relative_to(self, other: Self, walk_up: bool = False) -> PurePosixPath:
# absolute)
if not isinstance(other, CloudPath):
raise ValueError(f"{self} is a cloud path, but {other} is not")
- if self.cloud_prefix != other.cloud_prefix:
+ if self.anchor != other.anchor:
raise ValueError(
- f"{self} is a {self.cloud_prefix} path, but {other} is a {other.cloud_prefix} path"
+ f"{self} is a {self.anchor} path, but {other} is a {other.anchor} path"
)
kwargs = dict(walk_up=walk_up)
@@ -939,6 +938,9 @@ def full_match(self, pattern: str, case_sensitive: Optional[bool] = None) -> boo
# strip scheme from start of pattern before testing
if pattern.startswith(self.anchor + self.drive):
pattern = pattern[len(self.anchor + self.drive) :]
+ elif pattern.startswith(self.anchor):
+ # for http paths, keep leading slash
+ pattern = pattern[len(self.anchor) - 1 :]
# remove drive, which is kept on normal dispatch to pathlib
return PurePosixPath(self._no_prefix_no_drive).full_match( # type: ignore[attr-defined]
@@ -969,7 +971,7 @@ def parent(self) -> Self:
return self._dispatch_to_path("parent")
@property
- def parents(self) -> Sequence[Self]:
+ def parents(self) -> Tuple[Self, ...]:
return self._dispatch_to_path("parents")
@property
@@ -1224,7 +1226,7 @@ def copytree(self, destination, force_overwrite_to_cloud=None, ignore=None):
)
elif subpath.is_dir():
subpath.copytree(
- destination / subpath.name,
+ destination / (subpath.name + ("" if subpath.name.endswith("/") else "/")),
force_overwrite_to_cloud=force_overwrite_to_cloud,
ignore=ignore,
)
@@ -1258,8 +1260,8 @@ def _new_cloudpath(self, path: Union[str, os.PathLike]) -> Self:
path = path[1:]
# add prefix/anchor if it is not already
- if not path.startswith(self.cloud_prefix):
- path = f"{self.cloud_prefix}{path}"
+ if not path.startswith(self.anchor):
+ path = f"{self.anchor}{path}"
return self.client.CloudPath(path)
diff --git a/cloudpathlib/http/__init__.py b/cloudpathlib/http/__init__.py
new file mode 100644
index 00000000..ccf7452e
--- /dev/null
+++ b/cloudpathlib/http/__init__.py
@@ -0,0 +1,9 @@
+from .httpclient import HttpClient, HttpsClient
+from .httppath import HttpPath, HttpsPath
+
+__all__ = [
+ "HttpClient",
+ "HttpPath",
+ "HttpsClient",
+ "HttpsPath",
+]
diff --git a/cloudpathlib/http/httpclient.py b/cloudpathlib/http/httpclient.py
new file mode 100644
index 00000000..4f1fe87a
--- /dev/null
+++ b/cloudpathlib/http/httpclient.py
@@ -0,0 +1,201 @@
+from datetime import datetime, timezone
+import http
+import os
+import re
+import urllib.request
+import urllib.parse
+import urllib.error
+from pathlib import Path
+from typing import Iterable, Optional, Tuple, Union, Callable
+import shutil
+import mimetypes
+
+from cloudpathlib.client import Client, register_client_class
+from cloudpathlib.enums import FileCacheMode
+
+from .httppath import HttpPath
+
+
+@register_client_class("http")
+class HttpClient(Client):
+ def __init__(
+ self,
+ file_cache_mode: Optional[Union[str, FileCacheMode]] = None,
+ local_cache_dir: Optional[Union[str, os.PathLike]] = None,
+ content_type_method: Optional[Callable] = mimetypes.guess_type,
+ auth: Optional[urllib.request.BaseHandler] = None,
+ custom_list_page_parser: Optional[Callable[[str], Iterable[str]]] = None,
+ custom_dir_matcher: Optional[Callable[[str], bool]] = None,
+ write_file_http_method: Optional[str] = "PUT",
+ ):
+ """Class constructor. Creates an HTTP client that can be used to interact with HTTP servers
+ using the cloudpathlib library.
+
+ Args:
+ file_cache_mode (Optional[Union[str, FileCacheMode]]): How often to clear the file cache; see
+ [the caching docs](https://cloudpathlib.drivendata.org/stable/caching/) for more information
+                about the options in cloudpathlib.enums.FileCacheMode.
+ local_cache_dir (Optional[Union[str, os.PathLike]]): Path to directory to use as cache
+ for downloaded files. If None, will use a temporary directory. Default can be set with
+ the `CLOUDPATHLIB_LOCAL_CACHE_DIR` environment variable.
+ content_type_method (Optional[Callable]): Function to call to guess media type (mimetype) when
+ uploading files. Defaults to `mimetypes.guess_type`.
+ auth (Optional[urllib.request.BaseHandler]): Authentication handler to use for the client. Defaults to None, which will use the default handler.
+            custom_list_page_parser (Optional[Callable[[str], Iterable[str]]]): Function to call to parse pages that list directories. Defaults to looking for `<a>` tags with an `href` attribute.
+ custom_dir_matcher (Optional[Callable[[str], bool]]): Function to call to identify a url that is a directory. Defaults to a lambda that checks if the path ends with a `/`.
+ write_file_http_method (Optional[str]): HTTP method to use when writing files. Defaults to "PUT", but some servers may want "POST".
+ """
+ super().__init__(file_cache_mode, local_cache_dir, content_type_method)
+ self.auth = auth
+
+ if self.auth is None:
+ self.opener = urllib.request.build_opener()
+ else:
+ self.opener = urllib.request.build_opener(self.auth)
+
+ self.custom_list_page_parser = custom_list_page_parser
+
+ self.dir_matcher = (
+ custom_dir_matcher if custom_dir_matcher is not None else lambda x: x.endswith("/")
+ )
+
+ self.write_file_http_method = write_file_http_method
+
+ def _get_metadata(self, cloud_path: HttpPath) -> dict:
+ with self.opener.open(cloud_path.as_url()) as response:
+ last_modified = response.headers.get("Last-Modified", None)
+
+ if last_modified is not None:
+ # per https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified
+ last_modified = datetime.strptime(last_modified, "%a, %d %b %Y %H:%M:%S %Z")
+
+ # should always be utc https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified#gmt
+ last_modified = last_modified.replace(tzinfo=timezone.utc)
+
+ return {
+ "size": int(response.headers.get("Content-Length", 0)),
+ "last_modified": last_modified,
+ "content_type": response.headers.get("Content-Type", None),
+ }
+
+ def _download_file(self, cloud_path: HttpPath, local_path: Union[str, os.PathLike]) -> Path:
+ local_path = Path(local_path)
+ with self.opener.open(cloud_path.as_url()) as response:
+ # Ensure parent directory exists before opening file
+ local_path.parent.mkdir(parents=True, exist_ok=True)
+ with local_path.open("wb") as out_file:
+ shutil.copyfileobj(response, out_file)
+ return local_path
+
+ def _exists(self, cloud_path: HttpPath) -> bool:
+ request = urllib.request.Request(cloud_path.as_url(), method="HEAD")
+ try:
+ with self.opener.open(request) as response:
+ return response.status == 200
+ except (urllib.error.HTTPError, urllib.error.URLError) as e:
+            # HTTPError is a subclass of URLError, so check the HTTP status first
+            if isinstance(e, urllib.error.HTTPError):
+                if e.code == 404:
+                    return False
+                raise
+            return False  # connection-level URLError
+
+ def _move_file(self, src: HttpPath, dst: HttpPath, remove_src: bool = True) -> HttpPath:
+ # .fspath will download the file so the local version can be uploaded
+ self._upload_file(src.fspath, dst)
+ if remove_src:
+ self._remove(src)
+ return dst
+
+ def _remove(self, cloud_path: HttpPath, missing_ok: bool = True) -> None:
+ request = urllib.request.Request(cloud_path.as_url(), method="DELETE")
+ try:
+ with self.opener.open(request) as response:
+ if response.status != 204:
+ raise Exception(f"Failed to delete {cloud_path}.")
+ except urllib.error.HTTPError as e:
+ if e.code == 404 and missing_ok:
+ pass
+ else:
+ raise FileNotFoundError(f"Failed to delete {cloud_path}.")
+
+ def _list_dir(self, cloud_path: HttpPath, recursive: bool) -> Iterable[Tuple[HttpPath, bool]]:
+ try:
+ with self.opener.open(cloud_path.as_url()) as response:
+ # Parse the directory listing
+ for path, is_dir in self._parse_list_dir_response(
+ response.read().decode(), base_url=str(cloud_path)
+ ):
+ yield path, is_dir
+
+ # If it's a directory and recursive is True, list the contents of the directory
+ if recursive and is_dir:
+ yield from self._list_dir(path, recursive=True)
+
+ except Exception as e: # noqa E722
+ raise NotImplementedError(
+ f"Unable to parse response as a listing of files; please provide a custom parser as `custom_list_page_parser`. Error raised: {e}"
+ )
+
+ def _upload_file(self, local_path: Union[str, os.PathLike], cloud_path: HttpPath) -> HttpPath:
+ local_path = Path(local_path)
+        content_type = None
+        if self.content_type_method is not None:
+            content_type, _ = self.content_type_method(local_path)
+
+ headers = {"Content-Type": content_type or "application/octet-stream"}
+
+ with local_path.open("rb") as file_data:
+ request = urllib.request.Request(
+ cloud_path.as_url(),
+ data=file_data.read(),
+ method=self.write_file_http_method,
+ headers=headers,
+ )
+ with self.opener.open(request) as response:
+ if response.status != 201 and response.status != 200:
+ raise Exception(f"Failed to upload {local_path} to {cloud_path}.")
+ return cloud_path
+
+ def _get_public_url(self, cloud_path: HttpPath) -> str:
+ return cloud_path.as_url()
+
+ def _generate_presigned_url(self, cloud_path: HttpPath, expire_seconds: int = 60 * 60) -> str:
+ raise NotImplementedError("Presigned URLs are not supported using urllib.")
+
+ def _parse_list_dir_response(
+ self, response: str, base_url: str
+ ) -> Iterable[Tuple[HttpPath, bool]]:
+ # Ensure base_url ends with a trailing slash so joining works
+ if not base_url.endswith("/"):
+ base_url += "/"
+
+ def _simple_links(html: str) -> Iterable[str]:
+            return re.findall(r'<a[^>]*href="([^"]+)"', html)
+
+        parser = (
+            self.custom_list_page_parser
+            if self.custom_list_page_parser is not None
+            else _simple_links
+        )
+
+        yield from (
+            (self.CloudPath(urllib.parse.urljoin(base_url, link)), self.dir_matcher(link))
+            for link in parser(response)
+        )
+
+    def request(
+        self, url: HttpPath, method: str, **kwargs
+    ) -> Tuple[http.client.HTTPResponse, bytes]:
+ request = urllib.request.Request(url.as_url(), method=method, **kwargs)
+ with self.opener.open(request) as response:
+ # eager read of response content, which is not available after
+ # the connection is closed when we exit the context manager.
+ return response, response.read()
+
+
+HttpClient.HttpPath = HttpClient.CloudPath # type: ignore
+
+
+@register_client_class("https")
+class HttpsClient(HttpClient):
+ pass
+
+
+HttpsClient.HttpsPath = HttpsClient.CloudPath # type: ignore
diff --git a/cloudpathlib/http/httppath.py b/cloudpathlib/http/httppath.py
new file mode 100644
index 00000000..3f42a82d
--- /dev/null
+++ b/cloudpathlib/http/httppath.py
@@ -0,0 +1,163 @@
+import datetime
+import http
+import os
+from pathlib import Path, PurePosixPath
+from tempfile import TemporaryDirectory
+from typing import Any, Tuple, TYPE_CHECKING, Union, Optional
+import urllib
+
+from ..cloudpath import CloudPath, NoStatError, register_path_class
+
+
+if TYPE_CHECKING:
+ from .httpclient import HttpClient, HttpsClient
+
+
+@register_path_class("http")
+class HttpPath(CloudPath):
+ cloud_prefix = "http://"
+ client: "HttpClient"
+
+ def __init__(
+ self,
+ cloud_path: Union[str, "HttpPath"],
+ client: Optional["HttpClient"] = None,
+ ) -> None:
+ super().__init__(cloud_path, client)
+
+ self._path = (
+ PurePosixPath(self._url.path)
+ if self._url.path.startswith("/")
+ else PurePosixPath(f"/{self._url.path}")
+ )
+
+ @property
+ def _local(self) -> Path:
+ """Cached local version of the file."""
+ # remove params, query, fragment to get local path
+ return self.client._local_cache_dir / self._url.path.lstrip("/")
+
+ def _dispatch_to_path(self, func: str, *args, **kwargs) -> Any:
+ sup = super()._dispatch_to_path(func, *args, **kwargs)
+
+ # some dispatch methods like "__truediv__" strip trailing slashes;
+ # for http paths, we need to keep them to indicate directories
+ if func == "__truediv__" and str(args[0]).endswith("/"):
+ return self._new_cloudpath(str(sup) + "/")
+
+ else:
+ return sup
+
+ @property
+ def parsed_url(self) -> urllib.parse.ParseResult:
+ return self._url
+
+ @property
+ def drive(self) -> str:
+        # for HTTP paths, the netloc (host) plays the role of the drive;
+        # use .anchor for the full scheme + netloc prefix
+ return self._url.netloc
+
+ @property
+ def anchor(self) -> str:
+ return f"{self._url.scheme}://{self._url.netloc}/"
+
+ @property
+ def _no_prefix_no_drive(self) -> str:
+ # netloc appears in anchor and drive for httppath; so don't double count
+ return self._str[len(self.anchor) - 1 :]
+
+ def is_dir(self, follow_symlinks: bool = True) -> bool:
+ if not self.exists():
+ return False
+
+ # Use client default to identify directories
+ return self.client.dir_matcher(str(self))
+
+ def is_file(self, follow_symlinks: bool = True) -> bool:
+ if not self.exists():
+ return False
+
+ return not self.client.dir_matcher(str(self))
+
+ def mkdir(self, parents: bool = False, exist_ok: bool = False) -> None:
+ pass # no-op for HTTP Paths
+
+ def touch(self, exist_ok: bool = True) -> None:
+ if self.exists():
+ if not exist_ok:
+ raise FileExistsError(f"File already exists: {self}")
+
+ raise NotImplementedError(
+ "Touch not implemented for existing HTTP files since we can't update the modified time; "
+ "use `put()` or write to the file instead."
+ )
+ else:
+ empty_file = Path(TemporaryDirectory().name) / "empty_file.txt"
+ empty_file.parent.mkdir(parents=True, exist_ok=True)
+ empty_file.write_text("")
+ self.client._upload_file(empty_file, self)
+
+ def stat(self, follow_symlinks: bool = True) -> os.stat_result:
+ try:
+ meta = self.client._get_metadata(self)
+ except: # noqa E722
+ raise NoStatError(f"Could not get metadata for {self}")
+
+ return os.stat_result(
+ ( # type: ignore
+ None, # mode
+ None, # ino
+ self.cloud_prefix, # dev,
+ None, # nlink,
+ None, # uid,
+ None, # gid,
+ meta.get("size", 0), # size,
+ None, # atime,
+                (
+                    meta.get("last_modified") or datetime.datetime.fromtimestamp(0)
+                ).timestamp(),  # mtime,
+ None, # ctime,
+ )
+ )
+
+ def as_url(self, presign: bool = False, expire_seconds: int = 60 * 60) -> str:
+ if presign:
+ raise NotImplementedError("Presigning not supported for HTTP paths")
+
+ return (
+ self._url.geturl()
+ ) # recreate from what was initialized so we have the same query params, etc.
+
+ @property
+ def name(self) -> str:
+ return self._path.name
+
+ @property
+ def parents(self) -> Tuple["HttpPath", ...]:
+ return super().parents + (self._new_cloudpath(""),)
+
+ def get(self, **kwargs) -> Tuple[http.client.HTTPResponse, bytes]:
+ """Issue a get request with `urllib.request.Request`"""
+ return self.client.request(self, "GET", **kwargs)
+
+ def put(self, **kwargs) -> Tuple[http.client.HTTPResponse, bytes]:
+ """Issue a put request with `urllib.request.Request`"""
+ return self.client.request(self, "PUT", **kwargs)
+
+ def post(self, **kwargs) -> Tuple[http.client.HTTPResponse, bytes]:
+ """Issue a post request with `urllib.request.Request`"""
+ return self.client.request(self, "POST", **kwargs)
+
+ def delete(self, **kwargs) -> Tuple[http.client.HTTPResponse, bytes]:
+ """Issue a delete request with `urllib.request.Request`"""
+ return self.client.request(self, "DELETE", **kwargs)
+
+ def head(self, **kwargs) -> Tuple[http.client.HTTPResponse, bytes]:
+ """Issue a head request with `urllib.request.Request`"""
+ return self.client.request(self, "HEAD", **kwargs)
+
+
+@register_path_class("https")
+class HttpsPath(HttpPath):
+ cloud_prefix: str = "https://"
+ client: "HttpsClient"
diff --git a/docs/docs/http.md b/docs/docs/http.md
new file mode 100644
index 00000000..ce1846cf
--- /dev/null
+++ b/docs/docs/http.md
@@ -0,0 +1,208 @@
+# HTTP Support in CloudPathLib
+
+We support `http://` and `https://` URLs with `CloudPath`, but these behave somewhat differently from typical cloud provider URIs (e.g., `s3://`, `gs://`) or local file paths. This document describes those differences, caveats, and the additional configuration options available.
+
+ > **Note:** We don't currently automatically detect `http` links to cloud storage providers (for example, `http://s3.amazonaws.com/bucket/key`) and treat those as `S3Path`, `GSPath`, etc. They will be treated as normal URLs (i.e., `HttpPath` objects).
+
+## Basic Usage
+
+```python
+from cloudpathlib import CloudPath
+
+# Create a path object
+path = CloudPath("https://example.com/data/file.txt")
+
+# Read file contents
+text = path.read_text()
+binary = path.read_bytes()
+
+# Get parent directory
+parent = path.parent  # https://example.com/data
+
+# Join paths
+subpath = path.parent / "other.txt" # https://example.com/data/other.txt
+
+# Check if file exists
+if path.exists():
+ print("File exists!")
+
+# Get file name and suffix
+print(path.name) # "file.txt"
+print(path.suffix) # ".txt"
+
+# List directory contents (if server supports directory listings)
+data_dir = CloudPath("https://example.com/data/")
+for child_path in data_dir.iterdir():
+ print(child_path)
+```
+
+## How HTTP Paths Differ
+
+ - HTTP servers are not necessarily structured like file systems. Operations such as listing directories, removing files, or creating folders depend on whether the server supports them.
+ - For many operations (e.g., uploading, removing files), this implementation relies on specific HTTP verbs like `PUT` or `DELETE`. If the server does not allow these verbs, those operations will fail.
+ - While some cloud storage backends (e.g., AWS S3) provide robust directory emulation, a basic HTTP server may only partially implement these concepts (e.g., listing a directory might just be an HTML page with links).
+ - HTTP URLs often include more than just a path, for example query strings, fragments, and other URL modifiers that are not part of the path. These are handled differently than with other cloud storage providers.
+
+## URL components
+
+You can access the various components of a URL via the `HttpPath.parsed_url` property, which is a [`urllib.parse.ParseResult`](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse) object.
+
+For example for the following URL:
+
+```
+https://username:password@www.example.com:8080/path/to/resource?query=param#fragment
+```
+
+The components are:
+
+```mermaid
+flowchart LR
+
+ %% Define colors for each block
+ classDef scheme fill:#FFD700,stroke:#000,stroke-width:1px,color:#000
+ classDef netloc fill:#ADD8E6,stroke:#000,stroke-width:1px,color:#000
+ classDef path fill:#98FB98,stroke:#000,stroke-width:1px,color:#000
+ classDef query fill:#EE82EE,stroke:#000,stroke-width:1px,color:#000
+ classDef fragment fill:#FFB6C1,stroke:#000,stroke-width:1px,color:#000
+
+ A[".scheme
https
"]:::scheme
+ B[".netloc
username:password\@www.example.com:8080"]:::netloc
+ C[".path
/path/to/resource
"]:::path
+ D[".query
query=param
"]:::query
+ E[".fragment
fragment
"]:::fragment
+
+ A --> B --> C --> D --> E
+```
+
+To access the components of the URL, you can use the `HttpPath.parsed_url` property:
+
+```python
+from cloudpathlib import HttpPath
+
+my_path = HttpPath("http://username:password@www.example.com:8080/path/to/resource?query=param#fragment")
+
+print(my_path.parsed_url.scheme) # "http"
+print(my_path.parsed_url.netloc) # "username:password@www.example.com:8080"
+print(my_path.parsed_url.path) # "/path/to/resource"
+print(my_path.parsed_url.query) # "query=param"
+print(my_path.parsed_url.fragment) # "fragment"
+
+# extra properties that are subcomponents of `netloc`
+print(my_path.parsed_url.username) # "username"
+print(my_path.parsed_url.password) # "password"
+print(my_path.parsed_url.hostname) # "www.example.com"
+print(my_path.parsed_url.port)      # 8080 (an int, not a string)
+```
+
+### Preservation and Joining Behavior
+
+ - **Params, query, and fragment** are part of the URL, but be aware that when you perform operations that return a new path (e.g., joining `my_path / "subdir"`, walking directories, fetching parents, etc.), these modifiers will be discarded unless you explicitly preserve them, since we operate under the assumption that these modifiers are tied to the specific URL.
+ - **netloc (including the subcomponents, username, password, hostname, port) and scheme** are preserved when joining. They are derived from the main portion of the URL (e.g., `http://username:password@www.example.com:8080`).
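+
+As a quick sketch of this joining behavior (the URL here is hypothetical):
+
+```python
+from cloudpathlib import HttpsPath
+
+p = HttpsPath("https://user:pass@example.com:8080/data/file.txt?version=2#row-10")
+
+# scheme and netloc (credentials, host, and port) carry over to derived paths
+print(p.parent)            # https://user:pass@example.com:8080/data
+print(p.parent / "b.txt")  # https://user:pass@example.com:8080/data/b.txt
+
+# ...but the query and fragment are tied to the original URL and are dropped
+print((p.parent / "b.txt").parsed_url.query)  # "" (empty)
+```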
+
+### The `HttpPath.anchor` Property
+
+Because of naming conventions inherited from Python's `pathlib`, the "anchor" in a CloudPath (e.g., `my_path.anchor`) refers to `<scheme>://<netloc>/`. It does **not** include the "fragment" portion of a URL (which is sometimes also called the "anchor" in HTML contexts since it can refer to an `<a>` tag). In other words, `.anchor` returns something like `https://www.example.com/`, not `...#fragment`. To get the fragment, use `my_path.parsed_url.fragment`.
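+
+A quick illustration (hypothetical URL):
+
+```python
+from cloudpathlib import HttpsPath
+
+p = HttpsPath("https://www.example.com/docs/page.html#section-2")
+print(p.anchor)               # https://www.example.com/
+print(p.parsed_url.fragment)  # section-2
+```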
+
+## Required server-side HTTP verb support
+
+Some operations require that the server support specific HTTP verbs. If your server does not allow a verb, the corresponding operation will fail:
+
+ - If your server does not allow `DELETE`, you will not be able to remove files via `HttpPath.unlink()` or the `HttpPath.delete()` convenience method.
+ - If your server does not allow `PUT` (or `POST`, see next bullet), you won't be able to upload files.
+ - By default, we use `PUT` for creating or replacing a file. If you need `POST` for uploads, you can override the behavior by passing `write_file_http_method="POST"` to the `HttpClient` constructor.
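+
+For example, a minimal sketch of pointing uploads at a server that only accepts `POST` (hypothetical URL):
+
+```python
+from cloudpathlib import HttpClient
+
+client = HttpClient(write_file_http_method="POST")
+path = client.CloudPath("http://example.com/uploads/report.csv")
+path.write_text("a,b\n1,2\n")  # the underlying upload is issued as a POST
+```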
+
+### Making requests with the `HttpPath` object
+
+`HttpPath` and `HttpsPath` expose direct methods to perform the relevant HTTP verbs:
+
+```python
+response, content = my_path.get() # issues a GET
+response, content = my_path.put() # issues a PUT
+response, content = my_path.post() # issues a POST
+response, content = my_path.delete() # issues a DELETE
+response, content = my_path.head() # issues a HEAD
+```
+
+These methods are thin wrappers around the client's underlying `request(...)` method, so you can pass any arguments that [`urllib.request.Request`](https://docs.python.org/3/library/urllib.request.html#urllib.request.Request) supports, such as content via `data=` and headers via `headers=`.
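+
+For example, a sketch of posting JSON to a path (assuming the server accepts it):
+
+```python
+import json
+
+payload = json.dumps({"name": "test"}).encode()
+response, content = my_path.post(
+    data=payload,
+    headers={"Content-Type": "application/json"},
+)
+print(response.status, content)
+```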
+
+## Authentication
+
+By default, `HttpClient` builds a simple opener with `urllib.request.build_opener()`, which handles unauthenticated requests. You can instead pass an implementation of `urllib.request.BaseHandler` (e.g., an `HTTPBasicAuthHandler`) to the `HttpClient` or `HttpsClient` constructor to handle authentication:
+
+```python
+import urllib.request
+
+from cloudpathlib import HttpClient
+
+auth_handler = urllib.request.HTTPBasicAuthHandler()
+auth_handler.add_password(
+ realm="Some Realm",
+ uri="http://www.example.com",
+ user="username",
+ passwd="password"
+)
+
+client = HttpClient(auth=auth_handler)
+my_path = client.CloudPath("http://www.example.com/secret/data.txt")
+
+# Now GET requests will include basic auth headers
+content = my_path.read_text()
+```
+
+This can be extended to more sophisticated authentication approaches (e.g., OAuth, custom headers) by providing your own `BaseHandler` implementation. There are examples on the internet of handlers for most common authentication schemes.
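+
+For instance, a minimal sketch of a bearer-token handler (the class and token here are illustrative, not part of cloudpathlib):
+
+```python
+import urllib.request
+
+from cloudpathlib import HttpsClient
+
+class BearerTokenHandler(urllib.request.BaseHandler):
+    """Attach an Authorization header to every outgoing request."""
+
+    def __init__(self, token: str):
+        self.token = token
+
+    def http_request(self, request):
+        request.add_header("Authorization", f"Bearer {self.token}")
+        return request
+
+    https_request = http_request  # same preprocessing for https
+
+client = HttpsClient(auth=BearerTokenHandler("my-secret-token"))
+```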
+
+## Directory Assumptions
+
+Directories are handled differently from other `CloudPath` implementations:
+
+ - By default, a URL is considered a directory if it **ends with a slash**. For example, `http://example.com/somedir/`.
+ - If you call `HttpPath.is_dir()`, it checks `my_url.endswith("/")` by default. You can override this by passing a `custom_dir_matcher` callable to `HttpClient`, which lets you implement your own logic for determining whether a URL is a directory. The matcher receives the URL as a string, so if you need to consult the server, make those requests within your `custom_dir_matcher` implementation; see the sketch below.
+
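+For example, a sketch of a `custom_dir_matcher` that treats URLs whose last segment has no file extension as directories (an assumption that only fits some servers):
+
+```python
+from cloudpathlib import HttpClient
+
+def no_extension_is_dir(url: str) -> bool:
+    # a URL is a "directory" if its final path segment has no "." in it
+    return "." not in url.rstrip("/").rsplit("/", 1)[-1]
+
+client = HttpClient(custom_dir_matcher=no_extension_is_dir)
+```
+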
+### Listing the Contents of a Directory
+
+We attempt to parse directory listings by calling `GET` on the directory URL (which presumably returns an HTML page with a directory index). Our default parser looks for `<a>` tags and yields their `href` values, assuming they are children. You can override this logic with `custom_list_page_parser` if your server's HTML or API returns a different listing format. For example:
+
+```python
+from typing import Iterable
+
+from bs4 import BeautifulSoup
+
+from cloudpathlib import HttpClient
+
+def my_parser(html_content: str) -> Iterable[str]:
+    # for example, yield only <a> tags with an href and class "file-link",
+    # using BeautifulSoup
+    soup = BeautifulSoup(html_content, "html.parser")
+    for link in soup.find_all("a", class_="file-link"):
+        yield link.get("href")
+
+client = HttpClient(custom_list_page_parser=my_parser)
+my_dir = client.CloudPath("http://example.com/public/")
+
+for subpath in my_dir.iterdir():
+    print(subpath, "dir" if subpath.is_dir() else "file")
+```
+
+**Note**: If your server doesn't provide an HTML index or a suitable listing format that we can parse, you will see:
+
+```
+NotImplementedError("Unable to parse response as a listing of files; please provide a custom parser as `custom_list_page_parser`.")
+```
+
+In that case, you must provide a custom parser or avoid directory-listing operations altogether.
+
+## HTTP or HTTPS
+
+There are separate classes: `HttpClient`/`HttpPath` for `http://` URLs, and `HttpsClient`/`HttpsPath` for `https://` URLs. From a usage standpoint, however, you can use either `CloudPath` or `AnyPath` to dispatch to the right subclass.
+
+```python
+from cloudpathlib import AnyPath, CloudPath
+
+# AnyPath will automatically detect "http://" or "https://" (or local file paths)
+my_path = AnyPath("https://www.example.com/files/info.txt")
+
+# CloudPath will dispatch to the correct subclass
+my_path = CloudPath("https://www.example.com/files/info.txt")
+```
+
+If you explicitly instantiate an `HttpClient`, it will only handle `http://` paths; an `HttpsClient` will only handle `https://` paths. `AnyPath` and `CloudPath` will route to the correct client class automatically.
+
+In general, you should use `HttpsClient` and work with `https://` URLs wherever possible.
+
+## Additional Notes
+
+ - **Caching**: This implementation uses the same local file caching mechanics as other cloudpathlib providers, controlled by `file_cache_mode` and `local_cache_dir`. However, for static HTTP servers, re-downloading or re-checking may not be as efficient as with typical cloud storage providers, which return richer metadata.
+ - **"Move" or "Rename"**: The `_move_file` operation is implemented as an upload followed by a delete. This will fail if your server does not allow both `PUT` and `DELETE`.
+
diff --git a/docs/make_support_table.py b/docs/make_support_table.py
index ad06142a..eb3a34f2 100644
--- a/docs/make_support_table.py
+++ b/docs/make_support_table.py
@@ -12,6 +12,7 @@ def print_table():
lib_methods = {
v.path_class.__name__: {m for m in dir(v.path_class) if not m.startswith("_")}
for k, v in cloudpathlib.cloudpath.implementation_registry.items()
+ if k not in ["http"] # just list https in table since they are the same
}
all_methods = copy(path_base)
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 5d710441..29743fb4 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -18,6 +18,7 @@ nav:
- Home: "index.md"
- Why cloudpathlib?: "why_cloudpathlib.ipynb"
- Authentication: "authentication.md"
+ - HTTP URLs: "http.md"
- Caching: "caching.ipynb"
- AnyPath: "anypath-polymorphism.md"
- Other Client settings: "other_client_settings.md"
@@ -46,7 +47,11 @@ nav:
markdown_extensions:
- admonition
- pymdownx.highlight
- - pymdownx.superfences
+ - pymdownx.superfences:
+ custom_fences:
+ - name: mermaid
+ class: mermaid
+ format: !!python/name:pymdownx.superfences.fence_code_format
- toc:
permalink: True
toc_depth: 3
diff --git a/tests/conftest.py b/tests/conftest.py
index 301ffe87..9fc6e5ea 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1,8 +1,13 @@
+from functools import wraps
import os
from pathlib import Path, PurePosixPath
import shutil
+import ssl
+import time
from tempfile import TemporaryDirectory
from typing import Dict, Optional
+from urllib.parse import urlparse
+from urllib.request import HTTPSHandler
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import (
@@ -18,6 +23,8 @@
from cloudpathlib import AzureBlobClient, AzureBlobPath, GSClient, GSPath, S3Client, S3Path
from cloudpathlib.cloudpath import implementation_registry
+from cloudpathlib.http.httpclient import HttpClient, HttpsClient
+from cloudpathlib.http.httppath import HttpPath, HttpsPath
from cloudpathlib.local import (
local_azure_blob_implementation,
LocalAzureBlobClient,
@@ -32,6 +39,7 @@
import cloudpathlib.azure.azblobclient
from cloudpathlib.azure.azblobclient import _hns_rmtree
import cloudpathlib.s3.s3client
+from .http_fixtures import http_server, https_server, utilities_dir # noqa: F401
from .mock_clients.mock_azureblob import MockBlobServiceClient, DEFAULT_CONTAINER_NAME
from .mock_clients.mock_adls_gen2 import MockedDataLakeServiceClient
from .mock_clients.mock_gs import (
@@ -40,6 +48,7 @@
MockTransferManager,
)
from .mock_clients.mock_s3 import mocked_session_class_factory, DEFAULT_S3_BUCKET_NAME
+from .utils import _sync_filesystem
if os.getenv("USE_LIVE_CLOUD") == "1":
@@ -115,6 +124,28 @@ def create_test_dir_name(request) -> str:
return test_dir
+@fixture
+def wait_for_mkdir(monkeypatch):
+ """Fixture that patches os.mkdir to wait for directory creation for tests that sometimes are flaky."""
+ original_mkdir = os.mkdir
+
+ @wraps(original_mkdir)
+ def wrapped_mkdir(path, *args, **kwargs):
+ result = original_mkdir(path, *args, **kwargs)
+ _sync_filesystem()
+
+ start = time.time()
+
+ while not os.path.exists(path) and time.time() - start < 5:
+ time.sleep(0.01)
+ _sync_filesystem()
+
+ assert os.path.exists(path), f"Directory {path} was not created"
+ return result
+
+ monkeypatch.setattr(os, "mkdir", wrapped_mkdir)
+
+
def _azure_fixture(conn_str_env_var, adls_gen2, request, monkeypatch, assets_dir):
drive = os.getenv("LIVE_AZURE_CONTAINER", DEFAULT_CONTAINER_NAME)
test_dir = create_test_dir_name(request)
@@ -469,6 +500,82 @@ def local_s3_rig(request, monkeypatch, assets_dir):
rig.client_class.reset_default_storage_dir() # reset local storage directory
+class HttpProviderTestRig(CloudProviderTestRig):
+ def create_cloud_path(self, path: str, client=None):
+ """Http version needs to include netloc as well"""
+ if client:
+ return client.CloudPath(
+ cloud_path=f"{self.path_class.cloud_prefix}{self.drive}/{self.test_dir}/{path}"
+ )
+ else:
+ return self.path_class(
+ cloud_path=f"{self.path_class.cloud_prefix}{self.drive}/{self.test_dir}/{path}"
+ )
+
+
+@fixture()
+def http_rig(request, assets_dir, http_server): # noqa: F811
+ test_dir = create_test_dir_name(request)
+
+ host, server_dir = http_server
+ drive = urlparse(host).netloc
+
+ # copy test assets
+ shutil.copytree(assets_dir, server_dir / test_dir)
+ _sync_filesystem()
+
+    rig = HttpProviderTestRig(
+ path_class=HttpPath,
+ client_class=HttpClient,
+ drive=drive,
+ test_dir=test_dir,
+ )
+
+ rig.http_server_dir = server_dir
+ rig.client_class(**rig.required_client_kwargs).set_as_default_client() # set default client
+
+ yield rig
+
+ rig.client_class._default_client = None # reset default client
+ shutil.rmtree(server_dir)
+ _sync_filesystem()
+
+
+@fixture()
+def https_rig(request, assets_dir, https_server): # noqa: F811
+ test_dir = create_test_dir_name(request)
+
+ host, server_dir = https_server
+ drive = urlparse(host).netloc
+
+ # copy test assets
+ shutil.copytree(assets_dir, server_dir / test_dir)
+ _sync_filesystem()
+
+ skip_verify_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
+ skip_verify_ctx.check_hostname = False
+ skip_verify_ctx.load_verify_locations(utilities_dir / "insecure-test.pem")
+
+    rig = HttpProviderTestRig(
+ path_class=HttpsPath,
+ client_class=HttpsClient,
+ drive=drive,
+ test_dir=test_dir,
+ required_client_kwargs=dict(
+ auth=HTTPSHandler(context=skip_verify_ctx, check_hostname=False)
+ ),
+ )
+
+ rig.http_server_dir = server_dir
+ rig.client_class(**rig.required_client_kwargs).set_as_default_client() # set default client
+
+ yield rig
+
+ rig.client_class._default_client = None # reset default client
+ shutil.rmtree(server_dir)
+ _sync_filesystem()
+
+
# create azure fixtures for both blob and gen2 storage
azure_rigs = fixture_union(
"azure_rigs",
@@ -478,6 +585,7 @@ def local_s3_rig(request, monkeypatch, assets_dir):
],
)
+
rig = fixture_union(
"rig",
[
@@ -489,6 +597,8 @@ def local_s3_rig(request, monkeypatch, assets_dir):
local_azure_rig,
local_s3_rig,
local_gs_rig,
+ http_rig,
+ https_rig,
],
)
@@ -500,3 +610,12 @@ def local_s3_rig(request, monkeypatch, assets_dir):
custom_s3_rig,
],
)
+
+# run some http-specific tests on http and https
+http_like_rig = fixture_union(
+ "http_like_rig",
+ [
+ http_rig,
+ https_rig,
+ ],
+)
diff --git a/tests/http_fixtures.py b/tests/http_fixtures.py
new file mode 100644
index 00000000..d43ce236
--- /dev/null
+++ b/tests/http_fixtures.py
@@ -0,0 +1,214 @@
+from datetime import datetime
+from functools import partial
+from http.server import HTTPServer, SimpleHTTPRequestHandler
+import os
+from pathlib import Path
+import shutil
+import ssl
+import threading
+import time
+from urllib.request import urlopen
+import socket
+
+from pytest import fixture
+from tenacity import retry, stop_after_attempt, wait_fixed
+
+from .utils import _sync_filesystem
+
+utilities_dir = Path(__file__).parent / "utilities"
+
+
+class TestHTTPRequestHandler(SimpleHTTPRequestHandler):
+ """Also allows PUT and DELETE requests for testing."""
+
+ @retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1))
+ def do_PUT(self):
+ length = int(self.headers["Content-Length"])
+ path = Path(self.translate_path(self.path))
+
+ if path.is_dir():
+ path.mkdir(parents=True, exist_ok=True)
+ else:
+ path.parent.mkdir(parents=True, exist_ok=True)
+
+ _sync_filesystem()
+
+ with path.open("wb") as f:
+ f.write(self.rfile.read(length))
+
+ # Ensure the file is flushed and synced to disk before returning
+ # The perf hit is ok here since this is a test server
+ f.flush()
+ os.fsync(f.fileno())
+
+ now = datetime.now().timestamp()
+ os.utime(path, (now, now))
+
+ self.send_response(201)
+ self.end_headers()
+
+ @retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1))
+ def do_DELETE(self):
+ path = Path(self.translate_path(self.path))
+
+ try:
+ if path.is_dir():
+ shutil.rmtree(path)
+ else:
+ path.unlink()
+ self.send_response(204)
+ except FileNotFoundError:
+ self.send_response(404)
+
+ self.end_headers()
+
+ @retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1))
+ def do_POST(self):
+ # roundtrip any posted JSON data for testing
+ content_length = int(self.headers["Content-Length"])
+ post_data = self.rfile.read(content_length)
+ self.send_response(200)
+ self.send_header("Content-type", "application/json")
+ self.send_header("Content-Length", self.headers["Content-Length"])
+ self.end_headers()
+ self.wfile.write(post_data)
+
+ @retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1))
+ def do_GET(self):
+ super().do_GET()
+
+ @retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1))
+ def do_HEAD(self):
+ super().do_HEAD()
+
+
+def _http_server(
+ root_dir,
+ port=None,
+ hostname="127.0.0.1",
+ use_ssl=False,
+ certfile=None,
+ keyfile=None,
+ threaded=True,
+):
+ root_dir.mkdir(exist_ok=True)
+
+ scheme = "http" if not use_ssl else "https"
+
+ # Find a free port if not specified
+ if port is None:
+ with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+ s.bind((hostname, 0))
+ port = s.getsockname()[1]
+
+ def start_server(server_ready_event):
+ handler = partial(TestHTTPRequestHandler, directory=str(root_dir))
+ httpd = HTTPServer((hostname, port), handler)
+
+ if use_ssl:
+ if not certfile or not keyfile:
+ raise ValueError("certfile and keyfile must be provided if `ssl=True`")
+
+ context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
+ context.load_cert_chain(certfile=certfile, keyfile=keyfile)
+ context.check_hostname = False
+ httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
+
+ server_ready_event.set()
+ httpd.serve_forever()
+
+ server_ready_event = threading.Event()
+ if threaded:
+ server_thread = threading.Thread(
+ target=start_server, args=(server_ready_event,), daemon=True
+ )
+ server_thread.start()
+ server_ready_event.wait()
+ else:
+ start_server(server_ready_event)
+
+ # Wait for server to be ready to accept connections
+ max_attempts = 100
+ wait_time = 0.2
+
+ for attempt in range(max_attempts):
+ try:
+ if use_ssl:
+ req_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
+ req_context.check_hostname = False
+ req_context.verify_mode = ssl.CERT_NONE
+ else:
+ req_context = None
+
+ with urlopen(
+ f"{scheme}://{hostname}:{port}", context=req_context, timeout=1.0
+ ) as response:
+ if response.status == 200:
+ break
+ except Exception:
+ if attempt == max_attempts - 1:
+ raise RuntimeError(f"Server failed to start after {max_attempts} attempts")
+ time.sleep(wait_time)
+
+ return f"{scheme}://{hostname}:{port}", server_thread
+
+
+@fixture(scope="module")
+def http_server(tmp_path_factory, worker_id):
+ # port is now None, so OS will pick a free port
+ port = None
+ server_dir = tmp_path_factory.mktemp("server_files").resolve()
+ host, server_thread = _http_server(server_dir, port)
+ yield host, server_dir
+ server_thread.join(0)
+ if server_dir.exists():
+ shutil.rmtree(server_dir)
+
+
+@fixture(scope="module")
+def https_server(tmp_path_factory, worker_id):
+ port = None
+ server_dir = tmp_path_factory.mktemp("server_files").resolve()
+
+ # # Self‑signed cert for 127.0.0.1 (≈273 years validity)
+ # openssl req -x509 -out 127.0.0.1.crt -keyout 127.0.0.1.key \
+ # -newkey rsa:2048 -nodes -sha256 -days 99999 \
+ # -subj '/CN=127.0.0.1' \
+ # -extensions EXT -config <( \
+ # printf "[dn]\nCN=127.0.0.1\n\
+ # [req]\ndistinguished_name = dn\n\
+ # [EXT]\nsubjectAltName=IP:127.0.0.1\n\
+ # keyUsage=digitalSignature\nextendedKeyUsage=serverAuth" )
+ # # Convert to PEM (optional)
+ # openssl x509 -in 127.0.0.1.crt -out 127.0.0.1.pem -outform PEM
+
+ host, server_thread = _http_server(
+ server_dir,
+ port,
+ use_ssl=True,
+ certfile=utilities_dir / "insecure-test.pem",
+ keyfile=utilities_dir / "insecure-test.key",
+ )
+
+ # Add this self-signed cert at the library level so it is used in tests
+ _original_create_context = ssl._create_default_https_context
+
+ def _create_context_with_self_signed_cert(*args, **kwargs):
+ context = _original_create_context(*args, **kwargs)
+ context.load_cert_chain(
+ certfile=utilities_dir / "insecure-test.pem",
+ keyfile=utilities_dir / "insecure-test.key",
+ )
+ context.load_verify_locations(cafile=utilities_dir / "insecure-test.pem")
+ return context
+
+ ssl._create_default_https_context = _create_context_with_self_signed_cert
+
+ yield host, server_dir
+
+ ssl._create_default_https_context = _original_create_context
+
+ server_thread.join(0)
+
+ if server_dir.exists():
+ shutil.rmtree(server_dir)
diff --git a/tests/test_caching.py b/tests/test_caching.py
index aefe912e..4fce4f6f 100644
--- a/tests/test_caching.py
+++ b/tests/test_caching.py
@@ -19,6 +19,7 @@
OverwriteNewerLocalError,
)
from tests.conftest import CloudProviderTestRig
+from tests.utils import _sync_filesystem
def test_defaults_work_as_expected(rig: CloudProviderTestRig):
@@ -189,7 +190,7 @@ def test_persistent_mode(rig: CloudProviderTestRig, tmpdir):
assert client_cache_dir.exists()
-def test_loc_dir(rig: CloudProviderTestRig, tmpdir):
+def test_loc_dir(rig: CloudProviderTestRig, tmpdir, wait_for_mkdir):
"""Tests that local cache dir is used when specified and works'
with the different cache modes.
@@ -250,6 +251,7 @@ def test_loc_dir(rig: CloudProviderTestRig, tmpdir):
assert cp.client.file_cache_mode == FileCacheMode.tmp_dir
# download from cloud into the cache
+ _sync_filesystem()
with cp.open("r") as f:
_ = f.read()
diff --git a/tests/test_client.py b/tests/test_client.py
index fd58535b..3eceafc8 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -9,6 +9,7 @@
from cloudpathlib import CloudPath
from cloudpathlib.client import register_client_class
from cloudpathlib.cloudpath import implementation_registry, register_path_class
+from cloudpathlib.http.httpclient import HttpClient, HttpsClient
from cloudpathlib.s3.s3client import S3Client
from cloudpathlib.s3.s3path import S3Path
@@ -96,6 +97,10 @@ def _test_write_content_type(suffix, expected, rig_ref, check=True):
for suffix, content_type in mimes:
_test_write_content_type(suffix, content_type, rig, check=False)
+ if rig.client_class in [HttpClient, HttpsClient]:
+ # HTTP client doesn't support custom content types
+ return
+
# custom mime type method
def my_content_type(path):
# do lookup for content types I define; fallback to
diff --git a/tests/test_cloudpath_file_io.py b/tests/test_cloudpath_file_io.py
index 16c835f9..a7f6f0e9 100644
--- a/tests/test_cloudpath_file_io.py
+++ b/tests/test_cloudpath_file_io.py
@@ -14,17 +14,25 @@
CloudPathNotImplementedError,
DirectoryNotEmptyError,
)
+from cloudpathlib.http.httpclient import HttpClient, HttpsClient
+from cloudpathlib.http.httppath import HttpPath, HttpsPath
def test_file_discovery(rig):
p = rig.create_cloud_path("dir_0/file0_0.txt")
assert p.exists()
- p2 = rig.create_cloud_path("dir_0/not_a_file")
+ p2 = rig.create_cloud_path("dir_0/not_a_file_yet.file")
assert not p2.exists()
p2.touch()
assert p2.exists()
- p2.touch(exist_ok=True)
+
+ if rig.client_class not in [HttpClient, HttpsClient]: # not supported to touch existing
+ p2.touch(exist_ok=True)
+ else:
+ with pytest.raises(NotImplementedError):
+ p2.touch(exist_ok=True)
+
with pytest.raises(FileExistsError):
p2.touch(exist_ok=False)
p2.unlink(missing_ok=False)
@@ -83,19 +91,19 @@ def glob_test_dirs(rig, tmp_path):
def _make_glob_directory(root):
(root / "dirB").mkdir()
- (root / "dirB" / "fileB").write_text("fileB")
+ (root / "dirB" / "fileB.txt").write_text("fileB")
(root / "dirC").mkdir()
(root / "dirC" / "dirD").mkdir()
- (root / "dirC" / "dirD" / "fileD").write_text("fileD")
- (root / "dirC" / "fileC").write_text("fileC")
- (root / "fileA").write_text("fileA")
+ (root / "dirC" / "dirD" / "fileD.txt").write_text("fileD")
+ (root / "dirC" / "fileC.txt").write_text("fileC")
+ (root / "fileA.txt").write_text("fileA")
- cloud_root = rig.create_cloud_path("glob-tests")
+ cloud_root = rig.create_cloud_path("glob-tests/")
cloud_root.mkdir()
_make_glob_directory(cloud_root)
- local_root = tmp_path / "glob-tests"
+ local_root = tmp_path / "glob-tests/"
local_root.mkdir()
_make_glob_directory(local_root)
@@ -108,7 +116,7 @@ def _make_glob_directory(root):
def _lstrip_path_root(path, root):
rel_path = str(path)[len(str(root)) :]
- return rel_path.rstrip("/") # agnostic to trailing slash
+ return rel_path.strip("/")
def _assert_glob_results_match(cloud_results, local_results, cloud_root, local_root):
@@ -181,6 +189,9 @@ def test_walk(glob_test_dirs):
def test_list_buckets(rig):
+ if rig.path_class in [HttpPath, HttpsPath]:
+ return # no bucket listing for HTTP
+
# test we can list buckets
buckets = list(rig.path_class(f"{rig.path_class.cloud_prefix}").iterdir())
assert len(buckets) > 0
@@ -331,6 +342,10 @@ def test_is_dir_is_file(rig, tmp_path):
dir_nested_no_slash = rig.create_cloud_path("dir_1/dir_1_0")
for test_case in [dir_slash, dir_no_slash, dir_nested_slash, dir_nested_no_slash]:
+ # skip no-slash cases, which are interpreted as files for http paths
+ if not str(test_case).endswith("/") and rig.path_class in [HttpPath, HttpsPath]:
+ continue
+
assert test_case.is_dir()
assert not test_case.is_file()
@@ -349,7 +364,7 @@ def test_is_dir_is_file(rig, tmp_path):
def test_file_read_writes(rig, tmp_path):
p = rig.create_cloud_path("dir_0/file0_0.txt")
- p2 = rig.create_cloud_path("dir_0/not_a_file")
+ p2 = rig.create_cloud_path("dir_0/not_a_file.txt")
p3 = rig.create_cloud_path("")
text = "lalala" * 10_000
@@ -367,16 +382,20 @@ def test_file_read_writes(rig, tmp_path):
before_touch = datetime.now()
sleep(1)
- p.touch()
- if not getattr(rig, "is_custom_s3", False):
- # Our S3Path.touch implementation does not update mod time for MinIO
- assert datetime.fromtimestamp(p.stat().st_mtime) > before_touch
+
+ if rig.path_class not in [HttpPath, HttpsPath]: # not supported to touch existing
+ p.touch()
+
+ if not getattr(rig, "is_custom_s3", False):
+ # Our S3Path.touch implementation does not update mod time for MinIO
+ assert datetime.fromtimestamp(p.stat().st_mtime) > before_touch
# no-op
if not getattr(rig, "is_adls_gen2", False):
p.mkdir()
- assert p.etag is not None
+ if rig.path_class not in [HttpPath, HttpsPath]: # not supported to touch existing
+ assert p.etag is not None
dest = rig.create_cloud_path("dir2/new_file0_0.txt")
assert not dest.exists()
@@ -414,6 +433,25 @@ def test_file_read_writes(rig, tmp_path):
(p / "not_exists_file").download_to(dl_file)
+def test_filenames(rig):
+ # test that we can handle filenames with special characters
+ p = rig.create_cloud_path("dir_0/new_file.txt") # real extension
+ p.write_text("hello")
+ assert p.read_text() == "hello"
+
+ p2 = rig.create_cloud_path("dir_0/new_file") # no extension
+ p2.write_text("hello")
+ assert p2.read_text() == "hello"
+
+ p3 = rig.create_cloud_path("dir_0/new_file.textfile") # long extension
+ p3.write_text("hello")
+ assert p3.read_text() == "hello"
+
+ p4 = rig.create_cloud_path("dir_0/new_file.abc.def.txt") # multiple suffixes
+ p4.write_text("hello")
+ assert p4.read_text() == "hello"
+
+
def test_dispatch_to_local_cache(rig):
p = rig.create_cloud_path("dir_0/file0_1.txt")
stat = p._dispatch_to_local_cache_path("stat")
@@ -457,7 +495,7 @@ def test_cloud_path_download_to(rig, tmp_path):
def test_fspath(rig):
- p = rig.create_cloud_path("dir_0")
+ p = rig.create_cloud_path("dir_0/")
assert os.fspath(p) == p.fspath
diff --git a/tests/test_cloudpath_instantiation.py b/tests/test_cloudpath_instantiation.py
index 4be6085c..4f7cdf5d 100644
--- a/tests/test_cloudpath_instantiation.py
+++ b/tests/test_cloudpath_instantiation.py
@@ -7,6 +7,7 @@
from cloudpathlib import AzureBlobPath, CloudPath, GSPath, S3Path
from cloudpathlib.exceptions import InvalidPrefixError, MissingDependenciesError
+from cloudpathlib.http.httppath import HttpPath, HttpsPath
@pytest.mark.parametrize(
@@ -45,6 +46,9 @@ def test_dispatch_error():
@pytest.mark.parametrize("path", ["b/k", "b/k", "b/k.file", "b/k", "b"])
def test_instantiation(rig, path):
+ if rig.path_class in [HttpPath, HttpsPath]:
+ path = "example-url.com/" + path
+
# check two cases of prefix
for prefix in [rig.cloud_prefix.lower(), rig.cloud_prefix.upper()]:
expected = prefix + path
@@ -52,13 +56,17 @@ def test_instantiation(rig, path):
assert repr(p) == f"{rig.path_class.__name__}('{expected}')"
assert str(p) == expected
- assert p._no_prefix == expected.split("://", 1)[-1]
+ if rig.path_class in [HttpPath, HttpsPath]:
+ assert p._no_prefix == path.replace("example-url.com/", "")
+ assert str(p._path) == path.replace("example-url.com", "")
+
+ else:
+ assert p._no_prefix == expected.split("://", 1)[-1]
+ assert str(p._path) == expected.split(":/", 1)[-1]
assert p._url.scheme == expected.split("://", 1)[0].lower()
assert p._url.netloc == expected.split("://", 1)[-1].split("/")[0]
- assert str(p._path) == expected.split(":/", 1)[-1]
-
def test_default_client_lazy(rig):
cp = rig.path_class(rig.cloud_prefix + "testing/file.txt")
@@ -106,7 +114,7 @@ def test_dependencies_not_loaded(rig, monkeypatch):
def test_is_pathlike(rig):
- p = rig.create_cloud_path("dir_0")
+ p = rig.create_cloud_path("dir_0/")
assert isinstance(p, os.PathLike)
diff --git a/tests/test_cloudpath_manipulation.py b/tests/test_cloudpath_manipulation.py
index 9e314299..9e392881 100644
--- a/tests/test_cloudpath_manipulation.py
+++ b/tests/test_cloudpath_manipulation.py
@@ -5,6 +5,7 @@
import pytest
from cloudpathlib import CloudPath
+from cloudpathlib.http.httppath import HttpPath, HttpsPath
def test_properties(rig):
@@ -84,16 +85,27 @@ def test_joins(rig):
if sys.version_info >= (3, 12):
assert rig.create_cloud_path("a/b/c/d").match("A/*/C/D", case_sensitive=False)
- assert rig.create_cloud_path("a/b/c/d").anchor == rig.cloud_prefix
+ if rig.path_class not in [HttpPath, HttpsPath]:
+ assert rig.create_cloud_path("a/b/c/d").anchor == rig.cloud_prefix
+
assert rig.create_cloud_path("a/b/c/d").parent == rig.create_cloud_path("a/b/c")
- assert rig.create_cloud_path("a/b/c/d").parents == (
- rig.create_cloud_path("a/b/c"),
- rig.create_cloud_path("a/b"),
- rig.create_cloud_path("a"),
- rig.path_class(f"{rig.cloud_prefix}{rig.drive}/{rig.test_dir}"),
- rig.path_class(f"{rig.cloud_prefix}{rig.drive}"),
- )
+ if rig.path_class not in [HttpPath, HttpsPath]:
+ assert rig.create_cloud_path("a/b/c/d").parents == (
+ rig.create_cloud_path("a/b/c"),
+ rig.create_cloud_path("a/b"),
+ rig.create_cloud_path("a"),
+ rig.path_class(f"{rig.cloud_prefix}{rig.drive}/{rig.test_dir}"),
+ rig.path_class(f"{rig.cloud_prefix}{rig.drive}"),
+ )
+ else:
+ assert rig.create_cloud_path("a/b/c/d").parents == (
+ rig.create_cloud_path("a/b/c"),
+ rig.create_cloud_path("a/b"),
+ rig.create_cloud_path("a"),
+ rig.path_class(f"{rig.cloud_prefix}{rig.drive}/{rig.test_dir}"),
+ rig.path_class(f"{rig.cloud_prefix}{rig.drive}/"),
+ )
assert rig.create_cloud_path("a").joinpath("b", "c") == rig.create_cloud_path("a/b/c")
assert rig.create_cloud_path("a").joinpath(PurePosixPath("b"), "c") == rig.create_cloud_path(
@@ -107,21 +119,32 @@ def test_joins(rig):
== f"{rig.cloud_prefix}{rig.drive}/{rig.test_dir}/a/b/c"
)
- assert rig.create_cloud_path("a/b/c/d").parts == (
- rig.cloud_prefix,
- rig.drive,
- rig.test_dir,
- "a",
- "b",
- "c",
- "d",
- )
+ if rig.path_class in [HttpPath, HttpsPath]:
+ assert rig.create_cloud_path("a/b/c/d").parts == (
+ rig.cloud_prefix + rig.drive + "/",
+ rig.test_dir,
+ "a",
+ "b",
+ "c",
+ "d",
+ )
+ else:
+ assert rig.create_cloud_path("a/b/c/d").parts == (
+ rig.cloud_prefix,
+ rig.drive,
+ rig.test_dir,
+ "a",
+ "b",
+ "c",
+ "d",
+ )
def test_with_segments(rig):
- assert rig.create_cloud_path("a/b/c/d").with_segments("x", "y", "z") == rig.client_class(
- **rig.required_client_kwargs
- ).CloudPath(f"{rig.cloud_prefix}x/y/z")
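+    # build the expected url from the result's own anchor so the check works for every rig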
+ to_test = rig.create_cloud_path("a/b/c/d").with_segments("x", "y", "z")
+ assert to_test == rig.client_class(**rig.required_client_kwargs).CloudPath(
+ f"{to_test.anchor}x/y/z"
+ )
def test_is_junction(rig):
diff --git a/tests/test_cloudpath_upload_copy.py b/tests/test_cloudpath_upload_copy.py
index acf5e5ec..110537b8 100644
--- a/tests/test_cloudpath_upload_copy.py
+++ b/tests/test_cloudpath_upload_copy.py
@@ -4,12 +4,14 @@
import pytest
+from cloudpathlib.http.httppath import HttpPath, HttpsPath
from cloudpathlib.local import LocalGSPath, LocalS3Path, LocalS3Client
from cloudpathlib.exceptions import (
CloudPathFileExistsError,
CloudPathNotADirectoryError,
OverwriteNewerCloudError,
)
+from tests.utils import _sync_filesystem
@pytest.fixture
@@ -64,19 +66,21 @@ def test_upload_from_file(rig, upload_assets_dir):
assert p.read_text() == "Hello from 2"
# to file, file exists and is newer
- p.touch()
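+    # sleep, then write real content so the stored mtime is strictly newer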
+ sleep(1.1)
+ p.write_text("newer")
with pytest.raises(OverwriteNewerCloudError):
p.upload_from(upload_assets_dir / "upload_1.txt")
# to file, file exists and is newer; overwrite
- p.touch()
+ sleep(1.1)
+ p.write_text("even newer")
sleep(1.1)
p.upload_from(upload_assets_dir / "upload_1.txt", force_overwrite_to_cloud=True)
assert p.exists()
assert p.read_text() == "Hello from 1"
# to dir, dir exists
- p = rig.create_cloud_path("dir_0") # created by fixtures
+ p = rig.create_cloud_path("dir_0/") # created by fixtures
assert p.exists()
p.upload_from(upload_assets_dir / "upload_1.txt")
assert (p / "upload_1.txt").exists()
@@ -92,7 +96,7 @@ def test_upload_from_dir(rig, upload_assets_dir):
assert assert_mirrored(p, upload_assets_dir)
# to dir, dir exists
- p2 = rig.create_cloud_path("dir_0") # created by fixtures
+ p2 = rig.create_cloud_path("dir_0/") # created by fixtures
assert p2.exists()
p2.upload_from(upload_assets_dir)
@@ -100,12 +104,15 @@ def test_upload_from_dir(rig, upload_assets_dir):
# a newer file exists on cloud
sleep(1)
- (p / "upload_1.txt").touch()
+ (p / "upload_1.txt").write_text("newer")
with pytest.raises(OverwriteNewerCloudError):
p.upload_from(upload_assets_dir)
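+    # flush pending writes so the following mtime comparison is stable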
+ _sync_filesystem()
+
# force overwrite
- (p / "upload_1.txt").touch()
+ sleep(1)
+ (p / "upload_1.txt").write_text("even newer")
(p / "upload_2.txt").unlink()
p.upload_from(upload_assets_dir, force_overwrite_to_cloud=True)
assert assert_mirrored(p, upload_assets_dir)
@@ -135,9 +142,11 @@ def test_copy(rig, upload_assets_dir, tmpdir):
# cloud to cloud -> make sure no local cache
p_new = p.copy(p.parent / "new_upload_1.txt")
assert p_new.exists()
- assert not p_new._local.exists() # cache should never have been downloaded
- assert not p._local.exists() # cache should never have been downloaded
- assert p_new.read_text() == "Hello from 1"
+
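+    # for http(s), cloud-to-cloud copy may populate the local cache, so skip the no-download check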
+ if rig.path_class not in [HttpPath, HttpsPath]:
+ assert not p_new._local.exists() # cache should never have been downloaded
+ assert not p._local.exists() # cache should never have been downloaded
+ assert p_new.read_text() == "Hello from 1"
# cloud to cloud path as string
cloud_dest = str(p.parent / "new_upload_0.txt")
@@ -146,14 +155,15 @@ def test_copy(rig, upload_assets_dir, tmpdir):
assert p_new.read_text() == "Hello from 1"
# cloud to cloud directory
- cloud_dest = rig.create_cloud_path("dir_1") # created by fixtures
+ cloud_dest = rig.create_cloud_path("dir_1/") # created by fixtures
p_new = p.copy(cloud_dest)
assert str(p_new) == str(p_new.parent / p.name) # file created
assert p_new.exists()
assert p_new.read_text() == "Hello from 1"
# cloud to cloud overwrite
- p_new.touch()
+ sleep(1.1)
+ p_new.write_text("p_new")
with pytest.raises(OverwriteNewerCloudError):
p_new = p.copy(p_new)
@@ -193,7 +203,7 @@ def test_copy(rig, upload_assets_dir, tmpdir):
(other_dir / p2.name).unlink()
# cloud dir raises
- cloud_dir = rig.create_cloud_path("dir_1") # created by fixtures
+ cloud_dir = rig.create_cloud_path("dir_1/") # created by fixtures
with pytest.raises(ValueError) as e:
p_new = cloud_dir.copy(Path(tmpdir.mkdir("test_copy_dir_fails")))
assert "use the method copytree" in str(e)
@@ -207,12 +217,12 @@ def test_copytree(rig, tmpdir):
p.copytree(local_out)
with pytest.raises(CloudPathFileExistsError):
- p = rig.create_cloud_path("dir_0")
+ p = rig.create_cloud_path("dir_0/")
p_out = rig.create_cloud_path("dir_0/file0_0.txt")
p.copytree(p_out)
# cloud dir to local dir that exists
- p = rig.create_cloud_path("dir_1")
+ p = rig.create_cloud_path("dir_1/")
local_out = Path(tmpdir.mkdir("copytree_from_cloud"))
p.copytree(local_out)
assert assert_mirrored(p, local_out)
@@ -228,12 +238,12 @@ def test_copytree(rig, tmpdir):
assert assert_mirrored(p, local_out)
# cloud dir to cloud dir that does not exist
- p2 = rig.create_cloud_path("new_dir")
+ p2 = rig.create_cloud_path("new_dir/")
p.copytree(p2)
assert assert_mirrored(p2, p)
# cloud dir to cloud dir that exists
- p2 = rig.create_cloud_path("new_dir2")
+ p2 = rig.create_cloud_path("new_dir2/")
(p2 / "existing_file.txt").write_text("asdf") # ensures p2 exists
p.copytree(p2)
assert assert_mirrored(p2, p, check_no_extra=False)
@@ -251,7 +261,7 @@ def test_copytree(rig, tmpdir):
(p / "dir2" / "file2.txt").write_text("ignore")
# cloud dir to local dir but ignoring files (shutil.ignore_patterns)
- p3 = rig.create_cloud_path("new_dir3")
+ p3 = rig.create_cloud_path("new_dir3/")
p.copytree(p3, ignore=ignore_patterns("*.py", "dir*"))
assert assert_mirrored(p, p3, check_no_extra=False)
assert not (p3 / "ignored.py").exists()
@@ -259,7 +269,7 @@ def test_copytree(rig, tmpdir):
assert not (p3 / "dir2").exists()
# cloud dir to local dir but ignoring files (custom function)
- p4 = rig.create_cloud_path("new_dir4")
+ p4 = rig.create_cloud_path("new_dir4/")
def _custom_ignore(path, names):
ignore = []
diff --git a/tests/test_http.py b/tests/test_http.py
new file mode 100644
index 00000000..4dbf30a2
--- /dev/null
+++ b/tests/test_http.py
@@ -0,0 +1,128 @@
+import json
+import urllib
+
+from tests.conftest import CloudProviderTestRig
+
+
+def test_https(https_rig: CloudProviderTestRig):
+ """Basic tests for https"""
+ existing_file = https_rig.create_cloud_path("dir_0/file0_0.txt")
+
+ # existence and listing
+ assert existing_file.exists()
+ assert existing_file.parent.exists()
+ assert existing_file.name in [f.name for f in existing_file.parent.iterdir()]
+
+ # root level checks
+ root = list(existing_file.parents)[-1]
+ assert root.exists()
+ assert len(list(root.iterdir())) > 0
+
+    # reading and writing
+ existing_file.write_text("Hello from 0")
+ assert existing_file.read_text() == "Hello from 0"
+
+ # creating new files
+ not_existing_file = https_rig.create_cloud_path("dir_0/new_file.txt")
+
+ assert not not_existing_file.exists()
+
+ not_existing_file.upload_from(existing_file)
+
+ assert not_existing_file.read_text() == "Hello from 0"
+
+    # deleting
+ not_existing_file.unlink()
+ assert not not_existing_file.exists()
+
+ # metadata
+ assert existing_file.stat().st_mtime != 0
+
+
+def test_http_verbs(http_like_rig: CloudProviderTestRig):
+ """Test that the http verbs work"""
+ p = http_like_rig.create_cloud_path("dir_0/file0_0.txt")
+
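+    # the verb helpers are http-specific; get/post/head return (response, body bytes)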
+ # test put
+ p.put(data="Hello from 0".encode("utf-8"), headers={"Content-Type": "text/plain"})
+
+ # test get
+ resp, data = p.get()
+ assert resp.status == 200
+ assert data.decode("utf-8") == "Hello from 0"
+
+ # post
+ post_payload = {"key": "value"}
+ resp, data = p.post(
+ data=json.dumps(post_payload).encode(), headers={"Content-Type": "application/json"}
+ )
+ assert resp.status == 200
+ assert json.loads(data.decode("utf-8")) == post_payload
+
+ # head
+ resp, data = p.head()
+ assert resp.status == 200
+ assert data == b""
+
+ # delete
+ p.delete()
+ assert not p.exists()
+
+
+def test_http_parsed_url(http_like_rig: CloudProviderTestRig):
+ """Test that the parsed_url property works"""
+ p = http_like_rig.create_cloud_path("dir_0/file0_0.txt")
+ assert p.parsed_url.scheme == http_like_rig.cloud_prefix.split("://")[0]
+ assert p.parsed_url.netloc == http_like_rig.drive
+ assert p.parsed_url.path == str(p).split(http_like_rig.drive)[1]
+
+
+def test_http_url_decorations(http_like_rig: CloudProviderTestRig):
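+    """Test that scheme and netloc details (user, password, host, port) survive path operations."""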
+ def _test_preserved_properties(base_url, returned_url):
+ parsed_base = urllib.parse.urlparse(str(base_url))
+ parsed_returned = urllib.parse.urlparse(str(returned_url))
+
+ assert parsed_returned.scheme == parsed_base.scheme
+ assert parsed_returned.netloc == parsed_base.netloc
+ assert parsed_returned.username == parsed_base.username
+ assert parsed_returned.password == parsed_base.password
+ assert parsed_returned.hostname == parsed_base.hostname
+ assert parsed_returned.port == parsed_base.port
+
+ p = http_like_rig.create_cloud_path("dir_0/file0_0.txt")
+ p.write_text("Hello!")
+
+ # add some properties to the url
+ new_url = p.parsed_url._replace(
+ params="param=value", query="query=value&query2=value2", fragment="fragment-value"
+ )
+ p = http_like_rig.path_class(urllib.parse.urlunparse(new_url))
+
+ # operations that should preserve properties of the original url and need to hit the server
+ # glob, iterdir, walk
+ _test_preserved_properties(p, next(p.parent.glob("*.txt")))
+ _test_preserved_properties(p, next(p.parent.iterdir()))
+ _test_preserved_properties(p, next(p.parent.walk())[0])
+
+    # rename and replace also hit the server
+ new_location = p.with_name("other_file.txt")
+ _test_preserved_properties(p, p.rename(new_location))
+ _test_preserved_properties(p, new_location.replace(p))
+
+    # operations that should preserve properties of the original url without hitting the server,
+    # so we can add properties (e.g., username, password) for a host that isn't actually served
+ new_url = p.parsed_url._replace(netloc="user:pass@example.com:8000")
+ p = http_like_rig.path_class(urllib.parse.urlunparse(new_url))
+
+ # parent
+ _test_preserved_properties(p, p.parent)
+
+ # joining / and joinpath
+ _test_preserved_properties(p, p.parent / "other_file.txt")
+ _test_preserved_properties(p, p.parent.joinpath("other_file.txt"))
+
+ # with_name, with_suffix, with_stem
+ _test_preserved_properties(p, p.with_name("other_file.txt"))
+ _test_preserved_properties(p, p.with_suffix(".txt"))
+ _test_preserved_properties(p, p.with_stem("other_file"))
diff --git a/tests/test_s3_specific.py b/tests/test_s3_specific.py
index d9edc94e..58b2e21a 100644
--- a/tests/test_s3_specific.py
+++ b/tests/test_s3_specific.py
@@ -176,7 +176,7 @@ def test_directories(s3_like_rig):
assert super_path.exists()
assert not super_path.is_dir()
- super_path = s3_like_rig.create_cloud_path("dir_0")
+ super_path = s3_like_rig.create_cloud_path("dir_0/")
assert super_path.exists()
assert super_path.is_dir()
diff --git a/tests/utilities/insecure-test.crt b/tests/utilities/insecure-test.crt
new file mode 100644
index 00000000..9bb5d9e4
--- /dev/null
+++ b/tests/utilities/insecure-test.crt
@@ -0,0 +1,19 @@
+-----BEGIN CERTIFICATE-----
+MIIDDDCCAfSgAwIBAgIUZn3DPy1MuLcPNQGuGU8JfzvCpEIwDQYJKoZIhvcNAQEL
+BQAwFDESMBAGA1UEAwwJMTI3LjAuMC4xMCAXDTI1MDQyMTAzMTQ1OVoYDzIyOTkw
+MjAzMDMxNDU5WjAUMRIwEAYDVQQDDAkxMjcuMC4wLjEwggEiMA0GCSqGSIb3DQEB
+AQUAA4IBDwAwggEKAoIBAQDlxF2z2I2XaDnLgV3exFPtjs9upFuUPTPthubaxRMz
+PWGfNRg8fLqXDOe8E+KHgdXYeqTd0xkWZzfx+xwz1flvTBubgtan0yvri0bZIemk
+gv7f8ABRAjNIQzpehIjXI9RZyU2JoPIN4+Q8WHZ8uc8uZtHOHsyMYoj2j0akUoic
+ukoYlo6W8nN1ykBvhwnO9sRooPrYV9ViBhG9eaH/L0NzVv6cU3vHj3pKyO3cMQqW
+4AfaSz+aFXx7ulRzxR5bphCy5281FqBgG76Y1lqOSUMTxfJQSnCCUe58DXy4CpfQ
+rGrNiLV/yWz7xYKSeutcJxWsCMFLrI+S79IW6ntILS6pAgMBAAGjVDBSMA8GA1Ud
+EQQIMAaHBH8AAAEwCwYDVR0PBAQDAgeAMBMGA1UdJQQMMAoGCCsGAQUFBwMBMB0G
+A1UdDgQWBBSyGv/zfxIBK9Tm4/5uOuhh6pB3CDANBgkqhkiG9w0BAQsFAAOCAQEA
+UWg3vZNCCUjPAqKAXEYZeBI9VNXim4egkmxn9FHgiraxapKc4RHpCmVdjpF5miFe
+4hbcvHOxb9JclLVKP2oC7vkdYDtgkT8o264gy0eASHE8GP1YawjJlLeFFuJuxatu
+NxZXKnMFQRPoZbD4KSImLy8xEy1FMslnBxcgxgqIKoyqwtt+HGO6ZnvdxDbRLZSQ
+FNDNlqQYgnxf4zzNro9mtWHH/A/UA/vuRWRlppn9vy8k7X5VXlhEIAMmI4nPihhS
+YmgpRntt8A0BLQcNNWcNw0b0IWLhpSWiREunkZDEMWDjoBwRhQpYxEC0zrKlQmwb
+jhnl/rtIL+2Shly8zkxWew==
+-----END CERTIFICATE-----
diff --git a/tests/utilities/insecure-test.key b/tests/utilities/insecure-test.key
new file mode 100644
index 00000000..b4adc213
--- /dev/null
+++ b/tests/utilities/insecure-test.key
@@ -0,0 +1,28 @@
+-----BEGIN PRIVATE KEY-----
+MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDlxF2z2I2XaDnL
+gV3exFPtjs9upFuUPTPthubaxRMzPWGfNRg8fLqXDOe8E+KHgdXYeqTd0xkWZzfx
++xwz1flvTBubgtan0yvri0bZIemkgv7f8ABRAjNIQzpehIjXI9RZyU2JoPIN4+Q8
+WHZ8uc8uZtHOHsyMYoj2j0akUoicukoYlo6W8nN1ykBvhwnO9sRooPrYV9ViBhG9
+eaH/L0NzVv6cU3vHj3pKyO3cMQqW4AfaSz+aFXx7ulRzxR5bphCy5281FqBgG76Y
+1lqOSUMTxfJQSnCCUe58DXy4CpfQrGrNiLV/yWz7xYKSeutcJxWsCMFLrI+S79IW
+6ntILS6pAgMBAAECggEAAY42QdqoPt+lrkC0m4jUB10kS8zYWr2dRAAeODfWtOvQ
+xyBvE7iOQF0sUbjDEylHHH8G3OBSvFcb2gkNH4tQwL1Kan19UivozSB6pG1g1NcK
+QpfSNlPJb6i4uRcfYIHj6CBOLRg8mJwtcNYle1dzsQnYdkaW78Eaa6Ozk9jqdibj
+w0fcsfp1Od5UHVqSsuHpN7N7MP78lD7nZ4h1oAUAHKJw3o5Np24cdgzfwsjmaV/M
+RcTIVoRLoCiPj7ZrGMgCq3PsI14E3C02oYGVHqBsVzCkgzdBwqckuX8eTWs4Ae9/
+adV2cMIBe0EC3WA6cHVh/NS/fgSlRDw6/chz50WcPQKBgQDzo9qw+G9a2Vtkylyd
+cnbY3oQVH+gygdULKAf1IxlRPMvuSEm1DqA3YKkQXO4ypf8llKZznyk5xIDfG28k
+SIRUBQGoOeLMVley/EydXd6GslsHoK5kLmLerbqZHRdo7hdYsIIbHqU14OrLwLwK
+3CJlSzpR1ProYufmDFRGt5SxpQKBgQDxbFfx+5aMNvKh7/NRxzor+2owcaSIE0FQ
+4OV9xTZw+fU3vQl0BUzB+t2cZezOm8vJh2Xwkjp3Uz/3h2kZPe42HjZ733vSFDSS
+rE+aKSG8ptu08bsVqOmQgkfjcIdxugbQoFLY/XWWHCglD3Lq1fUkMOBnne0yjQiW
+5iTL6e8xtQKBgQDfsA528ID8PhclAI3rmE35asKFypea14zMA2La8/Con1L0YLYb
+X2RFs59FAK1JHxKUZFg2S2jEOt++9ychftrPcRFGbG8IADXghLequ6Y0sMfWxvWV
+0OjBXWu2a/k0Q3R33wZ087vnLaskir2akuWZbmoK+6mpdjVHBwbRLnd8aQKBgQC6
+/AYVhp2wlbJQ2C7ljN+yRvSU9r/PINK62KUGR2OGFyLk+8XBlYVAzJMt2geScjph
+KTw8GpWr68+kYL127m98fOQIByy4piud2lWA+hCGM9oBCCS1fvD/mtghAPv2inVS
+yonARHb5P2+cXJ3N4s8OK8jyl++p8m/PqAqh4NsA7QKBgCmFHpm+loiqG0is9v4l
+/iBJUVjBrQlgjlyIYEJqLjNQ2w/vmZT067YSVCON88JWJEjKpE2zAc5C0miTJa7D
+cRn2yIWPFm8emlLHjx+4CVXlfLR6lTiekbZWK2bs9KNZrCQXL/K/3lNEE/3MvqUD
+dIELjg1KulUVY+7r07Pd54Ze
+-----END PRIVATE KEY-----
diff --git a/tests/utilities/insecure-test.pem b/tests/utilities/insecure-test.pem
new file mode 100644
index 00000000..9bb5d9e4
--- /dev/null
+++ b/tests/utilities/insecure-test.pem
@@ -0,0 +1,19 @@
+-----BEGIN CERTIFICATE-----
+MIIDDDCCAfSgAwIBAgIUZn3DPy1MuLcPNQGuGU8JfzvCpEIwDQYJKoZIhvcNAQEL
+BQAwFDESMBAGA1UEAwwJMTI3LjAuMC4xMCAXDTI1MDQyMTAzMTQ1OVoYDzIyOTkw
+MjAzMDMxNDU5WjAUMRIwEAYDVQQDDAkxMjcuMC4wLjEwggEiMA0GCSqGSIb3DQEB
+AQUAA4IBDwAwggEKAoIBAQDlxF2z2I2XaDnLgV3exFPtjs9upFuUPTPthubaxRMz
+PWGfNRg8fLqXDOe8E+KHgdXYeqTd0xkWZzfx+xwz1flvTBubgtan0yvri0bZIemk
+gv7f8ABRAjNIQzpehIjXI9RZyU2JoPIN4+Q8WHZ8uc8uZtHOHsyMYoj2j0akUoic
+ukoYlo6W8nN1ykBvhwnO9sRooPrYV9ViBhG9eaH/L0NzVv6cU3vHj3pKyO3cMQqW
+4AfaSz+aFXx7ulRzxR5bphCy5281FqBgG76Y1lqOSUMTxfJQSnCCUe58DXy4CpfQ
+rGrNiLV/yWz7xYKSeutcJxWsCMFLrI+S79IW6ntILS6pAgMBAAGjVDBSMA8GA1Ud
+EQQIMAaHBH8AAAEwCwYDVR0PBAQDAgeAMBMGA1UdJQQMMAoGCCsGAQUFBwMBMB0G
+A1UdDgQWBBSyGv/zfxIBK9Tm4/5uOuhh6pB3CDANBgkqhkiG9w0BAQsFAAOCAQEA
+UWg3vZNCCUjPAqKAXEYZeBI9VNXim4egkmxn9FHgiraxapKc4RHpCmVdjpF5miFe
+4hbcvHOxb9JclLVKP2oC7vkdYDtgkT8o264gy0eASHE8GP1YawjJlLeFFuJuxatu
+NxZXKnMFQRPoZbD4KSImLy8xEy1FMslnBxcgxgqIKoyqwtt+HGO6ZnvdxDbRLZSQ
+FNDNlqQYgnxf4zzNro9mtWHH/A/UA/vuRWRlppn9vy8k7X5VXlhEIAMmI4nPihhS
+YmgpRntt8A0BLQcNNWcNw0b0IWLhpSWiREunkZDEMWDjoBwRhQpYxEC0zrKlQmwb
+jhnl/rtIL+2Shly8zkxWew==
+-----END CERTIFICATE-----
diff --git a/tests/utils.py b/tests/utils.py
new file mode 100644
index 00000000..34fe8e1f
--- /dev/null
+++ b/tests/utils.py
@@ -0,0 +1,15 @@
+import platform
+import os
+import time
+
+
+def _sync_filesystem():
+ """Try to force sync of the filesystem to stabilize tests.
+
+ On Windows, give the filesystem a moment to catch up since sync is not available.
+ """
+ if platform.system() != "Windows":
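+        # flush kernel filesystem buffers to disk (POSIX only)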
+ os.sync()
+ else:
+ # On Windows, give the filesystem a moment to catch up
+ time.sleep(0.05)