Merged
37 changes: 14 additions & 23 deletions .github/workflows/python-test.yml
@@ -40,29 +40,29 @@ jobs:
with:
python-version: ${{ matrix.python-version }}

- name: Create virtualenv
- name: Install uv
uses: astral-sh/setup-uv@v7

- name: Run lints
run: |
python3 -m venv .venv
uv sync

- uses: PyO3/maturin-action@v1
with:
command: develop
sccache: 'true'
container: 'off'
working-directory: ./python
args: --extras devel

- name: Run lints
run: |
source .venv/bin/activate
mypy
ruff check .
ruff format . --check --diff
uv run ty check
uv run ruff check .
uv run ruff format . --check --diff

- name: Run tests
run: |
source .venv/bin/activate
pytest
uv run pytest

docs:
runs-on: ubuntu-latest
@@ -72,24 +72,15 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: 3.13

- name: Create virtualenv
run: |
python3 -m venv .venv
python-version: 3.14

- uses: PyO3/maturin-action@v1
with:
command: develop
sccache: 'true'
container: 'off'
working-directory: ./python
args: --extras docs
- name: Install uv
uses: astral-sh/setup-uv@v7

- name: Build docs
run: |
source .venv/bin/activate
sphinx-build -M html docs/source/ docs/build/
uv sync --group docs
uv run sphinx-build -M html docs/source/ docs/build/

build:
runs-on: ubuntu-latest
23 changes: 23 additions & 0 deletions python/README.md
@@ -15,6 +15,29 @@ client = Client("hdfs://localhost:9000")
status = client.get_file_info("/file.txt")
```

## CLI
There is a built-in CLI `hdfsn` that implements most of the behavior of `hdfs dfs` but with a more bash-like syntax. The easiest way to use the CLI is with UV.

### Install CLI with UV
```bash
uv tool install hdfs-native
```

### Auto-complete support
The CLI supports auto-complete for HDFS paths using `argcomplete`. There are two ways to enable this support:

To permanently enable support for all Python modules using `argcomplete`:
```bash
uv tool install argcomplete
activate-global-python-argcomplete
```

To enable support just for `hdfsn` in your active shell:
```bash
uv tool install argcomplete
eval "$(register-python-argcomplete hdfsn)"
```

## Kerberos support
Kerberos (SASL GSSAPI) is supported through a runtime dynamic link to `libgssapi_krb5`. This must be installed separately, but is likely already present on your system. If not, you can install it by:

2 changes: 2 additions & 0 deletions python/conftest.py
@@ -29,6 +29,8 @@ def minidfs():
bufsize=0,
)

assert child.stdout is not None

output = child.stdout.readline().strip()
assert output == "Ready!", output
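The added assert narrows `Popen.stdout` from `IO[...] | None` to a non-`None` handle for the type checker; a minimal standalone sketch of the same pattern (a hypothetical stand-in child process, not the actual minidfs helper, and text mode is an assumption):

```python
import subprocess
import sys

# Popen.stdout is typed Optional: it is None unless stdout=PIPE was
# requested, so type checkers require narrowing before it is used.
child = subprocess.Popen(
    [sys.executable, "-c", "print('Ready!')"],  # stand-in for the minidfs process
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered; bufsize=0 is not allowed in text mode
)

assert child.stdout is not None  # narrows IO[str] | None to IO[str]

output = child.stdout.readline().strip()
assert output == "Ready!", output
child.wait()
```

Without the assert, strict type checkers flag `child.stdout.readline()` as a possible attribute access on `None`, even though `stdout=PIPE` guarantees it is set at runtime.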

219 changes: 218 additions & 1 deletion python/docs/source/usage.rst
@@ -1,6 +1,9 @@
Usage
=====

Client API
----------

Simply create a :py:class:`Client <hdfs_native.Client>`.

.. code-block::
@@ -16,4 +19,218 @@ Simply create a :py:class:`Client <hdfs_native.Client>`.
# Connect to a Name Service
client = Client("hdfs://ns")

See :py:class:`Client <hdfs_native.Client>` for supported methods on a client.
See :py:class:`Client <hdfs_native.Client>` for supported methods on a client.

Common Operations
~~~~~~~~~~~~~~~~~

Reading Files
^^^^^^^^^^^^^

.. code-block::

from hdfs_native import Client

client = Client("hdfs://localhost:9000")

# Check if file exists
status = client.get_file_info("/path/to/file.txt")
print(f"File size: {status.length} bytes")

# Read entire file
with client.read("/path/to/file.txt") as f:
content = f.read()

# Read file with offset
with client.read("/path/to/file.txt") as f:
f.seek(100)
chunk = f.read(1024)

Writing Files
^^^^^^^^^^^^^

.. code-block::

from hdfs_native import Client, WriteOptions

client = Client("hdfs://localhost:9000")

# Write new file
with client.create("/path/to/newfile.txt") as f:
f.write(b"Hello, HDFS!")

# Append to existing file
with client.append("/path/to/file.txt") as f:
f.write(b"\nAppended content")

# Write with custom options
write_opts = WriteOptions(overwrite=True, replication=3)
with client.create("/path/to/file.txt", write_options=write_opts) as f:
f.write(b"Data with replication factor of 3")

Directory Operations
^^^^^^^^^^^^^^^^^^^^

.. code-block::

from hdfs_native import Client

client = Client("hdfs://localhost:9000")

# List directory
entries = client.list_status("/path/to/dir")
for entry in entries:
print(f"{entry.path} - Size: {entry.length}")

# Create directory
client.mkdir("/path/to/newdir")

# Create directory recursively
client.mkdirs("/path/to/nested/dir")

# Delete directory
client.delete("/path/to/dir", recursive=True)

# Get directory summary
summary = client.get_content_summary("/path/to/dir")
print(f"Total size: {summary.length} bytes")
print(f"File count: {summary.file_count}")

File Metadata
^^^^^^^^^^^^^

.. code-block::

from hdfs_native import Client

client = Client("hdfs://localhost:9000")

# Get file status
status = client.get_file_info("/path/to/file.txt")
print(f"Owner: {status.owner}")
print(f"Group: {status.group}")
print(f"Permissions: {status.permission}")
print(f"Modification time: {status.modification_time}")

# Set permissions
client.set_permission("/path/to/file.txt", 0o644)

# Get ACL status
acl = client.get_acl_status("/path/to/file.txt")
for entry in acl.entries:
print(entry)

Async Operations
~~~~~~~~~~~~~~~~

For async applications, use :py:class:`AsyncClient <hdfs_native.AsyncClient>`.

.. code-block::

import asyncio
from hdfs_native import AsyncClient

async def main():
client = AsyncClient("hdfs://localhost:9000")

# Read file asynchronously
f = await client.read("/path/to/file.txt")
async with f:
data = await f.read()

# List directory asynchronously
async for entry in client.list_status("/path/to/dir"):
print(entry.path)

# Create file asynchronously
f = await client.create("/path/to/newfile.txt")
async with f:
await f.write(b"Async write")

asyncio.run(main())

CLI Usage
---------

The package includes a built-in CLI tool ``hdfsn`` that implements most of the behavior of ``hdfs dfs`` with a more bash-like syntax.

Installation
~~~~~~~~~~~~

The easiest way to install the CLI is with UV:

.. code-block::

uv tool install hdfs-native

Alternatively, you can install via pip:

.. code-block::

pip install hdfs-native
hdfsn --help

Basic Commands
~~~~~~~~~~~~~~

.. code-block::

# List files
hdfsn ls /path/to/dir

# Create directory
hdfsn mkdir /path/to/newdir

# Upload file
hdfsn put local_file.txt /hdfs/path/

# Download file
hdfsn get /hdfs/path/file.txt local_file.txt

# Remove file or directory
hdfsn rm /path/to/file.txt
hdfsn rm -r /path/to/dir

# Display file contents
hdfsn cat /path/to/file.txt

# Copy within HDFS
hdfsn cp /source/path /dest/path

# Move within HDFS
hdfsn mv /source/path /dest/path

Auto-complete Support
~~~~~~~~~~~~~~~~~~~~~

The CLI supports shell auto-completion for HDFS paths using ``argcomplete``. There are two ways to enable this support:

**Option 1: Global auto-complete for all Python tools**

.. code-block::

uv tool install argcomplete
activate-global-python-argcomplete

This enables auto-complete for all Python command-line tools that support ``argcomplete``.

**Option 2: Shell-specific auto-complete for hdfsn only**

.. code-block::

uv tool install argcomplete
eval "$(register-python-argcomplete hdfsn)"

Add this command to your shell's configuration file (``.bashrc``, ``.zshrc``, etc.) to make it persistent.

Once enabled, you can use Tab to auto-complete HDFS paths:

.. code-block::

$ hdfsn ls /data/<TAB>
/data/users
/data/logs

$ hdfsn cat /data/file<TAB>
/data/file.txt
/data/file.csv
3 changes: 1 addition & 2 deletions python/hdfs_native/__init__.py
@@ -3,8 +3,7 @@
from collections.abc import AsyncIterator
from typing import TYPE_CHECKING, Dict, Iterator, List, Optional

# For some reason mypy doesn't think this exists
from typing_extensions import Buffer # type: ignore
from typing_extensions import Buffer

from ._internal import (
AclEntry,
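The `# type: ignore` on the `Buffer` import can be dropped because current type checkers resolve the name; a minimal sketch of what a bytes-like `Buffer` annotation buys you (the `checksum` helper is hypothetical, not part of the package):

```python
# Buffer is the typing name for "any bytes-like object" (bytes, bytearray,
# memoryview, ...). On Python 3.12+ it lives in collections.abc; older
# interpreters use the typing_extensions backport, as hdfs_native does.
try:
    from collections.abc import Buffer  # Python 3.12+
except ImportError:
    from typing_extensions import Buffer

def checksum(data: Buffer) -> int:
    # Hypothetical helper: accept any buffer, view it as raw bytes.
    return sum(memoryview(data).tobytes()) % 256

print(checksum(b"abc"))             # -> 38
print(checksum(bytearray(b"abc")))  # -> 38, any bytes-like type works
```

Annotating write paths with `Buffer` instead of `bytes` lets callers pass `memoryview` slices without copying.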
3 changes: 1 addition & 2 deletions python/hdfs_native/_internal.pyi
@@ -1,8 +1,7 @@
from collections.abc import AsyncIterator
from typing import Dict, Iterator, List, Literal, Optional

# For some reason mypy doesn't think this exists
from typing_extensions import Buffer # type: ignore
from typing_extensions import Buffer

class FileStatus:
path: str