
[Feature]: Configurable file/dir ignores for CLI uploads: support ovcli.conf and .gitignore defaults #1386

@sentisso

Description

Problem Statement

The current OpenViking CLI requires users and AI agents to specify the file and directory filter flags (ignore_dirs, include, exclude) explicitly on every upload. This is error-prone and makes it hard to keep the set of ingested files and directories consistent, both for human users and for agents that automate uploads (e.g., after each feature implementation).

We lack a robust way to keep certain files/folders out of the knowledge base by default.

Proposed Solution

Support a configurable, automatic ignore mechanism for the CLI's folder/file uploads:

  • By default, respect the repository's .gitignore (and all nested .gitignore files) when deciding which files and directories to skip during upload. Ideally this behavior should be optional and possible to turn off in config. Several open-source Python implementations (e.g. mherrmann/gitignore_parser) could be used for parsing.
  • Extend ovcli.conf with default ignore rules (ignore_dirs, include, exclude) that are applied to every upload, so the corresponding CLI flags no longer have to be passed explicitly each time. This allows per-user and per-project personalization and consistent filtering, following the documented config format.
  • Rules from all sources (.gitignore, ovcli.conf, and CLI flags) should be merged into the final configuration rather than overriding each other.

Together, these changes ensure uploads are filtered correctly and consistently without having to set ignore flags every time.

Alternatives Considered

  • Only extending ovcli.conf with default ignore_dirs/include/exclude rules. This might be easier to implement and would likely cover most of the use cases described here; the only downside is that most rules from the repository's .gitignore would be duplicated in ovcli.conf, which is acceptable. So if respecting .gitignore proves too complex, extending ovcli.conf alone is fine.
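For the ovcli.conf-only alternative, the main new piece is reading the default filters from the config file. The sketch below assumes the `"upload"` section layout from the example config in this issue; the function names `parse_upload_defaults` and `load_upload_defaults` are hypothetical, and the config path is passed in explicitly rather than guessing where the CLI looks for ovcli.conf.

```python
# Hypothetical sketch: read default upload filters from ovcli.conf (proposed "upload" section).
from __future__ import annotations

import json
from pathlib import Path

# Keys of the proposed "upload" section; they mirror the existing CLI flag names.
UPLOAD_KEYS = ("ignore_dirs", "include", "exclude")


def parse_upload_defaults(conf_text: str) -> dict[str, list[str]]:
    """Return default filters from ovcli.conf JSON text.

    A missing "upload" section, or a missing key, yields an empty list, so
    configs written before this feature keep working unchanged.
    """
    section = json.loads(conf_text).get("upload") or {}
    defaults = {}
    for key in UPLOAD_KEYS:
        value = section.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f'ovcli.conf: "upload.{key}" must be a list of strings')
        defaults[key] = value
    return defaults


def load_upload_defaults(conf_path: str | Path) -> dict[str, list[str]]:
    """Read and parse an ovcli.conf file from an explicit path."""
    return parse_upload_defaults(Path(conf_path).read_text(encoding="utf-8"))
```

Treating a missing section as "no defaults" keeps the change backward compatible, and validating the value types up front surfaces a malformed config before any upload starts.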

Feature Area

CLI Tools

Use Case

I want to ensure that confidential, irrelevant, or project-specific files are never accidentally uploaded to OpenViking (especially if an agent or user forgets to set all the right ignore flags on an upload), and to inherit sensible, concrete ignore rules without manual intervention, since the default IGNORE_DIRS might not cover every case.

This enables safer, more predictable ingestion at scale and in automated workflows.

Example API (Optional)

# Example: ovcli.conf config

{
  "url": "http://localhost:1933",
  "api_key": "your-secret-key",
  "upload": {
    "ignore_dirs": [".cache", ".mypy_cache", "custom_data", ".nx"],
    "exclude": ["*.tmp", "*.log", "*.bak"],
    "include": ["*.md", "*.pdf"]
  }
}


# Example: CLI behaviour
openviking add-resource ./docs                        # Applies the .gitignore and ovcli.conf ignore rules
openviking add-resource ./docs --exclude "*.test.ts"  # Adds an extra exclude for this upload only; .gitignore and ovcli.conf rules still apply (merged, NOT overridden)
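The merge behavior in the CLI example can be sketched in Python as follows. This is a hypothetical helper, not OpenViking's implementation. The matching semantics are assumptions and may differ from how the existing flags behave: ignore_dirs is compared against each directory name in the path, include/exclude are `fnmatch` globs on the file name, exclude wins over include, and a non-empty include acts as an allow-list. .gitignore handling is passed in as an optional predicate. `UploadRules`, `merge_rules` and `should_upload` are illustrative names.

```python
# Hypothetical sketch: merge filter sources instead of letting one override another.
import fnmatch
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class UploadRules:
    ignore_dirs: list[str] = field(default_factory=list)
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)


def _union(*lists):
    """Concatenate lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(item for lst in lists for item in lst))


def merge_rules(*sources: UploadRules) -> UploadRules:
    """Merge rules from every source (e.g. ovcli.conf, then CLI flags); none overrides another."""
    return UploadRules(
        ignore_dirs=_union(*(s.ignore_dirs for s in sources)),
        include=_union(*(s.include for s in sources)),
        exclude=_union(*(s.exclude for s in sources)),
    )


def should_upload(
    rel_path: str,
    rules: UploadRules,
    gitignored: Callable[[str], bool] = lambda path: False,
) -> bool:
    """Decide whether a file (POSIX path relative to the upload root) is uploaded."""
    *dirs, name = rel_path.split("/")
    if any(d in rules.ignore_dirs for d in dirs):
        return False
    if any(fnmatch.fnmatchcase(name, pat) for pat in rules.exclude):
        return False
    if rules.include and not any(fnmatch.fnmatchcase(name, pat) for pat in rules.include):
        return False
    return not gitignored(rel_path)
```

With `merge_rules(conf_rules, cli_rules)`, the `--exclude "*.test.ts"` from the second command is simply appended to the ovcli.conf excludes. Because every list is unioned, an include passed on the command line would widen the ovcli.conf include list rather than replace it; whether that is the right behavior for include is a design choice to settle during implementation.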

Additional Context

No response

Contribution

  • I am willing to contribute to implementing this feature

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests