Add GitLab fetcher #649

Open
wants to merge 11 commits into
base: develop
from
84 changes: 83 additions & 1 deletion docs/configuration.rst
Owner

Now at least the build is fixed.
For reference, here is the last build of lines with missing coverage.

We can try to mimic how GitHub was tested.

Expand Up @@ -20,7 +20,7 @@ The first file found will be used; the other files will be ignored.

If no style is configured, Nitpick will fail with an error message.

Run ``nipick init`` to create a config file (:ref:`cli_cmd_init`).
Run ``nitpick init`` to create a config file (:ref:`cli_cmd_init`).

To configure your own style, you can either use ``nitpick init``:

Expand All @@ -42,6 +42,9 @@ Remote style

Use the URL of the remote file.

Github
Owner

It's a capital "H". 😅

Suggested change
Github
GitHub

Please fix it everywhere if there are other places.

~~~~~~

If it's hosted on GitHub, use any of the following formats:

GitHub URL scheme (``github://`` or ``gh://``) pinned to a specific version:
Expand Down Expand Up @@ -111,6 +114,85 @@ Or you can use an environment variable to avoid keeping secrets in plain text.
A literal token cannot start with a ``$``.
All tokens must not contain any ``@`` or ``:`` characters.

Gitlab
Owner

Same thing here, capital "L". 😄

Suggested change
Gitlab
GitLab

~~~~~~

*Tested on GitLab.com and self-managed GitLab; it should also work with GitLab Ultimate.*

Unlike GitHub, projects on GitLab can have a deep hierarchy of the form ``https://gitlab.com/group/*subgroups/project/*folders/file``.

In the web interface, GitLab builds the project path from the group URL and the subgroup names.
The GitLab API, in contrast, ignores the project path and identifies the project by its number.

So, if it's hosted on GitLab, you have two options:

- For ``gitlab.com``, use either the ``https://`` scheme or the ``gitlab://`` (or ``gl://``) scheme.
- For a self-managed GitLab instance, use the ``gitlab://`` (or ``gl://``) scheme, which can be pinned to a specific version.

Whichever scheme you choose, Nitpick generates the corresponding raw file URL.

``https://`` scheme
^^^^^^^^^^^^^^^^^^^

Applicable only if your style repo is hosted on ``gitlab.com``. The regular GitLab URL is used.

.. code-block:: toml

    [tool.nitpick]
    style = "https://gitlab.com/my_group/sub_group/nitpick/-/blob/main/my_folder/.nitpick.toml"

Or use the raw GitLab URL directly:

.. code-block:: toml

    [tool.nitpick]
    style = "https://gitlab.com/my_group/sub_group/nitpick/-/raw/main/my_folder/.nitpick.toml"
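
As a rough illustration of how a regular ``blob`` URL relates to the raw one, the sketch below swaps the route segment that follows ``-``, using only the standard library. The function name ``gitlab_blob_to_raw`` is hypothetical, and this is not Nitpick's actual implementation (which works on ``furl`` objects); it only assumes the URL layout shown in the two examples above.

```python
from urllib.parse import urlsplit, urlunsplit


def gitlab_blob_to_raw(url: str) -> str:
    """Turn a gitlab.com "blob" page URL into the matching "raw" URL (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # The "-" segment separates the project path from the repository route.
    # list.index() raises ValueError when there is no "-" segment at all.
    dash = segments.index("-")
    if dash + 1 >= len(segments) or segments[dash + 1] not in {"blob", "raw"}:
        raise ValueError(f"Invalid GitLab URL: {url}")
    segments[dash + 1] = "raw"
    new_path = "/" + "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


blob = "https://gitlab.com/my_group/sub_group/nitpick/-/blob/main/my_folder/.nitpick.toml"
print(gitlab_blob_to_raw(blob))
```

A URL that already uses ``raw`` passes through unchanged, so either form from the examples above would end up pointing at the same file.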

If your style is in a private GitLab repo, you can provide the token directly in the URL,
or use an environment variable to avoid keeping secrets in plain text.

.. code-block:: toml

    [tool.nitpick]
    style = "https://<TOKEN>@gitlab.com/my_group/nitpick/-/blob/main/.nitpick.toml"
    # or using an environment variable instead of plain text
    style = "https://$GITLAB_TOKEN@gitlab.com/my_group/nitpick/-/blob/main/.nitpick.toml"

``gitlab://`` scheme
^^^^^^^^^^^^^^^^^^^^

The ``gitlab://`` scheme uses the GitLab API and can be used with any GitLab tier: Free, Premium (self-hosted) and Ultimate.

The GitLab API identifies a project by its number instead of its name.
You can find the project number in the project settings.

GitLab URL scheme (``gitlab://`` or ``gl://``) pinned to a specific version:

.. code-block:: toml

    [tool.nitpick]
    # gl|gitlab://[<TOKEN>@]<HOST>/<PROJECT_NUMBER>[@<BRANCH_NAME_OR_TAG_OR_COMMIT>]/<FILE_PATH>
    style = "gitlab://my_gitlab.com/123456@main/my_folder/nitpick-style.toml"
    # if no branch is provided, the default branch will be used
    style = "gitlab://my_gitlab.com/123456/nitpick-style.toml"
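
To make the mapping concrete, here is a minimal sketch of how a ``gitlab://`` style URL could be translated into a GitLab API "get raw file" URL. The helper name ``gitlab_scheme_to_api_url`` is hypothetical and the code uses only the standard library rather than Nitpick's own classes; it assumes the ``<HOST>/<PROJECT_NUMBER>[@<REF>]/<FILE_PATH>`` layout described above and ignores ports and extra query parameters.

```python
from urllib.parse import quote, urlencode, urlsplit


def gitlab_scheme_to_api_url(url: str) -> str:
    """Build the API v4 raw-file URL for a gitlab:// or gl:// style URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in {"gitlab", "gl"}:
        raise ValueError(f"Invalid GitLab URL: {url}")
    # First path segment is "<project number>[@<git reference>]", the rest is the file path
    project_and_ref, _, file_path = parts.path.lstrip("/").partition("/")
    project, _, ref = project_and_ref.partition("@")
    # The API expects the whole file path as one URL-encoded segment ("/" becomes "%2F")
    encoded_path = quote(file_path, safe="")
    api_url = f"https://{parts.hostname}/api/v4/projects/{project}/repository/files/{encoded_path}/raw"
    # Without a reference, no "ref" parameter is sent and the default branch applies
    return f"{api_url}?{urlencode({'ref': ref})}" if ref else api_url


print(gitlab_scheme_to_api_url("gitlab://my_gitlab.com/123456@main/my_folder/nitpick-style.toml"))
print(gitlab_scheme_to_api_url("gl://my_gitlab.com/123456/nitpick-style.toml"))
```

The token, when present before the ``@`` of the host, is deliberately left out of the generated URL in this sketch, since it would travel in a request header instead.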

You must pass the hostname, the project number and the file path.
Optionally, you can pass the branch name (or a tag or commit) and the private token for a private GitLab repo, either as plain text or through an environment variable:

.. code-block:: toml

    [tool.nitpick]
    style = "gitlab://p5iCG5AJuDgY@my_gitlab.com/123456/.nitpick.toml"
    # it has the same effect as providing the default branch explicitly
    style = "gl://p5iCG5AJuDgY@my_gitlab.com/123456@default_branch/.nitpick.toml"
    # pass custom branch and token through environment variable
    style = "gl://$GITLAB_TOKEN@my_gitlab.com/123456@custom_branch/linters/nitpick/.nitpick.toml"

.. note::

    A literal token cannot start with a ``$``.
    All tokens must not contain any ``@`` or ``:`` characters.
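
The token rules in the note can be sketched as a small resolver: a value starting with ``$`` is treated as the name of an environment variable, and anything else is used literally. The function name ``resolve_token`` and the explicit check for ``@`` and ``:`` are assumptions made for illustration; the fetcher in this PR only performs the ``$`` lookup.

```python
import os
from typing import Optional


def resolve_token(raw: Optional[str]) -> Optional[str]:
    """Return a literal token, or the value of the environment variable named after "$"."""
    if raw is None:
        return None
    # "@" and ":" would be ambiguous inside the user-info part of a URL
    if "@" in raw or ":" in raw:
        raise ValueError("Tokens must not contain '@' or ':'")
    if raw.startswith("$"):
        # "$GITLAB_TOKEN" -> look up the GITLAB_TOKEN environment variable
        return os.getenv(raw[1:])
    return raw


os.environ["GITLAB_TOKEN"] = "example-secret"  # set here only for the demonstration
print(resolve_token("$GITLAB_TOKEN"))
print(resolve_token("p5iCG5AJuDgY"))
```

This is also why a literal token cannot start with ``$``: it would be indistinguishable from an environment variable reference.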


Owner

I read this far.
Good docs, thanks a lot. 👍🏻 👏🏻

I will try it out on some GitLab repo from gitlab.com (I don't have a self-hosted or paid version).

Style inside Python package
---------------------------

Expand Down
2 changes: 2 additions & 0 deletions src/nitpick/constants.py
Expand Up @@ -28,6 +28,8 @@
GITHUB_COM_API = "api.github.com"
GITHUB_COM_QUERY_STRING_TOKEN = "token" # nosec # noqa: S105
GITHUB_COM_RAW = "raw.githubusercontent.com"
GITLAB_BRANCH_REFERENCE = "ref"
GITLAB_COM = "gitlab.com"
GIT_AT_REFERENCE = "@"
GIT_CORE_EXCLUDES_FILE = "core.excludesFile"
GIT_DIR = ".git"
Expand Down
182 changes: 175 additions & 7 deletions src/nitpick/style.py
@@ -1,3 +1,4 @@
# pylint: disable=too-many-lines # TODO: refactor: break this into separate modules in a follow-up PR
Owner

I added this and will fix it later on another PR, so this one stays small.

"""Style parsing and merging."""

from __future__ import annotations
Expand All @@ -9,7 +10,7 @@
from enum import auto
from functools import lru_cache
from pathlib import Path
from typing import TYPE_CHECKING, ClassVar, Iterable, Iterator, Sequence, cast
from typing import TYPE_CHECKING, ClassVar, Iterable, Iterator, Literal, NoReturn, Sequence, cast

import attr
import click
Expand Down Expand Up @@ -37,6 +38,8 @@
GITHUB_COM_API,
GITHUB_COM_QUERY_STRING_TOKEN,
GITHUB_COM_RAW,
GITLAB_BRANCH_REFERENCE,
GITLAB_COM,
JMEX_NITPICK_STYLES_INCLUDE,
MERGED_STYLE_TOML,
NITPICK_STYLE_TOML,
Expand All @@ -62,8 +65,7 @@
except ImportError: # pragma: no cover
from dpath.util import merge as dpath_merge

GITHUB_API_SESSION = Session() # Dedicated session to reuse connections

GIT_API_SESSION = Session() # Dedicated session to reuse connections

if TYPE_CHECKING:
from marshmallow import Schema
Expand Down Expand Up @@ -101,7 +103,7 @@ def github_default_branch(api_url: str, *, token: str | None = None) -> str:
This function is using ``lru_cache()`` as a simple memoizer, trying to avoid this rate limit error.
"""
headers = {"Authorization": f"token {token}"} if token else None
response = GITHUB_API_SESSION.get(api_url, headers=headers)
response = GIT_API_SESSION.get(api_url, headers=headers)
response.raise_for_status()

return response.json()["default_branch"]
Expand Down Expand Up @@ -133,6 +135,12 @@ def parse_cache_option(cache_option: str) -> tuple[CachingEnum, timedelta | int]
return caching, expires_after


def raise_gitlab_incorrect_url_error(url: furl) -> NoReturn:
    """Raise an error if the URL is not a valid GitLab URL."""
    message = f"Invalid GitLab URL: {url}"
    raise ValueError(message)
Comment on lines +138 to +141
Owner

This could be a private method on the GitLabURL below since it's only used there.



@dataclass()
class StyleManager: # pylint: disable=too-many-instance-attributes
"""Include styles recursively from one another."""
Expand Down Expand Up @@ -403,6 +411,8 @@ class Scheme(LowercaseStrEnum):
FILE = auto()
GH = auto()
GITHUB = auto()
GITLAB = auto()
GL = auto()
HTTP = auto()
HTTPS = auto()
PY = auto()
Expand All @@ -420,9 +430,9 @@ class StyleFetcherManager:

session: CachedSession = field(init=False)
fetchers: dict[str, StyleFetcher] = field(init=False)
schemes: tuple[str] = field(init=False)
schemes: tuple[str, ...] = field(init=False)

def __post_init__(self):
def __post_init__(self) -> None:
"""Initialize dependant properties."""
caching, expire_after = parse_cache_option(self.cache_option)
# honour caching headers on the response when an expiration time has
Expand Down Expand Up @@ -535,7 +545,13 @@ def _get_fetchers(session: CachedSession) -> dict[str, StyleFetcher]:
def _factory(klass: type[StyleFetcher]) -> StyleFetcher:
return klass(session) if klass.requires_connection else klass()

fetchers = (_factory(FileFetcher), _factory(HttpFetcher), _factory(GitHubFetcher), _factory(PythonPackageFetcher))
fetchers = (
_factory(FileFetcher),
_factory(HttpFetcher),
_factory(GitHubFetcher),
_factory(GitLabFetcher),
_factory(PythonPackageFetcher),
)
return dict(_fetchers_to_pairs(fetchers))


Expand Down Expand Up @@ -733,6 +749,158 @@ def _download(self, url: furl, **kwargs) -> str:
return super()._download(github_url.raw_content_url, **kwargs)


@dataclass(frozen=True)
class GitLabURL:
    """Represent a GitLab URL, created from a URL or from its parts."""

    scheme: str
    host: str
    project: list[str]
    path: str
    git_reference: str
    query_params: tuple[tuple[str, str], ...]
    auth_token: str | None = None

    @property
    def token(self) -> str | None:
        """Token encoded in this URL.

        If present, and it starts with a ``$``, it will be replaced with the
        value of the environment corresponding to the remaining part of the
        string.
        """
        token = self.auth_token
        if token is not None and token.startswith("$"):
            token = os.getenv(token[1:])
Comment on lines +773 to +774
Owner

Maybe this works, I haven't tried.

Suggested change
if token is not None and token.startswith("$"):
token = os.getenv(token[1:])
if not token and token.startswith("$"):
token = os.getenv(token[1:])

        return token
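
For anyone wanting to try the two guards from the suggestion above without touching the fetcher, here is a small standalone comparison. The function names ``original_guard`` and ``suggested_guard`` are made up for this sketch; each one only wraps the condition as written in the diff and in the suggested change.

```python
from typing import Optional


def original_guard(token: Optional[str]) -> bool:
    """Condition from the PR: only a non-None token starting with "$" qualifies."""
    return token is not None and token.startswith("$")


def suggested_guard(token: Optional[str]) -> bool:
    """Condition from the suggested change, copied as written."""
    return not token and token.startswith("$")


print(original_guard("$GITLAB_TOKEN"))   # the "$" prefix is detected
print(suggested_guard("$GITLAB_TOKEN"))  # "not token" is False for a non-empty string
try:
    suggested_guard(None)
except AttributeError as err:
    # "not None" is True, so .startswith() is then called on None
    print(f"AttributeError: {err}")
```

Under these assumptions the suggested condition never resolves an environment variable and fails on a missing token, so the ``is not None`` form (or ``if token and token.startswith("$")``) looks like the safer choice.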

    @property
    def authorization_header(self) -> dict[Literal["PRIVATE-TOKEN"], str] | None:
        """Authorization header encoded in this URL."""
        return {"PRIVATE-TOKEN": self.token} if self.token else None
Owner

This could be a constant in constants.py:

GITLAB_TOKEN_KEY = "PRIVATE-TOKEN"

Or some similar name.


    @property
    def raw_content_url(self) -> furl:
        """Raw content URL for this path."""
        if self.scheme in GitLabFetcher.protocols:
            query_params = self.query_params
            if self.git_reference:
                # If the branch was not specified for the raw file, GitLab itself will substitute the HEAD branch
                # https://docs.gitlab.com/ee/api/repository_files.html#get-raw-file-from-repository
                query_params += ((GITLAB_BRANCH_REFERENCE, self.git_reference),)

            return furl(
                scheme=Scheme.HTTPS,
                host=self.host,
                path=["api", "v4", "projects", *self.project, "repository", "files", self.path, "raw"],
                query_params=query_params,
            )

        return furl(
            scheme=Scheme.HTTPS,
            host=self.host,
            path=[*self.project, "-", "raw", self.git_reference, *self.path],
            query_params=self.query_params,
        )
Owner

I reviewed this far, will continue later.


    @classmethod
    def _from_http_scheme_furl(cls, url: furl) -> GitLabURL:
        """Create an instance from a parsed URL in accepted format.

        The GitLab GUI uses named paths like:
        - https://gitlab.com/group_URL/subgroup/project_name/-/blob/branch/folder/file
        - https://gitlab.com/group_URL/sub_group/project_name/-/raw/branch/folder/file
        See the code for ``test_parsing_gitlab_http_api_urls()`` for more examples.
        """
        auth_token = url.username
        query_params = tuple(url.args.items())

        segments = url.path.segments
        try:
            dash_index = segments.index("-")
            blob_index = dash_index + 2  # "blob" or "raw" should immediately follow
            if segments[dash_index + 1] not in {"blob", "raw"}:
                raise_gitlab_incorrect_url_error(url)
        except (ValueError, IndexError):
            raise_gitlab_incorrect_url_error(url)

        project = segments[:dash_index]  # Everything before the "-"
        # The error for git_reference will never be raised due to url normalization (always add .toml)
        git_reference = segments[blob_index]  # The first argument after "blob"
        path = segments[blob_index + 1 :]  # Everything after the git_reference

        return cls(
            scheme=url.scheme,
            host=url.host,
            project=project,
            path=path,
            git_reference=git_reference,
            query_params=query_params,
            auth_token=auth_token,
        )

    @classmethod
    def _from_gitlab_scheme_furl(cls, url: furl) -> GitLabURL:
        """Create an instance from a parsed URL in accepted format.

        The GitLab API does not pay attention to the groups and subgroups the project is in;
        instead, it uses the project number and the URL-encoded full path to the file:
        https://gitlab.com/api/v4/projects/project_number/repository/files/folder%2Ffile/raw?ref=branch_name

        Documentation: https://docs.gitlab.com/ee/api/repository_files.html#get-raw-file-from-repository
        See the code for ``test_parsing_gitlab_gl_api_urls()`` for more examples.
        """
        auth_token = url.username
        query_params = tuple(url.args.items())

        project_with_git_reference, *path = url.path.segments
        project, _, git_reference = project_with_git_reference.partition(GIT_AT_REFERENCE)
        project = [project]
        path = "/".join(path)

        return cls(
            scheme=url.scheme,
            host=url.host,
            project=project,
            path=path,
            git_reference=git_reference,
            query_params=query_params,
            auth_token=auth_token,
        )

    @classmethod
    def from_furl(cls, url: furl) -> GitLabURL:
        """Create an instance from a parsed URL in any accepted format.

        The gitlab:// scheme uses the GitLab API and takes a project number.
        The https:// scheme uses the GitLab site and takes the path to the project.
        """
        if url.scheme in GitLabFetcher.protocols:
            return cls._from_gitlab_scheme_furl(url)
        return cls._from_http_scheme_furl(url)


@dataclass(frozen=True)
class GitLabFetcher(HttpFetcher):  # pylint: disable=too-few-public-methods
    """Fetch styles from GitLab repositories via API."""

    protocols: tuple[str, ...] = (
        Scheme.GL,
        Scheme.GITLAB,
    )  # type: ignore[assignment,has-type]
    domains: tuple[str, ...] = (GITLAB_COM,)

    def _normalize_scheme(self, scheme: str) -> str:  # pylint: disable=no-self-use
        # Use gitlab:// instead of gl:// in the canonical URL
        return Scheme.GITLAB if scheme == Scheme.GL else scheme  # type: ignore[return-value]

    def _download(self, url: furl, **kwargs) -> str:
        """Download the style from the URL."""
        gitlab_url = GitLabURL.from_furl(url)
        kwargs.setdefault("headers", gitlab_url.authorization_header)
        return super()._download(gitlab_url.raw_content_url, **kwargs)


@dataclass(frozen=True)
class PythonPackageURL:
"""Represent a resource file in installed Python package."""