Skip to content

Add regional endpoints code samples #12097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 128 additions & 0 deletions dlp/snippets/Inspect/inspect_string_rep.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Sample app that uses the Data Loss Prevention API to inspect a string, a
local file or a file on Google Cloud Storage."""


import argparse

# [START dlp_inspect_string_rep]
from typing import List
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per PEP-585, we can now type hint with list directly (for example list[str]), so no need to import this.


import google.cloud.dlp


def inspect_string(
project: str,
rep_location: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: For naming consistency with other samples, can we keep this as just location?

content_string: str,
info_types: List[str],
include_quote: bool = True,
) -> None:
"""Uses the Data Loss Prevention API to analyze strings for protected data.
Args:
project: The Google Cloud project id to use as a parent resource.
rep_location: The location of regional endpoint to use.
content_string: The string to inspect.
info_types: A list of strings representing info types to look for.
A full list of info type categories can be fetched from the API.
include_quote: Boolean for whether to display a quote of the detected
information in the results.
Returns:
None; the response from the API is printed to the terminal.
"""

# Assemble the regional endpoint url using provided rep location
rep_endpoint = f"dlp.{rep_location}.rep.googleapis.com"
client_options = {"api_endpoint": rep_endpoint}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It might be a good idea to match the variable name with the option entry? Like {"api_endpoint": api_endpoint}


# Instantiate a client.
dlp = google.cloud.dlp_v2.DlpServiceClient(client_options=client_options)

# Prepare info_types by converting the list of strings into a list of
# dictionaries (protos are also accepted).
info_types = [{"name": info_type} for info_type in info_types]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this it's not clear to me what the argument info_types is supposed to look like. For a code sample, it's best if the request is completely hard-coded to something we know works and can be copy-pasted to run.


# Construct the configuration dictionary. Keys which are None may
# optionally be omitted entirely.
inspect_config = {
"info_types": info_types,
"include_quote": include_quote,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I would hard-code a value for include_quote here. It makes the sample easier to read and there are less decisions to be taken for users.

}

# Construct the `item`.
item = {"value": content_string}

# Convert the project id into a full resource id.
parent = f"projects/{project}/locations/{rep_location}"

# Call the API.
response = dlp.inspect_content(
request={"parent": parent, "inspect_config": inspect_config, "item": item}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This request is a little hard to reason about since all the variables were computed before. It might be a good idea to inline the values directly here instead of having them as separate variables.

)

# Print out the results.
if response.result.findings:
for finding in response.result.findings:
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of try-catch, we could use hasattr to check that.

if hasattr(finding, 'quote'):
  print(f"Quote: {finding.quote}")

if finding.quote:
print(f"Quote: {finding.quote}")
except AttributeError:
pass
print(f"Info type: {finding.info_type.name}")
print(f"Likelihood: {finding.likelihood}")
else:
print("No findings.")


# [END dlp_inspect_string_rep]

if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider removing the entire CLI interface. We shouldn't include CLI interfaces unless they're strictly necessary for the code inside the region tags to work.

parser = argparse.ArgumentParser()

parser.add_argument("item", help="The string to inspect.")
parser.add_argument(
"--project",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mark this as required=True?

help="The Google Cloud project id to use as a parent resource.",
)
parser.add_argument(
"--rep_location",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a few examples of what values are valid in the help message.

Also add a default value, since all flags here are optional, but leaving this empty will make the snippet fails to run.

help="The regional endpoint location to use.",
)
parser.add_argument(
"--info_types",
nargs="+",
help="Strings representing info types to look for. A full list of "
"info categories and types is available from the API. Examples "
'include "FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS". '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change FIRST_NAME LAST_NAME into some other infotypes. They are not recommended in https://cloud.google.com/sensitive-data-protection/docs/infotypes-reference , so we should avoid them in code samples as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of a code sample is not to make a fully-runnable wrapper tool around the API, but to provide an example of a specific case. The style guide mentions that we shouldn't be creating CLIs around code samples to reduce maintenance burden.

Instead, the code sample should have a hard-coded request, that people can copy-paste and run directly.

"If unspecified, the three above examples will be used.",
default=["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"],
)
parser.add_argument(
"--include_quote",
type=bool,
help="A boolean for whether to display a quote of the detected "
"information in the results.",
default=True,
)
args = parser.parse_args()

inspect_string(
args.project,
args.rep_location,
args.item,
args.info_types,
include_quote=args.include_quote,
)
38 changes: 38 additions & 0 deletions dlp/snippets/Inspect/inspect_string_rep_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import inspect_string_rep as inspect_content_rep

import pytest

GCLOUD_PROJECT = os.getenv("GOOGLE_CLOUD_PROJECT")


def test_inspect_string(capsys: pytest.CaptureFixture) -> None:
test_string = "My name is Gary Smith and my email is [email protected]"
rep_location = "us-east1"

inspect_content_rep.inspect_string(
GCLOUD_PROJECT,
rep_location,
test_string,
["FIRST_NAME", "EMAIL_ADDRESS"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use PERSON_NAME instead.

include_quote=True,
)

out, _ = capsys.readouterr()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing on stdout is brittle and can cause tests to fail if the underlying implementation changes. Instead, you should return the response from the code snippet and test it directly here.

assert "Info type: FIRST_NAME" in out
assert "Info type: EMAIL_ADDRESS" in out