To use {{ dataproc-name }} clusters, specify the following parameters in your project settings:

  • Default folder for integrating with other {{ yandex-cloud }} services. A {{ dataproc-name }} cluster will be deployed in this folder, subject to the current cloud quotas. The fee for using the cluster will be debited from your cloud billing account.

  • Service account that {{ ml-platform-name }} will use to create and manage clusters. The service account needs the following roles:

    • `dataproc.agent` to use {{ dataproc-name }} clusters.
    • `dataproc.admin` to create clusters from {{ dataproc-name }} templates.
    • `vpc.user` to use the {{ dataproc-name }} cluster network.
    • `iam.serviceAccounts.user` to create resources in the folder on behalf of the service account.
  • Subnet for {{ ml-platform-name }} to communicate with the {{ dataproc-name }} cluster. Since the {{ dataproc-name }} cluster needs to access the internet, make sure to configure a NAT gateway in the subnet.

    {% include subnet-create %}
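
As an illustration of the role list above, the following minimal Python sketch builds one `yc resource-manager folder add-access-binding` command per required role. It only prints the commands and does not run them. The folder ID `my-folder-id`, the service account ID `my-sa-id`, and the helper name `build_binding_commands` are hypothetical placeholders, and the sketch assumes the roles are granted at the folder level; check the command against your CLI version before using it.

```python
import shlex

# Roles the service account needs, as listed in the settings above.
REQUIRED_ROLES = [
    "dataproc.agent",
    "dataproc.admin",
    "vpc.user",
    "iam.serviceAccounts.user",
]


def build_binding_commands(folder_id: str, service_account_id: str) -> list[str]:
    """Return one CLI command string per required role (hypothetical helper)."""
    commands = []
    for role in REQUIRED_ROLES:
        args = [
            "yc", "resource-manager", "folder", "add-access-binding", folder_id,
            "--role", role,
            "--subject", f"serviceAccount:{service_account_id}",
        ]
        # shlex.join quotes arguments where needed, so the result is shell-safe.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Placeholder IDs: replace them with your own folder and service account IDs.
    for command in build_binding_commands("my-folder-id", "my-sa-id"):
        print(command)
```

Printing the commands instead of executing them lets you review each binding before applying it.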

{% note warning %}

The persistent {{ dataproc-name }} cluster must have the `livy:livy.spark.deploy-mode` property set to `client`.

{% endnote %}
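
The requirement in the warning above can be checked before you connect a cluster. The sketch below assumes you have already obtained the cluster properties as a plain key-value mapping; how you fetch them is outside its scope. The function name `has_client_deploy_mode` is hypothetical.

```python
# Property key and value taken from the warning above.
LIVY_DEPLOY_MODE_KEY = "livy:livy.spark.deploy-mode"
REQUIRED_DEPLOY_MODE = "client"


def has_client_deploy_mode(properties: dict[str, str]) -> bool:
    """Return True if the mapping sets the Livy deploy mode to client."""
    value = properties.get(LIVY_DEPLOY_MODE_KEY, "")
    return value.strip() == REQUIRED_DEPLOY_MODE


if __name__ == "__main__":
    # Example mapping: an assumption for illustration, not real cluster output.
    example = {"livy:livy.spark.deploy-mode": "client"}
    print(has_client_deploy_mode(example))
```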