To use {{ dataproc-name }} clusters, specify the following parameters in your project settings:
- Default folder for integrating with other {{ yandex-cloud }} services. A {{ dataproc-name }} cluster will be deployed in this folder, subject to the current cloud quotas. The fee for using the cluster will be debited from your cloud billing account.
- Service account that {{ ml-platform-name }} will use to create and manage clusters. The service account needs the following roles:
  - `dataproc.agent`: to use {{ dataproc-name }} clusters.
  - `dataproc.admin`: to create clusters from {{ dataproc-name }} templates.
  - `vpc.user`: to use the {{ dataproc-name }} cluster network.
  - `iam.serviceAccounts.user`: to create resources in the folder on behalf of the service account.
- Subnet for {{ ml-platform-name }} to communicate with the {{ dataproc-name }} cluster. Since the {{ dataproc-name }} cluster needs to access the internet, make sure to configure a NAT gateway in the subnet.
{% include subnet-create %}
{% note warning %}
The persistent {{ dataproc-name }} cluster must have the `livy:livy.spark.deploy-mode : client` setting.
{% endnote %}
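
As an illustration of the service account role requirements listed above, here is a minimal Python sketch that reports which of the four roles are still missing and prints one CLI command per missing role. The helper names `missing_roles` and `grant_commands` are hypothetical, as are the `<folder-id>` and `<service-account-id>` placeholders. The script only builds command strings for `yc resource-manager folder add-access-binding` and does not run them, so check the commands against your CLI version before using them.

```python
# Roles required by the service account, taken from the list above.
REQUIRED_ROLES = {
    "dataproc.agent": "use clusters",
    "dataproc.admin": "create clusters from templates",
    "vpc.user": "use the cluster network",
    "iam.serviceAccounts.user": "create resources in the folder on behalf of the service account",
}


def missing_roles(assigned):
    """Return the required roles absent from `assigned`, in the order listed above."""
    assigned = set(assigned)
    return [role for role in REQUIRED_ROLES if role not in assigned]


def grant_commands(folder_id, service_account_id, assigned):
    """Build (but do not run) one CLI command per missing role.

    Hypothetical helper: the folder and service account IDs are supplied by the caller.
    """
    return [
        f"yc resource-manager folder add-access-binding {folder_id} "
        f"--role {role} --subject serviceAccount:{service_account_id}"
        for role in missing_roles(assigned)
    ]


if __name__ == "__main__":
    # Example: the service account already has vpc.user only.
    for command in grant_commands("<folder-id>", "<service-account-id>", ["vpc.user"]):
        print(command)
```

Keeping the role list in one dictionary means the check and the generated commands cannot drift apart if the set of required roles changes.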