---
title: Lake Formation
---

{% include content/plan-grid.md name="data-lakes" %}

Lake Formation is a fully managed service, built on top of the AWS Glue Data Catalog, that provides one central set of tools to build and manage a data lake. These tools help you import, catalog, transform, and deduplicate data, and they provide strategies to optimize data storage and security.

> note "Learn more about Lake Formation features"
> To learn more about Lake Formation features, refer to the [Amazon Web Services documentation](https://aws.amazon.com/lake-formation/features/){:target="_blank"}.

The security policies in Lake Formation use two layers of permissions. Each resource is protected by Lake Formation permissions (which control access to Data Catalog resources and S3 locations) and by IAM permissions (which control access to Lake Formation and AWS Glue API resources). When any user or role reads or writes to a resource, that action must pass both a Lake Formation check and an IAM check. For example, a user trying to create a new table in the Data Catalog may have Lake Formation access to the Data Catalog, but if they don't have the correct AWS Glue API permissions, they can't create the table.
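
To make the two-layer check concrete, here is a minimal conceptual model in Python. It is not how AWS evaluates requests internally. The `Principal` class, the `is_allowed` function, and the resource label `database:sales` are hypothetical names used only to show that a request succeeds only when both layers allow it.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Principal:
    """Hypothetical model of a user or role and the grants it holds in each layer."""

    name: str
    # Lake Formation layer: (resource, permission) pairs, e.g. ("database:sales", "CREATE_TABLE").
    lake_formation_grants: Set[Tuple[str, str]] = field(default_factory=set)
    # IAM layer: API actions allowed by the principal's IAM policies, e.g. "glue:CreateTable".
    iam_allowed_actions: Set[str] = field(default_factory=set)


def is_allowed(principal: Principal, api_action: str, resource: str, lf_permission: str) -> bool:
    """Allow the request only if BOTH the IAM check and the Lake Formation check pass."""
    passes_iam = api_action in principal.iam_allowed_actions
    passes_lake_formation = (resource, lf_permission) in principal.lake_formation_grants
    return passes_iam and passes_lake_formation


# The scenario from the paragraph above: Lake Formation access, but no Glue API permission.
analyst = Principal(
    name="analyst",
    lake_formation_grants={("database:sales", "CREATE_TABLE")},
)
print(is_allowed(analyst, "glue:CreateTable", "database:sales", "CREATE_TABLE"))  # False
```

Adding `"glue:CreateTable"` to the analyst's `iam_allowed_actions` is what flips the result to `True` in this model; granting more Lake Formation permissions alone would not.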

For more information about security practices in Lake Formation, see Amazon's [Lake Formation Permissions Reference](https://docs.aws.amazon.com/lake-formation/latest/dg/lf-permissions-reference.html){:target="_blank"} documentation.

## Configure Lake Formation
You can configure Lake Formation using the [`IAMAllowedPrincipals` group](#configure-lake-formation-using-the-iamallowedprincipals-group) or by [using IAM policies for access control](#configure-lake-formation-using-iam-policies). Configuring Lake Formation using the `IAMAllowedPrincipals` group is the simpler method and is recommended if you are exploring Lake Formation. Setting up Lake Formation using IAM policies for access control is a more advanced option, recommended if you want additional customization.

> info "Permissions required to configure Data Lakes"
> To configure Lake Formation, you must be logged in to AWS with data lake administrator or database creator permissions.

### Configure Lake Formation using the IAMAllowedPrincipals group

#### Existing databases
1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/){:target="_blank"}.
2. Under **Data catalog**, select **Settings**. Ensure the checkboxes under the **Default permissions for newly created databases and tables** are not checked.
3. Under **Permissions**, select the **Data lake permissions** section. Click **Grant**.
4. On the **Grant data permissions** page, select the `IAMAllowedPrincipals` group in the Principals section.
5. In the **Database permissions** section, select the checkboxes for **Super** database permissions and **Super** grantable permissions.
6. Click **Grant**.
7. On the **Permissions** page, verify the `IAMAllowedPrincipals` group has "All" permissions.
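
If you prefer to script these steps, the sketch below prepares the same requests for the AWS SDK for Python (boto3) Lake Formation client. It assumes the console's **Super** permission corresponds to the API permission `ALL` and that the `IAMAllowedPrincipals` group is addressed as the principal identifier `IAM_ALLOWED_PRINCIPALS`. The helper function names and the database name `segment_db` are hypothetical. The client is passed in, so the logic can be exercised without AWS credentials.

```python
IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"


def default_permissions_cleared(data_lake_settings: dict) -> bool:
    """Step 2: True when no default permissions are set for new databases or tables."""
    return not data_lake_settings.get("CreateDatabaseDefaultPermissions") and not data_lake_settings.get(
        "CreateTableDefaultPermissions"
    )


def super_grant_request(principal_id: str, database_name: str) -> dict:
    """Steps 4-5: keyword arguments for GrantPermissions ("Super" is assumed to map to "ALL")."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_id},
        "Resource": {"Database": {"Name": database_name}},
        "Permissions": ["ALL"],
        "PermissionsWithGrantOption": ["ALL"],
    }


def grant_super_to_iam_allowed_principals(lakeformation_client, database_name: str) -> None:
    """Steps 2-6 for an existing database, using a boto3 "lakeformation" client."""
    settings = lakeformation_client.get_data_lake_settings()["DataLakeSettings"]
    if not default_permissions_cleared(settings):
        raise RuntimeError("Clear the default permissions for newly created databases and tables first.")
    lakeformation_client.grant_permissions(**super_grant_request(IAM_ALLOWED_PRINCIPALS, database_name))


# Usage (requires boto3 and credentials with data lake administrator permissions):
#   import boto3
#   grant_super_to_iam_allowed_principals(boto3.client("lakeformation"), "segment_db")
```

Refusing to grant while default permissions are still set mirrors step 2, so the script stops instead of silently layering grants on top of the defaults.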

#### New databases
1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/){:target="_blank"}.
2. Under **Data catalog**, select **Settings**. Ensure the checkboxes under **Default permissions for newly created databases and tables** are not checked.
3. Select the **Databases** tab and click **Create database**. On the **Create database** page:
   1. Select the **Database** button.
   2. Name your database.
   3. Set the location to `s3://$datalake_bucket/segment-data/`. <br/> **Optional:** Add a description to your database.
   4. Select the **Use only IAM access control for new tables in this database** checkbox.
   5. Click **Create database**.
4. On the **Databases** page, select your database. From the **Actions** menu, select **Grant**.
5. On the **Grant data permissions** page, select the `IAMAllowedPrincipals` group in the Principals section.
6. In the **Database permissions** section, select the checkboxes for **Super** database permissions and **Super** grantable permissions.
7. Click **Grant**.
8. On the **Permissions** page, verify the `IAMAllowedPrincipals` group has "All" permissions.
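
A scripted version of steps 3 through 7 might look like the sketch below. It uses boto3 Glue and Lake Formation clients passed in by the caller. It assumes that the **Use only IAM access control for new tables in this database** checkbox corresponds to a `CreateTableDefaultPermissions` entry granting `ALL` to `IAM_ALLOWED_PRINCIPALS`, and that **Super** corresponds to `ALL`. The function names, database name, and bucket name are hypothetical.

```python
IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"


def iam_only_database_input(name: str, datalake_bucket: str, description: str = "") -> dict:
    """Step 3: the DatabaseInput for Glue CreateDatabase."""
    database_input = {
        "Name": name,
        "LocationUri": f"s3://{datalake_bucket}/segment-data/",
        # Assumed API equivalent of "Use only IAM access control for new tables in this database".
        "CreateTableDefaultPermissions": [
            {"Principal": {"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS}, "Permissions": ["ALL"]}
        ],
    }
    if description:  # the description is optional
        database_input["Description"] = description
    return database_input


def create_database_for_iam_allowed_principals(glue_client, lakeformation_client, name, datalake_bucket, description=""):
    """Steps 3-7: create the database, then grant Super (ALL) with grant option on it."""
    glue_client.create_database(DatabaseInput=iam_only_database_input(name, datalake_bucket, description))
    lakeformation_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS},
        Resource={"Database": {"Name": name}},
        Permissions=["ALL"],
        PermissionsWithGrantOption=["ALL"],
    )


# Usage (requires boto3 and suitable credentials):
#   import boto3
#   create_database_for_iam_allowed_principals(
#       boto3.client("glue"), boto3.client("lakeformation"), "segment_db", "my-datalake-bucket")
```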

#### Verify your configuration
To verify that you've configured Lake Formation, open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/){:target="_blank"}, select **Data lake permissions**, and verify the `IAMAllowedPrincipals` group is listed with "All" permissions.
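
To run the same check from code, you could inspect the response of the Lake Formation `ListPermissions` API. The sketch below assumes the boto3 response shape (a `PrincipalResourcePermissions` list). Because the procedures above grant **Super** with the grantable option, it treats "All" as `ALL` appearing in both `Permissions` and `PermissionsWithGrantOption`. It reads a single page of results and ignores `NextToken` for brevity; the function name is hypothetical.

```python
def has_all_permissions(list_permissions_response: dict, principal_id: str, database_name: str) -> bool:
    """True if principal_id holds ALL (with grant option) on database_name in this page of results."""
    for entry in list_permissions_response.get("PrincipalResourcePermissions", []):
        if entry.get("Principal", {}).get("DataLakePrincipalIdentifier") != principal_id:
            continue
        if entry.get("Resource", {}).get("Database", {}).get("Name") != database_name:
            continue
        if "ALL" in entry.get("Permissions", []) and "ALL" in entry.get("PermissionsWithGrantOption", []):
            return True
    return False


# Usage (requires boto3 and credentials that can list permissions):
#   import boto3
#   response = boto3.client("lakeformation").list_permissions(
#       Principal={"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
#       Resource={"Database": {"Name": "segment_db"}},
#   )
#   print(has_all_permissions(response, "IAM_ALLOWED_PRINCIPALS", "segment_db"))
```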

### Configure Lake Formation using IAM policies

> note "Granting Super permission to IAM roles"
> If you configured your database manually, grant Super permissions to the `EMR_EC2_DefaultRole` role. If you configured your database using Terraform, grant Super permissions to the `segment_emr_instance_profile` role instead. You select these roles in the Principals section of the **Grant data permissions** page.

#### Existing databases
1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/){:target="_blank"}.
2. Under **Data catalog**, select **Settings**. Ensure the checkboxes under the **Default permissions for newly created databases and tables** are not checked.
3. On the **Databases** page, select your database. From the **Actions** menu, select **Grant**.
4. On the **Grant data permissions** page, select the `EMR_EC2_DefaultRole` (or `segment_emr_instance_profile`, if you configured your data lake using Terraform) and `segment-data-lake-iam-role` roles in the Principals section.
5. In the **Database permissions** section, select the checkboxes for **Super** database permissions and **Super** grantable permissions.
6. Click **Grant**.
7. On the **Permissions** page, verify the `EMR_EC2_DefaultRole` (or `segment_emr_instance_profile`) and `segment-data-lake-iam-role` roles have "All" permissions.
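
The same grant can be scripted for the two roles. This sketch assumes both roles live at the root IAM path in a single account (so their ARNs have the form `arn:aws:iam::<account-id>:role/<role-name>`), that the Terraform-created principal is an IAM role named `segment_emr_instance_profile`, and that **Super** corresponds to `ALL`. The function names and the account ID `123456789012` are hypothetical placeholders.

```python
DATA_LAKE_ROLE = "segment-data-lake-iam-role"


def role_arn(account_id: str, role_name: str) -> str:
    """ARN of an IAM role at the root path (assumption: no custom role path)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def segment_role_arns(account_id: str, configured_with_terraform: bool) -> list:
    """The two principals to select, depending on how the data lake was set up."""
    emr_role = "segment_emr_instance_profile" if configured_with_terraform else "EMR_EC2_DefaultRole"
    return [role_arn(account_id, emr_role), role_arn(account_id, DATA_LAKE_ROLE)]


def grant_super_to_roles(lakeformation_client, database_name: str, principal_arns: list) -> None:
    """Grant ALL, with grant option, on the database to each role (one GrantPermissions call per role)."""
    for arn in principal_arns:
        lakeformation_client.grant_permissions(
            Principal={"DataLakePrincipalIdentifier": arn},
            Resource={"Database": {"Name": database_name}},
            Permissions=["ALL"],
            PermissionsWithGrantOption=["ALL"],
        )


# Usage (requires boto3 and data lake administrator credentials):
#   import boto3
#   grant_super_to_roles(boto3.client("lakeformation"), "segment_db",
#                        segment_role_arns("123456789012", configured_with_terraform=False))
```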

#### New databases
1. Open the [AWS Lake Formation service](https://console.aws.amazon.com/lakeformation/){:target="_blank"}.
2. Under **Data catalog**, select **Settings**. Ensure the checkboxes under the **Default permissions for newly created databases and tables** are not checked.
3. Select the **Databases** tab and click **Create database**. On the **Create database** page:
   1. Select the **Database** button.
   2. Name your database.
   3. Set the location to `s3://$datalake_bucket/segment-data/`. <br/> **Optional:** Add a description to your database.
   4. Click **Create database**.
4. On the **Databases** page, select your database. From the **Actions** menu, select **Grant**.
5. On the **Grant data permissions** page, select the `EMR_EC2_DefaultRole` (or `segment_emr_instance_profile`, if you configured your data lake using Terraform) and `segment-data-lake-iam-role` roles in the Principals section.
6. In the **Database permissions** section, select the checkboxes for **Super** database permissions and **Super** grantable permissions.
7. Click **Grant**.
8. On the **Permissions** page, verify the `EMR_EC2_DefaultRole` (or `segment_emr_instance_profile`) and `segment-data-lake-iam-role` roles have "All" permissions.
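
Unlike the `IAMAllowedPrincipals` procedure, this database is created without the IAM-only default for new tables, so access to its tables is controlled through Lake Formation grants. The sketch below scripts steps 3 through 7 with boto3 Glue and Lake Formation clients passed in by the caller. It assumes an empty `CreateTableDefaultPermissions` list is the API equivalent of leaving that checkbox cleared and that **Super** corresponds to `ALL`. The function names, bucket name, and role ARNs are hypothetical.

```python
def lake_formation_database_input(name: str, datalake_bucket: str, description: str = "") -> dict:
    """Step 3: DatabaseInput with no IAM-only default permissions for new tables."""
    database_input = {
        "Name": name,
        "LocationUri": f"s3://{datalake_bucket}/segment-data/",
        # Assumed equivalent of leaving "Use only IAM access control..." unchecked.
        "CreateTableDefaultPermissions": [],
    }
    if description:  # the description is optional
        database_input["Description"] = description
    return database_input


def create_database_for_roles(glue_client, lakeformation_client, name, datalake_bucket, principal_arns, description=""):
    """Steps 3-7: create the database, then grant ALL (with grant option) to each role ARN."""
    glue_client.create_database(DatabaseInput=lake_formation_database_input(name, datalake_bucket, description))
    for arn in principal_arns:
        lakeformation_client.grant_permissions(
            Principal={"DataLakePrincipalIdentifier": arn},
            Resource={"Database": {"Name": name}},
            Permissions=["ALL"],
            PermissionsWithGrantOption=["ALL"],
        )


# Usage (requires boto3; the account ID in the ARNs is a placeholder):
#   import boto3
#   create_database_for_roles(
#       boto3.client("glue"), boto3.client("lakeformation"), "segment_db", "my-datalake-bucket",
#       ["arn:aws:iam::123456789012:role/EMR_EC2_DefaultRole",
#        "arn:aws:iam::123456789012:role/segment-data-lake-iam-role"])
```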