How do I implement EMR

Authorize access to EMRFS data in Amazon S3

By default, the EMR role for EC2 determines the permissions for accessing EMRFS data in Amazon S3. The IAM policies attached to this role apply regardless of the user or group making the request through EMRFS. The default is. For more information, see Service Role for EC2 Cluster Instances (EC2 Instance Profile).

Starting with Amazon EMR version 5.10.0, you can use a security configuration to specify IAM roles for EMRFS. This allows you to customize permissions for EMRFS requests to Amazon S3 on multi-user clusters. You can specify different IAM roles for different users and groups, and for different Amazon S3 bucket locations based on the prefix in Amazon S3. When EMRFS makes a request to Amazon S3 that matches the users, groups, or locations you specify, the cluster uses the corresponding role instead of the EMR role for EC2. For more information, see Configuring IAM Roles for EMRFS Requests to Amazon S3.
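As a rough illustration of the role mappings described above, the following Python sketch assembles a security configuration document that maps one user and one Amazon S3 prefix to separate IAM roles. The JSON layout (AuthorizationConfiguration, EmrFsConfiguration, RoleMappings) reflects our understanding of the security configuration format and should be checked against Configuring IAM Roles for EMRFS Requests to Amazon S3. The role ARNs, user name, and bucket name are hypothetical placeholders.

```python
import json

# Identifier types that an EMRFS role mapping can match on (assumed set).
VALID_IDENTIFIER_TYPES = ("User", "Group", "Prefix")


def emrfs_role_mapping(role_arn, identifier_type, identifiers):
    """Return one role mapping entry: requests matching any identifier use role_arn."""
    if identifier_type not in VALID_IDENTIFIER_TYPES:
        raise ValueError(f"unsupported identifier type: {identifier_type}")
    return {
        "Role": role_arn,
        "IdentifierType": identifier_type,
        "Identifiers": list(identifiers),
    }


def build_security_configuration(mappings):
    """Wrap role mappings in the (assumed) security configuration structure."""
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {"RoleMappings": list(mappings)}
        }
    }


# Hypothetical account ID, role names, user, and bucket; replace with your own.
security_configuration = build_security_configuration([
    emrfs_role_mapping(
        "arn:aws:iam::123456789012:role/example-analyst-role",
        "User",
        ["analyst1"],
    ),
    emrfs_role_mapping(
        "arn:aws:iam::123456789012:role/example-logs-role",
        "Prefix",
        ["s3://amzn-s3-demo-bucket/logs/"],
    ),
])

security_configuration_json = json.dumps(security_configuration, indent=2)
print(security_configuration_json)
```

The printed JSON could then be saved to a file and supplied when you create the security configuration, for example with the aws emr create-security-configuration command. Requests that match none of the mappings would still fall back to the EMR role for EC2.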

Alternatively, if the needs of your Amazon EMR solution go beyond what IAM roles provide for EMRFS, you can define a custom credential provider class that allows you to customize access to EMRFS data in Amazon S3.

Create a custom credential provider for EMRFS data in Amazon S3

To create a custom credential provider, implement both the AWSCredentialsProvider interface and the Hadoop Configurable interface in your class.

For a detailed description of this approach, see Securely Analyze Data from Another AWS Account Using EMRFS on the AWS Big Data Blog. The blog post includes a tutorial that walks you through the entire process, from creating IAM roles to starting the cluster. It also provides sample Java code that implements the custom credential provider class.

The basic steps are as follows:

How to define a custom credential provider

  1. Create a custom credential provider class and compile it into a JAR file.

  2. Run a script as a bootstrap action to copy the JAR file that contains the custom credential provider to the master node of the cluster. For more information about bootstrap actions, see (Optional) Create Bootstrap Actions to Install Additional Software.

  3. Customize the classification to specify the class that you implemented in the JAR file. For more information about specifying configuration objects to customize applications, see Configuring Applications in the Amazon EMR Release Guide.

    The following example shows a command that starts a Hive cluster with common configuration parameters and that also includes:

    • A bootstrap action that runs the script, which is located in Amazon S3.

    • A classification that specifies a class defined in the JAR file as the custom credential provider.

    Linux line-continuation characters (\) are included for readability. In Linux commands, you can keep them or remove them. On Windows, remove them or replace them with a caret (^).
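The pieces described in step 3 can also be sketched in Python, which builds the command as an argument list and so avoids platform-specific line-continuation characters. This is a minimal sketch under stated assumptions, not the guide's own example: we assume the classification is emrfs-site with the property fs.s3.customAWSCredentialsProvider, and the bucket, script name, class name, cluster name, release label, and instance settings are hypothetical placeholders.

```python
import json
import shlex

# Hypothetical values; replace with your own script location and class name.
BOOTSTRAP_SCRIPT = "s3://amzn-s3-demo-bucket/copy_jar_file.sh"
PROVIDER_CLASS = "MyCustomCredentialsProvider"


def emrfs_site_classification(provider_class):
    """Configuration object naming the custom credential provider class (assumed keys)."""
    return [{
        "Classification": "emrfs-site",
        "Properties": {"fs.s3.customAWSCredentialsProvider": provider_class},
    }]


def create_cluster_command(bootstrap_script, provider_class):
    """Assemble an 'aws emr create-cluster' argument list for a Hive cluster."""
    return [
        "aws", "emr", "create-cluster",
        "--name", "custom-credentials-provider-demo",  # placeholder name
        "--release-label", "emr-5.36.0",               # placeholder release
        "--applications", "Name=Hive",
        "--use-default-roles",
        "--instance-type", "m5.xlarge",
        "--instance-count", "3",
        # Bootstrap action that runs the JAR-copying script stored in Amazon S3.
        "--bootstrap-actions", f"Path={bootstrap_script},Name=CopyProviderJar",
        # Classification that points EMRFS at the custom provider class.
        "--configurations", json.dumps(emrfs_site_classification(provider_class)),
    ]


command = create_cluster_command(BOOTSTRAP_SCRIPT, PROVIDER_CLASS)

# Shell-quoted form, useful for logging or pasting into a terminal.
printable_command = shlex.join(command)
print(printable_command)
```

To launch the cluster from the same script, the list could be passed to subprocess.run(command, check=True) on a machine where the AWS CLI is installed and configured. Because the arguments are passed as a list, the JSON in --configurations needs no extra shell quoting.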