Class S3FileIOProperties

  • All Implemented Interfaces:
    java.io.Serializable

    public class S3FileIOProperties
    extends java.lang.Object
    implements java.io.Serializable
    See Also:
    Serialized Form
    • Field Detail

      • CLIENT_FACTORY

        public static final java.lang.String CLIENT_FACTORY
        This property is used to pass in the AWS client factory implementation class for S3 FileIO. The class should implement S3FileIOAwsClientFactory; for example, DefaultS3FileIOAwsClientFactory does. If this property is not set, one of the AwsClientFactory factory classes is loaded instead, for backward compatibility.
        See Also:
        Constant Field Values
      • SSE_TYPE

        public static final java.lang.String SSE_TYPE
        Type of S3 server-side encryption to use; defaults to SSE_TYPE_NONE.

        For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html

        See Also:
        Constant Field Values
      • SSE_TYPE_NONE

        public static final java.lang.String SSE_TYPE_NONE
        No server side encryption.
        See Also:
        Constant Field Values
      • SSE_TYPE_KMS

        public static final java.lang.String SSE_TYPE_KMS
        S3 SSE-KMS encryption.

        For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html

        See Also:
        Constant Field Values
      • SSE_TYPE_S3

        public static final java.lang.String SSE_TYPE_S3
        S3 SSE-S3 encryption.

        For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html

        See Also:
        Constant Field Values
      • SSE_TYPE_CUSTOM

        public static final java.lang.String SSE_TYPE_CUSTOM
        S3 SSE-C encryption.

        For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html

        See Also:
        Constant Field Values
      • SSE_KEY

        public static final java.lang.String SSE_KEY
        If the S3 encryption type is SSE-KMS, the input is a KMS key ID or ARN; if this property is not set, the default key "aws/s3" is used. If the encryption type is SSE-C, the input is a custom base-64 AES256 symmetric key.
        See Also:
        Constant Field Values
      • SSE_MD5

        public static final java.lang.String SSE_MD5
        If the S3 encryption type is SSE-C, the input is the base-64 MD5 digest of the secret key. The caller must pass this MD5 explicitly to ensure key integrity.
        See Also:
        Constant Field Values
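
        The SSE_MD5 value can be derived from the SSE_KEY value. The following is a minimal sketch, assuming the digest is taken over the raw (base-64 decoded) key bytes, as S3 expects for SSE-C. SseCustomerKeyDigest and md5Base64 are hypothetical names and not part of Iceberg.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical helper, not part of Iceberg. */
final class SseCustomerKeyDigest {
  private SseCustomerKeyDigest() {}

  /** Returns the base-64 MD5 digest of the raw bytes behind a base-64 encoded key. */
  static String md5Base64(String base64Key) {
    // Assumption: the digest covers the decoded key bytes, not the base-64 text.
    byte[] rawKey = Base64.getDecoder().decode(base64Key);
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(rawKey);
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on standard JVMs, so this is not expected.
      throw new IllegalStateException("MD5 is not available on this JVM", e);
    }
  }
}
```

        The returned string would be supplied as the SSE_MD5 property alongside the key passed as SSE_KEY.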
      • MULTIPART_UPLOAD_THREADS

        public static final java.lang.String MULTIPART_UPLOAD_THREADS
        Number of threads to use for uploading parts to S3 (a pool shared across all output streams); defaults to Runtime.getRuntime().availableProcessors().
        See Also:
        Constant Field Values
      • MULTIPART_SIZE

        public static final java.lang.String MULTIPART_SIZE
        The size of a single part for multipart upload requests, in bytes (default: 32MB). S3 requires the part size to be at least 5MB. To ensure performance of the reader and writer, the part size must be less than 2GB.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html

        See Also:
        Constant Field Values
      • MULTIPART_THRESHOLD_FACTOR

        public static final java.lang.String MULTIPART_THRESHOLD_FACTOR
        The threshold, expressed as a factor of the multipart size, at which uploading switches from a single put object request to a multipart upload (default: 1.5).
        See Also:
        Constant Field Values
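
        The arithmetic behind the threshold can be sketched as follows. This is an illustration only: MultipartThreshold is a hypothetical class, and it assumes the switch to multipart happens once the number of bytes written reaches the threshold.

```java
/** Hypothetical illustration of the multipart threshold arithmetic, not Iceberg code. */
final class MultipartThreshold {
  // Part size limits described for MULTIPART_SIZE: at least 5MB, less than 2GB.
  static final long MIN_PART_SIZE = 5L * 1024 * 1024;
  static final long MAX_PART_SIZE = Integer.MAX_VALUE;

  private MultipartThreshold() {}

  /** Threshold in bytes: part size multiplied by the threshold factor. */
  static long thresholdBytes(long partSize, double factor) {
    if (partSize < MIN_PART_SIZE || partSize > MAX_PART_SIZE) {
      throw new IllegalArgumentException("Part size must be at least 5MB and less than 2GB");
    }
    return (long) (partSize * factor);
  }

  /** Assumption: multipart upload is used once the written size reaches the threshold. */
  static boolean usesMultipart(long bytesWritten, long partSize, double factor) {
    return bytesWritten >= thresholdBytes(partSize, factor);
  }
}
```

        With the default 32MB part size and a factor of 1.5, the threshold is 48MB, so under this sketch a 40MB object is sent as a single put object request and a 48MB object uses multipart upload.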
      • MULTIPART_THRESHOLD_FACTOR_DEFAULT

        public static final double MULTIPART_THRESHOLD_FACTOR_DEFAULT
        See Also:
        Constant Field Values
      • STAGING_DIRECTORY

        public static final java.lang.String STAGING_DIRECTORY
        Location for staging files before they are uploaded to S3; defaults to the temp directory set in java.io.tmpdir.
        See Also:
        Constant Field Values
      • ACL

        public static final java.lang.String ACL
        Used to configure the canned access control list (ACL) that the S3 client applies during writes. If not set, no ACL is set on requests.

        The input must be one of the ObjectCannedACL values, such as 'public-read-write'. For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html

        See Also:
        Constant Field Values
      • ENDPOINT

        public static final java.lang.String ENDPOINT
        Configure an alternative endpoint of the S3 service for S3FileIO to access.

        This can be used to run S3FileIO against any S3-compatible object storage service that has a different endpoint, or to access a private S3 endpoint in a virtual private cloud.

        See Also:
        Constant Field Values
      • PATH_STYLE_ACCESS

        public static final java.lang.String PATH_STYLE_ACCESS
        If set to true, S3FileIO requests use path-style access; otherwise, virtual-hosted-style access is used.

        For more details: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html

        See Also:
        Constant Field Values
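
        The difference between the two addressing styles is where the bucket name appears in the request URL. The sketch below illustrates this for standard regional AWS endpoints; S3AddressingStyle is a hypothetical class, and the code skips URL encoding and custom endpoints.

```java
/** Hypothetical illustration of S3 addressing styles, not Iceberg code. */
final class S3AddressingStyle {
  private S3AddressingStyle() {}

  /** Builds an object URL for a standard regional endpoint (no URL encoding applied). */
  static String objectUrl(String bucket, String key, String region, boolean pathStyle) {
    String host = "s3." + region + ".amazonaws.com";
    return pathStyle
        // Path-style: the bucket is the first path segment.
        ? "https://" + host + "/" + bucket + "/" + key
        // Virtual-hosted-style: the bucket is part of the host name.
        : "https://" + bucket + "." + host + "/" + key;
  }
}
```

        Path-style access is often what S3-compatible storage services behind a custom ENDPOINT expect, since they may not resolve per-bucket host names.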
      • PATH_STYLE_ACCESS_DEFAULT

        public static final boolean PATH_STYLE_ACCESS_DEFAULT
        See Also:
        Constant Field Values
      • ACCESS_KEY_ID

        public static final java.lang.String ACCESS_KEY_ID
        Configure the static access key ID used to access S3FileIO.

        When set, the default client factory uses the provided basic or session credentials instead of reading the default credential chain to create S3 access credentials. If SESSION_TOKEN is set, session credentials are used; otherwise, basic credentials are used.

        See Also:
        Constant Field Values
      • SECRET_ACCESS_KEY

        public static final java.lang.String SECRET_ACCESS_KEY
        Configure the static secret access key used to access S3FileIO.

        When set, the default client factory uses the provided basic or session credentials instead of reading the default credential chain to create S3 access credentials. If SESSION_TOKEN is set, session credentials are used; otherwise, basic credentials are used.

        See Also:
        Constant Field Values
      • SESSION_TOKEN

        public static final java.lang.String SESSION_TOKEN
        Configure the static session token used to access S3FileIO.

        When set, the default client factory will use the session credentials provided instead of reading the default credential chain to create S3 access credentials.

        See Also:
        Constant Field Values
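
        The choice between the default credential chain, basic credentials, and session credentials described for ACCESS_KEY_ID, SECRET_ACCESS_KEY, and SESSION_TOKEN can be sketched as below. StaticCredentialChoice is a hypothetical class, and the property key strings are assumptions; the actual strings are listed under Constant Field Values.

```java
import java.util.Map;

/** Hypothetical illustration of the credential selection rules, not Iceberg code. */
final class StaticCredentialChoice {
  enum Kind { DEFAULT_CHAIN, BASIC, SESSION }

  // Assumed property keys; check Constant Field Values for the real ones.
  static final String ACCESS_KEY_ID = "s3.access-key-id";
  static final String SECRET_ACCESS_KEY = "s3.secret-access-key";
  static final String SESSION_TOKEN = "s3.session-token";

  private StaticCredentialChoice() {}

  static Kind choose(Map<String, String> properties) {
    String accessKeyId = properties.get(ACCESS_KEY_ID);
    String secretAccessKey = properties.get(SECRET_ACCESS_KEY);
    // Without a complete static key pair, fall back to the default credential chain.
    if (isBlank(accessKeyId) || isBlank(secretAccessKey)) {
      return Kind.DEFAULT_CHAIN;
    }
    // A session token upgrades basic credentials to session credentials.
    return isBlank(properties.get(SESSION_TOKEN)) ? Kind.BASIC : Kind.SESSION;
  }

  private static boolean isBlank(String value) {
    return value == null || value.isEmpty();
  }
}
```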
      • USE_ARN_REGION_ENABLED

        public static final java.lang.String USE_ARN_REGION_ENABLED
        Enable this to let S3FileIO make cross-region calls to the region specified in the ARN of an access point.

        By default, attempting to use an access point in a different region will throw an exception. When enabled, this property allows using access points in other regions.

        For more details see: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3Configuration.html#useArnRegionEnabled--

        See Also:
        Constant Field Values
      • USE_ARN_REGION_ENABLED_DEFAULT

        public static final boolean USE_ARN_REGION_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • CHECKSUM_ENABLED

        public static final java.lang.String CHECKSUM_ENABLED
        Enables eTag checks for S3 PUT and MULTIPART upload requests.
        See Also:
        Constant Field Values
      • CHECKSUM_ENABLED_DEFAULT

        public static final boolean CHECKSUM_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • REMOTE_SIGNING_ENABLED

        public static final java.lang.String REMOTE_SIGNING_ENABLED
        See Also:
        Constant Field Values
      • REMOTE_SIGNING_ENABLED_DEFAULT

        public static final boolean REMOTE_SIGNING_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • DELETE_BATCH_SIZE

        public static final java.lang.String DELETE_BATCH_SIZE
        Configure the batch size used when deleting multiple files from a given S3 bucket.
        See Also:
        Constant Field Values
      • DELETE_BATCH_SIZE_DEFAULT

        public static final int DELETE_BATCH_SIZE_DEFAULT
        Default batch size used when deleting files.

        Refer to https://github.com/apache/hadoop/commit/56dee667707926f3796c7757be1a133a362f05c9 for more details on why this value was chosen.

        See Also:
        Constant Field Values
      • DELETE_BATCH_SIZE_MAX

        public static final int DELETE_BATCH_SIZE_MAX
        Maximum possible batch size for deletion. Currently, at most 1000 keys can be deleted in one batch. For more details: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
        See Also:
        Constant Field Values
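
        How a batch size splits a list of object keys into delete requests can be sketched as follows. This is an illustration with a hypothetical DeleteBatches class; it assumes all keys belong to one bucket and that a batch size outside 1 to 1000 is rejected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of delete batching, not Iceberg code. */
final class DeleteBatches {
  // S3 accepts at most 1000 keys per batch delete request.
  static final int MAX_BATCH_SIZE = 1000;

  private DeleteBatches() {}

  /** Splits keys (assumed to be in one bucket) into consecutive batches. */
  static List<List<String>> partition(List<String> keys, int batchSize) {
    if (batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
      throw new IllegalArgumentException("Batch size must be between 1 and " + MAX_BATCH_SIZE);
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < keys.size(); start += batchSize) {
      int end = Math.min(start + batchSize, keys.size());
      // Copy the sublist so each batch is independent of the input list.
      batches.add(new ArrayList<>(keys.subList(start, end)));
    }
    return batches;
  }
}
```

        For 2500 keys and a batch size of 1000, this yields three batches of 1000, 1000, and 500 keys.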
      • WRITE_TAGS_PREFIX

        public static final java.lang.String WRITE_TAGS_PREFIX
        Used by S3FileIO to tag objects when writing. To set it, pass a catalog property with this prefix.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html

        Example: s3.write.tags.my_key=my_val

        See Also:
        Constant Field Values
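
        A prefixed property such as the one in the example above maps to a tag whose key is the part after the prefix. The sketch below shows one way to extract such tags from a property map; PrefixedTags is a hypothetical class, and the prefix string is taken from the example above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of prefix-based tag properties, not Iceberg code. */
final class PrefixedTags {
  private PrefixedTags() {}

  /** Returns tag key/value pairs for every property that starts with the prefix. */
  static Map<String, String> withPrefix(Map<String, String> properties, String prefix) {
    Map<String, String> tags = new TreeMap<>();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        // The tag key is whatever follows the prefix.
        tags.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return tags;
  }
}
```

        Calling withPrefix with the prefix "s3.write.tags." on a map containing s3.write.tags.my_key=my_val returns a single tag, my_key=my_val; properties without the prefix are ignored.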
      • WRITE_TABLE_TAG_ENABLED

        public static final java.lang.String WRITE_TABLE_TAG_ENABLED
        Used by GlueCatalog to tag objects when writing. To set it, pass a catalog property.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html

        Example: s3.write.table-tag-enabled=true

        See Also:
        Constant Field Values
      • WRITE_TABLE_TAG_ENABLED_DEFAULT

        public static final boolean WRITE_TABLE_TAG_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • WRITE_STORAGE_CLASS

        public static final java.lang.String WRITE_STORAGE_CLASS
        Used by S3FileIO to set the storage class of objects when writing. To set it, pass a catalog property. Once set, the x-amz-storage-class header is set to this property's value.

        For more details, see https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/storage-class-intro.html

        Example: s3.write.storage-class=INTELLIGENT_TIERING

        See Also:
        Constant Field Values
      • WRITE_NAMESPACE_TAG_ENABLED

        public static final java.lang.String WRITE_NAMESPACE_TAG_ENABLED
        Used by GlueCatalog to tag objects when writing. To set it, pass a catalog property.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html

        Example: s3.write.namespace-tag-enabled=true

        See Also:
        Constant Field Values
      • WRITE_NAMESPACE_TAG_ENABLED_DEFAULT

        public static final boolean WRITE_NAMESPACE_TAG_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • DELETE_TAGS_PREFIX

        public static final java.lang.String DELETE_TAGS_PREFIX
        Used by S3FileIO to tag objects when deleting. When this config is set, objects are tagged with the configured key-value pairs before deletion. This is considered a soft delete, because users can configure a tag-based object lifecycle policy at the bucket level to transition objects to different tiers.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html

        Example: s3.delete.tags.my_key=my_val

        See Also:
        Constant Field Values
      • DELETE_THREADS

        public static final java.lang.String DELETE_THREADS
        Number of threads to use for adding delete tags to S3 objects; defaults to Runtime.getRuntime().availableProcessors().
        See Also:
        Constant Field Values
      • DELETE_ENABLED

        public static final java.lang.String DELETE_ENABLED
        Determines whether S3FileIO deletes the object when io.delete() is called; defaults to true. Once disabled, users are expected to set tags through DELETE_TAGS_PREFIX and manage deleted files through an S3 lifecycle policy.
        See Also:
        Constant Field Values
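
        The interaction between DELETE_TAGS_PREFIX and DELETE_ENABLED can be sketched as a small decision function. This is an assumption about the order of operations based on the descriptions above (tag first, then delete only if deletion is enabled); DeleteFlow is a hypothetical class, not Iceberg code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of soft-delete versus hard-delete behavior. */
final class DeleteFlow {
  private DeleteFlow() {}

  /** Lists the steps a delete call would take under the assumed rules. */
  static List<String> actions(Map<String, String> deleteTags, boolean deleteEnabled) {
    List<String> actions = new ArrayList<>();
    if (!deleteTags.isEmpty()) {
      // Objects are tagged with the configured key-value pairs before deletion.
      actions.add("TAG");
    }
    if (deleteEnabled) {
      // With deletion disabled, removal is left to an S3 lifecycle policy.
      actions.add("DELETE");
    }
    return actions;
  }
}
```

        A soft-delete setup would therefore combine at least one delete tag with deletion disabled, which yields only the TAG step.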
      • DELETE_ENABLED_DEFAULT

        public static final boolean DELETE_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • ACCELERATION_ENABLED

        public static final java.lang.String ACCELERATION_ENABLED
        Determines whether the S3 client uses acceleration mode; defaults to false.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html

        See Also:
        Constant Field Values
      • ACCELERATION_ENABLED_DEFAULT

        public static final boolean ACCELERATION_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • DUALSTACK_ENABLED

        public static final java.lang.String DUALSTACK_ENABLED
        Determines whether the S3 client uses dual-stack mode; defaults to false.

        For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/dual-stack-endpoints.html

        See Also:
        Constant Field Values
      • DUALSTACK_ENABLED_DEFAULT

        public static final boolean DUALSTACK_ENABLED_DEFAULT
        See Also:
        Constant Field Values
      • ACCESS_POINTS_PREFIX

        public static final java.lang.String ACCESS_POINTS_PREFIX
        Prefix used by S3FileIO for bucket access point configuration. To set it, pass a catalog property with this prefix followed by the bucket name.

        For more details, see https://aws.amazon.com/s3/features/access-points/

        Example: s3.access-points.my-bucket=access-point

        See Also:
        Constant Field Values
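
        The bucket-to-access-point lookup implied by the example above can be sketched as follows. AccessPointLookup is a hypothetical class; the prefix string comes from the example above, and falling back to the bucket name when no mapping exists is an assumption.

```java
import java.util.Map;

/** Hypothetical illustration of bucket-to-access-point resolution, not Iceberg code. */
final class AccessPointLookup {
  // Prefix as shown in the example property.
  static final String PREFIX = "s3.access-points.";

  private AccessPointLookup() {}

  /** Returns the configured access point for the bucket, or the bucket itself if none is set. */
  static String resolve(Map<String, String> properties, String bucket) {
    return properties.getOrDefault(PREFIX + bucket, bucket);
  }
}
```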
      • PRELOAD_CLIENT_ENABLED

        public static final java.lang.String PRELOAD_CLIENT_ENABLED
        This flag controls whether the S3 client is initialized during S3FileIO initialization, instead of the default lazy initialization on first use. This is needed when the credentials to use might change and therefore need to be preloaded.
        See Also:
        Constant Field Values
      • PRELOAD_CLIENT_ENABLED_DEFAULT

        public static final boolean PRELOAD_CLIENT_ENABLED_DEFAULT
        See Also:
        Constant Field Values
    • Constructor Detail

      • S3FileIOProperties

        public S3FileIOProperties()
      • S3FileIOProperties

        public S3FileIOProperties(java.util.Map<java.lang.String,java.lang.String> properties)
    • Method Detail

      • sseType

        public java.lang.String sseType()
      • setSseType

        public void setSseType(java.lang.String sseType)
      • sseKey

        public java.lang.String sseKey()
      • setSseKey

        public void setSseKey(java.lang.String sseKey)
      • deleteBatchSize

        public int deleteBatchSize()
      • setDeleteBatchSize

        public void setDeleteBatchSize(int deleteBatchSize)
      • sseMd5

        public java.lang.String sseMd5()
      • setSseMd5

        public void setSseMd5(java.lang.String sseMd5)
      • multipartUploadThreads

        public int multipartUploadThreads()
      • setMultipartUploadThreads

        public void setMultipartUploadThreads(int threads)
      • multiPartSize

        public int multiPartSize()
      • setMultiPartSize

        public void setMultiPartSize(int size)
      • multipartThresholdFactor

        public double multipartThresholdFactor()
      • setMultipartThresholdFactor

        public void setMultipartThresholdFactor(double factor)
      • stagingDirectory

        public java.lang.String stagingDirectory()
      • setStagingDirectory

        public void setStagingDirectory(java.lang.String directory)
      • acl

        public software.amazon.awssdk.services.s3.model.ObjectCannedACL acl()
      • setAcl

        public void setAcl(software.amazon.awssdk.services.s3.model.ObjectCannedACL acl)
      • isPreloadClientEnabled

        public boolean isPreloadClientEnabled()
      • setPreloadClientEnabled

        public void setPreloadClientEnabled(boolean preloadClientEnabled)
      • isDualStackEnabled

        public boolean isDualStackEnabled()
      • isPathStyleAccess

        public boolean isPathStyleAccess()
      • isUseArnRegionEnabled

        public boolean isUseArnRegionEnabled()
      • isAccelerationEnabled

        public boolean isAccelerationEnabled()
      • isChecksumEnabled

        public boolean isChecksumEnabled()
      • isRemoteSigningEnabled

        public boolean isRemoteSigningEnabled()
      • endpoint

        public java.lang.String endpoint()
      • setChecksumEnabled

        public void setChecksumEnabled(boolean eTagCheckEnabled)
      • writeTags

        public java.util.Set<software.amazon.awssdk.services.s3.model.Tag> writeTags()
      • writeTableTagEnabled

        public boolean writeTableTagEnabled()
      • setWriteTableTagEnabled

        public void setWriteTableTagEnabled(boolean s3WriteTableNameTagEnabled)
      • isWriteNamespaceTagEnabled

        public boolean isWriteNamespaceTagEnabled()
      • setWriteNamespaceTagEnabled

        public void setWriteNamespaceTagEnabled(boolean writeNamespaceTagEnabled)
      • deleteTags

        public java.util.Set<software.amazon.awssdk.services.s3.model.Tag> deleteTags()
      • deleteThreads

        public int deleteThreads()
      • setDeleteThreads

        public void setDeleteThreads(int threads)
      • isDeleteEnabled

        public boolean isDeleteEnabled()
      • setDeleteEnabled

        public void setDeleteEnabled(boolean deleteEnabled)
      • bucketToAccessPointMapping

        public java.util.Map<java.lang.String,java.lang.String> bucketToAccessPointMapping()
      • accessKeyId

        public java.lang.String accessKeyId()
      • secretAccessKey

        public java.lang.String secretAccessKey()
      • sessionToken

        public java.lang.String sessionToken()
      • writeStorageClass

        public java.lang.String writeStorageClass()
      • applyCredentialConfigurations

        public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyCredentialConfigurations(AwsClientProperties awsClientProperties, T builder)
      • applyServiceConfigurations

        public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyServiceConfigurations(T builder)
        Configure service settings for an S3 client. The settings include: s3DualStack, s3UseArnRegion, s3PathStyleAccess, and s3Acceleration.

        Sample usage:

             S3Client.builder().applyMutation(s3FileIOProperties::applyServiceConfigurations)
         
      • applySignerConfiguration

        public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applySignerConfiguration(T builder)
        Configure a signer for an S3 client.

        Sample usage:

             S3Client.builder().applyMutation(s3FileIOProperties::applySignerConfiguration)
         
      • applyEndpointConfigurations

        public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyEndpointConfigurations(T builder)
        Override the endpoint for an S3 client.

        Sample usage:

             S3Client.builder().applyMutation(s3FileIOProperties::applyEndpointConfigurations)