Class S3FileIOProperties

java.lang.Object
org.apache.iceberg.aws.s3.S3FileIOProperties
All Implemented Interfaces:
Serializable

public class S3FileIOProperties extends Object implements Serializable
  • Field Details

    • CLIENT_FACTORY

      public static final String CLIENT_FACTORY
      Specifies the AWS client factory implementation class for S3 FileIO. The class should implement S3FileIOAwsClientFactory; for example, DefaultS3FileIOAwsClientFactory implements S3FileIOAwsClientFactory. If this property is not set, one of the AwsClientFactory factory classes is loaded to provide backward compatibility.
    • S3_ACCESS_GRANTS_ENABLED

      public static final String S3_ACCESS_GRANTS_ENABLED
      Enables the S3 Access Grants product to control authorization to S3 data. More information about this feature can be found at: https://aws.amazon.com/s3/features/access-grants/.
    • S3_ACCESS_GRANTS_ENABLED_DEFAULT

      public static final boolean S3_ACCESS_GRANTS_ENABLED_DEFAULT
    • S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED

      public static final String S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED
      The fallback-to-iam property lets users choose whether their jobs fall back to the Job Execution IAM role when an S3 Access Grants call returns Access Denied. Further documentation on this flag can be found in the S3 Access Grants Plugin GitHub repository.

      For more details, see: https://github.com/aws/aws-s3-accessgrants-plugin-java-v2

    • S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED_DEFAULT

      public static final boolean S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED_DEFAULT
    • SSE_TYPE

      public static final String SSE_TYPE
      Type of S3 server-side encryption to use. Defaults to SSE_TYPE_NONE.

      For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html

    • SSE_TYPE_NONE

      public static final String SSE_TYPE_NONE
      No server side encryption.
    • SSE_TYPE_KMS

      public static final String SSE_TYPE_KMS
      S3 SSE-KMS encryption.

      For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html

    • DSSE_TYPE_KMS

      public static final String DSSE_TYPE_KMS
      S3 DSSE-KMS encryption.

      For more details: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingDSSEncryption.html

    • SSE_TYPE_S3

      public static final String SSE_TYPE_S3
      S3 SSE-S3 encryption.

      For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html

    • SSE_TYPE_CUSTOM

      public static final String SSE_TYPE_CUSTOM
      S3 SSE-C encryption.

      For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html

    • SSE_KEY

      public static final String SSE_KEY
      If the S3 encryption type is SSE-KMS or DSSE-KMS, the input is a KMS key ID or ARN. If this property is not set, the default key "aws/s3" is used. If the encryption type is SSE-C, the input is a custom Base64-encoded AES-256 symmetric key.
    • SSE_MD5

      public static final String SSE_MD5
      If the S3 encryption type is SSE-C, the input is the Base64-encoded MD5 digest of the secret key. The caller must pass this MD5 explicitly to ensure key integrity.
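
      A minimal sketch of how a caller might derive the SSE_MD5 value from an SSE_KEY value using only the JDK. It assumes the digest is computed over the decoded (raw) key bytes, as the S3 SSE-C headers expect. The class name, the method name, and the all-zero demo key are hypothetical and not part of Iceberg.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class SseCustomKeyDigest {

  /** Returns the Base64-encoded MD5 digest of the decoded key bytes. */
  public static String md5OfKey(String base64Key) {
    byte[] rawKey = Base64.getDecoder().decode(base64Key);
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(rawKey);
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform, so this is unexpected.
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical 256-bit key of all zeros, for demonstration only.
    String sseKey = Base64.getEncoder().encodeToString(new byte[32]);
    String sseMd5 = md5OfKey(sseKey);
    System.out.println("SSE key value: " + sseKey);
    System.out.println("SSE MD5 value: " + sseMd5);
  }
}
```

      The two printed strings are what would be supplied as the values of the SSE_KEY and SSE_MD5 properties when the encryption type is SSE-C.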
    • MULTIPART_UPLOAD_THREADS

      public static final String MULTIPART_UPLOAD_THREADS
      Number of threads to use for uploading parts to S3 (shared pool across all output streams). Defaults to Runtime.availableProcessors().
    • MULTIPART_SIZE

      public static final String MULTIPART_SIZE
      The size of a single part for multipart upload requests, in bytes (default: 32MB). Based on S3 requirements, the part size must be at least 5MB. To ensure performance of the reader and writer, the part size must be less than 2GB.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html

    • MULTIPART_SIZE_DEFAULT

      public static final int MULTIPART_SIZE_DEFAULT
    • MULTIPART_SIZE_MIN

      public static final int MULTIPART_SIZE_MIN
    • MULTIPART_THRESHOLD_FACTOR

      public static final String MULTIPART_THRESHOLD_FACTOR
      The threshold, expressed as a factor of the multipart size, at which an upload switches from a single put object request to a multipart upload (default: 1.5).
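
      The arithmetic behind this factor can be sketched as follows. This is an illustration under stated assumptions, not Iceberg's implementation: the class and method names are hypothetical, the constants restate the 5MB minimum, 32MB default, and 1.5 default given in the text, and treating "at the threshold" as inclusive (>=) is an assumption.

```java
class MultipartThresholdSketch {
  // Values restated from the text: 5MB S3 minimum part size, 32MB default, factor 1.5.
  static final int PART_SIZE_MIN_BYTES = 5 * 1024 * 1024;
  static final int PART_SIZE_DEFAULT_BYTES = 32 * 1024 * 1024;
  static final double THRESHOLD_FACTOR_DEFAULT = 1.5;

  /** Threshold in bytes = part size times factor. */
  public static long thresholdBytes(int partSizeBytes, double factor) {
    if (partSizeBytes < PART_SIZE_MIN_BYTES) {
      throw new IllegalArgumentException("part size must be at least 5MB");
    }
    return (long) (partSizeBytes * factor);
  }

  /** Assumption: multipart upload starts once the written size reaches the threshold. */
  public static boolean useMultipart(long bytesWritten, int partSizeBytes, double factor) {
    return bytesWritten >= thresholdBytes(partSizeBytes, factor);
  }

  public static void main(String[] args) {
    long threshold = thresholdBytes(PART_SIZE_DEFAULT_BYTES, THRESHOLD_FACTOR_DEFAULT);
    System.out.println("threshold in bytes: " + threshold);
    System.out.println("10MB object uses multipart: "
        + useMultipart(10L * 1024 * 1024, PART_SIZE_DEFAULT_BYTES, THRESHOLD_FACTOR_DEFAULT));
    System.out.println("64MB object uses multipart: "
        + useMultipart(64L * 1024 * 1024, PART_SIZE_DEFAULT_BYTES, THRESHOLD_FACTOR_DEFAULT));
  }
}
```

      With the defaults, the threshold works out to 48MB (50331648 bytes), so objects smaller than that would go through a single put object request.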
    • MULTIPART_THRESHOLD_FACTOR_DEFAULT

      public static final double MULTIPART_THRESHOLD_FACTOR_DEFAULT
    • STAGING_DIRECTORY

      public static final String STAGING_DIRECTORY
      Location to put staging files for upload to S3. Defaults to the temp directory set in java.io.tmpdir.
    • ACL

      public static final String ACL
      Configures the canned access control list (ACL) that the S3 client uses during writes. If not set, no ACL is set on requests.

      The input must be one of the ObjectCannedACL values, such as 'public-read-write'. For more details: https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html

    • ENDPOINT

      public static final String ENDPOINT
      Configure an alternative endpoint of the S3 service for S3FileIO to access.

      This can be used to run S3FileIO against any S3-compatible object storage service that has a different endpoint, or to access a private S3 endpoint in a virtual private cloud.

    • PATH_STYLE_ACCESS

      public static final String PATH_STYLE_ACCESS
      If set to true, requests from S3FileIO use path-style access; otherwise, virtual-hosted-style access is used.

      For more details: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html

    • PATH_STYLE_ACCESS_DEFAULT

      public static final boolean PATH_STYLE_ACCESS_DEFAULT
    • ACCESS_KEY_ID

      public static final String ACCESS_KEY_ID
      Configure the static access key ID used to access S3FileIO.

      When set, the default client factory will use the basic or session credentials provided instead of reading the default credential chain to create S3 access credentials. If SESSION_TOKEN is set, session credentials are used; otherwise, basic credentials are used.

    • SECRET_ACCESS_KEY

      public static final String SECRET_ACCESS_KEY
      Configure the static secret access key used to access S3FileIO.

      When set, the default client factory will use the basic or session credentials provided instead of reading the default credential chain to create S3 access credentials. If SESSION_TOKEN is set, session credentials are used; otherwise, basic credentials are used.

    • SESSION_TOKEN

      public static final String SESSION_TOKEN
      Configure the static session token used to access S3FileIO.

      When set, the default client factory will use the session credentials provided instead of reading the default credential chain to create S3 access credentials.

    • USE_ARN_REGION_ENABLED

      public static final String USE_ARN_REGION_ENABLED
      Enables S3FileIO to make cross-region calls to the region specified in the ARN of an access point.

      By default, attempting to use an access point in a different region will throw an exception. When enabled, this property allows using access points in other regions.

      For more details see: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3Configuration.html#useArnRegionEnabled--

    • USE_ARN_REGION_ENABLED_DEFAULT

      public static final boolean USE_ARN_REGION_ENABLED_DEFAULT
    • CHECKSUM_ENABLED

      public static final String CHECKSUM_ENABLED
      Enables eTag checks for S3 PUT and MULTIPART upload requests.
    • CHECKSUM_ENABLED_DEFAULT

      public static final boolean CHECKSUM_ENABLED_DEFAULT
    • REMOTE_SIGNING_ENABLED

      public static final String REMOTE_SIGNING_ENABLED
    • REMOTE_SIGNING_ENABLED_DEFAULT

      public static final boolean REMOTE_SIGNING_ENABLED_DEFAULT
    • DELETE_BATCH_SIZE

      public static final String DELETE_BATCH_SIZE
      Configure the batch size used when deleting multiple files from a given S3 bucket.
    • DELETE_BATCH_SIZE_DEFAULT

      public static final int DELETE_BATCH_SIZE_DEFAULT
      Default batch size used when deleting files.

      Refer to https://github.com/apache/hadoop/commit/56dee667707926f3796c7757be1a133a362f05c9 for more details on why this value was chosen.

    • DELETE_BATCH_SIZE_MAX

      public static final int DELETE_BATCH_SIZE_MAX
      Maximum possible batch size for deletion. Currently, at most 1000 keys can be deleted in one batch. For more details, see https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
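
      To illustrate what the 1000-key cap means for callers, here is a sketch that splits a list of object keys into batches no larger than a configured size. The class and method names are hypothetical, and this is not how S3FileIO is confirmed to batch internally. Only the 1000-key limit comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteBatchSketch {
  // Limit restated from the text: at most 1000 keys per delete batch.
  static final int DELETE_BATCH_SIZE_MAX = 1000;

  /** Splits keys into consecutive batches of at most batchSize entries. */
  public static List<List<String>> partition(List<String> keys, int batchSize) {
    if (batchSize < 1 || batchSize > DELETE_BATCH_SIZE_MAX) {
      throw new IllegalArgumentException("batch size must be between 1 and 1000");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < keys.size(); start += batchSize) {
      int end = Math.min(start + batchSize, keys.size());
      // Copy the sublist so each batch is independent of the input list.
      batches.add(new ArrayList<>(keys.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      keys.add("data/file-" + i + ".parquet"); // hypothetical object keys
    }
    for (List<String> batch : partition(keys, DELETE_BATCH_SIZE_MAX)) {
      System.out.println("batch with " + batch.size() + " keys");
    }
  }
}
```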
    • WRITE_TAGS_PREFIX

      public static final String WRITE_TAGS_PREFIX
      Used by S3FileIO to tag objects when writing. To set it, pass a catalog property.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html

      Example: s3.write.tags.my_key=my_val
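
      As a rough illustration of how prefixed catalog properties could be turned into tag key-value pairs, the sketch below strips the prefix from matching keys. The prefix string "s3.write.tags." is inferred from the example above, and the class and method names are hypothetical; the real parsing in S3FileIOProperties may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class WriteTagsSketch {

  /** Collects entries whose key starts with prefix, keyed by the remainder of the key. */
  public static Map<String, String> tagsWithPrefix(Map<String, String> properties, String prefix) {
    Map<String, String> tags = new TreeMap<>();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      String key = entry.getKey();
      // Skip keys that do not match or that have no tag name after the prefix.
      if (key.startsWith(prefix) && key.length() > prefix.length()) {
        tags.put(key.substring(prefix.length()), entry.getValue());
      }
    }
    return tags;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProperties = new HashMap<>();
    catalogProperties.put("s3.write.tags.my_key", "my_val");
    catalogProperties.put("s3.write.storage-class", "INTELLIGENT_TIERING");
    // Assumed prefix, taken from the example property name in the text.
    System.out.println(tagsWithPrefix(catalogProperties, "s3.write.tags."));
  }
}
```

      The same approach would apply to any other prefix-based property, with only the prefix string changing.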

    • WRITE_TABLE_TAG_ENABLED

      public static final String WRITE_TABLE_TAG_ENABLED
      Used by GlueCatalog to tag objects when writing. To set it, pass a catalog property.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html

      Example: s3.write.table-tag-enabled=true

    • WRITE_TABLE_TAG_ENABLED_DEFAULT

      public static final boolean WRITE_TABLE_TAG_ENABLED_DEFAULT
    • WRITE_STORAGE_CLASS

      public static final String WRITE_STORAGE_CLASS
      Used by S3FileIO to set the storage class of objects when writing. To set it, pass a catalog property. Once set, the x-amz-storage-class header is set to this property's value.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html

      Example: s3.write.storage-class=INTELLIGENT_TIERING

    • WRITE_NAMESPACE_TAG_ENABLED

      public static final String WRITE_NAMESPACE_TAG_ENABLED
      Used by GlueCatalog to tag objects when writing. To set it, pass a catalog property.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html

      Example: s3.write.namespace-tag-enabled=true

    • WRITE_NAMESPACE_TAG_ENABLED_DEFAULT

      public static final boolean WRITE_NAMESPACE_TAG_ENABLED_DEFAULT
    • S3_TAG_ICEBERG_TABLE

      public static final String S3_TAG_ICEBERG_TABLE
      Tag name that will be used by WRITE_TAGS_PREFIX when WRITE_TABLE_TAG_ENABLED is enabled.

      Example: iceberg.table=tableName

    • S3_TAG_ICEBERG_NAMESPACE

      public static final String S3_TAG_ICEBERG_NAMESPACE
      Tag name that will be used by WRITE_TAGS_PREFIX when WRITE_NAMESPACE_TAG_ENABLED is enabled.

      Example: iceberg.namespace=namespaceName

    • DELETE_TAGS_PREFIX

      public static final String DELETE_TAGS_PREFIX
      Used by S3FileIO to tag objects when deleting. When this config is set, objects are tagged with the configured key-value pairs before deletion. This is considered a soft-delete, because users are able to configure tag-based object lifecycle policy at bucket level to transition objects to different tiers.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html

      Example: s3.delete.tags.my_key=my_val

    • DELETE_THREADS

      public static final String DELETE_THREADS
      Number of threads to use for adding delete tags to S3 objects. Defaults to Runtime.availableProcessors().
    • DELETE_ENABLED

      public static final String DELETE_ENABLED
      Determines whether S3FileIO deletes the object when io.delete() is called. Defaults to true. Once disabled, users are expected to set tags through DELETE_TAGS_PREFIX and manage deleted files through an S3 lifecycle policy.
    • DELETE_ENABLED_DEFAULT

      public static final boolean DELETE_ENABLED_DEFAULT
    • ACCELERATION_ENABLED

      public static final String ACCELERATION_ENABLED
      Determines whether the S3 client uses Acceleration Mode. Defaults to false.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html

    • ACCELERATION_ENABLED_DEFAULT

      public static final boolean ACCELERATION_ENABLED_DEFAULT
    • DUALSTACK_ENABLED

      public static final String DUALSTACK_ENABLED
      Determines whether the S3 client uses Dualstack Mode. Defaults to false.

      For more details, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/dual-stack-endpoints.html

    • DUALSTACK_ENABLED_DEFAULT

      public static final boolean DUALSTACK_ENABLED_DEFAULT
    • CROSS_REGION_ACCESS_ENABLED

      public static final String CROSS_REGION_ACCESS_ENABLED
      Determines whether the S3 client allows cross-region bucket access. Defaults to false.

      For more details, see https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/s3-cross-region.html

    • CROSS_REGION_ACCESS_ENABLED_DEFAULT

      public static final boolean CROSS_REGION_ACCESS_ENABLED_DEFAULT
    • ACCESS_POINTS_PREFIX

      public static final String ACCESS_POINTS_PREFIX
      Prefix used by S3FileIO for bucket access point configuration. To set it, pass a catalog property.

      For more details, see https://aws.amazon.com/s3/features/access-points/

      Example: s3.access-points.my-bucket=access-point
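
      A small sketch of how a per-bucket access point property could be looked up, falling back to the bucket name when no mapping is configured. The prefix string "s3.access-points." is inferred from the example above; the class and method names and the fallback behavior are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class AccessPointSketch {
  // Assumed prefix, taken from the example property name in the text.
  static final String ACCESS_POINTS_PREFIX = "s3.access-points.";

  /** Returns the configured access point for the bucket, or the bucket itself if none is set. */
  public static String resolveBucket(Map<String, String> properties, String bucket) {
    return properties.getOrDefault(ACCESS_POINTS_PREFIX + bucket, bucket);
  }

  public static void main(String[] args) {
    Map<String, String> catalogProperties = new HashMap<>();
    catalogProperties.put("s3.access-points.my-bucket", "access-point");
    System.out.println(resolveBucket(catalogProperties, "my-bucket"));
    System.out.println(resolveBucket(catalogProperties, "other-bucket"));
  }
}
```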

    • PRELOAD_CLIENT_ENABLED

      public static final String PRELOAD_CLIENT_ENABLED
      This flag controls whether the S3 client is initialized during S3FileIO initialization, instead of the default lazy initialization upon use. This is needed in cases where the credentials to use might change and need to be preloaded.
    • PRELOAD_CLIENT_ENABLED_DEFAULT

      public static final boolean PRELOAD_CLIENT_ENABLED_DEFAULT
    • S3_RETRY_NUM_RETRIES

      public static final String S3_RETRY_NUM_RETRIES
      Number of times to retry S3 operations.
    • S3_RETRY_NUM_RETRIES_DEFAULT

      public static final int S3_RETRY_NUM_RETRIES_DEFAULT
    • S3_RETRY_MIN_WAIT_MS

      public static final String S3_RETRY_MIN_WAIT_MS
      Minimum wait time, in milliseconds, before retrying an S3 operation.
    • S3_RETRY_MIN_WAIT_MS_DEFAULT

      public static final long S3_RETRY_MIN_WAIT_MS_DEFAULT
    • S3_RETRY_MAX_WAIT_MS

      public static final String S3_RETRY_MAX_WAIT_MS
      Maximum wait time, in milliseconds, before retrying an S3 read operation.
    • S3_RETRY_MAX_WAIT_MS_DEFAULT

      public static final long S3_RETRY_MAX_WAIT_MS_DEFAULT
    • S3_DIRECTORY_BUCKET_LIST_PREFIX_AS_DIRECTORY

      public static final String S3_DIRECTORY_BUCKET_LIST_PREFIX_AS_DIRECTORY
      Controls whether to list prefixes as directories for S3 directory buckets. The default value is true, in which case a trailing "/" is added to the prefix.

      Example: s3://bucket/prefix will be shown as s3://bucket/prefix/

      For more details see delimiter section in: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#API_ListObjectsV2_RequestSyntax

      If set to false, an error is thrown when the trailing "/" is not provided for a directory bucket. Turn off this feature if you are using S3FileIO.listPrefix to list bucket prefixes that are not directories. This ensures correctness and fails the operation, per S3 requirements, when listing against a non-directory prefix in a directory bucket.
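
      The prefix handling described here can be pictured with the following sketch. The class and method names are hypothetical, and the code only models the string adjustment; when the flag is off, the prefix is passed through unchanged and any rejection is left to S3.

```java
class DirectoryPrefixSketch {

  /**
   * Returns the prefix to list. When listPrefixAsDirectory is true, a trailing "/"
   * is appended if missing. When false, the prefix is returned unchanged, and a
   * directory bucket is expected to reject a prefix that lacks the trailing "/".
   */
  public static String listingPrefix(String prefix, boolean listPrefixAsDirectory) {
    if (listPrefixAsDirectory && !prefix.endsWith("/")) {
      return prefix + "/";
    }
    return prefix;
  }

  public static void main(String[] args) {
    System.out.println(listingPrefix("s3://bucket/prefix", true));
    System.out.println(listingPrefix("s3://bucket/prefix/", true));
    System.out.println(listingPrefix("s3://bucket/prefix", false));
  }
}
```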

    • S3_DIRECTORY_BUCKET_LIST_PREFIX_AS_DIRECTORY_DEFAULT

      public static final boolean S3_DIRECTORY_BUCKET_LIST_PREFIX_AS_DIRECTORY_DEFAULT
  • Constructor Details

    • S3FileIOProperties

      public S3FileIOProperties()
    • S3FileIOProperties

      public S3FileIOProperties(Map<String,String> properties)
  • Method Details

    • sseType

      public String sseType()
    • setSseType

      public void setSseType(String sseType)
    • sseKey

      public String sseKey()
    • setSseKey

      public void setSseKey(String sseKey)
    • deleteBatchSize

      public int deleteBatchSize()
    • setDeleteBatchSize

      public void setDeleteBatchSize(int deleteBatchSize)
    • sseMd5

      public String sseMd5()
    • setSseMd5

      public void setSseMd5(String sseMd5)
    • multipartUploadThreads

      public int multipartUploadThreads()
    • setMultipartUploadThreads

      public void setMultipartUploadThreads(int threads)
    • multiPartSize

      public int multiPartSize()
    • setMultiPartSize

      public void setMultiPartSize(int size)
    • multipartThresholdFactor

      public double multipartThresholdFactor()
    • setMultipartThresholdFactor

      public void setMultipartThresholdFactor(double factor)
    • stagingDirectory

      public String stagingDirectory()
    • setStagingDirectory

      public void setStagingDirectory(String directory)
    • acl

      public software.amazon.awssdk.services.s3.model.ObjectCannedACL acl()
    • setAcl

      public void setAcl(software.amazon.awssdk.services.s3.model.ObjectCannedACL acl)
    • isPreloadClientEnabled

      public boolean isPreloadClientEnabled()
    • setPreloadClientEnabled

      public void setPreloadClientEnabled(boolean preloadClientEnabled)
    • isDualStackEnabled

      public boolean isDualStackEnabled()
    • isCrossRegionAccessEnabled

      public boolean isCrossRegionAccessEnabled()
    • isPathStyleAccess

      public boolean isPathStyleAccess()
    • isUseArnRegionEnabled

      public boolean isUseArnRegionEnabled()
    • isAccelerationEnabled

      public boolean isAccelerationEnabled()
    • isChecksumEnabled

      public boolean isChecksumEnabled()
    • isRemoteSigningEnabled

      public boolean isRemoteSigningEnabled()
    • endpoint

      public String endpoint()
    • setChecksumEnabled

      public void setChecksumEnabled(boolean eTagCheckEnabled)
    • writeTags

      public Set<software.amazon.awssdk.services.s3.model.Tag> writeTags()
    • writeTableTagEnabled

      public boolean writeTableTagEnabled()
    • setWriteTableTagEnabled

      public void setWriteTableTagEnabled(boolean s3WriteTableNameTagEnabled)
    • isWriteNamespaceTagEnabled

      public boolean isWriteNamespaceTagEnabled()
    • setWriteNamespaceTagEnabled

      public void setWriteNamespaceTagEnabled(boolean writeNamespaceTagEnabled)
    • deleteTags

      public Set<software.amazon.awssdk.services.s3.model.Tag> deleteTags()
    • deleteThreads

      public int deleteThreads()
    • setDeleteThreads

      public void setDeleteThreads(int threads)
    • isDeleteEnabled

      public boolean isDeleteEnabled()
    • setDeleteEnabled

      public void setDeleteEnabled(boolean deleteEnabled)
    • bucketToAccessPointMapping

      public Map<String,String> bucketToAccessPointMapping()
    • accessKeyId

      public String accessKeyId()
    • secretAccessKey

      public String secretAccessKey()
    • sessionToken

      public String sessionToken()
    • writeStorageClass

      public String writeStorageClass()
    • isS3AccessGrantsEnabled

      public boolean isS3AccessGrantsEnabled()
    • setS3AccessGrantsEnabled

      public void setS3AccessGrantsEnabled(boolean s3AccessGrantsEnabled)
    • isS3AccessGrantsFallbackToIamEnabled

      public boolean isS3AccessGrantsFallbackToIamEnabled()
    • setS3AccessGrantsFallbackToIamEnabled

      public void setS3AccessGrantsFallbackToIamEnabled(boolean s3AccessGrantsFallbackToIamEnabled)
    • s3RetryNumRetries

      public int s3RetryNumRetries()
    • setS3RetryNumRetries

      public void setS3RetryNumRetries(int s3RetryNumRetries)
    • s3RetryMinWaitMs

      public long s3RetryMinWaitMs()
    • setS3RetryMinWaitMs

      public void setS3RetryMinWaitMs(long s3RetryMinWaitMs)
    • s3RetryMaxWaitMs

      public long s3RetryMaxWaitMs()
    • setS3RetryMaxWaitMs

      public void setS3RetryMaxWaitMs(long s3RetryMaxWaitMs)
    • s3RetryTotalWaitMs

      public long s3RetryTotalWaitMs()
    • isS3DirectoryBucketListPrefixAsDirectory

      public boolean isS3DirectoryBucketListPrefixAsDirectory()
    • setS3DirectoryBucketListPrefixAsDirectory

      public void setS3DirectoryBucketListPrefixAsDirectory(boolean s3DirectoryBucketListPrefixAsDirectory)
    • applyCredentialConfigurations

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyCredentialConfigurations(AwsClientProperties awsClientProperties, T builder)
    • applyServiceConfigurations

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyServiceConfigurations(T builder)
      Configure service settings for an S3 client. The settings include: s3DualStack, crossRegionAccessEnabled, s3UseArnRegion, s3PathStyleAccess, and s3Acceleration.

      Sample usage:

           S3Client.builder().applyMutation(s3FileIOProperties::applyServiceConfigurations)
       
    • applySignerConfiguration

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applySignerConfiguration(T builder)
      Configure a signer for an S3 client.

      Sample usage:

           S3Client.builder().applyMutation(s3FileIOProperties::applySignerConfiguration)
       
    • applyEndpointConfigurations

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyEndpointConfigurations(T builder)
      Override the endpoint for an S3 client.

      Sample usage:

           S3Client.builder().applyMutation(s3FileIOProperties::applyEndpointConfigurations)
       
    • applyRetryConfigurations

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyRetryConfigurations(T builder)
      Override the retry configurations for an S3 client.

      Sample usage:

           S3Client.builder().applyMutation(s3FileIOProperties::applyRetryConfigurations)
       
    • applyS3AccessGrantsConfigurations

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyS3AccessGrantsConfigurations(T builder)
      Add the S3 Access Grants Plugin for an S3 client.

      Sample usage:

           S3Client.builder().applyMutation(s3FileIOProperties::applyS3AccessGrantsConfigurations)
       
    • applyUserAgentConfigurations

      public <T extends software.amazon.awssdk.services.s3.S3ClientBuilder> void applyUserAgentConfigurations(T builder)