S3 Advanced Features
Access Control Lists in S3¶
This chapter describes how to set up Access Control Lists (from now on called ACLs) to control access to specific buckets or objects for other S3 users on the same platform.
Reasons for doing this could be:
- Give public access to a specific bucket or object
- Give another S3 user read or read/write access to a specific bucket, object, or subfolder of a bucket.
Naming Convention in this document¶
- Bucket Owner - an S3 account holder who has an object storage bucket intended for sharing with another S3 account holder on the same platform.
- Bucket User - an S3 account holder who wants to gain access to the Bucket Owner's bucket.
- Bucket Owner's Project - the Bucket Owner's Project ID.
- Bucket User's Project - the Bucket User's Project ID.
In the examples below, words written in capital letters, such as BUCKETOWNERPROJECT, are variables that should be replaced with values matching your use case.
Setting up S3 clients¶
The examples in this document use both s3cmd and aws-cli. See the respective configuration guides for installation and setup instructions.
Note
The aws CLI can sometimes be cryptic with its error messages. If querying a property with the aws CLI returns the following:

argument of type 'NoneType' is not iterable

it most likely means that the queried property is not set on the bucket.
To get the credentials needed for either tool, follow the steps in Get S3 credentials. The important variables are Project ID, S3 URL (without the https:// prefix), EC2 Access Key, and EC2 Secret Key.
For the ACL and policy examples below, you will need credentials for both the bucket owner and the bucket user. With s3cmd, this means creating two separate configuration files. In the examples below we have named them owner-s3.cfg and user-s3.cfg.
Setting up policies¶
Bucket policies are expressed as JSON files with a specific format:
{
  "Version": "2012-10-17",
  "Id": "POLICY_NAME",
  "Statement": [
    {
      "Sid": "STATEMENT_NAME",
      "Effect": "EFFECT",
      "Principal": {
        "AWS": "arn:aws:iam::PROJECT_ID:root"
      },
      "Action": [
        "ACTION_1",
        "ACTION_2"
      ],
      "Resource": [
        "arn:aws:s3:::KEY_SPECIFICATION"
      ]
    }
  ]
}
| Key | Value |
|---|---|
| Version | "2012-10-17", cannot be changed |
| Id | Arbitrary policy name |
| Statement | A list of statements |
| Statement.Sid | Arbitrary statement name |
| Statement.Effect | Allowed values: "Allow" or "Deny" |
| Statement.Principal | One or more accounts specified in Amazon ARN format, for example "AWS": ["arn:aws:iam::FIRST_PROJECT_ID:root", "arn:aws:iam::SECOND_PROJECT_ID:root"] |
| Statement.Action | One or more actions that the policy should apply to. See the Amazon S3 documentation for a complete list of actions. |
| Statement.Resource | Specifies which resources the policy applies to. Can be one of: "arn:aws:s3:::*" (the bucket and all its objects), "arn:aws:s3:::mybucket/*" (all objects in mybucket), or "arn:aws:s3:::mybucket/myfolder/*" (all objects that are subkeys of myfolder in mybucket). |
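To illustrate how the fields fit together, the policy skeleton can also be assembled programmatically. This is a hypothetical helper for illustration only, not part of any SDK:

```python
import json

def build_policy(policy_id, sid, effect, project_ids, actions, resource):
    """Assemble a bucket policy document with the fields described above."""
    return {
        "Version": "2012-10-17",  # fixed value, cannot be changed
        "Id": policy_id,
        "Statement": [
            {
                "Sid": sid,
                "Effect": effect,  # "Allow" or "Deny"
                "Principal": {
                    "AWS": [f"arn:aws:iam::{pid}:root" for pid in project_ids]
                },
                "Action": actions,
                "Resource": [f"arn:aws:s3:::{resource}"],
            }
        ],
    }

policy = build_policy(
    "read-write", "project-read-write", "Allow",
    ["OWNER_PROJECT_ID", "USER_PROJECT_ID"],
    ["s3:ListBucket", "s3:GetObject"],
    "mybucket/*",
)
print(json.dumps(policy, indent=2))
```

The resulting JSON can be written to a file and applied with setpolicy or put-bucket-policy as shown in the next section.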
Applying and inspecting policies¶
Let's say that you have created a policy file called policy.json. The owner applies this policy with one of the following commands:
s3cmd -c owner-s3.cfg setpolicy policy.json s3://sharedbucket
aws --endpoint=$S3_URL s3api put-bucket-policy --bucket sharedbucket --policy file://policy.json
To view the current policy on the bucket, issue one of the following:
s3cmd -c owner-s3.cfg info s3://sharedbucket
aws --endpoint=$S3_URL s3api get-bucket-policy --bucket sharedbucket
To delete a policy from a bucket, use the following:
s3cmd -c owner-s3.cfg delpolicy s3://sharedbucket
aws --endpoint=$S3_URL s3api delete-bucket-policy --bucket sharedbucket
You are now ready to start writing your own policies!
Sample policies¶
Grant another user read and write access to a bucket¶
Create a file rw-policy.json with the following contents. Replace BUCKET_OWNER_PROJECT_ID and BUCKET_USER_PROJECT_ID with the values you fetched from the portal above.
{
  "Version": "2012-10-17",
  "Id": "read-write",
  "Statement": [
    {
      "Sid": "project-read-write",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::BUCKET_OWNER_PROJECT_ID:root",
          "arn:aws:iam::BUCKET_USER_PROJECT_ID:root"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    }
  ]
}
s3cmd -c owner-s3.cfg setpolicy rw-policy.json s3://sharedbucket
aws --endpoint=$S3_URL s3api put-bucket-policy --bucket sharedbucket --policy file://rw-policy.json
The owner now has to send their Project ID and the name of the bucket to the user.
To list the contents of the bucket, the bucket user should issue:
s3cmd -c user-s3.cfg ls s3://BUCKET_OWNER_PROJECT_ID:sharedbucket
aws --endpoint=$S3_URL s3api list-objects --bucket BUCKET_OWNER_PROJECT_ID:sharedbucket
The user should now see a listing of the contents of the bucket.
Grant any user read access to a bucket¶
Create a file called all-read-policy.json:
{
  "Version": "2012-10-17",
  "Id": "policy-read-any",
  "Statement": [
    {
      "Sid": "read-any",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "*"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    }
  ]
}
s3cmd -c owner-s3.cfg setpolicy all-read-policy.json s3://sharedbucket
aws --endpoint=$S3_URL s3api put-bucket-policy --bucket sharedbucket --policy file://all-read-policy.json
Users from other projects (who know the owner's Project ID) can now access the contents of the bucket, for instance the file testfile.
s3cmd -c user-s3.cfg get s3://BUCKET_OWNER_PROJECT_ID:sharedbucket/testfile
aws --endpoint=$S3_URL s3api get-object --bucket BUCKET_OWNER_PROJECT_ID:sharedbucket --key testfile testfile
Grant one user full access and another read access¶
Policies can also be combined. The example below gives the first user full access to the owner's bucket, and the second user read access:
{
  "Version": "2012-10-17",
  "Id": "complex-policy",
  "Statement": [
    {
      "Sid": "project-write",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::BUCKET_OWNER_PROJECT_ID:root",
          "arn:aws:iam::FIRST_BUCKET_USER_PROJECT_ID:root"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::mysharedbucket/mysharedfolder/*"
      ]
    },
    {
      "Sid": "project-read",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SECOND_BUCKET_USER_PROJECT_ID:root"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::mysharedbucket/mysharedfolder/*"
      ]
    }
  ]
}
The owner applies the policy as above with "setpolicy" or "put-bucket-policy".
The first user can now upload a file to the bucket:
s3cmd -c first-user-project-s3cfg put productlist.db s3://BUCKET_OWNER_PROJECT_ID:mysharedbucket/mysharedfolder/
aws --endpoint=$S3_URL s3api put-object --bucket BUCKET_OWNER_PROJECT_ID:mysharedbucket --key mysharedfolder/productlist.db --body productlist.db
The second user can download the same file, but will not be able to upload anything:
s3cmd -c second-user-project-s3cfg get s3://BUCKET_OWNER_PROJECT_ID:mysharedbucket/mysharedfolder/productlist.db
aws --endpoint=$S3_URL s3api get-object --bucket BUCKET_OWNER_PROJECT_ID:mysharedbucket --key mysharedfolder/productlist.db productlist.db
Accessing a publicly available file over HTTPS¶
It is possible to configure an object to be publicly available, and reachable over HTTPS. Below are the most common commands to alter the ACLs on an object or a bucket.
You may remove --recursive if the change should apply only to the bucket or folder itself and not to the objects within it.

s3cmd setacl --acl-private --recursive s3://mybucket-name
s3cmd setacl --acl-private --recursive s3://mybucket-name/folder-name
s3cmd setacl --acl-private --recursive s3://mybucket-name/folder-name/object-name
s3cmd setacl --acl-public --recursive s3://mybucket-name
s3cmd setacl --acl-public --recursive s3://mybucket-name/folder-name
s3cmd setacl --acl-public --recursive s3://mybucket-name/folder-name/object-name

# Set private ACL on a bucket
aws --endpoint=$S3_URL s3api put-bucket-acl --bucket mybucket-name --acl private
# Set public-read ACL on a bucket
aws --endpoint=$S3_URL s3api put-bucket-acl --bucket mybucket-name --acl public-read
# Set private ACL on an object
aws --endpoint=$S3_URL s3api put-object-acl --bucket mybucket-name --key object-name --acl private
# Set public-read ACL on an object
aws --endpoint=$S3_URL s3api put-object-acl --bucket mybucket-name --key object-name --acl public-read

The first three s3cmd commands restrict public access, and the last three enable it. There are two variables you need in order to access a publicly available object over HTTPS:
- The S3_URL, which can be found in the "View Credentials" dialogue (see picture above) or in the welcome email you received when you were onboarded. The most common values are s3.sto1.safedc.net and s3.osl2.safedc.net.
- The PROJECT_ID, which can be found in the "View Credentials" dialogue.
Once you have these variables, and have set the bucket or object to public with one of the commands above, the URL for reaching the object will be:
https://<S3_URL>/<PROJECT_ID>:bucket/object-name
https://s3.sto1.safedc.net/ABC123:bucket/object-name
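The URL pattern can also be expressed as a small helper. This is a sketch for illustration; the endpoint, project ID, bucket, and object names are placeholders:

```python
from urllib.parse import quote

def public_url(s3_url: str, project_id: str, bucket: str, key: str) -> str:
    """Build the public HTTPS URL for an object, following the pattern above."""
    return f"https://{s3_url}/{project_id}:{bucket}/{quote(key)}"

print(public_url("s3.sto1.safedc.net", "ABC123", "bucket", "object-name"))
# https://s3.sto1.safedc.net/ABC123:bucket/object-name
```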
Using presigned URLs¶
It is possible to generate presigned URLs to objects, with a time limit on how long the link is valid. This makes it possible to grant temporary access to single objects without the need for policies.
When you generated your s3.cfg file above, there were two lines that maybe did not make any sense at the time:

public_url_use_https = True
signurl_use_https = True
Let's say that the owner has a configuration file called owner-s3.cfg where those variables are set, and wants to create a presigned URL for the object s3://bucket/testfile that is valid for 24 hours (86400 seconds).
The command to issue is the following:
s3cmd -c owner-s3.cfg signurl s3://bucket/testfile +86400
aws --endpoint=$S3_URL s3 presign s3://bucket/testfile --expires-in 86400
The command will return a URL that you can send to anyone, valid for 24 hours from now. With s3cmd, you can also skip +86400 to make the URL permanent.
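Under the hood, s3cmd's signurl uses the legacy query-string signing scheme (AWS Signature Version 2). The following is a minimal sketch of that scheme, assuming the endpoint accepts SigV2 and ignoring tenant-prefixed bucket names; it is shown for understanding, not as a replacement for the tools above:

```python
import base64
import hmac
import time
from hashlib import sha1
from urllib.parse import quote

def presign_v2(host, bucket, key, access_key, secret_key, expires_in):
    """Build a presigned GET URL using legacy SigV2 query-string signing."""
    expires = int(time.time()) + expires_in
    # The string to sign for a plain GET has empty MD5 and Content-Type fields.
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{key}"
    signature = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(), sha1).digest()
    ).decode()
    return (
        f"https://{host}/{bucket}/{key}"
        f"?AWSAccessKeyId={access_key}&Expires={expires}&Signature={quote(signature, safe='')}"
    )

print(presign_v2("s3.sto1.safedc.net", "bucket", "testfile", "ACCESSKEY", "SECRETKEY", 86400))
```

Anyone holding the resulting URL can fetch the object until the Expires timestamp passes, without needing credentials of their own.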
Block Public Access¶
Block Public Access provides bucket-level settings to prevent public access to your data. When enabled, it overrides any ACLs or bucket policies that would otherwise grant public access.
To enable Block Public Access on a bucket:
aws --endpoint=$S3_URL s3api put-public-access-block \
--bucket mybucket \
--public-access-block-configuration \
'{"BlockPublicAcls":true,"IgnorePublicAcls":true,"BlockPublicPolicy":true,"RestrictPublicBuckets":true}'
To check the current Block Public Access settings:
aws --endpoint=$S3_URL s3api get-public-access-block --bucket mybucket
To remove Block Public Access settings:
aws --endpoint=$S3_URL s3api delete-public-access-block --bucket mybucket
CORS configuration¶
Cross-Origin Resource Sharing (CORS) allows web applications running in a browser to make requests to your S3 bucket from a different domain. This is necessary if you want to access objects directly from client-side JavaScript.
To set a CORS configuration using s3cmd, create an XML file called cors.xml:
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>https://example.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
Apply the CORS configuration:
s3cmd setcors cors.xml s3://mybucket
Note
When using aws-cli, the CORS configuration must be in JSON format instead of XML. Save the following as cors.json:

{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://example.com"],
      "AllowedMethods": ["GET", "PUT"],
      "AllowedHeaders": ["*"]
    }
  ]
}
aws --endpoint=$S3_URL s3api put-bucket-cors --bucket mybucket --cors-configuration file://cors.json
To view the current CORS configuration:
s3cmd info s3://mybucket
aws --endpoint=$S3_URL s3api get-bucket-cors --bucket mybucket
To delete the CORS configuration:
s3cmd delcors s3://mybucket
aws --endpoint=$S3_URL s3api delete-bucket-cors --bucket mybucket
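To see what the rule in the example permits, the following sketch mirrors how a browser preflight request is matched against the configuration. It is simplified (real CORS evaluation also checks requested headers) and uses the rule from the example above:

```python
cors_rules = [
    {"AllowedOrigins": ["https://example.com"], "AllowedMethods": ["GET", "PUT"]}
]

def preflight_allowed(origin: str, method: str) -> bool:
    """Return True if some CORS rule matches both the origin and the method."""
    for rule in cors_rules:
        origin_ok = any(o == "*" or o == origin for o in rule["AllowedOrigins"])
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False

print(preflight_allowed("https://example.com", "GET"))   # True
print(preflight_allowed("https://evil.example", "GET"))  # False
```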
Bucket and object tagging¶
Tags are key-value pairs that can be attached to buckets and objects. They are useful for categorizing and organizing resources, for instance by environment, department or purpose.
Bucket tagging¶
To set tags on a bucket:
s3cmd settagging s3://mybucket "env=production&department=finance"
aws --endpoint=$S3_URL s3api put-bucket-tagging --bucket mybucket \
  --tagging '{"TagSet":[{"Key":"env","Value":"production"},{"Key":"department","Value":"finance"}]}'
To view the tags:
s3cmd gettagging s3://mybucket
aws --endpoint=$S3_URL s3api get-bucket-tagging --bucket mybucket
To remove all tags:
s3cmd deltagging s3://mybucket
aws --endpoint=$S3_URL s3api delete-bucket-tagging --bucket mybucket
Object tagging¶
Tags can also be set on individual objects:
s3cmd settagging s3://mybucket/myfile.txt "classification=internal&retention=90days"
aws --endpoint=$S3_URL s3api put-object-tagging --bucket mybucket --key myfile.txt \
  --tagging '{"TagSet":[{"Key":"classification","Value":"internal"},{"Key":"retention","Value":"90days"}]}'
To view object tags:
s3cmd gettagging s3://mybucket/myfile.txt
aws --endpoint=$S3_URL s3api get-object-tagging --bucket mybucket --key myfile.txt
To delete object tags:
s3cmd deltagging s3://mybucket/myfile.txt
aws --endpoint=$S3_URL s3api delete-object-tagging --bucket mybucket --key myfile.txt
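Note that the two tools express the same tags differently: s3cmd takes a query-string style "key=value&key=value" argument, while aws-cli takes a JSON TagSet. A small sketch converting from the s3cmd form to the aws-cli form:

```python
import json
from urllib.parse import parse_qsl

def tags_to_tagset(tag_string: str) -> dict:
    """Convert s3cmd-style 'k=v&k=v' tags into the aws-cli TagSet structure."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in parse_qsl(tag_string)]}

print(json.dumps(tags_to_tagset("env=production&department=finance")))
# {"TagSet": [{"Key": "env", "Value": "production"}, {"Key": "department", "Value": "finance"}]}
```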
Lifecycle management¶
Lifecycle rules allow you to automatically delete objects after a specified number of days. This is useful for managing temporary data, logs, or other objects that should not be kept indefinitely.
To set a lifecycle policy, create an XML file called lifecycle.xml:
<LifecycleConfiguration>
  <Rule>
    <ID>DeleteOldLogs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>90</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
Apply the lifecycle policy:
s3cmd setlifecycle lifecycle.xml s3://mybucket
aws --endpoint=$S3_URL s3api put-bucket-lifecycle-configuration --bucket mybucket \
  --lifecycle-configuration '{"Rules":[{"ID":"DeleteOldLogs","Prefix":"logs/","Status":"Enabled","Expiration":{"Days":90}}]}'
To view the current lifecycle configuration:
s3cmd getlifecycle s3://mybucket
aws --endpoint=$S3_URL s3api get-bucket-lifecycle-configuration --bucket mybucket
To remove the lifecycle policy:
s3cmd dellifecycle s3://mybucket
aws --endpoint=$S3_URL s3api delete-bucket-lifecycle --bucket mybucket
Note
Lifecycle transition rules (moving objects between storage classes) are not supported on Safespring. Only expiration rules are available.
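Expiration counts whole days from object creation. As a rough sketch of when an object under the 90-day rule above becomes eligible for deletion (S3 implementations typically round the actual deletion to the next midnight UTC, which this ignores):

```python
from datetime import datetime, timedelta, timezone

def expiry_date(created: datetime, days: int) -> datetime:
    """Approximate deletion time: creation time plus the rule's Days value."""
    return created + timedelta(days=days)

created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expiry_date(created, 90))  # 2024-03-31 00:00:00+00:00
```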
S3 Select¶
S3 Select allows you to run SQL queries directly on objects stored in S3 without downloading the entire object. This is useful for extracting specific data from large CSV, JSON, or Parquet files.
For example, given a CSV file data.csv with the columns name, department and salary:
aws --endpoint=$S3_URL s3api select-object-content \
--bucket mybucket \
--key data.csv \
--expression "SELECT name, salary FROM s3object s WHERE s.department = 'engineering'" \
--expression-type SQL \
--input-serialization '{"CSV":{"FileHeaderInfo":"USE"}}' \
--output-serialization '{"CSV":{}}' \
output.csv
In Safespring's S3 solution, S3 Select only supports the CSV format; JSON and Parquet are not supported.
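The query in the example is equivalent to the following local filter, shown here only to illustrate what S3 Select computes server-side (the column names and sample rows are invented to match the example):

```python
import csv
import io

# A stand-in for data.csv with the columns from the example above.
data = io.StringIO(
    "name,department,salary\n"
    "alice,engineering,100\n"
    "bob,sales,90\n"
)

# SELECT name, salary FROM s3object s WHERE s.department = 'engineering'
reader = csv.DictReader(data)
rows = [(r["name"], r["salary"]) for r in reader if r["department"] == "engineering"]
print(rows)  # [('alice', '100')]
```

The advantage of S3 Select is that only the matching rows cross the network, instead of the whole object.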
Batch delete¶
Multiple objects can be deleted in a single API call using the batch delete operation. This is more efficient than deleting objects one by one.
aws --endpoint=$S3_URL s3api delete-objects \
--bucket mybucket \
--delete '{"Objects":[{"Key":"file1.txt"},{"Key":"file2.txt"},{"Key":"folder/file3.txt"}]}'
With s3cmd, recursive delete can be used to remove all objects under a prefix:
s3cmd del --recursive s3://mybucket/folder/
aws --endpoint=$S3_URL s3 rm --recursive s3://mybucket/folder/
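The delete-objects call accepts at most 1000 keys per request (the standard S3 limit), so larger deletions must be split into batches. A sketch of building the request payloads, with an invented key list for illustration:

```python
def delete_batches(keys, batch_size=1000):
    """Split keys into delete-objects payloads of at most batch_size keys each."""
    for i in range(0, len(keys), batch_size):
        yield {"Objects": [{"Key": k} for k in keys[i:i + batch_size]]}

keys = [f"logs/file{i}.txt" for i in range(2500)]
payloads = list(delete_batches(keys))
print(len(payloads))                # 3
print(len(payloads[0]["Objects"]))  # 1000
```

Each payload is what would be passed as the --delete argument of a separate delete-objects call.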