Example config for s3cmd

Getting s3cmd

Most Linux distributions have s3cmd in their repositories, so a simple apt-get install s3cmd or yum install s3cmd (perhaps after adding the EPEL repository on RHEL/CentOS) should suffice. Otherwise, the project website is http://s3tools.org/s3cmd

Creating a config file

s3cmd can be told to output a file containing all options in their current state, but that file holds a lot of unneeded information. A simple starting config can be as short as the following example, with nothing else in it.

$HOME/.s3cfg:

[default]
access_key = Access_key_goes_here
secret_key = REDACTED
check_ssl_certificate = True
guess_mime_type = True
host_base = s3.sto1.safedc.net
host_bucket = s3.sto1.safedc.net
use_https = True
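
Because this file holds the secret key, it is reasonable to check it once before first use. The following is a minimal sketch, not part of s3cmd itself: the function name check_s3cfg is hypothetical, and the list of required keys is an assumption based on the example above. It verifies that the keys are present and then restricts the file to its owner.

```shell
# Hypothetical helper: sanity-check a minimal s3cmd config file.
# Assumes the four keys below are the ones you need, as in the example above.
check_s3cfg() {
  cfg=${1:-$HOME/.s3cfg}
  for key in access_key secret_key host_base host_bucket; do
    # Each key must appear at the start of a line, followed by "=".
    grep -q "^${key}[[:space:]]*=" "$cfg" || {
      echo "missing option in $cfg: $key" >&2
      return 1
    }
  done
  # The file contains credentials, so make it readable by the owner only.
  chmod 600 "$cfg"
}
```

Calling check_s3cfg without arguments checks $HOME/.s3cfg; pass another path as the first argument to check a different file.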

Simple s3cmd usage

When you have the program installed and a config file in place, you can make a bucket and place a file or two in there.

s3cmd mb s3://unique-bucket-name

s3cmd put localfile.txt s3://unique-bucket-name

upload: 'localfile.txt' -> 's3://unique-bucket-name'
524 of 524   100% in  1s    3.68 MB/s  done

s3cmd ls s3://unique-bucket-name

Other commands include sync to synchronize a local folder to S3 storage or the other way around:

s3cmd sync local/ s3://unique-bucket-name
s3cmd sync  s3://unique-bucket-name local/

Make files public

If you want to make a certain file publicly available, you can either add -P while uploading or run setacl -P afterwards to publish the file.

While uploading

s3cmd put -P file1 file2 s3://unique-bucket-name

After uploading

s3cmd setacl -P s3://unique-bucket-name/file1

Change to https and inject project id if necessary

The public upload will return a URL to the now-available file, but it starts with http://..., which you should change to https://. Also, if you use S3 with credentials obtained from an OpenStack project, you must inject the project ID of that project into the public URL like this: https://<site>/<project-id>:<bucket-name>/<object-name>
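
The URL handling described here can be sketched with two small shell helpers. Both function names (to_https and public_url) are hypothetical, and the project ID in the usage line is a made-up placeholder; only the URL layout comes from the text above.

```shell
# Hypothetical helper: rewrite a returned http:// URL to https://.
to_https() {
  printf '%s\n' "$1" | sed 's|^http://|https://|'
}

# Hypothetical helper: build the public URL for credentials that come
# from an OpenStack project, following the pattern
#   https://<site>/<project-id>:<bucket-name>/<object-name>
public_url() {
  site=$1 project=$2 bucket=$3 object=$4
  printf 'https://%s/%s:%s/%s\n' "$site" "$project" "$bucket" "$object"
}
```

For example, public_url s3.sto1.safedc.net abc123 unique-bucket-name file1 builds the URL for file1, where abc123 stands in for your real project ID.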

Performance options in $HOME/.s3cfg

multipart_chunk_size_mb = 256

The multipart chunk size is tunable. It is there for people with shaky internet connections and means that files larger than 256 MB will be uploaded in pieces. If you see issues with resends, you can try smaller sizes such as 16 MB. After all pieces are completed, an MD5 of the whole file is compared against the checksum of the local file to ensure that the file was uploaded correctly.
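
To get a feel for the trade-off, the number of pieces for a given file is simply the file size divided by the chunk size, rounded up. The helper below is a hypothetical sketch of that arithmetic (the name part_count is not an s3cmd command), with sizes given in whole megabytes.

```shell
# Hypothetical helper: how many pieces a file of SIZE_MB is split into
# for a given chunk size (default 256, matching the setting above).
part_count() {
  size_mb=$1 chunk_mb=${2:-256}
  # Integer ceiling division: round up so a final partial chunk counts.
  echo $(( (size_mb + chunk_mb - 1) / chunk_mb ))
}
```

With this, part_count 1024 256 gives 4 and part_count 1024 16 gives 64: smaller chunks mean more pieces, but less data to resend when a single piece fails.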

We have uploaded lots of files larger than 1 GB without issues, and using larger chunks will allow s3cmd to reach higher speeds at the cost of larger resends in case anything does happen during transport. The upper limit seems to be somewhere above 2 GB.

We will not allow unencrypted access, so the use_https and check_ssl_certificate options should be left as they are.

The default HTTP chunk size for each subpart seems low, defaulting to either 4096 or 65536 depending on the s3cmd version. If you are on a decent network, you can raise these numbers as well for some extra performance.

send_chunk = 262144
recv_chunk = 262144

If you have huge amounts of data to send, you can of course parallelize the whole operation by starting several s3cmd clients, perhaps handling one directory each or with separate --files-from=FILE input lists.
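
The "one directory each" variant can be sketched as follows. This is a hedged example, not a tuned tool: the function name parallel_sync is hypothetical, it starts one s3cmd per top-level subdirectory with no limit on the number of clients, and files lying directly in the source directory are not handled. The S3CMD variable is an assumption added here so the command can be swapped out (for example with echo) to preview what would run.

```shell
# Hypothetical helper: run one "s3cmd sync" per top-level subdirectory
# of SRC in the background, then wait for all of them to finish.
parallel_sync() {
  src=$1 bucket=$2
  for dir in "$src"/*/; do
    [ -d "$dir" ] || continue          # skip if there are no subdirectories
    name=$(basename "$dir")
    # Each subdirectory is synced to a prefix of the same name.
    "${S3CMD:-s3cmd}" sync "$dir" "s3://$bucket/$name/" &
  done
  wait                                  # block until every client has exited
}
```

Running S3CMD=echo parallel_sync local unique-bucket-name prints the sync commands instead of executing them, which is a cheap way to check the directory split before sending real data.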

Other flags and options

s3cmd has subcommands for many different operations, not all of them applicable to our service. Among the useful ones is the command to synchronize a local folder to an S3 bucket as a simple backup method.

In that scenario, you might run over the same files again and again and only send the newly arrived files that don't already exist on the S3 side.

In order to prevent s3cmd from recalculating the local MD5 sums on every run, add --cache-file=/path/to/cache.md5s to the s3cmd invocation. It will then note the timestamp of each file along with its MD5 sum so that it can skip that part on every upcoming run.
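
Put together, a recurring backup run could look like the sketch below. The function name backup_sync, the default cache location, and the S3CMD override variable are all assumptions made for illustration; only the sync subcommand and the --cache-file option come from the text above.

```shell
# Hypothetical helper: sync SRC to a bucket, reusing an MD5 cache file
# so unchanged local files are not re-hashed on every run.
backup_sync() {
  src=$1 bucket=$2 cache=${3:-$HOME/.s3cmd-cache.md5s}
  "${S3CMD:-s3cmd}" --cache-file="$cache" sync "$src/" "s3://$bucket/"
}
```

A call such as backup_sync local unique-bucket-name could then be placed in a cron job or similar scheduler, with the same cache file reused between runs.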

s3cmd --list-md5 ls s3://unique-bucket-name

will show the server-side checksums, which are also stored alongside each file. MD5 is not a perfect checksum, and intentionally crafted files with the same checksum have been created, so people needing more certainty are encouraged to run other algorithms and store the results alongside the files for added confidence.
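
One way to follow that advice is to compute a SHA-256 sum locally and upload it next to the file. The sketch below assumes the GNU coreutils sha256sum tool is available; the function name put_with_sha256 and the S3CMD override variable are hypothetical.

```shell
# Hypothetical helper: write FILE.sha256 next to FILE, then upload both.
put_with_sha256() {
  file=$1 bucket=$2
  dir=$(dirname "$file")
  name=$(basename "$file")
  # Hash from inside the directory so the checksum file records only the
  # bare file name and can be verified wherever both files are downloaded.
  (cd "$dir" && sha256sum "$name" > "$name.sha256") || return 1
  "${S3CMD:-s3cmd}" put "$file" "$file.sha256" "s3://$bucket/"
}
```

After downloading both objects into the same directory, running sha256sum -c on the .sha256 file there checks the data against the stored sum.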