It seems to me that, back in the day, all the companies we worked with shared files over FTP. Remember FTP? A surprising number of enterprise integration patterns depended on FTP and, eventually, SFTP.
Nowadays, it seems like many companies have moved to Amazon S3 to share information. This post is about using S3 securely and introduces a tool we’re working on to make it as easy as possible.
What is S3 and How Should We Use It?
S3 is like a file system that you access through an API over HTTPS. Because there are numerous clients, and Amazon provides many features out of the box, it is an extremely robust solution. Consider these features:
- Auditing of access
- Managed access through IAM
- Lifecycles to delete old files
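Lifecycles are configured with a small JSON document. Here is a minimal sketch, assuming a single rule that expires every object a fixed number of days after creation. The helper name and rule ID are hypothetical. The output is meant to be passed to `aws s3api put-bucket-lifecycle-configuration`:

```python
import json


def expiration_lifecycle(days: int, prefix: str = "") -> dict:
    """Build a lifecycle configuration that expires objects `days` after creation.

    The shape follows what `aws s3api put-bucket-lifecycle-configuration`
    accepts in --lifecycle-configuration. The helper itself is hypothetical.
    """
    if days < 1:
        raise ValueError("expiration must be at least one day")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the document so it can be redirected into a file for the AWS CLI
    print(json.dumps(expiration_lifecycle(30), indent=2))
```

Usage: `python lifecycle.py > lifecycle.json`, then `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`. On a bucket with versioning enabled, `Expiration` only adds a delete marker. Removing old versions for good also needs a `NoncurrentVersionExpiration` rule.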
With S3, we typically think of buckets as discrete file systems, almost like separate mapped drives. You can use a graphical client, such as the S3 console that AWS provides.
We can also use the command line (the AWS CLI) or build programs against the SDK. For example, this CLI command lists our buckets:
aws s3 ls
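The SDK route can be sketched in Python with boto3, AWS's Python SDK. The helper name `bucket_names` is hypothetical. The function also takes the client as a parameter rather than creating it. That choice is an assumption made so the logic can be exercised with a stand-in object and no AWS credentials:

```python
from typing import Any, List


def bucket_names(s3_client: Any) -> List[str]:
    """Return the names of the buckets visible to the caller's credentials.

    `s3_client` is assumed to behave like a boto3 S3 client, whose
    list_buckets() returns {"Buckets": [{"Name": ..., "CreationDate": ...}], ...}.
    """
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]
```

With boto3 installed and credentials configured, usage would be `import boto3` followed by `print(bucket_names(boto3.client("s3")))`. This prints roughly the same names that `aws s3 ls` shows.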
Generally, S3 provides all the features I think most clients need. It is not always easy to use correctly, though. Access controls and encryption in particular seem to trip people up.
A Use Case
Consider a use case where a company, AwesomeAI, is working on machine learning models that depend on very large CSV data sets that it receives and processes for customers. Now assume that it wants its partners, such as BigDataRider, to drop files in S3 so the data is protected.
We can address this use case entirely with native S3. AwesomeAI can create a bucket with default encryption that only it can read from. It can give BigDataRider permission to write to the bucket. BigDataRider can then write objects to the bucket and even choose the KMS key used to encrypt them. In theory, BigDataRider can even bring its own KMS key. Even though the bucket belongs to AwesomeAI, the policy can allow BigDataRider to encrypt with a key that BigDataRider controls. Uploading with a specific KMS key looks like this:
aws s3 cp /filepath s3://mybucket/filename \
    --sse aws:kms \
    --sse-kms-key-id <key id>
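For a partner that uploads from a program instead of the CLI, the equivalent boto3 call passes the same two settings through `ExtraArgs` on `upload_file`. This is a minimal sketch. The wrapper name `upload_with_kms` is hypothetical, and the client is passed in, on the assumption that it is a boto3 S3 client or something with the same `upload_file` signature:

```python
from typing import Any


def upload_with_kms(s3_client: Any, path: str, bucket: str, key: str, kms_key_id: str) -> None:
    """Upload a local file and ask S3 to encrypt it with a specific KMS key.

    Roughly mirrors `aws s3 cp <path> s3://<bucket>/<key> --sse aws:kms --sse-kms-key-id <key id>`.
    """
    s3_client.upload_file(
        path,
        bucket,
        key,
        ExtraArgs={
            "ServerSideEncryption": "aws:kms",  # request SSE-KMS instead of the bucket default
            "SSEKMSKeyId": kms_key_id,          # key ID or ARN to encrypt with
        },
    )
```

Usage, assuming boto3 and placeholder names: `upload_with_kms(boto3.client("s3"), "/filepath", "mybucket", "filename", "<key id>")`.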
Confused yet? Now add in that there might be many more customers, and that CustomDataSurfer shouldn't be able to see or access any of the data BigDataRider pushes or any of the data AwesomeAI can see.
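One way to express "partners can write but not read, and only into their own area" is a bucket policy that gives each partner account `s3:PutObject` on its own prefix and nothing else. A final `Deny` then rejects uploads that don't request SSE-KMS. The sketch below generates such a policy. The helper name, the one-prefix-per-partner layout, the bucket name, and the account IDs are all hypothetical:

```python
import json
from typing import Dict


def partner_drop_policy(bucket: str, partners: Dict[str, str]) -> dict:
    """Build a bucket policy where each partner may only write under its own prefix.

    `partners` maps a prefix (e.g. "bigdatarider") to that partner's AWS account ID.
    Partners get s3:PutObject only, so they cannot list or read any objects,
    including each other's uploads.
    """
    statements = []
    for index, (prefix, account_id) in enumerate(sorted(partners.items())):
        statements.append({
            "Sid": f"PartnerWriteOnly{index}",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "s3:PutObject",
            # Scoped to one prefix, so a partner cannot overwrite another partner's objects
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        })
    statements.append({
        "Sid": "DenyUploadsWithoutKms",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
        # Negated operators also match when the header is absent, so plain uploads are refused
        "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
    })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Hypothetical bucket name and account IDs, for illustration only
    print(json.dumps(partner_drop_policy(
        "awesomeai-drop",
        {"bigdatarider": "111111111111", "customdatasurfer": "222222222222"},
    ), indent=2))
```

The generated file can be applied with `aws s3api put-bucket-policy --bucket <bucket> --policy file://policy.json`. The `Deny` is stricter than default bucket encryption. Clients must send the encryption header explicitly, as `aws s3 cp --sse aws:kms` does. Pinning a particular key would additionally need a condition on `s3:x-amz-server-side-encryption-aws-kms-key-id`.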
Security in S3
So again, the things we care about:
- Size: We don't want to have a problem because we have a big file. (Availability)
- Backup: We want the data to be backed up somewhere. (Availability)
- Versioning: This assures us that if files change, or are somehow tampered with, we can always get back the one we want. (Integrity, Availability)
- Auditing of access: Object-level access (PUT / GET) is logged to CloudTrail once data events are enabled.
- Managed access through IAM: Control who can write to, read from, or list the bucket. Control which roles and KMS keys can be used.
- Encryption: Encrypt files.
- Lifecycles: Limit the time files are shared.
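Several items on this list can be checked from the JSON that the AWS CLI returns. The sketch below covers only versioning, KMS default encryption, and an expiring lifecycle rule. It does not check auditing or IAM. It assumes you have parsed the output of `aws s3api get-bucket-versioning`, `get-bucket-encryption`, and `get-bucket-lifecycle-configuration`, and that you pass `{}` when a call reports nothing configured. The function name is hypothetical:

```python
from typing import Dict, List

KMS_ALGORITHMS = ("aws:kms", "aws:kms:dsse")


def missing_controls(versioning: Dict, encryption: Dict, lifecycle: Dict) -> List[str]:
    """List which of the checked controls a bucket is missing.

    versioning: parsed output of `aws s3api get-bucket-versioning`
    encryption: parsed output of `aws s3api get-bucket-encryption`
    lifecycle:  parsed output of `aws s3api get-bucket-lifecycle-configuration`
    """
    problems = []
    if versioning.get("Status") != "Enabled":
        problems.append("versioning is not enabled")

    rules = encryption.get("ServerSideEncryptionConfiguration", {}).get("Rules", [])
    algorithms = {r.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm") for r in rules}
    if not any(a in KMS_ALGORITHMS for a in algorithms):
        problems.append("default encryption does not use KMS")

    # At least one enabled rule must actually expire objects
    if not any(r.get("Status") == "Enabled" and "Expiration" in r for r in lifecycle.get("Rules", [])):
        problems.append("no enabled lifecycle rule expires objects")
    return problems
```

An empty list means the bucket passed these three checks. It says nothing about the rest of the list.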
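To check what a bucket actually allows, read its policy from the command line. Here we use aws-vault to run the AWS CLI with the credentials from our jemurai profile:
aws-vault exec jemurai -- aws s3api get-bucket-policy --bucket amiexposed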
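`get-bucket-policy` prints `{"Policy": "..."}`, where the policy itself is a JSON string, so it has to be decoded twice. The sketch below flags `Allow` statements whose principal is everyone. It is a deliberately naive check. Conditions are ignored, so a flagged statement may still be restricted (for example, by source IP). Treat results as leads to review, not verdicts. The function names are hypothetical:

```python
import json
from typing import Any, List


def _is_everyone(principal: Any) -> bool:
    # "*" or {"AWS": "*"} (possibly inside a list) means any principal
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_allow_sids(get_bucket_policy_output: str) -> List[str]:
    """Return the Sids of Allow statements that grant access to everyone."""
    outer = json.loads(get_bucket_policy_output)
    policy = json.loads(outer["Policy"])  # the policy arrives as an embedded JSON string
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object rather than a list
        statements = [statements]
    return [
        s.get("Sid", "<no Sid>")
        for s in statements
        if s.get("Effect") == "Allow" and _is_everyone(s.get("Principal"))
    ]
```

Usage: redirect the command's output to a file, then call `public_allow_sids(open("policy.json").read())`.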
Our Open Source Tool S3S2
So while we think S3 is awesome and feature-rich, we also observe that it can be challenging to use fully and securely. For the use case above, we built a simple open source tool, S3S2. At its simplest, you point it at a directory:
s3s2 share /directory/to/share
Or you can specify options at the command line:
s3s2 share --bucket the-bucket \
    --pubKey the-public-key-of-the-receiver \
    --awsKey the-name-of-the-kms-key \
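The relationship between a shared default config and command-line options can be illustrated with a small precedence rule: a flag wins when given, otherwise the config value applies. This is a hypothetical sketch of that idea, not S3S2's code or its real config format. The setting names are borrowed from the flags shown above. It also assumes a bucket plus at least one of a public key or a KMS key is required, so that every upload ends up encrypted one way or the other:

```python
import json
from typing import Dict, Optional

# Hypothetical setting names modeled on the s3s2 flags; not S3S2's actual config schema.
SETTINGS = ("bucket", "pubKey", "awsKey")


def resolve_settings(shared_config_json: str, cli_flags: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Merge a shared JSON config with command-line flags; flags take precedence."""
    config = json.loads(shared_config_json)
    resolved = {name: cli_flags.get(name) or config.get(name) for name in SETTINGS}
    if not resolved["bucket"]:
        raise ValueError("no bucket in flags or config")
    if not (resolved["pubKey"] or resolved["awsKey"]):
        raise ValueError("need a public key or a KMS key so the upload is encrypted")
    return resolved
```

Under this scheme, AwesomeAI would distribute one config file to every partner, and a partner would override a single value only when needed.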
This accomplishes the following goals:
- Allows us to capture a config of buckets and keys we want to use so that they will be used as a default.
- Allows AwesomeAI to create this configuration and share it so that BigDataRider and CustomDataSurfer can easily get set up.
- Provides a way for any number of clients to encrypt a file in such a way that only AwesomeAI can read it (using AwesomeAI’s public key)
- Ensures that either the GPG encryption is correct or the KMS encryption header is set on the upload, so the file is always encrypted.
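The "either GPG or KMS" rule in the last goal can be approximated with two checks, sketched below. This is an illustration under stated assumptions, not S3S2's implementation, and the function names are hypothetical. The GPG check is a heuristic based on the OpenPGP message format (RFC 4880): an ASCII-armored message starts with `-----BEGIN PGP MESSAGE-----`, and a binary encrypted message normally starts with a session-key packet (tag 1 or 3). The KMS check reads the `ServerSideEncryption` and `SSEKMSKeyId` fields of a boto3 `head_object` response. There the key ID is reported as a full ARN:

```python
from typing import Optional

ARMOR_HEADER = b"-----BEGIN PGP MESSAGE-----"
SESSION_KEY_TAGS = (1, 3)  # public-key and symmetric-key encrypted session key packets


def _packet_tag(first_byte: int) -> Optional[int]:
    if not first_byte & 0x80:  # every OpenPGP packet header has the high bit set
        return None
    if first_byte & 0x40:      # new-format header: tag in the low 6 bits
        return first_byte & 0x3F
    return (first_byte >> 2) & 0x0F  # old-format header: tag in bits 5..2


def looks_like_gpg_encrypted(first_bytes: bytes) -> bool:
    """Heuristic only: suggests an encrypted OpenPGP message, not which key was used."""
    if first_bytes.lstrip().startswith(ARMOR_HEADER):
        return True
    return bool(first_bytes) and _packet_tag(first_bytes[0]) in SESSION_KEY_TAGS


def kms_encrypted(head_object_response: dict, expected_key_arn: Optional[str] = None) -> bool:
    """True if S3 reports SSE-KMS, optionally with a specific key ARN."""
    if head_object_response.get("ServerSideEncryption") != "aws:kms":
        return False
    return expected_key_arn is None or head_object_response.get("SSEKMSKeyId") == expected_key_arn


def upload_is_protected(first_bytes: bytes, head_object_response: dict,
                        expected_key_arn: Optional[str] = None) -> bool:
    """Accept an upload if it looks GPG-encrypted or S3 stored it with SSE-KMS."""
    return looks_like_gpg_encrypted(first_bytes) or kms_encrypted(head_object_response, expected_key_arn)
```

A plain CSV fails the GPG check, because its first byte is ASCII. It passes only if S3 reports `aws:kms` for the stored object.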