I'm in the midst of rewriting a big app that currently uses AWS S3 and will soon be switched over to Google Cloud Storage. This blog post is a rough attempt to log various activities in both Python libraries:
Disclaimer: I'm manually copying these snippets from a real project, scrubbing the code clean of unimportant quirks, hacks, and other unrelated things that would just add noise.
Install
boto3
$ pip install boto3
$ emacs ~/.aws/credentials
google-cloud-storage
$ pip install google-cloud-storage
$ cat ./google_service_account.json
Note: You need to create a service account, which gives you a .json file to download. Make sure you pass that file's path when you create a client.
I suspect there are more/other ways to do this with environment variables alone but I haven't got there yet.
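For what it's worth, I believe the environment-variable route goes through Application Default Credentials and looks something like this (the file path here is made up):

import os

from google.cloud import storage

# With GOOGLE_APPLICATION_CREDENTIALS pointing at the service
# account .json file, the default constructor should pick it up
# without being given an explicit path.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./google_service_account.json"
client = storage.Client()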
Making a "client"
boto3
Note, there are easier shortcuts for this, but with this pattern you have full control over things like read_timeout, connect_timeout, etc. via the config_params keyword arguments.
import boto3
from botocore.config import Config


def get_s3_client(region_name=None, **config_params):
    options = {"config": Config(**config_params)}
    if region_name:
        options["region_name"] = region_name
    session = boto3.session.Session()
    return session.client("s3", **options)
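For example, a hypothetical call with tighter timeouts than the defaults (the region and numbers are made up):

# All keyword arguments flow straight into botocore.config.Config.
s3_client = get_s3_client(
    region_name="us-west-2",
    read_timeout=5,
    connect_timeout=5,
    retries={"max_attempts": 2},
)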
google-cloud-storage
from google.cloud import storage


def get_gcs_client():
    # "settings" is this project's own configuration module (not
    # shown); it just holds the path to the downloaded .json file.
    return storage.Client.from_service_account_json(
        settings.GOOGLE_APPLICATION_CREDENTIALS_PATH
    )
Checking if a bucket exists and if you have access to it
boto3 (for s3_client here, see above)
from botocore.exceptions import ClientError, EndpointConnectionError

# BucketHardError and BucketSoftError are this project's own exceptions.
try:
    s3_client.head_bucket(Bucket=bucket_name)
except ClientError as exception:
    if exception.response["Error"]["Code"] in ("403", "404"):
        raise BucketHardError(
            f"Unable to connect to bucket={bucket_name!r} "
            f"ClientError ({exception.response!r})"
        )
    else:
        raise
except EndpointConnectionError:
    raise BucketSoftError(
        f"Unable to connect to bucket={bucket_name!r} "
        f"EndpointConnectionError"
    )
else:
    print("It exists and we have access to it.")
google-cloud-storage
from google.api_core.exceptions import BadRequest

try:
    gcs_client.get_bucket(bucket_name)
except BadRequest as exception:
    raise BucketHardError(
        f"Unable to connect to bucket={bucket_name!r}, "
        f"because bucket not found due to {exception}"
    )
else:
    print("It exists and we have access to it.")
Checking if an object exists
boto3
from botocore.exceptions import ClientError


def key_existing(client, bucket_name, key):
    """return a tuple of (
        key's size if it exists or 0,
        S3 key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    try:
        response = client.head_object(Bucket=bucket_name, Key=key)
        return response["ContentLength"], response.get("Metadata")
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "404":
            return 0, None
        raise
Note, if you do this a lot and often find that the object doesn't exist, then using list_objects_v2 is probably faster, as sketched below.
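A minimal sketch of that approach (the function name is mine; note that unlike head_object it doesn't give you the metadata):

def key_existing_via_list(client, bucket_name, key):
    """return the key's size if it exists, else 0."""
    # A missing object just means no "Contents" in the response,
    # so there's no exception to catch for the not-found case.
    response = client.list_objects_v2(Bucket=bucket_name, Prefix=key, MaxKeys=1)
    for obj in response.get("Contents", []):
        if obj["Key"] == key:
            return obj["Size"]
    return 0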
google-cloud-storage
def key_existing(client, bucket_name, key):
    """return a tuple of (
        key's size if it exists or 0,
        key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    bucket = client.get_bucket(bucket_name)
    blob = bucket.get_blob(key)
    if blob:
        return blob.size, blob.metadata
    return 0, None
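Both versions are called the same way. A made-up usage example:

# "client" is the S3 or GCS client from the sections above;
# the bucket and key names are made up.
size, metadata = key_existing(client, "my-bucket", "some/path/file.json")
if size:
    print(f"Exists: {size} bytes, metadata: {metadata}")
else:
    print("Does not exist")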
Uploading a file with a special Content-Encoding
Note: You have to use your imagination with regard to the source. In this example, I'm assuming that the source is a file on disk and that it might have already been compressed with gzip.
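For concreteness, a sketch of what "already compressed with gzip" could look like (the file names are made up):

import gzip
import shutil

# Compress a plain file on disk; the .gz result is what would get
# uploaded with compressed=True below.
with open("events.json", "rb") as f_in, gzip.open("events.json.gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)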
boto3
def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)
    metadata = metadata or {}

    # boto3 will raise a botocore.exceptions.ParamValidationError
    # error if you try to do something like:
    #
    #    s3.put_object(Bucket=..., Key=..., Body=..., ContentEncoding=None)
    #
    # ...because apparently 'NoneType' is not a valid type.
    # We /could/ set it to something like '' but that feels like an
    # actual value/opinion. Better just avoid if it's not something
    # really real.
    extras = {}
    if content_type:
        extras["ContentType"] = content_type
    if compressed:
        extras["ContentEncoding"] = "gzip"
    if metadata:
        extras["Metadata"] = metadata

    with open(file_path, "rb") as f:
        s3_client.put_object(Bucket=bucket_name, Key=key_name, Body=f, **extras)
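The get_key_content_type helper isn't shown in the post; presumably it's something like this, guessing from the file extension (an assumption on my part):

import mimetypes


def get_key_content_type(key_name):
    # Returns e.g. "application/json" for a .json key, or None
    # when the extension isn't recognized.
    content_type, _ = mimetypes.guess_type(key_name)
    return content_type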
google-cloud-storage
def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)
    metadata = metadata or {}

    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.blob(key_name)

    if content_type:
        blob.content_type = content_type
    if compressed:
        blob.content_encoding = "gzip"
    blob.metadata = metadata

    with open(file_path, "rb") as f:
        blob.upload_from_file(f)
Downloading and uncompressing a gzipped object
boto3
from io import BytesIO
from gzip import GzipFile

from botocore.exceptions import ClientError

from .utils import iter_lines


def get_stream(bucket_name, key_name):
    try:
        response = s3_client.get_object(Bucket=bucket_name, Key=key_name)
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "NoSuchKey":
            raise KeyHardError("key not in bucket")
        raise

    stream = response["Body"]
    # But if the content encoding is gzip we have to re-wrap the stream.
    if response.get("ContentEncoding") == "gzip":
        body = response["Body"].read()
        bytestream = BytesIO(body)
        stream = GzipFile(None, "rb", fileobj=bytestream)

    for line in iter_lines(stream):
        yield line.decode("utf-8")
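The iter_lines helper comes from the project's own utils module and isn't shown. Purely as an assumption, it might look something like this:

def iter_lines(stream, chunk_size=1024):
    """Yield lines (as bytes, without trailing newlines) from a
    file-like stream, reading in chunks rather than all at once."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.splitlines(True)
        # Hold back the last piece if it isn't a complete line yet.
        if lines and not lines[-1].endswith(b"\n"):
            pending = lines.pop()
        else:
            pending = b""
        for line in lines:
            yield line.rstrip(b"\r\n")
    if pending:
        yield pending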
google-cloud-storage
from io import BytesIO

from .utils import iter_lines


def get_stream(bucket_name, key_name):
    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.get_blob(key_name)
    if blob is None:
        raise KeyHardError("key not in bucket")

    bytestream = BytesIO()
    blob.download_to_file(bytestream)
    bytestream.seek(0)

    for line in iter_lines(bytestream):
        yield line.decode("utf-8")
Note that here blob.download_to_file works a bit like requests.get() in that it automatically notices the Content-Encoding metadata and does the gunzip on the fly.
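Either version can then be consumed the same way (the bucket and key names are made up):

for line in get_stream("my-bucket", "logs/2018-09-26.log.gz"):
    print(line)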
Conclusion
It's not fair to compare them on style because I think boto3 came out of boto, which probably started back in the day when Google was just web search and webmail.
I wanted to include a section about how to unit test against these, especially how to mock them. But what I had for a draft was getting ugly. Yes, it works for the testing needs I have in my app, but it's very personal taste (a.k.a. appropriate for the context) and admittedly quite messy.