Automatically Converting FLAC Files to MP3 With AWS Lambda and Python

Wanting to get some practice in with S3 Event Notifications and Lambda recently, I decided to create a setup to convert files from FLAC to MP3. I’ll share my thoughts and a basic tutorial here.

Which Conversion Software?

My initial thought was to utilize the Amazon Elastic Transcoder service to convert the FLAC files; however, the limit of 20 minutes of free output per month on the free tier made me question this option, and the regular price of $0.0045 per minute of converted audio cemented my decision to seek another option.

The cost wouldn’t make Elastic Transcoder unfit for experimentation, but it seems very cost-inefficient for this purpose. Since we’re already using Lambda (which is quite capable of handling the task on CPU) and the cost per second of execution is so low, we might as well use its compute power for the conversion. As a bonus, the software we’ll use (FFmpeg) has arm64 builds readily available, allowing us to take advantage of even more cost savings.

Creating the FFmpeg Lambda Layer

A pre-made Lambda layer for FFmpeg is available from the AWS Serverless Application Repository, and deploying it is an easy way to get started. The most recent version of Python it is marked as supporting is Python 3.6 (a pull request to update the template.yaml was submitted almost two years ago), but there’s no reason you can’t simply specify the layer’s ARN and use a more recent runtime.

Unfortunately, there is one more potential issue here: The layer is x64 only. While I could just clone the project and change two letters from amd64 to arm64, I opted to create the layer by hand.

To do so, download a static build of FFmpeg for the architecture of your choice, extract the files, and place them in a directory named bin so that they end up on the proper path (layers are extracted to /opt, and /opt/bin is on the function’s PATH). Then compress that directory as a zip file. You can create the layer in the console, either uploading the zip file directly or giving an S3 URI, or publish it with the AWS CLI (or an SDK), either uploading the file to S3 first or passing it with the --zip-file parameter.
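The packaging and publishing steps can also be scripted. The following is a minimal sketch rather than a definitive recipe: the function names, the layer name ffmpeg-arm64, and the file paths are all hypothetical, and it assumes boto3 is installed and credentials are configured wherever publish_layer runs.

```python
import stat
import zipfile


def build_layer_zip(ffmpeg_binary: str, zip_path: str) -> str:
    """Package a static ffmpeg binary as bin/ffmpeg inside a layer zip."""
    info = zipfile.ZipInfo("bin/ffmpeg")
    # Store the entry as a regular file with mode 0755 so it is still
    # executable after Lambda extracts the layer under /opt.
    info.external_attr = (stat.S_IFREG | 0o755) << 16
    info.compress_type = zipfile.ZIP_DEFLATED
    with open(ffmpeg_binary, "rb") as src, zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(info, src.read())
    return zip_path


def publish_layer(zip_path: str, layer_name: str = "ffmpeg-arm64") -> str:
    """Publish the zip as a new layer version and return its ARN."""
    import boto3  # imported lazily so the packaging step works without it

    with open(zip_path, "rb") as f:
        response = boto3.client("lambda").publish_layer_version(
            LayerName=layer_name,  # hypothetical name
            Content={"ZipFile": f.read()},
            CompatibleArchitectures=["arm64"],
        )
    return response["LayerVersionArn"]
```

If the archive is too large to upload directly, Content also accepts S3Bucket and S3Key in place of ZipFile, which corresponds to the "upload to S3 first" route.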

Creating the Lambda Function

Next, we’ll need a basic Lambda function for converting the FLAC files to MP3. I decided to go with 320 kbps, but you might wish to opt for lower quality.
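A handler for this step could look roughly like the sketch below. It is not the original listing, so line numbers mentioned elsewhere in this post will not match it. It assumes the FFmpeg layer puts an ffmpeg binary on the PATH, that the static build includes libmp3lame, and that the destination bucket name is in the MP3Bucket environment variable. The helper names are hypothetical, and removal events are simply skipped.

```python
import os
import subprocess
from urllib.parse import unquote_plus


def source_location(record: dict) -> tuple:
    """Return (bucket, key) for one S3 event record, with the key decoded."""
    bucket = record["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded: "01+My+Song.flac" must become "01 My Song.flac".
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


def mp3_key_for(key: str) -> str:
    """Swap the final extension for .mp3, keeping any key prefix."""
    return os.path.splitext(key)[0] + ".mp3"


def ffmpeg_command(src: str, dst: str) -> list:
    """Build the ffmpeg invocation for a 320 kbps MP3."""
    return ["ffmpeg", "-y", "-i", src, "-codec:a", "libmp3lame", "-b:a", "320k", dst]


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    s3 = boto3.client("s3")
    for record in event["Records"]:
        if not record["eventName"].startswith("ObjectCreated"):
            continue  # this sketch ignores removal events
        bucket, key = source_location(record)
        # /tmp is the function's writable ephemeral storage.
        src = os.path.join("/tmp", os.path.basename(key))
        dst = os.path.splitext(src)[0] + ".mp3"
        try:
            s3.download_file(bucket, key, src)
            subprocess.run(ffmpeg_command(src, dst), check=True)
            s3.upload_file(dst, os.environ["MP3Bucket"], mp3_key_for(key))
        finally:
            # Clean up so a reused execution environment does not fill /tmp.
            for path in (src, dst):
                if os.path.exists(path):
                    os.remove(path)
```

Lowering the 320k value in ffmpeg_command is all it takes to trade quality for smaller files.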

You might note the use of unquote_plus. S3 URL-encodes object keys in event notifications, so a space in a filename arrives as a plus sign. If you forget this step, you may run into failures when filenames have spaces. For example, if you try to use the key 01+My+Song.flac without unquoting it to 01 My Song.flac, downloading the object will fail on line 41.

When creating the function, make sure to add the FFmpeg layer.

A Word On Configuration

You may also notice the use of an MP3Bucket environment variable in the code above; if you use it, you’ll need to either set it for your Lambda function or replace it with a bucket name.

There’s also the matter of allocating memory, which also determines vCPU availability for the function. You might be tempted to set the memory to the minimum (128 MB) or some low value to try to keep costs down, but this is not necessarily the best approach. A function that uses more memory while finishing faster may ultimately result in lower cost (or lower free tier usage), although at some point you’ll see diminishing returns. Finding the right value for a particular use case (e.g. many small FLAC files, a few large files, or some mix) will likely require some experimentation and consulting CloudWatch Logs and/or execution time. I ultimately settled on 1024 MB for working with a mix of file sizes.

If you’re working with some particularly large FLAC files, you’ll probably also need to increase the amount of ephemeral storage above the default of 512 MB. Of course, this does come with an additional charge whether a given execution of the function actually uses that storage or not, so you’ll probably want to consider whether you actually need it.
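The settings discussed in this section can be applied in one call. Here is a sketch, assuming boto3 and using hypothetical function and bucket names. It also sets a timeout, which the text above does not cover: the 120-second value is my assumption, chosen because Lambda’s default timeout of 3 seconds is too short for a conversion that runs for many seconds.

```python
def function_settings(function_name, memory_mb=1024, storage_mb=512,
                      timeout_s=120, mp3_bucket="my-mp3-bucket"):
    """Build keyword arguments for update_function_configuration."""
    if not 128 <= memory_mb <= 10240:
        raise ValueError("memory must be between 128 and 10240 MB")
    if not 512 <= storage_mb <= 10240:
        raise ValueError("ephemeral storage must be between 512 and 10240 MB")
    return {
        "FunctionName": function_name,
        "MemorySize": memory_mb,                   # also scales the vCPU share
        "EphemeralStorage": {"Size": storage_mb},  # size of /tmp in MB
        "Timeout": timeout_s,                      # assumed value, in seconds
        # Note: this replaces the function's whole set of environment variables.
        "Environment": {"Variables": {"MP3Bucket": mp3_bucket}},
    }


def apply_settings(settings: dict) -> dict:
    """Send the settings to Lambda (requires boto3 and credentials)."""
    import boto3

    return boto3.client("lambda").update_function_configuration(**settings)
```

For example, apply_settings(function_settings("flac-to-mp3", storage_mb=2048)) would keep 1024 MB of memory while raising /tmp to 2 GB.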

Permissions

Provided both of the buckets and the Lambda function are in the same account, it’s sufficient to attach a policy to the Lambda function’s role to allow it to access the buckets in question. The following policy will allow for this:
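A policy of that shape can be sketched as follows. The bucket names are placeholders, and it grants only what the conversion itself needs: reading objects from the origin bucket and writing objects to the MP3 bucket. It is built as a Python dictionary so it can be printed as JSON or passed to an SDK call.

```python
import json


def conversion_policy(source_bucket: str, mp3_bucket: str) -> dict:
    """Return an IAM policy document for the function's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadFlac",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {
                "Sid": "WriteMp3",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{mp3_bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket names; substitute your own.
    print(json.dumps(conversion_policy("my-flac-bucket", "my-mp3-bucket"), indent=2))
```

If the function should also delete MP3s when the source FLAC is removed, the second statement would additionally need s3:DeleteObject.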

Since the function doesn’t need to communicate within a VPC, no network setup (such as a security group) is required.

Setting Up the Event

With that out of the way, it’s time to set up the event itself. If you navigate to the origin bucket in the Console and go to the “Properties” tab, you’ll find an “Event notifications” section. Click “Create event notification.”

Under “General configuration,” specify an event name and set the suffix to .flac to limit notifications to FLAC files. Under “Event types,” select “All object create events” as well as “All object removal events” (if desired).

Finally, for the destination, select or input the ARN of your Lambda function.
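The same notification can be set up from code instead of the console. This is a sketch, assuming boto3, with a hypothetical Id and helper names. Two caveats apply to this route: the call replaces the bucket’s entire notification configuration, and S3 must already have permission to invoke the function, which the console normally sets up for you.

```python
def notification_config(function_arn: str, suffix: str = ".flac",
                        include_removals: bool = False) -> dict:
    """Build the NotificationConfiguration for the origin bucket."""
    events = ["s3:ObjectCreated:*"]          # "All object create events"
    if include_removals:
        events.append("s3:ObjectRemoved:*")  # "All object removal events"
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "flac-to-mp3",  # hypothetical event name
                "LambdaFunctionArn": function_arn,
                "Events": events,
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


def apply_notification(bucket: str, config: dict) -> None:
    """Replace the bucket's notification configuration (requires boto3)."""
    import boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
```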

The Result

Provided everything has gone right, if you upload a FLAC file to the origin bucket, you’ll find a corresponding MP3 in the MP3 bucket a short time later.

From my testing, a roughly 245 MB, 5-minute, very high quality FLAC recording resulted in a billable duration of 22504 ms with the Lambda function configured with 1024 MB of RAM. This equates to 22.504 GB-seconds of usage, or roughly $0.0003 worth of usage at the “First 7.5 Billion GB-seconds / month” rate. This is ~6.67% of the price of Elastic Transcoder for one minute of audio. At 2048 MB of RAM, the billable duration drops to 11728 ms, for slightly higher GB-seconds usage (23.456) but a time not far off from the 11.298 seconds it took Elastic Transcoder to handle the same file.
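The arithmetic behind these figures is easy to reproduce. In the sketch below, the per-GB-second price is an assumption on my part (the arm64 rate for the first pricing tier), so check the current Lambda pricing for your region. Per-request charges are ignored.

```python
# Assumed arm64 price per GB-second in the first pricing tier; verify it
# against the Lambda pricing page for your region.
ARM_PRICE_PER_GB_SECOND = 0.0000133334
# Elastic Transcoder audio price per output minute, as quoted earlier.
ELASTIC_TRANSCODER_PER_MINUTE = 0.0045


def gb_seconds(memory_mb: int, billed_ms: int) -> float:
    """Convert configured memory and billed duration into GB-seconds."""
    return (memory_mb / 1024) * (billed_ms / 1000)


def compute_cost(memory_mb: int, billed_ms: int,
                 price: float = ARM_PRICE_PER_GB_SECOND) -> float:
    """Compute charge in dollars for one invocation."""
    return gb_seconds(memory_mb, billed_ms) * price


if __name__ == "__main__":
    # The two measurements reported above.
    for memory_mb, billed_ms in [(1024, 22504), (2048, 11728)]:
        usage = gb_seconds(memory_mb, billed_ms)
        cost = compute_cost(memory_mb, billed_ms)
        share = cost / ELASTIC_TRANSCODER_PER_MINUTE
        print(f"{memory_mb} MB: {usage:.3f} GB-s, ${cost:.6f}, "
              f"{share:.2%} of one Elastic Transcoder minute")
```

Plugging in your own billed durations from CloudWatch Logs makes it straightforward to compare memory settings for your particular mix of files.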

What about converting other formats? The script above doesn’t actually depend on the input file being a FLAC, so you can convert a wide range of formats, though converting from one lossy format to another isn’t recommended due to digital generation loss.

I hope you’ve enjoyed this quick look at S3 events and Lambda. Thanks for reading!

Jonathan Monreal