Integrating SQS, Lambda and Celery (Part 1)

In the Eventure app, users can upload photos and share them with friends or with others at an event. When those photos are uploaded, we save the images to S3, rescale the images for mobile apps (make thumbnails), save the scaled images to S3, and finally save all the image and thumbnail information in the database. But, based on some application profiling, we found that it was possible to push images to the server faster than they could be processed, and we could potentially overload the server given enough image upload activity.

There are multiple ways we could deal with the issue, but one option we wanted to explore was to establish the following workflow and offload image processing entirely to AWS Lambda:

  1. User uploads an image, which is immediately saved in Amazon S3
  2. Saving the image kicks off an AWS Lambda function (S3 serves as an event source for the Lambda function)
  3. The Lambda function creates several image thumbnails, and saves them to S3
  4. When the Lambda function completes, notify the Eventure backend that thumbnailing is done, along with information about the new thumbnail images

Steps 1-3 above are fairly standard uses of the AWS infrastructure. In fact, one of the AWS Lambda walkthroughs deals with image resizing. For step 4, we could have made the notification in several ways. One idea was to use Amazon Simple Queue Service (SQS) to receive a message from the Lambda function, and then have Celery consume and process the message for our app.

To do that, we needed to answer a couple of questions:

  • Can we make SQS act as a Celery message broker?
  • Can we push messages from outside Celery that workers can process?

We'll explore the first question in this blog post, and take up external messages in Part 2.

Setting up SQS as a Celery message broker

As a first step, we needed to get Celery and SQS talking. Amazon SQS is listed as an “experimental” broker in the Celery documentation, so let’s do some experimenting in a walkthrough to get Django, Celery, and SQS working together. In a fresh Python virtualenv, run:
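
The exact commands will vary with your environment, but roughly the following gets us there (on Celery 3.x the SQS transport relies on boto; newer releases bundle the dependencies as the celery[sqs] extra):

```bash
# Install Django, Celery, and the library the SQS transport uses.
pip install django celery boto

# Create the project and a demo app.
django-admin startproject celerysqs
cd celerysqs
python manage.py startapp queuedemo
```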

Ok, now we have a stub Django project with one app (queuedemo). Next, let’s edit the celerysqs/settings.py file to configure the application:
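
Here is a sketch of the additions that matter for this walkthrough; the visibility_timeout and wait_time_seconds values below are placeholders, and the AWS credentials are assumed to come from the environment (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or an IAM role:

```python
# celerysqs/settings.py (additions -- also add 'queuedemo' to INSTALLED_APPS)

# Use SQS as the Celery broker; credentials are picked up from the
# environment or an IAM role rather than embedded in the URL.
BROKER_URL = 'sqs://'

# Talk JSON instead of pickle so messages can be created outside Python
# (e.g. in AWS Lambda), and to silence the pickle security warning.
CELERY_TASK_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_RESULT_SERIALIZER = 'json'

# SQS doesn't support remote control or events, so turn them off.
# This also keeps Celery from creating an extra queue per worker machine.
CELERY_ENABLE_REMOTE_CONTROL = False
CELERY_SEND_EVENTS = False

# Skip rate limit bookkeeping; we don't need it here.
CELERY_DISABLE_RATE_LIMITS = True

# SQS-specific transport settings.
BROKER_TRANSPORT_OPTIONS = {
    'queue_name_prefix': 'dev-',   # gives us a queue named 'dev-celery'
    'visibility_timeout': 60,      # seconds before an unacked message is re-issued
    'wait_time_seconds': 20,       # long polling: hold the receive open up to 20s
}
```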

Ok, let’s go over some of the more interesting settings in this file.

  • Setting CELERY_TASK_SERIALIZER, CELERY_ACCEPT_CONTENT, and CELERY_RESULT_SERIALIZER to JSON ensures that we can communicate using JSON rather than Python pickles. This will be important when we want to create messages outside of Python in AWS Lambda. It also gets rid of an annoying warning about pickling being insecure.
  • The SQS broker doesn’t support remote control or events, so we disable those features by setting CELERY_ENABLE_REMOTE_CONTROL and CELERY_SEND_EVENTS to False. This also stops Celery from trying to create a new SQS queue for each machine running Celery jobs.
  • CELERY_DISABLE_RATE_LIMITS = True turns off rate limit tracking. You can leave rate limits enabled if you need them, but there is some overhead. Personalize to taste.
  • BROKER_TRANSPORT_OPTIONS contains configuration settings specific to SQS.
    • Setting queue_name_prefix will give us an SQS queue named ‘dev-celery’
    • visibility_timeout dictates how long a worker has to process a message before SQS can re-issue it to another worker
    • wait_time_seconds establishes long polling for SQS, meaning the worker will hold the connection open waiting for a message for up to this many seconds if there is no message to process. This cuts down on SQS queries (and costs) for queues that may not always have a message available.

Ok, now that the settings are established, we have a bit of housekeeping to do to set up a Celery task (this pretty much follows the Django Celery documentation). First, create a new file, celerysqs/celery.py:
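
A minimal version, following the pattern in the Celery 3.x Django guide:

```python
# celerysqs/celery.py
from __future__ import absolute_import

import os

from celery import Celery

# Make sure Django settings are importable before the Celery app is configured.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celerysqs.settings')

from django.conf import settings  # noqa

app = Celery('celerysqs')

# Pull all the CELERY_* options out of Django's settings module.
app.config_from_object('django.conf:settings')

# Find tasks.py modules in every installed app.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
```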

And edit celerysqs/__init__.py:
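
Importing the app here makes sure it is loaded whenever Django starts, so @shared_task binds to it:

```python
# celerysqs/__init__.py
from __future__ import absolute_import

from .celery import app as celery_app  # noqa
```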

Finally, set up a new task in the queuedemo app by adding a new module, queuedemo/tasks.py:
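
The actual task doesn't matter for the walkthrough, so a trivial placeholder (the add task below is hypothetical) is enough to exercise the queue:

```python
# queuedemo/tasks.py
from __future__ import absolute_import

from celery import shared_task


@shared_task
def add(x, y):
    # Stand-in for real work, e.g. recording thumbnail metadata.
    return x + y
```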

After all that, let’s do this! Let’s set up our database and start up a Celery worker:
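
Roughly the following (use syncdb instead of migrate on older Django versions; the worker needs AWS credentials available in its environment):

```bash
python manage.py migrate
celery -A celerysqs worker --loglevel=info
```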

Looks good. Now let’s switch to another console and fire up a task:
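
For example, from the Django shell, using the placeholder add task from above:

```console
$ python manage.py shell
>>> from queuedemo.tasks import add
>>> add.delay(2, 3)   # returns an AsyncResult; the work runs on the worker
```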

Looks good! On the Celery console, we see that the task was processed.

So, with a little work, we now have a working link between Celery and SQS. We can easily extend this with other tasks to offload jobs that don’t need to be done immediately.

In the next post, I’ll work through how to push a message back from AWS Lambda into SQS so that it can be processed by Celery. While we will do this inside a Lambda function, it should be useful to anyone who needs to send a job to a Celery worker from outside Python.

Image credit: National Archives and Records Administration
