How I resolved some tough docker problems on Amazon ECS

via GIPHY

ECS is Amazon’s elastic container service. If you have a dockerized app, this is one way to get it deployed in the cloud. It is basically an Amazon bootleg Kubernetes clone. And not nearly as feature rich! ๐Ÿ™‚

Join 38,000 others and follow Sean Hull on twitter @hullsean.

That said, ECS does work, and it will allow you to get your application going on Amazon. Soon enough EKS (Amazon’s Kubernetes service) will be production, and we’ll all happily switch.

Meantime, if you’re struggling with the weird errors, and when it is silently failing, I have some help here for you. Hopefully these various error cases are ones you’ve run into, and this helps you solve them.

Why is my container in a stopped state?

Containers can fail for a lot of different reasons. The litany of causes I found were:

o port mismatches
o missing links in the task definition
o shortage of resources (see #2 below)

When ecs repeatedly fails, it leaves around stopped containers. These eat up system resources, without much visible feedback. “df -k” or “df -m” doesn’t show you volumes filled up. *BUT* there are logical volumes which can fill.

Do this to see the status:


[[email protected] ~]# lvdisplay
--- Logical volume ---
LV Name docker-pool
VG Name docker
LV UUID aSSS-fEEE-d333-V999-e999-a000-t11111
LV Write Access read/write
LV Creation host, time ip-10-111-40-30, 2018-04-21 18:16:19 +0000
LV Pool metadata docker-pool_tmeta
LV Pool data docker-pool_tdata
LV Status available
# open 3
LV Size 21.73 GiB
Allocated pool data 18.81%
Allocated metadata 6.10%
Current LE 5562
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2

[[email protected] ~]#

Related: 30 questions to ask a serverless fanboy

Why am I getting this error “Couldn’t run containers – reason=RESOURCE:PORTS”?

I was seeing errors like this. Your first thought might be that I have multiple containers on the same port. But no I didn’t have a port conflict.

What was happening was containers were failing, but in inconsistent ways. So docker had old copies still sitting around.

On the ecs host, use “docker ps -a” to list *ALL* containers. Then use “docker system prune” to cleanup old resources.


INFO[0000] Using ECS task definition TaskDefinition="docker:5"
INFO[0000] Couldn't run containers reason="RESOURCE:PORTS"
INFO[0000] Couldn't run containers reason="RESOURCE:PORTS"
INFO[0000] Starting container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-redis
INFO[0000] Starting container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-main
INFO[0000] Starting container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-postgres
INFO[0000] Describe ECS container status container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-postgres desiredStatus=RUNNING lastStatus=PENDING taskDefinition="docker:5"
INFO[0000] Describe ECS container status container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-redis desiredStatus=RUNNING lastStatus=PENDING taskDefinition="docker:5"
INFO[0000] Describe ECS container status container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-main desiredStatus=RUNNING lastStatus=PENDING taskDefinition="docker:5"

INFO[0007] Stopped container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-postgres desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="docker:5"
INFO[0007] Stopped container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-redis desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="docker:5"
INFO[0007] Stopped container... container=750f3d42-a0ce-454b-ac38-f42791462b76/sean-main desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="docker:5"

Related: What’s the luckiest thing that’s happened in your career?

3. My container gets killed before fully started

When a service is run, ECS wants to have *all* of the containers running together. Just like when you use docker-compose. If one container fails, ecs-agent may decide to kill the entire service, and restart. So you may see weird things happening in “docker logs” for one container, simply because another failed. What to do?

First look at your task definition, and set “essential = false”. That way if one fails, the other will still run. So you can eliminate the working container as a cause.

Next thing is remember some containers may startup almost instantly, like nginx for example. Because it is a very small footprint, it can start in a second or two. So if *it* depends on another container that is slow, nginx will fail. That’s because in the strange world of docker discovery, that other container doesn’t even exist yet. While nginx references it, it says hey, I don’t see the upstream server you are pointing to.

Solution? Be sure you have a “links” section in your task definition. This tells ecs-agent, that one container depends on another (think of the depends_on flag in docker-compose).

Related: Curve ball interview questions and answers

4. Understanding container ordering

As you are building your ecs manifest aka task definition, you want to run through your docker-compose file carefully. Review the links, essential flags and depends_on settings. Then be sure to mirror those in your ECS task.

When in doubt, reduce the scope of your problem. That is define *only one* container, then start the service. Once that container works, add a second. When you get that working as well, add a third or other container.

This approach allows you to eliminate interconnecting dependencies, and related problems.

Related: Are generalists better at scaling the web?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are you getting errors building Amazon lambda functions? Don’t fret I got you!

aws lambda python

This post has gotten popular. So I wrote a better easier guide to diving into serverless lambda here. It’s quicker. I promise!

Why does Amazon make lambda functions so hard to create? Well my guess is that when you live at the bleeding edge you should expect to get scrapes!!

Check out: How daily notes help me work better with clients

Everybody is trying to build lambda functions these days. And it’s no wonder. Once you get them running, Amazon takes care of all the infrastructure drudge work! So cool.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve been spending some time trying to get answers out of AWS support, and let me tell you it’s no fun. Yes all this stuff is new technology, and nobody has expertise in it in the way say you might in Linux or Oracle or another technology that’s been around for a decade.





Still you’d hope the techs would have some clue. In the end it was a slog dealing with support, and I think I was the one teaching them!

I did find Matt Perry’s howto which is pretty good.

Hopefully my own notes can help someone, so read on!

1. No lambda_function?

The very first issue you’re gonna run into is if you name the file incorrectly, you get this error:


Unable to import module 'lambda_function': No module named lambda_function

If you name the function incorrectly you get this error:


Handler 'handler' missing on module 'lambda_function_file': 'module' object has no attribute 'handler'

On the dashboard, make sure the handler field is entered as function_filename.actual_function_name and make sure they match up in your deployment package.

If only the messages were a bit more instructive that would have been a simpler step, but oh well!

Also: Is Amazon too big to fail?

2. No module named MySQLdb

This is a very tricky one. I mean after all you just spent all this time building your deployment package specifically for lambda, so what gives??


"Unable to import module 'lambda_function': No module named MySQLdb"

Turns out when you use a virtualenv, files will be installed into proj/lib/python2.7/site-packages/ or lib64. However Lambda wants them in the root proj/ directory! So move them there. I know I know. Weird, but that’s what they want.

Related: When hosting on Amazon turns bloodsport

3. Can’t find libmysqlclient

If you’re using the MySQLdb library like I was, you’ll eventually bump into this error:


Unable to import module 'lambda_function': libmysqlclient.so.18: cannot open shared object file: No such file or directory

Turns out that /usr/lib/libmysqlclient.so.18 needs to be COPIED from /usr/lib. Don’t do “mv” or your system won’t have the mysql lib anymore!

Related: Are SQL databases dead?

4. Use the Amazon Lambda environment

One thing the support pointed out is that AWS as *supported images* for lambda development.

After all the errors above were resolved, it’s not clear to me that the supported AMI’s are truly required. However if you’re hitting intractable problems building a properly lambda deploy, you might wanna look at building one of these boxes.

Read: Why dropbox didn’t have to fail

5. Build your lambda deploy package

Now let’s roll it all together. Here’s are all the steps to build your deploy package.


- SSH to the instance
- mkdir test
- virtualenv test
- source proj/bin/activate
- sudo yum groupinstall 'Development Tools'
- sudo yum install mysql
- sudo yum install mysql-devel
- pip install MySQL-python
- cd test
- emacs -nw lamdba_function.py
- add your code to that file
- save the lambda_function.py
- mv proj/lib/python2.7/site-packages/* proj/
- mv proj/lib64/python2.7/site-packages/* proj/
- rm -rf proj/lib (don't need dist-packages in the deploy pkg)
- rm -rf proj/lib64 (don't need dist-packages
- zip -r proj.zip *

Also: How to hire a developer that doesn’t suck?

6. Upload your code

Uploading your code via the AWS dashboard is fine when you’re first testing things. But after a while it’ll get tiring going in the front door.

Create a new lambda function by specifying the basics as follows:


aws lambda create-function \
--function-name testfunc1 \
--runtime python2.7 \
--role arn:aws:iam::996225510001:role/lambda_basic_execution \
--handler lambda_function_file.handler_name \
--zip-file file://proj.zip

And when you want to update your function, do the following:


aws lambda update-function-code \
--function-name testfunc1 \
--zip-file file://proj.zip

Also: How to deploy on EC2 with Vagrant

Good luck with lambda. Once you get past Amazon’s weak documentation it’s pretty cool to be in a serverless computing environment. Happy deploying!

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters