What was the best decision you made in your career?

I was recently asked this question by a colleague. I thought about it for a moment, but the answer was quite clear.

For me it's easy. Going independent has been the best decision of my career.

I started at the birth of the internet explosion, in the mid-nineties when Mozilla became real. The dot-com era took off, and so did the demand for engineering talent.

1. Going independent

For me the timing was right. I had just moved to New York, and I had experience running my own business in my teen years. That streak of independence drove me to do the same with my technology skills. Call it a hunger. A need to go it alone, and make my own way in the world.

Related: When you have to take the fall

2. Self directed career

The advantages of going it alone are a double-edged sword. On the one hand you can steer towards projects you find interesting, and upgrade your skills in those directions. The downside is that you're taking on all the risk. If you're wrong about the direction of the industry, you'll have wasted your time, money, and resources.

I wrote previously about that in Why do people leave consulting. It’s one reason among many.

Related: When clients don’t pay

3. Wide ranging exposure

In the traditional FT career track, you may work for 5-10 companies over the course of 20 years. In my case I've worked for close to 200 firms in that time.

In that process, you get exposure: to human problems & challenges, to product design & development problems, and to architectural issues. At that scale, patterns begin to emerge, as you see certain types of issues repeat themselves. This becomes valuable insight.

Related: Why I ask for a deposit

4. Build survival skills

As I mentioned previously, independence is a double-edged sword. You build survival skills, but you need them. There's no net beneath you to break your fall. So you're forced to make hard decisions about how you spend your time: finding projects, networking, learning new skills, and delivering in a real way to your customers.

The dividend is that now you have survival skills. And those are indeed very valuable.

Related: Why I ask for a deposit

5. Good money

There is a myth that consultants make more money. But then I hear stories of someone getting laid off with a four or five month severance. That's shocking to me. What's more, people often forget about the value of days off, health care & other benefits, and the huge one: upgrading your skills. If a firm is offering you these, take advantage!

Remember that you’ll get none of these benefits working for yourself, unless you’re successful enough to reward yourself in this way. That means having a good pipeline of projects, and a trail of happy successful customers behind you. They will tell your story, and sell you to colleagues.

Related: Why I ask for a deposit

Thinking deeply about Amazon cloud & infrastructure code

If you’re building anything in the public cloud these days, you’re probably using some automation. There are a lot of ways to reach the goal posts, and a lot of tools to choose from.

In my case I’ve put Terraform to use over and over again. I’m built vpcs, public & private subnets, and bastion boxes for mobile apps for mental health & fitness, building security, and two factor authentication apps.

I’ve chosen Terraform because it has a vibrant & growing community, the usability is miles ahead of CloudFormation, and it can work in a multi-cloud environment.

But this article isn’t about choice of tools. I’m curious about this one question:

“What architectural considerations should I keep in mind as I build my infrastructure code?”

Here are my thoughts on that one…

The VPC is your fundamental container

Everything you build sits inside of a VPC. Your entire stack references back to those identifiers, including the vpc-id and the private and public subnet IDs.
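As a rough Terraform sketch of what that looks like (names & CIDR ranges here are hypothetical), everything downstream hangs off these IDs:

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# every other resource references back to the container
resource "aws_security_group" "app" {
  vpc_id = aws_vpc.main.id
}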

Here’s an example where you can get into trouble. Digging through some infra code, reviewing with a new devops hire, we were going through everything with a fine toothed comb. We found that the RDS instance was being deployed in PUBLIC subnet, instead of private.

Alerted to the problem, we first checked whether it was accessible from the internet at large. It wasn't, because we had not exposed a public facing IP. That said, it wasn't the most secure setup, and I wanted to fix it.

I made some changes to the Terraform code to move the instance to a private subnet, and tried "$ terraform apply". Then I got all sorts of errors. Try as I might, this update would not work.

Sadly the long term solution was to destroy the entire stack, and rebuild with RDS in the right place. Lesson learned.

Related: When you have to take the fall

Why I discovered a shared or utility vpc was so useful

This lesson came from placing ELK inside an application vpc. With centralized logging living inside one application's vpc, every destroy & rebuild of that stack took the logging infrastructure down with it. Moving shared services like ELK into their own utility vpc decouples their lifecycle from any single application.

Related: Before you do infrastructure as code, consider your workflow

Think carefully about domains

As you build your application, you’ll likely need a route 53 zone associated with it. And you’ll want a CNAME in front of your load balancer, so it’s easy for customers to hit your endpoint.

1. Rebuilding the stack means a new zone & new nameservers

If your registrar is elsewhere, you'll need to update nameservers each time you destroy & rebuild the zone. This happens even if you host the domain at AWS. And it can't be automated at the moment.

You could also have the zone created *outside* terraform. Then your terraform code would reference that zone, and add CNAMEs to it by using its zone ID. This is another possible pattern.
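A minimal sketch of that pattern, assuming a hypothetical example.com zone, and an aws_lb named app defined elsewhere in your code:

data "aws_route53_zone" "main" {
  name = "example.com."
}

resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "CNAME"
  ttl     = 300
  records = [aws_lb.app.dns_name]
}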

The pattern I prefer is to give each vpc & stack its own unique top level domain. That way terraform can cleanly create and destroy the whole stack, and nothing is commingled.

Related: I tried to do infrastructure as code, it didn’t go as I expected

Enable easy create & destroy

Each time you tear down your work and rebuild, you test the whole process. This is good. Iron out those hiccups before they cause you trouble. After some time, you'll be able to move your entire application stack, db, ec2 instances, vpc & network resources from one region to another easily & quickly.

After doing this a few times, you’ll start to learn what resources in AWS are region specific. And which ones are global.

Remember, don’t allow any manually created objects or resouces inside your automated ones. If you aren’t strict here, you’ll hit errors when you try to destroy, and then have to troubleshoot those one-by-one.

Related: How to setup an Amazon ECS cluster with Terraform

Automate first

I was building an ELK server setup to centralize our logging infra. Everything worked pretty well. After a time, I added some more S3 buckets for load balancer log ingestion.

Later we hit a problem, where the root volume was filling up. This stopped new logs from appearing. So we rebuilt the ELK server with a 10x larger root volume. As we had used terraform and ansible, the rebuild was easy, and quickly we had our logging system back online.

A week or so later though we had trouble again. It seemed that some of the load balancer log data wasn’t showing up. We spent a day troubleshooting, and eventually found out why. Those S3 buckets weren’t being ingested.

Turns out when we added those buckets, we added them to the config file *directly* on the server, but not in the configuration scripts.

Moral of the story…

“Always update the automation scripts first, and apply those to the server. Don’t work on the server directly.”

Related: Are you getting good at Terraform or wrestling with a bear

Beware of account limits

As you’re building your stack in us-east-1, you may later go and try to create another copy of it. Suddenly AWS complains that there are no VPCs left. Or you’ve hit a maximum of 20 EC2 instances. While these errors may be irritating, you should be glad to have them. With them in place, and errant piece of infra code or application cannot accidentally run up your account and receive a surprise bill.

That said you should be mindful of those limits, and increase them before your application hits a wall.

A few that I’ve run into:

o 20 ec2 instances per region
o 5 Elastic IPs per region
o 5 VPCs per region

If your application requires more, prepare to switch regions, or up those service limits. You can open a support ticket to do that.
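These days you can also inspect & raise limits from the command line with the Service Quotas API. A rough sketch; the quota code below is illustrative, so verify it with the list call first:

$ aws service-quotas list-service-quotas --service-code vpc
$ aws service-quotas request-service-quota-increase \
    --service-code vpc \
    --quota-code L-F678F1CE \
    --desired-value 10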

Related: How do you interview for key AWS skills?

What resources live on past a stack build/destroy cycle

As you build your stack with infrastructure code, you’ll tear it down again often. Each time you do this, you’ll be reminded of one thing. Any data inside there will be gone.

That means, for starters, don't store things in the filesystem. Store them in a database. RDS is great for this purpose. Then the question becomes: when I destroy my stack, how do I backup and restore my database? RDS does support this, but if you have more nuanced requirements, you may have to build your own backup & restore.
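A minimal sketch of that cycle, with a hypothetical instance named mydb. Note the restored copy comes up outside terraform's state, so you'd import it or copy the data over afterward:

$ aws rds create-db-snapshot \
    --db-instance-identifier mydb \
    --db-snapshot-identifier mydb-pre-destroy

$ terraform destroy
$ terraform apply

$ aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier mydb-restored \
    --db-snapshot-identifier mydb-pre-destroy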

What about cloudwatch logs? As long as your stack doesn’t destroy those resources, they’ll be kept in perpetuity for you. You may want to further back them up.

What about your load balancer logs? Here you can create the S3 log bucket *outside* of your infra code. In that case it won't get cleaned up during a destroy. Alternatively you can create a meta bucket for load balancer logs, and copy those over regularly. Then when you cleanup your infra, you can do a bucket destroy with the --force option.
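That second cleanup looks something like this; bucket names are hypothetical:

$ aws s3 sync s3://my-lb-logs s3://my-meta-logs
$ aws s3 rb s3://my-lb-logs --force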

Related: Is Amazon too big to fail?

Some things remain manual but can be made easier

One thing that remains manual in AWS is the SSL certificate creation. You can “request” a new certificate and select DNS validation.

When you do this, incorporate the validation CNAME name and its record value into the infra code as variables. Then copy/paste these two values from your certificate dashboard.

Assuming your nameservers are pointing to AWS, the certificate validation process should spot that secret record in DNS. When it does, it will conclude that you control the zone, and therefore validate your SSL certificate.

Once it shows VALID on the dashboard, copy the SSL certificate ARN, and pop that into your terraform code. You will add it as part of an SSL listener to your ALB (application load balancer) configuration.
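Pieced together, the pattern might look roughly like this. Every value here is a placeholder you'd copy from the ACM dashboard, and the zone data source, aws_lb, and target group are assumed to exist elsewhere in your code:

variable "cert_validation_name" {}
variable "cert_validation_value" {}
variable "cert_arn" {}

resource "aws_route53_record" "cert_validation" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = var.cert_validation_name
  type    = "CNAME"
  ttl     = 300
  records = [var.cert_validation_value]
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.cert_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}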

Related: Does AWS have a dirty little secret?

Don’t just monitor metrics

Monitoring is of course important. You'll want to set up a prometheus server that can do discovery. This allows it to dynamically configure itself and learn about new servers, so it can monitor those too. It does this by using the AWS API to find out what there is to monitor.

All of this monitoring is crucial, but it applies mainly to server metrics: load average, CPU utilization, memory and disk usage.

What about your log counts? As I described above, a logstash misconfiguration meant that log records didn't show up. However this was only noticed through manual discovery. We want it to be discovered automatically!

Do that by creating checks that count records, and alert when the numbers fall outside the range you expect.
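A crude sketch of such a check, assuming a hypothetical internal ELK endpoint, logstash's default daily index naming, and a mail command available for alerting:

#!/bin/bash
# count today's log records; alert if below an expected floor
COUNT=$(curl -s "http://elk.internal:9200/logstash-$(date +%Y.%m.%d)/_count" | jq .count)
if [ "$COUNT" -lt 10000 ]; then
    echo "log count $COUNT below threshold" | mail -s "log ingestion alert" ops@example.com
fi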

You can also validate other data with checksums, by creating your own custom methods. You’ll need to think intelligently about your application, and what type of checks make sense.

Related: How I resolved some tough Docker problems on Amazon ECS

Does migrating to the cloud require a mindset change for your team?

We’ve all heard the success stories at firms that have grappled with automation. The dividends are legendary.

Take Amazon themselves for example. By decoupling their teams, allowing each to grow independently and at their own pace, they’ve been able to scale massively.

One look at the AWS dashboard these days, or their wikipedia page, reveals over 90 services on offer. And each of those is growing and expanding by the day.

I’ve worked with a lot of startups, trying to get there. They’ve heard the gospel, and want to gain the benefits themselves.

Here are the challenges I’ve found.

1. Building ain’t easy

One example story was building an ELK box. ELK is Elasticsearch, Logstash and Kibana. It provides a centralized place to send all your application & service logs, collecting them together on one dashboard. It's the business intelligence of devops & software development. A super valuable tool.

In building our solution, we took a marketplace AMI off the shelf, and then customized it. After writing the terraform code to spin up the server, we added Ansible scripts to customize it further. This allowed us to add a cronjob for backups, set a password, add additional logstash configs, and handle a few other important housekeeping tasks.

All was great until we hit a snag: some CloudWatch logs were not making their way into ELK. Digging through the log messages, we eventually uncovered an error, caused by a conflicting port configuration. So we removed the unused port from logstash.conf, and problem solved.

Later, we rebuilt the server, and that was pretty quick. Having all the scripts in place meant we could rebuild quickly. In this case we just needed to resize the root volume by 25x to make room for future logs. That was 3 lines of terraform code, and done!

A couple of weeks later however, we found missing logs again. Digging, digging, digging, and then we finally discovered it was a repeat of our old problem! Turns out the change to logstash.conf never got rolled into the automation scripts. It was done manually! Bad bad!

Moral of the story: with automation, your workflow needs to change. You should *always be working on the scripts* and then reapplying them. Never work on the server directly!

Time to eat my own dogfood!

Related: Is AGILE right for fixing performance issues?

2. Troubleshooting is tough

In the automation universe, as I wrote above, you really want to avoid logging into servers and doing things manually. But that may be easier said than done.

Take another example: I had an ssh key distribution script, repurposed from the Terraform Community Modules. It works great, when it works. It gets injected onto the server at boot time, by terraform, inside the user-data script.

The code gets added to cron, and relies on awscli. As it turns out awscli is *not* on all of the aws linux images. Who knows why?!? But that’s where we are.

It should be easy to install. Use yum to get pip (the python package manager) installed. Then use pip to install awscli. The script even has *both* yum and apt-get commands, to attempt to install pip on either ubuntu or amzn linux. Problem is, sometimes it doesn't work. Sometimes? You ask. Yes indeed.

Digging further, it seems the newer pip package gets installed in /usr/local/bin, while it used to install in /usr/bin. Seems simple: add a symlink. Yeah, did that. Sometimes the package has a different name, such as python-pip3. Great!
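Here's a defensive sketch of that install logic for user-data. Treat it as one way to cover both distros under the assumptions above, not a universal fix:

#!/bin/bash
# install awscli if it's missing, trying both package managers
if ! command -v aws >/dev/null 2>&1; then
    yum install -y python-pip 2>/dev/null || apt-get install -y python3-pip 2>/dev/null
    pip install awscli 2>/dev/null || pip3 install awscli
    # newer pip drops binaries in /usr/local/bin; make sure cron jobs can find them
    [ -x /usr/local/bin/aws ] && ln -sf /usr/local/bin/aws /usr/bin/aws
fi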

Now all this is magnified because you can't just go on the box and step through it manually. Why? Because in the primordial cro-magnon universe that is linux server boot time, things sometimes happen in weird orders, or slower. So something may be missing during that window which is available later. After boot, you see no errors.

Yes, complicated. Yes, you need to build, destroy, build, destroy the server in endless cycles.

At the next level of automation, we will implement an infrastructure testing pipeline, which will automatically build the server for you. The infrastructure unit testing framework seems pretty darn cool. And there is also Gruntwork's Terratest.

Related: Is automation killing old-school operations?

3. The dividend is agility

What have I seen in terms of agility?

Well, moving our application to a new region takes 20 minutes. Crazy as that sounds, everything from the vpc, to 3 private subnets, 3 public subnets, bastion boxes, load balancers, rds & redis instances, security groups, ingress rules, iam roles, users, s3 buckets, the ecs cluster, various ec2 instances, route 53 zones & cnames, plus even EIPs, all can be moved with a few simple code changes. Wow!

What else? We can resize our ELK box root volume by deploying a brand new setup, all in about ten minutes.

This kind of speed is so exciting. It brings repeatability to your engineering processes. It brings confidence to all of those components.

And best of all it allows the business to experiment with new product ideas, and accelerate in the marketplace.

And we all know what that means!

Related: I have a new appreciation for AGILE

What types of management problems plague startups?

Being an avid reader of Fred Wilson's AVC, I've learned much over the years. One thing he underscores is that *ideas* are a dime a dozen, and that the great investments are in team & execution.

As a long time consultant, I've had the good fortune to see a lot of startups under the microscope. If you work a FT role for 2 years at a time, over a decade you may work at 5 companies. In the same amount of time, I've worked at over 65 companies.

In those years I've encountered great teams that are super organized, and continue to move the product forward. But I have also seen a number of symptoms that caused businesses problems, and slowed their march forward.

Low morale

One firm I worked at a few years back was in the education space, specifically microlearning products, with big customers doing corporate onboarding.

Their sales team was world class, closing bigger and bigger deals, but engineering had terrible, festering problems. As it went, they grew to hundreds of employees in a matter of a year or two. Meanwhile the CTO was not a big people person. He didn't like speaking in front of large groups, nor was he very hands on. In a small ten person startup he was super technical and talented, but as the company grew so fast, it left a leadership vacuum.

And then some bad hires grew the engineering team fast. Internally there was a lot of infighting. The original founding team worked hard and had strong direction, but the new hires all vied for control, and the ugly personalities reared their heads.

After a few short weeks, half the engineering team quit, in a matter of days. A tough blow to a team already struggling to keep up with growth.

It is not easy to right a large plane in mid-flight like that, carrying plenty of technical debt besides.

Related: A CTO must never do this

Bad alignment

Another place I had the pleasure of working at was a well known digital media brand that had expanded into film production, recording, and even investigative reporting. For all its wide ranging efforts, it presided over a huge growth business, with seemingly unlimited revenue. Impressive to be sure.

On the technology side, however, things were not so sunny. As the business grew, they planned to consolidate data from many disparate divisions. This is a process many growing businesses go through. Finance lives in one platform & database, bookings & production in another, while analytics and viewer statistics sit in yet a third. But how do you report on all of that data?

As a special crack team of big data experts, we were assigned the task of building out this centralized repository of business truth. And as we built and architected that system, we needed to work closely with the operations division.

Now this business was using public cloud, Amazon Web Services, like many other startups. However a separate team of devops presided over these accounts.

As our team was handed strict deadlines to deliver working reports & systems, we had conference calls with the devops team. That team, however, was not on board with those deadlines. They pushed back, and claimed such systems would take months to set up.

When we explained the expectations being pushed onto our shoulders, devops said "just push back and say no". They advised that we "send it back up the chain".

But what if there’s a chink in the chain?

Clearly the two teams were not aligned at all on deadlines & deliverables. And that's not the fault of either team. It falls straightaway in the lap of management to align them.

And we were somehow stuck in the middle. Ugh!

Related: How to avoid legal trouble in consulting

Loose discipline

One startup I worked at had a security and authentication app.

Here teams were fairly happy on the whole. In fact they raved about having a great boss. Indeed the boss was a very kind leader, understanding, patient, and hardworking.

However, over and over, we lacked a "decider". Team members were giving each other tasks. Promises were made loosely, and then forgotten a week or two later. And a constant lack of direction dragged down delivery.

For my money, a promise to have a meeting at 10am is one all parties should abide by, whatever their level in the business. Don't be late, don't make excuses about trains, and don't simply skip the meeting with no explanation. These types of habits cause the team to grow weary, and lower the bar of expectations.

Frustrating indeed.

Related: When you have to take the fall

Has Apple jumped the shark or is app complexity out of control?

I was recently using my mac. I say that in jest, because like all digital creatures, moderns, everyone with a pulse these days I suppose, I use my mac every day.

So when I see something strange, it's rather jarring. When you use a device every day, you become quite familiar with it. And with that familiarity comes comfort.

So imagine my surprise when doing a task I do very often: simply trying to add a contact from an email into my address book. I got the lovely and strange dialog you see above.

Now it might not scare most users much. I mean, you know, just click one or the other. Simple right? Wrong!

1. Click DELETE

As a long time database admin, I know that before you ever delete *ANYTHING*, you make a backup first. But in the above scenario, the backup menu item is blanked out. You cannot do a backup first, unless you go to system backups.

Of course I did that, and that took me on a multi-hour wild goose chase, that is still not resolved.

Related: Why Fred Wilson was wrong about Apple

2. Click KEEP

Why not choose that option? Well, my contacts database is sort of sacred. It's where I keep track of everyone I've ever worked with. I never gave LinkedIn permission to write anything to my contacts database, so who knows if they got in there, or how. But I certainly don't want them in there. If I click KEEP, how the *heck* will I ever get thousands of contacts out of my database again?

Related: Why the Android ecosystem *was* and maybe still *is* broken

3. Where’s my shark?

Sadly, this is the state of computing these days. I don't know if I was accidentally tricked into *allowing* LinkedIn to dump its contacts into my database, or if an upgrade changed some default, sneaking the data in through a side channel. Whatever the case, I'm stuck now.

And Apple was supposed to protect us from this craziness. Apple where’ve you gone?

Related: Do Linux & Apple tell the Gilgamesh story of hacker culture?

4. Complexity is a hard beast to wrestle

This may be the ultimate culprit. As more apps hit the app store, more code paths lie dormant, and more bugs, call them surprise features, lie undiscovered.

With all this power in our hands, it seems to devolve by the day into a bigger and bigger mess.

Related: Is Apple betting against big data?

Before you do infrastructure as code, consider your workflow carefully

What happens with infrastructure as code when you want to make a change on prod?

I’ve been working on automation for a few years now. When you build your cloud infrastructure with code, you really take everything to a whole new level.

You might be wondering: what does the day to day workflow look like? It's not terribly different from regular software development. You create a branch, write some code, test it, and commit it. But there are some differences.

1. Make a change on production

In this scenario, I have the development branch reference the repo directly on my laptop. There is a module, but it is referenced locally, like this:

source = "../"

So here are the steps:

1. Make your change to terraform in main repo

This happens in the root source directory. Make your changes to .tf files, and save them.

2. Apply change on dev environment

You haven’t committed any changes to the git repo yet. You want to test them. Make sure there are no syntax errors, and they actually build the cloud resources you expect.

$ terraform plan
$ terraform apply

fix errors, etc.

3. Redeploy containers

If you’re using ECS, your code above may have changed a task definition, or other resources. You may need to update the service. This will force the containers to redeploy fresh with any updates from ECR etc.

4. Eyeball test

You’ll need to ssh to the ecs-host and attach to containers. There you can review env, or verify that docker ps shows your new containers are running.

5. Commit changes to version control

Ok, now you’re happy with the changes on dev. Things seem to work, what next? You’ll want to commit your changes to the git repo:

$ git commit -am "added some variables to the application task definition"

6. Now tag your code – we’ll use v1.5

$ git tag -a v1.5 -m "added variables to app task definition"
$ git push origin v1.5

Be sure to push the tag (the second command above).

7. Update stage terraform module to use v1.5

In your stage main.tf where your stage module definition is, change the source line:

source = "git::https://github.com/hullsean/infra-repo.git?ref=v1.5"

8. Apply changes to stage

$ terraform init
$ terraform plan
$ terraform apply

Note that you have to run terraform init this time. That's because you are using a new version of your module code, so terraform has to go fetch the whole thing and cache it in the .terraform directory.

9. Apply the change on prod

Redo steps 7 & 8 for your prod-module main.tf.

Related: When you have to take the fall

2. Pros of infrastructure code

o very professional pipeline
o pipeline can be further automated with tests
o very safe changes on prod
o infra changes managed carefully in version control
o you can back out changes, or see how you got here
o you can audit what has happened historically

Related: When clients don’t pay

3. Cons of over automating

o no easy way to sidestep
o manual changes will break everything
o you have to have a strong knowledge of Terraform
o you need a strong in-depth knowledge of AWS
o the whole team has to be on-board with automation
o you can’t just go in and tweak things

Related: Why I ask for a deposit

I have a new appreciation for Agile and not because it worked

A couple of years ago I worked at a startup in the online publishing space. But this story isn’t about online publishing. This story is about deadlines, and not meeting them.

For those who don’t know me, I’m a professional services consultant. I’ve worked on a project basis for 90% of two plus decades of my professional career. I mentioned this to give my opinions and perspectives some context. Although I’m not a manager, I’ve worked under more managers than most. Because I work on 5-10 projects per year, that comes to close to two hundred in my career.

My career path has given me a unique perspective on teams, leadership and motivation. Of course, like anyone who's worked in tech, I've seen a lot of agile.

Daily standups are de rigueur of course. As are breaking up your work into two week sprints, and assigning story points to those tasks.

I have to admit there were times when Agile seemed like merely the most fashionable way teams were working. And I guess I believed it was more trendy than functional.

Doesn’t everybody already work off tight todo lists, break tasks down into manageable pieces, and always deliver what they promised? It may be because I’ve spent a lot of time managing myself, and surviving as a freelancer that I’ve picked up some of these habits. But I digress…

1. Crunch time

While we as a team had been working on two week sprints, something happened to derail us. Suddenly a major customer was in jeopardy, because we had not delivered on sales promises.

Although we *could* build what was being promised, we were stumbling over the details.

As an emergency plan, we dropped our current sprint tasks, and marshaled our forces towards this new goal. Everyone on the team knew the customer was hanging by a thread.

Related: When you have to take the fall

2. We need to deliver production by Friday

While this edict looks great on paper, it didn’t work out so well. Developers and ops weren’t clear which systems were included in “production”, and how they needed to be connected together.

Each engineer ended up interpreting the goal in their own way, often assuming that others were shouldering ultimate responsibility for delivery.

“Well I did my part, this other part of the team is responsible for those other pieces…” was the refrain I heard. Sadly the deadline was missed, and everything was a mess. Only after management later dug in and sifted through the rubble, did they actually break up the tasks, assign story points, and give each engineer actionable items to work on.

Related: When clients don’t pay

3. Please work together to make that happen

This doesn’t work because “production” is not an actionable item.

Actionable is much more specific:

o deploy this container
o setup these five servers
o fix these three bugs and push code
o setup these new environment variables

Why is “production” not specific?

Which application? Which layer or API must work? Are there intervening steps before production will work? Are there individual tasks for each engineer?

To me you need to “break things down” further if you have tasks that span multiple people, and multiple sessions. I think of a session as a 2-3 hour bucket of productive work. For me it is also the length of time my laptop battery takes to drain from 100% to 0%. When that happens I know I need a break.

And I know that’s how I get chunks of work done.

So to take this to a more specific level: if Friday is 5 work days away, I figure each engineer has about 12 increments of work in that time, and with 3 engineers, that means I need 36 chunks of work.

If you assign 36 chunks of work I believe you are much more likely to get 25-30 of those chunks done.

If you stick to the one macro goal of "get production to work", engineers may secretly drop the ball, figuring: well, there are dependent tasks that are not done, so we're not gonna get there anyway. And since the goal points at everyone, nobody shoulders the failure.

Whereas if you have 36 chunks of work, individual engineers are less likely to drag their feet, because it'll be clear that the holdup was three of John's tasks…

Related: Why I ask for a deposit

All of this gave me a new appreciation for Agile. It truly does keep teams on track, and focuses everyone on actionable work. It gives management more transparency into what's working, and engineers the feedback they need to get to the finish line.

How to avoid legal problems in consulting

I posted a newsletter recently entitled “When Clients Don’t Pay”.

I got a lot of responses in email, which is always encouraging. I’m happy to know that folks are reading and getting something out of my ideas.

One colleague suggested that I modify my last point about going to court. He suggested that legal action does make sense after other avenues are exhausted.

My feeling about avoiding court has only grown stronger over the years.

There are usually only a few reasons a customer won't pay. In my experience each of them is avoidable without going to court.

Here are my thoughts on those…

1. Misaligned on tasks, deliverables or deadlines

I find weekly progress reports and endless notes go a long way towards avoiding this problem. If it does arise, there is usually something specific in those notes that can be remedied.

One also needs to be willing to compromise. Putting yourself in the other’s shoes will help to understand their perspective.

Communicate, communicate, and communicate more!

Related: When you have to take the fall

2. Budget problems

Here there isn’t a lot to do anyway. Although companies are obligated to meet payroll by law, they are not so with vendors. If they are out of cash, will court really resolve that?

My way of heading off this problem is billing & invoicing in smaller increments, getting a deposit, and keeping on top of things, so larger debts don't build up.

Related: The fine art of resistance

3. Shady customers

These I usually suss out well before becoming engaged. I've had a few incidents where a prospect was meeting me to get "free advice". They ask a lot of architectural questions, and take careful notes. Then they don't engage, or they use their own people to implement.

One situation in particular I remember was around scalability. The product was a website & app for teachers. From the beginning they had built it to sync data instantly. As they got bigger and more customers used the platform, their servers became heavily loaded.

I suggested, instead of looking for a technical solution, why not offer your customers silver, bronze & gold service levels? The gold customers, yeah, they get their own servers, and can sync all the time. But for the silver ones, once-a-day would probably suffice. Much less load on the servers, because 75% of customers would go silver, 20% bronze and 5% gold.

They actually ran with the idea and implemented it, but never hired me, even for an hour of work. I knew they implemented it because I had a friend in the company. Experiences like that teach you quite a lot about business, and about how you conduct yourself.

This has happened a few times, and I guess it's part of doing business. But usually it comes out before we go much further, so in a sense it's a blessing in disguise. 🙂

Related: How to hire a developer that doesn’t suck

30 questions to ask as you plan for multicloud

I enjoy reading Corey Quinn's Last Week in AWS newsletter. And I also really like his podcast Screaming in the Cloud. This week Corey talks with Jay Gordon from MongoDB. While Jay seems somewhat of an advocate of multicloud, Corey is decidedly critical. Which makes for a great interview!

Corey also wrote a piece, The Myth of Cloud Agnosticism. I like any writing that dubs a popular trope a "myth", because it's an opportunity to poke holes in optimism.

It is through this process that we become realistic, which is crucial to being reliable in operations and engineering.

Corey argues that multicloud, with respect to multiple infrastructure providers, is usually a crappy idea. That's because the cloud providers are evolving, your application is evolving, and it costs you in feature velocity. What's more, it provides dubious instant uptime in the DR realm.

The topic reminds me of similar myths in computer science: the myth of cross-platform development, the myth of the cross-platform database, and the myth of object relational modeling.

As always your mileage may vary. Here are my questions. Hope they can help provide perspective, and critical thinking around this.

1. Do you plan to use multiple cloud providers for infrastructure? And deploy your application twice?
2. Do you plan to use multiple SaaS providers?
3. Does hybrid cloud make sense? That's the option where you deploy a data link to a public cloud, while keeping some assets in your own datacenter.
4. Are there feature parallels across your chosen clouds? Or are there feature mismatches?
5. Your cloud providers have independent service level agreements. Are they consistent or not?

Related: When you have to take the fall

6. What does the outage history look like for each of your providers?
7. What is the potential for a fat-finger outage on each platform? For example, one may be unduly complicated and prone to mistakes, based on its API or dashboard interface.
8. Is one cloud more complicated to implement? For example Amazon Web Services, while being more feature rich, is also much more complicated to deploy to than a Digital Ocean setup.
9. You can see your backups on both platforms. Have you done restores on both? Regularly? Recently?
10. Do you have time to automate everything twice? For example you may need to rewrite your ansible playbooks for each platform.

Related: When clients don’t pay

11. What is driving your business to embrace the idea of multicloud?
12. Do you have time to rewrite scripts twice? one-off and user-data scripts alike?
13. Do you have time to firedrill twice? Smoketest twice?
14. Will different clouds fail in different ways? For example one might be weak around its network. Another might be weak around its database service, and a third might encounter multi-tenant traffic congestion (disk or network).
15. When one cloud doesn't support a feature, e.g. lifecycle policies on S3 buckets, do you need to build it yourself for the other cloud?

Related: Is Amazon too big to fail?

16. Will deploying multicloud encourage abstraction layers Ex: object relational modelers (ORM) which heavily slow down performance?
17. Have you tested performance on both clouds?
18. Is cloud #2 a temporary disaster recovery solution or an on-going load balancing solution via geo-dns?
19. If you go hybrid cloud, how does that impact security, firewalls, and access controls?
20. How do you monitor your object stores (S3), scanning for open buckets? Do you rewrite this code for the other API?

Related: Why I ask for a deposit

21. What are the disaster types you’re planning for?
22. What is the cost of maintaining your application on multiple platforms?
23. What is the cost of building infra for multiple platforms?
24. What is the cost of debugging & troubleshooting on multiple platforms?
25. How does multicloud complicate deployments?

Related: How to hire a developer that doesn’t suck

26. Does multicloud complicate GDPR and other compliance questions?
27. How does multicloud complicate your billing and budget management?
28. What about microservices? How will they play across two clouds?
29. Is the community around each cloud equally active?
30. If you deploy with an infrastructure as code language like Terraform, is there an active community there for both of your chosen clouds?

Related: Why generalists are better at scaling the web

31. Does each provider support customers well? What are their respective reputations?
32. Is each cloud provider equally solvent & invested in the business? Will they be around in a year? five years? ten years?
33. What complications arise when migrating to or from this provider?

I tried to understand Amazon EKS internals and here’s what happened

EKS is a service to run kubernetes, so you don't have to install the software, or manage or patch it. Just like GKE on Google, kubernetes as a service is really the way to go if you want to build kubernetes apps on AWS.

So where do we get started? AWS docs are still coming together, so it's not easy. I would start with Jerry Hargrove's amazing EKS diagram. If a picture is worth a thousand words, this one is worth 10,000!

1. Build your EKS cluster

I already did this in Terraform. There aren’t a lot of howtos, so I wrote one.

Basically you set up the service role, then the cluster, then the worker nodes. Once you've done that, you're ready to run the demo app.

Related: When you have to take the fall

2. Build your app spec

These are very similar to ECS tasks, but you'll need to make slight changes. mountPoints become volumeMounts, links get removed, workingDirectory becomes workingDir, and so on. Most of these changes are obvious, but the json syntax is the biggest bear you'll wrestle with.
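For instance, a container fragment that looks like this in an ECS task definition (names hypothetical):

"mountPoints": [{ "sourceVolume": "data", "containerPath": "/var/data" }],
"workingDirectory": "/app"

becomes this in your kubernetes spec:

"volumeMounts": [{ "name": "data", "mountPath": "/var/data" }],
"workingDir": "/app"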

When done do this:

$ kubectl apply -f my-controller.json

Related: When clients don’t pay

3. Build the service spec

The service is quite a bit different from an ECS service. I suggest starting from the guestbook service. Find it here.

Edit that and add your own app name & details. Then apply:

$ kubectl apply -f my-service.json

Related: Why I ask for a deposit

4. Get the endpoint and go!

$ kubectl get service -o wide

You should see the EXTERNAL-IP display a loadbalancer endpoint. Copy that into your browser and you should see your app running.

Related: Why I ask for a deposit
