Tag Archives: Amazon

Is maintenance as sexy as innovation?

A recent NYT piece on our aging American infrastructure got me thinking. It seems that roads, bridges, airports & city sewer systems are all in need of repair. Sadly, because budgets to keep these systems in good repair are often short, they become larger problems to fix once their condition turns critical.

Join 37,000 others and follow Sean Hull on twitter @hullsean.

“Americans have an impoverished and immature conception of technology, one that fetishizes innovation as a kind of art and demeans upkeep as a mere drudgery.”

I’m not sure this is an American-only phenomenon. However I do see it a lot with technology companies & startups.

1. Do we have to manage ops in the cloud?

The cloud has enabled infrastructure automation in some pretty phenomenal ways. Code pipelines can deliver changes to a repo, through automated unit testing, and out to customers all without human intervention. This makes teams more agile, and ultimately businesses faster & more profitable.

We might be distracted enough to stop worrying about operations altogether. After all, Amazon knows how to manage broken servers & alert us, right? I wrote about this previously in Do we have to manage ops in the cloud?, as this sentiment seems to be growing.

Modern applications have a ton of interdependencies. Even with decent integration testing, the full stack is complex, and requires monitoring. Co-tenancy can complicate your performance tuning efforts as neighboring customers may directly affect your application. Third party services may be delivered from smaller or less experienced companies, whose SLA may be limiting besides. And hey if Amazon goes down, I can just tell my customers it was their fault, right?

Also: Is Amazon too big to fail?

2. Do you know Dustin Moskovitz?

Chances are you’ll say no. He was part of the original Facebook team alongside Zuckerberg. You don’t know his name? He had the sexy job of, you guessed it, maintenance! He was the operations guy. Did he write the application code? More than likely he knew that code very well, as he had to fix & maintain it, along with the infrastructure to scale & support Facebook’s massive growth.

Read: Is AWS too complex for small dev teams? The growing demand for Cloud SRE

3. Is a little technical debt ok?

Ward Cunningham has an excellent interview about technical debt. Is a little bit ok? Maybe. But each amount is kicking the can down the road. As the NYT article on maintenance makes clear, you can move the responsibility on to the next administration, the next term, or someone else, but eventually you’ll have a critical problem on your hands, which will be much more expensive to fix.

Related: How to build an operational datastore on Amazon Redshift with S3

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What does the fight between Palantir & NYPD mean for your data?

In a recent BuzzFeed piece, NYPD goes to the mat with Palantir over their data. It seems the NYPD has recently gotten cold feet.

As they explored options, they found an alternative that might save them a boatload of money. They considered switching to an IBM alternative called Cobalt.

And I mean this is Silicon Valley, what could go wrong?

Related: Will SQL just die already?

Who owns your data?

In the case of Palantir, they claim to be an open system. And of course this is good marketing. Essential in fact to get the contract. Promise that it’s easy to switch. Don’t dig too deep into the technical details there. According to the article, a Palantir spokesperson claims:

“Palantir is an open platform. As with all our customers, their data & analysis are available to them at all times in an open & nonproprietary format.”

And that does appear to be true. What appears to be troubling NYPD isn’t that they can’t get the analysis, for that’s available to them in perpetuity. Within the Palantir system. But getting access to how the analysis is done, well now that’s the secret sauce. Palantir of course is not going to let go of that.

And that’s the devil in the details when you want to switch to a competing service.

Also: Top serverless interview questions for hiring aws lambda experts

Who owns the algorithms?

Although the NYPD can get their data into & out of the Palantir system easily, that’s just referring to the raw data. That’s the data they ingested in the first place, arrest records, license plate reads, parking tickets, stuff like that.

“This notion of how portable your data is when you engage in a contract with a platform is really, really complex, and hasn’t really been tested” – Tal Klein

Palantir’s secret sauce, their intellectual property, is finding the needle in the haystack. What pieces of data are relevant & how can I present the detectives the right information at the right time.

Analysis *is* the algorithms. It’s the big data 64 million dollar question. Or in this case $3.5 million per year, as the contract is reported to be worth!

Related: Which engineering roles are in greatest demand?

The nature of software as a service

The web is bringing us great platforms, like Google & Amazon cloud. It’s bringing a myriad of AI solutions to our fingertips. Palantir is providing a push button solution to those in need of insights like the NYPD.

The Cobalt solution that IBM is offering goes the other way. Build it yourself, manage it, and crucially control it. And that’s the difference.

It remains to be seen how the rush to migrate the universe of computing to Amazon’s own cloud will settle out. Right now they’re in a growth phase, so it’s all about lowering prices. But at some point their market muscle will mean they can go the Oracle route a la Larry Ellison. That’s when customers start feeling the squeeze.

If the NYPD example is any indication, it could get ugly!

Read: Can on-demand consulting save startups time & money?

Do we have to manage ops in the cloud?

One of the things that is exciting about the cloud is the reduced need for operations staff. There seem to be two drivers of this trend. One is devops, and all the automation that comes with it. As we formalize configurations, things become repeatable, and fewer people can manage greater armies of servers.

The second is that by moving to a cloud hosting provider, we essentially outsource the operations to their team.

1. Pretty abstractions? Still hardware buried somewhere

That’s right, beneath all the virtual EC2 instances & VPCs there is physical hardware. Huge datacenters sit in Northern Virginia, Oregon, Ireland, London and many other cities. Within them there are racks upon racks of servers. The hypervisor layer, the abstraction built on top of that, orchestrates everything.

Although we outsource the management of those datacenters to Amazon, there are still responsibilities we carry. Let’s dig into those more.

Also: Top serverless interview questions to ask an expert

2. Full-stack dev – demand for generalists?

These days we see the demand for a full stack developer. That is someone who does not only front end dev, but also backend. In turn, they are often asked to wear the hat of ops. Spin up an EC2 instance, decide on the capacity & size, choose proper disk I/O, place it within the right subnet & VPC & then configure the security groups properly.

All of these tasks would previously have been managed by a dedicated ops team, but now those responsibilities are being put on developers’ shoulders. In some cases, such as with microservices, devs also carry the on-call duties of their applications.

Lastly there is likely ops to handle automation. Devops will formalize configurations into Ansible playbooks or Chef recipes, so they can be checked into version control. At this point infrastructure can even be unit tested.

Read: Build an operational datastore on aws S3 with Spectrum

3. Design, resiliency, instrumentation, debugging

In previous eras, ops teams would be heavily involved with design of applications & architecture to support that. Now that may be handed to devs, but it still needs to happen.

Furthermore resiliency is said to be the customer’s responsibility. In the pre-cloud days, hardware was more reliable. It had a lower failure rate. Virtual machines are expected to fail, and all the components to make your applications resilient are given to you. But it’s your job to architect them together.

That means your applications need to be self-healing. Failures need to be detected, taken out of autoscaling groups, and replaced. All automatically. Code or not, that is certainly operations.
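To make the detect, remove & replace loop concrete, here is a minimal sketch in plain Python. It is a toy simulation only: the `AutoScalingGroup` class, its `reconcile` method and the instance ids are made up for illustration and are not an AWS API. On AWS the real work would be done by health checks and the autoscaling service itself.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class AutoScalingGroup:
    """Toy stand-in for an autoscaling group; not a real AWS API."""
    desired_capacity: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def launch(self) -> Instance:
        """Add one new (healthy) instance with the next sequential id."""
        instance = Instance(f"i-{next(self._ids):04d}")
        self.instances.append(instance)
        return instance

    def reconcile(self) -> list:
        """Drop unhealthy instances, then launch replacements up to capacity."""
        failed = [i for i in self.instances if not i.healthy]
        self.instances = [i for i in self.instances if i.healthy]
        while len(self.instances) < self.desired_capacity:
            self.launch()
        return [i.instance_id for i in failed]


group = AutoScalingGroup(desired_capacity=3)
group.reconcile()                   # initial launch of three instances
group.instances[1].healthy = False  # simulate a failed health check
replaced = group.reconcile()        # failed instance removed, new one launched
```

The point of the sketch is that nobody is paged to do the replacement. Someone still has to design, test & monitor that loop, and that is operations work.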

Check this: Which engineering roles are in top demand?

4. It’s Amazon’s fault we’re down!

I’ve seen quite a few outages in the past year, from Dropbox to Airbnb, and DYN themselves. Ultimately these outages could be tied back to a failure with Amazon. But when your business customers are relying on your service, it is *YOUR* business that answers to its own SLA.

In the news we see many of these firms pointing the finger at Amazon, “hey it’s not our fault, our cloud provider went down!”. Ultimately your customers don’t care. They don’t want excuses. If using multiple regions in AWS is not sufficient, you’ll need to build your application to be multi-cloud.

Also: 30 questions to ask a serverless fanboy

5. It’s hard to outsource your expertise

Remember, while you outsource your operations to Amazon, you’re getting very professional management of those systems. However they will optimize for their many customers. Your particular problems are less of a concern.

Read this: What can startups learn from the DYN DNS outage?

6. Only you can think holistically about interdependencies

Your application more than likely uses a number of APIs to capture data, perhaps do single sign on or even a third party database like Firebase. It’s your responsibility to do integration testing. All that becomes more complex in the cloud.

Also: How to lock down systems from outgoing employees

7. How do services complicate things?

SaaS solutions are everywhere now. Auth0, Firebase and an infinite variety of third party APIs complicate reliability, security, storage, performance, integration testing & debugging.

Security is a traditional responsibility held by the operations hat. Much of that becomes more complex in the cloud. With serverless applications for example you may use a few APIs, plus an authentication broker, and a backend database. As this list of services grows, the code you write may decrease. But testing & securing it all becomes much more complex.

With more services like this, the attack surface becomes greater. Each of those services can and will have bugs. What if a zero day is found in the authentication broker, allowing a hacker to break into a broad cross section of applications across the internet? How do you discover this? What if your vendor hasn’t found out yet?

Read: Is Amazon cloud too complex for small dev teams?

8. How does co-tenancy impact performance tuning?

Back to point #1 above, all these virtual servers sit on real physical servers. That affects customers in two ways. One, you may be sharing the same host. That is, if you use a very small VM, it may sit alongside another customer with a small VM. If those eat up CPU cycles or network on that box, neighbors or co-tenants will suffer.

There are many other instance types where you get your own dedicated hardware. With those you have your own NIC as well, so no competition. Except wait, there’s network storage! That’s right, all the machines in the AWS environment use EBS now, which is all co-tenant. So your data is sitting alongside other customers, and you are all fighting for usage of the same disk read heads.

One way to mitigate this is to configure specific provisioned IOPS for your servers. But that costs more. It’s normally reserved for database instances where disk I/O is really crucial.

Granted the NewRelics of the world will certainly help us with this process. But they’re not giving us a hypervisor-level or global view of those servers, network or storage. So we can’t see how the overall system’s performance may be impacted.

Related: Is AWS a patient that needs constant medication?

9. Operations can be invisible

When security is done well, you don’t have break-ins, you don’t have data stolen, everything just runs smoothly. Operations is like that too. When it is done well it can be invisible.

It can also be invisible in a different way. When you deploy your application on serverless, all the servers & autoscaling are completely abstracted away. So when you get some weird outage because the farm of servers is offline, or because you hit some account limit in the number of functions you can run at once, then it quickly comes into focus.

Beware of invisible operations, because it’s harder to see what to monitor, and know how to stay ahead of outages.

Read: Is amazon too big to fail?

10. We can’t outsource true ownership

At the end of the day you can’t outsource ownership of your application or your business. The holistic view of your application in totality can only be understood by your engineers.

And that in the end is what operations is all about, no matter who’s wearing the hat!

Also: 5 reasons to move data to amazon redshift

Get my monthly newsletter for more thoughts on data, startups & innovation. Scalability. Automation. Amazon cloud.

Is Amazon about to disrupt your data warehouse?

Amazon is about to launch a product called Glue. As you can see below, this is the last piece in the data warehousing puzzle. With that in place, Amazon will own you! Or at least have push button products to meet all of an enterprise’s varying needs.

Even if you’re a small startup, you can do big-shot big enterprise data warehousing. That means everyone can use cutting edge data driven techniques for product & business decisions.

What is Redshift?

Redshift is like the OLAP databases of years past, the Oracles of the world, purpose built for warehousing data. Obviously without the crazy licensing model Oracle was famous for. With Amazon you can get an enterprise-class data warehouse for modest hourly prices.

If my recent conversations with recruiters about Redshift demand are any indication, there’s been a sudden uptick in startups looking for Redshift expertise.

Also: Top serverless interview questions for hiring aws lambda experts

What is Spectrum?

Spectrum is a very new extension of Redshift allowing you to access & query S3 file data directly. This means you can have petabytes of data that you can access pre-load time. So you will ETL and load portions of it, but with Spectrum you can still access the offline data too.

In the old Oracle days this was called an EXTERNAL TABLE. I mention this only to say that Amazon isn’t doing anything that hasn’t been done before. Rather they’re bringing these advanced features within reach of everyday startups. That’s cool.

Related: Which engineering roles are in greatest demand?

What is Glue?

Glue is still in beta, but if the re:Invent talk above is any indication, it’s set to disrupt an entire industry. Wow!

Glue first catalogs your data sources. What does this mean? It scans them & models their schemas.

It then generates sample Python ETL code. Modify it, or write your own. Share your code on Git. Or borrow other open source pieces that already address your specific ETL use case!

Lastly it includes a job scheduler which handles dependencies. Job A must be completed before B can run and so forth. Error handling & logging are also all included.
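The "job A must finish before job B" idea is just a dependency graph. As a rough illustration (this is not Glue’s own API, and the job names below are made up), Python’s standard library can work out a valid run order:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names; each job maps to the set of jobs it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "load_redshift": {"join_orders_customers"},
}


def run_order(deps):
    """Return one valid execution order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

Both extract jobs come out before the join, and the load comes last. A real scheduler adds retries, logging & parallelism on top of this ordering.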

Since these are native Amazon services, of course they’re going to integrate with their dangerously fast Redshift warehouse.

Read: Can on-demand consulting save startups time & money?

What is serverless?

I’ve written about how to throw fastballs at a serverless fanboy and even how to hire a serverless expert. But really what is it?

Serverless means deploying functions directly into the cloud. No servers, no configuration. All the systems administration & automation is hidden. No more devops to argue with! Amazon’s own offering is called Lambda.

Also: 30 questions to ask a serverless fanboy

What is Quicksight?

Amazon’s even jumped into the fray at the presentation layer. Quicksight is a BI tool along the lines of Mode, Domo, Looker or Tableau.

Now it’s possible to stay completely within the cozy Amazon ecosystem even for business insight and analytics.

Also: What can startups learn from the DYN DNS outage?

Top questions to ask a devops expert when hiring or preparing for job & interview

[Image: xkcd strip by Randall Munroe; xkcd.com]

Whether you’re a hiring manager, head of HR or recruiter, you are probably looking for a devops expert. These days good ones are not easy to find. The spectrum of tools & technologies is broad. To manage today’s cloud you need a generalist.

If you’re a devops expert and looking for a job, these are also some essential questions you should have in your pocket. Be able to elaborate on these high level concepts as they’re crucial in today’s agile startups.

Check out: 8 questions to ask an aws ec2 expert

Also new: Top questions to ask on a devops expert interview

And: How to hire a developer that doesn’t suck

1. How do you automate deployments?

A. Get your code in version control (git)

Believe it or not there are small 1 person teams that haven’t done this. But even with those, there’s real benefit. Get on it!

B. Evolve to one script push-button deploy (script)

If deploying new code involves a lot of manual steps, move file here, set config there, set variable, setup S3 bucket, etc, then start scripting. That midnight deploy process should be one master script which includes all the logic.

It’s a process to get there, but keep the goal in sight.
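Here’s a minimal sketch of what such a master script might look like in Python. The step names and commands are placeholders (each one just prints a message), so treat it as a skeleton under those assumptions, not a finished deploy tool. The key behavior is that it runs every step in order and stops at the first failure.

```python
import subprocess
import sys

# Hypothetical steps; substitute the commands your own deploy needs.
DEPLOY_STEPS = [
    ("run tests", [sys.executable, "-c", "print('tests ok')"]),
    ("build", [sys.executable, "-c", "print('build ok')"]),
    ("push", [sys.executable, "-c", "print('push ok')"]),
]


def deploy(steps):
    """Run each step in order; stop at the first failure and report it."""
    completed = []
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return completed, name  # name of the step that failed
        completed.append(name)
    return completed, None


completed, failed_step = deploy(DEPLOY_STEPS)
```

Once all the logic lives in one place like this, handing it to a CI tool later is mostly a matter of calling the same script.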

C. Build confidence over many iterations (team process & agile)

As you continue to deploy manually with a master script, you’ll iron out more details, contingencies, and problems. Over time you’ll gain confidence that the script does the job.

D. Employ continuous integration tools to formalize process (CircleCI, Jenkins)

Now that you’ve formalized your deploy in code, putting these CI tools to use becomes easier. Because they’re custom built for you at this stage!

E. 10 deploys per day (long term goal)

Your longer term goal is 10 deploys a day. After you’ve automated tests, team confidence will grow around developers being able to deploy to production. On smaller teams of 1-5 people this may still be only 10 deploys per week, but still a useful benchmark.

Also: Top serverless interview questions for hiring aws lambda experts

2. What is microservices?

Microservices is about two-pizza teams. Small enough that there’s little bureaucracy. Able to be agile, focus on one business function. Iterate quickly without logjams with other business teams & functions.

Microservices interact with each other through APIs, deploy their own components, and use their own isolated data stores.

Function as a service, Amazon Lambda, or serverless computing enables microservices in a huge way.

Related: Which engineering roles are in greatest demand?

3. What is serverless computing?

Serverless computing is a model where servers & infrastructure do not need to be formalized. Only the code is deployed, and the platform, AWS Lambda for example, takes care of instant provisioning of containers & VMs when the code gets called.

Events within the cloud environment, such as a file added to an S3 bucket, trigger the serverless functions. API Gateway endpoints can also trigger the functions to run.
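As a rough sketch of the S3 case, here is what a Python function handling that event might look like. The `handler(event, context)` shape and the `Records` / `s3` / `bucket` / `object` fields follow the S3 notification format; the bucket name, key and return value are made up for illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Lambda-style entry point: collect the bucket and key of each new S3 object."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    return {"processed": uploaded}


# Trimmed sample event with a hypothetical bucket and key.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-uploads"},
                "object": {"key": "reports/march+summary.csv"}}}
    ]
}
result = handler(sample_event, None)
```

Notice there is no server, port or process management anywhere in the code. The platform decides when & where the function runs.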

Authentication services such as Auth0 or Amazon Cognito are used for user login & identity management. The backend data store could be DynamoDB or Google’s Firebase for example.

Read: Can on-demand consulting save startups time & money?

4. What is containerization?

Containers are like faster deploying VMs. They have all the advantages of an image or snapshot of a server. Why is this useful? Because you can containerize your microservices, so each one does one thing. One has a webserver, with a specific version of xyz.

Containers can also help with legacy applications, as you isolate older versions & dependencies that those applications still rely on.

Containers enable developers to setup environments quickly, and be more agile.

Also: 30 questions to ask a serverless fanboy

5. What is CloudFormation?

CloudFormation formalizes all of your cloud infrastructure into JSON (or YAML) files. Want to add an IAM user, S3 bucket, RDS database, or EC2 server? Want to configure a VPC, subnet or access control list? All these things can be formalized into CloudFormation files.

Once you’ve started down this road, you can check your infrastructure definitions into version control, and manage them just like you manage all your other code. Want to do unit tests? Have at it. Now you can test & deploy with more confidence.
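To give a feel for it, here’s a small sketch that builds a one-resource template in Python and inspects it the way a unit test might. The logical id `MediaBucket` and the helper names are my own assumptions for illustration. The template keys (`AWSTemplateFormatVersion`, `Resources`, `Type`, `Properties`) are standard CloudFormation.

```python
import json


def build_template(bucket_logical_id="MediaBucket"):
    """Return a minimal CloudFormation template declaring one S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with a single S3 bucket",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"}
                },
            }
        },
    }


def resource_types(template):
    """List the resource types a template declares, for use in simple tests."""
    return sorted(r["Type"] for r in template["Resources"].values())


template = build_template()
template_json = json.dumps(template, indent=2)  # what you would check in
```

A test can then assert, for example, that the only resource type declared is `AWS::S3::Bucket` before anything touches a real account.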

Terraform is not part of CloudFormation. It is a separate tool from HashiCorp that does a similar job, and it also works across other cloud providers.

Also: What can startups learn from the DYN DNS outage?

Some irresistible reading for March – outages, code, databases, legacy & hiring

I decided this week to write a different type of blog post. Because some of my favorite newsletters are lists of articles on topics of the day.

Here’s what I’m reading right now.

1. On Outages

While everyone is scrambling to figure out why part of the internet went down … wait, is S3 part of the internet, really? While I’m figuring out if it is a service of Amazon, or if Amazon is so big that Amazon *is* the internet now…

Let’s look at s3 architectural flaws in depth.

Meanwhile Gitlab had an outage too in which they *gasp* lost data. Seriously? An outage is one thing, losing data though. Hmmm…

And this article is brilliant on so many levels. Not least because Matthew knows that “post truth” is a trending topic now, and uses it in his title. So here we go, AWS Service status truth in a post truth world. Wow!

And meanwhile the Atlantic tries to track down where exactly are those Amazon datacenters?

Also: Is Amazon too big to fail?

2. On Code

Project wise I’m fiddling around with a few fun things.

Take a look at Jeff Geerling’s Ansible on a Mac playbooks. Nice!

And meanwhile a very nice deep dive on Amazon Lambda serverless best practices.

Brandur Leach explains how to build awesome APIs aka ones that are robust & idempotent

Meanwhile Frans Rosen explains how to 0wn Slack. And no you don’t want this. 🙂

Related: 5 surprising features in Amazon’s serverless Lambda offering

3. On Hiring & Talent

Are you a rock star dev or a digital nomad? Take a look at the 12 best international cities to live in for software devs.

And if you’re wondering who’s hiring? Well just about everyone!

Devs are you blogging? You should be.

Looking to learn or teach… check out codementor.

Also: why did dev & ops used to be separate job roles?

4. On Legacy Systems

I loved Drew Bell’s story of stumbling into home ownership, attempting to fix a doorbell, and falling down a familiar rabbit hole. With parallels to legacy software systems… aka any older than oh say five years?

Ian Bogost ruminates why nothing works anymore… and I don’t think an hour goes by where I don’t ask myself the same question!

Also: Are we fast approaching cloud-mageddon?

5. On Databases

If you grew up on the virtual world of the cloud, you may have never touched hardware besides your own laptop. Developing in this world may completely remove us from understanding those pesky underlying physical layers. Yes indeed folks containers do run in “virtual” machines, but those themselves are running on metal, somewhere down the stack.

With that let’s not forget that No, databases are not for containers… but a healthy reminder ain’t bad.

Meanwhile Larry’s mothership is sinking…(hint: Oracle) Does anybody really care? Now’s the time to revisit Mike Wilson’s classic The difference between god and Larry Ellison.

Read: Are SQL Databases Dead?

Does AWS have a dirty little secret?

I was recently talking with a colleague of mine about where AWS is today. Obviously companies are migrating to EC2 & the cloud rapidly. The growth rates are staggering.

The question was…

“What’s good and bad with Amazon today?”

It’s an interesting question. I think there are some dirty little secrets here, but also some very surprising bright spots. This is my take.

1. VPC is not well understood  (FAIL)

This is the biggest one in my mind. Amazon’s security model is all new to traditional ops folks. Many customers I see deploy in “classic EC2”. Others deploy haphazardly in their own VPC, without a clear plan.

The best practice is to have one or more VPCs, with private & public subnets. Put databases in private, webservers in public. Then create a jump box in the public subnet, and funnel all ssh connections through there. Allow any source IP, use individual user accounts for authentication & auditing (only on this box), then use google-authenticator for two-factor at the command line. This also provides an easy way to decommission accounts, and lock out users who leave the company.
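Planning the address space up front is a good first step toward that layout. Here is a small sketch using Python’s standard `ipaddress` module to carve a VPC range into public & private subnets. The `10.0.0.0/16` range, the `/24` size and the subnet names are assumptions for illustration only.

```python
import ipaddress

# Assumed address range; pick one that fits your own network plan.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC into /24 blocks and hand out the first four.
blocks = list(vpc.subnets(new_prefix=24))
layout = {
    "public-a": blocks[0],   # webservers and the jump box
    "public-b": blocks[1],
    "private-a": blocks[2],  # databases
    "private-b": blocks[3],
}


def subnet_for(address, layout):
    """Return the name of the subnet that contains the given IP address."""
    ip = ipaddress.ip_address(address)
    for name, network in layout.items():
        if ip in network:
            return name
    return None
```

A helper like `subnet_for` makes it easy to audit an inventory of hosts and spot a database that has ended up in a public subnet.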

However most customers have done little of this, or a mixture but not all of it. So GETTING TO BEST PRACTICES around VPC would mean deploying a VPC as described, then moving each and every one of your boxes & services over there. Imagine the risk to production services. Imagine the chances of error, even if you’re using Chef or your own standardized AMIs.

Also: Are we fast approaching cloud-mageddon?

2. Feature fatigue (FAIL)

Another problem is a sort of “paradox of choice”. Amazon is releasing so many new offerings so quickly that few engineers know them all. So you find a lot of shops implementing things wrong because they didn’t understand a feature, or building by hand something AWS has already solved.

OpenRoad comes to mind.  They’ve got media files on the filesystem, when S3 is plainly Amazon’s purpose-built service for this.  

Is AWS too complex for small dev teams & startups?

Related: Does Amazon eat its own dogfood? Apparently yes!

3. Required redundancy & automation  (FAIL)

The model here is what Netflix has done with ChaosMonkey.  They literally knock machines offline to test their setup.  The problem is detected, and new hardware brought online automatically.  Deploying across AZs is another example.  As Amazon says, we give you the tools, it’s up to you to implement the resiliency.

But few firms do this.  They’re deployed on Amazon as if it’s a traditional hosting platform.  So they’re at risk in various ways.  Of Amazon outages.  Of hardware problems under the VMs.  Of EBS network issues, of localized outages, etc.

Read: Is Amazon too big to fail?

4. Lambda  (WIN)

I went to the serverless conference a week ago. It was exciting to see what is happening. It is truly the *bleeding edge* of cloud. IBM & Azure & Google all have a serverless offering now.

The potential here is huge. Eliminating *ALL* of the server management headaches, from packages to config management & scaling, hiding all of that could have a huge upside. What’s more it takes the on-demand model even further. You have no compute running idle until you hit an endpoint. Cost savings could be huge. Wonder if it has the potential to cannibalize Amazon’s own EC2 … we’ll see.

Charity Majors wrote a very good critical piece – WTF is Operations? #serverless

Patrick Dubois 

Also: Is the difference between dev & ops a four-letter word?

5. Redshift  (WIN)

Seems like *everybody* is deploying a data warehouse on Redshift these days.  It’s no wonder, because they already have their transactional database, their web backend on RDS of some kind.  So it makes sense that Amazon would build an offering for reporting.

I’ve heard customers rave about reports that took 10 hours on MySQL run in under a minute on Redshift.  It’s not surprising because MySQL wasn’t built for the size servers it’s being deployed on today.  So it doesn’t make good use of all that memory.  Even with SSD drives, query plans can execute badly.

Also: Is there a better way to build a warehouse in 2016?

Is there a new better way to build a data warehouse in 2016?

In the old days… the bygone days of 2005 🙂 That was when you’d pony up for an Oracle license, get the hardware, and build your warehouse. Somewhere along the way you crossed your fingers.

Today everybody wants to treat data as a product. And for good reason. Knowing how to better serve your customers & iterate more quickly is essential in today’s hypercompetitive startup world.

1. Amazon Redshift enters the fray

Recently I’ve been wondering: why is everyone suddenly talking about Amazon Redshift? I take my cue from recruiters, not because they are experts at database technology & predicting industry trends, but rather because they have their finger on the pulse of what firms are doing.

Amazon launched Redshift in early 2013 using ParAccel technology. Adoption has been quick. Customers who already have their data in the AWS ecosystem find the offering a perfect match for their data analytics needs. And with stories swirling around of 10 hour MySQL reports running in under 60 seconds on Redshift, it’s no wonder.

Also: Is AWS too complex for small dev teams?

2. Old method – select carefully

Having fully digested Ralph Kimball’s opus, you set out to meet with stakeholders, and figure out what you were building.

Of course no one understood your questions, and business units & engineering teams spoke English & French. Months went by, and things devolved. Morale got squashed. Eventually out the other end something would be built, nobody would be happy, and eyeballs would roll over the dollars spent.

This model was known in the data warehousing world by the wonderful acronym ETL which is short for extract, transform & load. The transform part happens before you load it. So that your warehouse is a shining, trimmed & manicured copy of your data, ready for reporting.

Also: Is Amazon too big to fail?

3. Today – mirror everything & then build views

Today you’re more likely to see the ELT model employed. That is Extract, Load & Transform. A subtle change, with big differences. When you load first, you mirror all of your transactional data into your warehouse, then build views or new summary tables to fit your ongoing needs.
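Here’s a tiny sketch of the load-first idea, using an in-memory SQLite database as a stand-in for the warehouse. The table, view & sample rows are made up for illustration. On Redshift the SQL would look much the same, but the loading step would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Load: mirror the transactional rows as-is, with no reshaping up front.
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)],
)

# Transform: build a view on top of the raw data when a report needs it.
conn.execute(
    """
    CREATE VIEW revenue_by_customer AS
    SELECT customer, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
).fetchall()
```

Because the raw `orders` table is still there, a new question next month means adding another view, not re-running the whole pipeline.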

Customers are using tools like Looker & Tableau to layer on top of these ELT warehouses, which also have some intelligence around the transform piece. This makes the process more self serve for business units, and requires less back & forth between engineering & product teams. No more waiting a few days for a report to be built, because these non-technical teams can build for themselves.

Also: When hosting data on Amazon turns bloodsport?

Is Data your dirty little secret?

4. Pipeline services

So you’re going down the ELT path, but how do you get your data into Redshift? I wrote Five ways to get data into Redshift to answer that question.

There are a number of service-based offerings, from the point & click Fivetran to the more full-featured Alooma. RJMetrics & FlyData also fit the bill. You may also want to build your own with Xplenty, which lets you build a lot of ETL & ELT logic without code. Pretty spiffy.

Read: Is aws a patient that needs constant medication?

5. Reporting databases

We’ll be covering a lot more in this space, so check back.

Related: Does Amazon eat it’s own dogfood?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is AWS too complex for small dev teams & startups?


I was discussing a server outage with a colleague recently. AWS had done some confusing things, and the team was rallying to troubleshoot & fix.

He made an offhand comment that caught my attention…


AWS is too complex for small dev teams. I’d recommend we host in a traditional datacenter.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

It’s an interesting point. For all the fanfare over Amazon, lost in the shuffle is the staggering complexity that we’re taking on. For small firms, this is a cost that’s often forgotten when we smell the on-demand Kool-Aid that is EC2.

Here are my thoughts…

1. Over 70 services offered

Every time I log in to the AWS console there’s a new service offering. Lambda & serverless computing. CodeDeploy, Redshift, EMR, VPCs, developer tools, IoT, the list goes on. If you haven’t enabled MFA on your IAM accounts you’re not alone!

Also: Is Amazon too big to fail?

2. Still complex to build high availability

The song I hear out of Amazon is, we offer all the components for a high availability infrastructure: multiple availability zones, regions, load balancers, autoscaling, geo & latency DNS routing. What’s more, companies like Netflix have open sourced tools to help.

But at a lot of startups that I see, all these components are not in use, nor are they well understood. Many admins are still using Amazon like an old-school datacenter. And that’s not good.

Sometimes it seems that AWS is a patient in need of constant medication.

Related: Are we fast approaching cloud-mageddon?

3. Need a dedicated devops

As AWS becomes more complex, and the offering more robust, so too grows the need for dedicated ops. If your devs are already out of bandwidth, but you don’t quite have enough need for a full-time resource, a consultant may be an option. Round out the team & keep costs manageable.

If you’re looking for an AWS solutions architect, we can help!

Check out: Does Amazon eat it’s own dogfood?

4. Orchestration involves many moving parts

Infrastructure as code offers the promise of completely versioning all your servers, configurations and changes. From there we can apply test driven development & bring a more professional level of service to our business. That’s the theory anyway.

In practice it brings an incredible number of new toolsets to master and a more complex stack besides. All those components can have bugs and need troubleshooting. This sometimes just kicks the can down the road, moving the complexity elsewhere.

It’s not clear that for smaller shops, all this complexity is manageable.

Also: 5 things toxic to scalability

5. Troubleshooting failed deployments

I was looking at a problem with a broken deploy recently. Turns out a developer had copied & pasted some code off the internet, possibly from a tutorial, and broke deployments to staging.

Yes, perhaps this was avoidable, and more checks & balances could fix it. But my thought is that continuous integration & continuous deployment are not a panacea. More complexity brings a more complex web to unweave.

I sometimes wonder if we aren’t fast approaching cloud-mageddon?

Read: Why Airbnb didn’t have to fail?


Is demand for aws skills skyrocketing?

[Chart: Google Trends interest in “aws solutions architect”]

If Google Trends is any indication, we’re heading for a serious skills shortage around AWS. If you’re a devops, sysop or systems administrator… don’t walk, run in this direction!


I’ve pivoted a few times in my career, and knowing which way the wind blows is how I keep up with change. And right now it seems to be blowing into the cloud!

1. AWS datacenter growth is staggering

Also: Is Amazon too big to fail?

2. What I hear from recruiters

I’ve been hearing from more & more recruiters recently. And all they can talk about is Redshift & AWS cloud solutions architects.

I think recruiters sit in a unique position & have the pulse of the market like nobody else does.

Related: 8 questions to ask an aws expert

3. Certification bandwagon

AWS is pushing hard to help sysops level up their skills. This can only help push adoption, but it’s also ideal for those who are ready to learn more about the cloud.

Read: When hosting data on Amazon turns bloodsport
