All posts by Sean Hull

What does the failure of Flatiron School mean for coding bootcamps?

via GIPHY

If you missed the news, New York’s AG announced a settlement with Flatiron School over operating without a license and false advertising.

The news of the coding bootcamp failure splashed all over hacker news a couple of days ago.

Join 37,000 others and follow Sean Hull on twitter @hullsean.

With the explosion of coding bootcamps in recent years, it speaks volumes, as demand for coding & software skills continues to outpace supply. What’s more the starting salaries aren’t bad either.

But how will this affect coding bootcamps going forward?

Would you like a helping of beaurocracy?

Part of the ruling was regarding licensed teachers…

“In order to obtain a SED license, a non-degree granting career school must meet a number of criteria, including using an approved curriculum and employing a licensed director and teachers.”

One thing that sets coding bootcamps apart is that they train their own teachers. And they also use their own curricula. And while protecting consumers is certainly a worthwhile goal, the ruling means bootcamps will have to navigate government bureaucracy for approval. Some have pointed out that the process can be slow & full of red tape. Which is sort of counter to the whole agile startup private industry philosophy. We’ll see!

Also: Is Amazon too big to fail?

$75k after 3 months?

One of the claims their marketing made was that many students were making $75k after a few months of study. The ruling underscored this as particularly misleading. more here

As anyone who has studied computer science knows, there’s a lot of foundational concepts in logic, mathematics & problem solving, which you don’t develop overnight. Hopefully this ruling with hammer home the idea, that it takes a little bit more time folks!

Read: Is AWS too complex for small dev teams? The growing demand for Cloud SRE

Does it please the crown?

One of the comments on hacker news asks “Does it please the crown?”. By slapping these guys on the wrist, the barrier to entry will be higher. Going forward, they will have to pass more hurdles & government beaurocracy.

One of the things that sets coding schools apart is that they can train their own teachers & build their own syllabus. We’ll see if these new hurdles slow things down or not.

Related: How to build an operational datastore on Amazon Redshift with S3

Billion dollar Google bet

Almost as if to provide a counterpoint, timely knews comes out of the google camp. They have commited to $1 billion in grants to train workers.

It seems demand for coding skills will be strong for the foreseeable future.

Related: How to build an operational datastore on Amazon Redshift with S3

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why would I help a customer that’s not paying? The reason might surprise you

via GIPHY

I just received an email. It was from a woman building a website, and wanted help with AWS. She wondered if I might be able to provide any assistance.

Join 37,000 others and follow Sean Hull on twitter @hullsean.

Having a popular publicly facing blog, I get a lot of leads that seem to come out of thin air. This is the good problem of publicity. 🙂

I followed up with her and asked what she was building. “Nothing” she explained, I just want to learn about AWS. I was a little confused at first, but as we talked further, it seemed she was just beginning to branch out onto the wild world of the internet, and didn’t know where to start.

I explained that to build an e-commerce site, she could use a service like Shopify, and would likely not need to use AWS directly, and certainly wouldn’t have to learn it. That might take five to ten years learning computing first!

I realized I was telling her she didn’t need the services of someone like me, and further giving her half of a solution. Though I couldn’t help her build a product, the information could surely help her sell it.

Then I thought to myself, why would I do that? Why give away your time & advice for free?

1. Find time to followup

LESSON: a quick call is always worthwhile networking

Yep it’s true, I’ve learned over the years it’s always worth your time for a quick call. I even talk to recruiters on occasion though I don’t work with them.

You’d be surprised how often you learn from someone, especially when they don’t work in your domain. You learn from the way they frame questions, how others might view or search for you. You learn how better to explain & sell your services to future customers too.

Also: When clients don’t pay

2. Be helpful

LESSON: Provide some real help or value

In a call like this one, it costs me very little to “drop some knowledge” as the cool kids like to say. 🙂 Sure my time is worth something, and yes I’m giving something away for free. But in this case it was someone who currently doesn’t have the budget for my services so isn’t my target audience anyway.

Read: When you have to take the fall

3. Pay forward

LESSON: Always be networking

Be patient. As Keith Ferrazzi likes to say “Never Eat Alone”! I’ve taken hundreds of calls like this one over the years, and some later get funded & call me back. They’re eager to put me to work, already sold on my integrity & personality.

What’s more she may run in different circles than I do, bump into a colleague or recommend me at some point. If your openness really stands out, it’ll leave a memorable impression long into the future.

In a place like New York where we’re often singularly focused on profit & personal gain, it’s easy to stand out by a small act of kindness.

Related: A look at the serverless hype cycle

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

10 things you definitely didn’t know about bitcoin

via GIPHY

The news is ablaze with stories of bitcoin. Everyday I see something new, whether it’s wild volatility, to some fraudulent transaction or other.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

I’ve been digging into them myself of late, and learned a few surprising facts about them. Read on!

1. It’s disrupting banking itself

That’s right, it’s bitcoin is not just an experiment anymore. Cyptocurrencies (of which Bitcoin is just one example), are ruffling the feathers of the IMF. Managing Director Christine Lagarde predicts the end of banking. Now that’s pretty big.

Meanwhile Chris Skinner outlines the major use cases for blockchains. These are the scaffolding of digital currencies, the financial plumbing if you will. Among those uses include smart contracts & assets, digital identity, clearing & settlement & of course payments.

Also: Is Amazon too big to fail?

2. There are many digital currencies

Although Bitcoin was the first, and the one you hear about in the news a lot, there are many more. Etherium is one. Take a look at this colorful map of all the current cryptocurrencies out there. There are hundreds!

Related: Was Fred Wilson wrong about Apple?

3. Banks are already using it

UBS-led 6 bank clearing house initiative
http://fortune.com/2017/08/31/banks-ubs-blockchain-settlements/

Read: Why the android ecosystem is broken

4. You can mine them & spend them

Coins can be mined like gold in the real world, and of course spent using a digital wallet.

A. Mining: This requires either a physical device or a contract with a provider such as Genesis Mining. The more coins in the world that are mined, the harder it becomes to discover more. My early results show that, if the price remains stable, my contract will become profitable in about 18 months. In other words, I will have ‘made my money’ back on the original upfront price to get started. From then on it’s profit, but not stellar… about 20% per year. If the value of the coin itself goes up, it becomes more profitable, if the value of coin goes down, well then mining pauses, or ceases entirely.

B. Wallets: You get a website and a phone app, that allows you to pay/receive person-to-person. One thing I noticed is that there is a LOT of security, because they must be a target of a lot of hacking. My online bitcoin wallet, CoinBase provides not only 2-factor authentication (text to phone) but after your account reaches $1000 USD value you unlock a ‘vault’ in which you can store coins. You don’t earn any interest, instead you get extra security: you must verify your withdrawl/transfer request from TWO different email addresses.

Read: 30 questions to ask a serverless fanboy

5. You can build your own

With Build a Coin project you can build your own currency. This isn’t building an app for Etherium or an API to some existing currency. This project helps you build the code to support your own virtual digital fiat currency!

Read: Professional services and the art of resistance

We’ll be back soon with 5 more. Don’t worry!

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is maintenance as sexy as innovation?

via GIPHY

A recent NYT piece on our aging american infrastructure got me thinking. It seems that roads, bridges, airports & city sewer systems are all in need of repair. Sadly as budgets to maintain these systems in good repair are often short, they become larger problems to fix as their status becomes critical.

Join 37,000 others and follow Sean Hull on twitter @hullsean.

“Americans have an impoverished and immature conception of technology, one that fetishizes innovation as a kind of art and demeans upkeep as a mere drudgery.”

I’m not sure this is an American-only phenomenon. However I do see it a lot with technology companies & startups.

1. Do we have to manage ops in the cloud?

The cloud has enabled infrastructure automation in some pretty phenomenal ways. Code pipelines can deliver changes to a repo, through automated unit testing, and out to customers all without human intervention. This makes teams more agile, and ultimately businesses faster & more profitable.

We might be distracted enough to stop worrying about operations altogether. After all Amazon knows how to manage broken servers & alert us right? I write do we have to manage operations in the cloud previously, as this sentiment seems to be growing.

Modern applications have a ton of interdependencies. Even with decent integration testing, the full stack is complex, and requires monitoring. Co-tenancy can complicate your performance tuning efforts as neighboring customers may directly affect your application. Third party services may be delivered from smaller or less experienced companies, whose SLA may be limiting besides. And hey if Amazon goes down, I can just tell my customers it was their fault, right?

Also: Is Amazon too big to fail?

2. Do you know Dustin Moskovitz?

Chances are I’m guessing you’ll say no. He was part of the original Facebook team alongside Zuckerberg. You don’t know his name? He had the sexy job of, you guessed it maintenance! He was the operations guy. Did he write the application code? More than likely he knew that code very well as he had to fix & maintain it. Along with the infrastructure to scale & support Facebook’s massive growth.

Read: Is AWS too complex for small dev teams? The growing demand for Cloud SRE

3. Is a little technical debt ok?

Ward Cunningham has an excellent interview about technical debt. Is a little bit ok? Maybe. But each amount is kicking the can down the road. As the NYT article on maintenance makes clear, you can move the responsibility on to the next administration, the next term, or someone else, but eventually you’ll have a critical problem on your hands, which will be much more expensive to fix.

Related: How to build an operational datastore on Amazon Redshift with S3

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Do I need a broker to buy real estate in Brooklyn?

via GIPHY

I’ve been toying with the idea of writing about Brooklyn real estate for a while. This week I decided to give it a whirl.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Back in 2011 I had a crazy idea. I wondered if it was possible to sell the loft I owned in Manhattan, and trade up that equity for a townhouse in Brooklyn?

As it turns out selling the coop ended up being its own complicated ordeal. Think of it more like getting a divorce, with all the attendant challenges of ending a long love affair! I may write a longer post about that soon enough, but for today let’s talk about the buying part.

Also: Why airbnb didn’t have to fail

Do I need a buyer’s broker?

In NYC it seems every real estate transaction is still encumbered by brokers at either end. One for the buyer & one for the seller. The seller will pay a 6% transaction fee & that’ll be divided 3% for each of the teams. But that 3% is further divided, 1.5% to the broker themself, and 1.5% to their firm.

Also: Why I use airbnb chat even when texting is easier

It’s important to understand who is incentivized & by what. This helps you see motivations, and better discern the “truth” when you’re looking for an answer. Back to the question, do you need one? On the sell side I would say more than likely yes. Unless you are a master seller & even then it’s hard to get by without one. I tried it using RealDirect.com. Although this put my listing in all the right places (NYT & streeteasy) which brought me plenty of traffic, I still didn’t get many strong offers. Most buyers come with a broker, and they’re likely to steer away from you for fear they won’t get their fee. After all, you’re trying to play outside the system to begin with. Contract or not, it’s very tough. So on sell side I would say yes you’ll need a broker.

Also: My DISQUS hack for blog discovery

On buy side its different. If you don’t bring a broker, the seller will pay 5% to the single selling broker, so this is great for them. But what about for you? Again understanding incentives really helps. For me this made the time spent using Real Direct a learning experience. I saw the ins & outs of real estate.

Also: Is Amazon too big to fail?

I’m going to call it quits for today. But I have a lot more to write about. Future pieces will include working with NYC agencies, like HPD & DOB, avoiding violations, understanding all the players in a large transaction, from underwriters, bank lawyer, seller & buyer lawyer, mortgage agent, title company & insurance agent.

Also: Is AWS too complex for small startups?

I’ll also try to share what I’ve learned about the physical moving parts. The engineering… gas vs oil boilers, roofs, brick vs wood frame structures, cosmetic vs structural renovations, appliances, faucets, fixtures & cabinets, bath & kitchen tiling, decks & outdoor structures, backyards, plants & vegetation, pests & animals.

Until next time!

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

How to interview an amazon database expert

via GIPHY

Amazon releases a new database offering every other day. It sure isn’t easy to keep up.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the case you’ll need to sift through for the best people. Here’s how.

Also: How to interview an AWS expert

What database does Amazon support for caching?

Caching is a popular way to speed up access to your backend database. Put Amazon’s elasticache behind your webserver, and you can reduce load on your database by 90%. Nice!

The two types that amazon supports are Memcache & Redis. Memcache is historically more popular. These days Redis seems a clear winner. It’s faster, and can maintain your cached data between restarts. That will save you I promise!

Also: Is AWS too complex for small dev teams?

How can I store big data in AWS?

Amazon’s data warehouse offering is called Redshift. I wrote Why is everyone suddenly talking about Redshift?. Why indeed!

When you’re doing large reports for your business intelligence team, you don’t want to bog down your backend relational database. Redshift is purpose built for this use case.

I’ve see a report that took over 8 hours in MySQL return in under 60 seconds in Redshift!

A new offering is Amazon Spectrum. This tech is super cool. Load up all your data into S3, in standard CSV format. Then without even loading it into Redshift, you can query the S3 data directly. This is super useful. Firstly because S3 is 1/10th the price. But also because it allows you to stage your data before loading into Redshift itself. Goodbye Google Big Query! I talked about spectrum here.

Related: Which engineering roles are in greatest demand?

What relational database options are there on Amazon?

Amazon supports a number of options through it’s Relational Database Service or RDS. This is managed databases, which means less work on your DBAs shoulders. It also may make upgrades slower and harder with more downtime, but you get what you pay for.

There are a lot of platforms available. As you might guess MySQL & Postgres are there. Great! Even better you can use MariaDB if that’s your favorite. You can also go with Aurora which is Amazon’s own home-brew drop in replacement for MySQL that promises greater durability and some speedups.

If you’re a glutton for punishment, you can even get Oracle & SQL Server working on RDS. Very nice!

Read: Can on-demand consulting save startups time & money?

Does AWS have a NoSQL database solution?

If NoSQL is to your taste, Amazon has DynamoDB. According to . I haven’t seen a lot of large production applications using it, but what he describes makes a lot of sense. The way Amazon scales nodes & data I/O is bound to run into real performance problems.

That said it can be a great way to get you up and running quickly.

Read: Can on-demand consulting save startups time & money?

How do I do ETL & migrate data to AWS?

Let’s be honest, Amazon wants to make this really easy. The quicker & simpler it is to get your data there, that more you’ll buy!

Amazon’s Database Migration Service or DMS allows you to configure your old database as a data source, then choose a Amazon db solution as destination, then just turn on the spigot and pump your data in!

ETL is extract transform and load, data warehouse terminology for slicing and dicing data before you load it into your warehouse. Many of todays warehouses are being built with the data lake model, because databases like Redshift have gotten so damn fast. That model means you stage all your source data as-is in your warehouse, then build views & summary tables as needed to speed up queries & reports. Even better you might look a tool like xplenty.

Amazon’s new offering is called Glue. Five ways to get data into Amazon Redshift. This solution is purpose build for creating a powerful data pipeline, complete with python code to do transformations.

Read: Is data your dirty little secret?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What does the fight between palantir & nypd mean for your data?

via GIPHY

In a recent buzzfeed piece, NYPD goes to the mat with Palantir over their data. It seems the NYPD has recently gotten cold feet.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

As they explored options, they found an alternative that might save them a boatload of money. They considered switching to an IBM alternative called Cobalt.

And I mean this is Silicon Valley, what could go wrong?

Related: Will SQL just die already?

Who owns your data?

In the case of Palantir, they claim to be an open system. And of course this is good marketing. Essential in fact to get the contract. Promise that it’s easy to switch. Don’t dig too deep into the technical details there. According to the article, Palantir spokeperson claims:

“Palantir is an open platform. As with all our customers, their data & analysis are available to them at all times in an open & nonproprietary format.”

And that does appear to be true. What appears to be troubling NYPD isn’t that they can’t get the analysis, for that’s available to them in perpetuity. Within the Palantir system. But getting access to how the analysis is done, well now that’s the secret sauce. Palantir of course is not going to let go of that.

And that’s the devil in the details when you want to switch to a competing service.

Also: Top serverless interview questions for hiring aws lambda experts

Who owns the algorithms?

Although the NYPD can get their data into & out of the Palantir system easily, that’s just referring to the raw data. That’s the data they ingested in the first place, arrest records, license plate reads, parking tickets, stuff like that.

“This notion of how portable your data is when you engage in a contract with a platform is really, really complex, and hasn’t really been tested” – Tal Klein

Palantir’s secret sauce, their intellectual property, is finding the needle in the haystack. What pieces of data are relevant & how can I present the detectives the right information at the right time.

Analysis *is* the algorithms. It’s the big data 64 million dollar question. Or in this case $3.5 million per year, as the contract is reported to be worth!

Related: Which engineering roles are in greatest demand?

The nature of software as a service

The web is bringing us great platforms, like google & amazon cloud. It’s bringing a myriad of AI solutions to our fingertips. Palantir is providing a push button solution to those in need of insights like the NYPD.

The Cobalt solution that IBM is offering goes the other way. Build it yourself, manage it, and crucially control it. And that’s the difference.

It remains to be seen how the rush to migrate the universe of computing to Amazon’s own cloud will settle out. Right now their in a growth phase, so it’s all about lowering prices. But at some point their market muscle will mean they can go the Oracle route a la Larry Ellison. That’s why customers start feeling the squeeze.

If the NYPD example is any indication, it could get ugly!

Read: Can on-demand consulting save startups time & money?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Do we have to manage ops in the cloud?

via GIPHY

One of the things that is exciting about the cloud is the reduced need for operations staff. There seem to be two drivers of this trend. One is devops, and all the automation that comes with it. As we formalize configurations, things become repeatable, and fewer people can manage greater armies of servers.

The second is by moving to a cloud hosting provider, we essentially outsource the operations to their team.

1. Pretty abstractions? still hardware buried somewhere

That’s right, beneath all the virtual EC2 instances & VPCs there is physical hardware. Huge datacenters sit in North Virginia, Oregon, Ireland, London and many other cities. Within them there are racks upon racks of servers. The hypervisor layer, the abstraction built on top of that, orchestrates everything.

Although we outsource the management of those datacenters to Amazon, there are still responsibilities we carry. Let’s dig into those more.

Also: Top serverless interview questions to ask an expert

2. Full-stack dev – demand for generalists?

These days we see the demand for a full stack developer. That is someone who does not only front end dev, but also backend. In turn, they are often asked to wear the had of ops. Spinup EC2 instance, decide on the capacity & size, choose proper disk I/O, place it within the right subnet & vpc & then configure the security groups properly.

All of these tasks would previously been managed by a dedicated ops team, but now those responsibilities are being put on developers shoulders. In some cases, such as with microservices, devs also carry the on-call duties of their applications.

Lastly there is likely ops to handle automation. Devops will formalize configurations, into ansible playbooks or chef recipes, so they can be checked into version control. At this point infrastructure can even be unit tested.

Read: Build an operational datastore on aws S3 with Spectrum

3. Design, resiliency, instrumentation, debugging

In previous eras, ops teams would be heavily involved with design of applications & architecture to support that. Now that may be handed to devs, but it still needs to happen.

Furthermore resiliency is said to be the customers responsibility. In the pre-cloud days, hardware was more reliable. It had a slower failure rate. With virtual machines, they’re expected to fail, and all the components to make your applications resilient are given to you. But it’s your job to architect them together.

That means your applications need to be self-healing. Failures need to be detected, taken out of autoscaling groups, and replaced. All automatically. Code or not, that is certainly operations.

Check this: Which engineering roles are in top demand?

4. It’s amazon’s fault we’re down!

I’ve seen quite a few outages in the past year, from Dropbox to Airbnb, and DYN themselves. Ultimately these outages could be tied back to a failure with Amazon. But when your business customers are relying on your service, it is *YOUR* business that answers to it’s own SLA.

In the news we see many of these firms pointing the finger at Amazon, “hey it’s not our fault, our cloud provider went down!”. Ultimately your customers don’t care. They don’t want excuses. If using multiple regions in AWS is not sufficient, you’ll need to build your application to be multi-cloud.

Also: 30 questions to ask a serverless fanboy

5. It’s hard to outsource your expertise

Remember, while you outsource your operations to Amazon, you’re getting very professional management of those systems. However they will optimize for their many customers. Your particular problems are less of a concern.

Read this: What can startups learn from the DYN DNS outage?

6. Only you can thinking holistically about interdependancies

Your application more than likely uses a number of APIs to capture data, perhaps do single sign on or even a third party database like Firebase. It’s your responsibility to do integration testing. All that becomes more complex in the cloud.

Also: How to lock down systems from outgoing employees

7. How do services complicate things?

SaaS solutions are everywhere now. auth0, firebase and an infinite variety of third party apis complicate reliability, security, storage, performance, integration testing & debugging?

Security is a traditional responsibility held by the operations hat. Much of that becomes more complex in the cloud. With serverless applications for example you may use a few APIs, plus an authentication broker, and a backend database. As this list of services grows, the code you write may decrease. But testing & securing it all becomes much more complex.

With more services like this, the attack vector or surface area becomes greater. Each of those services, can and will have bugs. What if a zero day is found in the authentication broker, allowing a hacker to break into a broad cross section of applications across the internet? How do you discover this? What if your vendor hasn’t found out yet?

Read: Is Amazon cloud too complex for small dev teams?

8. How does co-tenancy impact performance tuning?

Back to point #1 above, all these virtual servers sit on real physical servers. That affects customers in two ways. One you may be sharing the same host. That is if you use a very small vm, it may sit along side another customer with a small vm. If those eat up CPU cycles or network on that box, neighbors or co-tenants will suffer.

There are many other instance types where you get your own dedicated hardware. With those you have your own nic as well, so no competition. Except wait there’s network storage! That’s right all the machines in the AWS environment use EBS now, which is all co-tenant. So your data is sitting alongside other customers, and you are all fighting for usage of the same disk read heads.

One way to mitigate this is to configure specific provisioned IOPS for your servers. But that costs more. It’s normally reserved for database instances where disk I/O is really crucial.

Granted the NewRelics of the world will certainly help us with this process. But they’re not giving us a hypervisor or global view of those servers, network or storage. So we can’t see how the overall systems performance may be impacted.

Related: Is AWS a patient that needs constant medication?

9. Operations can be invisible

When security is done well, you don’t have breakins, you don’t have data stolen, everything just runs smoothly. Operations is like that too. When it is done well it can be invisible.

It can also be invisible in a different way. When you deploy your application on serverless, all the servers & autoscaling is completely abstracted away. So when you get some weird outage because the farm of servers is offline, or because you hit some account limit in the number of functions you can run at once, then it quickly comes into focus.

Beware of invisible operations, because it’s harder to see what to monitor, and know how to stay ahead of outages.

Read: Is amazon too big to fail?

10. We can’t oursource true ownership

At the end of the day you can’t outsource ownership of your application or your business. The holistic view of your application in totality can only be understood by your engineers.

And that in the end is what operations is all about, no matter who’s wearing the hat!

Also: 5 reasons to move data to amazon redshift

Get my monthly newsletter for more thoughts on data, startups & innovation. Scalability. Automation. Amazon cloud.

How to build an operational datastore on AWS with S3 & Redshift

via GIPHY

You’re building your data warehouse, and getting data into Redshift. You’ve got your ETL pipeline running, and presentation layer talking to the warehouse. Great.

But how to get access to that source data? Wouldn’t it be nice if that was close by too?

Join 35,000 others and follow Sean Hull on twitter @hullsean.

It may be you have 10-zillion rows of source data and don’t want or need to get all of that into Redshift and keep it there. But it would be nice to have access to it when you do.

Enter EXTERNAL tables, aka Spectrum. Now you can keep all your raw data in S3, an in place operational datastore of data before it’s been reworked and transformed. Use SQL to access it right where it sits.

Get all the advantages of lifecycle management in S3, and don’t pay all the redshift costs for data you don’t need all the time. Cool!

Let’s see how it works.

What is an EXTERNAL table?

Spectrum is Amazon’s rebranding of an old database technology called EXTERNAL TABLES. Back in the 90’s Oracle pioneered this work, allowing you to essentially map a CSV file, that sits outside the database proper. This means you can query all that juicy data sitting in flat files. Cool!

Athena allows you to query this stuff as a service, native to AWS. Spectrum allows you to create those external tables inside of Redshift.

Also: Top serverless interview questions for hiring aws lambda experts

Give Redshift permissions

Go into IAM and create a new role called “SeanSpectrumRole”. Assign the policy AmazonS3ReadOnlyPolicy. It looks like this:


{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": "*"
}
]
}

If you’re using the dashboard you just pick the policy from the named list. However if you’re using CloudFormation, you’ll use the code above.

Now navigate your aws console to the Redshift dashboard, click clusters, and click the checkbox for your cluster. Probably there’s only one.

Now click the “Manage IAM Roles” button, and a dialog should popup.. Select the role you created earlier, SeanSpectrumRole. Then click “Apply Changes”.

The beauty of the AWS world is that servers themselves can have API permissions. In this case we gave the redshift cluster or server itself, access to S3 for our use below!

Related: Which engineering roles are in greatest demand?

Create your spectrum schema

First you must create a spectrum schema. Here’s the syntax:


create external schema spectrum
from data catalog
database 'sean'
region 'us-east-1'
iam_role 'arn:aws:iam::9999999999999:role/SeanSpectrumRole';

Read: Can on-demand consulting save startups time & money?

Upload your data to S3 bucket

Here we create an s3 bucket called sean_spectrum, then upload one csv file named sean_numbers.txt.


$ aws s3api create-bucket --bucket sean_spectrum --region us-east-1
{
"Location": "/sean_spectrum"
}
$ cd spectrum/
$ cat sean_numbers.txt
21,Dr.,Who,44-22-55-77-88
35,Bat,Man,317-222-4777
15,Wonder,Woman,999-324-7878
99,Storm,Cloud,367-399-6767
75,Marvel,Girl,222-333-9595
32,Quick,Silver,22-33-77-99
12,Scarlet,Witch,23-35-47-555
$ aws s3 cp sean_numbers.txt s3://sean_spectrum/
upload: ./sean_numbers.txt to s3://sean_spectrum/sean_numbers.txt
$ aws s3 ls s3://sean_spectrum/
2017-05-18 20:28:41 193 sean_numbers.txt
$

Note the names. The table name won’t turn out to be sean_numbers. It will be called sean_spectrum, and all files inside that directory will be queried. So make sure they have consistent formats!

Also: 30 questions to ask a serverless fanboy

Create & query your external table

Here’s how you create your external table. Note this is just a map to data. The data is still stored in S3. it is not brought into Redshift except to slice, dice & present.


mydb=# create external table spectrum_schema.sean_numbers(
id int,
fname string,
lname string,
phone string)
row format delimited
fields terminated by ','
stored as textfile
location 's3://sean_spectrum/';

Here’s how you query it:


mydb=# select * from spectrum_schema.sean_numbers order by id;
id | fname | lname | phone
----------------+---------------
12 | Scarlet | Witch | 23-35-47-555
15 | Wonder | Woman | 999-324-7878
21 | Dr. | Who | 44-22-55-77-88
32 | Quick | Silver | 22-33-77-99
35 | Bat | Man | 317-222-4777
75 | Marvel | Girl | 222-333-9595
99 | Storm | Cloud | 367-399-6767

Cool. We reordered data read from an S3 file!!!

Although you can’t create a view over a redshift table *AND* an S3 external table, you can query them together.

So for example if I have a table in redshift with addresses, I can join them together:

mydb=# select a.id, a.fname, a.lname, b.address from spectrum_schema.sean_numbers a, sean_addresses b
where a.id = b.id order by id;

id | fname | lname | phone | address
----------------+----------------------------
12 | Scarlet | Witch | 23-35-47-555 | 10 main st
15 | Wonder | Woman | 999-324-7878 | 25 center st
21 | Dr. | Who | 44-22-55-77-88 | 32 broadway
32 | Quick | Silver | 22-33-77-99 | 1 first st
35 | Bat | Man | 317-222-4777 | 99 west st
75 | Marvel | Girl | 222-333-9595 | 66 East Ave
99 | Storm | Cloud | 367-399-6767 | 50 North st

Also: What can startups learn from the DYN DNS outage?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

A roughneck walk down database alley

via GIPHY

I was just responding to some Disqus comments on a recent blog post. Admittedly it had a provocative title Will SQL databases just die already. What do you think?

Join 34,000 others and follow Sean Hull on twitter @hullsean.

A reader pointed out that some No-SQL databases do support joins. Huh? My face contorts in total confusion. How? Why?

For years SQL was misunderstood & unloved

Relational databases have been around for decades. 43 years according to the StackExchange article. That’s a lot of years. I’ve spent a few years as a dba, aka database administrator. The role can be distilled down as a herder of sorts. Keep all the data bits in the right boxes with the right labels. A digital librarian, that makes sure the books don’t get lost.

Of course patrons don’t always put books back where they should, and strict rules get put in place to avoid losing that one volume of shakespear in miles of shelves.

In the fast moving world of web + mobile, product is king, and agile rules the day. And anything that can make us more agile also wins. SQL, much maligned & misunderstood, was not one of those things.

Also: Top serverless interview questions for hiring aws lambda experts

NoSQL burst on the scene with much fanfare

With all that pressure, it was no wonder engineers thought, there must be a better way. Then along comes the No SQL database. I mean just the name speaks volumes about the design goal.

We’ll sacrifice anything, just please don’t make me write SQL!

The promises…

1. Never have to deal with pesky SQL that we don’t understand!
2. Interact with the database like any other data structure in our code!
3. Be schemaless! Crotchety Database Administrators be damned!
4. Be distributed. Be everywhere consistent! Be indistinguishable from magic!
5. Always be fast.

In that rush into the abyss, we lost track of durability. And down the rabbithole we went!

Related: Which engineering roles are in greatest demand?

Relational databases tried to be key-value

Then I started hearing about crazy things, like MySQL providing a memcache plugin, so you could use it albeit lightening fast, as a key-value store. You could sidestep that pesky SQL engine, and get right down to the bare metal. But why? Memcache & Redis were already doing that & purpose built. Why indeed?

I started to argue maybe we shouldn’t be muddying the waters. I mean stick with what you know!

Read: Can on-demand consulting save startups time & money?

War was won, success declared

Around this I think was when Mongodb was declaring the war won. We had finally left SQL databases in the dustheap of history. It may or may not have inspired this popular youtube skit…

Also: 30 questions to ask a serverless fanboy

Meanwhile hadoop is losing ground. Bigquery & Redshift both speak SQL

But then something funny started to happen. It seemed there was a backlash against Mongodb. A lot of customers were losing data. (Yep that’s what durability means guys…) And the hype started reversing. Even the mighty hadoop has been losing popularity of late. How long does it take to write an EMR job versus an SQL query. Let’s be honest?

I asked myself, Is Hadoop losing ground to SQL warehouses like Redshift & Bigquery?. I wonder.

Also: What can startups learn from the DYN DNS outage?

NoSQL databases are looking for JOINs?

Recently I bumped into some interesting blog comments & discussions about how Orientdb was trying to add joins to their product.

As certain relational databases try to become No SQL databases, other No SQL databases are trying to add more complex SQL, because well somehow their product is missing something.

Also: What can startups learn from the DYN DNS outage?

Engineering truth versus fashion

43 years is a lot of years. And when we drop all the fashion trends in tech, and the new database du jour, what do we find?

There is room for No SQL databases. Yep. And the do certain things, and solve certain types of problems well. But their not general workhorses, nor can they slice and dice your data however you like. And when you get to that point in your project, you’re going to want to ask interesting questions of your data.

And surely that’s where SQL excels. It ain’t going anywhere, folks!

Also: What can startups learn from the DYN DNS outage?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters