5 ways to level up as cloud expert

aws certified

Cloud computing is blowing up! But don’t take my word for it, read this recent NY Times piece: Tech companies clamor to entice cloud computing experts.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Still don’t believe me? Get on the phone with a recruiter or two. They’ll convince you because they’ve got companies banging down the door looking for talent that is plainly in SHORT SUPPLY. And that’s the supply *you* want to be. 🙂

Check Gary’s Guide Jobs, or the ever popular Angel List Jobs. There’s also Stack Overflow jobs and many more.

1. Become a book reviewer

You’ve already got a technical background, and want to hone those skills. Take a look at technical book reviewing.

Manning is putting out some excellent technical books these days. Apply here to be a reviewer.

Also take a look at Pragmatic Bookshelf. They are are looking for reviewers too.

In either case you can expect to spend time reading a book chapter by chapter, as it’s written, offer strategic or layout advice, feedback on presentation, comprehension, and edits.

Also: When hosting data on Amazon turns bloodsport

2. Join an Open Source project

There are millions. Flip through github to some that you’re interested in. Contribute a bug fix or comment, reach out to the project leaders.

Afraid to dive in? Join one of the forums or google discussion groups, and lurk for a while. Ask questions, offer a helping hand!

Related: Is Amazon too big to fail?

3. Self-paced labs

Online education is blowing up, and for good reason. They get the job done & for the right price!

One of my favorites for AWS Certification is the A Cloud Guru courses. These offer lecture style introduction to all levels of AWS from Sysops Administration, Developer & Solutions Architect to Devops, Lambda & CodeDeploy.

The courses are priced right, and geared directly towards Amazon’s certifications. That helps you focus on the right things.

Amazon also partners with qwiklabs to offer courses geared towards getting certified. There are specific ones for the associate & professional certification, and many others besides.

You’ll need to signup for AWS Activate first, before you can use these qwiklabs. They offer you 80 credits right out of the gate.

For the next two weeks many of the courses are free! One thing I really like is they include a free temporary aws login for the students. That way there’s no risk of deploying infrastructure, and accidentally getting a big bill at the end of the month.

The labs though are more like reading documentation versus a nice video course lecture. So you the student have to do a lot more to get through it.

Read: Are we fast approaching cloud-mageddon?

4. Coursera, Khanacademy & Udemy

There’s a free class on Coursera called Startup Engineering by Balaji Srinivasan & Vijay Pande. Some pretty amazon material & lectures in here, and if you’re determined, it’s 12 weeks that will get you going on the right foot!

KhanAcademy has a great many courses on computer programming. Awesome and free stuff here. One particularly interesting is their hour of code. For those hesitant, that’s an easy way to jump in!

There is also udemy, which offers some great material on cloud computing. Notice that the certification courses are the same ones from A-Cloud Guru!

Also: Are SQL databases dead?

5. Interview tests

Apply to jobs. Even if you’re unsure if that is your dream job. Why? Because they often include a test to find out about your technical chops. Diving into these tests is a great way to push your own edge. You may do well, you may not. Learn where your weaknesses are.

I especially like the ones where you’re asked to login to a server, configure some things, write some code, and solve a real problem. Nothing beats a real-world example!

Also: Why dropbox didn’t have to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Sean Hull interviewed on the Doppler Cloud podcast

I recently got a chance to talk with Mike Kavis over at Cloud Technology Partners. It was fun to get away from the keyboard, and in front of the microphone for a change.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

1. Docker

Docker is making deployments easier & easier. But as the pace accelerates, are we introducing vulnerabilities & scalability problems faster than we can fix them?

Also: Are generalists better at scaling the web?

2. Redshift

I’ve blogged that I don’t work with recruiters but I do chat with them regularly.

In a recent conversation a recruiter asked me:

“Why is it that suddenly everyone is looking for Redshift?”

I’m seeing the same trend. And if you look at Hadoop you might see why. Writing SQL queries against Redshift data is wildly simpler than writing EMR jobs for Hadoop.

Related: Why Dropbox didn’t have to fail?

3. Devops automation

These days I hear a lot of talk that all operations is software development. Are you still SSHing into boxes. You’re doing it wrong!

Read: When hosting data on Amazon turns bloodsport

4. Hardware solves all speed problems

Having performance problems? Scale out! Database slow, scale up! These days it seems the old short sighted way of thinking is back with a vengence. Throw hardware at the problem and kick the can down the road.

Also: How to hire a developer that doesn’t suck?

5. Amazon disrupting VC

During dot-com version one-point-oh, you’d need hundreds of thousands to buy hardware & software licenses to get an idea off the ground. That necessarily meant real VC money to get off the ground.

Amazon web services & on-demand computing has brought world class infrastructure to even the smallest startups. For just dollars, they can get started.

Now we’re seeing startups get going with micro investments from the likes of Angel List syndicates. Cutting traditional VCs right out of the equation.

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Launch Festival 2016 Tickets for San Francisco event

launch festival 2016

One of the biggest startup festivals of the year LAUNCH is coming up next week in San Francisco.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Are you feeling lucky? If so enter to win tickets to Launch 2016. The winner will get a pass to the event, a one-on-one with Jason Calacanis, a one-year RushTix pass which seems pretty damn cool, and lastly a Dining on Reserve pass.

Nice!

1. Launch Festival 2016

The Launch Festival is a creation of Jason Calacanis. Formerly one of New York’s own, he started Silicon Alley Reporter way back in the dot-com era v1.0. Remember that? After some huge successes here, he moved on to become a huge figure in the Silicon Valley scene & the bay area.

Past events have included folks like Paul Graham & Mark Cuban & this years event is shaping up to fill Fort Mason Center to capacity again.

Also: Why is everyone suddenly talking about Amazon Redshift?

2. RushTix Membership

RushTix is a membership based way to discover local artists, concerts & events in the bay area. As a member you get comped tickets to all sorts of cool events. Check it out!

Related: Which tech do startups use most?

3. Dining on Reserve

Reserve consolidates restaurant discovery, reservations, and payment all in one smartphone app. What’s more you can use it at restaurants in a few big cities, like our own New York, LA, SF, Philadelphia, Boston & DC. Not bad!

Related: Is data your dirty little secret?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Are you getting errors building Amazon lambda functions? Don’t fret I got you!

aws lambda python

Why does Amazon make lambda functions so hard to create? Well my guess is that when you live at the bleeding edge you should expect to get scrapes!!

Everybody is trying to build lambda functions these days. And it’s no wonder. Once you get them running, Amazon takes care of all the infrastructure drudge work! So cool.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

I’ve been spending some time trying to get answers out of AWS support, and let me tell you it’s no fun. Yes all this stuff is new technology, and nobody has expertise in it in the way say you might in Linux or Oracle or another technology that’s been around for a decade.

Still you’d hope the techs would have some clue. In the end it was a slog dealing with support, and I think I was the one teaching them!

I did find Matt Perry’s howto which is pretty good.

Hopefully my own notes can help someone, so read on!

1. No lambda_function?

The very first issue you’re gonna run into is if you name the file incorrectly, you get this error:


Unable to import module 'lambda_function': No module named lambda_function

If you name the function incorrectly you get this error:


Handler 'handler' missing on module 'lambda_function_file': 'module' object has no attribute 'handler'

On the dashboard, make sure the handler field is entered as function_filename.actual_function_name and make sure they match up in your deployment package.

If only the messages were a bit more instructive that would have been a simpler step, but oh well!

Also: Is Amazon too big to fail?

2. No module named MySQLdb

This is a very tricky one. I mean after all you just spent all this time building your deployment package specifically for lambda, so what gives??


"Unable to import module 'lambda_function': No module named MySQLdb"

Turns out when you use a virtualenv, files will be installed into proj/lib/python2.7/site-packages/ or lib64. However Lambda wants them in the root proj/ directory! So move them there. I know I know. Weird, but that’s what they want.

Related: When hosting on Amazon turns bloodsport

3. Can’t find libmysqlclient

If you’re using the MySQLdb library like I was, you’ll eventually bump into this error:


Unable to import module 'lambda_function': libmysqlclient.so.18: cannot open shared object file: No such file or directory

Turns out that /usr/lib/libmysqlclient.so.18 needs to be COPIED from /usr/lib. Don’t do “mv” or your system won’t have the mysql lib anymore!

Related: Are SQL databases dead?

4. Use the Amazon Lambda environment

One thing the support pointed out is that AWS as *supported images* for lambda development.

After all the errors above were resolved, it’s not clear to me that the supported AMI’s are truly required. However if you’re hitting intractable problems building a properly lambda deploy, you might wanna look at building one of these boxes.

Read: Why dropbox didn’t have to fail

5. Build your lambda deploy package

Now let’s roll it all together. Here’s are all the steps to build your deploy package.


- SSH to the instance
- mkdir test
- virtualenv test
- source proj/bin/activate
- sudo yum groupinstall 'Development Tools'
- sudo yum install mysql
- sudo yum install mysql-devel
- pip install MySQL-python
- cd test
- emacs -nw lamdba_function.py
- add your code to that file
- save the lambda_function.py
- mv proj/lib/python2.7/site-packages/* proj/
- mv proj/lib64/python2.7/site-packages/* proj/
- rm -rf proj/lib (don't need dist-packages in the deploy pkg)
- rm -rf proj/lib64 (don't need dist-packages
- zip -r proj.zip *

Also: How to hire a developer that doesn’t suck?

6. Upload your code

Uploading your code via the AWS dashboard is fine when you’re first testing things. But after a while it’ll get tiring going in the front door.

Create a new lambda function by specifying the basics as follows:


aws lambda create-function \
--function-name testfunc1 \
--runtime python2.7 \
--role arn:aws:iam::996225510001:role/lambda_basic_execution \
--handler lambda_function_file.handler_name \
--zip-file file://proj.zip

And when you want to update your function, do the following:


aws lambda update-function-code \
--function-name testfunc1 \
--zip-file file://proj.zip

Also: How to deploy on EC2 with Vagrant

Good luck with lambda. Once you get past Amazon’s weak documentation it’s pretty cool to be in a serverless computing environment. Happy deploying!

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why is everyone suddenly talking about Amazon Redshift?

par accel redshift

It seems like all I hear these days is Redshift, Redshift, Redshift!

I met up with a recruiter today. We talked about this & that. The usual. Then when he came to the topic of technology he said,

“yeah it seems as though suddenly everybody is looking for Redshift & Snowflake”

As I blogged about before, I don’t work with recruiters, I learn a lot from them.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Luckily I got to cut my teeth on Redshift about a year ago. I was senior database engineer managing Amazon & MySQL RDS, and they wanted to build a data warehouse. Bingo!

Here’s the big takeaway from my discussion today. Recruiters have their fingers on the pulse!

1. We need an Amazon expert

Here’s what else I’m hearing everywhere. “We’re migrating to AWS, can you help?” Complexity & confusion around the new virtual networking, moving into the cloud, and tuning applications & components to get the same performance as before. All of these are real & present needs for firms.

Related: Is data your dirty little secret?

2. We need a Redshift expert

Amazon bought Par Accel, a bleedingly fast warehouse. It uses SQL. It looks like Postgres, and handles petabytes. You read that petabytes! It’s so good in fact that it seems a lot of folks are now dumping Hadoop.

Incredible as that sounds, Redshift is delivering *that* kind of speed on that kind of big data. Wow! What’s more you skip the whole Hadoop cycle of write, test, debug, schedule job, fix bugs, and stir. With SQL you bring back the iterative agile process!

Read: 5 cloud challenges I’m thinking about today

3. We need a Hadoop expert

Ok, for those enterprises who aren’t sold on Redshift yet, there is still a ton of Hadoop out there. And for good reason.

Apache Spark is also getting really big now too. It’s an easier to manage successor to Hadoop, based around much of the same concepts.

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

4. We need strong Python skills

Python is everywhere. Amazon’s command line interface is python based. You see it everywhere. If it’s not in your wheelhouse get it there!

Also: Why Dropbox didn’t have to fail

5. We need communicators

Another interesting thing the recruiter said

“I was surprised & a little shocked that you suggested we meet for coffee. Most developers are hard to get out to have a conversation with.”

Good communicators are as in-demand as ever! Being able to and happy to talk with people who aren’t deeply technical, and distill complex technical jargon into plain english. And do that with a smile too & enjoy it?

That’s special!

Also: Should we be muddying the waters? Use cases for MySQL & Mongodb

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Do we need computer science for all?

I was recently digging through AVC, Fred Wilson’s blog. These days it’s where I get most of my tech news. 🙂 I ran into this AVC Post on Obama’s Computer Science for All initiative. I hadn’t been paying attention to these weekly addresses.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

It’s exciting to see this reach the national stage. Theres’s been a shortage of computer science graduates since the 90’s. In fact it’s only grown.

Computer Science for All

Here’s the full address. It’s short & worth watching.

Also: 5 core pieces of the Amazon cloud to get your project off the ground

Code is everywhere

The president points out that it’s not just at trendy startups & silicon valley that you see code anymore. Car mechanics, nurses & everyone in the new economy touches code. It isn’t an optional skill anymore but rather a basic one.

Related: 5 tech challenges I’m thinking about today

1M unfilled computer science jobs by 2020!

This is an incredible figure. That’s not the number of jobs, but rather the number we’ll be short! That’s right we’ll need a million more graduates than we’ll have.

The Bureau of Labor Statistics says that by 2020, there’ll be 1 million more jobs in computer-science and related fields than students graduating for them.

While disruption affects a lot of other industries, in high tech skills, the demand is actually exploding!

Read: Is Data your dirty little secret?

Digital divide

Sometimes called a “digital skills gap” or “digital divide”, attention to this problem is sorely needed. Want more? Check out Girls who code or any of the many courses offered at General Assembly or a coding meetup group near you.

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is data your dirty little secret?

data comparison cloud

While I was fumbling for the dictionary to figure out what polyglot persistence was, the CTO had decided to build a warehouse on Redshift.

“Everybody’s moving data there.” He declared. I looked on quizzically.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

“That’s a very new database engine”, I chimed in. “Lets do some testing”.

And there began an adventurous ride into the bleeding edge of Amazon’s new data service offerings!

1. The data scientist comes crying

Are our transactional database we were using Amazon RDS for MySQL. It’s a great managed service that eliminates some of the headaches of doing it yourself. I wrote about thisRDS or MySQL Use Cases.

We needed some way to get data over to Redshift. We evaluated AWS Data Pipeline, but it wasn’t realtime enough. In a pinch we decided on a service called Flydata. After weeks of effort to get it setup & administered we had it running smoothly.

I since discovered some pipelining solutions dedicated to Redshift such as Alooma – modern data plumbing, RJMetrics pipeline and Domo. I *did not* manage to get Tungsten working. It supports redshift on paper, but has a lot of growing up to do.

Until one data when the data scientist shows up at my desk. “We have problems in our data on redshift.”. I look back confused. “Are you sure? Can you tell me where you’re seeing that?” I respond.

Also: When hosting data on Amazon turns bloodsport

2. Deleted data reappears in Redshift!

He sends me over some queries, that I rerun myself. I see extra data in Redshift too, data that had been deleted in MySQL. Strange. We dig deeper together trying to figure out what’s happening.

We find that the tables with extra data are child tables of a parent where data was deleted. Imagine Citibank deletes a customer, they also want to delete the records for monthly bills. Otherwise those will just be hanging around, and won’t match up anymore with a parent record for a customer. In real life Citibank probably doesn’t delete like this but it’s a helpful example.

The first thing I do is open a ticket with Flydata. After all we hadn’t gotten any errors logged. Things *must* be running correctly.

After highlighting the severity of the issue, we setup a conference call with Flydata. Digging further they discover the problem. Child table data can’t get deleted on Redshift, because it doesn’t support ON DELETE CASCADE. Wait what?

Turns out Flydata makes use of the MySQL transaction log to move data. In mysql to mysql replication this works fine because downstream you also have MySQL. It also implements on delete cascade so those child records will get cleaned up correctly. Since Redshift doesn’t have this, there’s no way for Flydata to instruct Redshift what to do. Again I said, wait what?

My surprise wasn’t that a new unproven technology like Redshift had a lot of holes & missing features. My surprise was that Flydata was just silently ignoring the problem. No logged messages to tell the customer about inconsistencies. At least?

Related: Is Amazon too big to fail?

3. The problem – comparing data

As you might imagine, this is a terrible way to find out about data problems. As the person tasked with moving data between these systems, eyes were on me. My thought was, we chose a service-based solution, so they’re manage data movement. If there’s a problem, they’ll surely alert us.

From there the conversation became, ok, how do we figure out where all these data differences are? Is it widespread or isolated to a few tables? Can we mitigate it with changed queries? Cleanup on a daily basis? These are some questions that’ll immediately come to mind.

To answer them we needed a way to compare table data across different databases. This is hard to do within a homogenous environment where server versions & datatypes are likely to be the same. It is much more complicated when you’re working across heterogenous systems.

Read: 5 Reasons to move data to amazon redshift

4. Build some way to spot check data

Although this still doesn’t seem a solved problem, there are some tools. One way is to perform checksums on tables & rows. These can then be compared to find differences.

This drew me to find Jason Friedman’s
table hash script on Github. It can work across MySQL, Postgres & redshift. Pretty cool stuff if you ask me.

One problem remains. Databases are always in flux. As such you may find discrepancies based on data that hasn’t been moved yet. Data that’s just changed in the last few minutes.

If you refresh data nightly, you may for example be able to stop a slave to compare data at an instant in time.

Also: Is Redshift outpacing hadoop as the warehouse for startups?

5. The mentality: treat data as a product & monitor

Solving tough problems like these is a work in progress. What it taught me is that:

You should own your data pipeline

This allows you to be vigilant about monitoring, throw errors if data is different, and ultimately treat data as a product. Owning the pipeline will mean you can build monitoring around stages, and automate spot checks on your data.

You won’t get it perfect, but you want to know when it isn’t.

Also: 5 core pieces of the Amazon puzzle to get your project off the ground

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

5 tech challenges I’m thinking about today

fast fish

Technical operations & startup tech are experiencing an incredible upheaval which is bringing a lot of great things.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Here are some of the questions it raises for me.

1. Are we adopting Docker without enough consideration?

Container deployments are accelerating at a blistering pace. I was reading Julian Dunn recently, and he had an interesting critical post Are container deployments like an oncoming train?

He argues that we should be wary of a few trends. One of taking legacy applications and blindly containerizing them. Now we can keep them alive forever. 🙂 He also argues that there is a tendency for folks who aren’t particularly technical or qualified who start evangelizing it everywhere. A balm for every ailment!

Also: Is Amazon too big to fail?

2. Is Redshift supplanting hadoop & spark for startup analytics?

In a recent blog post I asked Is Redshift outpacing hadoop as the big data warehouse for startups.

On the one hand this is exciting. Speed & agile is always good right? But what of more Amazon & vendor lock-in?

Related: Did Dropbox have to fail?

3. Does devops automation make all of operations a software development exercise?

I asked this question a while back on my blog. Is automation killing old-school operations?

Automation suites like Chef & Puppet are very valuable, in enabling the administration of fleets of servers in the cloud. They’re essential. But there’s some risk in moving further away from the bare metal, that we might weaken our everyday tuning & troubleshooting skills that are essential to technical operations.

Read: When hosting data on Amazon turns bloodsport?

4. Is the cloud encouraging the old pattern of throwing hardware at the problem?

Want to scale your application? Forget tighter code. Don’t worry about tuning SQL queries that could be made 1000x faster. We’re in the cloud. Just scale out!

That’s right with virtualization, we can elastically scale anything. Infinitely. 🙂

I’ve argued that throwing hardware at the problem is like kicking the can down the road. Eventually you have to pay your technical debt & tune your application.

Also: Are SQL databases dead?

5. Is Amazon disrupting venture capital itself?

I’m not expert on the VC business. But Ben Thompson & James Allworth surely are. And they suggested that because of AWS, startups can setup their software for pennies.

This resonates loud & clear for me. Why? Because in the 90’s I remember startups needing major venture money to buy Sun hardware & Oracle licenses to get going. A half million easy.

They asked Is Amazon Web Services enabling AngelList syndicates to disrupt the Venture capital business? That’s a pretty interesting perspective. It would be ironic if all of this disruption that VC’s bring to entrenched businesses, began unravel their own business!

Also: Are we fast approaching cloud-mageddon?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Is Redshift outpacing Hadoop as the big data warehouse for startups?

redshift hadoop killer

More and more startups are looking at Redshift as a cheaper & faster solution for big data & analytics.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Saggi Neumann posted a pretty good side-by-side comparison of Redshift & Hadoop and concluded they were tied based on your individual use case.

Meanwhile Bitly engineering concluded Redshift was much easier.

1. More agile

One thing pointed out by the bitly blog post, which I’ve seen countless times, is the slow iteration cycle. Write your map-reduce job, run, test, debug, then run on your cluster. Wait for it to return and you might feel like you’re submitting a stack of punched cards. LOL Resolve the errors that come back and then rerun on your cluster. Over & over & over again.

With Redshift you’re writing SQL, so your iterating through syntax errors quickly. What’s more since Redshift is a column-compressed database, you can do full table scans on columns without indexes.

What that means for you and me is that queries just run. And they run blazingly fast!

Also: When hosting data on Amazon turns bloodsport

2. Cheap

Redshift is pretty darn cheap.

Saggi’s article above quotes Redshift at $1000/TB/yr for reserved, and $3700/TB/yr for on-demand. This compared with a hadoop cluster at $5000/TB/yr.

But neither will come with spitting distance of the old-world of Oracle, where customers host big iron servers in their own datacenter, paying north of a million dollars between hardware & license costs. Amazon cloud FTW!

Related: Did dropbox have to fail?

3. Even faster

Airbnb’s nerds blog has a post showing it costing 25% of a Hadoop cluster, and getting 5x performance boost. That’s pretty darn impressive!

Flydata has done benchmarks showing 10x speedup.

Read: Are SQL Databases dead?

4. SQL Toolchains

***

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

5. Limitations

o data loading

You load data into Redshift using the COPY command. This command reads flat files from S3 and dumps them into tables. It can be extremely fast if you do things in parallel. However getting your data into those flat files is up to you.

There are a few solutions to this.

– amazon data pipeline

This is Amazon’s own toolchain, which allows you to move data from RDS & other Amazon hosted data sources. Data pipeline does not move data realtime, but in batch. Also it doesn’t take care of schema changes so you have to do that manually.

I mentioned it in my 5 reasons to move data to Amazon Redshift

– Flydata service

Flydata is a service with a monthly subscription which will connect to your RDS database, and move the data into Redshift. This seems like a no brainer, and given the heft pricetag of thousands per month, you’d expect it to cover your bases.

In my experience there are a lot of problems & it still required a lot of administration. When schema changes happen, those have to be carefully applied on Redshift. What’s more there’s no silver bullet around the datatype differences.

Also: Some thoughts on 12 factor apps

Flydata also makes use of the binary logs to replicate your data. Anything that doesn’t show up in the binary logs is going to cause you trouble. That includes when you do sql_log_bin=0 in the session, an SQL statement includes a no logging hint. Also watch out for replicate-ignore-db options in your my.cnf. But it also will fail if you use ON DELETE CASCADE. That’s because these downstream changes happen via Constraint in MySQL. But… drumroll please, Redshift doesn’t support ON DELETE CASCADE. In our case the child tables ended up with extra rows, and some queries broke.

– scripts such as Donors choose loader

Donors Choose has open sourced their nightly Redshift loader script. It appears to reload all data each night. This will nicely sidestep the ON DELETE CASCADE problem. As you grow though you may quickly hit a point where you can’t load the entire data set each night.

Their script sources from Postgres, though I’m interested to see if it can be modified for MySQL RDS.

– Tried & failed with Tungsten replicator

Theoretically Tungsten replicator can do the above. What’s more it seems like a tool custom made for such a use case. I tried for over a month to troubleshoot. I worked closely with the team to iron out bugs. I wrote wrestling with bears or how I tamed Tungsten replicator for MySQL and then I wrote a second article Tungsten replicator the good the bad & the ugly. Ultimately I did get some data moving between MySQL RDS & Redshift, however certain data clogged the system & it wouldn’t work for any length of time.

Also: Secrets of a happy Amazon hacker or how to lock down your account with IAM and multi-factor authentication

o data types & character sets

There are a few things here to keep in mind. Redshift counts bytes, so if in mysql or some other database you had a varchar(5) it may be varchar(20) in Redshift. Even then I had cases where it still didn’t fit & I had to make the field bigger by 4.

I also ran into problems around string character encodings. According to the docs Redshift handles 4-byte UTF-8.

Redshift doesn’t support ARRAYs, BIT, BYTEA, DATE/TIME, ENUM, JSON and a bunch of others. So don’t go into it expecting full Postgres support.

What you will get are multibyte characters, numeric, character, datetime, boolean and some type conversion.

Also: Is the difference between dev & ops a four-letter word?

o rebalancing

If and when you want to add nodes, expect some downtime. Yes theoretically the database is online while it’s shipping data to the new nodes & redistributing things, the latency can start to feel like an outage. What’s more it can easily push into the hours to do.

Also: Is AWS enabling startups which enable AngelList Syndicates to boil the VC business?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

What events are good for tech & startup networking in New York City?

garys guide events

I’ve worked in the NYC startup scene since the mid-nineties. It seems to keep growing every year, and there are so many events it’s hard to keep track.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Here’s where to look for the best stuff.

1. Gary’s Guide

Gary Sharma hosts an authoritative guide to all the events in the new york tech & startup scene. It’s sort of the one-stop shop for knowing what’s going on.

Lucky for us, in a city the size of new york, there’s an opportunity to meet & network with people everyday of the week.

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

2. Meetups

Meetup.com is another invaluable resource. There are technical groups & social ones, and plenty of niche groups to for specific areas of interest.

For example there’s NYC Tech Talks, NY Women in Tech, Tech for good & NY Entrepreneurs & Startup Network. There are plenty more.

Related: Some thoughts on 12-factor apps

3. Eventbrite

A lot of events us Eventbrite for ticketing, so it turns out to be a great place to search. Some of the startup related events .

Read: Why dropbox didn’t have to fail

4. Techdrinkup

Michael Gold’s #techdrinkup event keeps getting bigger & better. More social hour than presentations & such, you’re sure to bump elbows with some folks in NY’s exploding tech scene.

Take a look at some of the event photos on their facebook page.

Also: How do hackers secure their Amazon Web Services account?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters