Tag Archives: bigdata

What does the fight between palantir & nypd mean for your data?

via GIPHY

In a recent buzzfeed piece, NYPD goes to the mat with Palantir over their data. It seems the NYPD has recently gotten cold feet.

Join 35,000 others and follow Sean Hull on twitter @hullsean.

As they explored options, they found an alternative that might save them a boatload of money. They considered switching to an IBM alternative called Cobalt.

And I mean this is Silicon Valley, what could go wrong?

Related: Will SQL just die already?

Who owns your data?

In the case of Palantir, they claim to be an open system. And of course this is good marketing. Essential in fact to get the contract. Promise that it’s easy to switch. Don’t dig too deep into the technical details there. According to the article, Palantir spokeperson claims:

“Palantir is an open platform. As with all our customers, their data & analysis are available to them at all times in an open & nonproprietary format.”

And that does appear to be true. What appears to be troubling NYPD isn’t that they can’t get the analysis, for that’s available to them in perpetuity. Within the Palantir system. But getting access to how the analysis is done, well now that’s the secret sauce. Palantir of course is not going to let go of that.

And that’s the devil in the details when you want to switch to a competing service.

Also: Top serverless interview questions for hiring aws lambda experts

Who owns the algorithms?

Although the NYPD can get their data into & out of the Palantir system easily, that’s just referring to the raw data. That’s the data they ingested in the first place, arrest records, license plate reads, parking tickets, stuff like that.

“This notion of how portable your data is when you engage in a contract with a platform is really, really complex, and hasn’t really been tested” – Tal Klein

Palantir’s secret sauce, their intellectual property, is finding the needle in the haystack. What pieces of data are relevant & how can I present the detectives the right information at the right time.

Analysis *is* the algorithms. It’s the big data 64 million dollar question. Or in this case $3.5 million per year, as the contract is reported to be worth!

Related: Which engineering roles are in greatest demand?

The nature of software as a service

The web is bringing us great platforms, like google & amazon cloud. It’s bringing a myriad of AI solutions to our fingertips. Palantir is providing a push button solution to those in need of insights like the NYPD.

The Cobalt solution that IBM is offering goes the other way. Build it yourself, manage it, and crucially control it. And that’s the difference.

It remains to be seen how the rush to migrate the universe of computing to Amazon’s own cloud will settle out. Right now their in a growth phase, so it’s all about lowering prices. But at some point their market muscle will mean they can go the Oracle route a la Larry Ellison. That’s why customers start feeling the squeeze.

If the NYPD example is any indication, it could get ugly!

Read: Can on-demand consulting save startups time & money?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Sean Hull interviewed on the Doppler Cloud podcast

I recently got a chance to talk with Mike Kavis over at Cloud Technology Partners. It was fun to get away from the keyboard, and in front of the microphone for a change.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

1. Docker

Docker is making deployments easier & easier. But as the pace accelerates, are we introducing vulnerabilities & scalability problems faster than we can fix them?

Also: Are generalists better at scaling the web?

2. Redshift

I’ve blogged that I don’t work with recruiters but I do chat with them regularly.

In a recent conversation a recruiter asked me:

“Why is it that suddenly everyone is looking for Redshift?”

I’m seeing the same trend. And if you look at Hadoop you might see why. Writing SQL queries against Redshift data is wildly simpler than writing EMR jobs for Hadoop.

Related: Why Dropbox didn’t have to fail?

3. Devops automation

These days I hear a lot of talk that all operations is software development. Are you still SSHing into boxes. You’re doing it wrong!

Read: When hosting data on Amazon turns bloodsport

4. Hardware solves all speed problems

Having performance problems? Scale out! Database slow, scale up! These days it seems the old short sighted way of thinking is back with a vengence. Throw hardware at the problem and kick the can down the road.

Also: How to hire a developer that doesn’t suck?

5. Amazon disrupting VC

During dot-com version one-point-oh, you’d need hundreds of thousands to buy hardware & software licenses to get an idea off the ground. That necessarily meant real VC money to get off the ground.

Amazon web services & on-demand computing has brought world class infrastructure to even the smallest startups. For just dollars, they can get started.

Now we’re seeing startups get going with micro investments from the likes of Angel List syndicates. Cutting traditional VCs right out of the equation.

Also: Is Amazon too big to fail?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why is everyone suddenly talking about Amazon Redshift?

par accel redshift

It seems like all I hear these days is Redshift, Redshift, Redshift!

I met up with a recruiter today. We talked about this & that. The usual. Then when he came to the topic of technology he said,

“yeah it seems as though suddenly everybody is looking for Redshift & Snowflake”

As I blogged about before, I don’t work with recruiters, I learn a lot from them.

Join 32,000 others and follow Sean Hull on twitter @hullsean.

Luckily I got to cut my teeth on Redshift about a year ago. I was senior database engineer managing Amazon & MySQL RDS, and they wanted to build a data warehouse. Bingo!

Here’s the big takeaway from my discussion today. Recruiters have their fingers on the pulse!

1. We need an Amazon expert

Here’s what else I’m hearing everywhere. “We’re migrating to AWS, can you help?” Complexity & confusion around the new virtual networking, moving into the cloud, and tuning applications & components to get the same performance as before. All of these are real & present needs for firms.

Related: Is data your dirty little secret?

2. We need a Redshift expert

Amazon bought Par Accel, a bleedingly fast warehouse. It uses SQL. It looks like Postgres, and handles petabytes. You read that petabytes! It’s so good in fact that it seems a lot of folks are now dumping Hadoop.

Incredible as that sounds, Redshift is delivering *that* kind of speed on that kind of big data. Wow! What’s more you skip the whole Hadoop cycle of write, test, debug, schedule job, fix bugs, and stir. With SQL you bring back the iterative agile process!

Read: 5 cloud challenges I’m thinking about today

3. We need a Hadoop expert

Ok, for those enterprises who aren’t sold on Redshift yet, there is still a ton of Hadoop out there. And for good reason.

Apache Spark is also getting really big now too. It’s an easier to manage successor to Hadoop, based around much of the same concepts.

Also: 5 core pieces of the Amazon cloud puzzle to get your project off the ground

4. We need strong Python skills

Python is everywhere. Amazon’s command line interface is python based. You see it everywhere. If it’s not in your wheelhouse get it there!

Also: Why Dropbox didn’t have to fail

5. We need communicators

Another interesting thing the recruiter said

“I was surprised & a little shocked that you suggested we meet for coffee. Most developers are hard to get out to have a conversation with.”

Good communicators are as in-demand as ever! Being able to and happy to talk with people who aren’t deeply technical, and distill complex technical jargon into plain english. And do that with a smile too & enjoy it?

That’s special!

Also: Should we be muddying the waters? Use cases for MySQL & Mongodb

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Why I like Etsy’s site performance report

etsy code as craft

Etsy publishes a great tech blog titled Code As Craft.

Join 28,000 others and follow Sean Hull on twitter @hullsean.

I was recently sifting through some of their newer posts & stumbled upon their Q2 2015 Site Performance Report. It’s really in-depth, though not impossibly technical. Here’s what I liked.

1. Transparency to business & public

Show real performance to customers

The first thing I thought while reading, is the strong show of transparency. The blog is public, so it’s not just an internally facing document that shares with the company, but sharing with the wider world. True, presented as a technical post it may only appeal to a segment of readers, but it’s great none the less.

Show real performance to non-technical business units

I think this kind of analysis & summary also provides transparency to the business itself. Product teams, business operations & sales teams can all view what’s happening. Where are there problems? What is being done to address them?

Also: When hosting data on Amazon turns bloodsport

2. Highlighting change

Added pagination to the cart

One thing that popped out, was the discussion of pagination changes, that impacted page load times in the shopping cart. Page load times in the shopping cart are particularly crucial, because that’s where customers can “abandon” an order out of frustration.

Illustrating performance impact to product decisions

When product is evaluating that new feature, and they can see how changes affect performance, it better *sells* what all those engineering resources are being used for.

Related: 5 reasons to move data to amazon redshift

3. Where we don’t have data

We can’t analyze what data we haven’t captured

The report highlights that data around the shopping cart is new. That’s great because it highlights what the value collecting data offers, by providing new insights that were not available previously. This also pushes for more metrics collection & analysis as the business begins to see the value of all of this gymnastics.

Read: Is Amazon too big to fail?

4. Product tradeoffs

The discussion around the shopping cart performance also illustrates how the business makes product decisions. The engineering team can only build & write so much code. Deciding to spend time on pagination, means time not spent on some other new feature. Which is more valuable? Selling new feature A in one corner of the product, that customers may spend real money on? Or speeding up page load times on page B?

Also: Is Apple betting against big data?

5. Cleaner data

At a Look & Tell event, I heard Lincoln Ritter talk about Data as a product to the business.

When you expose a performance report like this to the business, an iterative process begins to happen. The company gains insight from the report, makes better decisions, and thus can spend more energy time & resources on clean data. Cleaner data in term means better reports, which produce better decisions & so on.

Also: What is venue analytics & why is it important?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters