Tag Archives: interview

Cloud DBA and Management Interview

What does a cloud computing expert need to know? This is the last of a three-part guide to interviewing for a cloud operations position. You can find the earlier parts here – part one Operations Interview and part two Deployment Interview.

Here’s my guide to do just that.

1. Database administration experience

Although in some shops the DBA role is a completely separate one, there are many others where the Linux and Operations teams manage these services as well. We also have some other material: Oracle DBA Interview Questions and a MySQL DBA Interview Guide. Here’s a taste of what to expect.

o What is RAID? Which type is best?

RAID (redundant array of independent disks) is a way to spread data across a whole bunch of disks on one server. Databases like Oracle or MySQL do a lot of writing and reading from disk. If there are more disks sharing this work, it’s like you have more waiters in your restaurant. Faster service.

Although some folks still hang onto RAID 5 as an option, it’s generally a very bad one. It has a serious write penalty because of the parity calculation it must perform on every write. Most databases do a lot of writing, even when user transactions are not doing INSERT or UPDATE. What’s more, if a disk fails, RAID 5, although technically online, will be so slow as to be effectively unusable while the long slow rebuild happens.

What’s the answer then? RAID 10! It mirrors each volume, and then stripes across those mirrored sets. Fast I/O, fast recovery. Done & done.
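To put rough numbers on that comparison, here is a minimal Python sketch. It assumes the common rule of thumb of four disk operations per small random write on RAID 5 and two on RAID 10; the function name and the disk figures in the usage note are made up for illustration, and real arrays with caches will behave differently.

```python
def raid_summary(level, disks, disk_gb, disk_iops):
    """Return (usable_gb, approx_random_write_iops) for RAID 5 or RAID 10."""
    if level == 5:
        usable = (disks - 1) * disk_gb   # one disk's worth of space goes to parity
        penalty = 4                      # read data, read parity, write data, write parity
    elif level == 10:
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        usable = disks // 2 * disk_gb    # every disk has a mirror
        penalty = 2                      # each write lands on both halves of a mirror
    else:
        raise ValueError("only RAID 5 and RAID 10 are modeled here")
    return usable, disks * disk_iops // penalty
```

With eight hypothetical 1000 GB disks at 150 IOPS each, raid_summary(5, 8, 1000, 150) gives (7000, 300) and raid_summary(10, 8, 1000, 150) gives (4000, 600). You give up space with RAID 10, and get roughly double the write throughput in return.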

o What are the tradeoffs with more indexes versus fewer?

In all relational databases, you build indexes on data. Indexes are just like the ones you think of in the yellow pages, phonebooks of yore. An index on first name means you can look up Obama by Barack as well. An index on street addresses means you can look up the White House. So the more indexes you have, the more different ways you can search for & fetch what you want.

On the other hand, the penalty here is that whenever you add new data & records to this database, all those indexes must be updated. That’s overhead, which slows down writes.

So the tradeoff is more indexes – faster fetching, slower writing. Fewer indexes – slower fetching, faster writing.
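Here is a small sketch of the read side of that tradeoff, using Python's built-in sqlite3 module as a stand-in for a full database server. The people table and the index name are invented for the example. The same lookup is planned before and after the index exists.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT, street TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Barack", "Obama", "1600 Pennsylvania Ave"), ("Jane", "Doe", "12 Main St")],
)

lookup = "SELECT * FROM people WHERE first_name = ?"
plan_before = query_plan(conn, lookup, ("Barack",))  # full scan of the table

# Every future INSERT or UPDATE now has to maintain this index as well.
conn.execute("CREATE INDEX idx_first_name ON people (first_name)")
plan_after = query_plan(conn, lookup, ("Barack",))   # search through the index
```

Printing plan_before and plan_after shows the database switching from scanning the whole table to searching idx_first_name. The write cost does not show up in the plan, but it is paid on every insert.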

o What do NoSQL databases eliminate? How do they achieve great speed?

There are quite a few different types of NoSQL databases. So I’m generalizing quite a lot here. One thing NoSQL databases eliminate is the ability to JOIN data across different tables. By removing this great feature of relational databases, they dramatically simplify the underlying implementation. No free lunch!

What else? Many of these databases cut corners on what’s called durability. What is durability? Imagine you are in a lecture hall taking notes, or are waiting tables and taking orders. It might be quicker to do so without writing things down. You keep it all in your head. Great, but what if you forget something? You have to go ask for the order again! It may be faster, but it’s more prone to error. Losing data is not something to be taken lightly. NoSQL databases don’t always flush data to permanent storage.

Whether or not a web operations candidate uses the command line may seem like a small issue. But it speaks to what their DNA is, and the strength of their foundation. Strength and comfort on the command line is key.

o What is Amazon RDS? When should I use it?

Amazon has a managed relational database solution called RDS. It’s basically MySQL, Oracle or SQL Server, but modified so you can’t shoot yourself in the foot. Administrative tasks are simplified, but so are your configuration options.

I wrote an in-depth Amazon RDS use cases article. It mostly covers MySQL, but the general rules apply to Oracle & SQL Server. At the end of the day, RDS is a lot less configurable and flexible. But if you don’t have a regular DBA on staff, it will probably simplify your administration of these servers.

o What are read-replicas? What about Multi-AZ?

Read-replicas are read-only copies of your data. Using MySQL, these are fairly stock master-slave configurations. Note that since they’re the standard technology, they’re still asynchronous. So yes, the read-replica can lag behind.

Multi-AZ is a proprietary technology, and Amazon doesn’t disclose what’s under the hood. However it’s likely running on top of something like DRBD, which is a distributed replicated block device. This allows the underlying disk I/O to be mirrored across the network to another availability zone. You’ll enjoy synchronous copies of your data, and no data consistency problems. Keep in mind, though, that the alternate server is offline or cold and can take time to come online.

o What is the primary bottleneck of hosting databases in the cloud? How has Amazon recently addressed this?

As I explained above, disk I/O remains the largest bottleneck for relational databases, even if the entire dataset fits in memory. Why? Because sorting, joining, and rearranging data can take orders of magnitude more memory than the data itself, so the work spills to disk. And that’s not even talking about durability guarantees.

The cloud has traditionally lagged quite a lot behind physical servers in terms of disk I/O, so some internet firms have shied away from moving to the cloud. EBS volumes were typically limited to a few hundred IOPS.

Amazon recently announced Provisioned IOPS. It’s a mouthful of a name for a very big development. It means you can provision how fast you want those virtual disks to be. For individual volumes the limit seems to be 2000 IOPS, but you can also use software RAID to stripe across many of those virtual disks. For Amazon RDS the limit is reportedly 10,000 IOPS. This new feature will make a huge difference for hosting large high I/O databases in Amazon’s cloud.

2. Architecture & Management Questions

o Why does the API battle between Amazon & Eucalyptus (FOSS) matter?

As large applications are architected to provision hardware components and resources in the cloud, the API they work through becomes key. Sticking to an open standard for this API means you can change cloud vendors and/or build on multiple ones. We talked about this multi-cloud solution as a key way to avoid outages like those AirBNB and Reddit experienced when AWS had an outage.

Following on the heels of that article, we were quoted about multi-cloud by Brandon Butler in his Network World piece.

o Do you use command line tools? Why?

A good web operations candidate should be very comfortable with command line tools. Everything in Linux is command line. It’s like broadway acting to movie acting, or literature to books. It’s the original source, and much more powerful. What’s more, it indicates and requires much stronger theoretical knowledge of the underlying systems being managed.

o What can go wrong with backups? How do we test them?

Everything can go wrong with them. They can fail to complete. They can be backups of the wrong service or resource. Even the backup software itself can have bugs. The only way to sleep well at night is if you run fire drills and restore your application and data top to bottom.
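A fire drill can start very small. The sketch below, using only Python's standard library, archives a directory, restores it into a scratch location, and compares checksums of every file. The function names are made up for this example, and a zip of a directory stands in for whatever backup tool you really use. A real drill would also restore the database and start the application.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def checksums(root):
    """Map each file's relative path under root to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }

def restore_drill(source_dir, backup_dir):
    """Back up source_dir into backup_dir, restore it elsewhere, and compare."""
    archive = shutil.make_archive(
        str(Path(backup_dir) / "backup"), "zip", root_dir=source_dir
    )
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(archive, scratch)  # the restore step
        return checksums(source_dir) == checksums(scratch)
```

Keep backup_dir outside source_dir, otherwise the archive ends up inside the thing being archived. If restore_drill returns False, you found out during a drill instead of during an outage.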

o Should we encrypt filesystems in the cloud? What are the risks?

This depends on your environment and how sensitive your data is. If you’re collecting credit card data for instance, it may be key. However some surprising blips may push other applications to encrypt as well. Bugs in the hypervisor could potentially make your data vulnerable. What’s more, if the cloud provider gets subpoenaed, your server and data may well get caught in the net. Better safe than sorry. Remember you don’t know where your data actually resides, but you do control who has access if you’re encrypted.

We wrote a very in-depth piece on Deploying on Amazon EC2 where we discuss questions such as encryption in more depth.

o Should we use offsite backups?

It’s definitely worth doing this. One more layer of insurance.

o What is load balancing? Why is it difficult with databases?

Load balancing puts a digital traffic circle into your infrastructure, giving you two roads or paths to resources. However those resources have to be exactly the same. With databases you are constantly writing to tables, and updating records. When you scale those horizontally, it becomes very hard to keep every copy consistent as the changes arrive.

Relational databases are inherently difficult to scale. Most environments scale a single authoritative master vertically, and add multiple read-only slaves horizontally to allow the application to serve more customers.
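In application code, that master plus read-only slaves layout often shows up as a small routing layer. Here is a deliberately naive Python sketch. The class name and server labels are hypothetical, and it ignores replication lag and statements like SELECT ... FOR UPDATE, which belong on the master.

```python
import itertools

class ReadWriteRouter:
    """Send plain SELECTs to replicas round-robin; everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)  # reads can be spread horizontally
        return self.master               # writes go to the single authority
```

With router = ReadWriteRouter("master", ["replica1", "replica2"]), two SELECTs in a row land on replica1 and then replica2, while an UPDATE always goes to master. That is why adding slaves helps read-heavy applications and does little for write-heavy ones.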

o Why use a package manager? Can we install from source?

Package managers simplify the installation of software components. A team such as Red Hat, Ubuntu or Debian builds a distribution, and compiles all components, storing them in a repository. Installing packages this way allows your setup to be standard across servers. This allows more automation, and makes it simpler for another admin to figure out what you have down the line, when it passes to someone else’s shoulders.

Installing from source is generally a bad idea. Although it allows you to tweak and configure each piece of software the way you want, tightly and efficiently, it also means everything is custom. No commoditization advantages.

o What is horizontal scalability?

This involves adding more hardware, more individual servers to service the same application and users.

o What is vertical scalability?

This means scaling up or growing your existing single server, so it is larger, has more memory, cpu or faster disk.

o What can go wrong with automatic failover?

Just about everything. Applications and services can stall, disks can fail, servers can hang. What’s more, networks can exhibit latency. Automatic failover is ultimately a piece of software or algorithm trying to diagnose and handle situations. And it does so based on a very small list of rules or heuristics. The real world is messy, so this can often lead to false failure detection, and potentially loss of data.
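To see how thin those rules can be, here is a sketch of one common heuristic: declare the master dead after a number of consecutive missed health checks. The class name and the default threshold of three are invented for illustration. Real failover tools layer more checks on top, but the core is often not much bigger than this.

```python
class FailureDetector:
    """Trigger failover after `threshold` consecutive failed health checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.misses = 0

    def record(self, healthy):
        """Record one health check; return True when failover should trigger."""
        self.misses = 0 if healthy else self.misses + 1
        return self.misses >= self.threshold
```

Notice what the rule cannot see. Three slow responses during a network blip look exactly like a dead server, which is how a perfectly healthy master gets failed over.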

o How do cloud vendors implement vertical scalability?

This may vary dramatically between cloud providers. Ultimately, however, since virtualization allows you to boot a disk image onto any hardware, you can snapshot your current root volume or disk and then boot it on another server, one that is larger, smaller and so forth. About the only thing you need to watch out for is 32-bit versus 64-bit compatibility.

If you haven’t already, don’t forget to check out the rest of this series – part one Operations Interview and part two Deployment Interview.

Read this far? Grab our newsletter – startup scalability.

Cloud Operations Interview

What does a cloud computing expert need to know? How do you hire a cloud computing expert? Competition for operations & DBAs is fierce, so you’ll want to know how to find the best.

If you’re a systems administrator or ops guy, you may want to prepare for an interview for such a position. Meanwhile, if you’re a director of IT or operations, a recruiter or manager in HR, you’ll want to have some idea how to find the right candidate.

Here’s my guide to do just that. You may also jump to part two Cloud Deployment Interview or the last part three Cloud DBA, Architecture and Management Interview.

1. Solid unix systems administrator

At the top of the list, a cloud operations expert needs to understand Unix and more importantly Linux. Here are some sample questions to get the conversation moving:

o What is web operations and what have you done day-to-day?

Prepare some stories.

o What’s your favorite feature of the Linux kernel?

This is an open ended question, but a systems administrator should have some knowledge here. The kernel is the most basic piece of software that runs when a computer boots up, whether it is a desktop or a server. This piece of software coordinates everything, manages resources, and directs traffic.

o Name some distributions of Linux. What is a distro?

Linux is built by a collaborative team of thousands on the internet, with the source code open for anyone to use and modify. That’s what makes it open source. A distribution includes the operating system, along with a collection of software to go along with it. All the supporting utilities, libraries and servers must be compiled and held in a repository. That’s what makes up a distribution. Debian, Red Hat and Ubuntu are a few popular ones.

A cloud operations expert needs to have a wide ranging skillset, from unix administration, architecture, scalability, database & webserver administration, troubleshooting & performance, load & stress testing. You’ll also want someone who has learned hard lessons from some failures, has some war stories to tell and has a hard nose for stability.

o What’s the difference between apache and nginx?

These two pieces of software are both webservers, that is they respond to the HTTP protocol, and can serve HTML pages. They also have a myriad of plugins to support different languages and features. The difference? Nginx (pronounced engine-X) is a newer incarnation. It’s been rearchitected from the ground up, building on all the things learned from Apache over the years. It’s tighter, more efficient code, and it’s easier to configure.

You might also enjoy our Intro to EC2 Cloud Deployments Guide.

o What is a key value store? examples?

There are lots of examples of these types of databases. A key value store is a very simple database that stores and fetches values by a unique key, and many are used as a memory cache that can interface with most applications. Memcached is a popular example of a key value store. Redis, CouchDB and Voldemort can also do this.

o What is a page cache? Reverse proxy cache? examples?

These are all the same thing. They are basically a very minimal webserver without all the plugins or bells and whistles. You put one of these in front of your webserver to handle all the easy stuff, and speed up overall throughput. Varnish is a popular example.

o What filesystem do you prefer?

This is a bit arcane, but one should have some opinions here. xfs is a popular filesystem, though ext3 and ext4 are also common. Emphasize the journaling aspect here. Journaling means that if you pull the cord or your server crashes, the filesystem can recover upon reboot. It does this by journaling changes, much like how a database keeps a redolog cache of recent changes to database tables.

o Command line tools

There are lots of commands in the day-to-day toolbox of a web ops expert. Here are some examples:
rsync (pronounced our-sync) – sync files between servers & do checksums to allow easy restarts
scp (pronounced s-c-p) – secure copy, similar to rsync but no checksums, so less reliable
curl (pronounced kurl) – diagnose & test urls and HTTP from the command line
cron (pronounced cron) – run commands at scheduled times
ssh (pronounced s-s-h) – secure shell, the most basic tool to reach a cloud server
ifconfig (pronounced if-config) – check the network interfaces on the server
vi/emacs (pronounced v-i and e-macks) – terminal editors, to modify config files
uptime (pronounced up-time) – display the current load average of the server
top (pronounced top) – interactive display of system metrics like memory, load, swap & processes
ps (pronounced p-s) – shows running processes on the server
/var/log/messages – essential system logfile

o What are application servers? How are they different from webservers?

Tomcat & Glassfish are two examples of application servers. These handle heavier weight languages & applications like Java. An application server on some level is just a more heavy-duty webserver, and these days Apache can be thought of as an application server also.

2. Cloud concepts

o What is virtualization? What is a hypervisor?

Virtualization allows you to run one or more computers within a computer. You can do virtualization on a desktop, sharing network, memory, cpu and disk resources among a number of virtual servers. But more importantly in cloud computing or IaaS offerings you can do virtualization at the datacenter level. The hypervisor layer is a datacenter virtualization technology that provisions server resources, and balances shared network and disk resources.

o What is an image?

In the Amazon world, the AMI or Amazon Machine Image is a snapshot of a server state at one moment in time. This image is taken at the block level, and includes the master boot record, the first block on disk that a server boots from. All that is the state of a server, when it is shut down, is what is stored on disk or in this image: all config files, logfiles, and anything else written to disk.

o What is multi-tenant?

This means that multiple customers’ virtual servers share the same physical resources. The tenants are the customers who each want to get the server, cpu, memory, network and disk that they paid for.

o What is the downside to shared resources?

Contention for resources is always the challenge. If your fellow tenants are not very thirsty, this can work to your advantage. But if they’re also heavy users, the hypervisor layer has to manage the balancing act. You may get a spike of disk I/O at one point, but later get a dearth. This can cause a relational database like MySQL or Oracle to suddenly look stalled.

o What is instance-store? What is EBS?

Instance store servers were Amazon’s original offering, where servers had their own local (and slow) storage. This storage is ephemeral, so all machine state is lost when the instance is stopped or terminated. These servers also boot slowly. EBS, also known as Elastic Block Store, is a virtualized network storage option, similar to NAS or NFS except that it presents block devices rather than shared files. You can create arbitrary chunks of storage, and attach them to servers, all from command line tools and APIs. Cool!

o What is virtual private cloud?

With the VPC offering, Amazon drops a router into your existing datacenter. You can then provision virtual servers to your heart’s content, and they all appear to be servers in your existing datacenter. Elastically scale, within the network and security model you’re already using.

o What is a hybrid approach to cloud adoption?

Keeping your investments in hardware and datacenter is obviously an appealing option for firms that have a large existing environment. A hybrid approach with a VPC allows you to get your feet wet, but still keep essential applications on physical servers.

o What is Amazon EC2?

Elastic Compute Cloud refers to the virtual servers you spin up in Amazon Web Services.

o What is Amazon RDS, Oracle RDS, MySQL RDS?

Amazon has various relational and non-relational database offerings. RDS stands for relational database service.

RDS or roll your own – which is better? Here are some use cases to help you decide.

o What is multi-az?

Amazon’s infrastructure offering isn’t just a single datacenter with servers. The beauty of what they’ve built is that they offer a number of datacenters (called availability zones) in each of many regions such as Northern Virginia, Oregon and Singapore.

Incidentally multi-az is a key feature to how businesses can protect themselves from failure. Amazon recently had an outage, but AirBNB, Reddit & Foursquare didn’t have to fail.

o What does a CDN do? How does it work? examples?

A CDN is a content delivery network. Remember all those files that make up a webpage? Images, video, css files? Turns out serving these components from servers *closer* to your customer makes their webpages load much faster. CDNs are networks of servers that hold the content of your pages, and serve them faster.

It works by replacing content paths with a special one from your provider. A simple change in your code will allow content to dynamically load from across the web. Cool!

CloudFront is Amazon’s offering coupled with S3 for file storage. Akamai is another big provider.
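The "replacing content paths" step can be as small as one helper in your templates. This Python sketch is only an illustration: cdn.example.com is a placeholder hostname, and the list of static file suffixes is an assumption. Your provider will give you the real hostname to use.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of static asset types worth serving from the CDN.
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".mp4")

def cdn_url(url, cdn_host="cdn.example.com"):
    """Point static assets at the CDN host; leave dynamic pages alone."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_SUFFIXES):
        return urlunsplit(
            (parts.scheme, cdn_host, parts.path, parts.query, parts.fragment)
        )
    return url
```

So cdn_url("http://www.example.com/img/logo.png") comes back as "http://cdn.example.com/img/logo.png", while a dynamic page like "http://www.example.com/checkout" is returned untouched.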

We’re not done yet. In part two on deployments and part three of this series, we’ll hit on other important skills a cloud ops expert should have including scripting, database administration (Our MySQL Interview Guide), scalability, performance, configuration management, metrics, monitoring, and some all-important war stories!

Here are some questions to pique your interest:

o Why does the API battle between Amazon & Eucalyptus (FOSS) matter?
o Do you use command line tools? why?
o What can go wrong with backups? how do we test them?
o Should we encrypt filesystems in the cloud? what are the risks?
o Should we use offsite backups?
o What is DRBD?
o Why is auditing important? access control?
o What is load balancing? why is it difficult with databases?
o How do you perform a benchmark? perform load testing?
o Why use a package manager? can we install from source?

Our Deploying MySQL on Amazon EC2 Guide is also related to this interview process.

Oracle DBA Interview Questions

Oracle Database Administrators, often called DBAs, are an indispensable part of your operations team. They manage the systems that house all your business data, your customers, products, transactions and all that analytical data on what customers are actually doing. If you’ve ever been on the hunt, you may wonder, why the shortage of DBAs? To that we’ll answer, have you ever heard of Dustin Moskovitz?

So you certainly want to entrust that to someone who knows what they’re talking about. Enter the Oracle DBA Interview, a process that some will see as a technical test, while others will see as a fit of personalities, behaviors, and work ethic.

From the technical side we thought we’d bring you a quick and dirty checklist of questions. This isn’t an exhaustive list by any means, but is a good place to start and will certainly provide you with a glimpse of their knowledge.

Also if you’re looking to hire a MySQL DBA here’s a guide, and also one for hiring an EC2 expert.

1. What is the difference between RMAN and a traditional hotbackup?

RMAN is faster, can do incremental (changes only) backups, and does not place tablespaces into hotbackup mode.

2. What are bind variables and why are they important?

With bind variables in SQL, Oracle can parse a family of related queries a single time and cache the result in the shared SQL area. This avoids a hard parse each time, which saves on various locking and latching resources we use to check objects’ existence and so on. BONUS: For rarely run queries, especially BATCH queries, we explicitly DO NOT want to use bind variables, as they hide information from the Cost Based Optimizer.

BONUS BONUS: For batch queries from 3rd party apps like PeopleSoft, if we can’t remove bind variables, we can use bind variable peeking!
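To make the bind variable idea concrete, here is a minimal Python sketch. It uses the standard library's sqlite3 module purely as a stand-in, since it accepts the same :name placeholder style. The orders table and customer names are made up, and SQLite's statement handling is not Oracle's shared SQL area, so treat this as an illustration of the statement text only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [("alice",), ("bob",), ("alice",)],
)

customers = ["alice", "bob", "carol"]

# Literals: every customer produces a different statement text to parse.
literal_statements = {
    f"SELECT count(*) FROM orders WHERE customer = '{c}'" for c in customers
}

# Bind variable: one statement text, reused with different values.
BOUND_SQL = "SELECT count(*) FROM orders WHERE customer = :customer"
counts = {
    c: conn.execute(BOUND_SQL, {"customer": c}).fetchone()[0] for c in customers
}
```

Three customers mean three distinct statements in the literal version and exactly one in the bound version. On a busy Oracle system that difference is what separates a hard parse per query from one shared cursor.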

3. In PL/SQL, what is bulk binding, and when/how would it help performance?

Oracle’s SQL and PL/SQL engines are separate parts of the kernel which require context switching, like between unix processes. This is slow, and uses up resources. If we loop on an SQL statement, we are implicitly flipping between these two engines. We can minimize this by loading our data into an array, and using a PL/SQL bulk binding operation to do it all in one go!

4. Why is SQL*Loader direct path so fast?

SQL*Loader with direct path option can load data ABOVE the high water mark of a table, and DIRECTLY into the datafiles, without going through the SQL engine at all. This avoids all the locking, latching, and so on, and doesn’t impact the db (except possibly the I/O subsystem) at all.

5. What are the tradeoffs between many vs few indexes? When would you want to have many, and when would it be better to have fewer?

Fewer indexes on a table mean faster inserts/updates. More indexes mean faster, more specific WHERE clauses possibly without index merges.

6. What is the difference between RAID 5 and RAID 10? Which is better for Oracle?

RAID 5 is striping with parity, which costs one extra disk’s worth of space spread across the set. If we lose a disk we can reconstruct its contents from the remaining data and parity.

RAID 10 is mirroring pairs of disks, and then striping across those sets.

RAID 5 was created when disks were expensive. Its purpose was to provide RAID on the cheap. If a disk fails, the IO subsystem will perform VERY slowly during the rebuild process. What’s more, your likelihood of failure increases dramatically during this period, with all the added weight of the rebuild. Even when it is operating normally RAID 5 is slow for everything but reading. Given that and knowing databases (especially Oracle’s redo logs) continue to experience write activity all the time, we should avoid RAID 5 in all but the rare database that is MOSTLY read activity. Don’t put redologs on RAID 5.

RAID 10 is just all around goodness. If you lose one disk in a set of 10 for example, you could lose any one of eight other disks and have no troubles. What’s more, rebuilding has far less impact on performance since you’re simply making a mirror copy. Lastly RAID 10 performs exceedingly well in all types of databases.

7. When using Oracle export/import what character set concerns might come up? How do you handle them?

Be sure to set NLS_LANG, for example to “AMERICAN_AMERICA.WE8ISO8859P1”. If your source database is US7ASCII, beware of 8-bit characters. Also be wary of multi-byte character sets as those may require extra attention. Also watch export/import for messages about any “character set conversions” which may occur.

8. How do you use automatic PGA memory management with Oracle 9i and above?


9. Explain two easy SQL optimizations.

a. EXISTS can be better than IN under various conditions

b. UNION ALL is faster than UNION (not sorting)
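The difference behind optimization (b) is easy to see on a toy dataset. This sketch uses Python's sqlite3 module in place of Oracle, and the east and west tables are invented for the example. The behavior it shows, UNION removing duplicates and UNION ALL keeping them, is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE east (city TEXT)")
conn.execute("CREATE TABLE west (city TEXT)")
conn.executemany("INSERT INTO east VALUES (?)", [("Boston",), ("Denver",)])
conn.executemany("INSERT INTO west VALUES (?)", [("Denver",), ("Seattle",)])

# UNION has to find and remove the duplicate Denver row.
union_rows = conn.execute(
    "SELECT city FROM east UNION SELECT city FROM west"
).fetchall()

# UNION ALL simply appends one result to the other.
union_all_rows = conn.execute(
    "SELECT city FROM east UNION ALL SELECT city FROM west"
).fetchall()
```

UNION returns three rows and UNION ALL returns four. If you already know the two sets cannot overlap, or you don't care about duplicates, UNION ALL skips the extra work.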

10. Name three SQL operations that perform a SORT.









11. What is your favorite tool for day-to-day Oracle operation?

Hopefully we hear some use of command line as the answer!

12. What is the difference between Truncate and Delete? Why is one faster?

Can we ROLLBACK both? How would a full table scan behave after?

Truncate is nearly instantaneous, cannot be rolled back, and is fast because Oracle simply resets the HWM (high water mark). Delete removes rows one at a time, generates undo so it can be rolled back, and leaves the HWM where it was. When a full table scan is performed on a table, such as for a sort operation, Oracle reads up to the HWM. So if you delete every single solitary row in a 10 million row table so it is now empty, scanning that table of 0 rows would still be extremely slow.

13. What is the difference between a materialized view (snapshot) fast refresh versus complete refresh? When is one better, and when the other?

Fast refresh maintains a change log table, which records change vectors, not unlike how the redo logs work. There is overhead to this, as with a table that has a LOT of indexes on it, so inserts and updates on the source table will be slower. However if you are performing refreshes often, like every few minutes, you want to do fast refresh so you don’t have to full-table-scan the source table. Complete refresh is good if you’re going to refresh once a day. It does a full table scan on the source table, and recreates the snapshot/mview. Also inserts/updates on the source table are NOT impacted when only complete refresh snapshots have been created on it.

14. What does the NOLOGGING option do? Why would we use it? Why would we be careful of using it?

It suppresses most logging of changes to the redologs. It does not disable ALL logging, however, as Oracle continues to record a baseline of changes, for recovery if you pull the plug on the box, for instance. However it will cause problems if you are using a standby database, since changes that never reach the redologs never reach the standby. Use it to speed up operations, like an index rebuild, or partition maintenance operations.

15. Tell me about standby database? What are some of the configurations of it? What should we watch out for?

Standby databases allow us to create a copy of our production db, for disaster recovery. To fail over, we merely switch mode on the target db, and bring it up as read/write. It can be set up as master->slave or master->master. The latter allows the former prod db to become the standby, once the failure cause is remedied. Watch out for NOLOGGING!! Be sure we’re in archivelog mode.

Hey you! If you made it this far, definitely grab our newsletter.

MySQL DBA Interview Questions

One of the more popular articles on our site according to Google is the Oracle DBA Interview Questions article we did a few years ago. So with that in mind, we’ve put together a similar article for MySQL DBA Interviews.

1. Explain two ways that MySQL Replication can get out of sync. What are the solutions to these problems?

One way is if your code contains non-deterministic functions such as SYSDATE(), USER() and UUID(). The second way is if you mix InnoDB and MyISAM tables within a transaction; in some cases those changes can get replicated incorrectly.

2. How does one create a new user and give it privileges on an existing database? Why should “with grant option” be avoided?

mysql> grant all privileges on test.* to 'newuser'@'localhost' identified by 'mypassword';

3. Explain the differences, advantages and disadvantages to using the MERGE storage engine, versus using Partitioned tables to manage large datasets.

The MERGE storage engine behaves much like a view with UNION ALL between the tables. It is easy to add and remove tables and redefine the MERGE table. They are good for logging and huge datasets.

One of the primary differences with partitioning is that a row can exist in one and only one partition, which is not the case with a MERGE table. Also you can cluster data together in certain partitions, reducing the amount of work the server may have to do to get at related data. Also partitioned data can be distributed across multiple hard drives.

4. How do you determine what storage engines are installed?

mysql> show global variables like 'have%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| have_archive          | NO       |
| have_bdb              | YES      |
| have_blackhole_engine | NO       |
| have_compress         | YES      |
| have_crypt            | YES      |
| have_csv              | NO       |
| have_dynamic_loading  | YES      |
| have_example_engine   | NO       |
| have_federated_engine | NO       |
| have_geometry         | YES      |
| have_innodb           | YES      |
| have_isam             | NO       |
| have_merge_engine     | YES      |
| have_ndbcluster       | NO       |
| have_openssl          | DISABLED |
| have_query_cache      | YES      |
| have_raid             | NO       |
| have_rtree_keys       | YES      |
| have_symlink          | YES      |
+-----------------------+----------+
19 rows in set (0.00 sec)

5. How do you get the query cache status? How do you tune it?

mysql> show global status like 'qcache%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.16 sec)

Tune it by looking at the hit ratio, Qcache_hits / (Qcache_hits + Com_select). A low ratio suggests the cache is not earning its keep.
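That ratio is simple enough to compute from the status output. A minimal Python sketch follows, assuming you have already loaded the status counters into a dictionary. The function name is made up, and note that the select counter in the formula is exposed by MySQL under the name Com_select.

```python
def query_cache_hit_ratio(status):
    """Return Qcache_hits / (Qcache_hits + Com_select), or 0.0 with no traffic."""
    hits = int(status["Qcache_hits"])      # SELECTs answered from the cache
    selects = int(status["Com_select"])    # SELECTs that had to be executed
    total = hits + selects
    return hits / total if total else 0.0
```

For example, 750 hits against 250 executed selects gives a ratio of 0.75. The all-zero output shown above gives 0.0, which usually means the cache is switched off or the server has only just started.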

6. What is DRBD?  Explain the advantages and disadvantages to MySQL Replication for High Availability.

7. What is circular replication?  How is it different from master-slave replication?