Tag Archives: unix

Top questions to ask a devops expert when hiring or preparing for job & interview

xkcd_goodcode
Strip by Randall Munroe; xkcd.com

Whether your a hiring manager, head of HR or recruiter, you are probably looking for a devops expert. These days good ones are not easy to find. The spectrum of tools & technologies is broad. To manage today’s cloud you need a generalist.

Join 33,000 others and follow Sean Hull on twitter @hullsean.

If you’re a devops expert and looking for a job, these are also some essential questions you should have in your pocket. Be able to elaborate on these high level concepts as they’re crucial in todays agile startups.

Check out: 8 questions to ask an aws ec2 expert

Also new: Top questions to ask on a devops expert interview

And: How to hire a developer that doesn’t suck

1. How do you automate deployments?

A. Get your code in version control (git)

Believe it or not there are small 1 person teams that haven’t done this. But even with those, there’s real benefit. Get on it!

B. Evolve to one script push-button deploy (script)

If deploying new code involves a lot of manual steps, move file here, set config there, set variable, setup S3 bucket, etc, then start scripting. That midnight deploy process should be one master script which includes all the logic.

It’s a process to get there, but keep the goal in sight.

C. Build confidence over many iterations (team process & agile)

As you continue to deploy manually with a master script, you’ll iron out more details, contingencies, and problems. Over time You’ll gain confidence that the script does the job.

D. Employ continuous integration Tools to formalize process (CircleCI, Jenkins)

Now that you’ve formalized your deploy in code, putting these CI tools to use becomes easier. Because they’re custom built for you at this stage!

E. 10 deploys per day (long term goal)

Your longer term goal is 10 deploys a day. After you’ve automated tests, team confidence will grow around developers being able to deploy to production. On smaller teams of 1-5 people this may still be only 10 deploys per week, but still a useful benchmark.

Also: Top serverless interview questions for hiring aws lambda experts

2. What is microservices?

Microservices is about two-pizza teams. Small enough that there’s little beaurocracy. Able to be agile, focus on one business function. Iterate quickly without logjams with other business teams & functions.

Microservices interact with each other through APIs, deploy their own components, and use their own isolated data stores.

Function as a service, Amazon Lambda, or serverless computing enables microservices in a huge way.

Related: Which engineering roles are in greatest demand?

3. What is serverless computing?

Serverless computing is a model where servers & infrastructure do not need to be formalized. Only the code is deployed, and the platform, AWS Lambda for example, takes care of instant provisioning of containers & VMs when the code gets called.

Events within the cloud environment, such a file added to S3 bucket, trigger the serverless functions. API Gateway endpoints can also trigger the functions to run.

Authentication services are used for user login & identity management such as Auth0 or Amazon Cognito. The backend data store could be Dynamodb or Google’s Firebase for example.

Read: Can on-demand consulting save startups time & money?

4. What is containerization?

Containers are like faster deploying VMs. They have all the advantages of an image or snapshot of a server. Why is this useful? Because you can containerize your microservices, so each one does one thing. One has a webserver, with specific version of xyz.

Containers can also help with legacy applications, as you isolate older versions & dependencies that those applications still rely on.

Containers enable developers to setup environments quickly, and be more agile.

Also: 30 questions to ask a serverless fanboy

5. What is CloudFormation?

CloudFormation, formalizes all of your cloud infrastructure into json files. Want to add an IAM user, S3 bucket, rds database, or EC2 server? Want to configure a VPC, subnet or access control list? All these things can be formalized into cloudformation files.

Once you’ve started down this road, you can checkin your infrastructure definitions into version control, and manage them just like you manage all your other code. Want to do unit tests? Have at it. Now you can test & deploy with more confidence.

Terraform is an extension of CloudFormation with even more power built in.

Also: What can startups learn from the DYN DNS outage?

Get more. Grab our exclusive monthly Scalable Startups. We share tips and special content. Our latest Why I don’t work with recruiters

Configuration Management – What is it and why is it important?

Every software service or component on a server requires configurations. In your desktop applications you set preferences for what your default page will be, how you’d like your margins set, or whether to save and restore cookies each time you restart.

Enterprise applications also require complex configuration settings.  Want to monitor a webserver and a database with Nagios, that’s set in the config file.  What to start MySQL with 8G of memory for InnoDB, that’s also set in a config file.  What’s more config files contain server specific settings, based on IP address, or the servers role, webserver or database for example.   The webserver may also have memcache and outbound email services running.

With more traditional deployments, the systems administrator will setup each physical box, and configure those services based on the business needs.  As you bring online 10’s or 100’s of servers, however, you can quickly see how labor intensive this process would be, and also how much redundancy there is.

Enter configuration management into the picture.  Previously I blogged about tools like Puppet that can bring great new best practices to the table. There is also cfengine, and the newer Chef which incorporates cloud deployments as well into the mix.  Configuration management allows you to remotely administer servers, install packages, manage dependencies, install configurations based on a central copy, and even define roles and templates for new servers.  This brings a whole new level of professionalism to deployments, and also newfound power and flexibility.

We’ll be writing more about configuration management, especially in the context of cloud deployments such as Amazon EC2 so please stay tuned.

Sean Hull asks on Quora – What is configuration management and why is it important?

Oracle DBA Interview Questions

Oracle Database Administrator or often called DBAs are an indispensable part of your operations team. They manage the systems that house all your business data, your customers, products, transactions and all that analytical data on what customers are actually doing. If you’ve ever been on the hunt, you may wonder, why the shortage of DBAs? To that we’ll answer, have you ever heard of Dustin Moskovitz?

So you certainly want to entrust that to someone who knows what they’re talking about. Enter the Oracle DBA Interview, a process that some will see as a technical test, while others will see as a fit of personalities, behaviors, and work ethic.

From the technical side we thought we’d bring you a quick and dirty checklist of questions. This isn’t an exhaustive list by any means, but is a good place to start and will certainly provide you with a glimpse of their knowledge.

Also if you’re looking to hire a MySQL DBA here’s a guide, and also one for hiring and EC2 expert.

1. What is the difference between RMAN and a traditional hotbackup?

RMAN is faster, can do incremental (changes only) backups, and does not place tablespaces into hotbackup mode.

2. What are bind variables and why are they important?

With bind variables in SQL, Oracle can cache related queries a single time in the SQL cache (area). This avoids a hard parse each time, which saves on various locking and latching resources we use to check objects existence and so on. BONUS: For rarely run queries, especially BATCH queries, we explicitely DO NOT want to use bind variables, as they hide information from the Cost Based Opitmizer.

BONUS BONUS: For batch queries from 3rd party apps like peoplesoft, if we can’t remove bind variables, we can use bind variable peeking!

3. In PL/SQL, what is bulk binding, and when/how would it help performance?

Oracle’s SQL and PL/SQL engines are separate parts of the kernel which require context switching, like between unix processes. This is slow, and uses up resources. If we loop on an SQL statement, we are implicitely flipping between these two engines. We can minimize this by loading our data into an array, and using PL/SQL bulk binding operation to do it all in one go!

4. Why is SQL*Loader direct path so fast?

SQL*Loader with direct path option can load data ABOVE the high water mark of a table, and DIRECTLY into the datafiles, without going through the SQL engine at all. This avoids all the locking, latching, and so on, and doesn’t impact the db (except possibly the I/O subsystem) at all.

5. What are the tradeoffs between many vs few indexes? When would you want to have many, and when would it be better to have fewer?

Fewer indexes on a table mean faster inserts/updates. More indexes mean faster, more specific WHERE clauses possibly without index merges.

6. What is the difference between RAID 5 and RAID 10? Which is better for Oracle?

RAID 5 is striping with an extra disk for parity. If we lose a disk we can reconstruct from that parity disk.

RAID 10 is mirroring pairs of disks, and then striping across those sets.

RAID 5 was created when disks were expensive. Its purpose was to provide RAID on the cheap. If a disk fails, the IO subsystem will perform VERY slowly during the rebuild process. What’s more your liklihood of failure increases dramatically during this period, with all the added weight of the rebuild. Even when it is operating normally RAID 5 is slow for everything but reading. Given that and knowing databases (especially Oracle’s redo logs) continue to experience write activity all the time, we should avoid RAID5 in all but the rare database that is MOSTLY read activity. Don’t put redologs on RAID5.

RAID10 is just all around goodness. If you lose one disk in a set of 10 for example, you could lose any one of eight other disks and have no troubles. What’s more rebuilding does not impact performance at all since you’re simply making a mirror copy. Lastly RAID10 perform exceedingly well in all types of databases.

7. When using Oracle export/import what character set concerns might come up? How do you handle them?

Be sure to set NLS_LANG for example to “AMERCIAN_AMERICA.WE8ISO8859P1”. If your source database is US7ASCII, beware of 8-bit characters. Also be wary of multi-byte characters sets as those may require extra attention. Also watch export/import for messages about any “character set conversions” which may occur.

8. How do you use automatic PGA memory management with Oracle 9i and above?

Set the WORKAREA_SIZE_POLICY parameter to AUTO and set PGA_AGGREGATE_TARGET

9. Explain two easy SQL optimizations.

a. EXISTS can be better than IN under various conditions

b. UNION ALL is faster than UNION (not sorting)

10. Name three SQL operations that perform a SORT.

a. CREATE INDEX

b. DISTINCT

c. GROUP BY

d. ORDER BY

f. INTERSECT

g. MINUS

h. UNION

i. UNINDEXED TABLE JOIN

11. What is your favorite tool for day-to-day Oracle operation?

Hopefully we hear some use of command line as the answer!

12. What is the difference between Truncate and Delete? Why is one faster?

Can we ROLLBACK both? How would a full table scan behave after?

Truncate is nearly instantaenous, cannot be rolled back, and is fast because Oracle simply resets the HWM. When a full table scan is performed on a table, such as for a sort operation, Oracle reads to the HWM. So if you delete every single solitary row in 10 million row table so it is now empty, sorting on that table of 0 rows would still be extremely slow.

13. What is the difference between a materialized view (snapshot) fast refresh versus complete refresh? When is one better, and when the other?

Fast refresh maintains a change log table, which records change vectors, not unlike how the redo logs work. There is overhead to this, as with a table that has a LOT of indexes on it, and inserts and updates will be slower. However if you are performing refreshes often, like every few minutes, you want to do fast refresh so you don’t have to full-table-scan the source table. Complete refresh is good if you’re going to refresh once a day. Does a full table scan on the source table, and recreats the snapshot/mview. Also inserts/updates on the source table are NOT impacted on tables where complete refresh snapshots have been created.

14. What does the NO LOGGING option do? Why would we use it? Why would we be careful of using it?

It disables the logging of changes to the redologs. It does not disable ALL LOGGING, however as Oracle continues to use a base of changes, for recovery if you pull the plug on the box, for instance. However it will cause problems if you are using standby database. Use it to speed up operations, like an index rebuild, or partition maintenance operations.

15. Tell me about standby database? What are some of the configurations of it? What should we watch out for?

Standby databases allow us to create a copy of our production db, for disaster recovery. We merely switch mode on the target db, and bring it up as read/write. Can setup as master->slave or master->master. The latter allows the former prod db to become the standby, once the failure cause is remedied. Watch out for NO LOGGING!! Be sure we’re in archivelog mode.

Hey you! If you made it this far, definitely grab our newsletter.

MySQL DBA Interview Questions

One of the more popular articles on our site according to Google is the Oracle DBA Interview Questions article we did a few years ago. So with that in mind, we’ve put together a similar article for MySQL DBA Interviews.

1. Explain two ways that MySQL Replication can get out of sync. What are the solutions to these problems?

One way is if your code contains non-deterministic functions such as SYSDATE, USER and UUID. The second way is if you have mixed transactions between InnoDB and MyISAM tables, in some cases those can get replicated incorrectly.

2. How does one create a new user and give it privileges on an existing database? Why should “with grant option” be avoided?

mysql> grant all privileges on test.* to ‘newuser’@’localhost’ identified by ‘mypassword’;

3. Explain the differences, advantages and disadvantages to using the MERGE storage engine, versus using Partitioned tables to manage large datasets.

The MERGE storage behaves much like a view with UNION ALL between the tables. It is easy to add and remove tables and redefine the MERGE table. They are good for logging and huge datasets.

One of the primary differences between partitioning is that a row can exist in one and only one partition, which is not the case with a MERGE table. Also you can cluster data together in certain partitions reducing the amount of work the server may have to do to get at related data. Also partitioned data can be distributed across multiple harddrives.

4. How do you determine what storage engines are installed?

mysql> show global variables like 'have%';                                       
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| have_archive          | NO       |
| have_bdb              | YES      |
| have_blackhole_engine | NO       |
| have_compress         | YES      |
| have_crypt            | YES      |
| have_csv              | NO       |
| have_dynamic_loading  | YES      |
| have_example_engine   | NO       |
| have_federated_engine | NO       |
| have_geometry         | YES      |
| have_innodb           | YES      |
| have_isam             | NO       |
| have_merge_engine     | YES      |
| have_ndbcluster       | NO       |
| have_openssl          | DISABLED |
| have_query_cache      | YES      |
| have_raid             | NO       |
| have_rtree_keys       | YES      |
| have_symlink          | YES      |
+-----------------------+----------+
19 rows in set (0.00 sec)
mysql>

5. How do you get the query cache status? How do you tune it?

mysql> show global status like 'qcache%';                                        
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.16 sec)
mysql>

Tune it by looking at qcache_hits/(qcache_hits+com_selects)

6. What is DRBD?  Explain the advantages and disadvantages to MySQL Replication for High Availability.

7. What is circular replication?  How is it different from master-slave replication?