Top 3 MySQL Interview Questions

Top 3 MySQL Interview Questions

Introduction

Over the past 18 months, I’ve applied to several IT companies for Senior Backend Developer (Laravel) positions and decided to compile the top three MySQL interview questions that came up during these interviews.

Database Indexes

Questions about indexes are the most common. Simply put, an index on a column (or a set of columns) is a data structure that helps speed up queries involving those columns. For example, if I have a table that stores user data with an age column, creating an index on that column allows queries like the one below to run more efficiently:

If you don’t have an index on that column and you have a large number of records, the database engine will have to load the entire table into the RAM in portions in order to find all the records satisfying the MySQL query. Instead, if you have an index that has the values of the age column and the appropriate references to the table records, the engine just runs the query on the index, collects the references, and then fetches the records from the disk.

Indexes can be either dense or sparse, meaning whether the index includes every column value or only certain ones. Typically, indexes are stored in sorted order, and common data structures for indexes include hash tables and B-trees. For more information, I recommend reading Database Systems: The Complete Book by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom.

Usually afterwards interviewers ask about the type of an index you would choose for a particular column. The classic example of that would be to ask about the following query:

Since the WHERE clause includes a range of values, then it makes sense to use a B-tree index as it’s the most efficient when it comes to ranges. On the other hand, if the clause consists of an exact match, then a hash table is the way to go.

Transactions and Deadlocks

A transaction is a sequence of SQL queries that must either all succeed or none at all - if one query fails, the database is rolled back to its initial state. Here's a typical transaction:

Note: Not all SQL-like systems support transactions, e.g. MySQL with MyISAM storage engine.

A deadlock occurs when two or more queries are waiting for each other to release locks on overlapping data. A common interview exercise is to demonstrate how a deadlock could occur with overlapping transactions, like so:

Transaction A starts by locking account 1 and updating its balance. It then tries to update account 2, but since Transaction B has already locked account 2, it has to wait. Similarly, Transaction B starts by locking account 2 and updating the balance. It then tries to update account 1, but it's locked by Transaction A, so it also has to wait.

Optimized Queries

This can be a range of questions aimed at assessing your knowledge about joins, sorting, and aggregation. An interesting one is about finding the most recent report by date. Let’s say we have a web application that allows users to add URLs and track the number of clicks each day. Now, we need to show the users the number of clicks their URLs have received today and if there aren’t any, show the number of clicks for the latest available date (can be yesterday, 2 days ago, etc). So we have the following tables:

The straightforward solution to that is to find the latest available date per site that is sooner than today and calculate the number of clicks (aka the number of rows in site_clicks):

The problem with that query is that it uses a correlated subquery, which means that the latest date calculation is run per each site_clicks row, therefore can be inefficient if the table contains many rows. If the outer query yields m rows and site_clicks contains n rows, the query’s complexity is O(m * n). The optimized solution would be to assemble a set of rows containing site_id and latest_date and then join it with site_clicks:

which reduces the complexity to O(m + n).

References