Skip to content

Instantly share code, notes, and snippets.

@amritms
Forked from reinink/query.sql
Created June 4, 2020 17:11
Show Gist options
  • Select an option

  • Save amritms/0e21037968aa29f063c18a10d2fd8f56 to your computer and use it in GitHub Desktop.

Select an option

Save amritms/0e21037968aa29f063c18a10d2fd8f56 to your computer and use it in GitHub Desktop.
Text search across multiple tables using MySQL
select
first_name,
last_name
from
users
left join
companies on companies.id = users.company_id
where (
companies.name like 'TERM%' or
first_name like 'TERM%' or
last_name like 'TERM%'
)
Companies
+------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------------+------+-----+---------+----------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | MUL | NULL | |
+------------+-----------------+------+-----+---------+----------------+
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| companies | 0 | PRIMARY | 1 | id | A | 10106 | NULL | NULL | | BTREE | | | YES | NULL |
| companies | 1 | companies_name_index | 1 | name | A | 8624 | NULL | NULL | | BTREE | | | YES | NULL |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Users
+------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------------+------+-----+---------+----------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| company_id | bigint unsigned | NO | MUL | NULL | |
| first_name | varchar(255) | NO | MUL | NULL | |
| last_name | varchar(255) | NO | MUL | NULL | |
+------------+-----------------+------+-----+---------+----------------+
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| users | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| users | 1 | users_company_id_foreign | 1 | company_id | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| users | 1 | users_first_name_index | 1 | first_name | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| users | 1 | users_last_name_index | 1 | last_name | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
@amritms
Copy link
Author

amritms commented Jun 4, 2020

https://gist.github.com/reinink/28bd174087e929ffa1f150e3fe8ffbfa#gistcomment-3295902

So, just an update here, for anyone who is interested. πŸ˜‚

I managed to get this query working in Laravel, as a SINGLE query, using the union approach, where it supports searching multiple queries. 🎊

The following is my final User model scope.

Brace yourselves. 😬

public function scopeSearch($query, string $term = null)
{
    collect(str_getcsv($term, ' ', '"'))->filter()->each(function ($term) use ($query) {
        $term = $term.'%';
        $query->whereIn('id', function ($query) use ($term) {
            $query->select('id')
                ->from(function ($query) use ($term) {
                    $query->select('users.id')
                        ->from('users')
                        ->where('users.first_name', 'like', $term)
                        ->orWhere('users.last_name', 'like', $term)
                        ->union(
                            $query->newQuery()
                                ->select('users.id')
                                ->from('users')
                                ->join('companies', 'users.company_id', '=', 'companies.id')
                                ->where('companies.name', 'like', $term)
                        );
                }, 'matches');
        });
    });
}

And the generated query (for the search "bill gates microsoft"):

SELECT
    *
FROM
    `users`
WHERE
    `id` in(
        SELECT
            `id` FROM ((
                SELECT
                    `users`.`id` FROM `users`
                WHERE
                    `users`.`first_name` LIKE 'bill%'
                    OR `users`.`last_name` LIKE 'bill%')
            UNION (
                SELECT
                    `users`.`id` FROM `users`
                    INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
                WHERE
                    `companies`.`name` LIKE 'bill%')) AS `matches`)
    AND `id` in(
        SELECT
            `id` FROM ((
                SELECT
                    `users`.`id` FROM `users`
                WHERE
                    `users`.`first_name` LIKE 'gates%'
                    OR `users`.`last_name` LIKE 'gates%')
            UNION (
                SELECT
                    `users`.`id` FROM `users`
                    INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
                WHERE
                    `companies`.`name` LIKE 'gates%')) AS `matches`)
    AND `id` in(
        SELECT
            `id` FROM ((
                SELECT
                    `users`.`id` FROM `users`
                WHERE
                    `users`.`first_name` LIKE 'microsoft%'
                    OR `users`.`last_name` LIKE 'microsoft%')
            UNION (
                SELECT
                    `users`.`id` FROM `users`
                    INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
                WHERE
                    `companies`.`name` LIKE 'microsoft%')) AS `matches`)
ORDER BY
    `last_name` ASC, `first_name` ASC
LIMIT 15 OFFSET 0

Overall, the final results are amazing.

Running it for three terms (ie. "bill gates microsoft"), against 1 million users, and 100,000 companies results in query times between 3-6ms.

The trick? Err..tricks?

First, using a union (as folks have suggested) allows the query builder to run each query (the users match and companies match) independently of each other, and therefore use all the available indexes. This final query now successfully uses the users_first_name_index, users_last_name_index and companies_name_index.

Second, and this was the piece I was missing before, you must run this union as a derived table, not a normal (correlated/dependent) subquery.

Without the derived table, all of subqueries are "dependent":

image

Adding the derived table makes all the problems go away:

image

I would have thought that the sub queries would only be dependent if you actually had some type of dependency between the two. For example, inner.id = outer.id, or something like that. But I guess the MySQL query planner doesn't look at the conditions to see if these two queries are actually dependent.

The only outstanding issue I have still is running a 1-2 character search. For example, searching for "b", or even "bi" is rather slow, taking upwards of 800ms on a million rows. The previous approaches that don't use the indexes are faster in those cases. I suspect that this is because the derived tables end up becoming massive, and that just takes computation time. One simple solution is to just not perform the search until you have at least 3 characters, which seems to be the sweet spot.

And, now for a quick sales pitch. πŸ˜‚

I plan to cover all this in detail in my upcoming Eloquent Performance Patterns course. If that sounds interesting to you, be sure to join my mailing list on that website. 🀟

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment