Create index for multiple WHERE EXISTS subqueriesHigh Disk IO, How to mitigate?Identical query, tables, but...
Copy the content of an environment
Sing Baby Shark
Why does calling Python's 'magic method' not do type conversion like it would for the corresponding operator?
Why do most space probes survive for far longer than they were designed for?
Ethernet cable only works in certain positions
Is layered encryption more secure than long passwords?
Does limiting the number of sources help simplify the game for a new DM with new and experienced players?
Pictures from Mars
How bad is a Computer Science course that doesn't teach Design Patterns?
Why don't hotels offer (at least) 1 kitchen bookable by any guest?
Why does finding small effects in large studies indicate publication bias?
How to make clear what a part-humanoid character looks like when they're quite common in their world?
Why do climate experts from the UN/IPCC rarely mention Grand Solar Minimum?
Does a star need to be inside a galaxy?
Badly designed reimbursement form. What does that say about the company?
Squeeze theorem laws
Why would you use 2 alternate layout buttons instead of 1, when only one can be selected at once
Manager has noticed coworker's excessive breaks. Should I warn him?
Does a code with length 6, size 32 and distance 2 exist?
How should I ship cards?
Why is it a problem for Freddie if the guys from Munich did what he wanted?
Why does the current not skip resistors R3 and R5 when R6 and R4 have no resistance?
Someone wants me to use my credit card at a card-only gas/petrol pump in return for cash
Why did Shae (falsely) implicate Sansa?
Create index for multiple WHERE EXISTS subqueries
High Disk IO, How to mitigate?Identical query, tables, but different EXPLAIN and performanceWhy does IN (subquery) perform bad when = (subquery) is blazing fast?Deciding which MySQL execution plan is betterOptimizing slow queryOptimizing a simple query on a large tableSlow SELECT examining whole tableselect MAX() from MySQL view (2x INNER JOIN) is slowJoining mysql tablesPerformance of mysql equi-join observed in HDD and SSD
I have to find an efficient way to write a very specific query for an application that I'm developing.
The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".
So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".
I've got these tables: accessions
(plant lines), markers
(the genetic markers), genotypes
(the genotype; one per marker, accession and dataset), datasets
(to subdivide genotype values into datasets) and taxonomies
(the plant species).
I've come up with this query:
SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
NOTE: If you're wondering about the SELECT 1 AS
bits, this is only because I also run the query for 0
and NOT EXISTS
. So this is only one example query.
It returns the correct result, but is quite slow on a genotypes
table with roughtly 2 million rows.
I've had a look at EXPLAIN
of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a
.
However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.
Does anyone have any suggestions as to how to improve performance?
EDIT:
CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1
CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1
CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,
) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1
CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
mysql index query-performance subquery
bumped to the homepage by Community♦ 1 min ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
|
show 2 more comments
I have to find an efficient way to write a very specific query for an application that I'm developing.
The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".
So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".
I've got these tables: accessions
(plant lines), markers
(the genetic markers), genotypes
(the genotype; one per marker, accession and dataset), datasets
(to subdivide genotype values into datasets) and taxonomies
(the plant species).
I've come up with this query:
SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
NOTE: If you're wondering about the SELECT 1 AS
bits, this is only because I also run the query for 0
and NOT EXISTS
. So this is only one example query.
It returns the correct result, but is quite slow on a genotypes
table with roughtly 2 million rows.
I've had a look at EXPLAIN
of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a
.
However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.
Does anyone have any suggestions as to how to improve performance?
EDIT:
CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1
CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1
CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,
) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1
CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
mysql index query-performance subquery
bumped to the homepage by Community♦ 1 min ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Show us the execution plan and theSHOW CREATE TABLE table_name;
output for all the 4 tables.
– ypercubeᵀᴹ
Jun 8 '16 at 9:16
@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?
– Baz
Jun 8 '16 at 9:29
The EXPLAIN output of the query.
– ypercubeᵀᴹ
Jun 8 '16 at 9:30
@ypercubeᵀᴹ Ok, I've added that as well.
– Baz
Jun 8 '16 at 10:13
2
I'd try an index on(a, marker_id, accession_id)
and on(a, accession_id, marker_id)
. And since thata
is a varchar, use string literals, not numbers:where a = '1'
. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).
– ypercubeᵀᴹ
Jun 8 '16 at 10:31
|
show 2 more comments
I have to find an efficient way to write a very specific query for an application that I'm developing.
The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".
So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".
I've got these tables: accessions
(plant lines), markers
(the genetic markers), genotypes
(the genotype; one per marker, accession and dataset), datasets
(to subdivide genotype values into datasets) and taxonomies
(the plant species).
I've come up with this query:
SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
NOTE: If you're wondering about the SELECT 1 AS
bits, this is only because I also run the query for 0
and NOT EXISTS
. So this is only one example query.
It returns the correct result, but is quite slow on a genotypes
table with roughtly 2 million rows.
I've had a look at EXPLAIN
of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a
.
However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.
Does anyone have any suggestions as to how to improve performance?
EDIT:
CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1
CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1
CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,
) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1
CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
mysql index query-performance subquery
I have to find an efficient way to write a very specific query for an application that I'm developing.
The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".
So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".
I've got these tables: accessions
(plant lines), markers
(the genetic markers), genotypes
(the genotype; one per marker, accession and dataset), datasets
(to subdivide genotype values into datasets) and taxonomies
(the plant species).
I've come up with this query:
SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
NOTE: If you're wondering about the SELECT 1 AS
bits, this is only because I also run the query for 0
and NOT EXISTS
. So this is only one example query.
It returns the correct result, but is quite slow on a genotypes
table with roughtly 2 million rows.
I've had a look at EXPLAIN
of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a
.
However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.
Does anyone have any suggestions as to how to improve performance?
EDIT:
CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1
CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1
CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,
) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1
CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
mysql index query-performance subquery
mysql index query-performance subquery
edited Jun 8 '16 at 10:13
Baz
asked Jun 8 '16 at 9:00
BazBaz
105119
105119
bumped to the homepage by Community♦ 1 min ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 1 min ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Show us the execution plan and theSHOW CREATE TABLE table_name;
output for all the 4 tables.
– ypercubeᵀᴹ
Jun 8 '16 at 9:16
@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?
– Baz
Jun 8 '16 at 9:29
The EXPLAIN output of the query.
– ypercubeᵀᴹ
Jun 8 '16 at 9:30
@ypercubeᵀᴹ Ok, I've added that as well.
– Baz
Jun 8 '16 at 10:13
2
I'd try an index on(a, marker_id, accession_id)
and on(a, accession_id, marker_id)
. And since thata
is a varchar, use string literals, not numbers:where a = '1'
. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).
– ypercubeᵀᴹ
Jun 8 '16 at 10:31
|
show 2 more comments
Show us the execution plan and theSHOW CREATE TABLE table_name;
output for all the 4 tables.
– ypercubeᵀᴹ
Jun 8 '16 at 9:16
@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?
– Baz
Jun 8 '16 at 9:29
The EXPLAIN output of the query.
– ypercubeᵀᴹ
Jun 8 '16 at 9:30
@ypercubeᵀᴹ Ok, I've added that as well.
– Baz
Jun 8 '16 at 10:13
2
I'd try an index on(a, marker_id, accession_id)
and on(a, accession_id, marker_id)
. And since thata
is a varchar, use string literals, not numbers:where a = '1'
. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).
– ypercubeᵀᴹ
Jun 8 '16 at 10:31
Show us the execution plan and the
SHOW CREATE TABLE table_name;
output for all the 4 tables.– ypercubeᵀᴹ
Jun 8 '16 at 9:16
Show us the execution plan and the
SHOW CREATE TABLE table_name;
output for all the 4 tables.– ypercubeᵀᴹ
Jun 8 '16 at 9:16
@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?
– Baz
Jun 8 '16 at 9:29
@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?
– Baz
Jun 8 '16 at 9:29
The EXPLAIN output of the query.
– ypercubeᵀᴹ
Jun 8 '16 at 9:30
The EXPLAIN output of the query.
– ypercubeᵀᴹ
Jun 8 '16 at 9:30
@ypercubeᵀᴹ Ok, I've added that as well.
– Baz
Jun 8 '16 at 10:13
@ypercubeᵀᴹ Ok, I've added that as well.
– Baz
Jun 8 '16 at 10:13
2
2
I'd try an index on
(a, marker_id, accession_id)
and on (a, accession_id, marker_id)
. And since that a
is a varchar, use string literals, not numbers: where a = '1'
. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).– ypercubeᵀᴹ
Jun 8 '16 at 10:31
I'd try an index on
(a, marker_id, accession_id)
and on (a, accession_id, marker_id)
. And since that a
is a varchar, use string literals, not numbers: where a = '1'
. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).– ypercubeᵀᴹ
Jun 8 '16 at 10:31
|
show 2 more comments
1 Answer
1
active
oldest
votes
g.dataset_id = 3 is killing the left
try this
FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3
The where is killing the left joins in the exists so you can drop that
This may give you better performance
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
ypercube is correct you cannot combine all 3
This would mean that had to be equal on the same accession.taxonomy_id
SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
g.dataset_id = 3 is killing the left
try this
FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3
The where is killing the left joins in the exists so you can drop that
This may give you better performance
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
ypercube is correct you cannot combine all 3
This would mean that had to be equal on the same accession.taxonomy_id
SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
add a comment |
g.dataset_id = 3 is killing the left
try this
FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3
The where is killing the left joins in the exists so you can drop that
This may give you better performance
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
ypercube is correct you cannot combine all 3
This would mean that had to be equal on the same accession.taxonomy_id
SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
add a comment |
g.dataset_id = 3 is killing the left
try this
FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3
The where is killing the left joins in the exists so you can drop that
This may give you better performance
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
ypercube is correct you cannot combine all 3
This would mean that had to be equal on the same accession.taxonomy_id
SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"
g.dataset_id = 3 is killing the left
try this
FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3
The where is killing the left joins in the exists so you can drop that
This may give you better performance
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)
ypercube is correct you cannot combine all 3
This would mean that had to be equal on the same accession.taxonomy_id
SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"
edited Jun 8 '16 at 12:27
answered Jun 8 '16 at 10:38
paparazzopaparazzo
4,6141230
4,6141230
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
add a comment |
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ what is different?
– paparazzo
Jun 8 '16 at 10:41
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
@ypercubeᵀᴹ I will take another look at it
– paparazzo
Jun 8 '16 at 11:01
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Show us the execution plan and the
SHOW CREATE TABLE table_name;
output for all the 4 tables.– ypercubeᵀᴹ
Jun 8 '16 at 9:16
@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?
– Baz
Jun 8 '16 at 9:29
The EXPLAIN output of the query.
– ypercubeᵀᴹ
Jun 8 '16 at 9:30
@ypercubeᵀᴹ Ok, I've added that as well.
– Baz
Jun 8 '16 at 10:13
2
I'd try an index on
(a, marker_id, accession_id)
and on(a, accession_id, marker_id)
. And since thata
is a varchar, use string literals, not numbers:where a = '1'
. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).– ypercubeᵀᴹ
Jun 8 '16 at 10:31