Create index for multiple WHERE EXISTS subqueriesHigh Disk IO, How to mitigate?Identical query, tables, but...

Copy the content of an environment

Sing Baby Shark

Why does calling Python's 'magic method' not do type conversion like it would for the corresponding operator?

Why do most space probes survive for far longer than they were designed for?

Ethernet cable only works in certain positions

Is layered encryption more secure than long passwords?

Does limiting the number of sources help simplify the game for a new DM with new and experienced players?

Pictures from Mars

How bad is a Computer Science course that doesn't teach Design Patterns?

Why don't hotels offer (at least) 1 kitchen bookable by any guest?

Why does finding small effects in large studies indicate publication bias?

How to make clear what a part-humanoid character looks like when they're quite common in their world?

Why do climate experts from the UN/IPCC rarely mention Grand Solar Minimum?

Does a star need to be inside a galaxy?

Badly designed reimbursement form. What does that say about the company?

Squeeze theorem laws

Why would you use 2 alternate layout buttons instead of 1, when only one can be selected at once

Manager has noticed coworker's excessive breaks. Should I warn him?

Does a code with length 6, size 32 and distance 2 exist?

How should I ship cards?

Why is it a problem for Freddie if the guys from Munich did what he wanted?

Why does the current not skip resistors R3 and R5 when R6 and R4 have no resistance?

Someone wants me to use my credit card at a card-only gas/petrol pump in return for cash

Why did Shae (falsely) implicate Sansa?



Create index for multiple WHERE EXISTS subqueries


High Disk IO, How to mitigate?Identical query, tables, but different EXPLAIN and performanceWhy does IN (subquery) perform bad when = (subquery) is blazing fast?Deciding which MySQL execution plan is betterOptimizing slow queryOptimizing a simple query on a large tableSlow SELECT examining whole tableselect MAX() from MySQL view (2x INNER JOIN) is slowJoining mysql tablesPerformance of mysql equi-join observed in HDD and SSD













3















I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where









share|improve this question
















bumped to the homepage by Community 1 min ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31


















3















I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where









share|improve this question
















bumped to the homepage by Community 1 min ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31
















3












3








3








I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where









share|improve this question
















I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where






mysql index query-performance subquery






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jun 8 '16 at 10:13







Baz

















asked Jun 8 '16 at 9:00









BazBaz

105119




105119





bumped to the homepage by Community 1 min ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 1 min ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31





















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31



















Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16







Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16















@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29





@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29













The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30







The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30















@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13





@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13




2




2





I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31







I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31












1 Answer
1






active

oldest

votes


















0














g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer


























  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer


























  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01
















0














g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer


























  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01














0












0








0







g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer















g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"






share|improve this answer














share|improve this answer



share|improve this answer








edited Jun 8 '16 at 12:27

























answered Jun 8 '16 at 10:38









paparazzopaparazzo

4,6141230




4,6141230













  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01



















  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01

















@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41





@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41













@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01





@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Anexo:Material bélico de la Fuerza Aérea de Chile Índice Aeronaves Defensa...

Always On Availability groups resolving state after failover - Remote harden of transaction...

update json value to null Announcing the arrival of Valued Associate #679: Cesar Manara ...