Create index for multiple WHERE EXISTS subqueriesHigh Disk IO, How to mitigate?Identical query, tables, but...

Copy the content of an environment

Sing Baby Shark

Why does calling Python's 'magic method' not do type conversion like it would for the corresponding operator?

Why do most space probes survive for far longer than they were designed for?

Ethernet cable only works in certain positions

Is layered encryption more secure than long passwords?

Does limiting the number of sources help simplify the game for a new DM with new and experienced players?

Pictures from Mars

How bad is a Computer Science course that doesn't teach Design Patterns?

Why don't hotels offer (at least) 1 kitchen bookable by any guest?

Why does finding small effects in large studies indicate publication bias?

How to make clear what a part-humanoid character looks like when they're quite common in their world?

Why do climate experts from the UN/IPCC rarely mention Grand Solar Minimum?

Does a star need to be inside a galaxy?

Badly designed reimbursement form. What does that say about the company?

Squeeze theorem laws

Why would you use 2 alternate layout buttons instead of 1, when only one can be selected at once

Manager has noticed coworker's excessive breaks. Should I warn him?

Does a code with length 6, size 32 and distance 2 exist?

How should I ship cards?

Why is it a problem for Freddie if the guys from Munich did what he wanted?

Why does the current not skip resistors R3 and R5 when R6 and R4 have no resistance?

Someone wants me to use my credit card at a card-only gas/petrol pump in return for cash

Why did Shae (falsely) implicate Sansa?

Create index for multiple WHERE EXISTS subqueries

High Disk IO, How to mitigate?Identical query, tables, but different EXPLAIN and performanceWhy does IN (subquery) perform bad when = (subquery) is blazing fast?Deciding which MySQL execution plan is betterOptimizing slow queryOptimizing a simple query on a large tableSlow SELECT examining whole tableselect MAX() from MySQL view (2x INNER JOIN) is slowJoining mysql tablesPerformance of mysql equi-join observed in HDD and SSD

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

bumped to the homepage by Community♦ 1 min ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

bumped to the homepage by Community♦ 1 min ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

mysql index query-performance subquery

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

asked Jun 8 '16 at 9:00

Baz

105119

asked Jun 8 '16 at 9:00

Baz

105119

bumped to the homepage by Community♦ 1 min ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 1 min ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

1 Answer
1

active

oldest

votes

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Sfrgttk