Very slow nested SQL query



I seem to have hit a bottleneck with nested SQL queries.

I have three tables in PostgreSQL:

  1. Table-1: has 2443 unique rows

  2. Table-2: has 3414569 unique rows

  3. Table-3: has 9516434 unique rows

My Goal

For each row in Table-1, I want to pair its data with the data from
each row in Table-2, and then use each (table-1-data, table-2-data)
pair to query Table-3, so that the SELECT'ed rows from Table-3 are
the ones that contain that pair.

Problem

On the surface, this looks like an O(n^2) operation, and running my
SQL queries in a nested loop leads to horrible performance.

For example, I start with:

    SELECT row_a FROM table_1; -- query-1
    SELECT row_b FROM table_2; -- query-2

Then, for each output of query-1 combined with each output of
query-2, I run a query like the one below:

    SELECT count(*)
    FROM table_3
    WHERE row_c = <single-value-from-query-1>
    AND row_d = <single-value-from-query-2>

Needless to say, this nested loop is very slow as a whole: it issues
about 2443 x 3414569 = 8341792067 queries, and each one has to search
the 9516434 rows in table-3.

I want to speed up the whole process, and I'd be grateful if you
could point me in the right direction.


























  • SELECT ... FROM table_1 JOIN table_2 ON table_1.data=table_2.data JOIN table_3 ON table_2.data=table_3.data, and use the right indexes. Please also show the structure of your tables.

    – danblack
    Oct 17 '18 at 23:59

  • "I have about 2443 x 3414569 = 8341792067 queries" ... and 8341792067 output recordsets. What do you do with that array of data? Do you really need ALL of them separately? I think there is some underlying task, and you have chosen the wrong way to solve it...

    – Akina
    Oct 18 '18 at 6:39

  • "make multiple queries for each output" - sounds like the wrong approach. Typically, running SQL statements in a loop is not going to scale well. Why can't you use WHERE row_c IN (... query_1 ) (and the same for the other column) and thus get rid of the looping in your code?

    – a_horse_with_no_name
    Oct 18 '18 at 8:15

  • Do you really need the results for all 2,443×3,414,569 combinations of (row_a, row_b), or is it all right to just get all the matching pairs, i.e. just the ones that are actually there in (row_c, row_d) of table_3?

    – Andriy M
    Oct 18 '18 at 13:21

  • @AndriyM I need to derive the exhaustive list of pairs, i.e. (row_a, row_b). I'm currently looking at "CARTESIAN JOIN". Is there any faster approach? Thank you.

    – gsbabil
    Oct 18 '18 at 13:33
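
The set-based approach suggested in the comments above can be sketched as a single query that joins table_3 to both lookup tables and groups by the pair. This is only a minimal sketch under assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL so the example runs on its own, row_a and row_b are assumed unique, the index name table_3_c_d and the helper names are hypothetical, and the sample rows are made up. It returns only the pairs that actually occur in table_3.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the three tables (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (row_a INTEGER PRIMARY KEY);  -- assumed unique
        CREATE TABLE table_2 (row_b INTEGER PRIMARY KEY);  -- assumed unique
        CREATE TABLE table_3 (row_c INTEGER, row_d INTEGER);
        -- Hypothetical composite index covering both lookup columns.
        CREATE INDEX table_3_c_d ON table_3 (row_c, row_d);
    """)
    conn.executemany("INSERT INTO table_1 VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO table_2 VALUES (?)", [(10,), (20,), (30,)])
    conn.executemany(
        "INSERT INTO table_3 VALUES (?, ?)",
        [(1, 10), (1, 10), (2, 30), (9, 10)],
    )
    return conn


def matching_pair_counts(conn):
    """One query replaces the client-side loop: count table_3 rows per matching pair."""
    sql = """
        SELECT t3.row_c, t3.row_d, count(*) AS n
        FROM table_3 AS t3
        JOIN table_1 AS t1 ON t1.row_a = t3.row_c
        JOIN table_2 AS t2 ON t2.row_b = t3.row_d
        GROUP BY t3.row_c, t3.row_d
        ORDER BY t3.row_c, t3.row_d
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(matching_pair_counts(build_demo_db()))
    # prints [(1, 10, 2), (2, 30, 1)]
```

Pairs that never occur in table_3 simply do not appear in the result, so the output is bounded by the size of table_3 rather than by the 8341792067 possible combinations.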
















postgresql query-performance






edited Oct 18 '18 at 7:48 by Paul White

asked Oct 17 '18 at 23:35 by gsbabil





1 Answer
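Thanks everyone for your inputs.

  • As discussed above, I was able to solve my problem using a CARTESIAN JOIN (spelled CROSS JOIN in PostgreSQL) and ...

  • A nested SELECT like the following (the subquery needs an alias, and "table" is a reserved word, so a placeholder name is used):

    SELECT * FROM (SELECT column1, column2 FROM some_table) AS sub;

Thank you.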
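
The answer leaves its actual query elided, so the following is not the author's query, only one possible shape for combining a CROSS JOIN with a nested SELECT. It is a minimal sketch under assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL so it runs self-contained, row_a and row_b are assumed unique, the helper names and sample rows are made up. table_3 is aggregated once in the subquery, and every (row_a, row_b) pair is then listed with its count, using 0 for pairs that never occur.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the three tables (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (row_a INTEGER PRIMARY KEY);  -- assumed unique
        CREATE TABLE table_2 (row_b INTEGER PRIMARY KEY);  -- assumed unique
        CREATE TABLE table_3 (row_c INTEGER, row_d INTEGER);
    """)
    conn.executemany("INSERT INTO table_1 VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO table_2 VALUES (?)", [(10,), (20,)])
    conn.executemany(
        "INSERT INTO table_3 VALUES (?, ?)",
        [(1, 10), (1, 10), (2, 20)],
    )
    return conn


def all_pair_counts(conn):
    """List every (row_a, row_b) pair with its table_3 count (0 when absent)."""
    sql = """
        SELECT t1.row_a, t2.row_b, COALESCE(c.n, 0) AS n
        FROM table_1 AS t1
        CROSS JOIN table_2 AS t2
        LEFT JOIN (
            -- Nested SELECT: scan and aggregate table_3 a single time.
            SELECT row_c, row_d, count(*) AS n
            FROM table_3
            GROUP BY row_c, row_d
        ) AS c ON c.row_c = t1.row_a AND c.row_d = t2.row_b
        ORDER BY t1.row_a, t2.row_b
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(all_pair_counts(build_demo_db()))
    # prints [(1, 10, 2), (1, 20, 0), (2, 10, 0), (2, 20, 1)]
```

With the table sizes from the question, the exhaustive result still has 2443 x 3414569 = 8341792067 rows, so this form removes the per-pair round trips but not the size of the output.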






answered Oct 27 '18 at 18:08 by gsbabil





























