PostgreSQL 9.4: Index Only Used For Lower INET Ranges


For some reason, queries on the high end of the IP address range are extremely slow, while queries on the low end are extremely fast.



My index is an adapted version of the answer in Optimizing queries on a range of timestamps (two columns):
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high); but for some reason it only helps for low ranges.



Table



CREATE TABLE IF NOT EXISTS ip_range_domains (
ip_range_domain_id BIGSERIAL PRIMARY KEY,
domain_id BIGINT REFERENCES domains NOT NULL,
source_type_id INTEGER REFERENCES source_types NOT NULL,
low INET NOT NULL,
high INET NOT NULL,
auto_high_conf BOOLEAN NOT NULL DEFAULT FALSE,
invalidation_reason_id INTEGER REFERENCES invalidation_reasons DEFAULT NULL,
invalidated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NULL,
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT current_timestamp
);
CREATE INDEX domain_id_btree ON ip_range_domains (domain_id);
CREATE INDEX idx_ip_range_inversed ON ip_range_domains(low, high);


This is fast:



=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '8.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=25411.02..278369.96 rows=948529 width=55) (actual time=61.514..61.567 rows=55 loops=1)
Recheck Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Heap Blocks: exact=23
Buffers: shared hit=3613
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..25173.89 rows=948529 width=0) (actual time=61.493..61.493 rows=55 loops=1)
Index Cond: (('8.8.8.8'::inet >= low) AND ('8.8.8.8'::inet <= high))
Buffers: shared hit=3590
Planning time: 0.537 ms
Execution time: 61.631 ms


This is slow:



=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '200.8.8.8'::INET BETWEEN low AND high;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=1016801 width=55) (actual time=14090.840..21951.343 rows=1 loops=1)
Filter: (('200.8.8.8'::inet >= low) AND ('200.8.8.8'::inet <= high))
Rows Removed by Filter: 23156868
Buffers: shared hit=21232 read=217499
Planning time: 0.111 ms
Execution time: 21951.376 ms


After a bit of manual investigation I found that 74.181.234.146 uses the index but 74.181.234.147 does not. Interestingly, as I get higher, the queries that use the index start taking 600-700ms. Perhaps that's just an issue of finding the data on disk. That's an acceptable response time but faster would be better.
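One way to see where that threshold comes from (a sketch of my own, not from the original post) is to inspect the sampled statistics the planner uses for these columns. The histogram bucket boundaries in the pg_stats view often explain why the row estimate, and therefore the plan, jumps sharply when the query constant crosses a particular value:

```sql
-- Inspect the planner's sampled statistics for the two filter columns.
-- The histogram_bounds array lists bucket boundaries; estimates can change
-- abruptly when a query constant crosses one of them.
SELECT attname, n_distinct, histogram_bounds
FROM pg_stats
WHERE tablename = 'ip_range_domains'
  AND attname IN ('low', 'high');
```

If 74.181.234.147 sits just past a histogram boundary for low, that would explain the sudden switch to a sequential scan.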



=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.146'::INET BETWEEN low AND high;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ip_range_domains (cost=256258.42..580278.67 rows=5685950 width=55) (actual time=593.066..593.068 rows=3 loops=1)
Recheck Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Heap Blocks: exact=3
Buffers: shared hit=38630
-> Bitmap Index Scan on idx_ip_range_inversed (cost=0.00..254836.93 rows=5685950 width=0) (actual time=593.057..593.057 rows=3 loops=1)
Index Cond: (('74.181.234.146'::inet >= low) AND ('74.181.234.146'::inet <= high))
Buffers: shared hit=38627
Planning time: 0.108 ms
Execution time: 593.094 ms


The lowest address for which the query does not use the index:



=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM ip_range_domains WHERE '74.181.234.147'::INET BETWEEN low AND high;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on ip_range_domains (cost=0.00..586084.02 rows=5685950 width=55) (actual time=5723.461..21914.826 rows=3 loops=1)
Filter: (('74.181.234.147'::inet >= low) AND ('74.181.234.147'::inet <= high))
Rows Removed by Filter: 23156866
Buffers: shared hit=21864 read=216867
Planning time: 0.108 ms
Execution time: 21914.850 ms


Version:



=> SELECT version();

version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit









  • What's the distribution of data, is it roughly uniform? What is sample data for low and high?

    – Jakub Kania
    Jan 27 '16 at 20:02


















postgresql query-performance postgresql-9.4






asked Jan 27 '16 at 16:51 by Josh Horowitz
edited Apr 13 '17 at 12:42 by Community
1 Answer
1






































Your statistics are way off. The planner thinks it is going to retrieve 948,529, 1,016,801, 5,685,950 and 5,685,950 rows, respectively, when in fact it retrieves 55, 1, 3 and 3.



In the second example, that is why it prefers a sequential scan: the planner estimates about a million matching rows, and retrieving that large a slice of the table through an index is not considered worthwhile.



What you need to do is increase the statistics target on the two columns (ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000) and then run ANALYZE ip_range_domains. This gives the planner a much better picture of the data in the table (specifically, the two columns you are filtering on). Try different values; 10000 is the maximum.
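Spelled out, the suggested change might look like the following; the target of 1000 is just a starting point to experiment with, not a recommended final value:

```sql
-- Raise the per-column statistics targets (the default is 100),
-- then re-sample the table so the planner sees the new statistics.
ALTER TABLE ip_range_domains ALTER COLUMN low  SET STATISTICS 1000;
ALTER TABLE ip_range_domains ALTER COLUMN high SET STATISTICS 1000;
ANALYZE ip_range_domains;
```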



The decision the planner has to make is between a sequential scan of the table, where each individual read is quite cheap (not least because the OS can prefetch pages, but also because of the mechanics of the storage medium), and random reads, which are assumed to be much more expensive because of their unpredictable access pattern.



If the planner concludes that it would have to read, say, 30% of the index to come up with 20% of the table, all of which must then be retrieved with random fetches, that is simply too costly, so it prefers a full sequential scan.
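Relatedly (my addition, not part of the original answer), the weight the planner gives to that tradeoff is controlled by the seq_page_cost and random_page_cost settings. On storage where random reads are comparatively cheap, or when the table is mostly cached, lowering random_page_cost can tip the planner toward the index:

```sql
-- Assumption: the data mostly fits in cache or lives on fast storage.
-- Lower the planner's estimate of random I/O cost for this session only,
-- then re-check the plan for the previously slow query.
SET random_page_cost = 1.1;  -- the default is 4.0
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ip_range_domains
WHERE '200.8.8.8'::INET BETWEEN low AND high;
```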






answered Jan 27 '16 at 21:48 by Grega Bremec