How to optimize an ordered date range query with several OR filters (for streaming)?Optimizing ORDER BY in a...

Stream.findFirst different than Optional.of?

What happens if you declare more than $10,000 at the US border?

The Longest Chess Game

Coworker is trying to get me to sign his petition to run for office. How to decline politely?

Badly designed reimbursement form. What does that say about the company?

我可不觉得 - agree or disagree?

Why is Bernie Sanders maximum accepted donation on actblue $5600?

In a world with multiracial creatures, what word can be used instead of mankind?

How to encircle section of matrix in LaTeX?

Found a major flaw in paper from home university – to which I would like to return

Have the conservatives lost the working majority and if so, what does this mean?

Boss asked me to sign a resignation paper without a date on it along with my new contract

Ramanujan's radical and how we define an infinite nested radical

Current measurement op-amp calculation

Discouraging missile alpha strikes

Which was the first story to feature space elevators?

How can I differentiate duration vs starting time

How can a kingdom keep the secret of a missing monarch from the public?

Define function that behaves almost identically to Mathematica function

Why would you use 2 alternate layout buttons instead of 1, when only one can be selected at once

Are all power cords made equal?

How do I know my password or backup information is not being shared when creating a new wallet?

How to not forget my phone in the bathroom?

How can changes in personality/values of a person who turned into a vampire be explained?



How to optimize an ordered date range query with several OR filters (for streaming)?


Optimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatespostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?How to index two tables for JOINed query optimisation













3















I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by such that Postgres would use the index instead of doing additional sorting.



Each entry is characterized by 5 fields each of which is not unique on its own:





  • date of occurrence (timestamp)


  • num (order within a timestamp)

  • type1

  • type2


  • id_in_type2 (never used without type2)


A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1, type2 and id_in_type2 connected by OR, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:



SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;


When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id) works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.



However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:



Indexes: I tried having a separate index for (date, num) and 3 separate ones for other fields (like in the example below), I tried adding date to filters 2-4, i tried putting date and num as first fields in the index as well as the last ones.



CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);


Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.



No matter what I do, it won't apply the filters along the (date, num) index order, it will always do the sorting, yielding lower cost about the same as the upper cost.



Oh yes the table is also partitioned by date.



Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?



Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)



explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;

Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms


The query may return 100s of thousands of rows.










share|improve this question
















bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the OR conditions.

    – jjanes
    Nov 17 '16 at 21:02











  • Put all 4 fields in the index starting with the date,num sort key.

    – eckes
    Apr 23 '17 at 13:58











  • I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.

    – joanolo
    Apr 23 '17 at 20:31
















3















I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by such that Postgres would use the index instead of doing additional sorting.



Each entry is characterized by 5 fields each of which is not unique on its own:





  • date of occurrence (timestamp)


  • num (order within a timestamp)

  • type1

  • type2


  • id_in_type2 (never used without type2)


A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1, type2 and id_in_type2 connected by OR, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:



SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;


When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id) works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.



However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:



Indexes: I tried having a separate index for (date, num) and 3 separate ones for other fields (like in the example below), I tried adding date to filters 2-4, i tried putting date and num as first fields in the index as well as the last ones.



CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);


Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.



No matter what I do, it won't apply the filters along the (date, num) index order, it will always do the sorting, yielding lower cost about the same as the upper cost.



Oh yes the table is also partitioned by date.



Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?



Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)



explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;

Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms


The query may return 100s of thousands of rows.










share|improve this question
















bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the OR conditions.

    – jjanes
    Nov 17 '16 at 21:02











  • Put all 4 fields in the index starting with the date,num sort key.

    – eckes
    Apr 23 '17 at 13:58











  • I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.

    – joanolo
    Apr 23 '17 at 20:31














3












3








3








I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by such that Postgres would use the index instead of doing additional sorting.



Each entry is characterized by 5 fields each of which is not unique on its own:





  • date of occurrence (timestamp)


  • num (order within a timestamp)

  • type1

  • type2


  • id_in_type2 (never used without type2)


A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1, type2 and id_in_type2 connected by OR, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:



SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;


When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id) works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.



However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:



Indexes: I tried having a separate index for (date, num) and 3 separate ones for other fields (like in the example below), I tried adding date to filters 2-4, i tried putting date and num as first fields in the index as well as the last ones.



CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);


Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.



No matter what I do, it won't apply the filters along the (date, num) index order, it will always do the sorting, yielding lower cost about the same as the upper cost.



Oh yes the table is also partitioned by date.



Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?



Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)



explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;

Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms


The query may return 100s of thousands of rows.










share|improve this question
















I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by such that Postgres would use the index instead of doing additional sorting.



Each entry is characterized by 5 fields each of which is not unique on its own:





  • date of occurrence (timestamp)


  • num (order within a timestamp)

  • type1

  • type2


  • id_in_type2 (never used without type2)


A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1, type2 and id_in_type2 connected by OR, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:



SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;


When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id) works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.



However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:



Indexes: I tried having a separate index for (date, num) and 3 separate ones for other fields (like in the example below), I tried adding date to filters 2-4, i tried putting date and num as first fields in the index as well as the last ones.



CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);


Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.



No matter what I do, it won't apply the filters along the (date, num) index order, it will always do the sorting, yielding lower cost about the same as the upper cost.



Oh yes the table is also partitioned by date.



Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?



Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)



explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;

Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms


The query may return 100s of thousands of rows.







postgresql query-performance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Apr 23 '17 at 17:49









Paul White

52.2k14278450




52.2k14278450










asked Nov 16 '16 at 8:51









SergeySergey

1162




1162





bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 3 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the OR conditions.

    – jjanes
    Nov 17 '16 at 21:02











  • Put all 4 fields in the index starting with the date,num sort key.

    – eckes
    Apr 23 '17 at 13:58











  • I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.

    – joanolo
    Apr 23 '17 at 20:31



















  • Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the OR conditions.

    – jjanes
    Nov 17 '16 at 21:02











  • Put all 4 fields in the index starting with the date,num sort key.

    – eckes
    Apr 23 '17 at 13:58











  • I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.

    – joanolo
    Apr 23 '17 at 20:31

















Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the OR conditions.

– jjanes
Nov 17 '16 at 21:02





Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the OR conditions.

– jjanes
Nov 17 '16 at 21:02













Put all 4 fields in the index starting with the date,num sort key.

– eckes
Apr 23 '17 at 13:58





Put all 4 fields in the index starting with the date,num sort key.

– eckes
Apr 23 '17 at 13:58













I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.

– joanolo
Apr 23 '17 at 20:31





I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.

– joanolo
Apr 23 '17 at 20:31










2 Answers
2






active

oldest

votes


















0














Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?



Example:



select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2A'
and type1 = 'T1A'

union all
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2B'
and type1 = 'T1B1'
...





share|improve this answer

































    0














    I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval, I would compute the constant value on the server side and actually try the query like this:



    explain analyze     
    select * from my_table
    where date >= '2017-04-22'
    and (
    (type2 = 'T2A' and type1 = 'T1A')
    or (type2 = 'T2B' and type1 = 'T1B1')
    or (type1 = 'T1C')
    or (type2 = 'T2B' and type1 = 'T1B2')
    )
    order by date, num;


    In some experiences, this makes a difference with regard to partitions





    Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):



    SELECT
    *
    FROM
    (SELECT
    *
    FROM
    my_table
    WHERE
    -- This where is *likely* to use index
    date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
    and
    type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
    ) AS s0
    WHERE
    -- Now you actually fine-filter result
    (type1 = 'T1A' and type2 = 'T2A')
    or (type1 = 'T1B1' and type2 = 'T2B')
    or (type1 = 'T1B2' and type2 = 'T2B')
    or (type1 = 'T1C')
    ORDER BY
    date, num;


    I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.



    These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.






    share|improve this answer



















    • 1





      The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

      – a_horse_with_no_name
      Apr 24 '17 at 6:13











    • In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

      – joanolo
      Apr 24 '17 at 6:55











    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155427%2fhow-to-optimize-an-ordered-date-range-query-with-several-or-filters-for-streami%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?



    Example:



    select * from my_table
    where date >= now() - '1 day'::interval
    and type2 = 'T2A'
    and type1 = 'T1A'

    union all
    select * from my_table
    where date >= now() - '1 day'::interval
    and type2 = 'T2B'
    and type1 = 'T1B1'
    ...





    share|improve this answer






























      0














      Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?



      Example:



      select * from my_table
      where date >= now() - '1 day'::interval
      and type2 = 'T2A'
      and type1 = 'T1A'

      union all
      select * from my_table
      where date >= now() - '1 day'::interval
      and type2 = 'T2B'
      and type1 = 'T1B1'
      ...





      share|improve this answer




























        0












        0








        0







        Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?



        Example:



        select * from my_table
        where date >= now() - '1 day'::interval
        and type2 = 'T2A'
        and type1 = 'T1A'

        union all
        select * from my_table
        where date >= now() - '1 day'::interval
        and type2 = 'T2B'
        and type1 = 'T1B1'
        ...





        share|improve this answer















        Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?



        Example:



        select * from my_table
        where date >= now() - '1 day'::interval
        and type2 = 'T2A'
        and type1 = 'T1A'

        union all
        select * from my_table
        where date >= now() - '1 day'::interval
        and type2 = 'T2B'
        and type1 = 'T1B1'
        ...






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Apr 23 '17 at 17:46









        Paul White

        52.2k14278450




        52.2k14278450










        answered Apr 23 '17 at 13:53









        oraclequestions7oraclequestions7

        1




        1

























            0














            I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval, I would compute the constant value on the server side and actually try the query like this:



            explain analyze     
            select * from my_table
            where date >= '2017-04-22'
            and (
            (type2 = 'T2A' and type1 = 'T1A')
            or (type2 = 'T2B' and type1 = 'T1B1')
            or (type1 = 'T1C')
            or (type2 = 'T2B' and type1 = 'T1B2')
            )
            order by date, num;


            In some experiences, this makes a difference with regard to partitions





            Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):



            SELECT
            *
            FROM
            (SELECT
            *
            FROM
            my_table
            WHERE
            -- This where is *likely* to use index
            date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
            and
            type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
            ) AS s0
            WHERE
            -- Now you actually fine-filter result
            (type1 = 'T1A' and type2 = 'T2A')
            or (type1 = 'T1B1' and type2 = 'T2B')
            or (type1 = 'T1B2' and type2 = 'T2B')
            or (type1 = 'T1C')
            ORDER BY
            date, num;


            I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.



            These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.






            share|improve this answer



















            • 1





              The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

              – a_horse_with_no_name
              Apr 24 '17 at 6:13











            • In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

              – joanolo
              Apr 24 '17 at 6:55
















            0














            I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval, I would compute the constant value on the server side and actually try the query like this:



            explain analyze     
            select * from my_table
            where date >= '2017-04-22'
            and (
            (type2 = 'T2A' and type1 = 'T1A')
            or (type2 = 'T2B' and type1 = 'T1B1')
            or (type1 = 'T1C')
            or (type2 = 'T2B' and type1 = 'T1B2')
            )
            order by date, num;


            In some experiences, this makes a difference with regard to partitions





            Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):



            SELECT
            *
            FROM
            (SELECT
            *
            FROM
            my_table
            WHERE
            -- This where is *likely* to use index
            date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
            and
            type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
            ) AS s0
            WHERE
            -- Now you actually fine-filter result
            (type1 = 'T1A' and type2 = 'T2A')
            or (type1 = 'T1B1' and type2 = 'T2B')
            or (type1 = 'T1B2' and type2 = 'T2B')
            or (type1 = 'T1C')
            ORDER BY
            date, num;


            I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.



            These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.






            share|improve this answer



















            • 1





              The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

              – a_horse_with_no_name
              Apr 24 '17 at 6:13











            • In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

              – joanolo
              Apr 24 '17 at 6:55














            0












            0








            0







            I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval, I would compute the constant value on the server side and actually try the query like this:



            explain analyze     
            select * from my_table
            where date >= '2017-04-22'
            and (
            (type2 = 'T2A' and type1 = 'T1A')
            or (type2 = 'T2B' and type1 = 'T1B1')
            or (type1 = 'T1C')
            or (type2 = 'T2B' and type1 = 'T1B2')
            )
            order by date, num;


            In some experiences, this makes a difference with regard to partitions





            Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):



            SELECT
            *
            FROM
            (SELECT
            *
            FROM
            my_table
            WHERE
            -- This where is *likely* to use index
            date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
            and
            type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
            ) AS s0
            WHERE
            -- Now you actually fine-filter result
            (type1 = 'T1A' and type2 = 'T2A')
            or (type1 = 'T1B1' and type2 = 'T2B')
            or (type1 = 'T1B2' and type2 = 'T2B')
            or (type1 = 'T1C')
            ORDER BY
            date, num;


            I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.



            These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.






            share|improve this answer













            I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval, I would compute the constant value on the server side and actually try the query like this:



            explain analyze     
            select * from my_table
            where date >= '2017-04-22'
            and (
            (type2 = 'T2A' and type1 = 'T1A')
            or (type2 = 'T2B' and type1 = 'T1B1')
            or (type1 = 'T1C')
            or (type2 = 'T2B' and type1 = 'T1B2')
            )
            order by date, num;


            In some experiences, this makes a difference with regard to partitions





            Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):



            SELECT
            *
            FROM
            (SELECT
            *
            FROM
            my_table
            WHERE
            -- This where is *likely* to use index
            date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
            and
            type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
            ) AS s0
            WHERE
            -- Now you actually fine-filter result
            (type1 = 'T1A' and type2 = 'T2A')
            or (type1 = 'T1B1' and type2 = 'T2B')
            or (type1 = 'T1B2' and type2 = 'T2B')
            or (type1 = 'T1C')
            ORDER BY
            date, num;


            I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.



            These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Apr 23 '17 at 21:14









            joanolojoanolo

            9,73342153




            9,73342153








            • 1





              The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

              – a_horse_with_no_name
              Apr 24 '17 at 6:13











            • In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

              – joanolo
              Apr 24 '17 at 6:55














            • 1





              The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

              – a_horse_with_no_name
              Apr 24 '17 at 6:13











            • In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

              – joanolo
              Apr 24 '17 at 6:55








            1




            1





            The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

            – a_horse_with_no_name
            Apr 24 '17 at 6:13





            The derived table won't make a difference. It the where clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though

            – a_horse_with_no_name
            Apr 24 '17 at 6:13













            In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

            – joanolo
            Apr 24 '17 at 6:55





            In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.

            – joanolo
            Apr 24 '17 at 6:55


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155427%2fhow-to-optimize-an-ordered-date-range-query-with-several-or-filters-for-streami%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Anexo:Material bélico de la Fuerza Aérea de Chile Índice Aeronaves Defensa...

            Always On Availability groups resolving state after failover - Remote harden of transaction...

            update json value to null Announcing the arrival of Valued Associate #679: Cesar Manara ...