How to optimize an ordered date range query with several OR filters (for streaming)?Optimizing ORDER BY in a...
Stream.findFirst different than Optional.of?
What happens if you declare more than $10,000 at the US border?
The Longest Chess Game
Coworker is trying to get me to sign his petition to run for office. How to decline politely?
Badly designed reimbursement form. What does that say about the company?
我可不觉得 - agree or disagree?
Why is Bernie Sanders maximum accepted donation on actblue $5600?
In a world with multiracial creatures, what word can be used instead of mankind?
How to encircle section of matrix in LaTeX?
Found a major flaw in paper from home university – to which I would like to return
Have the conservatives lost the working majority and if so, what does this mean?
Boss asked me to sign a resignation paper without a date on it along with my new contract
Ramanujan's radical and how we define an infinite nested radical
Current measurement op-amp calculation
Discouraging missile alpha strikes
Which was the first story to feature space elevators?
How can I differentiate duration vs starting time
How can a kingdom keep the secret of a missing monarch from the public?
Define function that behaves almost identically to Mathematica function
Why would you use 2 alternate layout buttons instead of 1, when only one can be selected at once
Are all power cords made equal?
How do I know my password or backup information is not being shared when creating a new wallet?
How to not forget my phone in the bathroom?
How can changes in personality/values of a person who turned into a vampire be explained?
How to optimize an ordered date range query with several OR filters (for streaming)?
Optimizing ORDER BY in a full text search queryPostgreSQL 9.2 (PostGIS) performance problemHow to index WHERE (start_date >= '2013-12-15')How can I speed up a Postgres query containing lots of Joins with an ILIKE conditionpostgres explain plan with giant gaps between operationsSlow fulltext search due to wildly inaccurate row estimatespostgresql 9.2 hash join issueSorting killing my postgresql queryWhy is this query with WHERE, ORDER BY and LIMIT so slow?How to index two tables for JOINed query optimisation
I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by
such that Postgres would use the index instead of doing additional sorting.
Each entry is characterized by 5 fields each of which is not unique on its own:
date
of occurrence (timestamp
)
num
(order within a timestamp)type1
type2
id_in_type2
(never used withouttype2
)
A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1
, type2
and id_in_type2
connected by OR
, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:
SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;
When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id)
works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.
However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:
Indexes: I tried having a separate index for (date, num)
and 3 separate ones for other fields (like in the example below), I tried adding date
to filters 2-4, i tried putting date
and num
as first fields in the index as well as the last ones.
CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);
Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.
No matter what I do, it won't apply the filters along the (date, num)
index order, it will always do the sorting, yielding lower cost about the same as the upper cost.
Oh yes the table is also partitioned by date
.
Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?
Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)
explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms
The query may return 100s of thousands of rows.
postgresql query-performance
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by
such that Postgres would use the index instead of doing additional sorting.
Each entry is characterized by 5 fields each of which is not unique on its own:
date
of occurrence (timestamp
)
num
(order within a timestamp)type1
type2
id_in_type2
(never used withouttype2
)
A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1
, type2
and id_in_type2
connected by OR
, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:
SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;
When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id)
works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.
However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:
Indexes: I tried having a separate index for (date, num)
and 3 separate ones for other fields (like in the example below), I tried adding date
to filters 2-4, i tried putting date
and num
as first fields in the index as well as the last ones.
CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);
Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.
No matter what I do, it won't apply the filters along the (date, num)
index order, it will always do the sorting, yielding lower cost about the same as the upper cost.
Oh yes the table is also partitioned by date
.
Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?
Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)
explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms
The query may return 100s of thousands of rows.
postgresql query-performance
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Please also show the execution plan for the desirable query plan, where you just specify date and chop off all theOR
conditions.
– jjanes
Nov 17 '16 at 21:02
Put all 4 fields in the index starting with the date,num sort key.
– eckes
Apr 23 '17 at 13:58
I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
– joanolo
Apr 23 '17 at 20:31
add a comment |
I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by
such that Postgres would use the index instead of doing additional sorting.
Each entry is characterized by 5 fields each of which is not unique on its own:
date
of occurrence (timestamp
)
num
(order within a timestamp)type1
type2
id_in_type2
(never used withouttype2
)
A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1
, type2
and id_in_type2
connected by OR
, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:
SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;
When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id)
works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.
However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:
Indexes: I tried having a separate index for (date, num)
and 3 separate ones for other fields (like in the example below), I tried adding date
to filters 2-4, i tried putting date
and num
as first fields in the index as well as the last ones.
CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);
Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.
No matter what I do, it won't apply the filters along the (date, num)
index order, it will always do the sorting, yielding lower cost about the same as the upper cost.
Oh yes the table is also partitioned by date
.
Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?
Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)
explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms
The query may return 100s of thousands of rows.
postgresql query-performance
I have a table of data which is be outputted into a stream, so the upper bound of the cost doesn't matter as much as the lower bound. The table contains about 200M rows. The question is how to structurize indexes and queries with order by
such that Postgres would use the index instead of doing additional sorting.
Each entry is characterized by 5 fields each of which is not unique on its own:
date
of occurrence (timestamp
)
num
(order within a timestamp)type1
type2
id_in_type2
(never used withouttype2
)
A typical query for this data is bounded by date (say, from 1 day before now until now) and zero or more (about 5 actually) filters by any combination of type1
, type2
and id_in_type2
connected by OR
, and expects a stream of data (it doesn't matter how long it takes to output it completely, but it should start streaming as fast as possible), example:
SELECT *
FROM MY_TABLE
WHERE date > ?
AND (
(type1 = ?)
OR (type2 = ?)
OR (type2 = ? AND associated_id = ?)
OR ...
)
ORDER BY date, num;
When there are no additional filters or there is just one, a simple index by (date, num, type1, type2, associated_id)
works perfectly - since the results are stored along the sort order of the index, there's no problem to start outputting the results.
However, when things get more complicated, no combination of structures of my indexes and my queries seem to yield satisfying results.
Here's roughly what I tried so far:
Indexes: I tried having a separate index for (date, num)
and 3 separate ones for other fields (like in the example below), I tried adding date
to filters 2-4, i tried putting date
and num
as first fields in the index as well as the last ones.
CREATE INDEX my_table_1 ON my_table (date, num);
CREATE INDEX my_table_2 ON my_table (type1);
CREATE INDEX my_table_3 ON my_table (type2, id_in_type2);
CREATE INDEX my_table_4 ON my_table (type1, type2, id_in_type2);
Queries: I tried to select by date range and then by filters, I tried putting the filter by date to the end, I tried splitting the filters to separate ones and make a union.
No matter what I do, it won't apply the filters along the (date, num)
index order, it will always do the sorting, yielding lower cost about the same as the upper cost.
Oh yes the table is also partitioned by date
.
Any ideas on structurizing the indexes? Maybe I'd be better off returning all results with matching date and make the filtering in the app that's reading the stream?
Here's an execution plan for a naively constructed query with index setting as shown above (I ran this one on a test DB which is not partitioned and has 2M rows total, I'm presuming that on production table numbers will be different but orders of magnitude of difference between lower cost and upper cost are going to be about the same)
explain analyze
select * from my_table
where date >= now() - '1 day'::interval
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
Sort (cost=157714.25..158431.63 rows=286949 width=262) (actual time=2153.327..2313.075 rows=600314 loops=1)
Sort Key: date, num
Sort Method: external merge Disk: 144376kB
-> Bitmap Heap Scan on my_table (cost=16602.76..96390.20 rows=286949 width=262) (actual time=199.193..1281.214 rows=600314 loops=1)
Recheck Cond: (((type1)::text = 'T1A'::text) OR (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text)) OR ((type1)::text = 'T1C'::text) OR (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text)))
Rows Removed by Index Recheck: 455892
Filter: ((date >= (now() - '1 day'::interval)) AND ((((type2)::text = 'T2A'::text) AND ((type1)::text = 'T1A'::text)) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B1'::text)) OR ((type1)::text = 'T1C'::text) OR (((type2)::text = 'T2B'::text) AND ((type1)::text = 'T1B2'::text))))
Rows Removed by Filter: 2
Heap Blocks: exact=42763 lossy=26546
-> BitmapOr (cost=16602.76..16602.76 rows=299384 width=0) (actual time=188.980..188.980 rows=0 loops=1)
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.037..0.037 rows=30 loops=1)
Index Cond: ((type1)::text = 'T1A'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8094.98 rows=148643 width=0) (actual time=93.562..93.562 rows=300240 loops=1)
Index Cond: (((type1)::text = 'T1B1'::text) AND ((type2)::text = 'T2B'::text))
-> Bitmap Index Scan on my_table_2 (cost=0.00..4.44 rows=1 width=0) (actual time=0.059..0.059 rows=28 loops=1)
Index Cond: ((type1)::text = 'T1C'::text)
-> Bitmap Index Scan on my_table_4 (cost=0.00..8211.96 rows=150741 width=0) (actual time=95.315..95.315 rows=300018 loops=1)
Index Cond: (((type1)::text = 'T1B2'::text) AND ((type2)::text = 'T2B'::text))
Planning time: 0.435 ms
Execution time: 2356.305 ms
The query may return 100s of thousands of rows.
postgresql query-performance
postgresql query-performance
edited Apr 23 '17 at 17:49
Paul White♦
52.2k14278450
52.2k14278450
asked Nov 16 '16 at 8:51
SergeySergey
1162
1162
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 3 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Please also show the execution plan for the desirable query plan, where you just specify date and chop off all theOR
conditions.
– jjanes
Nov 17 '16 at 21:02
Put all 4 fields in the index starting with the date,num sort key.
– eckes
Apr 23 '17 at 13:58
I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
– joanolo
Apr 23 '17 at 20:31
add a comment |
Please also show the execution plan for the desirable query plan, where you just specify date and chop off all theOR
conditions.
– jjanes
Nov 17 '16 at 21:02
Put all 4 fields in the index starting with the date,num sort key.
– eckes
Apr 23 '17 at 13:58
I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
– joanolo
Apr 23 '17 at 20:31
Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the
OR
conditions.– jjanes
Nov 17 '16 at 21:02
Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the
OR
conditions.– jjanes
Nov 17 '16 at 21:02
Put all 4 fields in the index starting with the date,num sort key.
– eckes
Apr 23 '17 at 13:58
Put all 4 fields in the index starting with the date,num sort key.
– eckes
Apr 23 '17 at 13:58
I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
– joanolo
Apr 23 '17 at 20:31
I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
– joanolo
Apr 23 '17 at 20:31
add a comment |
2 Answers
2
active
oldest
votes
Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?
Example:
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2A'
and type1 = 'T1A'
union all
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2B'
and type1 = 'T1B1'
...
add a comment |
I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval
, I would compute the constant value on the server side and actually try the query like this:
explain analyze
select * from my_table
where date >= '2017-04-22'
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
In some experiences, this makes a difference with regard to partitions
Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):
SELECT
*
FROM
(SELECT
*
FROM
my_table
WHERE
-- This where is *likely* to use index
date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
and
type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
) AS s0
WHERE
-- Now you actually fine-filter result
(type1 = 'T1A' and type2 = 'T2A')
or (type1 = 'T1B1' and type2 = 'T2B')
or (type1 = 'T1B2' and type2 = 'T2B')
or (type1 = 'T1C')
ORDER BY
date, num;
I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.
These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
1
The derived table won't make a difference. It thewhere
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though
– a_horse_with_no_name
Apr 24 '17 at 6:13
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155427%2fhow-to-optimize-an-ordered-date-range-query-with-several-or-filters-for-streami%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?
Example:
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2A'
and type1 = 'T1A'
union all
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2B'
and type1 = 'T1B1'
...
add a comment |
Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?
Example:
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2A'
and type1 = 'T1A'
union all
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2B'
and type1 = 'T1B1'
...
add a comment |
Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?
Example:
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2A'
and type1 = 'T1A'
union all
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2B'
and type1 = 'T1B1'
...
Have you tried making 4 separate queries, and then using union all, so that you can get rid of the or statements?
Example:
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2A'
and type1 = 'T1A'
union all
select * from my_table
where date >= now() - '1 day'::interval
and type2 = 'T2B'
and type1 = 'T1B1'
...
edited Apr 23 '17 at 17:46
Paul White♦
52.2k14278450
52.2k14278450
answered Apr 23 '17 at 13:53
oraclequestions7oraclequestions7
1
1
add a comment |
add a comment |
I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval
, I would compute the constant value on the server side and actually try the query like this:
explain analyze
select * from my_table
where date >= '2017-04-22'
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
In some experiences, this makes a difference with regard to partitions
Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):
SELECT
*
FROM
(SELECT
*
FROM
my_table
WHERE
-- This where is *likely* to use index
date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
and
type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
) AS s0
WHERE
-- Now you actually fine-filter result
(type1 = 'T1A' and type2 = 'T2A')
or (type1 = 'T1B1' and type2 = 'T2B')
or (type1 = 'T1B2' and type2 = 'T2B')
or (type1 = 'T1C')
ORDER BY
date, num;
I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.
These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
1
The derived table won't make a difference. It thewhere
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though
– a_horse_with_no_name
Apr 24 '17 at 6:13
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
add a comment |
I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval
, I would compute the constant value on the server side and actually try the query like this:
explain analyze
select * from my_table
where date >= '2017-04-22'
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
In some experiences, this makes a difference with regard to partitions
Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):
SELECT
*
FROM
(SELECT
*
FROM
my_table
WHERE
-- This where is *likely* to use index
date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
and
type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
) AS s0
WHERE
-- Now you actually fine-filter result
(type1 = 'T1A' and type2 = 'T2A')
or (type1 = 'T1B1' and type2 = 'T2B')
or (type1 = 'T1B2' and type2 = 'T2B')
or (type1 = 'T1C')
ORDER BY
date, num;
I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.
These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
1
The derived table won't make a difference. It thewhere
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though
– a_horse_with_no_name
Apr 24 '17 at 6:13
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
add a comment |
I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval
, I would compute the constant value on the server side and actually try the query like this:
explain analyze
select * from my_table
where date >= '2017-04-22'
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
In some experiences, this makes a difference with regard to partitions
Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):
SELECT
*
FROM
(SELECT
*
FROM
my_table
WHERE
-- This where is *likely* to use index
date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
and
type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
) AS s0
WHERE
-- Now you actually fine-filter result
(type1 = 'T1A' and type2 = 'T2A')
or (type1 = 'T1B1' and type2 = 'T2B')
or (type1 = 'T1B2' and type2 = 'T2B')
or (type1 = 'T1C')
ORDER BY
date, num;
I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.
These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
I would try a couple of things. First: instead of using an expression date >= now() - '1 day'::interval
, I would compute the constant value on the server side and actually try the query like this:
explain analyze
select * from my_table
where date >= '2017-04-22'
and (
(type2 = 'T2A' and type1 = 'T1A')
or (type2 = 'T2B' and type1 = 'T1B1')
or (type1 = 'T1C')
or (type2 = 'T2B' and type1 = 'T1B2')
)
order by date, num;
In some experiences, this makes a difference with regard to partitions
Secondly, I'd try to have the query in two stages, to see if this makes it easier for PostgreSQL to use the index more efficiently (if that is actually possible):
SELECT
*
FROM
(SELECT
*
FROM
my_table
WHERE
-- This where is *likely* to use index
date >= '2017-04-22' -- Use a CONSTANT (compute it on the client side)
and
type1 IN ('T1A', 'T1B1', 'T1C', 'T1B2')
) AS s0
WHERE
-- Now you actually fine-filter result
(type1 = 'T1A' and type2 = 'T2A')
or (type1 = 'T1B1' and type2 = 'T2B')
or (type1 = 'T1B2' and type2 = 'T2B')
or (type1 = 'T1C')
ORDER BY
date, num;
I've trie to simulate your scenario in DBFiddle, but, not knowing your specific data distribution... I don't manage to really simulate it well enough. In any case, in my simulation, the query worked differently when changing the function call with a constant.
These are a bit of "wild guesses". As already commented: I am not sure if PostgreSQL will start sending data to the client before the server has completed processing the whole query... You can probably speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
answered Apr 23 '17 at 21:14
joanolojoanolo
9,73342153
9,73342153
1
The derived table won't make a difference. It thewhere
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though
– a_horse_with_no_name
Apr 24 '17 at 6:13
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
add a comment |
1
The derived table won't make a difference. It thewhere
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though
– a_horse_with_no_name
Apr 24 '17 at 6:13
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
1
1
The derived table won't make a difference. It the
where
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though– a_horse_with_no_name
Apr 24 '17 at 6:13
The derived table won't make a difference. It the
where
clause the the inner query really reduces the number of rows substantially, then putting that into a CTE might make a difference though– a_horse_with_no_name
Apr 24 '17 at 6:13
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
In the query plan of PostgreSQL 9.6, it just pushes the condition and ANDs both WHEREs.
– joanolo
Apr 24 '17 at 6:55
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f155427%2fhow-to-optimize-an-ordered-date-range-query-with-several-or-filters-for-streami%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Please also show the execution plan for the desirable query plan, where you just specify date and chop off all the
OR
conditions.– jjanes
Nov 17 '16 at 21:02
Put all 4 fields in the index starting with the date,num sort key.
– eckes
Apr 23 '17 at 13:58
I am not sure if PostgreSQL will start sending data to the client before the server has completed... So, you can speed-up the query, but I am not sure at all that the server will start sending any data to the client until it has processed all the query.
– joanolo
Apr 23 '17 at 20:31