Oracle: Physically reorganizing rows of a huge table by a different key
I have a huge table RECORD -- a few billion rows -- which contains data loaded continuously in a lot of batches. I can assign a primary key RECORD_ID from sequence, or use natural composite key BATCH_NUM + RECORD_NUM, which gives me a nice, unfragmented table where a batch with thousands of records only spans over several physical blocks, which is useful for next batch-aware step of processing.
But later I need to group together rows from arbitrary batches based on another field value, some RECORD_KEY. There are hundreds of millions of different RECORD_KEYs spread across rows of the table, therefore if I want to pick 10 rows with the same RECORD_KEY, it will almost certainly require retrieval of 10 physical blocks. Due to the table size this also cannot easily be cached so it is 10 physical reads (plus overhead on index traversal). This is obviously very slow.
Example:
Table RECORD
- RECORD_ID is the primary key, populated from a sequence
- BATCH_NUM and RECORD_NUM together have a unique constraint (and could be the primary key)
- RECORD_KEY is an indexed column
Sample data:
| RECORD_ID | BATCH_NUM | RECORD_NUM | RECORD_KEY
| 1 | 1 | 1 | 987654321
| 2 | 1 | 2 | 876543219
| 3 | 1 | 3 | 765432198
| 4 | 2 | 1 | 654321987
| 5 | 2 | 2 | 543219876
...
| 100000006 | 3000003 | 2 | 432198765
| 100000007 | 3000003 | 3 | 876543219
| 100000008 | 3000003 | 4 | 321987654
...
| 200000009 | 6000004 | 3 | 219876543
| 200000010 | 6000004 | 4 | 876543219
| 200000011 | 6000004 | 5 | 198765432
...
This SQL command will be fast because it will only have to retrieve one physical block:
select RECORD_ID from RECORD where BATCH_NUM = 1
This SQL command will be slow because it will have to retrieve three physical blocks -- one for each retrieved row:
select RECORD_ID from RECORD where RECORD_KEY = 876543219
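One way to check how scattered the rows are relative to each index, rather than assuming it, is to compare the index clustering factor with the table's block and row counts. This is only a sketch: it assumes optimizer statistics on RECORD and its indexes have been gathered recently and that the table lives in the current schema.

```sql
-- Assumption: statistics are reasonably current and RECORD is in the current schema.
-- A clustering_factor close to t.blocks means rows are stored in index order;
-- a value close to t.num_rows means nearly every index entry points to a different block.
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
from   user_indexes i
join   user_tables  t on t.table_name = i.table_name
where  i.table_name = 'RECORD';
```

If the description above holds, the index on BATCH_NUM should show a clustering factor near the block count, and the index on RECORD_KEY one near the row count.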
EDIT:
The above is just an example. Typically I would have:
- thousands of rows per BATCH_NUM
- tens of rows per RECORD_KEY
- thousands of rows per lookup, by BATCH_NUM or RECORD_KEY
SQLs look like this, both retrieving around 1000 rows:
select RECORD_ID from RECORD where BATCH_NUM = 123456
select RECORD_ID from RECORD where RECORD_KEY in (
select COLUMN_VALUE from TABLE(batch_num_tbl) -- 100 values
)
The execution plans look both reasonable and similar -- using the respective index. However while the BATCH_NUM lookup takes under 1 second to execute, RECORD_KEY lookup takes around 20 seconds.
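To see where the 20 seconds go, the actual run-time statistics can be collected for the slow statement. The sketch below uses the single-key example query from above so that it is self-contained; the same hint can be added to the real lookup. It assumes the session has access to the V$ views that dbms_xplan.display_cursor reads.

```sql
-- Run the statement with run-time statistics collection enabled.
select /*+ gather_plan_statistics */ RECORD_ID
from   RECORD
where  RECORD_KEY = 876543219;

-- Show the plan of the last statement executed in this session, with actual figures.
-- "Buffers" is logical block visits; "Reads" is physical reads (shown when any occurred).
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

Comparing the Buffers and Reads columns for the BATCH_NUM and RECORD_KEY lookups would show whether the difference really comes from one physical read per row.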
Options which I considered:
- Partitioning: I could use hash partitioning on RECORD_KEY. It would put the records a bit closer to each other, increasing the chance of them being in the same block, and enabling some partition-wise joins. It might help a bit but would not resolve the problem for me.
- Index-organized table: Since it requires a primary key, I would have to make the primary key RECORD_KEY + RECORD_ID. Also, the lookup on BATCH_NUM would then become slow.
- Re-inserting the rows: A rather ugly solution, where I would regularly select rows ordered by RECORD_KEY, delete them and insert them again in append mode, effectively putting them next to each other.
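For reference, the three options could be sketched roughly as follows. All table names with a suffix, the NUMBER datatypes, and the partition count of 64 are assumptions made for illustration; the real table has further data columns that are left out here. Hash partitioning also requires the Partitioning option to be licensed.

```sql
-- Option 1: hash partitioning on RECORD_KEY (64 partitions is an arbitrary choice).
create table RECORD_HASH (
  RECORD_ID   number not null,
  BATCH_NUM   number not null,
  RECORD_NUM  number not null,
  RECORD_KEY  number not null
)
partition by hash (RECORD_KEY) partitions 64;

-- Option 2: index-organized table, physically ordered by RECORD_KEY, then RECORD_ID.
create table RECORD_IOT (
  RECORD_ID   number not null,
  BATCH_NUM   number not null,
  RECORD_NUM  number not null,
  RECORD_KEY  number not null,
  constraint PK_RECORD_IOT primary key (RECORD_KEY, RECORD_ID)
)
organization index;

-- Option 3 (a variant of re-inserting): build a copy whose rows are loaded in RECORD_KEY order.
create table RECORD_SORTED as
select * from RECORD order by RECORD_KEY;
```

The third statement orders only the rows that exist when it runs; batches loaded afterwards would again be stored in arrival order, which is why the reorganization would have to be repeated regularly.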
Please note I am not an experienced DBA, just a developer stuck with this non-trivial DB problem.
oracle index oracle-12c
How slow is "slow" when retrieving 3 physical blocks, and how "fast" do you expect it to be? What does the explain plan look like for the query on RECORD_KEY?
– mustaccio
Mar 30 '17 at 20:22
@rdfozz I edited the question to answer your comments.
– Blaf
Mar 31 '17 at 5:30
Do you have any other payload in your table? Given the above, I have the feeling this is pointless. Do you look up other things with the given natural or primary key?
– Grimaldi
Mar 31 '17 at 6:50
I am with @Grimaldi. There are many assumptions but few facts to analyze. Blaf, why don't you try this with a few million rows in a test DB and see how it pans out and whether your theory holds? Once you experiment, you can share the results, and then there could be an even more constructive discussion.
– Raj
Mar 31 '17 at 11:56
Question asked Mar 30 '17 at 20:19 by Blaf; edited Mar 31 '17 at 5:28.
2 Answers
Create an index on (record_key, record_id). This way Oracle will most likely answer your slow query from the index alone, without visiting the table. You can add more columns to that index so that it also returns the content you need. Hopefully you don't need all columns.
Be careful about the impact this index has on your load performance.
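As a minimal sketch of this suggestion, using the table and column names from the question (the index name is an assumption):

```sql
-- Covering index for the example lookup: the query selects only RECORD_ID and
-- filters on RECORD_KEY, so both columns can be read from the index itself.
create index RECORD_KEY_COV_IX on RECORD (RECORD_KEY, RECORD_ID);

-- The slow example query, which can now be answered by an index range scan alone.
select RECORD_ID from RECORD where RECORD_KEY = 876543219;

-- If further columns are needed, they would be appended to the index instead, e.g.
-- (RECORD_KEY, RECORD_ID, PAYLOAD_COL), where PAYLOAD_COL is a hypothetical column name.
```

Entries with the same RECORD_KEY sit next to each other in the index leaf blocks, so ten matching rows would typically be found in one or two leaf blocks rather than ten table blocks.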
You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.
– Blaf
Mar 31 '17 at 7:38
Fixed that error. I also added that you can add more columns.
– Grimaldi
Mar 31 '17 at 7:55
If you find that a simple covering index (which will solve all your query performance problems immediately) has an unacceptable impact on INSERT performance, you can introduce an artificial 'clustering' column to the table:
create table foo(
record_id integer not null constraint u_foo unique
, record_key integer not null
, record_id_grp integer not null -- artificial clustering column
, bar varchar(100) default lpad('A',10,'A')
, constraint pk_foo primary key(record_id_grp,record_key,record_id)
, check(record_id_grp=trunc(record_id/10000)) ) organization index;
create trigger trg_foo
before insert on foo
for each row
begin
:new.record_id_grp := trunc(:new.record_id/10000);
end;
/
insert into foo(record_id, record_key)
select level, floor(dbms_random.value(1,1000))
from dual
connect by level<=100000;
select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*)
from foo
where record_key=10;
| COUNT(*) |
| -------: |
| 106 |
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
| PLAN_TABLE_OUTPUT |
| :------------------------------------------------------------------------------------ |
| SQL_ID b1za5v8u2ccxj, child number 0 |
| ------------------------------------- |
| select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*) from |
| foo where record_key=10 |
| |
| Plan hash value: 2217311945 |
| |
| ------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | |
| ------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 21 | |
| | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 21 | |
| |* 2 | INDEX SKIP SCAN| PK_FOO | 1 | 100 | 106 |00:00:00.01 | 21 | |
| ------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 2 - access("RECORD_KEY"=10) |
| filter("RECORD_KEY"=10) |
| |
dbfiddle here
Notice that only 21 blocks are touched even though about 100 rows are returned. The degree of clustering defined by trunc(record_id/X) can be tuned to balance the speed of query on RECORD_KEY with the speed of INSERTS — you can try different values for the divisor. You may be lucky and find you can simply use BATCH_NUM for clustering instead.
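When trying different divisors, it can help to look at how many distinct groups the leading column ends up with, since a skip scan conceptually probes the index once per distinct leading value. A small check against the demo table foo from above:

```sql
-- Number of clustering groups created by trunc(record_id/10000) in the demo data.
-- With record_id running from 1 to 100000 this gives groups 0 through 10, i.e. 11 groups.
select count(distinct record_id_grp) as grp_count,
       min(record_id_grp)            as min_grp,
       max(record_id_grp)            as max_grp
from   foo;
```

A larger divisor means fewer groups and fewer probes per lookup, but each insert then lands in a wider key range of the index; a smaller divisor does the opposite.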
Note that you don't need to use an IOT — you could instead have a covering index leading with RECORD_ID_GRP,RECORD_KEY (or BATCH_NUM,RECORD_KEY). With billions of rows I am making the assumption that doubling storage size will come with a cost — a single IOT to achieve clustering will be the solution that takes the least space.
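The non-IOT variant could look roughly like this. It is a sketch built on the demo above: foo_heap, u_foo_heap, trg_foo_heap and ix_foo_heap are made-up names, and the index includes bar only to show how a payload column would be covered.

```sql
-- Ordinary heap table with the same artificial clustering column.
create table foo_heap(
  record_id integer not null constraint u_foo_heap unique
, record_key integer not null
, record_id_grp integer not null -- artificial clustering column
, bar varchar(100) default lpad('A',10,'A')
, check(record_id_grp=trunc(record_id/10000)) );

-- Populate the clustering column automatically, as in the IOT example.
create trigger trg_foo_heap
before insert on foo_heap
for each row
begin
  :new.record_id_grp := trunc(:new.record_id/10000);
end;
/

-- Covering index leading with the clustering column, then the lookup key.
create index ix_foo_heap on foo_heap(record_id_grp, record_key, record_id, bar);

-- Lookup by RECORD_KEY via a skip scan on the covering index; no table access is needed.
select /*+ index_ss(foo_heap ix_foo_heap) */ record_id, bar
from foo_heap
where record_key=10;
```

The trade-off is the one described above: the heap table plus a covering index stores the covered columns twice, whereas the IOT stores them once.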
The first answer above (the index on record_key, record_id) was posted Mar 31 '17 at 6:54 by Grimaldi and edited Mar 31 '17 at 7:54.
You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.
– Blaf
Mar 31 '17 at 7:38
Fixed that error. Also added, that you can add more columns
– Grimaldi
Mar 31 '17 at 7:55
add a comment |
You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.
– Blaf
Mar 31 '17 at 7:38
Fixed that error. Also added, that you can add more columns
– Grimaldi
Mar 31 '17 at 7:55
You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.
– Blaf
Mar 31 '17 at 7:38
You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.
– Blaf
Mar 31 '17 at 7:38
Fixed that error. Also added, that you can add more columns
– Grimaldi
Mar 31 '17 at 7:55
Fixed that error. Also added, that you can add more columns
– Grimaldi
Mar 31 '17 at 7:55
add a comment |
If you find that a simple covering index (which will solve all your query performance problems immediately) has an unacceptable impact on INSERT performance, you can introduce an artificial 'clustering' column to the table:
create table foo(
record_id integer not null constraint u_foo unique
, record_key integer not null
, record_id_grp integer not null -- artificial clustering column
, bar varchar(100) default lpad('A',10,'A')
, constraint pk_foo primary key(record_id_grp,record_key,record_id)
, check(record_id_grp=trunc(record_id/10000)) ) organization index;
create trigger trg_foo
before insert on foo
for each row
begin
:new.record_id_grp := trunc(:new.record_id/10000);
end;
/
insert into foo(record_id, record_key)
select level, floor(dbms_random.value(1,1000))
from dual
connect by level<=100000;
select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*)
from foo
where record_key=10;
| COUNT(*) |
| -------: |
| 106 |
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
| PLAN_TABLE_OUTPUT |
| :------------------------------------------------------------------------------------ |
| SQL_ID b1za5v8u2ccxj, child number 0 |
| ------------------------------------- |
| select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*) from |
| foo where record_key=10 |
| |
| Plan hash value: 2217311945 |
| |
| ------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | |
| ------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 21 | |
| | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 21 | |
| |* 2 | INDEX SKIP SCAN| PK_FOO | 1 | 100 | 106 |00:00:00.01 | 21 | |
| ------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 2 - access("RECORD_KEY"=10) |
| filter("RECORD_KEY"=10) |
| |
dbfiddle here
Notice that only 21 blocks are touched even though about 100 rows are returned. The degree of clustering defined by trunc(record_id/X) can be tuned to balance the speed of query on RECORD_KEY with the speed of INSERTS — you can try different values for the divisor. You may be lucky and find you can simply use BATCH_NUM for clustering instead.
Note that you don't need to use an IOT — you could instead have a covering index leading with RECORD_ID_GRP,RECORD_KEY (or BATCH_NUM,RECORD_KEY). With billions of rows I am making the assumption that doubling storage size will come with a cost — a single IOT to achieve clustering will be the solution that takes the least space.
edited Mar 31 '17 at 8:48
answered Mar 31 '17 at 8:15
Jack Douglas♦
How slow is "slow" when retrieving 3 physical blocks, and how "fast" do you expect it to be? What does the explain plan look like for the query on RECORD_KEY?
– mustaccio
Mar 30 '17 at 20:22
@rdfozz I edited the question to answer your comments.
– Blaf
Mar 31 '17 at 5:30
Do you have any other payload in your table? Given the above, I have the feeling this is pointless. Do you look up other things with the given natural or primary key?
– Grimaldi
Mar 31 '17 at 6:50
I am with @Grimaldi. There are many assumptions but few facts to analyze. Blaf, why don't you try this with a few million rows in a test db and see how it pans out and whether your theory holds? Once you experiment, you can share the results and then there could be an even more constructive discussion.
– Raj
Mar 31 '17 at 11:56