Oracle: Physically reorganizing rows of a huge table by a different key

I have a huge table RECORD (a few billion rows) that is loaded continuously in many batches. I can assign a primary key RECORD_ID from a sequence, or use the natural composite key BATCH_NUM + RECORD_NUM. This gives me a nice, unfragmented table in which a batch of thousands of records spans only a few physical blocks, which is useful for the next, batch-aware step of processing.

But later I need to group together rows from arbitrary batches based on another field, RECORD_KEY. There are hundreds of millions of distinct RECORD_KEY values spread across the table, so picking 10 rows with the same RECORD_KEY will almost certainly require reading 10 different physical blocks. Because of the table size these blocks cannot easily be cached, so that means 10 physical reads (plus the overhead of the index traversal). This is obviously very slow.

Example:

Table RECORD

  • RECORD_ID is the primary key, generated from a sequence
  • BATCH_NUM and RECORD_NUM together form a unique constraint (and could be the primary key)
  • RECORD_KEY is an indexed column

Sample data:

| RECORD_ID | BATCH_NUM | RECORD_NUM | RECORD_KEY |
| --------: | --------: | ---------: | ---------: |
| 1 | 1 | 1 | 987654321 |
| 2 | 1 | 2 | 876543219 |
| 3 | 1 | 3 | 765432198 |
| 4 | 2 | 1 | 654321987 |
| 5 | 2 | 2 | 543219876 |
| ... | ... | ... | ... |
| 100000006 | 3000003 | 2 | 432198765 |
| 100000007 | 3000003 | 3 | 876543219 |
| 100000008 | 3000003 | 4 | 321987654 |
| ... | ... | ... | ... |
| 200000009 | 6000004 | 3 | 219876543 |
| 200000010 | 6000004 | 4 | 876543219 |
| 200000011 | 6000004 | 5 | 198765432 |
| ... | ... | ... | ... |

This SQL command will be fast because it only has to retrieve one physical block:

select RECORD_ID from RECORD where BATCH_NUM = 1

This SQL command will be slow because it has to retrieve three physical blocks, one for each retrieved row:

select RECORD_ID from RECORD where RECORD_KEY = 876543219
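
The "one block per row" assumption can be checked on real data by counting the distinct table blocks behind a given key. The query below is only a sketch: it assumes the standard DBMS_ROWID package is accessible to your user and that the table lives in a smallfile tablespace (for a bigfile tablespace both functions would need 'BIGFILE' as a second argument).

```sql
-- Count the rows for one RECORD_KEY and the distinct table blocks that hold them.
-- A (relative file number, block number) pair identifies a block within the tablespace.
select count(*) as row_count,
       count(distinct dbms_rowid.rowid_relative_fno(rowid) || '.' ||
                      dbms_rowid.rowid_block_number(rowid)) as block_count
from   RECORD
where  RECORD_KEY = 876543219;
```

If block_count comes out close to row_count, each matching row really does sit in its own block; a much smaller block_count would suggest the rows are already partly clustered.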


EDIT:

The above is just an example. Typically I would have:

  • thousands of rows per BATCH_NUM
  • tens of rows per RECORD_KEY
  • thousands of rows per lookup, by BATCH_NUM or RECORD_KEY

The queries look like this, both retrieving around 1000 rows:

select RECORD_ID from RECORD where BATCH_NUM = 123456

select RECORD_ID from RECORD where RECORD_KEY in (
  select COLUMN_VALUE from TABLE(batch_num_tbl) -- 100 values
)

Both execution plans look reasonable and similar, each using the respective index. However, while the BATCH_NUM lookup takes under 1 second to execute, the RECORD_KEY lookup takes around 20 seconds.
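
Two plans can look alike and still differ this much when the two indexes have very different clustering factors. One possible way to compare them is sketched below; it assumes optimizer statistics have been gathered on the table and its indexes, and it filters on the table name only because the index names are not given in the question.

```sql
-- Compare each index's clustering factor with the table's block and row counts.
select i.index_name,
       i.clustering_factor,
       t.blocks   as table_blocks,
       t.num_rows as table_rows
from   user_indexes i
join   user_tables  t on t.table_name = i.table_name
where  i.table_name = 'RECORD';
```

As a rule of thumb, a clustering factor close to table_blocks means the rows are stored roughly in index order, while a value close to table_rows means almost every index entry points to a different block. Around 20 seconds for around 1000 rows works out to about 20 ms per row, which would be consistent with roughly one physical read per row.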





Options which I considered:

  • Partitioning: I could use some kind of hash partitioning on RECORD_KEY. It would put the records a bit closer to each other, increasing the chance of them being in the same block, and enabling some partition-wise joins. It might help a bit but will not resolve the problem for me.

  • Index-organized table: Since it requires a primary key, I would have to make the primary key RECORD_KEY + RECORD_ID. Also, the lookup on BATCH_NUM would then become slow.

  • Re-inserting the rows: A rather ugly solution, where I would regularly select rows ordered by RECORD_KEY, delete them and insert them again with an append insert, effectively putting them next to each other.
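
For illustration only, the partitioning and re-inserting options might look roughly like this. RECORD_HASHED and RECORD_BY_KEY are hypothetical names, the column types and the partition count of 64 are assumptions, and neither statement has been sized for a table with billions of rows.

```sql
-- Partitioning option: hash partitioning on RECORD_KEY,
-- so rows with the same key always land in the same partition.
create table RECORD_HASHED (
    RECORD_ID   number not null
  , BATCH_NUM   number not null
  , RECORD_NUM  number not null
  , RECORD_KEY  number not null
)
partition by hash (RECORD_KEY) partitions 64;

-- Re-inserting option: build a copy of the table that is
-- physically ordered by RECORD_KEY at the time of the copy.
create table RECORD_BY_KEY as
select *
from   RECORD
order  by RECORD_KEY, RECORD_ID;
```

The ordered copy only reflects the data present when it is built; rows loaded afterwards would presumably arrive in batch order again, which is why the option above talks about re-inserting regularly.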


Please note I am not an experienced DBA, just a developer stuck with this non-trivial DB problem.

  • How slow is "slow" when retrieving 3 physical blocks, and how "fast" do you expect it to be? What does the explain plan look like for the query on RECORD_KEY?
    – mustaccio
    Mar 30 '17 at 20:22

  • @rdfozz I edited the question to answer your comments.
    – Blaf
    Mar 31 '17 at 5:30

  • Do you have any other payload in your table? With the table as shown above, I have the feeling this is pointless. Do you look up other things with the given natural or primary key?
    – Grimaldi
    Mar 31 '17 at 6:50

  • I am with @Grimaldi. There are many assumptions but few facts to analyze. Blaf, why don't you try this with a few million rows in a test db and see how it pans out and whether your theory holds? Once you have experimented, you can share the results and there could be an even more constructive discussion.
    – Raj
    Mar 31 '17 at 11:56

Tags: oracle, index, oracle-12c

Question asked Mar 30 '17 at 20:19 by Blaf, edited Mar 31 '17 at 5:28


2 Answers

Create an index on (record_key, record_id). This way Oracle will most likely be able to answer your slow-running query from the index alone, without visiting the table. You can add more columns to that index so that it also covers the content you need. Hopefully you don't need all columns.

Be careful about the impact this index has on your load performance.
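
As a minimal sketch of this suggestion against the example table from the question (the index name RECORD_KEY_ID_IX is made up for illustration):

```sql
-- Composite index that contains everything the slow query needs.
create index RECORD_KEY_ID_IX on RECORD (RECORD_KEY, RECORD_ID);

-- The lookup selects only RECORD_ID and filters only on RECORD_KEY,
-- so both columns are available in the index itself.
select RECORD_ID
from   RECORD
where  RECORD_KEY = 876543219;
```

If the optimizer picks this index, one would hope to see an index range scan with no table access step in the plan. Any further column the query selects would have to be appended to the index for that to stay true.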






answered Mar 31 '17 at 6:54 by Grimaldi, edited Mar 31 '17 at 7:54

  • You probably meant (RECORD_KEY, RECORD_ID): the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded from my example), so this is not suitable for me.
    – Blaf
    Mar 31 '17 at 7:38

  • Fixed that error. Also added that you can add more columns.
    – Grimaldi
    Mar 31 '17 at 7:55

If you find that a simple covering index (which will solve all your query performance problems immediately) has an unacceptable impact on INSERT performance, you can introduce an artificial 'clustering' column to the table:




create table foo(
    record_id      integer not null constraint u_foo unique
  , record_key     integer not null
  , record_id_grp  integer not null  -- artificial clustering column
  , bar            varchar(100) default lpad('A',10,'A')
  , constraint pk_foo primary key(record_id_grp, record_key, record_id)
  , check(record_id_grp = trunc(record_id/10000))
) organization index;

-- fill the clustering column automatically on insert
create trigger trg_foo
before insert on foo
for each row
begin
  :new.record_id_grp := trunc(:new.record_id/10000);
end;
/

-- load 100,000 test rows with random RECORD_KEY values in the range 1..999
insert into foo(record_id, record_key)
select level, floor(dbms_random.value(1,1000))
from dual
connect by level <= 100000;






select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*)
from foo
where record_key=10;



| COUNT(*) |
| -------: |
| 106 |






select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));



| PLAN_TABLE_OUTPUT |
| :------------------------------------------------------------------------------------ |
| SQL_ID b1za5v8u2ccxj, child number 0 |
| ------------------------------------- |
| select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*) from |
| foo where record_key=10 |
| |
| Plan hash value: 2217311945 |
| |
| ------------------------------------------------------------------------------------- |
| | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | |
| ------------------------------------------------------------------------------------- |
| | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 21 | |
| | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 21 | |
| |* 2 | INDEX SKIP SCAN| PK_FOO | 1 | 100 | 106 |00:00:00.01 | 21 | |
| ------------------------------------------------------------------------------------- |
| |
| Predicate Information (identified by operation id): |
| --------------------------------------------------- |
| |
| 2 - access("RECORD_KEY"=10) |
| filter("RECORD_KEY"=10) |
| |



dbfiddle here



Notice that only 21 blocks are touched even though about 100 rows are returned. The degree of clustering defined by trunc(record_id/X) can be tuned to balance the speed of queries on RECORD_KEY against the speed of INSERTs: you can try different values for the divisor. You may be lucky and find you can simply use BATCH_NUM for clustering instead.
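
To get a feel for what the divisor does, it may help to list how the matching rows split across the clustering groups. This sketch reuses the foo table from above; within each group, rows with the same RECORD_KEY are adjacent in the index-organized table.

```sql
-- One output row per clustering group that contains the key.
select record_id_grp,
       count(*) as matching_rows
from   foo
where  record_key = 10
group  by record_id_grp
order  by record_id_grp;
```

With record_id running from 1 to 100000 and a divisor of 10000 there are at most 11 groups (0 through 10). A larger divisor gives fewer, bigger groups and tighter clustering on RECORD_KEY; a smaller one keeps inserts closer to arrival order.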



Note that you don't need to use an IOT: you could instead have a covering index leading with RECORD_ID_GRP, RECORD_KEY (or BATCH_NUM, RECORD_KEY). With billions of rows, I am assuming that doubling the storage size would come at a real cost, so a single IOT is the clustering solution that takes the least space.
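
A rough sketch of that non-IOT variant, under the assumption that the table stays an ordinary heap table and the clustering column is still filled by a trigger like trg_foo (the names foo_heap, u_foo_heap and foo_heap_cov_ix are made up for illustration):

```sql
-- Ordinary heap table with the same columns as foo.
create table foo_heap (
    record_id      integer not null constraint u_foo_heap unique
  , record_key     integer not null
  , record_id_grp  integer not null  -- filled by a trigger like trg_foo
  , bar            varchar(100)
);

-- Covering index leading with the clustering column and RECORD_KEY;
-- it repeats every column of the table, hence the extra storage.
create index foo_heap_cov_ix
    on foo_heap (record_id_grp, record_key, record_id, bar);
```

Because the index carries every column, the data is effectively stored twice, which is the storage cost this answer weighs against the single IOT.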






share|improve this answer

























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f168688%2foracle-physically-reorganizing-rows-of-a-huge-table-by-a-different-key%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    Create an index on (record_key, record_id). This way oracle will most likely use the index only to answer your slow running query. You can add more columns to that index, so that you also get the needed content. Hopefully you don't need all columns.



    Be careful about, what impact this index has on your load performance.






    share|improve this answer


























    • You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.

      – Blaf
      Mar 31 '17 at 7:38











    • Fixed that error. Also added, that you can add more columns

      – Grimaldi
      Mar 31 '17 at 7:55
















    0














    Create an index on (record_key, record_id). This way oracle will most likely use the index only to answer your slow running query. You can add more columns to that index, so that you also get the needed content. Hopefully you don't need all columns.



    Be careful about, what impact this index has on your load performance.






    share|improve this answer


























    • You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.

      – Blaf
      Mar 31 '17 at 7:38











    • Fixed that error. Also added, that you can add more columns

      – Grimaldi
      Mar 31 '17 at 7:55














    0












    0








    0







    Create an index on (record_key, record_id). This way oracle will most likely use the index only to answer your slow running query. You can add more columns to that index, so that you also get the needed content. Hopefully you don't need all columns.



    Be careful about, what impact this index has on your load performance.






    share|improve this answer















    Create an index on (record_key, record_id). This way oracle will most likely use the index only to answer your slow running query. You can add more columns to that index, so that you also get the needed content. Hopefully you don't need all columns.



    Be careful about, what impact this index has on your load performance.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Mar 31 '17 at 7:54

























    answered Mar 31 '17 at 6:54









    GrimaldiGrimaldi

    33316




    33316













    • You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.

      – Blaf
      Mar 31 '17 at 7:38











    • Fixed that error. Also added, that you can add more columns

      – Grimaldi
      Mar 31 '17 at 7:55



















    • You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.

      – Blaf
      Mar 31 '17 at 7:38











    • Fixed that error. Also added, that you can add more columns

      – Grimaldi
      Mar 31 '17 at 7:55

















    You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.

    – Blaf
    Mar 31 '17 at 7:38





    You probably meant (RECORD_KEY, RECORD_ID) -- the slow query is on RECORD_KEY. It would make sense, but in the next step I will need to actually access some data columns from the record (excluded in my example), so this is not suitable for me.

    – Blaf
    Mar 31 '17 at 7:38













    Fixed that error. Also added, that you can add more columns

    – Grimaldi
    Mar 31 '17 at 7:55





    Fixed that error. Also added, that you can add more columns

    – Grimaldi
    Mar 31 '17 at 7:55













    0














    If you find that a simple covering index (which will solve all your query performance problems immediately) has an unacceptable impact on INSERT performance, you can introduce an artificial 'clustering' column to the table:




    create table foo(
      record_id      integer not null constraint u_foo unique
    , record_key     integer not null
    , record_id_grp  integer not null -- artificial clustering column
    , bar            varchar(100) default lpad('A',10,'A')
    , constraint pk_foo primary key(record_id_grp,record_key,record_id)
    , check(record_id_grp=trunc(record_id/10000))
    ) organization index;




    create trigger trg_foo
    before insert on foo
    for each row
    begin
      -- derive the clustering group from the id so the row satisfies the check constraint
      :new.record_id_grp := trunc(:new.record_id/10000);
    end;
    /




    insert into foo(record_id, record_key)
    select level, floor(dbms_random.value(1,1000))
    from dual
    connect by level<=100000;






    select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*)
    from foo
    where record_key=10;



    | COUNT(*) |
    | -------: |
    | 106 |






    select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));



    PLAN_TABLE_OUTPUT
    -------------------------------------------------------------------------------------
    SQL_ID  b1za5v8u2ccxj, child number 0
    -------------------------------------
    select /*+ gather_plan_statistics index_ss(foo pk_foo) */ count(*) from
    foo where record_key=10

    Plan hash value: 2217311945

    -------------------------------------------------------------------------------------
    | Id  | Operation        | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |        |      1 |        |      1 |00:00:00.01 |      21 |
    |   1 |  SORT AGGREGATE  |        |      1 |      1 |      1 |00:00:00.01 |      21 |
    |*  2 |   INDEX SKIP SCAN| PK_FOO |      1 |    100 |    106 |00:00:00.01 |      21 |
    -------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - access("RECORD_KEY"=10)
           filter("RECORD_KEY"=10)



    dbfiddle here



    Notice that only 21 blocks are touched even though 106 rows are returned. The degree of clustering, set by the divisor X in trunc(record_id/X), can be tuned to balance the speed of queries on RECORD_KEY against the speed of INSERTs; try different values for the divisor. You may be lucky and find that you can simply use BATCH_NUM for clustering instead.
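One rough way to compare divisors before rebuilding anything is to count how many clustering groups each candidate would produce on the existing data. This is only a sketch: the candidate values 1000, 10000 and 100000 and the aliases grp_count and avg_rows_per_grp are arbitrary choices for illustration, and the query runs against the example table foo from above.

```sql
-- Compare candidate divisors against the rows already in foo.
-- The candidate values are arbitrary examples, not recommendations.
with candidates as (
  select 1000 as divisor from dual union all
  select 10000 from dual union all
  select 100000 from dual
)
select c.divisor
     , count(distinct trunc(f.record_id/c.divisor)) as grp_count        -- distinct leading values of the key
     , round(count(*) / count(distinct trunc(f.record_id/c.divisor)), 1) as avg_rows_per_grp
from foo f
cross join candidates c
group by c.divisor
order by c.divisor;
```

A skip scan has to probe once per distinct leading value, so a lower grp_count generally favours the query on RECORD_KEY, while a higher one keeps new rows confined to a narrower part of the index. The actual effect on buffers and INSERT speed still has to be measured on real data.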



    Note that you don't need to use an IOT: you could instead keep a heap table and add a covering index leading with RECORD_ID_GRP,RECORD_KEY (or BATCH_NUM,RECORD_KEY). With billions of rows, I am assuming that doubling the storage size will come at a cost, so a single IOT that achieves the clustering will be the solution that takes the least space.
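The covering-index alternative could look roughly like the sketch below. It is an assumption-laden variant of the example above, not something that was tested in the fiddle: the names foo_heap, u_foo_heap, ck_foo_heap_grp, ix_foo_heap_cover and trg_foo_heap are made up for illustration, and the divisor 10000 is simply carried over.

```sql
-- Hypothetical heap-table variant; all object names here are illustrative.
create table foo_heap(
  record_id      integer not null constraint u_foo_heap unique
, record_key     integer not null
, record_id_grp  integer not null -- same artificial clustering column
, bar            varchar(100) default lpad('A',10,'A')
, constraint ck_foo_heap_grp check(record_id_grp=trunc(record_id/10000))
);

-- Covering index: leads with the clustering column, then the search key,
-- and carries the remaining columns so the query need not visit the table.
create index ix_foo_heap_cover
  on foo_heap(record_id_grp, record_key, record_id, bar);

create trigger trg_foo_heap
before insert on foo_heap
for each row
begin
  :new.record_id_grp := trunc(:new.record_id/10000);
end;
/

-- Same access pattern as before, hinted towards a skip scan of the covering index.
select /*+ index_ss(foo_heap ix_foo_heap_cover) */ record_id, bar
from foo_heap
where record_key=10;
```

Every column is stored twice in this variant (once in the table, once in the index), which is the storage cost the IOT avoids. Whether the optimizer actually chooses the skip scan should be checked with dbms_xplan as in the example above.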






        edited Mar 31 '17 at 8:48

























        answered Mar 31 '17 at 8:15









        Jack Douglas





























