How to partition existing table in postgres?
I would like to partition a table with 1M+ rows by date range. How is this commonly done without requiring much downtime or risking losing data? Here are the strategies I am considering, but I am open to suggestions:
1. The existing table is the master and children inherit from it. Over time, move data from the master to the children, but there will be a period of time where some of the data is in the master table and some is in the children.
2. Create new master and child tables. Copy the data from the existing table into the child tables (so the data will reside in two places). Once the child tables have the most recent data, change all inserts going forward to point to the new master table and delete the existing table.
Tags: postgresql, optimization, partitioning
asked Jul 6 '15 at 1:26 by Evan Appleby (edited Jul 6 '15 at 2:15)
Here are my ideas. If the table has a datetime column: create a new master and new children, write new data to both NEW and OLD from a cutoff (e.g. datetime = 2015-07-06 00:00:00), copy the older rows from OLD to NEW based on the time column (where datetime < 2015-07-06 00:00:00), rename the tables, then switch inserts to NEW. Otherwise: create a "partition trigger" for insert/update on the master so that new data goes to the children, then update the master so that the trigger moves the existing data to the children.
– Luan Huynh
Jul 6 '15 at 2:06
@Innnh, so you are suggesting the second option, but then once the data is copied over, delete the old table and rename the new table to have the same name as the old table. Is that right?
– Evan Appleby
Jul 6 '15 at 2:18
Rename the new table to the old table's name, but keep the old table until the new partitioned flow is confirmed to work.
– Luan Huynh
Jul 6 '15 at 2:30
For just a few million rows I don't think partitioning is actually necessary. Why do you think you need it? What problem are you trying to solve?
– a_horse_with_no_name
Jul 6 '15 at 6:30
@EvanAppleby DELETE FROM ONLY master_table is the solution.
– dezso
Jul 6 '15 at 13:40
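One possible reading of the DELETE FROM ONLY comment, applied to strategy 1, is to move rows out of the parent and into a child in a single statement. This is only a sketch: master_table comes from the comment, while child_2015 and dt_created are borrowed from the accepted answer and are assumptions here. It also assumes the child was created with INHERITS so its columns line up with the parent's.

```sql
-- Sketch only: move one year's rows from the parent into its child.
-- ONLY restricts the DELETE to rows physically stored in the parent,
-- so rows already sitting in child tables are left alone.
BEGIN;

WITH moved AS (
    DELETE FROM ONLY master_table
    WHERE dt_created >= DATE '2015-01-01'
      AND dt_created <  DATE '2016-01-01'
    RETURNING *
)
INSERT INTO child_2015
SELECT * FROM moved;

COMMIT;
```

Because the delete and the insert are one statement, a row is never in both tables or in neither. On a busy table it may be gentler to move smaller date ranges per transaction.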
Answers
Since #1 requires copying data from the master to the children while the table is in active production use, I personally went with #2 (creating a new master). This avoids disrupting the original table while it is in use, and if anything goes wrong I can simply drop the new master and continue using the original table. Here are the steps to do it:
1. Create the new master table.
    CREATE TABLE new_master (
        id serial,
        counter integer,
        dt_created DATE DEFAULT CURRENT_DATE NOT NULL
    );
2. Create children that inherit from the master.
    CREATE TABLE child_2014 (
        CONSTRAINT pk_2014 PRIMARY KEY (id),
        CONSTRAINT ck_2014 CHECK ( dt_created < DATE '2015-01-01' )
    ) INHERITS (new_master);
    CREATE INDEX idx_2014 ON child_2014 (dt_created);
    CREATE TABLE child_2015 (
        CONSTRAINT pk_2015 PRIMARY KEY (id),
        CONSTRAINT ck_2015 CHECK ( dt_created >= DATE '2015-01-01' AND dt_created < DATE '2016-01-01' )
    ) INHERITS (new_master);
    CREATE INDEX idx_2015 ON child_2015 (dt_created);
    ...
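To confirm that the children are really attached to the new parent before copying anything, the system catalogs can be queried. A minimal sketch, assuming the table names from the steps above:

```sql
-- List every table that inherits directly from new_master.
SELECT c.relname AS child_table
FROM pg_inherits AS i
JOIN pg_class AS c ON c.oid = i.inhrelid
WHERE i.inhparent = 'new_master'::regclass
ORDER BY c.relname;
```

With the two children created above, this should list child_2014 and child_2015.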
3. Copy all historical data into the new structure (here, directly into child_2014).
    INSERT INTO child_2014 (id, counter, dt_created)
    SELECT id, counter, dt_created
    FROM old_master
    WHERE dt_created < DATE '2015-01-01';
4. Temporarily pause new inserts/updates to the production database.
5. Copy the most recent data into the new structure (here, directly into child_2015).
    INSERT INTO child_2015 (id, counter, dt_created)
    SELECT id, counter, dt_created
    FROM old_master
    WHERE dt_created >= DATE '2015-01-01' AND dt_created < DATE '2016-01-01';
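Before swapping the tables it seems worth checking that nothing was missed while writes are still paused. A small sanity check, assuming the table names used in the steps above:

```sql
-- Selecting from new_master (without ONLY) also counts rows in its children.
SELECT
    (SELECT count(*) FROM old_master)      AS old_rows,
    (SELECT count(*) FROM new_master)      AS new_rows,
    (SELECT count(*) FROM ONLY new_master) AS rows_left_in_parent;
```

If old_rows and new_rows differ, some rows fall outside the ranges that were copied (for example, dates on or after 2016-01-01). rows_left_in_parent should be 0, since every copy went straight into a child.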
6. Rename the tables so that new_master becomes the production table.
    ALTER TABLE old_master RENAME TO old_master_backup;
    ALTER TABLE new_master RENAME TO old_master;
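A possible variant of the rename step, not part of the original answer: PostgreSQL DDL is transactional, so both renames can be wrapped in one transaction and clients never see a moment without a table named old_master. The steps above also do not mention the sequence behind the new id column. The copied rows kept their original ids, so that sequence has not advanced and newly generated ids could repeat existing ones. The sketch below assumes id is the only sequence-backed column.

```sql
BEGIN;

ALTER TABLE old_master RENAME TO old_master_backup;
ALTER TABLE new_master RENAME TO old_master;

-- Advance the sequence owned by the (renamed) table's id column
-- to the highest id that was copied; max() here includes the children.
SELECT setval(
    pg_get_serial_sequence('old_master', 'id'),
    (SELECT COALESCE(max(id), 1) FROM old_master)
);

COMMIT;
```

If anything fails before COMMIT, the whole swap rolls back and the original table keeps its name.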
7. Add a function for INSERT statements on old_master so that each row gets passed to the correct partition.
    CREATE OR REPLACE FUNCTION fn_insert() RETURNS TRIGGER AS $$
    BEGIN
        IF ( NEW.dt_created >= DATE '2015-01-01' AND
             NEW.dt_created < DATE '2016-01-01' ) THEN
            INSERT INTO child_2015 VALUES (NEW.*);
        ELSIF ( NEW.dt_created < DATE '2015-01-01' ) THEN
            INSERT INTO child_2014 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'Date out of range';
        END IF;
        RETURN NULL;
    END;
    $$
    LANGUAGE plpgsql;
8. Add a trigger so that the function is called on INSERTs.
    CREATE TRIGGER tr_insert BEFORE INSERT ON old_master
    FOR EACH ROW EXECUTE PROCEDURE fn_insert();
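A quick way to check the routing before re-enabling writes is to insert a test row inside a transaction and roll it back. This is a sketch that assumes the sequence behind id already yields unused values; the test values are made up.

```sql
BEGIN;

-- The trigger returns NULL, so the client reports "INSERT 0 0"
-- even though the row was stored in a child table.
INSERT INTO old_master (counter, dt_created)
VALUES (1, DATE '2015-07-07');

-- tableoid shows which physical table each row lives in.
SELECT tableoid::regclass AS stored_in, id, counter, dt_created
FROM old_master
WHERE dt_created = DATE '2015-07-07'
ORDER BY id DESC
LIMIT 1;

-- The parent itself should stay empty.
SELECT count(*) AS rows_in_parent FROM ONLY old_master;

ROLLBACK;
```

The first SELECT should report child_2015 in stored_in, and rows_in_parent should be 0.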
9. Set constraint exclusion to ON.
    SET constraint_exclusion = on;
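Whether the planner actually skips irrelevant children can be checked with EXPLAIN. A minimal sketch using the table from the steps above; note that SET only affects the current session, and that the default setting, partition, already applies constraint exclusion to inheritance children.

```sql
SHOW constraint_exclusion;

-- With exclusion working, the plan should scan the (empty) parent and
-- child_2015, and should not mention child_2014.
EXPLAIN
SELECT *
FROM old_master
WHERE dt_created >= DATE '2015-03-01'
  AND dt_created <  DATE '2015-04-01';
```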
10. Re-enable UPDATEs and INSERTs on the production database.
11. Set up a trigger or cron job so that new partitions get created and the function gets updated to assign new data to the correct partition. Reference this article for code examples.
12. Delete old_master_backup.
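For the step about adding new partitions, the yearly maintenance might look like the following. This is a sketch that follows the naming pattern of the earlier steps; child_2016 and its constraint and index names are assumptions, and a cron job would have to run something like this before the first 2016 row arrives.

```sql
BEGIN;

-- New child for 2016, attached to the (renamed) parent.
CREATE TABLE child_2016 (
    CONSTRAINT pk_2016 PRIMARY KEY (id),
    CONSTRAINT ck_2016 CHECK ( dt_created >= DATE '2016-01-01'
                           AND dt_created <  DATE '2017-01-01' )
) INHERITS (old_master);
CREATE INDEX idx_2016 ON child_2016 (dt_created);

-- Same routing function as before, with one more branch.
CREATE OR REPLACE FUNCTION fn_insert() RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.dt_created >= DATE '2016-01-01' AND
         NEW.dt_created <  DATE '2017-01-01' ) THEN
        INSERT INTO child_2016 VALUES (NEW.*);
    ELSIF ( NEW.dt_created >= DATE '2015-01-01' AND
            NEW.dt_created <  DATE '2016-01-01' ) THEN
        INSERT INTO child_2015 VALUES (NEW.*);
    ELSIF ( NEW.dt_created < DATE '2015-01-01' ) THEN
        INSERT INTO child_2014 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

COMMIT;
```

The existing trigger keeps working because CREATE OR REPLACE swaps the function body in place. Putting the newest year first keeps the common case on the first branch.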
answered Jul 7 '15 at 14:07 by Evan Appleby
Nice writeup. It would be interesting to know whether that actually makes your queries faster. 10 million rows still isn't so many that I would think about partitioning. I wonder if your degrading performance was maybe caused by vacuum not catching up or being prevented due to "idle in transaction" sessions.
– a_horse_with_no_name
Jul 7 '15 at 14:17
@a_horse_with_no_name, so far it hasn't made the queries significantly better :( I use Heroku, which has auto-vacuum settings, and it appears to happen daily for this large table. Will look more into that tho.
– Evan Appleby
Jul 7 '15 at 14:38
Shouldn't the inserts in steps 3 and 5 be to table new_master and let postgresql choose the right child table/partition?
– pakman
Jun 20 '17 at 17:14
@pakman the function to assign the right child doesn't get added until step 7
– Evan Appleby
Jun 20 '17 at 19:52
There is a new tool called pg_pathman (https://github.com/postgrespro/pg_pathman) that would do this for you automatically.
So something like the following would do it.
SELECT create_range_partitions('master', 'dt_created',
'2015-01-01'::date, '1 day'::interval);
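Where the extension can't be installed, the same daily layout can be built by hand with plain table inheritance. Below is a minimal sketch that only prints the DDL. The helper name and the master_YYYY_MM_DD naming scheme are my own assumptions for illustration, not something pg_pathman produces, and the routing trigger still has to be added separately.

```python
from datetime import date, timedelta


def daily_partition_ddl(master: str, column: str, start: date, days: int) -> list[str]:
    """Build one CREATE TABLE ... INHERITS statement per day (hypothetical helper)."""
    statements = []
    for offset in range(days):
        lower = start + timedelta(days=offset)
        upper = lower + timedelta(days=1)
        # Assumed naming scheme: <master>_YYYY_MM_DD
        child = f"{master}_{lower:%Y_%m_%d}"
        statements.append(
            f"CREATE TABLE {child} ("
            f"CHECK ({column} >= DATE '{lower}' AND {column} < DATE '{upper}')"
            f") INHERITS ({master});"
        )
    return statements


# Table and column names are taken from the pg_pathman call above.
ddl = daily_partition_ddl("master", "dt_created", date(2015, 1, 1), 3)
for statement in ddl:
    print(statement)
```

The half-open range (>= lower, < upper) keeps adjacent children from overlapping, which is what lets constraint exclusion skip irrelevant partitions.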
edited May 10 '16 at 20:59
– RLF
answered May 10 '16 at 20:00
– kakoni
Here are my ideas.
If the table has a datetime column: create a new master plus new child tables -> insert new data into both NEW and OLD (e.g. from datetime = 2015-07-06 00:00:00) -> copy from OLD to NEW based on the time column (where datetime < 2015-07-06 00:00:00) -> rename the tables -> switch inserts to NEW only.
Otherwise: create a "partition trigger" for insert/update on the master, so new data is routed to the child tables -> update the master, and the trigger will move the existing data to the child tables.
– Luan Huynh
Jul 6 '15 at 2:06
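To make the first flow concrete (dual-write from a cutoff, backfill the older rows, then swap names), here is a rough sketch. It uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere. In Postgres the new table would be the partitioned master with its children, and the table names events, events_new and events_old are hypothetical.

```python
import sqlite3

CUTOFF = "2015-07-06 00:00:00"  # cutoff from the comment above

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (id INTEGER PRIMARY KEY, dt_created TEXT NOT NULL);
    CREATE TABLE events_new (id INTEGER PRIMARY KEY, dt_created TEXT NOT NULL);
    -- Rows that already existed before the migration started.
    INSERT INTO events VALUES (1, '2015-07-04 10:00:00'), (2, '2015-07-05 23:59:59');
    """
)


def dual_write(row_id: int, created: str) -> None:
    """Step 1: from the cutoff on, the application writes to both tables."""
    for table in ("events", "events_new"):
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (row_id, created))


dual_write(3, "2015-07-06 00:00:00")
dual_write(4, "2015-07-06 08:30:00")

# Step 2: backfill only rows older than the cutoff, so nothing is copied twice.
conn.execute(
    "INSERT INTO events_new SELECT * FROM events WHERE dt_created < ?", (CUTOFF,)
)

# Step 3: swap the names but keep the old table around as a fallback.
conn.executescript(
    """
    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_new RENAME TO events;
    """
)

new_ids = [row[0] for row in conn.execute("SELECT id FROM events ORDER BY id")]
old_ids = [row[0] for row in conn.execute("SELECT id FROM events_old ORDER BY id")]
print(new_ids, old_ids)
```

The strict "<" in the backfill matters: rows at or after the cutoff are already in the new table through the dual write.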
@Innnh, so you are suggesting the second option, but then once the data is copied over, delete the old table and rename the new table to have the same name as the old table. Is that right?
– Evan Appleby
Jul 6 '15 at 2:18
Rename the new table to the old table's name, but keep the old table until the new partitioned flow is confirmed to work.
– Luan Huynh
Jul 6 '15 at 2:30
For just a few million rows I don't think partitioning is actually necessary. Why do you think you need it? What problem are you trying to solve?
– a_horse_with_no_name
Jul 6 '15 at 6:30
@EvanAppleby DELETE FROM ONLY master_table is the solution.
– dezso
Jul 6 '15 at 13:40
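A sketch of how that could look in practice: ONLY limits the DELETE to rows stored in the parent itself, so rows already in child tables are left alone, and a data-modifying CTE (Postgres 9.1+) can hand the deleted rows straight to the child. The helper, the child table name and the dt_created column are assumptions for illustration. The snippet just prints the statement.

```python
def move_rows_sql(master: str, child: str, column: str, lower: str, upper: str) -> str:
    """Build a statement that moves one range of rows from the parent into a child.

    Hypothetical helper: values are interpolated directly, so only pass trusted input.
    """
    return (
        f"WITH moved AS (\n"
        f"    DELETE FROM ONLY {master}\n"
        f"    WHERE {column} >= '{lower}' AND {column} < '{upper}'\n"
        f"    RETURNING *\n"
        f")\n"
        f"INSERT INTO {child} SELECT * FROM moved;"
    )


sql = move_rows_sql(
    "master_table", "master_table_2015_07_06", "dt_created", "2015-07-06", "2015-07-07"
)
print(sql)
```

Running it one range at a time keeps each transaction small, which is usually kinder to a table that is still taking writes.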