PostgreSQL 11: how to partition 1000 tenants Announcing the arrival of Valued Associate #679:...
What was the first language to use conditional keywords?
What does it mean that physics no longer uses mechanical models to describe phenomena?
What initially awakened the Balrog?
Find 108 by using 3,4,6
As a beginner, should I get a Squier Strat with a SSS config or a HSS?
Does the Weapon Master feat grant you a fighting style?
A term for a woman complaining about things/begging in a cute/childish way
How do I use the new nonlinear finite element in Mathematica 12 for this equation?
Generate an RGB colour grid
Disembodied hand growing fangs
Why is it faster to reheat something than it is to cook it?
When a candle burns, why does the top of wick glow if bottom of flame is hottest?
Multiple OR (||) Conditions in If Statement
Is it possible for SQL statements to execute concurrently within a single session in SQL Server?
How does light 'choose' between wave and particle behaviour?
How to tell that you are a giant?
Dating a Former Employee
ArcGIS Pro Python arcpy.CreatePersonalGDB_management
How do I make this wiring inside cabinet safer?
How to install press fit bottom bracket into new frame
How often does castling occur in grandmaster games?
Should I follow up with an employee I believe overracted to a mistake I made?
Hangman Game with C++
How come Sam didn't become Lord of Horn Hill?
PostgreSQL 11: how to partition 1000 tenants
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 00:00UTC (8:00pm US/Eastern)PostgreSQL partition hierarchy design for centralized loggingHuge database logging of event type rows and ways to optimize itPostgres - Partitioning old tables, partition query planning, optimisationPostgreSQL partition pruning?Slow queries on billions-rows-table // index usedMultiple databases with medium tables or Single database with huge tablesHow can I implement a sequence for each foreign key value?Scalable query for running counts of events within x previous daysPostgreSQL ntile() partitionWhat are the downsides of using PARTITION for multi-tenant access?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}
Setup
We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.
Problem
1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?
More details
Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.
Here are a few facts about the event table:
- It contains about 300M rows + a few of composite/partial indexes.
- The count of events by account_id is very uneven, 5% accounts have 50% of the data.
- There is a timestamp field and a few others (JSONB, author_id, etc..)
- Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.
- Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.
Possible solutions
Partionining by account_id:
Pros:
- Deletes will be fast because of DROP TABLE.
- Queries should also be pretty decent since all queries contains WHERE account_id=123
Cons:
- 1000 partitions too much for Postgres?
- Uneven distribution of events per account creating a few super big partitions and a few small ones.
Partitioning by timestamp:
Pros:
- Recent data is usually mostly accessed and will make queries with a timestamp faster.
- More predictable/even distribution of events.
Cons:
- Deletion of a single account might touch a lot of partitions - not a big concern.
- Always needs to contain a filter by timestamp - which is not always possible.
postgresql
add a comment |
Setup
We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.
Problem
1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?
More details
Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.
Here are a few facts about the event table:
- It contains about 300M rows + a few of composite/partial indexes.
- The count of events by account_id is very uneven, 5% accounts have 50% of the data.
- There is a timestamp field and a few others (JSONB, author_id, etc..)
- Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.
- Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.
Possible solutions
Partionining by account_id:
Pros:
- Deletes will be fast because of DROP TABLE.
- Queries should also be pretty decent since all queries contains WHERE account_id=123
Cons:
- 1000 partitions too much for Postgres?
- Uneven distribution of events per account creating a few super big partitions and a few small ones.
Partitioning by timestamp:
Pros:
- Recent data is usually mostly accessed and will make queries with a timestamp faster.
- More predictable/even distribution of events.
Cons:
- Deletion of a single account might touch a lot of partitions - not a big concern.
- Always needs to contain a filter by timestamp - which is not always possible.
postgresql
Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."
– Patrick Mevzek
20 hours ago
It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.
– Alex Plugaru
20 hours ago
Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.
– Patrick Mevzek
19 hours ago
The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.
– Patrick Mevzek
19 hours ago
Sorry, it's Postgres 11.
– Alex Plugaru
3 mins ago
add a comment |
Setup
We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.
Problem
1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?
More details
Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.
Here are a few facts about the event table:
- It contains about 300M rows + a few of composite/partial indexes.
- The count of events by account_id is very uneven, 5% accounts have 50% of the data.
- There is a timestamp field and a few others (JSONB, author_id, etc..)
- Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.
- Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.
Possible solutions
Partionining by account_id:
Pros:
- Deletes will be fast because of DROP TABLE.
- Queries should also be pretty decent since all queries contains WHERE account_id=123
Cons:
- 1000 partitions too much for Postgres?
- Uneven distribution of events per account creating a few super big partitions and a few small ones.
Partitioning by timestamp:
Pros:
- Recent data is usually mostly accessed and will make queries with a timestamp faster.
- More predictable/even distribution of events.
Cons:
- Deletion of a single account might touch a lot of partitions - not a big concern.
- Always needs to contain a filter by timestamp - which is not always possible.
postgresql
Setup
We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.
Problem
1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?
More details
Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.
Here are a few facts about the event table:
- It contains about 300M rows + a few of composite/partial indexes.
- The count of events by account_id is very uneven, 5% accounts have 50% of the data.
- There is a timestamp field and a few others (JSONB, author_id, etc..)
- Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.
- Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.
Possible solutions
Partionining by account_id:
Pros:
- Deletes will be fast because of DROP TABLE.
- Queries should also be pretty decent since all queries contains WHERE account_id=123
Cons:
- 1000 partitions too much for Postgres?
- Uneven distribution of events per account creating a few super big partitions and a few small ones.
Partitioning by timestamp:
Pros:
- Recent data is usually mostly accessed and will make queries with a timestamp faster.
- More predictable/even distribution of events.
Cons:
- Deletion of a single account might touch a lot of partitions - not a big concern.
- Always needs to contain a filter by timestamp - which is not always possible.
postgresql
postgresql
edited 3 mins ago
Alex Plugaru
asked 20 hours ago
Alex PlugaruAlex Plugaru
1163
1163
Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."
– Patrick Mevzek
20 hours ago
It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.
– Alex Plugaru
20 hours ago
Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.
– Patrick Mevzek
19 hours ago
The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.
– Patrick Mevzek
19 hours ago
Sorry, it's Postgres 11.
– Alex Plugaru
3 mins ago
add a comment |
Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."
– Patrick Mevzek
20 hours ago
It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.
– Alex Plugaru
20 hours ago
Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.
– Patrick Mevzek
19 hours ago
The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.
– Patrick Mevzek
19 hours ago
Sorry, it's Postgres 11.
– Alex Plugaru
3 mins ago
Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."
– Patrick Mevzek
20 hours ago
Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."
– Patrick Mevzek
20 hours ago
It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.
– Alex Plugaru
20 hours ago
It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.
– Alex Plugaru
20 hours ago
Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.
– Patrick Mevzek
19 hours ago
Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.
– Patrick Mevzek
19 hours ago
The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.
– Patrick Mevzek
19 hours ago
The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.
– Patrick Mevzek
19 hours ago
Sorry, it's Postgres 11.
– Alex Plugaru
3 mins ago
Sorry, it's Postgres 11.
– Alex Plugaru
3 mins ago
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f235108%2fpostgresql-11-how-to-partition-1000-tenants%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f235108%2fpostgresql-11-how-to-partition-1000-tenants%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."
– Patrick Mevzek
20 hours ago
It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.
– Alex Plugaru
20 hours ago
Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.
– Patrick Mevzek
19 hours ago
The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.
– Patrick Mevzek
19 hours ago
Sorry, it's Postgres 11.
– Alex Plugaru
3 mins ago