PostgreSQL 11: how to partition 1000 tenants Announcing the arrival of Valued Associate #679:...

What was the first language to use conditional keywords?

What does it mean that physics no longer uses mechanical models to describe phenomena?

What initially awakened the Balrog?

Find 108 by using 3,4,6

As a beginner, should I get a Squier Strat with a SSS config or a HSS?

Does the Weapon Master feat grant you a fighting style?

A term for a woman complaining about things/begging in a cute/childish way

How do I use the new nonlinear finite element in Mathematica 12 for this equation?

Generate an RGB colour grid

Disembodied hand growing fangs

Why is it faster to reheat something than it is to cook it?

When a candle burns, why does the top of wick glow if bottom of flame is hottest?

Multiple OR (||) Conditions in If Statement

Is it possible for SQL statements to execute concurrently within a single session in SQL Server?

How does light 'choose' between wave and particle behaviour?

How to tell that you are a giant?

Dating a Former Employee

ArcGIS Pro Python arcpy.CreatePersonalGDB_management

How do I make this wiring inside cabinet safer?

How to install press fit bottom bracket into new frame

How often does castling occur in grandmaster games?

Should I follow up with an employee I believe overracted to a mistake I made?

Hangman Game with C++

How come Sam didn't become Lord of Horn Hill?



PostgreSQL 11: how to partition 1000 tenants



Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 00:00UTC (8:00pm US/Eastern)PostgreSQL partition hierarchy design for centralized loggingHuge database logging of event type rows and ways to optimize itPostgres - Partitioning old tables, partition query planning, optimisationPostgreSQL partition pruning?Slow queries on billions-rows-table // index usedMultiple databases with medium tables or Single database with huge tablesHow can I implement a sequence for each foreign key value?Scalable query for running counts of events within x previous daysPostgreSQL ntile() partitionWhat are the downsides of using PARTITION for multi-tenant access?





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







0















Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.










share|improve this question

























  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    20 hours ago













  • It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.

    – Alex Plugaru
    20 hours ago











  • Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.

    – Patrick Mevzek
    19 hours ago











  • The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.

    – Patrick Mevzek
    19 hours ago











  • Sorry, it's Postgres 11.

    – Alex Plugaru
    3 mins ago


















0















Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.










share|improve this question

























  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    20 hours ago













  • It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.

    – Alex Plugaru
    20 hours ago











  • Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.

    – Patrick Mevzek
    19 hours ago











  • The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.

    – Patrick Mevzek
    19 hours ago











  • Sorry, it's Postgres 11.

    – Alex Plugaru
    3 mins ago














0












0








0








Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.










share|improve this question
















Setup



We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.



Problem



1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?



More details



Currently, the separation between our tenants is via an account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.



Here are a few facts about the event table:




  • It contains about 300M rows + a few of composite/partial indexes.

  • The count of events by account_id is very uneven, 5% accounts have 50% of the data.

  • There is a timestamp field and a few others (JSONB, author_id, etc..)

  • Write operations: inserts mostly and deletes (per account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.

  • Selects are either for a specific event (by account_id + id) or all events in a given period of time. Period of time is not always set. account_id is always present in the query.


Possible solutions



Partionining by account_id:



Pros:




  • Deletes will be fast because of DROP TABLE.

  • Queries should also be pretty decent since all queries contains WHERE account_id=123


Cons:




  • 1000 partitions too much for Postgres?

  • Uneven distribution of events per account creating a few super big partitions and a few small ones.


Partitioning by timestamp:



Pros:




  • Recent data is usually mostly accessed and will make queries with a timestamp faster.

  • More predictable/even distribution of events.


Cons:




  • Deletion of a single account might touch a lot of partitions - not a big concern.

  • Always needs to contain a filter by timestamp - which is not always possible.







postgresql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 3 mins ago







Alex Plugaru

















asked 20 hours ago









Alex PlugaruAlex Plugaru

1163




1163













  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    20 hours ago













  • It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.

    – Alex Plugaru
    20 hours ago











  • Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.

    – Patrick Mevzek
    19 hours ago











  • The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.

    – Patrick Mevzek
    19 hours ago











  • Sorry, it's Postgres 11.

    – Alex Plugaru
    3 mins ago



















  • Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

    – Patrick Mevzek
    20 hours ago













  • It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.

    – Alex Plugaru
    20 hours ago











  • Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.

    – Patrick Mevzek
    19 hours ago











  • The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.

    – Patrick Mevzek
    19 hours ago











  • Sorry, it's Postgres 11.

    – Alex Plugaru
    3 mins ago

















Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

– Patrick Mevzek
20 hours ago







Did you think about using PostgreSQL schemas? They should exactly fit the purpose you need: postgresql.org/docs/11/ddl-schemas.html. "Schemas are analogous to directories at the operating system level, except that schemas cannot be nested."

– Patrick Mevzek
20 hours ago















It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.

– Alex Plugaru
20 hours ago





It’s too late now. We explored it at the beginning but we dropped the idea because it adds extra work for migrations and maintaince. I’m also not sure how it affects performance. We also don’t need to split everything, an tenant id is much cheaper than having schémas for every tenant.

– Alex Plugaru
20 hours ago













Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.

– Patrick Mevzek
19 hours ago





Maybe, but they allow clear logical separation of data inside the same physical database. I do not think they change anything much for migrations and maintenance and they can not badly impact performance. As a bonus you are able to better protect data and make sure stuff from customers X does not pollute stuff from customers Y, but obviously it depends how your application is written. As for "much cheaper" i do not know what that means, your own question is about if too many partitions due to account_id create a problem or not... Anyway that was just a suggestion in passing.

– Patrick Mevzek
19 hours ago













The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.

– Patrick Mevzek
19 hours ago





The partition you can do and their relevant performances or constraints depend on the PostgreSQL version you are using, but you are not saying which one. You may want to add that to your post.

– Patrick Mevzek
19 hours ago













Sorry, it's Postgres 11.

– Alex Plugaru
3 mins ago





Sorry, it's Postgres 11.

– Alex Plugaru
3 mins ago










0






active

oldest

votes












Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f235108%2fpostgresql-11-how-to-partition-1000-tenants%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f235108%2fpostgresql-11-how-to-partition-1000-tenants%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Parapolítica Índice Antecedentes El escándalo Proceso judicial Consecuencias Véase...

How to remove border from elements in the last row?Targeting flex items on the last rowHow to vertically wrap...

Tecnologías entrañables Índice Antecedentes Desarrollo Tecnologías Entrañables en la...