Choosing sort and distkeys for redshift with multiple joins The 2019 Stack Overflow Developer...

How to create dashed lines/arrows in Illustrator

What is the steepest angle that a canal can be traversable without locks?

Where to refill my bottle in India?

Why is Grand Jury testimony secret?

How was Skylab's orbit inclination chosen?

How can I fix this gap between bookcases I made?

Protecting Dualbooting Windows from dangerous code (like rm -rf)

What is the motivation for a law requiring 2 parties to consent for recording a conversation

What could be the right powersource for 15 seconds lifespan disposable giant chainsaw?

Does a dangling wire really electrocute me if I'm standing in water?

What can other administrators access on my machine?

Can't find the latex code for the ⍎ (down tack jot) symbol

Patience, young "Padovan"

Could JWST stay at L2 "forever"?

What does Linus Torvalds mean when he says that Git "never ever" tracks a file?

Is flight data recorder erased after every flight?

I see my dog run

In microwave frequencies, do you use a circulator when you need a (near) perfect diode?

Why can Shazam do this?

Access elements in std::string where positon of string is greater than its size

JSON.serialize: is it possible to suppress null values of a map?

Why is my p-value correlated to difference between means in two sample tests?

Does duplicating a spell with Wish count as casting that spell?

How to change the limits of integration

Choosing sort and distkeys for redshift with multiple joins

The 2019 Stack Overflow Developer Survey Results Are InManaging code and deployments to Amazon RedshiftAmazon Redshift “cache lookup failed for aggregate” error when using median aggregateRedshift table not showing up in tables for schema?Issue using COPY command with AWS RedshiftStorage size for varchar length in RedshiftRedshift: InternalError: cache lookup failed for relationredshift drop table with + and spaces in nameRedshift window function for trend % calculationRedshift DROP and TRUNCATE hang with no locksBest Distkey and Sortkey for the tables in Redshift

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}

I have a relatively large Redshift cluster which I am attempting to optimize via the selection of sort and dist keys. I have a recurring situation where multiple tables are joined together via an intermediary join. The join is relatively stable among all queries. I know that you're supposed to distribute on the field that you join to and sort on the conditions in the where clause. However, in this situation there is no where clause and it is joined twice. For example I have three tables:

users -> accounts -> subscriptions. These are in a 1:1:1 relationship. The joins are as follows:

users.id = accounts.user_id
accounts.id = subscriptions.accounts_id.

What is the optimal way to set the sort and dist keys on the accounts table in this situation?

asked May 9 '17 at 17:43

user2694306

10117

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

If they are 1:1:1, why not denormalize them into one table?

– CalZ
May 9 '17 at 17:48

I would love to, but business restraints require that they stay that way.

– user2694306
May 9 '17 at 18:53

(Sometimes is a matter of privileges: Who can see what. It's normally easier to handle permissions per table than to do it per column)

– joanolo
Jul 4 '17 at 23:13

add a comment |

users -> accounts -> subscriptions. These are in a 1:1:1 relationship. The joins are as follows:

users.id = accounts.user_id
accounts.id = subscriptions.accounts_id.

What is the optimal way to set the sort and dist keys on the accounts table in this situation?

asked May 9 '17 at 17:43

user2694306

10117

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

If they are 1:1:1, why not denormalize them into one table?

– CalZ
May 9 '17 at 17:48

I would love to, but business restraints require that they stay that way.

– user2694306
May 9 '17 at 18:53

(Sometimes is a matter of privileges: Who can see what. It's normally easier to handle permissions per table than to do it per column)

– joanolo
Jul 4 '17 at 23:13

add a comment |

users -> accounts -> subscriptions. These are in a 1:1:1 relationship. The joins are as follows:

users.id = accounts.user_id
accounts.id = subscriptions.accounts_id.

What is the optimal way to set the sort and dist keys on the accounts table in this situation?

asked May 9 '17 at 17:43

user2694306

10117

users -> accounts -> subscriptions. These are in a 1:1:1 relationship. The joins are as follows:

users.id = accounts.user_id
accounts.id = subscriptions.accounts_id.

What is the optimal way to set the sort and dist keys on the accounts table in this situation?

redshift

asked May 9 '17 at 17:43

user2694306

10117

asked May 9 '17 at 17:43

user2694306

10117

asked May 9 '17 at 17:43

user2694306

10117

asked May 9 '17 at 17:43

user2694306

10117

asked May 9 '17 at 17:43

user2694306

10117

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

If they are 1:1:1, why not denormalize them into one table?

– CalZ
May 9 '17 at 17:48

I would love to, but business restraints require that they stay that way.

– user2694306
May 9 '17 at 18:53

(Sometimes is a matter of privileges: Who can see what. It's normally easier to handle permissions per table than to do it per column)

– joanolo
Jul 4 '17 at 23:13

add a comment |

If they are 1:1:1, why not denormalize them into one table?

– CalZ
May 9 '17 at 17:48

I would love to, but business restraints require that they stay that way.

– user2694306
May 9 '17 at 18:53

(Sometimes is a matter of privileges: Who can see what. It's normally easier to handle permissions per table than to do it per column)

– joanolo
Jul 4 '17 at 23:13

If they are 1:1:1, why not denormalize them into one table?

– CalZ
May 9 '17 at 17:48

I would love to, but business restraints require that they stay that way.

– user2694306
May 9 '17 at 18:53

(Sometimes is a matter of privileges: Who can see what. It's normally easier to handle permissions per table than to do it per column)

– joanolo
Jul 4 '17 at 23:13

add a comment |

1 Answer
1

active

oldest

votes

The ideal situation is if you can have a user_id column added onto subscriptions, then you can distribute all three by user_id. Then your joins would be written as:

FROM subscriptions

LEFT JOIN accounts

  ON  accounts.id = subscriptions.account_id

  AND accounts.user_id = subscriptions.user_id --Note this add'l condition

LEFT JOIN users

  ON  users.id=accounts.user_id

The one caveat here is that your number of users should be large enough to ensure that this distribution key does not result in significant skew.

As a bonus, if you do not frequently filter these, you can choose the same column (user_id) as your sort key to enable merge joins rather than hash joins

If you can't add this column, then you are going to have network distribution happening in one of your two joins :-( Pick the one that you want based on frequency of that join and average width of rows participating in that join.

edited Jul 4 '17 at 23:12

joanolo

9,92842254

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f173115%2fchoosing-sort-and-distkeys-for-redshift-with-multiple-joins%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The ideal situation is if you can have a user_id column added onto subscriptions, then you can distribute all three by user_id. Then your joins would be written as:

FROM subscriptions

LEFT JOIN accounts

  ON  accounts.id = subscriptions.account_id

  AND accounts.user_id = subscriptions.user_id --Note this add'l condition

LEFT JOIN users

  ON  users.id=accounts.user_id

The one caveat here is that your number of users should be large enough to ensure that this distribution key does not result in significant skew.

As a bonus, if you do not frequently filter these, you can choose the same column (user_id) as your sort key to enable merge joins rather than hash joins

edited Jul 4 '17 at 23:12

joanolo

9,92842254

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

add a comment |

The ideal situation is if you can have a user_id column added onto subscriptions, then you can distribute all three by user_id. Then your joins would be written as:

FROM subscriptions

LEFT JOIN accounts

  ON  accounts.id = subscriptions.account_id

  AND accounts.user_id = subscriptions.user_id --Note this add'l condition

LEFT JOIN users

  ON  users.id=accounts.user_id

The one caveat here is that your number of users should be large enough to ensure that this distribution key does not result in significant skew.

As a bonus, if you do not frequently filter these, you can choose the same column (user_id) as your sort key to enable merge joins rather than hash joins

edited Jul 4 '17 at 23:12

joanolo

9,92842254

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

add a comment |

The ideal situation is if you can have a user_id column added onto subscriptions, then you can distribute all three by user_id. Then your joins would be written as:

FROM subscriptions

LEFT JOIN accounts

  ON  accounts.id = subscriptions.account_id

  AND accounts.user_id = subscriptions.user_id --Note this add'l condition

LEFT JOIN users

  ON  users.id=accounts.user_id

The one caveat here is that your number of users should be large enough to ensure that this distribution key does not result in significant skew.

As a bonus, if you do not frequently filter these, you can choose the same column (user_id) as your sort key to enable merge joins rather than hash joins

edited Jul 4 '17 at 23:12

joanolo

9,92842254

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

The ideal situation is if you can have a user_id column added onto subscriptions, then you can distribute all three by user_id. Then your joins would be written as:

FROM subscriptions

LEFT JOIN accounts

  ON  accounts.id = subscriptions.account_id

  AND accounts.user_id = subscriptions.user_id --Note this add'l condition

LEFT JOIN users

  ON  users.id=accounts.user_id

The one caveat here is that your number of users should be large enough to ensure that this distribution key does not result in significant skew.

As a bonus, if you do not frequently filter these, you can choose the same column (user_id) as your sort key to enable merge joins rather than hash joins

edited Jul 4 '17 at 23:12

joanolo

9,92842254

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

edited Jul 4 '17 at 23:12

joanolo

9,92842254

edited Jul 4 '17 at 23:12

joanolo

9,92842254

edited Jul 4 '17 at 23:12

joanolo

9,92842254

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

answered Jul 4 '17 at 18:28

Fabio Beltramini

1716

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Sfrgttk