Choosing the right algorithm in HashBytes function


We need to create hash values of nvarchar data for comparison purposes. There are multiple hash algorithms available in T-SQL, but which one is the best choice in this scenario?

We want to minimize the risk of two different nvarchar values producing the same hash value. Based on my research on the internet, MD5 seems to be the best one. Is that right? MSDN (link below) lists the available algorithms, but it does not describe which one suits which conditions.

HASHBYTES (Transact-SQL)

We need to join two tables on two nvarchar(max) columns. As you can imagine, the query takes a long time to execute. We thought it would be better to store the hash value of each nvarchar(max) value and join on the hash values rather than on the nvarchar(max) values, which are blobs. The question is which hash algorithm provides the best uniqueness, so that we don't run the risk of having one hash value for more than one nvarchar(max) value.

asked Feb 22 '13 at 5:44 by Sky (edited Sep 21 '15 at 7:45 by Paul White)

sql-server sql-server-2008-r2 t-sql hashing


4 Answers

The HASHBYTES function only takes up to 8000 bytes as input (this limit applies to SQL Server 2014 and earlier; it was removed in SQL Server 2016). Because your inputs are potentially larger than that, only part of each value can be hashed, and any two values that are identical within the hashed range will collide, regardless of the algorithm chosen. Carefully consider the range of data you plan to hash -- using the first 4000 characters is the obvious choice, but may not be the best choice for your data.
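To make the prefix problem concrete, here is a minimal sketch in Python rather than T-SQL. It assumes nvarchar data is hashed as its UTF-16LE bytes and simulates the 8000-byte limit by truncating before hashing; `prefix_hash` and `MAX_INPUT_BYTES` are hypothetical names used only for this illustration.

```python
import hashlib

MAX_INPUT_BYTES = 8000  # assumed input limit (SQL Server 2014 and earlier)


def prefix_hash(text: str) -> bytes:
    """MD5 of at most the first 8000 bytes of the UTF-16LE encoding of text."""
    encoded = text.encode("utf-16-le")  # assumption: nvarchar is stored as UTF-16LE
    return hashlib.md5(encoded[:MAX_INPUT_BYTES]).digest()


shared_prefix = "x" * 4000  # 4000 characters = 8000 bytes in UTF-16LE
a = shared_prefix + " first tail"
b = shared_prefix + " second tail"

same_hash = prefix_hash(a) == prefix_hash(b)  # the differing tails are never hashed
same_text = a == b
```

The two strings differ, yet their prefix hashes are equal, so a join on the hash alone would pair them. Swapping MD5 for another algorithm would not change this.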

In any event, because a hash function maps an unlimited set of inputs onto a fixed-size output, even if the inputs are 8000 bytes or less, the only way to ensure 100% correctness in the results is to compare the base values at some point (read: not necessarily first). Period.

The business will dictate whether or not 100% accuracy is required. That tells you either (a) that comparing the base values is required, or (b) that you should consider not comparing the base values, in which case the question becomes how much accuracy should be traded off for performance.

While hash collisions are possible in a unique input set, they are extremely rare for non-malicious data, regardless of the algorithm chosen. The whole idea of using a hash value in this scenario is to efficiently narrow down the join results to a more manageable set, not necessarily to arrive at the final set of results immediately. Again, for 100% accuracy, this cannot be the final step in the process. This scenario isn't using hashing for the purpose of cryptography, so an algorithm such as MD5 will work fine.
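The narrow-then-verify idea can be sketched as follows. This is an illustration in Python, not the T-SQL you would actually run; `md5_key` and `hash_join` are hypothetical helpers, and the UTF-16LE encoding is an assumption about how nvarchar values would be hashed.

```python
import hashlib
from collections import defaultdict


def md5_key(text: str) -> bytes:
    """16-byte MD5 digest used as a compact join key (assumes UTF-16LE input)."""
    return hashlib.md5(text.encode("utf-16-le")).digest()


def hash_join(left, right):
    """Return (left_index, right_index) pairs whose full texts are equal."""
    buckets = defaultdict(list)
    for i, text in enumerate(left):
        buckets[md5_key(text)].append(i)  # step 1: group left rows by hash

    matches = []
    for j, text in enumerate(right):
        for i in buckets.get(md5_key(text), []):  # step 2: cheap probe on the hash
            if left[i] == text:  # step 3: confirm on the base values
                matches.append((i, j))
    return matches


left_rows = ["alpha", "beta", "gamma"]
right_rows = ["gamma", "delta", "alpha"]
pairs = hash_join(left_rows, right_rows)
```

The expensive full comparison in step 3 only runs for rows whose hashes already match, which is where the performance gain comes from. Dropping step 3 is the accuracy-for-speed trade-off described above.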

It would be extremely hard for me to justify moving up to a SHA-x algorithm for "accuracy" purposes because if the business is going to freak out about the minuscule collision possibilities of MD5, chances are they're also going to freak out that the SHA-x algorithms aren't perfect either. They either have to come to terms with the slight inaccuracy, or mandate that the query be 100% accurate and live with the associated technical implications. I suppose if the CEO sleeps better at night knowing you used SHA-x instead of MD5, well, fine; it still doesn't mean much from a technical point of view in this case.

Speaking of performance, if the tables are read-mostly and the join result is needed frequently, consider implementing an indexed view to eliminate the need to compute the entire join every time it's requested. Of course you trade off storage for that, but it may be well worth it for the performance improvement, particularly if 100% accuracy is required.

For further reading on indexing long string values, I published an article that walks through an example of how to do this for a single table, and presents things to consider when attempting the full scenario in this question.

answered Feb 25 '13 at 5:52 by Jon Seigel (edited Jul 30 '13 at 13:23)

MD5 should be fine, and its output can be stored in a binary(16) column. The probability of a collision (see the birthday paradox) is still very low, even with a very large number of rows. By comparison, the output of SHA-1 takes 20 bytes and the output of SHA-256 takes 32 bytes. Unless you have so many records that the birthday collision probability becomes significant (physically impossible, or at least impractical, with current hardware), MD5 will probably be OK.
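For a rough sense of scale, the birthday bound can be estimated with a few lines of Python. This is a sketch that assumes hash values behave like uniformly random bit strings (that is, non-adversarial inputs) and uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)); the function name is made up for this example.

```python
import hashlib
import math


def birthday_collision_probability(n_values: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_values random hashes."""
    exponent = -(n_values * (n_values - 1)) / (2.0 * 2.0 ** hash_bits)
    return -math.expm1(exponent)  # 1 - e^exponent, stays accurate for tiny results


# Digest sizes in bytes, matching the storage figures quoted above.
digest_bytes = {
    "md5": hashlib.md5().digest_size,
    "sha1": hashlib.sha1().digest_size,
    "sha256": hashlib.sha256().digest_size,
}

p_md5 = birthday_collision_probability(10**9, 128)   # one billion rows, 128-bit hash
p_sha1 = birthday_collision_probability(10**9, 160)  # one billion rows, 160-bit hash
```

Under these assumptions, a billion distinct values hashed with a 128-bit function give a collision probability on the order of 10^-21. The estimate says nothing about deliberately constructed collisions.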

answered Feb 23 '13 at 20:09 by ConcernedOfTunbridgeWells (edited May 23 '17 at 12:40 by Community)

I would go with SHA-1. It is the best of the available algorithms and has the lowest collision expectancy of them all (the best known collision attacks listed in the sources below take about 2^51 operations for SHA-1, compared with about 2^20.96 for MD5). MD5 has also been proven to be vulnerable to collisions in certain scenarios.

Sources:

http://en.wikipedia.org/wiki/SHA-1
http://en.wikipedia.org/wiki/Comparison_of_cryptographic_hash_functions#Cryptanalysis
http://en.wikipedia.org/wiki/MD5

answered Feb 22 '13 at 11:55 by Mr.Brownstone

I haven't seen this mentioned in the answers but per MSDN:

Beginning with SQL Server 2016 (13.x), all algorithms other than
SHA2_256, and SHA2_512 are deprecated. Older algorithms (not
recommended) will continue working, but they will raise a deprecation
event.

I asked a similar question. It's up to you whether you want to use a deprecated function such as MD5 (if you're on SQL Server 2016 or later). You can test how much storage and performance differ between MD5 and SHA2.

answered 1 hour ago by Gabe
