SQL Server bulk transfer help needed The 2019 Stack Overflow Developer Survey Results Are In ...
Do working physicists consider Newtonian mechanics to be "falsified"?
Why don't hard Brexiteers insist on a hard border to prevent illegal immigration after Brexit?
Did the UK government pay "millions and millions of dollars" to try to snag Julian Assange?
Make it rain characters
1960s short story making fun of James Bond-style spy fiction
Variable with quotation marks "$()"
Is it ok to offer lower paid work as a trial period before negotiating for a full-time job?
How do I design a circuit to convert a 100 mV and 50 Hz sine wave to a square wave?
Keeping a retro style to sci-fi spaceships?
Would an alien lifeform be able to achieve space travel if lacking in vision?
Match Roman Numerals
Why are PDP-7-style microprogrammed instructions out of vogue?
Word to describe a time interval
What aspect of planet Earth must be changed to prevent the industrial revolution?
What is the role of 'For' here?
The following signatures were invalid: EXPKEYSIG 1397BC53640DB551
Is it ethical to upload a automatically generated paper to a non peer-reviewed site as part of a larger research?
Do ℕ, mathbb{N}, BbbN, symbb{N} effectively differ, and is there a "canonical" specification of the naturals?
How to handle characters who are more educated than the author?
Is every episode of "Where are my Pants?" identical?
What to do when moving next to a bird sanctuary with a loosely-domesticated cat?
Why not take a picture of a closer black hole?
How did passengers keep warm on sail ships?
What do I do when my TA workload is more than expected?
SQL Server bulk transfer help needed
The 2019 Stack Overflow Developer Survey Results Are In
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)Is it good idea to change Simple Recover Model into Bulk-Logged on ETL system?Why does SQL Server 2008 R2 insert an invalid datetime when using bulk insert?Is bulk-logged recovery model ever a good choice if point-in-time restore is not needed?logging in Bulk logged recovery VS Full recovery modelSQL Bulk Insert Impersonation issueDatabase in SIMPLE recovery model full using SSIS bulk insert taskMsSql Bulk Insert With commit size 1 vs Simple InsertImporting Data in Parallel in SQL ServerSQL Bulk import vs SSIS in SimplicityWhat is the technical meaning of “Bulk” in the “Bulk-Logged Recovery Model” and why this name?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}
We've got this very badly designed logging table that we want to add functionality to. The problem is that it's already a scalability nightmare and we want to fix the design before adding to it, but we only have the nightly upgrade window to do it in.
I've seen a lot of articles about the various bulk copy options with SQL Server, claiming "We could move 80M rows in 10 minutes!" but so far my testing doesn't get anywhere near that, and I'd like suggestions on how to improve on what I'm seeing.
Before the upgrade, there's always a full backup. I'm only interested in the end result and don't want a huge transaction log. I also don't want it to take too long and I don't want to blow out the disk space with transaction logs or temp files.
The table's been out there a while, so in our bigger customer dbs, it's already over 50 million rows. Each row is about 350-400 bytes. The columns are something like this
IdentityColID int, [type] int, [subtype] int,
created datetime, author nvarchar(100), Message nvarchar(max)
The problems with the design are
The primary clustered key is
(type, subtype, created, identitycolid)
, so it's an insert nightmare. Blocksplits all over the place. And even doing aSELECT COUNT(*)
takes like 8 minutes.There aren't good indexes to support the types of queries desired
I wanted to make a new table where the primary clustered index is the IdentityColId
and add the indexes to support the type of necessary queries, and then copy the existing data over and drop the old table.
So far, I tried using BCP to get the data out, and importing with
BCP
BULK INSERT
INSERT INTO ... FROM OPENROWSET
The bcp export took about 25 minutes and the imports all took about 1.3 hour - about 1.5 hours. With Recovery Model Simple, the transaction log didn't grow but the cpu consumption was in the 60-65% range most of the time.
I tried just using T-SQL INSERT INTO NewTable SELECT * FROM OldTable
, but even with Recovery Model Simple, the transaction log gets to 100 gig.
I tried using SSIS data import packages with the from/to model, and the net time was about an hour 20 minutes. With Recovery Model Simple, the transaction log stayed small.
Then I tried an SSIS Execute SQLTask package to effectively do the INSERT INTO NewTable...
line within SSIS. That got the execution time down to about 1:15, but no matter what the recovery model, the transaction log ended up around 100 gig, though CPU consumption stays modest.
I'd like the end result to be one new table, so the suggestion from some of the articles I've read to parallelize into multiple result tables doesn't seem a profitable path. But so far, I just can't seem to approach those stats from the articles I've read.
Anyone have any suggestions on how I can goose this a bit?
sql-server bulk-insert bulkcopy
bumped to the homepage by Community♦ 18 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
|
show 1 more comment
We've got this very badly designed logging table that we want to add functionality to. The problem is that it's already a scalability nightmare and we want to fix the design before adding to it, but we only have the nightly upgrade window to do it in.
I've seen a lot of articles about the various bulk copy options with SQL Server, claiming "We could move 80M rows in 10 minutes!" but so far my testing doesn't get anywhere near that, and I'd like suggestions on how to improve on what I'm seeing.
Before the upgrade, there's always a full backup. I'm only interested in the end result and don't want a huge transaction log. I also don't want it to take too long and I don't want to blow out the disk space with transaction logs or temp files.
The table's been out there a while, so in our bigger customer dbs, it's already over 50 million rows. Each row is about 350-400 bytes. The columns are something like this
IdentityColID int, [type] int, [subtype] int,
created datetime, author nvarchar(100), Message nvarchar(max)
The problems with the design are
The primary clustered key is
(type, subtype, created, identitycolid)
, so it's an insert nightmare. Blocksplits all over the place. And even doing aSELECT COUNT(*)
takes like 8 minutes.There aren't good indexes to support the types of queries desired
I wanted to make a new table where the primary clustered index is the IdentityColId
and add the indexes to support the type of necessary queries, and then copy the existing data over and drop the old table.
So far, I tried using BCP to get the data out, and importing with
BCP
BULK INSERT
INSERT INTO ... FROM OPENROWSET
The bcp export took about 25 minutes and the imports all took about 1.3 hour - about 1.5 hours. With Recovery Model Simple, the transaction log didn't grow but the cpu consumption was in the 60-65% range most of the time.
I tried just using T-SQL INSERT INTO NewTable SELECT * FROM OldTable
, but even with Recovery Model Simple, the transaction log gets to 100 gig.
I tried using SSIS data import packages with the from/to model, and the net time was about an hour 20 minutes. With Recovery Model Simple, the transaction log stayed small.
Then I tried an SSIS Execute SQLTask package to effectively do the INSERT INTO NewTable...
line within SSIS. That got the execution time down to about 1:15, but no matter what the recovery model, the transaction log ended up around 100 gig, though CPU consumption stays modest.
I'd like the end result to be one new table, so the suggestion from some of the articles I've read to parallelize into multiple result tables doesn't seem a profitable path. But so far, I just can't seem to approach those stats from the articles I've read.
Anyone have any suggestions on how I can goose this a bit?
sql-server bulk-insert bulkcopy
bumped to the homepage by Community♦ 18 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is simple. The lob data (messages column) could be slowing your inserts down, if you could do anything about that, that would help.
– Randi Vertongen
Dec 21 '18 at 21:40
5
Have you tried to useINSERT .. SELECT
in batches (of say 1-10K rows eacj)? It would probbaly hep to add the indexes after the table has been populated.
– ypercubeᵀᴹ
Dec 21 '18 at 21:42
2
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
1
I dont understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41
|
show 1 more comment
We've got this very badly designed logging table that we want to add functionality to. The problem is that it's already a scalability nightmare and we want to fix the design before adding to it, but we only have the nightly upgrade window to do it in.
I've seen a lot of articles about the various bulk copy options with SQL Server, claiming "We could move 80M rows in 10 minutes!" but so far my testing doesn't get anywhere near that, and I'd like suggestions on how to improve on what I'm seeing.
Before the upgrade, there's always a full backup. I'm only interested in the end result and don't want a huge transaction log. I also don't want it to take too long and I don't want to blow out the disk space with transaction logs or temp files.
The table's been out there a while, so in our bigger customer dbs, it's already over 50 million rows. Each row is about 350-400 bytes. The columns are something like this
IdentityColID int, [type] int, [subtype] int,
created datetime, author nvarchar(100), Message nvarchar(max)
The problems with the design are
The primary clustered key is
(type, subtype, created, identitycolid)
, so it's an insert nightmare. Blocksplits all over the place. And even doing aSELECT COUNT(*)
takes like 8 minutes.There aren't good indexes to support the types of queries desired
I wanted to make a new table where the primary clustered index is the IdentityColId
and add the indexes to support the type of necessary queries, and then copy the existing data over and drop the old table.
So far, I tried using BCP to get the data out, and importing with
BCP
BULK INSERT
INSERT INTO ... FROM OPENROWSET
The bcp export took about 25 minutes and the imports all took about 1.3 hour - about 1.5 hours. With Recovery Model Simple, the transaction log didn't grow but the cpu consumption was in the 60-65% range most of the time.
I tried just using T-SQL INSERT INTO NewTable SELECT * FROM OldTable
, but even with Recovery Model Simple, the transaction log gets to 100 gig.
I tried using SSIS data import packages with the from/to model, and the net time was about an hour 20 minutes. With Recovery Model Simple, the transaction log stayed small.
Then I tried an SSIS Execute SQLTask package to effectively do the INSERT INTO NewTable...
line within SSIS. That got the execution time down to about 1:15, but no matter what the recovery model, the transaction log ended up around 100 gig, though CPU consumption stays modest.
I'd like the end result to be one new table, so the suggestion from some of the articles I've read to parallelize into multiple result tables doesn't seem a profitable path. But so far, I just can't seem to approach those stats from the articles I've read.
Anyone have any suggestions on how I can goose this a bit?
sql-server bulk-insert bulkcopy
We've got this very badly designed logging table that we want to add functionality to. The problem is that it's already a scalability nightmare and we want to fix the design before adding to it, but we only have the nightly upgrade window to do it in.
I've seen a lot of articles about the various bulk copy options with SQL Server, claiming "We could move 80M rows in 10 minutes!" but so far my testing doesn't get anywhere near that, and I'd like suggestions on how to improve on what I'm seeing.
Before the upgrade, there's always a full backup. I'm only interested in the end result and don't want a huge transaction log. I also don't want it to take too long and I don't want to blow out the disk space with transaction logs or temp files.
The table's been out there a while, so in our bigger customer dbs, it's already over 50 million rows. Each row is about 350-400 bytes. The columns are something like this
IdentityColID int, [type] int, [subtype] int,
created datetime, author nvarchar(100), Message nvarchar(max)
The problems with the design are
The primary clustered key is
(type, subtype, created, identitycolid)
, so it's an insert nightmare. Blocksplits all over the place. And even doing aSELECT COUNT(*)
takes like 8 minutes.There aren't good indexes to support the types of queries desired
I wanted to make a new table where the primary clustered index is the IdentityColId
and add the indexes to support the type of necessary queries, and then copy the existing data over and drop the old table.
So far, I tried using BCP to get the data out, and importing with
BCP
BULK INSERT
INSERT INTO ... FROM OPENROWSET
The bcp export took about 25 minutes and the imports all took about 1.3 hour - about 1.5 hours. With Recovery Model Simple, the transaction log didn't grow but the cpu consumption was in the 60-65% range most of the time.
I tried just using T-SQL INSERT INTO NewTable SELECT * FROM OldTable
, but even with Recovery Model Simple, the transaction log gets to 100 gig.
I tried using SSIS data import packages with the from/to model, and the net time was about an hour 20 minutes. With Recovery Model Simple, the transaction log stayed small.
Then I tried an SSIS Execute SQLTask package to effectively do the INSERT INTO NewTable...
line within SSIS. That got the execution time down to about 1:15, but no matter what the recovery model, the transaction log ended up around 100 gig, though CPU consumption stays modest.
I'd like the end result to be one new table, so the suggestion from some of the articles I've read to parallelize into multiple result tables doesn't seem a profitable path. But so far, I just can't seem to approach those stats from the articles I've read.
Anyone have any suggestions on how I can goose this a bit?
sql-server bulk-insert bulkcopy
sql-server bulk-insert bulkcopy
edited Dec 26 '18 at 9:37
marc_s
7,13853849
7,13853849
asked Dec 21 '18 at 21:29
user1664043user1664043
1061
1061
bumped to the homepage by Community♦ 18 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 18 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is simple. The lob data (messages column) could be slowing your inserts down, if you could do anything about that, that would help.
– Randi Vertongen
Dec 21 '18 at 21:40
5
Have you tried to useINSERT .. SELECT
in batches (of say 1-10K rows eacj)? It would probbaly hep to add the indexes after the table has been populated.
– ypercubeᵀᴹ
Dec 21 '18 at 21:42
2
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
1
I dont understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41
|
show 1 more comment
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is simple. The lob data (messages column) could be slowing your inserts down, if you could do anything about that, that would help.
– Randi Vertongen
Dec 21 '18 at 21:40
5
Have you tried to useINSERT .. SELECT
in batches (of say 1-10K rows eacj)? It would probbaly hep to add the indexes after the table has been populated.
– ypercubeᵀᴹ
Dec 21 '18 at 21:42
2
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
1
I dont understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is simple. The lob data (messages column) could be slowing your inserts down, if you could do anything about that, that would help.
– Randi Vertongen
Dec 21 '18 at 21:40
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is simple. The lob data (messages column) could be slowing your inserts down, if you could do anything about that, that would help.
– Randi Vertongen
Dec 21 '18 at 21:40
5
5
Have you tried to use
INSERT .. SELECT
in batches (of say 1-10K rows eacj)? It would probbaly hep to add the indexes after the table has been populated.– ypercubeᵀᴹ
Dec 21 '18 at 21:42
Have you tried to use
INSERT .. SELECT
in batches (of say 1-10K rows eacj)? It would probbaly hep to add the indexes after the table has been populated.– ypercubeᵀᴹ
Dec 21 '18 at 21:42
2
2
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
1
1
I dont understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
I dont understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41
|
show 1 more comment
2 Answers
2
active
oldest
votes
Did you change of the configurations when you tried SSIS? Using an Execute SQL Task with SQL statement is equivalent to running the same SQL in SSMS. Below is an overview of doing this process in SSIS. Of course, you'll want to change and test various settings to see what works best in your environment.
- Add a Data Flow Task. In this, add an OLE DB Source component. I'd recommend using the "SQL Command" Data Access Mode and writing a
SELECT
statement that contains only the column you need. - Add either an OLE DB Destination or SQL Server Destination and connect the source to this. The SQL Server Destination tends to perform better, however is only available when importing to a database on a local server. If you use an OLE DB Destination, use the "Table or view - fast load" option. This will operate as a
BULK INSERT
internally. On the destination un-check the Check Constraints option to bypass this. - To avoid excessive transaction log use, change the
Maximum Insert Commit Size
(OLE DB Destination) to a value lower than the default. The equivelant to this on the SQL Server Destination in theMaxInsertCommitSize
which you can view by right-clicking the component, selecting Show Advanced Editor, and going to the Component Properties page. The default of 0 means all records will be commit in a single transaction. - On the Data Flow Task, go to the Properties window and set
AutoAdjustBufferSize
to true. This will set the size of the buffer based on theDefaultBufferMaxRows
property, which indicates how many rows can be held by the buffers. - You can adjust parallel execution as well. On the package, the
MaxConcurrentExecutables
property sets the max number of task that can be run. The default is -1, which is the number of processors where the package is running plus two.
The Data Flow Task has anEngineThreads
property, which controls how many threads can be used by it.
add a comment |
I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same or more expensive than loading up an empty new table was incorrect.
Setting recovery mode simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than a half hour, so that was a big improvement.
And the transaction log grew by 10 gig but didn't go nuts like some of my other experiments.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f225591%2fsql-server-bulk-transfer-help-needed%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Did you change of the configurations when you tried SSIS? Using an Execute SQL Task with SQL statement is equivalent to running the same SQL in SSMS. Below is an overview of doing this process in SSIS. Of course, you'll want to change and test various settings to see what works best in your environment.
- Add a Data Flow Task. In this, add an OLE DB Source component. I'd recommend using the "SQL Command" Data Access Mode and writing a
SELECT
statement that contains only the column you need. - Add either an OLE DB Destination or SQL Server Destination and connect the source to this. The SQL Server Destination tends to perform better, however is only available when importing to a database on a local server. If you use an OLE DB Destination, use the "Table or view - fast load" option. This will operate as a
BULK INSERT
internally. On the destination un-check the Check Constraints option to bypass this. - To avoid excessive transaction log use, change the
Maximum Insert Commit Size
(OLE DB Destination) to a value lower than the default. The equivelant to this on the SQL Server Destination in theMaxInsertCommitSize
which you can view by right-clicking the component, selecting Show Advanced Editor, and going to the Component Properties page. The default of 0 means all records will be commit in a single transaction. - On the Data Flow Task, go to the Properties window and set
AutoAdjustBufferSize
to true. This will set the size of the buffer based on theDefaultBufferMaxRows
property, which indicates how many rows can be held by the buffers. - You can adjust parallel execution as well. On the package, the
MaxConcurrentExecutables
property sets the max number of task that can be run. The default is -1, which is the number of processors where the package is running plus two.
The Data Flow Task has anEngineThreads
property, which controls how many threads can be used by it.
add a comment |
Did you change of the configurations when you tried SSIS? Using an Execute SQL Task with SQL statement is equivalent to running the same SQL in SSMS. Below is an overview of doing this process in SSIS. Of course, you'll want to change and test various settings to see what works best in your environment.
- Add a Data Flow Task. In this, add an OLE DB Source component. I'd recommend using the "SQL Command" Data Access Mode and writing a
SELECT
statement that contains only the column you need. - Add either an OLE DB Destination or SQL Server Destination and connect the source to this. The SQL Server Destination tends to perform better, however is only available when importing to a database on a local server. If you use an OLE DB Destination, use the "Table or view - fast load" option. This will operate as a
BULK INSERT
internally. On the destination un-check the Check Constraints option to bypass this. - To avoid excessive transaction log use, change the
Maximum Insert Commit Size
(OLE DB Destination) to a value lower than the default. The equivelant to this on the SQL Server Destination in theMaxInsertCommitSize
which you can view by right-clicking the component, selecting Show Advanced Editor, and going to the Component Properties page. The default of 0 means all records will be commit in a single transaction. - On the Data Flow Task, go to the Properties window and set
AutoAdjustBufferSize
to true. This will set the size of the buffer based on theDefaultBufferMaxRows
property, which indicates how many rows can be held by the buffers. - You can adjust parallel execution as well. On the package, the
MaxConcurrentExecutables
property sets the max number of task that can be run. The default is -1, which is the number of processors where the package is running plus two.
The Data Flow Task has anEngineThreads
property, which controls how many threads can be used by it.
add a comment |
Did you change of the configurations when you tried SSIS? Using an Execute SQL Task with SQL statement is equivalent to running the same SQL in SSMS. Below is an overview of doing this process in SSIS. Of course, you'll want to change and test various settings to see what works best in your environment.
- Add a Data Flow Task. In this, add an OLE DB Source component. I'd recommend using the "SQL Command" Data Access Mode and writing a
SELECT
statement that contains only the column you need. - Add either an OLE DB Destination or SQL Server Destination and connect the source to this. The SQL Server Destination tends to perform better, however is only available when importing to a database on a local server. If you use an OLE DB Destination, use the "Table or view - fast load" option. This will operate as a
BULK INSERT
internally. On the destination un-check the Check Constraints option to bypass this. - To avoid excessive transaction log use, change the
Maximum Insert Commit Size
(OLE DB Destination) to a value lower than the default. The equivelant to this on the SQL Server Destination in theMaxInsertCommitSize
which you can view by right-clicking the component, selecting Show Advanced Editor, and going to the Component Properties page. The default of 0 means all records will be commit in a single transaction. - On the Data Flow Task, go to the Properties window and set
AutoAdjustBufferSize
to true. This will set the size of the buffer based on theDefaultBufferMaxRows
property, which indicates how many rows can be held by the buffers. - You can adjust parallel execution as well. On the package, the
MaxConcurrentExecutables
property sets the max number of task that can be run. The default is -1, which is the number of processors where the package is running plus two.
The Data Flow Task has anEngineThreads
property, which controls how many threads can be used by it.
Did you change of the configurations when you tried SSIS? Using an Execute SQL Task with SQL statement is equivalent to running the same SQL in SSMS. Below is an overview of doing this process in SSIS. Of course, you'll want to change and test various settings to see what works best in your environment.
- Add a Data Flow Task. In this, add an OLE DB Source component. I'd recommend using the "SQL Command" Data Access Mode and writing a
SELECT
statement that contains only the column you need. - Add either an OLE DB Destination or SQL Server Destination and connect the source to this. The SQL Server Destination tends to perform better, however is only available when importing to a database on a local server. If you use an OLE DB Destination, use the "Table or view - fast load" option. This will operate as a
BULK INSERT
internally. On the destination un-check the Check Constraints option to bypass this. - To avoid excessive transaction log use, change the
Maximum Insert Commit Size
(OLE DB Destination) to a value lower than the default. The equivelant to this on the SQL Server Destination in theMaxInsertCommitSize
which you can view by right-clicking the component, selecting Show Advanced Editor, and going to the Component Properties page. The default of 0 means all records will be commit in a single transaction. - On the Data Flow Task, go to the Properties window and set
AutoAdjustBufferSize
to true. This will set the size of the buffer based on theDefaultBufferMaxRows
property, which indicates how many rows can be held by the buffers. - You can adjust parallel execution as well. On the package, the
MaxConcurrentExecutables
property sets the max number of task that can be run. The default is -1, which is the number of processors where the package is running plus two.
The Data Flow Task has anEngineThreads
property, which controls how many threads can be used by it.
answered Dec 26 '18 at 19:49
userfl89userfl89
4616
4616
add a comment |
add a comment |
I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same or more expensive than loading up an empty new table was incorrect.
Setting recovery mode simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than a half hour, so that was a big improvement.
And the transaction log grew by 10 gig but didn't go nuts like some of my other experiments.
add a comment |
I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same or more expensive than loading up an empty new table was incorrect.
Setting recovery mode simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than a half hour, so that was a big improvement.
And the transaction log grew by 10 gig but didn't go nuts like some of my other experiments.
add a comment |
I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same or more expensive than loading up an empty new table was incorrect.
Setting recovery mode simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than a half hour, so that was a big improvement.
And the transaction log grew by 10 gig but didn't go nuts like some of my other experiments.
I can't give the credit to Sir Swears-A-lot because his response was in a comment, but he was right. My assumption that dropping the primary key and creating a new one would be effectively the same or more expensive than loading up an empty new table was incorrect.
Setting recovery mode simple, dropping the primary key, recreating it (forcing the reshuffle), and adding all my new indexes afterwards clocked in at 40 minutes. That beat any bcp/SSIS method I'd tried by more than a half hour, so that was a big improvement.
And the transaction log grew by 10 gig but didn't go nuts like some of my other experiments.
answered Jan 4 at 20:59
user1664043user1664043
1061
1061
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f225591%2fsql-server-bulk-transfer-help-needed%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
If you are doing the insert in one batch, then it will be logged, and your log file will grow until the transaction is finished, even when the recovery model is simple. The lob data (messages column) could be slowing your inserts down, if you could do anything about that, that would help.
– Randi Vertongen
Dec 21 '18 at 21:40
5
Have you tried to use
INSERT .. SELECT
in batches (of say 1-10K rows eacj)? It would probbaly hep to add the indexes after the table has been populated.– ypercubeᵀᴹ
Dec 21 '18 at 21:42
2
What version of SQL Server? The table is 20 GB, correct? If rows are only 400 bytes then why is there an nvarchar(max) column? Do you want to do the fastest load possible while minimizing transaction log growth?
– Joe Obbish
Dec 22 '18 at 0:15
1
I dont understand why you are copying the data. You can drop and add indexes on the existing table.
– Sir Swears-a-lot
Dec 24 '18 at 9:44
The nvarchar(max) is to allow people to type in whatever they want for audit notes, but on average people don't tend to type all that much. Yeah, I'm looking for fastest possible load while minimizing transaction log growth because of the maintenance window I have to work in.
– user1664043
Dec 26 '18 at 14:41