Database & Application design to support Reporting, Analytics and Stats
I am trying to determine the best way to structure my database (MS SQL) and web application (Java) so that we can provide lots of reports and aggregated results in close to real time. Think "Google Analytics".
Right now, we have a 12 million row table that records views and conversions and some other analytic data. This table is optimized for writes & updates (it has a limited set of indexes), but it is slow when querying for a date range and getting aggregated values.
We want a dashboard showing stats as close to real time as possible, but reading from that giant table is too slow to do on demand. We also want to show a bunch of other aggregated and summary information on this web app dashboard.
So, I am trying to figure out how I should set this up at the application and database level.
Which of these ideas sound good? What are the pitfalls? Does anybody have any suggestions that would help?
Ideas:
Run a background process, potentially on a dedicated server, that compiles and aggregates the analytics data (from the 12-million-plus-row table) and populates a table with the aggregate data. The dashboard would then basically read from this 'cache' table, with the current day's or last hour's metrics (or however long has elapsed since stats were last compiled) added on top.
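A minimal sketch of that pattern, assuming a raw table like `dbo.AnalyticsEvents(EventTime, EventType)` (all table and column names here are hypothetical): an hourly roll-up table refreshed by the background job, and a dashboard query that adds the not-yet-compiled current hour on top.

```sql
-- Hypothetical hourly roll-up, refreshed by the background job.
CREATE TABLE dbo.PageStatsHourly (
    StatHour    datetime2(0) NOT NULL PRIMARY KEY,
    Views       bigint       NOT NULL,
    Conversions bigint       NOT NULL
);

-- Background job: compile any fully elapsed hours not yet summarized.
INSERT INTO dbo.PageStatsHourly (StatHour, Views, Conversions)
SELECT DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0),
       SUM(CASE WHEN EventType = 'view'       THEN 1 ELSE 0 END),
       SUM(CASE WHEN EventType = 'conversion' THEN 1 ELSE 0 END)
FROM dbo.AnalyticsEvents
WHERE EventTime >= DATEADD(hour, 1,
        (SELECT COALESCE(MAX(StatHour), '19000101') FROM dbo.PageStatsHourly))
  AND EventTime <  DATEADD(hour, DATEDIFF(hour, 0, SYSUTCDATETIME()), 0)
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0);

-- Dashboard read: cheap scan of the roll-up plus the live current hour.
SELECT SUM(v) AS TotalViews
FROM (
    SELECT SUM(Views) AS v FROM dbo.PageStatsHourly
    UNION ALL
    SELECT COUNT(*) FROM dbo.AnalyticsEvents
    WHERE EventTime >= DATEADD(hour, DATEDIFF(hour, 0, SYSUTCDATETIME()), 0)
      AND EventType = 'view'
) t;
```

The key property is that the expensive aggregation only ever touches rows newer than the last compiled hour, so its cost stays roughly constant as the raw table grows.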
Use log shipping to ship SQL Server log data to a partner server and do all the reporting on this secondary database, so that the intensive reads do not affect write/update performance (which must stay high to avoid noticeable delays when pages are loaded, etc.).
Run a daily process to trim rows from our 12-million-plus-row table and move them to an archive table. I have tried doing this, but it slows all writes/updates on the table to a crawl. Is a partitioned table the answer here?
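If partitioning is available in your edition, the usual fix for slow delete-and-archive is to partition the table by date and use `ALTER TABLE ... SWITCH`, which moves an entire partition as a metadata-only operation instead of row-by-row deletes. A rough sketch, with hypothetical names and example boundary dates:

```sql
-- Partition by month on the event timestamp (boundary values are examples).
CREATE PARTITION FUNCTION pfByMonth (datetime2(0))
AS RANGE RIGHT FOR VALUES ('20150801', '20150901', '20151001');

CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth ALL TO ([PRIMARY]);

-- dbo.AnalyticsEvents would be created ON psByMonth(EventTime).
-- Archiving the oldest month is then a metadata operation: switch that
-- partition into an empty, identically structured staging table.
ALTER TABLE dbo.AnalyticsEvents
SWITCH PARTITION 1 TO dbo.AnalyticsEventsArchiveStaging;
```

Because `SWITCH` only updates metadata, it completes near-instantly and does not generate the write load that a mass `DELETE`/`INSERT` archive job does.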
Do any of these sound like feasible solutions? Does anybody have any suggestions? Thank you for your help!
sql-server
migrated from serverfault.com Sep 15 '15 at 20:50
This question came from our site for system and network administrators.
asked Sep 15 '15 at 20:13
Brook
1 Answer
A few suggestions. First, add a server-side caching layer such as memcached (or something similar) to reduce database requests and to store the calculated data. Second, a priority queue system that runs heavy tasks when time and resources allow can reduce the lag; this is a "killer" feature for near-real-time dashboards. Another way to search faster is to use a search engine such as Elasticsearch or Solr, which handle big data better. If you produce a fixed set of calculated data per week/day/hour, you can "cache" the past periods and compute only the current week/day/hour live, archiving the old rows in another table to keep history and still be able to deliver single records. It is also important to know when the server has free resources, and to queue the hard tasks into those windows. Finally, you could keep a second database that holds the full 12M+ rows and does the calculations, with a task that imports the summaries into your real application database to reduce write-lock lag.
I hope this helps a bit. For this much data, which I expect will grow, SQL Server without very custom optimization is not the best solution; there are faster options for managing and holding data like this, such as Bigtable, Elasticsearch/Solr, cache-based systems, and so on.
Cache and queue are the keywords, I think.
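The "queue heavy work for off-peak windows" idea can even live inside SQL Server itself, using a work-queue table that workers pop with `READPAST` so they skip each other's locked rows. A sketch of that common pattern (table and column names are hypothetical):

```sql
-- Hypothetical work-queue table for deferred aggregation jobs.
CREATE TABLE dbo.AggregationJobs (
    JobId     int IDENTITY PRIMARY KEY,
    Priority  tinyint       NOT NULL,
    Payload   nvarchar(400) NOT NULL,
    ClaimedAt datetime2(0)  NULL
);

-- Worker: atomically claim the highest-priority unclaimed job,
-- skipping rows already locked by other workers.
WITH next_job AS (
    SELECT TOP (1) *
    FROM dbo.AggregationJobs WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE ClaimedAt IS NULL
    ORDER BY Priority, JobId
)
UPDATE next_job
SET ClaimedAt = SYSUTCDATETIME()
OUTPUT inserted.JobId, inserted.Payload;
```

Multiple workers can run this concurrently without blocking each other, and a scheduled job can simply stop pulling from the queue during peak hours.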
answered Sep 15 '15 at 21:11
Gummibeer