Database & Application design to support Reporting, Analytics and Stats
I am trying to determine the best way to structure my database (MS SQL) and web application (Java) so that we can provide lots of reports and aggregated results in close to real time. Think "Google Analytics".
Right now, we have a 12 million row table that records views and conversions and some other analytic data. This table is optimized for writes & updates (it has a limited set of indexes), but it is slow when querying for a date range and getting aggregated values.
We want a dashboard showing stats as close to real time as possible, but reading from that giant table is too slow to do on demand. We also want to show a bunch of other aggregated and summary information on this web app dashboard.
So, I am trying to figure out how I should set this up at the application and database level.
Which of these ideas sound good? What are the pitfalls? Does anybody have any suggestions that would help?
Ideas:
Run a background process, potentially on a dedicated server, that compiles and aggregates the analytics data (from the 12-million-plus-row table) and populates a table with the aggregate data. The dashboard would then basically read from this 'cache' table, with the current day's or last hour's metrics (or however long has elapsed since stats were last compiled) added on top.
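A minimal sketch of that pattern, assuming a raw table like `dbo.AnalyticsEvents(EventTime, EventType)` (all table and column names here are hypothetical): an hourly roll-up table refreshed by the background job, and a dashboard query that adds the not-yet-compiled current hour on top.

```sql
-- Hypothetical hourly roll-up, refreshed by the background job.
CREATE TABLE dbo.PageStatsHourly (
    StatHour    datetime2(0) NOT NULL PRIMARY KEY,
    Views       bigint       NOT NULL,
    Conversions bigint       NOT NULL
);

-- Background job: compile any fully elapsed hours not yet summarized.
INSERT INTO dbo.PageStatsHourly (StatHour, Views, Conversions)
SELECT DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0),
       SUM(CASE WHEN EventType = 'view'       THEN 1 ELSE 0 END),
       SUM(CASE WHEN EventType = 'conversion' THEN 1 ELSE 0 END)
FROM dbo.AnalyticsEvents
WHERE EventTime >= DATEADD(hour, 1,
        (SELECT COALESCE(MAX(StatHour), '19000101') FROM dbo.PageStatsHourly))
  AND EventTime <  DATEADD(hour, DATEDIFF(hour, 0, SYSUTCDATETIME()), 0)
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0);

-- Dashboard read: cheap scan of the roll-up plus the live current hour.
SELECT SUM(v) AS TotalViews
FROM (
    SELECT SUM(Views) AS v FROM dbo.PageStatsHourly
    UNION ALL
    SELECT COUNT(*) FROM dbo.AnalyticsEvents
    WHERE EventTime >= DATEADD(hour, DATEDIFF(hour, 0, SYSUTCDATETIME()), 0)
      AND EventType = 'view'
) t;
```

The key property is that the expensive aggregation only ever touches rows newer than the last compiled hour, so its cost stays roughly constant as the raw table grows.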
Use log shipping to ship SQL Server log data to a partner server and do all the reporting on this secondary database, so that the intensive reads do not affect write/update performance (which must stay high to avoid noticeable delays when pages are loaded, etc.).
Run a daily process to trim rows from our 12-million-plus-row table and move them to an archive table. I have tried doing this, but it slows all writes/updates on the table to a crawl. Is a partitioned table the answer here?
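If partitioning is available in your edition, the usual fix for slow delete-and-archive is to partition the table by date and use `ALTER TABLE ... SWITCH`, which moves an entire partition as a metadata-only operation instead of row-by-row deletes. A rough sketch, with hypothetical names and example boundary dates:

```sql
-- Partition by month on the event timestamp (boundary values are examples).
CREATE PARTITION FUNCTION pfByMonth (datetime2(0))
AS RANGE RIGHT FOR VALUES ('20150801', '20150901', '20151001');

CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth ALL TO ([PRIMARY]);

-- dbo.AnalyticsEvents would be created ON psByMonth(EventTime).
-- Archiving the oldest month is then a metadata operation: switch that
-- partition into an empty, identically structured staging table.
ALTER TABLE dbo.AnalyticsEvents
SWITCH PARTITION 1 TO dbo.AnalyticsEventsArchiveStaging;
```

Because `SWITCH` only updates metadata, it completes near-instantly and does not generate the write load that a mass `DELETE`/`INSERT` archive job does.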
Do any of these sound like feasible solutions? Does anybody have any suggestions? Thank you for your help!
sql-server
migrated from serverfault.com Sep 15 '15 at 20:50
This question came from our site for system and network administrators.
asked Sep 15 '15 at 20:13
Brook
1 Answer
A few suggestions. First, add a server-side caching layer such as memcached (or something similar) to reduce database requests and to store the calculated data. Second, a priority queue system that runs heavy tasks when time and resources allow can reduce the lag; this is a "killer" feature for near-real-time dashboards. Another way to search faster is to use a search engine such as Elasticsearch or Solr, which handle big data better. If you produce a fixed set of calculated data per week/day/hour, you can "cache" the past periods and compute only the current week/day/hour live, archiving the old rows in another table to keep history and still be able to deliver single records. It is also important to know when the server has free resources, and to queue the hard tasks into those windows. Finally, you could keep a second database that holds the full 12M+ rows and does the calculations, with a task that imports the summaries into your real application database to reduce write-lock lag.
I hope this helps a bit. For this much data, which I expect will grow, SQL Server without very custom optimization is not the best solution; there are faster options for managing and holding data like this, such as Bigtable, Elasticsearch/Solr, cache-based systems, and so on.
Cache and queue are the keywords, I think.
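The "queue heavy work for off-peak windows" idea can even live inside SQL Server itself, using a work-queue table that workers pop with `READPAST` so they skip each other's locked rows. A sketch of that common pattern (table and column names are hypothetical):

```sql
-- Hypothetical work-queue table for deferred aggregation jobs.
CREATE TABLE dbo.AggregationJobs (
    JobId     int IDENTITY PRIMARY KEY,
    Priority  tinyint       NOT NULL,
    Payload   nvarchar(400) NOT NULL,
    ClaimedAt datetime2(0)  NULL
);

-- Worker: atomically claim the highest-priority unclaimed job,
-- skipping rows already locked by other workers.
WITH next_job AS (
    SELECT TOP (1) *
    FROM dbo.AggregationJobs WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE ClaimedAt IS NULL
    ORDER BY Priority, JobId
)
UPDATE next_job
SET ClaimedAt = SYSUTCDATETIME()
OUTPUT inserted.JobId, inserted.Payload;
```

Multiple workers can run this concurrently without blocking each other, and a scheduled job can simply stop pulling from the queue during peak hours.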
answered Sep 15 '15 at 21:11
Gummibeer