Database & Application design to support Reporting, Analytics and Stats


I am trying to determine the best way to structure my database (MS SQL) and web application (Java) so that we can provide lots of reports and aggregated results in close to real time. Think "Google Analytics".



Right now, we have a 12 million row table that records views and conversions and some other analytic data. This table is optimized for writes & updates (it has a limited set of indexes), but it is slow when querying for a date range and getting aggregated values.



We want a dashboard showing stats as close to real time as possible, but reading from that giant table on demand is too slow. We also want to show a bunch of other aggregated and summary information on this web app dashboard.



So, I am trying to figure out how I should set this up at the application and database level.



Which of these ideas sound good? What are the pitfalls? Does anybody have suggestions that would help?



Ideas:




  1. Run a background process, potentially on a dedicated server, that compiles and aggregates the analytics data (from the 12-million-plus-row table) and populates a table with the aggregates, so the dashboard can essentially read from this 'cache' table, with the current day's (or last hour's, or however long has elapsed since stats were last compiled) metrics added on. (See the summary-table sketch after this list.)


  2. Use log shipping to ship the SQL Server logs to a partner server and do all reporting against that secondary database, so the intensive reads do not affect write/update performance (which must stay high to avoid noticeable delays when pages load).


  3. Run a daily process to trim rows from the 12-million-plus-row table and move them to an archive table. I have tried this, but it slows all writes/updates on the table to a crawl. Is a partitioned table the answer here? (See the partition-switch sketch after this list.)
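
A minimal sketch of idea 1 in T-SQL, with every object and column name (dbo.AnalyticsEvents, dbo.DailyStats, EventTime, Converted) hypothetical: a scheduled job upserts today's aggregates into a small summary table, and the dashboard reads that table instead of scanning the raw one.

    -- Upsert today's aggregates into the summary ("cache") table.
    MERGE dbo.DailyStats AS tgt
    USING (
        SELECT CONVERT(date, EventTime) AS StatDate,
               COUNT(*)                 AS Views,
               SUM(CASE WHEN Converted = 1 THEN 1 ELSE 0 END) AS Conversions
        FROM dbo.AnalyticsEvents
        WHERE EventTime >= CONVERT(date, SYSUTCDATETIME())  -- today's rows only
        GROUP BY CONVERT(date, EventTime)
    ) AS src
    ON tgt.StatDate = src.StatDate
    WHEN MATCHED THEN
        UPDATE SET Views = src.Views, Conversions = src.Conversions
    WHEN NOT MATCHED THEN
        INSERT (StatDate, Views, Conversions)
        VALUES (src.StatDate, src.Views, src.Conversions);

And a sketch of the partitioned variant of idea 3, assuming the table is partitioned by month (partition function pfMonthly, also hypothetical) and an archive table with an identical structure exists on the same filegroup. Switching a partition out is a metadata-only operation, so it avoids the row-by-row delete that slows everything to a crawl:

    -- Move last month's rows to the archive without touching individual rows.
    ALTER TABLE dbo.AnalyticsEvents
        SWITCH PARTITION $PARTITION.pfMonthly('2015-08-01')
        TO dbo.AnalyticsEventsArchive
           PARTITION $PARTITION.pfMonthly('2015-08-01');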



Do any of these sound like feasible solutions? Does anybody have any suggestions? Thank you for your help!










sql-server

asked Sep 15 '15 at 20:13 by Brook

migrated from serverfault.com Sep 15 '15 at 20:50 (this question came from our site for system and network administrators)
1 Answer
































Things I would suggest: first, a server-side caching layer such as memcached to reduce database requests and to hold the calculated data. Next, a priority queue system that runs heavy tasks only when time and resources are available; this is a big win for anything near real time. Another idea, for faster searching, is a search engine such as Elasticsearch or Solr, which handles large data sets better. If your calculated stats fall into fixed week/day/hour buckets, you can "cache" every completed period, so only the current week/day/hour has to be computed live, and archive the old rows in another table to keep history and still be able to serve individual records (a sketch of this follows below). It is also important to know when the server has free resources, so that heavy tasks can be queued into those windows. Finally, you could keep the full 12M+ row table in a second database, run the calculations there, and have a job import the summaries into your application database to reduce write-lock contention.

I hope this helps a bit. For this much data (and I expect it to grow), SQL Server without very custom optimization is not the best fit; there are faster options for managing and holding data at this scale, such as Bigtable, Elasticsearch/Solr, or cache-based systems.
"Cache" and "queue" are the keywords, I think.
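
To make the "cache completed periods, compute only the current one live" idea concrete, here is a minimal T-SQL sketch. All table and column names (dbo.HourlyStats, dbo.AnalyticsEvents, EventTime, Converted) are hypothetical, and it assumes the summary job only ever writes fully closed hours:

    -- Start of the current, still-open hour (UTC).
    DECLARE @hourStart datetime2 =
        DATEADD(hour, DATEDIFF(hour, 0, SYSUTCDATETIME()), 0);

    SELECT StatDate, StatHour, Views, Conversions
    FROM dbo.HourlyStats                        -- pre-aggregated, closed hours only
    WHERE StatDate >= DATEADD(day, -7, CONVERT(date, @hourStart))

    UNION ALL

    SELECT CONVERT(date, EventTime),            -- live slice: current hour only
           DATEPART(hour, EventTime),
           COUNT(*),
           SUM(CASE WHEN Converted = 1 THEN 1 ELSE 0 END)
    FROM dbo.AnalyticsEvents
    WHERE EventTime >= @hourStart
    GROUP BY CONVERT(date, EventTime), DATEPART(hour, EventTime);

With this shape the big table is scanned for at most one hour of rows, so the dashboard query stays cheap no matter how much history accumulates.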






answered Sep 15 '15 at 21:11 by Gummibeer