Database performance improvements for current setup (MySQL - MariaDB)


I've currently got a pretty hefty database (for me, anyway) with multiple tables exceeding 15 million rows. I'm trying to improve the performance of the full table lookups. I implemented memcached to cache the already-parsed results, but the initial lookups are very slow: 20+ seconds on a big query.



SELECT `something`, ROUND(SUM( ( `amount` / `something_of_value` ) * `column` )) FROM ... WHERE ... (the WHERE clause is huge, built from many conditionals)

Current setup:

  • A managed VPS server
  • InnoDB storage for the big tables
  • 10.0.36-MariaDB
  • 18 GB RAM
  • 8 x 2.40 GHz CPUs




We don't use query_cache because 25% of our queries are writes, and from what I've researched, any write/update to a table invalidates its query cache entries. Is this the correct approach?



What's the best way to improve the initial lookup without upgrading the hardware?
I probably left out a lot of important details, so if you need more info just ask.
If you can steer me in a good general direction, that would be appreciated too. I know about query optimization, but we are trying to improve the lookup time and MySQL performance in general.



EDIT:



full query:



SELECT  `metric`, SUM( `amount` )
FROM big_table
WHERE 1
AND (`metric` = '1' )
AND (`source` = 'some_unique_source'
OR `source` = 'some_other_unique_source'
OR `source` = 'yet_another_unique_source'
OR `source` = 'some_most_unique_source'
OR `source` = 'almost_last_unique_source'
OR `source` = 'the_last_unique_source'
)
AND (`platform` = '2'
OR `platform` = '1'
OR `platform` = '3'
)
AND (`account` = '1'
OR `account` = '2'
OR `account` = '3'
OR `account` = '4'
OR `account` = '5'
OR `account` = '6'
OR `account` = '7'
OR `account` = '8'
OR `account` = '9'
OR `account` = '10'
OR `account` = '11'
OR `account` = '12'
)
AND (`something_id` = 'some.unique.id' )
AND `target_date` >= '2018-08-27'
AND `target_date` <= '2018-08-27'
GROUP BY `metric`;


Create query:



CREATE TABLE `big_table` (
`stat_id` int(8) NOT NULL AUTO_INCREMENT,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`target_date` date DEFAULT NULL,
`country` varchar(2) DEFAULT NULL ,
`version` varchar(16) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
`tax` decimal(12,6) NOT NULL DEFAULT '0.000000' ,
`currency` varchar(3) DEFAULT NULL,
`currency_rate` decimal(12,6) DEFAULT '500.000000',
`rate_updated` int(11) NOT NULL DEFAULT '0',
`multiplier` decimal(16,6) NOT NULL DEFAULT '1.000000',
`unique_key` varchar(180) DEFAULT NULL ,
`caption` varchar(128) DEFAULT NULL,
`transaction_timestamp` timestamp NULL DEFAULT NULL ,
`finalised` tinyint(1) NOT NULL DEFAULT '0',
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`stat_id`),
UNIQUE KEY `unique_key` (`unique_key`) USING BTREE,
KEY `target_date` (`target_date`,`currency`),
KEY `index_country` (`country`),
KEY `currency_rate` (`currency_rate`,`multiplier`)
) ENGINE=InnoDB AUTO_INCREMENT=43453089 DEFAULT CHARSET=utf8




Edit:



The date is always one day: the script runs a foreach over each date in the user-supplied date range. Each MySQL result is parsed into a multidimensional array and then written to a JSON file once the date range finishes.
Now that I think about it, a better approach may be a smarter query where the result is grouped by date (sketched below), though I don't know how much that would improve speed. There are 5 main tabs, each selecting a different [main] thing: grouping and selecting accounts, metrics, sources, platforms, countries or something_id; the WHERE clauses are also constructed from user input and may differ. This is a custom analytics dashboard, if that helps explain what we are using this for.
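
For reference, here is a minimal sketch of what I mean by grouping by date (it reuses two of the filters from the full query above; the 30-day endpoints are just an example):

SELECT `target_date`, `metric`, SUM( `amount` )
FROM big_table
WHERE (`metric` = '1' )
AND (`something_id` = 'some.unique.id' )
AND `target_date` >= '2018-08-01'
AND `target_date` <= '2018-08-30'
GROUP BY `target_date`, `metric`;

One query like this over the whole range would replace the 30 per-day round trips, at the cost of a larger result set to parse.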



A lot of different selects can be chosen by the user, and a custom query is constructed from that input. I've reduced the query size by excluding the countries, because by default it loaded all 250+ countries as WHERE clauses, making the query length pretty ridiculous and embarrassing. For clarification: all countries are marked as selected by default. Pasting the query into this question made me realise it could be vastly improved. Removing the countries when all are selected reduced the load time (with parsing) from ~21 seconds to 8-10 seconds (a 30-day foreach loop with basically the SELECT you see, on ~14 million rows). However, if the user excludes even one country, the SQL is constructed with ~250 country conditions. I'm thinking about an algorithm where, if only a few countries are unselected, it generates WHERE country <> 'unselected' instead of listing all the selected ones, ultimately making the query smaller.










  • yes, not using query_cache for this case is a good idea. You could try other optimizer_switch settings. Overall you need to improve the query and the tables. Can you show the EXPLAIN {query} output and SHOW CREATE TABLE {tblname}? Do you think you have indexed the table(s) right? Have you used JOIN rather than LEFT JOIN wherever possible? For multiple conditions on a single table, have you considered a generated column with an index?

    – danblack
    Sep 22 '18 at 8:00











  • explain output -> id: 1, select_type: SIMPLE, table: table_name, type: range, possible_keys: target_date,index_country,metric, key: target_date, key_len: 4, ref: NULL, rows: 153856, Extra: Using index condition; Using where; Using temporary; Using filesort. I'm not using joins for this table at all; this query took 144 seconds in the slow_query log

    – PaaPs
    Sep 24 '18 at 10:01













  • This is rather incomprehensible. Please edit it into the question and format it neatly, and include the SHOW CREATE TABLE {table} information for the table used. A full query would also help a lot.

    – danblack
    Sep 25 '18 at 0:29











  • added the create and explain queries.

    – PaaPs
    Sep 26 '18 at 13:38











  • Great start. Is it most common for this query to be on a small date range with respect to the size of the data? Is metric=1 always used? Your list of sources, is this always the list that is used? Is the platform list always used? Is the account list always used? Do these sets of sources/platforms/accounts correspond to a logical grouping?

    – danblack
    Sep 26 '18 at 22:33
















3 Answers

Since you are going to be grabbing 30 days, you may as well use a GROUP BY target_date and put target_date in the result fields; that will save some query parsing overhead.

The query always iterates over a range of target_date values, so that is likely the best column to start the index with. To save the query from going back to the table space to fetch each row, either to filter it further or to read the result value (i.e. the amount), the index should start with the date and include all fields of the WHERE clause plus the amount. As this ends up being a largish index, I recommend creating it separately from the existing target_date index.

To make this query faster when the user selects, say, one country (assuming that is a common case), an index on country, target_date, {other fields} will aid that form of the query. Likewise if a single metric, something_id or other field is commonly used as a single-value selection (i.e. x=5, but not x=5 OR x=9).
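
A rough sketch of such a covering index (the column order and the index name here only illustrate the idea, they are not tested against your data):

CREATE INDEX idx_date_covering ON big_table
(target_date, metric, source, platform, account, something_id, country, amount);

Because every column the query touches is then in the index, InnoDB can answer it from the index alone instead of fetching each row from the table space.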






























  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? Well, that depends on the query benefit gained. Obviously more indexes eventually slow inserts. The alternative is getting a form of summary table going, grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased by 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51

































Option 2: since the target_date data is generally static after creation, you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
KEY `target_date` (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index the fields that are likely to be filtered on a single value (or just a few values).



Populate it with data grouped by all the parameters, with the amount totalled:



INSERT INTO summary_of_big_table
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country
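
To keep the summary current without rebuilding it from scratch (see the comments below), a rough sketch of topping it up for a single finished day; the date literal is only an example:

INSERT INTO summary_of_big_table
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
WHERE target_date = '2018-08-27'
GROUP BY target_date, metric, source,
platform, account, something_id, country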





























  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go through the hassle of rebuilding the table and reimporting the data right now. But I will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55

































You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE, then simply say target_date = '2018-08-27'. This will help in designing the optimal index.



Start with the = columns (in any order):



INDEX(something_id, metric, target_date,
xxxx) -- then add one other column in the WHERE


If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query. So put a little more effort into the UI to optimize things as you go.



Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes. So design indexes that are likely to be used and start with the = columns. Don't bother with indexes longer than, say, 4 columns.



Other issues:





  • int(2) -- INT always takes 4 bytes; see TINYINT (etc) for space-savings.


  • DEFAULT NULL -- use NOT NULL when appropriate.

  • Change the ORs to INs. This won't improve performance, but it will be cleaner (see the sketch after this list).

  • After moving to IN, you can say country NOT IN (short-list).

  • A Comment mentions JOIN vs LEFT JOIN, but I don't see either??
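
For illustration, the posted WHERE clause rewritten that way (the values are copied from the question; the two excluded country codes are made-up placeholders, and the platform test is dropped per the note above, assuming only those 3 platforms exist):

SELECT `metric`, SUM( `amount` )
FROM big_table
WHERE `metric` = '1'
AND `source` IN ('some_unique_source', 'some_other_unique_source', 'yet_another_unique_source',
'some_most_unique_source', 'almost_last_unique_source', 'the_last_unique_source')
AND `account` IN ('1','2','3','4','5','6','7','8','9','10','11','12')
AND `something_id` = 'some.unique.id'
AND `country` NOT IN ('AA', 'BB')  -- placeholder codes for the few unselected countries
AND `target_date` = '2018-08-27'
GROUP BY `metric`;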


If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.



(I added some Comments about summary tables onto danblack's Answers.)



Variants:





  • foo LIKE '%something' -- can't use foo part of index due to leading wildcard


  • foo <> 'blah' or foo NOT IN ('blah') -- probably treated like a "range", therefore, foo may be harmful (to performance) if too early in index; may be helpful if last.


  • country NOT IN ('Foostan') versus country IN (very-long-list) -- the very-long-list takes some time to parse, etc; so NOT IN will be slightly better.

  • Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization.


Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes -- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with. The slowlog is one way to do this.
































  • The JOIN comment was from before a query was posted. On space saving: are non-UTF8 varchar columns going to help where the dataset is only Latin?

    – danblack
    Oct 1 '18 at 3:28











  • @danblack - "abc" will take 4 bytes in a VARCHAR(20) column, regardless of the CHARACTER SET of that column. (Well, except for utf16 and a few others.) CHAR(..) may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.

    – Rick James
    Oct 1 '18 at 4:53













  • I altered the table information and query information before posting; this is causing some confusion, sorry for that. In reality the script only creates one-day date ranges. Some of the query variations include a something_id LIKE '%something%' and variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization: excluding WHERE selects if all options are selected, and generating a <> 'something' if fewer options are unselected than selected.

    – PaaPs
    Oct 1 '18 at 5:51











  • @PaaPs - I responded to your Comment.

    – Rick James
    Oct 1 '18 at 15:52











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f218265%2fdatabase-performance-improvements-for-current-setup-mysql-marriadb%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).






share|improve this answer
























  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51
















0














As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).






share|improve this answer
























  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51














0












0








0







As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).






share|improve this answer













As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).







share|improve this answer












share|improve this answer



share|improve this answer










answered Sep 27 '18 at 8:26









danblackdanblack

2,1161214




2,1161214













  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51



















  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51

















so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

– PaaPs
Sep 27 '18 at 8:43







so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

– PaaPs
Sep 27 '18 at 8:43















looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

– danblack
Sep 27 '18 at 8:47





looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

– danblack
Sep 27 '18 at 8:47













created the indexes, load time decreased but 1-2 seconds. Thank you

– PaaPs
Sep 27 '18 at 12:26





created the indexes, load time decreased but 1-2 seconds. Thank you

– PaaPs
Sep 27 '18 at 12:26













1-2 seconds for 30 days worth?

– danblack
Sep 30 '18 at 2:00





1-2 seconds for 30 days worth?

– danblack
Sep 30 '18 at 2:00













yes. The time was reduced from 8 secs to around 6.

– PaaPs
Sep 30 '18 at 9:51





yes. The time was reduced from 8 secs to around 6.

– PaaPs
Sep 30 '18 at 9:51













0














Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country





share|improve this answer
























  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55
















0














Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country





share|improve this answer
























  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55














0












0








0







Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country





share|improve this answer













Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country






share|improve this answer












share|improve this answer



share|improve this answer










answered Sep 30 '18 at 2:14









danblackdanblack

2,1161214




2,1161214













  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55



















You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE, then simply say target_date = '2018-08-27'. This will help in designing the optimal index.



Start with the = columns (in any order):



INDEX(something_id, metric, target_date,
xxxx) -- then add one other column in the WHERE


If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query. So put a little more effort into the UI to optimize things as you go.



Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes. So design indexes that are likely to be used, and start with the = columns. Don't bother with indexes longer than, say, 4 columns.
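
As a purely illustrative sketch (the useful column order depends on the WHERE clauses users actually generate), a couple of such composite indexes on big_table might look like:

-- Hypothetical composite indexes, = columns first; adjust to the common filters.
ALTER TABLE big_table
  ADD INDEX idx_metric_source_date (metric, source, target_date),
  ADD INDEX idx_something_metric_date (something_id, metric, target_date);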



Other issues:





  • int(2) -- INT always takes 4 bytes; see TINYINT (etc) for space-savings.


  • DEFAULT NULL -- use NOT NULL when appropriate.

  • Change the ORs to INs. This won't improve performance, but it will be cleaner (see the sketch after this list).

  • After moving to IN, you can say country NOT IN (short-list).

  • A Comment mentions JOIN vs LEFT JOIN, but I don't see either??
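
To illustrate the IN / NOT IN points above (the values are placeholders; the table and columns follow the rest of this thread):

-- Same effect as a chain of ORs, just shorter to read and to generate.
SELECT metric, SUM(amount)
FROM big_table
WHERE metric = 1
  AND source IN ('some_unique_source',
                 'some_other_unique_source',
                 'yet_another_unique_source')
  AND country NOT IN ('XX', 'YY')   -- short exclusion list instead of a very long IN list
  AND target_date = '2018-08-27'
GROUP BY metric;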


If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.



(I added some Comments about summary tables onto danblack's Answers.)



Variants:





  • foo LIKE '%something' -- the leading wildcard prevents an index on foo from being used


  • foo <> 'blah' or foo NOT IN ('blah') -- probably treated like a "range", therefore, foo may be harmful (to performance) if too early in index; may be helpful if last.


  • country NOT IN ('Foostan') versus country IN (very-long-list) -- the very-long-list takes some time to parse, etc; so NOT IN will be slightly better.

  • Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization.


Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes -- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with. The slowlog is one way to do this.
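
If you can change server settings, one way to capture them is the slow query log; these are standard MySQL/MariaDB variables, and the threshold below is only an example:

-- Log statements slower than 2 seconds so future indexes can be designed from real usage.
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 2;
SET GLOBAL log_queries_not_using_indexes = 1;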






answered Oct 1 '18 at 2:52 by Rick James (edited Oct 1 '18 at 15:51)

  • The JOIN comment was made before a query was posted. On space saving: would non-UTF8 varchar columns help where the dataset is only Latin characters?

    – danblack
    Oct 1 '18 at 3:28











  • @danblack - "abc" will take 4 bytes in a VARCHAR(20) column, regardless of the CHARACTER SET of that column. (Well, except for utf16 and a few others.) CHAR(..) may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.

    – Rick James
    Oct 1 '18 at 4:53
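
    For completeness, the kind of change being hinted at would look like the following (purely illustrative; as noted above, for VARCHAR it mostly affects in-memory temporary tables rather than on-disk size):

    -- Hypothetical: declare a column that only ever holds ASCII text as ascii instead of utf8.
    ALTER TABLE big_table
      MODIFY `source` varchar(16) CHARACTER SET ascii DEFAULT NULL;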













  • I altered the table and query information before posting, which is causing some confusion; sorry for that. In reality the script only creates one-day date ranges. Some of the query variations include a something_id LIKE '%something%', and variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization: excluding WHERE conditions when all options are selected, and generating a <> 'something' when fewer options are unselected than selected.

    – PaaPs
    Oct 1 '18 at 5:51











  • @PaaPs - I responded to your Comment.

    – Rick James
    Oct 1 '18 at 15:52















