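What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, sorting them, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.

Example data:

    id | val
    --------
     1    4
     2    7
     3    2
     4    2
     5    9
     6    8
     7    3

Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, versus SELECT AVG(val), which == 5.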
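For reference, the fetch-sort-pick approach described in the question can be written in a few lines. The sketch below uses Python rather than PHP, and the `median` helper name is illustrative only; it is a baseline to compare the SQL answers against, not a replacement for them.

```python
def median(values):
    """Sort the values and return the middle one
    (the mean of the two middle ones for an even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


vals = [4, 7, 2, 2, 9, 8, 3]   # the val column from the example data
print(sorted(vals))            # [2, 2, 3, 4, 7, 8, 9]
print(median(vals))            # 4
print(sum(vals) / len(vals))   # 5.0
```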
In MariaDB/MySQL:

    SELECT AVG(dd.val) as median_val
    FROM (
      SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
      FROM data d, (SELECT @rownum:=0) r
      WHERE d.val is NOT NULL
      -- put some where clause here
      ORDER BY d.val
    ) as dd
    WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
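Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.

Also, AVG(dd.val) and dd.row_number IN(...) are used to correctly produce a median when there is an even number of records. Reasoning: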
    SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
    SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
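The row-number arithmetic can also be checked outside the database. The sketch below mirrors the two FLOOR expressions in Python; `middle_rows` and `median_by_row_number` are illustrative names, not part of the answer.

```python
from math import floor


def middle_rows(total_rows):
    """1-based row numbers selected by the IN (...) list in the query."""
    return floor((total_rows + 1) / 2), floor((total_rows + 2) / 2)


def median_by_row_number(values):
    """Average the values at the selected rows, as AVG(dd.val) does."""
    ordered = sorted(values)
    rows = set(middle_rows(len(ordered)))  # IN (...) matches each row once
    picked = [ordered[r - 1] for r in rows]
    return sum(picked) / len(picked)


print(middle_rows(3))  # (2, 2)
print(middle_rows(4))  # (2, 3)
print(median_by_row_number([4, 7, 2, 2, 9, 8, 3]))  # 4.0
print(median_by_row_number([1, 2, 3, 4]))           # 2.5
```

For an odd count both expressions name the same row, so the average is just that row's value; for an even count they name the two middle rows.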
Finally, MariaDB 10.3.3+ contains a MEDIAN function.
I just found another answer online in the comments:

For medians in almost any SQL:

    SELECT x.val from data x, data y
    GROUP BY x.val
    HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
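The nested SIGN expression is easier to follow once it is unpacked: SIGN(1-SIGN(y.val-x.val)) is 1 when y.val <= x.val and 0 otherwise, so the SUM counts how many rows sit at or below x.val. The Python sketch below emulates the cross join and HAVING clause on a plain list, assuming one group per distinct value; `sign` and `sign_join_median` are illustrative names.

```python
def sign(x):
    """Python equivalent of SQL SIGN(): -1, 0 or 1."""
    return (x > 0) - (x < 0)


def sign_join_median(values):
    """Emulate the cross join, GROUP BY x.val and HAVING clause on a list."""
    n = len(values)
    result = []
    for x in sorted(set(values)):            # one group per distinct x.val
        copies = values.count(x)             # rows of x sharing this value
        # SIGN(1 - SIGN(y - x)) is 1 when y <= x and 0 when y > x
        at_or_below = sum(sign(1 - sign(y - x)) for y in values)
        group_sum = copies * at_or_below     # SUM(...) over the group
        group_count = copies * n             # COUNT(*) over the group
        if group_sum == (group_count + 1) / 2:
            result.append(x)
    return result


print(sign_join_median([4, 7, 2, 2, 9, 8, 3]))  # [4]
print(sign_join_median([1, 2, 3, 4]))           # []
```

In this emulation an even number of rows yields no match at all, because (COUNT(*)+1)/2 is then not a whole number.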
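Make sure your columns are well indexed and that the index is used for filtering and sorting. Verify with the explain plans.

    select count(*) from table -- find the number of rows

Calculate the "median" row number. Maybe use: median_row = floor(count / 2).

Then pick it out of the list:

    select val from table order by val asc limit median_row,1

This should return one row with just the value you want.

Jacob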
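The two-step procedure above can be sketched end to end as follows. This uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and a table named `data` holding the example rows from the question; both are assumptions made for illustration, and `median_row_value` is a hypothetical helper name.

```python
import sqlite3


def median_row_value(conn):
    """Count the rows, then fetch the single row at offset floor(count / 2)."""
    (count,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
    median_row = count // 2  # floor(count / 2), used as a 0-based offset
    row = conn.execute(
        "SELECT val FROM data ORDER BY val ASC LIMIT 1 OFFSET ?",
        (median_row,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO data (id, val) VALUES (?, ?)",
    [(1, 4), (2, 7), (3, 2), (4, 2), (5, 9), (6, 8), (7, 3)],
)
print(median_row_value(conn))  # 4
```

Note that with an even row count the offset lands on the upper of the two middle rows, and the two are not averaged.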
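I found that the accepted solution didn't work on my MySQL installation, returning an empty set, but this query worked for me in all the situations I tested it on:

    SELECT x.val from data x, data y
    GROUP BY x.val
    HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
    LIMIT 1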
Unfortunately, neither TheJacobTaylor's nor velcro's answers return accurate results for current versions of MySQL.

Velcro's answer from above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number on odd-numbered sets, or 2) the average of the two middle numbers on even-numbered sets.

So, here's velcro's solution patched to handle both odd- and even-numbered sets:

    SELECT AVG(middle_values) AS 'median' FROM (
      SELECT t1.median_column AS 'middle_values' FROM
        (
          SELECT @row:=@row+1 as `row`, x.median_column
          FROM median_table AS x, (SELECT @row:=0) AS r
          WHERE 1
          -- put some where clause here
          ORDER BY x.median_column
        ) AS t1,
        (
          SELECT COUNT(*) as 'count'
          FROM median_table x
          WHERE 1
          -- put same where clause here
        ) AS t2
        -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
        WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
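To use this, follow these 3 easy steps:

1) Replace "median_table" (2 occurrences) in the above code with the name of your table
2) Replace "median_column" (3 occurrences) with the name of the column you'd like to find a median for
3) If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition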
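The row-selection condition in the patched query can be checked on its own. The Python sketch below lists which 1-based row numbers satisfy `row >= count/2 AND row <= count/2 + 1`; `rows_kept` is an illustrative name, and the division is assumed to be non-integer, as with MySQL's `/` operator.

```python
def rows_kept(count):
    """1-based row numbers passing: row >= count/2 AND row <= count/2 + 1."""
    return [row for row in range(1, count + 1)
            if count / 2 <= row <= count / 2 + 1]


for count in (1, 4, 7):
    print(count, rows_kept(count))
# 1 [1]
# 4 [2, 3]
# 7 [4]
```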
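I propose a faster way.

Get the row count:

    SELECT CEIL(COUNT(*)/2) FROM data;

Then take the middle value in a sorted subquery (with @middlevalue holding the result of the first query):

    SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;

I tested this with a 5x10e6 dataset of random numbers, and it finds the median in under 10 seconds.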
A comment on this page in the MySQL documentation has the following suggestion:

    -- (mostly) High Performance scaling MEDIAN function per group
    -- Median defined in http://en.wikipedia.org/wiki/Median
    --
    -- by Peter Hlavac
    -- 06.11.2008
    --
    -- Example Table:

    DROP table if exists table_median;
    CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
    COMMIT;

    INSERT INTO table_median (id, val) VALUES
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7);

    -- Calculating the MEDIAN
    SELECT @a := 0;
    SELECT
    id,
    AVG(val) AS MEDIAN
    FROM (
      SELECT
      id,
      val
      FROM (
        SELECT
        -- Create an index n for every id
        @a := (@a + 1) mod o.c AS shifted_n,
        IF(@a mod o.c=0, o.c, @a) AS n,
        o.id,
        o.val,
        -- the number of elements for every id
        o.c
        FROM (
          SELECT
          t_o.id,
          val,
          c
          FROM table_median t_o INNER JOIN
            (SELECT
            id,
            COUNT(1) AS c
            FROM table_median
            GROUP BY id
            ) t2
          ON (t2.id = t_o.id)
          ORDER BY t_o.id,val
        ) o
      ) a
      WHERE IF(
        -- if there is an even number of elements
        -- take the lower and the upper median
        -- and use AVG(lower,upper)
        c MOD 2 = 0,
        n = c DIV 2 OR n = (c DIV 2)+1,

        -- if its an odd number of elements
        -- take the first if its only one element
        -- or take the one in the middle
        IF(
          c = 1,
          n = 1,
          n = c DIV 2 + 1
        )
      )
    ) a
    GROUP BY id;

    -- Explanation:
    -- The Statement creates a helper table like
    --
    -- n id val count
    -- ----------------
    -- 1, 1, 1, 7
    -- 2, 1, 3, 7
    -- 3, 1, 4, 7
    -- 4, 1, 5, 7
    -- 5, 1, 6, 7
    -- 6, 1, 7, 7
    -- 7, 1, 8, 7
    --
    -- 1, 2, 4, 1
    -- 1, 3, 2, 2
    -- 2, 3, 5, 2
    --
    -- 1, 4, 1, 4
    -- 2, 4, 5, 4
    -- 3, 4, 7, 4
    -- 4, 4, 12, 4
    -- from there we can select the n-th element on the position: count div 2 + 1
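When trying a per-group query like the one above, it helps to have independent reference values to compare against. The sketch below computes the median per id for the same example rows using Python's standard `statistics.median`; it is only a cross-check and says nothing about how MySQL formats its AVG result.

```python
from collections import defaultdict
from statistics import median

# Same rows as the INSERT statement in the example table
rows = [(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
        (2, 4),
        (3, 5), (3, 2),
        (4, 5), (4, 12), (4, 1), (4, 7)]

# Collect the values belonging to each id
groups = defaultdict(list)
for group_id, val in rows:
    groups[group_id].append(val)

medians = {group_id: median(vals) for group_id, vals in sorted(groups.items())}
print(medians)  # {1: 5, 2: 4, 3: 3.5, 4: 6.0}
```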
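Most of the solutions above work for only one field of the table; you might need to get the median (50th percentile) for many fields in the query.

I use this:

    SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
     GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
      ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
    FROM table_name;

You can replace the "50" in the example above with any percentile; it is very efficient.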
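The nested SUBSTRING_INDEX calls work by first cutting the sorted, comma-joined list down to its first k items and then taking the last item of what remains. The Python sketch below imitates that mechanism with a hand-picked integer position `k`; `substring_index` is a rough stand-in written for this illustration, and how MySQL converts the possibly fractional `50/100 * COUNT(*) + 1` into a count is deliberately not modeled.

```python
def substring_index(text, delim, count):
    """Rough Python stand-in for MySQL SUBSTRING_INDEX with a non-zero integer count."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    return delim.join(parts[count:])       # everything after the count-th delimiter from the right


vals = [4, 7, 2, 2, 9, 8, 3]
concat = ",".join(str(v) for v in sorted(vals))  # like GROUP_CONCAT(... ORDER BY ...)
print(concat)                                     # 2,2,3,4,7,8,9

k = 4  # hypothetical 1-based position of the wanted element
head = substring_index(concat, ",", k)
print(head)                                       # 2,2,3,4
print(int(substring_index(head, ",", -1)))        # 4
```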
Just make sure you have enough memory for the GROUP_CONCAT; you can change the limit with:

    SET group_concat_max_len = 10485760; #10MB max length

More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/