有没有一种优雅的方法在MySQL数据库中进行高性能的自然排序?
例如,如果我有这个数据集:
最终幻想
最终幻想4
最终幻想10
最终幻想12
最终幻想12:Promathia的链条
最终幻想冒险
最终幻想起源
最终幻想战术
任何其他优雅的解决方案,而不是将游戏的名称分解为他们的组件
标题:"最终幻想"
数量:"12"
副标题:"Promathia链"
确保它们以正确的顺序出现?(4后4,而不是2).
这样做是一个痛苦的**,因为时不时有另一个游戏打破了解析游戏标题的机制(例如"Warhammer 40,000","James Bond 007")
这是一个快速的解决方案:
SELECT alphanumeric, integer FROM sorting_test ORDER BY LENGTH(alphanumeric), alphanumeric
刚发现这个:
SELECT names FROM your_table ORDER BY games + 0 ASC
当数字位于前面时,自然排序也适用于中间.
与@plalx发布的功能相同,但重写为MySQL:
DROP FUNCTION IF EXISTS `udf_FirstNumberPos`; DELIMITER ;; CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) RETURNS int LANGUAGE SQL DETERMINISTIC NO SQL SQL SECURITY INVOKER BEGIN DECLARE position int; DECLARE tmp_position int; SET position = 5000; SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; IF (position = 5000) THEN RETURN 0; END IF; RETURN position; END ;; DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`; DELIMITER ;; CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) RETURNS varchar(4000) LANGUAGE SQL DETERMINISTIC NO SQL SQL SECURITY INVOKER BEGIN DECLARE sortString varchar(4000); DECLARE numStartIndex int; DECLARE numEndIndex int; DECLARE padLength int; DECLARE totalPadLength int; DECLARE i int; DECLARE sameOrderCharsLen int; SET totalPadLength = 0; SET instring = TRIM(instring); SET sortString = instring; SET numStartIndex = udf_FirstNumberPos(instring); SET numEndIndex = 0; SET i = 1; SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars); WHILE (i <= sameOrderCharsLen) DO SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' '); SET i = i + 1; END WHILE; WHILE (numStartIndex <> 0) DO SET numStartIndex = numStartIndex + numEndIndex; SET numEndIndex = numStartIndex; WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO SET numEndIndex = numEndIndex + 1; END WHILE; SET numEndIndex = numEndIndex - 1; SET padLength = numberLength - (numEndIndex + 1 - numStartIndex); IF padLength < 0 THEN SET padLength = 0; END IF; SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength)); SET totalPadLength = totalPadLength + padLength; SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex)); END WHILE; RETURN sortString; END ;;
用法:
SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")
我认为这就是为什么很多东西按发布日期排序的原因.
解决方案可能是在表中为"SortKey"创建另一列.这可能是标题的清理版本,它符合您为便于排序或计数器而创建的模式.
我刚刚为MSSQL 2000编写了这个函数:
/** * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings. * * @author Alexandre Potvin Latreille (plalx) * @param {nvarchar(4000)} string The formatted string. * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10. * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string. * * @return {nvarchar(4000)} A string for natural sorting. * Example of use: * * SELECT Name FROM TableA ORDER BY Name * TableA (unordered) TableA (ordered) * ------------ ------------ * ID Name ID Name * 1. A1. 1. A1-1. * 2. A1-1. 2. A1. * 3. R1 --> 3. R1 * 4. R11 4. R11 * 5. R2 5. R2 * * * As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it. * We can use this function to fix this. * * SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-') * TableA (unordered) TableA (ordered) * ------------ ------------ * ID Name ID Name * 1. A1. 1. A1. * 2. A1-1. 2. A1-1. * 3. R1 --> 3. R1 * 4. R11 4. R2 * 5. R2 5. R11 */ CREATE FUNCTION dbo.udf_NaturalSortFormat( @string nvarchar(4000), @numberLength int = 10, @sameOrderChars char(50) = '' ) RETURNS varchar(4000) AS BEGIN DECLARE @sortString varchar(4000), @numStartIndex int, @numEndIndex int, @padLength int, @totalPadLength int, @i int, @sameOrderCharsLen int; SELECT @totalPadLength = 0, @string = RTRIM(LTRIM(@string)), @sortString = @string, @numStartIndex = PATINDEX('%[0-9]%', @string), @numEndIndex = 0, @i = 1, @sameOrderCharsLen = LEN(@sameOrderChars); -- Replace all char that has to have the same order by a space. WHILE (@i <= @sameOrderCharsLen) BEGIN SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' '); SET @i = @i + 1; END -- Pad numbers with zeros. WHILE (@numStartIndex <> 0) BEGIN SET @numStartIndex = @numStartIndex + @numEndIndex; SET @numEndIndex = @numStartIndex; WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1) BEGIN SET @numEndIndex = @numEndIndex + 1; END SET @numEndIndex = @numEndIndex - 1; SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex); IF @padLength < 0 BEGIN SET @padLength = 0; END SET @sortString = STUFF( @sortString, @numStartIndex + @totalPadLength, 0, REPLICATE('0', @padLength) ); SET @totalPadLength = @totalPadLength + @padLength; SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex)); END RETURN @sortString; END GO
MySQL不允许这种"自然排序",因此看起来最好的方法就是如上所述拆分数据集(单独的id字段等),或者失败,根据非标题元素,数据库中的索引元素(日期,数据库中的插入ID等)执行排序.
让db为你做排序几乎总是比将大数据集读入你选择的编程语言并在那里排序更快,所以如果你在这里对db模式有任何控制,那么看看添加如上所述,易于排序的字段,从长远来看,它将为您节省大量的麻烦和维护.
在MySQL错误和讨论论坛上不时出现添加"自然排序"的请求,许多解决方案围绕剥离数据的特定部分并将其投射到ORDER BY
查询的一部分,例如
SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned)
这种解决方案几乎可以用于上面的最终幻想示例,但不是特别灵活,不太可能干净地扩展到数据集,包括"Warhammer 40,000"和"James Bond 007"我害怕.
所以,虽然我知道你已经找到了一个令人满意的答案,但我一直在努力解决这个问题,我们之前已经确定它在SQL中不能很好地完成,我们将不得不在JSON上使用javascript阵列.
这是我使用SQL解决它的方法.希望这对其他人有帮助:
我有以下数据:
Scene 1 Scene 1A Scene 1B Scene 2A Scene 3 ... Scene 101 Scene XXA1 Scene XXA2
我实际上并没有"投"一些东西,虽然我认为这也可能有效.
我首先替换了数据中不变的部分,在本例中是"场景",然后做了一个LPAD来排序.这似乎允许alpha字符串正确排序以及编号的字符串.
我的ORDER BY
条款如下:
ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')
显然这对原始问题没有帮助,但原始问题并不是那么统一 - 但我想这可能适用于许多其他相关问题,所以把它放在那里.
在表格中添加排序键(排名). ORDER BY rank
使用"发布日期"列. ORDER BY release_date
从SQL中提取数据时,让对象进行排序,例如,如果提取到Set中,使其成为TreeSet,并使您的数据模型实现Comparable并在此处制定自然排序算法(如果您使用插入排序就足够了)一个没有集合的语言)因为在创建模型并将其插入集合时,您将逐个从SQL读取行
关于Richard Toth的最佳回复/sf/ask/17360801/
注意包含2字节(或更多)字符和数字的UTF8编码字符串,例如
12 ???
LENGTH()
在udf_NaturalSortFormat
函数中使用MySQL 将返回字符串的字节长度并且不正确,而使用CHAR_LENGTH()
将返回正确的字符长度.
在我的情况下,使用LENGTH()
导致的查询永远不会完成并导致MySQL的100%CPU使用率
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`; DELIMITER ;; CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) RETURNS varchar(4000) LANGUAGE SQL DETERMINISTIC NO SQL SQL SECURITY INVOKER BEGIN DECLARE sortString varchar(4000); DECLARE numStartIndex int; DECLARE numEndIndex int; DECLARE padLength int; DECLARE totalPadLength int; DECLARE i int; DECLARE sameOrderCharsLen int; SET totalPadLength = 0; SET instring = TRIM(instring); SET sortString = instring; SET numStartIndex = udf_FirstNumberPos(instring); SET numEndIndex = 0; SET i = 1; SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars); WHILE (i <= sameOrderCharsLen) DO SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' '); SET i = i + 1; END WHILE; WHILE (numStartIndex <> 0) DO SET numStartIndex = numStartIndex + numEndIndex; SET numEndIndex = numStartIndex; WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO SET numEndIndex = numEndIndex + 1; END WHILE; SET numEndIndex = numEndIndex - 1; SET padLength = numberLength - (numEndIndex + 1 - numStartIndex); IF padLength < 0 THEN SET padLength = 0; END IF; SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength)); SET totalPadLength = totalPadLength + padLength; SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex)); END WHILE; RETURN sortString; END ;;
ps我会把这个作为对原版的评论添加但我没有足够的声誉(还)