当前位置:  开发笔记 > 编程语言 > 正文

在MySQL中自然排序

如何解决《在MySQL中自然排序》经验,为你挑选了9个好方法。

有没有一种优雅的方法在MySQL数据库中进行高性能的自然排序?

例如,如果我有这个数据集:

最终幻想

最终幻想4

最终幻想10

最终幻想12

最终幻想12:Promathia的链条

最终幻想冒险

最终幻想起源

最终幻想战术

任何其他优雅的解决方案,而不是将游戏的名称分解为他们的组件

标题:"最终幻想"

数量:"12"

副标题:"Promathia链"

确保它们以正确的顺序出现?(4后4,而不是2).

这样做是一个痛苦的**,因为时不时有另一个游戏打破了解析游戏标题的机制(例如"Warhammer 40,000","James Bond 007")



1> slotishtype..:

这是一个快速的解决方案:

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric


如果一切都是"最终幻想",那就太好了,但它把"高飞"放在FF套件之前.
在kludge上堆积kludge:`SELECT alphanumeric,integer FROM sorting_test ORDER BY SOUNDEX(alphanumeric),LENGTH(alphanumeric),alphanumeric`.如果这完全有效,那是因为SOUNDEX方便地丢弃数字,从而确保例如`apple1`在`z1`之前.
此解决方案始终不起作用.它有时会破裂.你应该使用这个:http://stackoverflow.com/a/12257917/384864

2> 小智..:

刚发现这个:

SELECT names FROM your_table ORDER BY games + 0 ASC

当数字位于前面时,自然排序也适用于中间.


在我的情况下(数字字段)工作得很好,谢谢.

3> Richard Toth..:

与@plalx发布的功能相同,但重写为MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

用法:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")


这是唯一真正有效的解决方案.我也测试过drupals代码,但有时会失败.谢啦!

4> Michael Hare..:

我认为这就是为什么很多东西按发布日期排序的原因.

解决方案可能是在表中为"SortKey"创建另一列.这可能是标题的清理版本,它符合您为便于排序或计数器而创建的模式.



5> plalx..:

刚刚MSSQL 2000编写了这个函数:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO



6> ConroyP..:

MySQL不允许这种"自然排序",因此看起来最好的方法就是如上所述拆分数据集(单独的id字段等),或者失败,根据非标题元素,数据库中的索引元素(日期,数据库中的插入ID等)执行排序.

让db为你做排序几乎总是比将大数据集读入你选择的编程语言并在那里排序更快,所以如果你在这里对db模式有任何控制,那么看看添加如上所述,易于排序的字段,从长远来看,它将为您节省大量的麻烦和维护.

在MySQL错误和讨论论坛上不时出现添加"自然排序"的请求,许多解决方案围绕剥离数据的特定部分并将其投射到ORDER BY查询的一部分,例如

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned) 

这种解决方案几乎可以用于上面的最终幻想示例,但不是特别灵活,不太可能干净地扩展到数据集,包括"Warhammer 40,000"和"James Bond 007"我害怕.



7> FilmJ..:

所以,虽然我知道你已经找到了一个令人满意的答案,但我一直在努力解决这个问题,我们之前已经确定它在SQL中不能很好地完成,我们将不得不在JSON上使用javascript阵列.

这是我使用SQL解决它的方法.希望这对其他人有帮助:

我有以下数据:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

我实际上并没有"投"一些东西,虽然我认为这也可能有效.

我首先替换了数据中不变的部分,在本例中是"场景",然后做了一个LPAD来排序.这似乎允许alpha字符串正确排序以及编号的字符串.

我的ORDER BY条款如下:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')

显然这对原始问题没有帮助,但原始问题并不是那么统一 - 但我想这可能适用于许多其他相关问题,所以把它放在那里.



8> JeeBee..:

    在表格中添加排序键(排名). ORDER BY rank

    使用"发布日期"列. ORDER BY release_date

    从SQL中提取数据时,让对象进行排序,例如,如果提取到Set中,使其成为TreeSet,并使您的数据模型实现Comparable并在此处制定自然排序算法(如果您使用插入排序就足够了)一个没有集合的语言)因为在创建模型并将其插入集合时,您将逐个从SQL读取行



9> 小智..:

关于Richard Toth的最佳回复/sf/ask/17360801/

注意包含2字节(或更多)字符和数字的UTF8编码字符串,例如

12 ???

LENGTH()udf_NaturalSortFormat函数中使用MySQL 将返回字符串的字节长度并且不正确,而使用CHAR_LENGTH()将返回正确的字符长度.

在我的情况下,使用LENGTH()导致的查询永远不会完成并导致MySQL的100%CPU使用率

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

ps我会把这个作为对原版的评论添加但我没有足够的声誉(还)

推荐阅读
李桂平2402851397
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有