当前位置:  开发笔记 > 前端 > 正文

从sql server中的字符串中删除html标签的最佳方法?

如何解决《从sqlserver中的字符串中删除html标签的最佳方法?》经验,为你挑选了4个好方法。

我在SQL Server 2005中有包含html标签的数据,我想将所有内容删除,只留下标签之间的文本.理想的情况下也更换喜欢的东西<<

有没有一种简单的方法可以做到这一点,或者有人已经有一些示例t-sql代码?

我没有能力添加扩展存储过程等,所以更喜欢纯t-sql方法(最好是一个向后兼容sql 2000).

我只想用剥离的html检索数据,而不是更新它,所以理想情况下它会被写成用户定义的函数,以便于重用.

所以例如转换这个:

Some useful text 

   
more text

对此:

Some useful text > more text

RedFilter.. 152

有一个UDF将执行此处描述的操作:

剥离HTML的用户定义函数

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT
    SET @Start = CHARINDEX('<',@HTMLText)
    SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
    SET @Length = (@End - @Start) + 1
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
        SET @Start = CHARINDEX('<',@HTMLText)
        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
        SET @Length = (@End - @Start) + 1
    END
    RETURN LTRIM(RTRIM(@HTMLText))
END
GO

编辑:请注意这是针对SQL Server 2005的,但是如果将关键字MAX更改为类似4000,它也可以在SQL Server 2000中使用.



1> RedFilter..:

有一个UDF将执行此处描述的操作:

剥离HTML的用户定义函数

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT
    SET @Start = CHARINDEX('<',@HTMLText)
    SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
    SET @Length = (@End - @Start) + 1
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
        SET @Start = CHARINDEX('<',@HTMLText)
        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
        SET @Length = (@End - @Start) + 1
    END
    RETURN LTRIM(RTRIM(@HTMLText))
END
GO

编辑:请注意这是针对SQL Server 2005的,但是如果将关键字MAX更改为类似4000,它也可以在SQL Server 2000中使用.


十分感谢.评论链接到改进版本:http://lazycoders.blogspot.com/2007/06/stripping-html-from-text-in-sql-server.html处理更多的html实体.
注意lazycoders帖子有两个拼写错误.从具有这些的两个部分中的`CHAR(13)+ CHAR(10)`周围删除单引号.微妙的我没有抓住它,直到它超过短场的长度(有趣的是,我需要,所有替换都比原始字符串短).
请注意,作为SQL Server 2005或更高版本中的字符串密集型UDF,**这是实现CLR UDF功能**的理想选择,可以大幅提升性能.关于这样做的更多信息:http://stackoverflow.com/questions/34509/natural-human-alpha-numeric-sort-in-microsoft-sql-2005/2060952#2060952
我使用了lazycoders,加上上面@goodeye的拼写错误修复 - 效果很好.为了节省时间,lazycoders博客版本在这里:http://lazycoders.blogspot.com/2007/06/stripping-html-from-text-in-sql-server.html

2> Patrick Hono..:

源自@Goner Doug的答案,更新了一些内容:
- 尽可能使用REPLACE
- 预定义实体的转换é(我选择了我需要的那些:-)
- 列表标签的一些转换

    and
  • ALTER FUNCTION [dbo].[udf_StripHTML]
    --by Patrick Honorez --- www.idevlop.com
    --inspired by http://stackoverflow.com/questions/457701/best-way-to-strip-html-tags-from-a-string-in-sql-server/39253602#39253602
    (
    @HTMLText varchar(MAX)
    )
    RETURNS varchar(MAX)
    AS
    BEGIN
    DECLARE @Start  int
    DECLARE @End    int
    DECLARE @Length int
    
    set @HTMLText = replace(@htmlText, '
    ',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '
    ',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '
    ',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '
  • ','- ') set @HTMLText = replace(@htmlText, '
  • ',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '’' collate Latin1_General_CS_AS, '''' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '"' collate Latin1_General_CS_AS, '"' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '&' collate Latin1_General_CS_AS, '&' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '€' collate Latin1_General_CS_AS, '€' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '<' collate Latin1_General_CS_AS, '<' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '>' collate Latin1_General_CS_AS, '>' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'œ' collate Latin1_General_CS_AS, 'oe' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, ' ' collate Latin1_General_CS_AS, ' ' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '©' collate Latin1_General_CS_AS, '©' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '«' collate Latin1_General_CS_AS, '«' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '®' collate Latin1_General_CS_AS, '®' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '±' collate Latin1_General_CS_AS, '±' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '²' collate Latin1_General_CS_AS, '²' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '³' collate Latin1_General_CS_AS, '³' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'µ' collate Latin1_General_CS_AS, 'µ' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '·' collate Latin1_General_CS_AS, '·' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'º' collate Latin1_General_CS_AS, 'º' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '»' collate Latin1_General_CS_AS, '»' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '¼' collate Latin1_General_CS_AS, '¼' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '½' collate Latin1_General_CS_AS, '½' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '¾' collate Latin1_General_CS_AS, '¾' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '&Aelig' collate Latin1_General_CS_AS, 'Æ' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'Ç' collate Latin1_General_CS_AS, 'Ç' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'È' collate Latin1_General_CS_AS, 'È' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'É' collate Latin1_General_CS_AS, 'É' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'Ê' collate Latin1_General_CS_AS, 'Ê' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'Ö' collate Latin1_General_CS_AS, 'Ö' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'à' collate Latin1_General_CS_AS, 'à' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'â' collate Latin1_General_CS_AS, 'â' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ä' collate Latin1_General_CS_AS, 'ä' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'æ' collate Latin1_General_CS_AS, 'æ' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ç' collate Latin1_General_CS_AS, 'ç' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'è' collate Latin1_General_CS_AS, 'è' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'é' collate Latin1_General_CS_AS, 'é' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ê' collate Latin1_General_CS_AS, 'ê' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ë' collate Latin1_General_CS_AS, 'ë' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'î' collate Latin1_General_CS_AS, 'î' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ô' collate Latin1_General_CS_AS, 'ô' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ö' collate Latin1_General_CS_AS, 'ö' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '÷' collate Latin1_General_CS_AS, '÷' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ø' collate Latin1_General_CS_AS, 'ø' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ù' collate Latin1_General_CS_AS, 'ù' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ú' collate Latin1_General_CS_AS, 'ú' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'û' collate Latin1_General_CS_AS, 'û' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, 'ü' collate Latin1_General_CS_AS, 'ü' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '"' collate Latin1_General_CS_AS, '"' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '&' collate Latin1_General_CS_AS, '&' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '‹' collate Latin1_General_CS_AS, '<' collate Latin1_General_CS_AS) set @HTMLText = replace(@htmlText, '›' collate Latin1_General_CS_AS, '>' collate Latin1_General_CS_AS) -- Remove anything between ', @HTMLText, CHARINDEX('<', @HTMLText)) + 7 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('', @HTMLText, CHARINDEX('', @HTMLText)) + 7 SET @Length = (@End - @Start) + 1 END -- Remove anything between tags SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END


    我已经使用了这个并且喜欢它,但我确实在顶级组中添加了一个替换:我也改为char 13 + char 10,因为段落标记的结尾通常表示新行.它在我的特定场景中完美运行

    3> dudeNumber4..:

    如果您的HTML格式正确,我认为这是一个更好的解决方案:

    create function dbo.StripHTML( @text varchar(max) ) returns varchar(max) as
    begin
        declare @textXML xml
        declare @result varchar(max)
        set @textXML = REPLACE( @text, '&', '' );
        with doc(contents) as
        (
            select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk)
        )
        select @result = contents.value('.', 'varchar(max)') from doc
        return @result
    end
    go
    
    select dbo.StripHTML('This is an html test')
    


    放入黑客攻击不对HTML代码进行轰炸.显然只是内部使用或其他任何事情的快速破解(就像接受的UDF一样).

    4> Goner Doug..:

    这是该函数的更新版本,它结合了RedFilter答案(Pinal的原始版本)和LazyCoders添加以及goodeye拼写错误修正和我自己添加来处理', @HTMLText, CHARINDEX('<', @HTMLText)) + 7 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('', @HTMLText, CHARINDEX('', @HTMLText)) + 7 SET @Length = (@End - @Start) + 1 END -- Remove anything between tags SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END

推荐阅读
小妖694_807
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有