我在SQL Server 2005中有包含html标签的数据,我想将所有内容删除,只留下标签之间的文本.理想的情况下也更换喜欢的东西<
带<
等
有没有一种简单的方法可以做到这一点,或者有人已经有一些示例t-sql代码?
我没有能力添加扩展存储过程等,所以更喜欢纯t-sql方法(最好是一个向后兼容sql 2000).
我只想用剥离的html检索数据,而不是更新它,所以理想情况下它会被写成用户定义的函数,以便于重用.
所以例如转换这个:
Some useful text > more text
对此:
Some useful text > more text
RedFilter.. 152
有一个UDF将执行此处描述的操作:
剥离HTML的用户定义函数
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX)) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Start INT DECLARE @End INT DECLARE @Length INT SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 WHILE @Start > 0 AND @End > 0 AND @Length > 0 BEGIN SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'') SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END GO
编辑:请注意这是针对SQL Server 2005的,但是如果将关键字MAX更改为类似4000,它也可以在SQL Server 2000中使用.
有一个UDF将执行此处描述的操作:
剥离HTML的用户定义函数
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX)) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Start INT DECLARE @End INT DECLARE @Length INT SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 WHILE @Start > 0 AND @End > 0 AND @Length > 0 BEGIN SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'') SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END GO
编辑:请注意这是针对SQL Server 2005的,但是如果将关键字MAX更改为类似4000,它也可以在SQL Server 2000中使用.
源自@Goner Doug的答案,更新了一些内容:
- 尽可能使用REPLACE
- 预定义实体的转换é
(我选择了我需要的那些:-)
- 列表标签的一些转换 and
ALTER FUNCTION [dbo].[udf_StripHTML] --by Patrick Honorez --- www.idevlop.com --inspired by http://stackoverflow.com/questions/457701/best-way-to-strip-html-tags-from-a-string-in-sql-server/39253602#39253602 ( @HTMLText varchar(MAX) ) RETURNS varchar(MAX) AS BEGIN DECLARE @Start int DECLARE @End int DECLARE @Length int set @HTMLText = replace(@htmlText, '
',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '
',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '
',CHAR(13) + CHAR(10)) set @HTMLText = replace(@htmlText, '
如果您的HTML格式正确,我认为这是一个更好的解决方案:
create function dbo.StripHTML( @text varchar(max) ) returns varchar(max) as begin declare @textXML xml declare @result varchar(max) set @textXML = REPLACE( @text, '&', '' ); with doc(contents) as ( select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk) ) select @result = contents.value('.', 'varchar(max)') from doc return @result end go select dbo.StripHTML('This is an html test')
这是该函数的更新版本,它结合了RedFilter答案(Pinal的原始版本)和LazyCoders添加以及goodeye拼写错误修正和我自己添加来处理HTML中的内嵌标记.
ALTER FUNCTION [dbo].[udf_StripHTML] ( @HTMLText varchar(MAX) ) RETURNS varchar(MAX) AS BEGIN DECLARE @Start int DECLARE @End int DECLARE @Length int -- Replace the HTML entity & with the '&' character (this needs to be done first, as -- '&' might be double encoded as '&') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity < with the '<' character SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<') SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity > with the '>' character SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>') SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity & with the '&' character SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity with the ' ' character SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ') SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Replace any
tags with a newline SET @Start = CHARINDEX('
', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10)) SET @Start = CHARINDEX('
', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace any
tags with a newline SET @Start = CHARINDEX('
', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10)) SET @Start = CHARINDEX('
', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace any
tags with a newline SET @Start = CHARINDEX('
', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10)) SET @Start = CHARINDEX('
', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Remove anything between ', @HTMLText, CHARINDEX('<', @HTMLText)) + 7 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('', @HTMLText)) + 7 SET @Length = (@End - @Start) + 1 END -- Remove anything betweentags SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END