16 likes

How do I perform HTML decoding/encoding using Python/Django?

Author: php | 2023-09-03 09:49

Eight answers to "How do I perform HTML decoding/encoding using Python/Django?" are collected below.

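I have an HTML-encoded string:

'''&lt;img class=&quot;size-medium wp-image-113&quot;\
 style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot;\
 src=&quot;https://img.devbox.cn/3cccf/16086/243/0ab2d201ea5dc980.png&quot;\
 alt=&quot;&quot; /&gt;'''

I want to change that to:

<img class="size-medium wp-image-113" style="margin-left: 15px;" title="su1" src="https://img.devbox.cn/3cccf/16086/243/0ab2d201ea5dc980.png" alt="" />

I want this to register as HTML so that the browser renders it as an image instead of displaying it as text.

I have found how to do this in C#, but not in Python. Can someone help me out?

Thanks.

Edit: Someone asked why my strings are stored like that. It is because I am using a web-scraping tool that "scans" a web page and extracts certain content from it. The tool (BeautifulSoup) returns the string in that format.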

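Related:

Convert XML/HTML entities into a Unicode string in Python

1> Daniel Naab.. (112 votes):
Given the Django use case, there are two answers to this. Here is its django.utils.html.escape function, for reference: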

def escape(html):
    """Returns the given HTML with ampersands, quotes and carets encoded."""
    return mark_safe(force_unicode(html).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;'))


To reverse this, the Cheetah function described in Jake's answer should work, but it is missing the single quote. This version includes an updated tuple, with the order of replacement reversed to avoid symmetry problems:

def html_decode(s):
    """
    Returns the ASCII decoded version of the given HTML string. This does
    NOT remove normal HTML tags like <p>.
    """
    htmlCodes = (
            ("'", '&#39;'),
            ('"', '&quot;'),
            ('>', '&gt;'),
            ('<', '&lt;'),
            ('&', '&amp;')
        )
    for code in htmlCodes:
        s = s.replace(code[1], code[0])
    return s

unescaped = html_decode(my_string)
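
Why the replacement order matters can be seen in a small sketch. The two helper functions below are hypothetical and exist only for this comparison; they cover just three entities. Decoding '&amp;' first lets its output combine with the following characters into a new entity, which is then decoded a second time.

```python
def decode_amp_first(s):
    # Hypothetical "wrong order" variant, for comparison only.
    for char, entity in (('&', '&amp;'), ('<', '&lt;'), ('>', '&gt;')):
        s = s.replace(entity, char)
    return s


def decode_amp_last(s):
    # Same ordering idea as html_decode: '&amp;' is handled last.
    for char, entity in (('<', '&lt;'), ('>', '&gt;'), ('&', '&amp;')):
        s = s.replace(entity, char)
    return s


# The literal text "&lt;" is escaped as "&amp;lt;".
escaped = '&amp;lt;'
print(decode_amp_first(escaped))  # <      (decoded twice, not what was escaped)
print(decode_amp_last(escaped))   # &lt;   (decoded once)
```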


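This, however, is not a general solution; it is only appropriate for strings encoded with django.utils.html.escape. More generally, it is a good idea to stick with the standard library: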

# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.0 to 3.8 (HTMLParser.unescape was removed in Python 3.9):
import html.parser
html_parser = html.parser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.4 and later:
from html import unescape
unescaped = unescape(my_string)
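
As a rough illustration of the standard-library route on a string like the one in the question, here is a minimal sketch for Python 3.4 or later. The tag contents and the example.com URL are placeholders, not the asker's real data.

```python
from html import unescape

# An escaped <img> tag similar to the one in the question (placeholder values).
encoded = (
    '&lt;img class=&quot;size-medium&quot; '
    'src=&quot;https://example.com/su1.png&quot; alt=&quot;&quot; /&gt;'
)

decoded = unescape(encoded)
print(decoded)
# <img class="size-medium" src="https://example.com/su1.png" alt="" />

# Named and numeric references are handled as well.
print(unescape('caf&eacute; &#39;quoted&#39;'))  # café 'quoted'
```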


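As a suggestion: it may make more sense to store the HTML unescaped in your database. If possible, it is worth looking into getting unescaped results back from BeautifulSoup and avoiding this process altogether.

With Django, escaping only occurs during template rendering; so to prevent escaping, you just tell the template engine not to escape your string. To do that, use one of these options in your template: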

{{ context_var|safe }}
{% autoescape off %}
    {{ context_var }}
{% endautoescape %}

        
          
          

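I think escaping only occurs in Django during template rendering. Therefore there is no need for an unescape - you just tell the template engine not to escape. Either {{ context_var|safe }} or {% autoescape off %}{{ context_var }}{% endautoescape %} (12 upvotes)
Is there no opposite of django.utils.html.escape? (4 upvotes)
@Daniel: Please change your comment to an answer so that I can vote it up! |safe was exactly what I (and I'm sure others) was looking for in answer to this question. (3 upvotes)

2> Jiangge Zhan.. (110 votes):
Use the standard library: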


HTML Escape

try:
    from html import escape  # python 3.x
except ImportError:
    from cgi import escape  # python 2.x

print(escape("<"))

HTML Unescape

try:
    from html import unescape  # python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # python 3.x (<3.4)
    except ImportError:
        from HTMLParser import HTMLParser  # python 2.x
    unescape = HTMLParser().unescape

print(unescape("&gt;"))


        
          
          

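I think this is the most straightforward, "batteries included" and correct answer. I don't know why people vote for those Django/Cheetah things. (12 upvotes)
Note for 2015: HTMLParser.unescape is deprecated in py 3.4 and removed in 3.5. Use `from html import unescape` instead (3 upvotes)
Note that this does not handle special characters such as German umlauts ("Ü") (2 upvotes)

3> user26294.. (80 votes):
For HTML encoding, there is cgi.escape in the standard library: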

>>> help(cgi.escape)
cgi.escape = escape(s, quote=None)
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.
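
cgi.escape was removed in Python 3.8, so on current interpreters the closest counterpart is html.escape. The sketch below shows how its quote flag behaves; note that, unlike cgi.escape, quote defaults to True and the single quote is escaped too. The sample text is made up for illustration.

```python
from html import escape

text = '<a href="x">Tom & Jerry\'s</a>'

# Default (quote=True): &, <, >, " and ' are all escaped.
print(escape(text))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;

# quote=False: only &, < and > are escaped.
print(escape(text, quote=False))
# &lt;a href="x"&gt;Tom &amp; Jerry's&lt;/a&gt;
```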


For HTML decoding, I use the following:

import re
from htmlentitydefs import name2codepoint
# for some reason, python 2.5.2 doesn't have this one (apostrophe)
name2codepoint['#39'] = 39

def unescape(s):
    "unescape HTML code refs; c.f. http://wiki.python.org/moin/EscapingHtml"
    return re.sub('&(%s);' % '|'.join(name2codepoint),
              lambda m: unichr(name2codepoint[m.group(1)]), s)


For anything more complicated, I use BeautifulSoup.


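4> vincent..:
If the set of encoded characters is relatively restricted, use daniel's solution. Otherwise, use one of the many HTML-parsing libraries.

I like BeautifulSoup because it can handle malformed XML/HTML:

http://www.crummy.com/software/BeautifulSoup/

For your question, there is an example in their documentation:

from BeautifulSoup import BeautifulStoneSoup
BeautifulStoneSoup("Sacr&eacute; bl&#101;u!",
                   convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]
# u'Sacr\xe9 bleu!'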

        

5> Collin Ander..:
In Python 3.4+:

import html

html.unescape(your_string)

        

6> zgoda..:
See the bottom of this page on the Python wiki; there are at least two options for "unescaping" HTML.

        

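7> dfrankow..:
Daniel's comment, as an answer:

"Escaping only occurs in Django during template rendering. Therefore there is no need for an unescape - you just tell the template engine not to escape. Either {{ context_var|safe }} or {% autoescape off %}{{ context_var }}{% endautoescape %}"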

        

8> slowkvant..:
I found a nice function at: http://snippets.dzone.com/posts/show/4569

def decodeHtmlentities(string):
    import re
    entity_re = re.compile("&(#?)(\d{1,5}|\w{1,8});")

    def substitute_entity(match):
        from htmlentitydefs import name2codepoint as n2cp
        ent = match.group(2)
        if match.group(1) == "#":
            return unichr(int(ent))
        else:
            cp = n2cp.get(ent)

            if cp:
                return unichr(cp)
            else:
                return match.group()

    return entity_re.subn(substitute_entity, string)[0]
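
The function above is Python 2 code (htmlentitydefs, unichr). A possible Python 3 adaptation is sketched below; the name decode_html_entities, the re.ASCII flag and the digit check are additions of this sketch, not part of the original snippet. Like the original, it leaves unknown names and hexadecimal references such as &#x27; untouched.

```python
import re
from html.entities import name2codepoint

# Same pattern as the snippet above; re.ASCII keeps \d and \w to ASCII characters.
ENTITY_RE = re.compile(r"&(#?)(\d{1,5}|\w{1,8});", re.ASCII)


def decode_html_entities(text):
    def substitute_entity(match):
        ent = match.group(2)
        if match.group(1) == "#":
            # Decimal character reference, e.g. &#39;
            if ent.isdigit():
                return chr(int(ent))
            return match.group()  # not a decimal number: leave as-is
        cp = name2codepoint.get(ent)
        # Unknown entity names are returned unchanged.
        return chr(cp) if cp else match.group()

    return ENTITY_RE.sub(substitute_entity, text)


print(decode_html_entities("&lt;p&gt;caf&eacute; &#39;ok&#39; &nosuch;"))
# <p>café 'ok' &nosuch;
```

On Python 3.4 and later, html.unescape covers the same ground and more, so a hand-written version like this is mostly useful for understanding what the decoder does.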



    

    

    