当前位置:  开发笔记 > 编程语言 > 正文

如何将HTML转换为XHTML?

如何解决《如何将HTML转换为XHTML?》经验,为你挑选了2个好方法。

我需要将HTML文档转换为有效的XML,最好是XHTML.最好的方法是什么?有没有人知道工具包/库/样本/ ......什么能帮助我完成任务?

为了更清楚一点,我的应用程序必须在运行时自动进行转换.我不寻找可以帮助我手动将某些页面移动到XHTML的工具.



1> prakash..:

使用HTML Tidy从HTML转换为XML

可下载的二进制文件

JRoppert,根据您的需要,我想你可能想看一下Sources

c:\temp>tidy -help
tidy [option...] [file...] [option...] [file...]
Utility to clean up and pretty print HTML/XHTML/XML
see http://tidy.sourceforge.net/

Options for HTML Tidy for Windows released on 14 February 2006:

File manipulation
-----------------
 -output , -o  write output to the specified 
 
 -config       set configuration options from the specified 
 -file , -f    write errors to the specified 
 
 -modify, -m         modify the original input files

Processing directives
---------------------
 -indent, -i         indent element content
 -wrap , -w  wrap text at the specified . 0 is assumed if
              is missing. When this option is omitted, the
                     default of the configuration option "wrap" applies.
 -upper, -u          force tags to upper case
 -clean, -c          replace FONT, NOBR and CENTER tags by CSS
 -bare, -b           strip out smart quotes and em dashes, etc.
 -numeric, -n        output numeric rather than named entities
 -errors, -e         only show errors
 -quiet, -q          suppress nonessential output
 -omit               omit optional end tags
 -xml                specify the input is well formed XML
 -asxml, -asxhtml    convert HTML to well formed XHTML
 -ashtml             force XHTML to well formed HTML
 -access      do additional accessibility checks ( = 0, 1, 2, 3).
                     0 is assumed if  is missing.

Character encodings
-------------------
 -raw                output values above 127 without conversion to entities
 -ascii              use ISO-8859-1 for input, US-ASCII for output
 -latin0             use ISO-8859-15 for input, US-ASCII for output
 -latin1             use ISO-8859-1 for both input and output
 -iso2022            use ISO-2022 for both input and output
 -utf8               use UTF-8 for both input and output
 -mac                use MacRoman for input, US-ASCII for output
 -win1252            use Windows-1252 for input, US-ASCII for output
 -ibm858             use IBM-858 (CP850+Euro) for input, US-ASCII for output
 -utf16le            use UTF-16LE for both input and output
 -utf16be            use UTF-16BE for both input and output
 -utf16              use UTF-16 for both input and output
 -big5               use Big5 for both input and output
 -shiftjis           use Shift_JIS for both input and output
 -language     set the two-letter language code  (for future use)

Miscellaneous
-------------
 -version, -v        show the version of Tidy
 -help, -h, -?       list the command line options
 -xml-help           list the command line options in XML format
 -help-config        list all configuration options
 -xml-config         list all configuration options in XML format
 -show-config        list the current configuration settings

Use --blah blarg for any configuration option "blah" with argument "blarg"

Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in:  tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp



2> TcKs..:

您可以使用HTML Agility Pack.它是CodePlex的开源项目.

推荐阅读
手机用户2402851335
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有