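I'm writing a GIS client tool in C# to retrieve "features" in a GML-based XML schema (sample below) from a server. Extracts are limited to 100,000 features.
I estimate that the largest extract.xml could reach around 150 megabytes, so DOM parsers are obviously out. I've been trying to decide between XmlSerializer with XSD.EXE-generated bindings --OR-- XmlReader with a hand-crafted object graph.
Or maybe there's a better way that I haven't considered yet? Like XLINQ, or ????
Can anybody please guide me? Especially with regard to the memory efficiency of any given approach. If not, I'll have to "prototype" both solutions and profile them side by side.
I'm fairly new to .NET. Any guidance would be greatly appreciated.
Thanking you. Keith.
Sample XML - up to 100,000 of them, with up to 234,600 coords per feature.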
153.505004,-27.42196 153.505044,-27.422015 153.503992 .... 172 coordinates omitted to save space ... 153.505004,-27.42196
Mitch Wheat.. 63
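Use XmlReader to parse large XML documents. XmlReader provides fast, forward-only, non-cached access to XML data. (Forward-only means you can read the XML file from beginning to end but cannot move backward in the file.) XmlReader uses small amounts of memory, and is equivalent to using a simple SAX reader.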
    using (XmlReader myReader = XmlReader.Create(@"c:\data\coords.xml"))
    {
        while (myReader.Read())
        {
            // Process each node (myReader.Value) here
            // ...
        }
    }
You can use XmlReader to process files that are up to 2 gigabytes (GB) in size.
Ref: How to read XML from a file by using Visual C#
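To make "process each node" a little more concrete for coordinate data like the sample in the question, here is a minimal sketch. It assumes the coordinate text lives in an element whose local name is `coordinates` and that pairs are whitespace-separated `x,y` values; the element name and the `CoordinateSketch` class are assumptions for illustration, not taken from the actual schema.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

public static class CoordinateSketch
{
    // Streams through the XML and returns every x,y pair found in
    // <coordinates> elements (element name is an assumption).
    public static List<double[]> ReadCoordinates(TextReader input)
    {
        var result = new List<double[]>();
        char[] whitespace = { ' ', '\t', '\r', '\n' };

        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "coordinates")
                {
                    // Reads the text content and moves past the end tag,
                    // so no extra Read() is needed on this branch.
                    string text = reader.ReadElementContentAsString();
                    foreach (string pair in text.Split(
                        whitespace, StringSplitOptions.RemoveEmptyEntries))
                    {
                        string[] xy = pair.Split(',');
                        result.Add(new[]
                        {
                            double.Parse(xy[0], CultureInfo.InvariantCulture),
                            double.Parse(xy[1], CultureInfo.InvariantCulture)
                        });
                    }
                }
                else
                {
                    reader.Read(); // any other node: just advance
                }
            }
        }
        return result;
    }
}
```

Only one element's text is held in memory at a time, which is the point of the forward-only reader. Parsing with `CultureInfo.InvariantCulture` keeps the decimal point handling independent of the machine's regional settings.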
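As at 14 May 2009: I've switched to using a hybrid approach... see the code below.
This version has most of the advantages of both:
* the XmlReader/XmlTextReader (memory efficiency --> speed); and
* the XmlSerializer (code-gen --> development expediency and flexibility).
It uses the XmlTextReader to iterate through the document, and creates "doclets" which it deserializes using the XmlSerializer and the "XML binding" classes generated with XSD.EXE.
I guess this recipe is universally applicable, and it's fast... I'm parsing a 201 MB XML document containing 56,000 GML features in about 7 seconds... the old VB6 implementation of this application took minutes (or even hours) to parse large extracts... so I'm looking good to go.
Once again, a BIG thank you to the forumites for donating your valuable time. I really appreciate it.
Cheers all. Keith.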
    using System;
    using System.Reflection;
    using System.Xml;
    using System.Xml.Serialization;
    using System.IO;
    using System.Collections.Generic;

    using nrw_rime_extract.utils;
    using nrw_rime_extract.xml.generated_bindings;

    namespace nrw_rime_extract.xml
    {
        internal interface ExtractXmlReader
        {
            rimeType read(string xmlFilename);
        }

        /// <summary>
        /// RimeExtractXml provides bindings to the RIME Extract XML as defined by
        /// $/Release 2.7/Documentation/Technical/SCHEMA and DTDs/nrw-rime-extract.xsd
        /// </summary>
        internal class ExtractXmlReader_XmlSerializerImpl : ExtractXmlReader
        {
            private Log log = Log.getInstance();

            public rimeType read(string xmlFilename)
            {
                log.write(
                    string.Format(
                        "DEBUG: ExtractXmlReader_XmlSerializerImpl.read({0})",
                        xmlFilename));
                using (Stream stream = new FileStream(xmlFilename, FileMode.Open))
                {
                    return read(stream);
                }
            }

            internal rimeType read(Stream xmlInputStream)
            {
                // create an instance of the XmlSerializer class,
                // specifying the type of object to be deserialized.
                XmlSerializer serializer = new XmlSerializer(typeof(rimeType));
                serializer.UnknownNode += new XmlNodeEventHandler(handleUnknownNode);
                serializer.UnknownAttribute +=
                    new XmlAttributeEventHandler(handleUnknownAttribute);
                // use the Deserialize method to restore the object's state
                // with data from the XML document.
                return (rimeType)serializer.Deserialize(xmlInputStream);
            }

            protected void handleUnknownNode(object sender, XmlNodeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Node at line {0} position {1} : {2}\t{3}",
                        e.LineNumber, e.LinePosition, e.Name, e.Text));
            }

            protected void handleUnknownAttribute(object sender, XmlAttributeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Attribute at line {0} position {1} : {2}='{3}'",
                        e.LineNumber, e.LinePosition, e.Attr.Name, e.Attr.Value));
            }
        }

        /// <summary>
        /// ExtractXmlReader provides bindings to the extract.xml
        /// returned by the RIME server; as defined by:
        /// $/Release X/Documentation/Technical/SCHEMA and
        /// DTDs/nrw-rime-extract.xsd
        /// </summary>
        internal class ExtractXmlReader_XmlTextReaderXmlSerializerHybridImpl :
            ExtractXmlReader
        {
            private Log log = Log.getInstance();

            public rimeType read(string xmlFilename)
            {
                log.write(
                    string.Format(
                        "DEBUG: ExtractXmlReader_XmlTextReaderXmlSerializerHybridImpl." +
                        "read({0})",
                        xmlFilename));
                using (XmlReader reader = XmlReader.Create(xmlFilename))
                {
                    return read(reader);
                }
            }

            public rimeType read(XmlReader reader)
            {
                rimeType result = new rimeType();
                // a deserializer for featureClass, feature, etc, "doclets"
                Dictionary<Type, XmlSerializer> serializers =
                    new Dictionary<Type, XmlSerializer>();
                serializers.Add(typeof(featureClassType),
                                newSerializer(typeof(featureClassType)));
                serializers.Add(typeof(featureType),
                                newSerializer(typeof(featureType)));

                List<featureClassType> featureClasses = new List<featureClassType>();
                List<featureType> features = new List<featureType>();
                while (!reader.EOF)
                {
                    if (reader.MoveToContent() != XmlNodeType.Element)
                    {
                        reader.Read(); // skip non-element-nodes and unknown-elements.
                        continue;
                    }

                    // skip junk nodes.
                    if (reader.Name.Equals("featureClass"))
                    {
                        using (StringReader elementReader =
                                   new StringReader(reader.ReadOuterXml()))
                        {
                            XmlSerializer deserializer =
                                serializers[typeof(featureClassType)];
                            featureClasses.Add(
                                (featureClassType)deserializer.Deserialize(elementReader));
                        }
                        continue; // ReadOuterXml advances the reader, so don't read again.
                    }

                    if (reader.Name.Equals("feature"))
                    {
                        using (StringReader elementReader =
                                   new StringReader(reader.ReadOuterXml()))
                        {
                            XmlSerializer deserializer = serializers[typeof(featureType)];
                            features.Add(
                                (featureType)deserializer.Deserialize(elementReader));
                        }
                        continue; // ReadOuterXml advances the reader, so don't read again.
                    }

                    log.write(
                        "WARNING: unknown element '" + reader.Name +
                        "' was skipped during parsing.");
                    reader.Read(); // skip non-element-nodes and unknown-elements.
                }
                result.featureClasses = featureClasses.ToArray();
                result.features = features.ToArray();
                return result;
            }

            private XmlSerializer newSerializer(Type elementType)
            {
                XmlSerializer serializer = new XmlSerializer(elementType);
                serializer.UnknownNode += new XmlNodeEventHandler(handleUnknownNode);
                serializer.UnknownAttribute +=
                    new XmlAttributeEventHandler(handleUnknownAttribute);
                return serializer;
            }

            protected void handleUnknownNode(object sender, XmlNodeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Node at line {0} position {1} : {2}\t{3}",
                        e.LineNumber, e.LinePosition, e.Name, e.Text));
            }

            protected void handleUnknownAttribute(object sender, XmlAttributeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Attribute at line {0} position {1} : {2}='{3}'",
                        e.LineNumber, e.LinePosition, e.Attr.Name, e.Attr.Value));
            }
        }
    }
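One possible variation on the hybrid recipe above, offered as a sketch rather than as part of the original code: instead of copying each element into a string with ReadOuterXml, the positioned XmlReader can be handed straight to XmlSerializer.Deserialize, which consumes that one element and leaves the reader on the node after it. The `SketchFeature` class below is a hypothetical stand-in for an XSD.EXE-generated binding, and the `feature`, `id` and `coordinates` names are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for an XSD.EXE-generated binding class.
[XmlRoot("feature")]
public class SketchFeature
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlElement("coordinates")]
    public string Coordinates { get; set; }
}

public static class HybridSketch
{
    public static List<SketchFeature> ReadFeatures(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(SketchFeature));
        var features = new List<SketchFeature>();

        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "feature")
                {
                    // Deserialize reads exactly this element and advances
                    // the reader past its end tag, so don't Read() again.
                    features.Add((SketchFeature)serializer.Deserialize(reader));
                }
                else
                {
                    reader.Read(); // wrapper elements, whitespace, end tags
                }
            }
        }
        return features;
    }
}
```

This avoids building an intermediate string per doclet, which may matter for features with very long coordinate lists. Whether it is measurably faster than the ReadOuterXml version would need profiling on real extracts.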
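Just to summarise, and to make the answer a bit more obvious for anyone who finds this thread in Google.
Prior to .NET 2, the XmlTextReader was the most memory-efficient XML parser available in the standard API (thanx Mitch ;-)
.NET 2 added the XmlReader.Create factory method, and readers created that way are better again. XmlReader is a forward-only element iterator (a bit like a StAX parser). (thanx Cerebrus ;-)
And remember kiddies, if any XML instance has the potential to be bigger than about 500k, DON'T USE DOM!
Cheers all. Keith.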
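A SAX parser might be what you're looking for. SAX does not require you to read the entire document into memory - it parses through it incrementally and lets you process the elements as you go. I don't know whether the .NET Framework provides a SAX parser, but here are a few open-source options you could look at: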
http://saxdotnet.sourceforge.net/
http://www.codeguru.com/csharp/csharp/cs_data/xml/article.php/c4221
Here's a related post:
SAX vs XmlTextReader - SAX in C#