Follow-up question about the big XML file:
First of all, thank you very much for your answer. Now... what am I doing wrong? Here is my class that uses SAX:
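    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class SAXParserXML extends DefaultHandler {

        public static void ParcourXML() {
            DefaultHandler handler = new SAXParserXML();
            SAXParserFactory factory = SAXParserFactory.newInstance();
            try {
                String URI = "dblp.xml";
                SAXParser saxParser = factory.newSAXParser();
                // Streams the whole document through the handler
                saxParser.parse(URI, handler);
            } catch (Throwable t) {
                t.printStackTrace();
            }
        }

        // Empty callbacks: no element data is kept in memory
        public void startElement(String namespaceURI, String simpleName,
                                 String qualifiedName, Attributes attrs) throws SAXException {
        }

        public void endElement(String namespaceURI, String simpleName,
                               String qualifiedName) throws SAXException {
        }
    }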
As you can see, I do nothing with my XML file, yet it still throws this error:
java.lang.OutOfMemoryError: Java heap space
	at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(Unknown Source)
	at SAXParserXML.ParcourXML(SAXParserXML.java:30)
	at Main.main(Main.java:28)
I also tried StAX... same error... What should I do? I also increased the Java heap size to 1260M:
java -Xmx1260M SAXParserXML
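One way to confirm that the `-Xmx` flag actually reached the JVM running the parser is to print `Runtime.getRuntime().maxMemory()` at startup. This is worth checking here because the stack trace starts from `Main.main`, while the command shown launches `SAXParserXML`. A minimal sketch; the class name `HeapCheck` is hypothetical, and the reported value is usually close to, and can be slightly below, the `-Xmx` setting.

```java
public class HeapCheck {

    /** Maximum heap the JVM will try to use, in mebibytes. */
    static long maxHeapMiB() {
        // Long.MAX_VALUE would mean "no limit"; normally this reflects -Xmx
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same flags, e.g. `java -Xmx1260M HeapCheck`, should print a value near 1260 if the setting took effect.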
The XML file has the following form:
... ....... #other tags; I'm only interested in# ... #the same thing# ....
You can find the original file here: http://dblp.uni-trier.de/xml/
Thanks
Java 1.6 has a bug that produces exactly the same stack trace, and it has not been fixed yet. Newer Xerces versions seem to be fine.
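If upgrading the JDK is not an option, one way to try a newer Xerces is to put the standalone Apache Xerces-J jar on the classpath and request its factory explicitly instead of the JDK's internal copy. This is a sketch under assumptions: it assumes the Xerces-J factory class `org.apache.xerces.jaxp.SAXParserFactoryImpl` and falls back to the JDK's built-in parser when that class cannot be loaded. `SAXParserFactory.newInstance(String, ClassLoader)` is available since Java 6; `XercesFactorySelector` is a hypothetical name.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

public class XercesFactorySelector {

    // Assumption: factory class shipped in the standalone Xerces-J jar (xercesImpl.jar)
    static final String XERCES_FACTORY = "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    /** Returns the external Xerces factory if it is on the classpath, else the JDK default. */
    static SAXParserFactory createFactory() {
        try {
            // A null class loader means the thread's context class loader is used
            return SAXParserFactory.newInstance(XERCES_FACTORY, null);
        } catch (FactoryConfigurationError e) {
            // Xerces-J not found: fall back to the parser bundled with the JDK
            return SAXParserFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        System.out.println("Using " + createFactory().getClass().getName());
    }
}
```

The same choice can also be made without code changes through the standard JAXP system property, e.g. `java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl ...`, with the Xerces jar on the classpath.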
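For large documents that still contain a lot of structure, you could consider pull parsing, i.e. parsing only part of the structure at a time, for example with StAX.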
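A minimal StAX pull-parsing sketch: the application asks the reader for one event at a time and keeps only what it needs, here just a count of records. The record name `article` and the class name `StaxRecordCounter` are illustrative assumptions.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxRecordCounter {

    /** Counts start tags named recordName, pulling one event at a time. */
    static int countRecords(InputStream in, String recordName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // Only the current event is held; nothing is accumulated
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close(); // frees the reader; does not close the underlying stream
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, scan that file; otherwise use a tiny in-memory sample
        try (InputStream in = args.length > 0
                ? new BufferedInputStream(new FileInputStream(args[0]))
                : new ByteArrayInputStream(
                        "<dblp><article/><book/><article/></dblp>".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(countRecords(in, "article"));
        }
    }
}
```

With no arguments it prints 2. On an affected JDK this may still hit the same error, since the bundled StAX reader appears to be built on the same internal Xerces scanner.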