I am trying to use Zeppelin with the following code:
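import java.net.URL
import java.nio.charset.Charset
import org.apache.commons.io.IOUtils

// Download the whole file into a single String on the driver, then split it into lines
val dataText = sc.parallelize(IOUtils.toString(new URL("http://XXX.XX.XXX.121:8090/my_data.txt"), Charset.forName("utf8")).split("\n"))
case class Data(id: String, time: Long, value1: Double, value2: Int, mode: Int)
// Split each line on tabs, skip the header row, and build typed records
val dat = dataText.map(s => s.split("\t")).filter(s => s(0) != "Header:").map(
  s => Data(s(0),
    s(1).toLong,
    s(2).toDouble,
    s(3).toInt,
    s(4).toInt
  )
).toDF()
dat.registerTempTable("mydatatable")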
This keeps giving me the following error:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at org.apache.commons.io.output.StringBuilderWriter.write(StringBuilderWriter.java:138)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2002)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1907)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:778)
    at org.apache.commons.io.IOUtils.toString(IOUtils.java:896)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.( :38)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC. ( :43)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC. ( :45)
    at $iwC$$iwC$$iwC$$iwC$$iwC. ( :47)
    at $iwC$$iwC$$iwC$$iwC. ( :49)
    at $iwC$$iwC$$iwC. ( :51)
    at $iwC$$iwC. ( :53)
    at $iwC. ( :55)
    at ( :57)
    at . ( :61)
    at . ( )
    at . ( :7)
    at . ( )
    at $print( )
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
I have set the following in zeppelin-env.sh:
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557 -Dspark.executor.memory=4g"
Any ideas on what I might be missing? The file I am parsing, my_data.txt, is about 200 MB.
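For reference, the stack trace fails inside IOUtils.toString, which runs in the interpreter (driver) JVM and grows one StringBuilder to hold the entire file before split("\n") copies it again. The sketch below shows one possible way to avoid that single large String by reading the file line by line. It is only a sketch: linesFromUrl and parseLines are hypothetical helper names, the field-count check is an added guard that is not in the code above, and it is meant to be pasted into the REPL or a Zeppelin paragraph.

```scala
import scala.io.Source

// Same shape as the Data case class used in the question.
case class Data(id: String, time: Long, value1: Double, value2: Int, mode: Int)

// Parse tab-separated lines lazily, skipping the "Header:" row.
// The length check is an extra guard (an assumption, not in the original code).
def parseLines(lines: Iterator[String]): Iterator[Data] =
  lines
    .map(_.split("\t"))
    .filter(f => f.length == 5 && f(0) != "Header:")
    .map(f => Data(f(0), f(1).toLong, f(2).toDouble, f(3).toInt, f(4).toInt))

// Hypothetical helper: stream lines from a URL instead of building one String.
def linesFromUrl(url: String): Iterator[String] =
  Source.fromURL(url, "UTF-8").getLines()

// Small in-memory demonstration (no network and no Spark needed).
val sample = Iterator("Header:\tt\tv1\tv2\tm", "a\t1\t2.5\t3\t4")
val parsed = parseLines(sample).toList
```

In Zeppelin this could be used as sc.parallelize(parseLines(linesFromUrl("http://XXX.XX.XXX.121:8090/my_data.txt")).toVector).toDF(). Note that all parsed rows still pass through driver memory this way; if that is still too much, copying the file to HDFS and reading it with sc.textFile may be the better route.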
By the way, in case it matters, I am using the Hortonworks Sandbox.
Edit 1
Here is my zeppelin-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_PORT=9995
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557 -Dspark.executor.memory=4g"
export SPARK_SUBMIT_OPTIONS="--driver-java-options -Xmx4g"
export ZEPPELIN_INT_MEM="-Xmx4g"
export SPARK_HOME=/usr/hdp/2.3.0.0-2557/spark
Regards,
Kiran