I have a Spark cluster with 2 nodes: a master (172.17.0.229) and a slave (172.17.0.228). I have edited spark-env.sh, adding SPARK_MASTER_IP=127.17.0.229, and the slaves file, adding 172.17.0.228.
I am starting my master node with start-master.sh and my worker nodes with start-slaves.sh.
I can see in the master's web UI that there are no workers, but the worker node's log is as follows:
Spark Command: /usr/lib/jvm/java-7-oracle/jre/bin/java -cp /usr/local/src/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/usr/local/src/spark-1.5.2-bin-hadoop$
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/18 14:17:25 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/12/18 14:17:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/18 14:17:26 INFO SecurityManager: Changing view acls to: ujjwal
15/12/18 14:17:26 INFO SecurityManager: Changing modify acls to: ujjwal
15/12/18 14:17:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ujjwal); users wit$
15/12/18 14:17:27 INFO Slf4jLogger: Slf4jLogger started
15/12/18 14:17:27 INFO Remoting: Starting remoting
15/12/18 14:17:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@172.17.0.228:47599]
15/12/18 14:17:27 INFO Utils: Successfully started service 'sparkWorker' on port 47599.
15/12/18 14:17:27 INFO Worker: Starting Spark worker 172.17.0.228:47599 with 2 cores, 2.7 GB RAM
15/12/18 14:17:27 INFO Worker: Running Spark version 1.5.2
15/12/18 14:17:27 INFO Worker: Spark home: /usr/local/src/spark-1.5.2-bin-hadoop2.6
15/12/18 14:17:27 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/12/18 14:17:27 INFO WorkerWebUI: Started WorkerWebUI at http://172.17.0.228:8081
15/12/18 14:17:27 INFO Worker: Connecting to master 127.17.0.229:7077...
15/12/18 14:17:27 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@127.17.0.229:7077] has failed, address is now$
15/12/18 14:17:27 WARN Worker: Failed to connect to master 127.17.0.229:7077
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@127.17.0.229:7077/), Path(/user/Master)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
    at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
    at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
    at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
    at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
    at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
    at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
    at akka.actor.ActorCell.terminate(ActorCell.scala:369)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
Thanks for your suggestions.
In general, checking the IP and port your worker is trying to connect to (spark://...:7077) against the master address reported on the web UI at 172.17.0.229:8080 will help determine whether the address is correct.
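For example, the worker's target address can be pulled straight out of its log. A minimal sketch, assuming the sample line copied from the log above; in practice you would grep your worker's log file under $SPARK_HOME/logs:

```shell
# Sample "Connecting to master" line, copied from the worker log above.
# In practice: grep "Connecting to master" $SPARK_HOME/logs/spark-*-worker-*.out
log_line='15/12/18 14:17:27 INFO Worker: Connecting to master 127.17.0.229:7077...'

# Strip everything up to and including the literal prefix, then the trailing dots.
master_addr=${log_line##*"Connecting to master "}
master_addr=${master_addr%...}

echo "$master_addr"   # 127.17.0.229:7077
```

Comparing this value with the address shown on the master web UI (172.17.0.229:8080) makes the mismatch obvious.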
In this particular case, it looks like you have a typo; change

SPARK_MASTER_IP=127.17.0.229

to read:

SPARK_MASTER_IP=172.17.0.229

(you appear to have transposed 127/172).
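Concretely, the corrected setup would look like this. A sketch only; the Spark home path is taken from the log above, and stop-all.sh/start-all.sh are the standard standalone-mode scripts shipped in Spark's sbin directory:

```shell
# conf/spark-env.sh on the master (172.17.0.229): the corrected entry.
export SPARK_MASTER_IP=172.17.0.229

# conf/slaves lists one worker host per line, e.g.:
#   172.17.0.228

# After fixing the typo, restart the cluster from $SPARK_HOME on the master:
#   sbin/stop-all.sh
#   sbin/start-all.sh
```

Once the worker reconnects, it should appear on the master web UI at http://172.17.0.229:8080.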