Apache Zeppelin on Spark cluster mode (2024)

Overview

Apache Spark has so far supported three cluster manager types: Standalone, Apache Mesos, and Hadoop YARN. This document guides you through building and configuring each of these three Spark cluster manager environments with Apache Zeppelin, using the provided Docker scripts. Install Docker on your machine first.

Spark standalone mode

Spark standalone is a simple cluster manager included with Spark that makes it easy to set up a cluster. You can set up a Spark standalone environment with the steps below.

Note: Since Apache Zeppelin and Spark both use port 8080 for their web UI, you might need to change zeppelin.server.port in conf/zeppelin-site.xml.
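For example, a minimal sketch of the relevant entry in conf/zeppelin-site.xml (8180 is an arbitrary free port chosen here for illustration; pick any port you do not publish elsewhere):

<property>
  <name>zeppelin.server.port</name>
  <value>8180</value>
  <description>Server port. Moved off 8080 to avoid the Spark master web UI.</description>
</property>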

1. Build Docker file

You can find docker script files under scripts/docker/spark-cluster-managers.

cd $ZEPPELIN_HOME/scripts/docker/spark-cluster-managers/spark_standalone
docker build -t "spark_standalone" .

2. Run docker

docker run -it \
 -p 8080:8080 \
 -p 7077:7077 \
 -p 8888:8888 \
 -p 8081:8081 \
 -h sparkmaster \
 --name spark_standalone \
 spark_standalone bash;

Note that the sparkmaster hostname used here to run the docker container should be defined in your /etc/hosts.
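A minimal sketch of the /etc/hosts entry, assuming you run the container on your local machine:

127.0.0.1   sparkmaster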

3. Configure Spark interpreter in Zeppelin

Set Spark master as spark://<hostname>:7077 in Zeppelin Interpreters setting page.
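If you prefer configuration files to the UI, the same master can also be set in conf/zeppelin-env.sh (sparkmaster here is the container hostname from the docker run step above):

export MASTER=spark://sparkmaster:7077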

[Screenshot: Spark master setting in the Zeppelin Interpreters page]

4. Run Zeppelin with Spark interpreter

After running a single paragraph with the Spark interpreter in Zeppelin, browse http://<hostname>:8080 and check whether the Spark cluster is running well or not.
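Any small job will do as a test paragraph. For example, this sketch sums the numbers 1 to 100 on the cluster (it assumes the interpreter configured above and a running container):

%spark
sc.parallelize(1 to 100).sum() // 5050.0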

[Screenshot: Spark master web UI showing the running cluster]

You can also verify that Spark is running inside the Docker container with the command below.

ps -ef | grep spark

Spark on YARN mode

You can set up a Spark on YARN docker environment with the steps below.

Note: Since Apache Zeppelin and Spark both use port 8080 for their web UI, you might need to change zeppelin.server.port in conf/zeppelin-site.xml.

1. Build Docker file

You can find docker script files under scripts/docker/spark-cluster-managers.

cd $ZEPPELIN_HOME/scripts/docker/spark-cluster-managers/spark_yarn_cluster
docker build -t "spark_yarn" .

2. Run docker

docker run -it \
 -p 5000:5000 \
 -p 9000:9000 \
 -p 9001:9001 \
 -p 8088:8088 \
 -p 8042:8042 \
 -p 8030:8030 \
 -p 8031:8031 \
 -p 8032:8032 \
 -p 8033:8033 \
 -p 8080:8080 \
 -p 7077:7077 \
 -p 8888:8888 \
 -p 8081:8081 \
 -p 50010:50010 \
 -p 50075:50075 \
 -p 50020:50020 \
 -p 50070:50070 \
 --name spark_yarn \
 -h sparkmaster \
 spark_yarn bash;

Note that the sparkmaster hostname used here to run the docker container should be defined in your /etc/hosts.

3. Verify that Spark on YARN is running

You can verify that the Spark and YARN processes are running in the container with the command below.

ps -ef

You can also check each application web UI: HDFS on http://<hostname>:50070/, YARN on http://<hostname>:8088/cluster, and Spark on http://<hostname>:8080/.

4. Configure Spark interpreter in Zeppelin

Set following configurations to conf/zeppelin-env.sh.

export MASTER=yarn-client
export HADOOP_CONF_DIR=[your_hadoop_conf_path]
export SPARK_HOME=[your_spark_home_path]

HADOOP_CONF_DIR (the Hadoop configuration path) is defined in /scripts/docker/spark-cluster-managers/spark_yarn_cluster/hdfs_conf.

Don't forget to set the Spark master to yarn-client in the Zeppelin Interpreters setting page, as shown below.
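Note that Spark 2.0 and later deprecate the yarn-client master string in favor of yarn with an explicit client deploy mode. If your Spark version complains about yarn-client, an equivalent conf/zeppelin-env.sh sketch would be:

export MASTER=yarn
export SPARK_SUBMIT_OPTIONS="--deploy-mode client"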

[Screenshot: Spark master set to yarn-client in the Zeppelin Interpreters page]

5. Run Zeppelin with Spark interpreter

After running a single paragraph with the Spark interpreter in Zeppelin, browse http://<hostname>:8088/cluster/apps and check whether the Zeppelin application is running well or not.

[Screenshot: YARN ResourceManager UI showing the Zeppelin application]

Spark on Mesos mode

You can set up a Spark on Mesos docker environment with the steps below.

1. Build Docker file

cd $ZEPPELIN_HOME/scripts/docker/spark-cluster-managers/spark_mesos
docker build -t "spark_mesos" .

2. Run docker

docker run --net=host -it \
 -p 8080:8080 \
 -p 7077:7077 \
 -p 8888:8888 \
 -p 8081:8081 \
 -p 8082:8082 \
 -p 5050:5050 \
 -p 5051:5051 \
 -p 4040:4040 \
 -h sparkmaster \
 --name spark_mesos \
 spark_mesos bash;

Note that the sparkmaster hostname used here to run the docker container should be defined in your /etc/hosts.

3. Verify that Spark on Mesos is running

You can verify that the Spark and Mesos processes are running in the container with the command below.

ps -ef

You can also check each application web UI: Mesos on http://<hostname>:5050/cluster and Spark on http://<hostname>:8080/.

4. Configure Spark interpreter in Zeppelin

Set the following configurations in conf/zeppelin-env.sh.

export MASTER=mesos://127.0.1.1:5050
export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
export SPARK_HOME=[PATH OF SPARK HOME]

Don't forget to set the Spark master to mesos://127.0.1.1:5050 in the Zeppelin Interpreters setting page, as shown below.

[Screenshot: Spark master set to mesos://127.0.1.1:5050 in the Zeppelin Interpreters page]

5. Run Zeppelin with Spark interpreter

After running a single paragraph with the Spark interpreter in Zeppelin, browse http://<hostname>:5050/#/frameworks and check whether the Zeppelin application is running well or not.

[Screenshot: Mesos frameworks UI showing the Zeppelin framework]

Troubleshooting for Spark on Mesos

  • If you have a problem with the hostname, use the --add-host option when executing docker run:

## use the `--add-host=moby:127.0.0.1` option to resolve,
## since the docker container couldn't resolve `moby`:
java.net.UnknownHostException: moby: moby: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1496)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:789)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:782)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:782)
  • If you have a problem with the Mesos master, try mesos://127.0.0.1 instead of mesos://127.0.1.1:
I0103 20:17:22.329269 340 sched.cpp:330] New master detected at [emailprotected]:5050
I0103 20:17:22.330749 340 sched.cpp:341] No credentials provided. Attempting to register without authentication
W0103 20:17:22.333531 340 sched.cpp:736] Ignoring framework registered message because it was sent from '[emailprotected]:5050' instead of the leading master '[emailprotected]:5050'
W0103 20:17:24.040252 339 sched.cpp:736] Ignoring framework registered message because it was sent from '[emailprotected]:5050' instead of the leading master '[emailprotected]:5050'
W0103 20:17:26.150250 339 sched.cpp:736] Ignoring framework registered message because it was sent from '[emailprotected]:5050' instead of the leading master '[emailprotected]:5050'
W0103 20:17:26.737604 339 sched.cpp:736] Ignoring framework registered message because it was sent from '[emailprotected]:5050' instead of the leading master '[emailprotected]:5050'
W0103 20:17:35.241714 336 sched.cpp:736] Ignoring framework registered message because it was sent from '[emailprotected]:5050' instead of the leading master '[emailprotected]:5050'