Prerequisites

This article assumes that Docker CE is installed and that the Docker daemon is running.
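
Both prerequisites can be checked from a terminal before continuing. The following is a minimal sketch, not part of the original steps: the function name check_docker is made up for illustration, and it relies only on the standard `docker info` command, which exits with an error when the client cannot reach the daemon.

```shell
# check_docker: hypothetical helper (not part of Docker) that reports whether
# the docker CLI is installed and whether the daemon answers.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH"
    return 1
  fi
  # `docker info` contacts the daemon and exits non-zero if it is unreachable.
  if ! docker info >/dev/null 2>&1; then
    echo "docker CLI found, but the daemon is not reachable"
    return 1
  fi
  echo "docker is installed and the daemon is running"
}
```

Calling check_docker prints one of the three messages and returns a non-zero status if either check fails.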



Process


Apache Spark doesn't have an official Docker image, so we will use a popular public image to run it as a Docker container. Run this command to pull the Spark image:

docker pull cloudsuite/spark:2.4.5

Docker maintains a site called Docker Hub, a public registry of Docker images (both official and user-submitted). The image we downloaded is a public Apache Spark image that supports the ARM64 architecture.

Start the Docker container with this command:

docker run -it --rm cloudsuite/spark:2.4.5 bash
  • run                 command that creates and starts a new container
  • -it                 run the container in an interactive TTY environment
  • --rm                remove the container after it exits
  • cloudsuite/spark    name of the image


2.4.5 is the tag of this image on Docker Hub. If we had not already downloaded the image, Docker would pull it automatically.
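
The image reference passed to docker pull and docker run has the form name:tag. As a small illustration (the variable names are arbitrary, and this is only a sketch that assumes the reference carries an explicit tag), shell parameter expansion can split the reference used in this article into those two parts:

```shell
# Image reference used in this article: <name>:<tag>
# Assumption: the reference includes an explicit tag after the last ':'.
image="cloudsuite/spark:2.4.5"

name="${image%:*}"   # drop the shortest ':*' suffix, leaving the image name
tag="${image##*:}"   # drop the longest '*:' prefix, leaving the tag

echo "name=$name tag=$tag"   # prints: name=cloudsuite/spark tag=2.4.5
```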


In the container's interactive Bash shell, start spark-shell with this command:

root@8200d3e9ff88:/# /opt/spark-2.4.5/bin/spark-shell 

If the command succeeds, you will see the Spark Scala prompt, with output similar to the following:

Output
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://8200d3e9ff88:4040
Spark context available as 'sc' (master = local[*], app id = local-1615477396359).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.7)
Type in expressions to have them evaluated.
Type :help for more information.

scala>



Verifying the Apache Spark container

We can verify that the container works by giving it a task, for example estimating the value of π by throwing random darts at a unit square and counting how many land inside the quarter circle of radius 1. Here is the Scala code for that:

scala> val count = sc.parallelize(1 to 99999).filter { _ =>
     |   val x = math.random   // random point (x, y) in the unit square
     |   val y = math.random
     |   x*x + y*y < 1         // keep only points inside the quarter circle
     | }.count()
Output
count: Long = 78650

scala> println(s"Pi is roughly ${4.0 * count / 99999}")

You should see something similar to the output shown below:

Output
Pi is roughly 3.146031460314603
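
The estimate works because a point drawn uniformly from the unit square lands inside the quarter circle x*x + y*y < 1 with probability π/4, so four times the observed fraction approximates π. For comparison, the same calculation can be sketched without Spark using awk. This is an illustration rather than part of the original walkthrough: the function name estimate_pi and its optional seed argument are made up, and the exact digits depend on the awk implementation and the seed.

```shell
# estimate_pi SAMPLES [SEED]: hypothetical helper that mirrors the Spark job
# above with a single-process Monte Carlo loop in awk.
estimate_pi() {
  awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
    srand(seed)                        # seed the generator for repeatable runs
    inside = 0
    for (i = 0; i < n; i++) {
      x = rand(); y = rand()           # random point in the unit square
      if (x * x + y * y < 1) inside++  # count points inside the quarter circle
    }
    printf "%.6f\n", 4 * inside / n    # fraction inside approximates pi / 4
  }'
}

echo "Pi is roughly $(estimate_pi 99999)"
```

With 99999 samples the result typically agrees with π to about two decimal places, which matches the accuracy of the Spark output shown above.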