Prerequisites

This article assumes that the installed OS is Ubuntu 20.04 and that the installation is performed either as the root user or from an account with sudo privileges.
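
If you want to double-check these prerequisites before you begin, the following commands confirm the Ubuntu release and that your account can use sudo (a minimal sketch; the exact output will vary from system to system):

lsb_release -d    # should report Ubuntu 20.04
sudo -v           # exits silently when the account has valid sudo privileges
Bash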



Process


First, let's refresh the package index on the current system using

apt-get update -y 
Bash
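
Note that apt-get update only refreshes the package index. If you also want to upgrade the packages that are already installed (optional, and not strictly required for Apache Spark), you can follow it with:

apt-get upgrade -y
Bash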
 

Next we will install other packages that are prerequisites for Apache Spark.

apt-get install -y default-jdk scala
Bash
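
Before moving on, it is worth confirming that both Java and Scala were installed correctly; the versions reported on your system may differ from the ones used in this article:

java -version
scala -version
Bash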
 

Once the above step is complete, download the Apache Spark distribution with the following commands


cd /opt
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz
Bash
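
Optionally, you can verify the integrity of the download before extracting it. The command below only prints the local SHA-512 checksum; comparing it against the checksum published alongside the tarball on archive.apache.org is a manual step, as the exact checksum file name and format are not covered by this article:

sha512sum spark-2.4.6-bin-hadoop2.7.tgz
Bash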
Extract the downloaded tarball and rename the resulting directory to spark
tar -xvzf spark-2.4.6-bin-hadoop2.7.tgz
mv spark-2.4.6-bin-hadoop2.7 spark 
Bash
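
If the extraction succeeded, /opt/spark should now contain the usual Spark layout, including the bin and sbin directories referenced in the next step; a quick listing confirms this:

ls /opt/spark
Bash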
Once this is done, change to your home directory and append the following lines to the ~/.bashrc file using your favorite text editor:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin 

Update your shell environment by sourcing the file:
source ~/.bashrc
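
To confirm that the new environment variables are in effect, check that SPARK_HOME is set and that the spark-shell binary is resolvable from your PATH:

echo $SPARK_HOME
which spark-shell
Bash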

Now you are ready to start Apache Spark. Let's bring up Spark's Scala CLI with

spark-shell 
Bash
 

Successful execution of the above will drop you at the Spark Scala prompt, something similar to:

Output

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://AppChassis5B1S4:4040
Spark context available as 'sc' (master = local[*], app id = local-1615563643141).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.10)
Type in expressions to have them evaluated.
Type :help for more information.

scala>



Verifying Apache Spark installation

We can verify the installation by giving Spark a task, e.g. estimating the value of π by throwing darts at a circular board. Here is the Scala code for that:

scala> val count = sc.parallelize(1 to 99999).filter { _ =>
  val x = math.random
  val y = math.random
  x*x + y*y < 1
}.count()

Output
count: Long = 78515

scala> println(s"Pi is roughly ${4.0 * count / 99999}")
Scala
 

You should see something similar to the output shown below:

Output

Pi is roughly 3.1406314063140632
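

If you prefer a non-interactive check, Spark also ships a bundled SparkPi example that can be run directly from the command line. The sketch below assumes the installation path used earlier in this article; the trailing argument (100) is simply the number of partitions to spread the work over:

$SPARK_HOME/bin/run-example SparkPi 100
Bash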