May 3, 2016

Building Apache Spark on Windows 10

I built Apache Spark on Windows 10 and checked that it runs :)


I. Building

First, I downloaded the Spark source code (v1.6.0.zip) and then extracted all files.

・Apache Spark source code
https://github.com/apache/spark/releases
v1.6.0.zip

・after extracting
C:\enjyoyspace
    └─spark-1.6.0
        ├─assembly
        ├─bagel
        ├─bin
        ├─build

Next, I compiled with Scala 2.10.6 to produce a Spark package (Hadoop 2.6.0) using Apache Maven. Apache Maven is so lovable :) The pom.xml change and the build command are as follows.

・changes of pom.xml
C:\enjyoyspace\spark-1.6.0\pom.xml

before:    <scala.version>2.10.5</scala.version>
after:     <scala.version>2.10.6</scala.version>

・command
cd C:\enjyoyspace\spark-1.6.0
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package


(install Apache Maven: https://maven.apache.org/install.html)
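
One note in passing: the Spark build documentation recommends giving Maven more memory than its defaults, so if the compile fails with an out-of-memory error, setting MAVEN_OPTS before rerunning mvn should help (values along the lines of what the Spark 1.6 build docs suggest; I am omitting -XX:MaxPermSize since it is ignored on Java 8):

・command (optional)
set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m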

The Spark package (Hadoop 2.6.0) was produced.

・Spark package (Hadoop 2.6.0)
C:\enjyoyspace\spark-1.6.0\assembly\target\scala-2.10\spark-assembly-1.6.0-hadoop2.6.0.jar
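
A quick way to confirm later that the shell really loads this assembly (a minimal sketch; sc is the SparkContext that spark-shell creates automatically, as shown in the next section):

scala> sc.version   // should return "1.6.0" for this build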

II. Operation Check

I checked that Spark operates correctly by testing an interactive program in the Spark shell. It ran normally :)

・command
C:\enjyoyspace\spark-1.6.0>bin\spark-shell

・using Spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> val lines = sc.parallelize(List("Sqoop", "from external datastores into HDFS", "Julia", "Julia is a high-performance dynamic programming language", "JuliaCon 2016"))
lines: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27


scala> lines.count()
res0: Long = 5

scala> val words = lines.flatMap(line => line.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at flatMap at <console>:29

scala> words.count()
res1: Long = 16
scala>
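
The flatMap above is halfway to the classic word count, so here is a small follow-up sketch (not part of the original session; it continues with the words RDD defined above) that computes per-word counts with a pair RDD and reduceByKey:

scala> val counts = words.map(word => (word, 1)).reduceByKey(_ + _)   // pair each word with 1, then sum the 1s per word
scala> counts.collect().foreach(println)   // "Julia" should count 2; every other word 1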


・Spark Web UI (http://localhost:4040/jobs/)