May 3, 2016

Building Apache Spark on Windows 10

I built Apache Spark on Windows 10 and checked that it runs :)


I. Building

First, I downloaded the Spark source code (v1.6.0.zip) and then extracted all files.

・Apache Spark source code
https://github.com/apache/spark/releases
v1.6.0.zip

・after extracting
C:\enjyoyspace
    └─spark-1.6.0
        ├─assembly
        ├─bagel
        ├─bin
        ├─build

Next, I compiled with Scala 2.10.6 to produce a Spark package (Hadoop 2.6.0) using Apache Maven. Apache Maven is so lovable :) The pom.xml change and the build command are as follows.

・changes of pom.xml
C:\enjyoyspace\spark-1.6.0\pom.xml

before:    <scala.version>2.10.5</scala.version>
after:     <scala.version>2.10.6</scala.version>

・command
cd C:\enjyoyspace\spark-1.6.0
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package


(install Apache Maven: https://maven.apache.org/install.html)
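
One note in passing: the Spark build documentation recommends giving Maven more memory than its defaults, so if the compile fails with an out-of-memory error, setting MAVEN_OPTS before rerunning mvn should help (values along the lines of what the Spark 1.6 build docs suggest; I am omitting -XX:MaxPermSize since it is ignored on Java 8):

・command (optional)
set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m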

The Spark package (Hadoop 2.6.0) was produced.

・Spark package (Hadoop 2.6.0)
C:\enjyoyspace\spark-1.6.0\assembly\target\scala-2.10\spark-assembly-1.6.0-hadoop2.6.0.jar
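
A quick way to confirm later that the shell really loads this assembly (a minimal sketch; sc is the SparkContext that spark-shell creates automatically, as shown in the next section):

scala> sc.version   // should return "1.6.0" for this build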

II. Operation Check

I checked that Spark operates correctly by testing an interactive program in the Spark shell. It ran normally :)

・command
C:\enjyoyspace\spark-1.6.0>bin\spark-shell

・using Spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> val lines = sc.parallelize(List("Sqoop", "from external datastores into HDFS", "Julia", "Julia is a high-performance dynamic programming language", "JuliaCon 2016"))
lines: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27


scala> lines.count()
res0: Long = 5

scala> val words = lines.flatMap(line => line.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at flatMap at <console>:29

scala> words.count()
res1: Long = 16
scala>
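
The flatMap above is halfway to the classic word count, so here is a small follow-up sketch (not part of the original session; it continues with the words RDD defined above) that computes per-word counts with a pair RDD and reduceByKey:

scala> val counts = words.map(word => (word, 1)).reduceByKey(_ + _)   // pair each word with 1, then sum the 1s per word
scala> counts.collect().foreach(println)   // "Julia" should count 2; every other word 1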


・Spark Web UI (http://localhost:4040/jobs/)