Jun 12, 2017

Stabilization of Successor and Predecessor in Chord




1. Introduction

Chord is one of the most famous algorithms for a distributed hash table. In this article, I describe how Chord stabilizes successors and predecessors. Understanding stabilization mechanisms like Chord's is important for understanding cloud computing :)
I don't cover the stabilization of the finger table this time; I will cover it in another article :)


2. Stabilization of Successor and Predecessor

2.1. Logic of Stabilization

The logic of the stabilization of successors and predecessors is as follows. In the figure, a node's ID is the hash value of its IP address and port, and the distance between nodes is the difference between their IDs. So, the distance between 'Node A' and 'Node B' is as follows.



・Distance between 'Node A' and 'Node B'
(ID of 'Node B' - ID of 'Node A') mod 2^m

(ID: the hash value of the node's IP address and port; m: the number of bits of an ID, so the distance wraps around the ring)
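
As a concrete illustration, here is a minimal Scala sketch of this distance calculation. It assumes an m-bit identifier space so the distance wraps around the ring modulo 2^m; the object name, the value m = 16 and the use of Long IDs are assumptions of this example, not part of Chord itself.

・ ring distance (Scala, illustrative sketch)
// Minimal sketch: clockwise distance between two node IDs on a Chord ring.
object RingDistance {
  val m = 16                    // number of ID bits (assumption for this example)
  val ringSize = 1L << m        // 2^m IDs on the ring

  // clockwise distance from idA to idB; the modulo handles the wrap-around
  def ringDistance(idA: Long, idB: Long): Long =
    ((idB - idA) % ringSize + ringSize) % ringSize

  def main(args: Array[String]): Unit = {
    println(ringDistance(100L, 200L))  // 100
    println(ringDistance(200L, 100L))  // 65436: wraps around the ring
  }
}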


2.2. Specific Example of Stabilization

Let's assume that the following ring of blue nodes exists, and that a pink node ('Node AA') joins the ring.
At this point, the successor of 'Node A' is incorrect, the predecessor of 'Node AA' is not assigned, and the predecessor of 'Node B' is incorrect.



'Node AA' asks whether the predecessor of its current successor ('Node B') is itself ('Node AA').
Because the predecessor of 'Node B' is not 'Node AA', 'Node AA' tells 'Node B' that it may be its predecessor, and 'Node B' checks whether 'Node AA' is closer to it than its current predecessor 'Node A' is. 'Node AA' is closer to 'Node B' than 'Node A' is, so the predecessor of 'Node B' is revised to 'Node AA'.

'Node A' asks whether the predecessor of its current successor ('Node B') is itself ('Node A').
Because the predecessor of 'Node B' is now 'Node AA', which is closer to 'Node B' than 'Node A' is, 'Node A' changes its successor to 'Node AA'. 'Node A' then tells 'Node AA' that it may be 'Node AA''s predecessor, and 'Node AA' sets its predecessor to 'Node A'.
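
The exchange above can be written as a short program. The following is a minimal, self-contained Scala sketch of the stabilize/notify steps, reusing the distance calculation from 2.1. The Node class, its fields and the example IDs (10, 20, 30 on a 64-ID ring) are assumptions made only for this illustration, not actual Chord code; real Chord runs these steps periodically over the network and also handles joins, failures and finger tables.

・ sketch of stabilize and notify (Scala, illustrative only)
object StabilizeSketch {

  class Node(val name: String, val id: Long, val ringSize: Long) {
    var successor: Node = this
    var predecessor: Option[Node] = None

    // clockwise distance from 'from' to 'to' on the ring
    private def dist(from: Long, to: Long): Long =
      ((to - from) % ringSize + ringSize) % ringSize

    // run periodically: adopt a closer successor if one has appeared, then notify it
    def stabilize(): Unit = {
      successor.predecessor.foreach { p =>
        if (p.id != id && dist(id, p.id) < dist(id, successor.id))
          successor = p
      }
      successor.notifyFrom(this)
    }

    // called by a node that believes it may be our predecessor
    def notifyFrom(candidate: Node): Unit = {
      val closer = predecessor.forall(p => dist(candidate.id, id) < dist(p.id, id))
      if (candidate.id != id && closer)
        predecessor = Some(candidate)
    }
  }

  def main(args: Array[String]): Unit = {
    // the situation of section 2.2: A and B form the ring, AA has just joined between them
    val a  = new Node("Node A",  10, 64)
    val aa = new Node("Node AA", 20, 64)
    val b  = new Node("Node B",  30, 64)

    a.successor = b;  a.predecessor = Some(b)
    b.successor = a;  b.predecessor = Some(a)   // incorrect: should become Node AA
    aa.successor = b; aa.predecessor = None     // not assigned yet

    aa.stabilize()   // Node B's predecessor is revised to Node AA
    a.stabilize()    // Node A's successor becomes Node AA; Node AA's predecessor becomes Node A

    Seq(a, aa, b).foreach { n =>
      val pred = n.predecessor.map(_.name).getOrElse("none")
      println(s"${n.name}: successor = ${n.successor.name}, predecessor = $pred")
    }
  }
}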

At this point the ring is as follows. All successors and predecessors of the nodes are properly set :)



3. Conclusion

The stabilization of successors and predecessors is based on the simple logic described in 2.1. Even when nodes join or leave, stabilization revises the successors and predecessors, and the ring is properly maintained.

Apr 23, 2017

Two Styles of Continue and Break in Scala




1. Introduction

In this article, I note how to write 'continue' and 'break' in Scala in two styles. One is the classic Java style and the other is an elegant style like Japanese calligraphy drawn unicursally. I love the latter style :)


2. Continue and Break in Scala

2.1. Continue

First, I note 'continue' in Scala. The test data is a simple Map object.

・ test data
val data = Map[String, Double]( "libor1M" -> 0.1, "libor3M" -> 0.2, "tibor1M" -> 0.3, "libor6M" -> 0.4)


If you write a program in Scala in the classic Java style, it may look like the following. It's redundant, hard to read and not beautiful.

・ 'continue' in Scala (classic Java style)
import scala.util.control.Breaks

val exitBlock = new Breaks()
import exitBlock.{breakable, break}

for (one <- data) {
  breakable {
    if (!one._1.startsWith("libor")) {
      break
    }
    println(one._1 + ":" + one._2)
  }
}


But in Scala you can write simple and elegant code as follows. Compared to the classic Java style above, the amount of code decreases. It's like beautiful Japanese calligraphy drawn unicursally, and very readable :)

・ 'continue' in Scala (elegant style :))
data.withFilter(p => p._1.startsWith("libor")).foreach(f => println(f._1 + ":" + f._2))


The full source code is as follows.

・ full source code
import scala.util.control.Breaks

object test {
  def main(args: Array[String])={
    val data = Map[String, Double]("libor1M" -> 0.1, "libor3M" -> 0.2,
                                   "tibor1M" -> 0.3, "libor6M" -> 0.4)
   
    // I don't like the following code. It's not elegant ...
    val exitBlock = new Breaks()
    import exitBlock.{breakable, break}

    for (one <- data) {
      breakable {
        if (!one._1.startsWith("libor")) {
          break
        }
        println(one._1 + ":" + one._2)
      }
    }
   
   
    // I like the following code :)
    data.withFilter(p => p._1.startsWith("libor")).foreach(f => println(f._1 + ":" + f._2))
  }
}


2.2. Break

Next, I note 'break' in Scala. The test data is a simple List object.

・ test data
val data = List[String]("libor1M", "libor3M", "tibor1M", "libor6M")


If you write a program in Scala in the classic Java style, it may be the following redundant code, just as with 'continue'.

・ 'break' in Scala (classic Java style)
import scala.util.control.Breaks

val exitBlock = new Breaks()
import exitBlock.{breakable, break}

breakable {
  for (one <- data) {
    if (one == "tibor1M") {
      println(one)
      break
    }
  }
}

But, as with 'continue', you can write simple and elegant code. It's very readable :)

・ 'break' in Scala (elegant style :))
data.find(p => p == "tibor1M").foreach(println)


The full source code is as follows.

・full source code
import scala.util.control.Breaks

object test {
  def main(args: Array[String])={  
    val data = List[String]("libor1M", "libor3M", "tibor1M", "libor6M")
   
   
    // I don't like the following code. It's not elegant ...
    val exitBlock = new Breaks()
    import exitBlock.{breakable, break}


    breakable {
      for (one <- data) {
        if (one == "tibor1M") {
          println(one)
          break
        }
      }
    }
   
   
    // I like the following code :)
    data.find(p => p == "tibor1M").foreach(println)

  }
}



3. Conclusion

In Scala, you can write 'continue' and 'break' in the classic Java style or in a beautiful style like Japanese calligraphy drawn unicursally. Writing in the latter style reduces the amount of code and makes it much more readable.


Mar 13, 2017

Fun Quiz on SparkR: How to Create Data Frame without Spark-CSV Package :)





1. Introduction

We can write Spark applications that create a data frame directly from a csv file by using the spark-csv package, so the package is often used.

Mahler's Symphony No. 5 will be performed by Eliahu Inbal and the Konzerthausorchester Berlin in Tokyo tonight. I arrived at the hall an hour before the concert started, so I came up with a fun quiz about creating data frames from csv files without the package above :)

In this article, the Spark version is 1.6.3.

2. Fun Quiz

In this chapter, I show the source code that uses the spark-csv package and then pose a fun quiz :)
Before starting the quiz, we need to prepare the following csv file.

・ test_data.csv
20170228,scheme1,BermudanCallableSwap,JPY,21261339422
20170228,scheme2,BermudanCallableSwap,JPY,22759109989
20170228,scheme3,BermudanCallableSwap,JPY,21405741891
...


2.1. Use Spark-CSV Package

First, I note the program that uses the spark-csv package. There are two important points: one is setting 'SPARKR_SUBMIT_ARGS' so that the spark-csv package is available, and the other is loading the csv file directly as a data frame.

Setting 'SPARKR_SUBMIT_ARGS' and loading a csv file as a data frame are done as follows.

・ set 'SPARKR_SUBMIT_ARGS' (com.databricks:spark-csv)
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.5.0" "sparkr-shell"')

・ load a csv file as data frame
df <- read.df(sqlContext, inputFilePath, source = "com.databricks.spark.csv", schema = schema)

The full source code is as follows.

・ source code
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.5.0" "sparkr-shell"')

# init SparkR
sc <- sparkR.init(appName = "visualization", master = "local[*]")
sqlContext <- sparkRSQL.init(sc)

# parameter
inputFilePath <- "C:/dev/R/test_data.csv"

# schema of data frame
schema <- structType(structField(x = "date", type = "integer", nullable = FALSE),
                     structField(x = "scheme_number", type = "string", nullable = FALSE),
                     structField(x = "product", type = "string", nullable = FALSE),
                     structField(x = "currency", type = "string", nullable = FALSE),
                     structField(x = "sensitivity", type = "double", nullable = FALSE))

# load a csv file as data frame
df <- read.df(sqlContext, inputFilePath, source = "com.databricks.spark.csv", schema = schema)


showDF(df)

2.2. Quiz

Next, I note a simple quiz. Enjoy!! :)

・ quiz
If you don't use the spark-csv package, how do you create a data frame with the following schema from test_data.csv?

・ schema
  date,scheme_number,product,currency,sensitivity

・ test_data.csv
  20170228,scheme1,BermudanCallableSwap,JPY,21261339422
  20170228,scheme2,BermudanCallableSwap,JPY,22759109989
  20170228,scheme3,BermudanCallableSwap,JPY,21405741891
  ...

The time limit is 5 minutes :)


3. Answer

The answer is as follows. It's a bit long-winded, but there is some fun in it :)
First, you must prepare a record splitter.

・ record splitter
splitRecord <- function(record) {
  Sys.setlocale("LC_ALL", "C")
  part <- strsplit(record, ",")[[1]]
  list(column1 = as.integer(part[1]),
       column2 = part[2],
       column3 = part[3],
       column4 = part[4],
       column5 = as.double(part[5]))
}

And then, you must create an RDD from the file, split its records and convert it to a data frame.

・ load a csv file and create data frame
# load a csv file
orgData <- SparkR:::textFile(sc, inputFilePath)

# split data
splitData <- SparkR:::lapply(orgData, splitRecord)

# create data frame
df <- SparkR:::createDataFrame(sqlContext, splitData, schema)

The full source code is as follows :)

・ source code
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# init SparkR
sc <- sparkR.init(appName = "visualization", master = "local[*]")
sqlContext <- sparkRSQL.init(sc)

# parameter
inputFilePath <- "C:/dev/R/test_data.csv"

# schema of data frame
schema <- structType(structField(x = "date", type = "integer", nullable = FALSE),
                     structField(x = "scheme_number", type = "string", nullable = FALSE),
                     structField(x = "product", type = "string", nullable = FALSE),
                     structField(x = "currency", type = "string", nullable = FALSE),
                     structField(x = "sensitivity", type = "double", nullable = FALSE))

# record splitter
splitRecord <- function(record) {
  Sys.setlocale("LC_ALL", "C")
  part <- strsplit(record, ",")[[1]]
  list(column1 = as.integer(part[1]),
       column2 = part[2],
       column3 = part[3],
       column4 = part[4],
       column5 = as.double(part[5]))
}


# load a csv file
orgData <- SparkR:::textFile(sc, inputFilePath)


# split data
splitData <- SparkR:::lapply(orgData, splitRecord)


# create data frame
df <- SparkR:::createDataFrame(sqlContext, splitData, schema)


showDF(df)


4. Conclusion

If you enjoyed this quiz, I will be pleased :) The concert will start soon!


・ Gustav Mahler: Symphony No. 5
・ Richard Wagner: Tristan and Isolde, Prelude and Love Death
・ Konzerthausorchester Berlin, cond. Eliahu Inbal
・ Tokyo



Feb 18, 2017

SparkR with Visual Studio and RStudio





1. Introduction

I usually see many R lovers at investment banks. If they could process 'Big Data' with the R they love, they would be so happy :) SparkR makes this possible. In this article, I note how to enjoy SparkR.

・SparkR
 https://github.com/apache/spark/tree/master/R

SparkR is included in Apache Spark. I use RStudio and Visual Studio as IDEs. The versions of Apache Spark, R, RStudio and Visual Studio are as follows.

Apache Spark 1.6.3
R 3.3.2
RStudio 1.0.136
Visual Studio Community 2015 Update 3 (Update 3 or later is required)

 

2. SparkR with IDE

I deployed Spark 1.6.3 as follows and set SPARK_HOME to C:\enjyoyspace\spark-1.6.3. RStudio is often used as the IDE for running SparkR, but we can also use Visual Studio, and there are many Visual Studio lovers. So, I note how to execute SparkR with both RStudio and Visual Studio.

C:\enjyoyspace\spark-1.6.3
    ├─bin
    ├─conf
    ├─data
    ├─ec2
    ├─examples
    ├─lib
    ├─licenses
    ├─python
    ├─R


2.1. Use RStudio

SparkR is loaded as follows. SparkR's APIs are simple and friendly to lovers of the R language, so the learning cost should be very low for them :)

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))


The full source code is as follows :)

# load SparkR
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

sc <- sparkR.init(appName = "CalcProhitAndLoss")
sqlContext <- sparkRSQL.init(sc)

# create data frame (R)
shockedPV <- data.frame(date = c("20161001", "20161101", "20161224"), PV = c(10000000000, 10000001000, 10000002000))
nonShockedPV <- data.frame(date_nonShocked = c("20161001", "20161101", "20161224"), PV_nonShocked = c(9000000000, 9000001000, 9000002000))

# create data frame (SparkR) from data frame (R)
shockedPVforSparkR <- createDataFrame(sqlContext,shockedPV)
nonShockedPVforSparkR <- createDataFrame(sqlContext,nonShockedPV)

# join of RDDs
masterPV <- join(shockedPVforSparkR, nonShockedPVforSparkR, shockedPVforSparkR$date == nonShockedPVforSparkR$date_nonShocked)

# register table
registerTempTable(masterPV, "masterPV")

# SparkSQL
prohitAndLossForSparkR <- sql(sqlContext, "SELECT date, PV-PV_nonShocked AS prohit_and_loss FROM masterPV")

# collect query results
prohitAndLoss <- collect(prohitAndLossForSparkR)

# display collected results
print(prohitAndLoss)


2.2. Use Visual Studio Community

To execute SparkR with Visual Studio, it's necessary to install the following plugin. The version of Visual Studio must be '2015 Update 3' or later.

・R Tools for Visual Studio
 https://microsoft.github.io/RTVS-docs/

We are able to use the script described in 2.1. There is nothing to change :)


3. Execution Result

The execution result is as follows. The calculation results are computed normally :)

      date prohit_and_loss
1 20161224           1e+09
2 20161001           1e+09
3 20161101           1e+09


4. Conclusion

We can use the attractive tool 'SparkR' with Visual Studio or RStudio. If you use Visual Studio, the version must be '2015 Update 3' or later.



Jan 29, 2017

Executing R Script Files from C# with R.NET





1. Introduction

The photo is Suimono :) It includes Ebi Shinjyo, which is a steamed shrimp dumpling.

In this article, I note how to execute R script files from C# and show the execution results. I integrated C# and R with R.NET. R.NET has very simple APIs and is easy to use.

・R.NET
 https://github.com/jmp75/rdotnet

The versions of .NET Framework, R.NET and R are as follows.

.NET Framework 4.5
R.NET 1.6.5
R 3.3.2

2. Integrating C# and R

I want to use R to run machine learning models, display useful charts and so on, and C# to process various parameters and pass them to R. This is because R has good graphical and machine learning capabilities but isn't good at handling objects.
I also want to incorporate R script files implemented by other people, such as quantitative analysts and data scientists, into C# applications.

2.1. C# Program

First, I note the C# program. It needs to pass arguments to an R script file and then run that script file. These are realized by the following methods. It's so simple :)

・ passing arguments from C# programs to R script files
REngine.SetCommandLineArguments (string[] parameters)
・ running R script files
REngine.Evaluate ("source('script file path')")


The full source code is as follows :)

・ Program.cs
using RDotNet;

namespace FinancialEngineering
{
    class Program
    {
        static void Main(string[] args)
        {
            var scriptFilePath = "C:/dev/R/test_script.R";
            var riskCsvPath = "C:/dev/R/test_data.csv";
            var valueAtRisk = "25750000000";

            ExecuteScriptFile(scriptFilePath, riskCsvPath, valueAtRisk);
        }


        public static void ExecuteScriptFile(string scriptFilePath, string paramForScript1,
                                             string paramForScript2)
        {
            using (var en = REngine.GetInstance())
            {
                var args_r = new string[2] { paramForScript1, paramForScript2 };
                var execution = "source('" + scriptFilePath + "')";

                en.SetCommandLineArguments(args_r);
                en.Evaluate(execution);

            }
        }
    }
}



'C:/dev/R/test_data.csv' in this program is a csv file like the following.

・ test_data.csv
date,risk_value
20160128,21261339422
20160129,22759109989
20160130,21405741891
...


2.2. R Script File

Next, I wrote the following R script file. The important point is that the arguments passed from the C# program become available via 'args <- commandArgs()'. They are stored from 'args[2]' onward.

・ test_script.R
args <- commandArgs()
riskValueFilepath <- args[2]
VaR <- as.numeric(args[3])


data <- read.csv(riskValueFilepath, header = TRUE)
attach(data)
x <- strptime(data$date, '%Y%m%d', tz = '')

par(xaxt = 'n')

plot(x, data$risk_value, type = 'n', xlab = '',ylab ='')
lines(x, data$risk_value, type = 'l', col = 'thistle4')
points(x, data$risk_value, type = 'p', pch = 20, col = ifelse(data$risk_value > VaR, 'tomato', 'thistle4'))

par(xaxt = 's')

y <- as.POSIXct(round(range(x), "days"))
axis.POSIXct(1,at = seq(y[1],y[2],by = "1 day"),format = '%Y%m%d')
title('Test of Integrating C# and R\n(Points above Value at Risk)', xlab = 'Date',ylab = 'Risk Value')


3. Execution Result

The execution result is as follows. It ran normally :)



4. Conclusion

R.NET enables you to integrate C# and R. Because you can pass arguments from C# to R and run R script files, it's easy to incorporate R script files into C# applications.