Posts

Showing posts from January, 2015

Using the Netezza Analytics Matrix Engine

Felt like writing something about this topic due to the lack of available examples out there on the internet.

-- initialize the engine
CALL NZM..INITIALIZE();

-- create some random matrices
CALL NZM..CREATE_RANDOM_MATRIX('A', 10, 10);
CALL NZM..CREATE_RANDOM_MATRIX('B', 10, 10);

-- let's try adding the matrices; put the result in matrix C
CALL NZM..ADD('A', 'B', 'C');

-- now let's see the result -> create a table from the matrix
CALL NZM..CREATE_TABLE_FROM_MATRIX('C', 'TABLE_C');

-- look at the content
select * from table_c order by row, col;

-- to check the results, let's create additional tables from matrices A and B and calculate manually
CALL NZM..CREATE_TABLE_FROM_MATRIX('B', 'TABLE_B');
select * from table_b order by row, col;
CALL NZM..CREATE_TABLE_FROM_MATRIX('A', 'TABLE_A');
select * from table_a order by row, col;

-- let's try some real-life application.
/*  Begin Ego Ne
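A quick way to sanity-check the ADD call is to join the exported tables and compare cell by cell. A minimal sketch, assuming the tables produced by CREATE_TABLE_FROM_MATRIX expose ROW, COL and VALUE columns (adjust the names if your export differs):

-- any rows returned here flag a cell where C <> A + B
select a.row, a.col, a.value + b.value as expected, c.value as actual
from table_a a
join table_b b on a.row = b.row and a.col = b.col
join table_c c on a.row = c.row and a.col = c.col
where abs((a.value + b.value) - c.value) > 0.000001;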

Apache Spark saveAsTextFile error

The article below was copied from Solai Murugan's blog. All credit goes to him for the fine work. Copied it over so that I'd know where to find it in the future, should I ever forget about it.

Error

:19: error: value saveAsTextFile is not a member of Array[(String, Int)]
arr.saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp1")

Steps to reproduce

val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
val arr = counts.collect()
arr.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")

Solution

The error comes from the last line above: saveAsTextFile is being called on a plain Scala Array, not on an RDD. In Spark, data has to be in an RDD (Resilient Distributed Dataset) before Spark methods such as saveAsTextFile can be used on it. In this case, just convert the array back into an RDD (replace the last line with):

sc.makeRDD(arr).saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")

Multi Agent Systems

For those of you interested in learning about multi-agent systems or game theory, below is a free ebook from Cambridge University Press, written by Yoav Shoham of Stanford and Kevin Leyton-Brown of the University of British Columbia. Got the link from an Advanced Game Theory course on Coursera. Happy reading. http://www.masfoundations.org/

Learning GraphX

A repository of some of the places I've been to while learning Apache GraphX.

Apache Spark ( http://spark.apache.org/docs/latest/ )
How can you not learn about Spark in order to learn GraphX? A nice introduction to Spark, and from there you can dive into the other components. Download Spark and just start playing with Scala.

First Steps to Scala ( http://www.artima.com/scalazine/articles/steps.html )
More tutorials on Scala. A step-by-step approach to some basic Scala skills.

Functional Programming Principles in Scala ( https://class.coursera.org/progfun-004/lecture )
A course on Scala by none other than Martin Odersky himself. A great way to learn Scala and to work in Spark better.

AMP Camp by Stanford ( http://ampcamp.berkeley.edu/stanford-workshop/index.html )
Exploring the BDAS stack with Stanford University. If you're in the States, this is probably the best way to learn about Spark, Scala, GraphX, and MLlib. Why do I put that restriction? It's because to be able to make full use o

SNA in Netezza

Can Netezza do network analysis? That is the predicament I've been put under for the past couple of weeks. Based on the set of hardware that I (read: my company) have got, I'm supposed to be using it to calculate the usual set of social network measurements (i.e. degree, betweenness, closeness, eigenvector, etc.). Two weeks have passed, and while it was relatively easy to calculate the degree centrality (see the sketch below), betweenness has proven to be quite a challenge. So far I've been able to translate Dijkstra's algorithm for finding the shortest paths between nodes, and to use those paths to determine the betweenness (refer back to the formula for betweenness centrality if you're lost here). The results have been tested on a small-scale network with 10 vertices, and the values match the ones given by Gephi, so initially I was quite confident that I could simply pump in the actual data from my telecom network. That, however, didn't go as smoothly. The amount of memo
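For reference, the degree-centrality part really does reduce to a GROUP BY once the network sits in an edge table. A minimal sketch, assuming a hypothetical EDGES table with SRC and DST columns describing an undirected network:

-- node degree = number of distinct neighbours touching each vertex
select vertex, count(distinct neighbour) as node_degree
from (
    select src as vertex, dst as neighbour from edges
    union all
    select dst as vertex, src as neighbour from edges
) e
group by vertex
order by node_degree desc;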