Apache Spark saveAsTextFile error
The article below was copied from Solai Murugan's blog; all credit goes to him for the fine work.
I copied it over so that I'd know where to find it in the future, should I ever forget about it.
Error
<console>:19: error: value saveAsTextFile is not a member of Array[(String, Int)]
       arr.saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp1")
Steps to reproduce
val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
val arr = counts.collect()
arr.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")   // <-- this line fails
Solution
The error comes from the last line above. collect() returns a plain Scala Array[(String, Int)], and saveAsTextFile is not defined on arrays: in Spark, data has to live in an RDD (Resilient Distributed Dataset) before Spark methods such as saveAsTextFile can be called on it. The fix is to convert the array back into an RDD, replacing that line with:
sc.makeRDD(arr).saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")
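For reference, here is the full corrected sequence as it would be typed into spark-shell (where sc is already defined). This is only a sketch using the host names and HDFS paths from the example above, which will differ on other clusters. Note that the collect() step can also be skipped altogether, since counts is already an RDD:

// word count, keeping the data in an RDD until it is written out
val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

// option 1: collect to a local Array, then wrap it back into an RDD before saving
val arr = counts.collect()
sc.makeRDD(arr).saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")

// option 2 (simpler): save the RDD directly and avoid pulling the data to the driver
// counts.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")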