Spark Standalone Mode
Published: 2019-06-14


Because Spark is coupled to Hadoop, you should choose the Spark build according to the Hadoop version you already have installed; otherwise you will get errors like "Server IPC version X cannot communicate with client version Y".

My installed Hadoop version is Hadoop 2.4.0, so I chose spark-1.2.0-bin-hadoop2.4.tgz. Note that Spark and Scala also have version-compatibility constraints; see the issue I recorded in another post.
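
If you are not sure which Hadoop version is installed, you can check it on the command line before downloading; the suffix of the pre-built Spark package (here -bin-hadoop2.4) should match the version reported:

hadoop@tinylcy:~$ hadoop version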

Official documentation: https://spark.apache.org/docs/1.2.0/spark-standalone.html

 

Spark depends on Scala, so Scala must be installed beforehand; my version is scala-2.11.5.tgz. Configure the Scala environment variables:

export SCALA_HOME=/opt/scala/scala-2.11.5
export PATH=$PATH:$SCALA_HOME/bin

Reload after editing so the changes take effect, then check the Scala version:
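
A minimal sketch of that step, assuming the exports were added to /etc/profile (use ~/.bashrc or wherever you actually put them):

hadoop@tinylcy:~$ source /etc/profile   # reload the environment variables
hadoop@tinylcy:~$ scala -version        # should report version 2.11.5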

 

Then configure the Spark environment variables:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin

Reload after editing so the changes take effect.

In the ${SPARK_HOME}/conf directory, run:

cp spark-env.sh.template spark-env.sh

Edit spark-env.sh and add the following at the end of the file (adjust the paths to your own setup):

export SCALA_HOME=/opt/scala/scala-2.11.5
export SPARK_MASTER_IP=127.0.0.1
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_75
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

 

That completes the configuration.
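
One assumption worth making explicit: this is a single-machine setup, so SPARK_MASTER_IP points at 127.0.0.1 and the master and worker share one host. On a multi-node standalone cluster you would additionally list the worker hostnames, one per line, in ${SPARK_HOME}/conf/slaves, e.g. (hypothetical hosts):

# ${SPARK_HOME}/conf/slaves
worker1.example.com
worker2.example.com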

 

Before starting Spark, start Hadoop first:

hadoop@tinylcy:/usr/local/hadoop$ sbin/start-all.sh

 

Then start Spark:

hadoop@tinylcy:/opt/spark$ sbin/start-all.sh
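
To check that everything came up, list the running JVMs with jps; on this single-machine setup you should see Spark's Master and Worker alongside the Hadoop daemons (the exact Hadoop processes depend on your configuration). The standalone master also serves a web UI, by default at http://127.0.0.1:8080.

hadoop@tinylcy:~$ jps
# expect, among others:
#   NameNode, DataNode, ...   (Hadoop daemons)
#   Master, Worker            (Spark standalone daemons)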

 

Next, enter interactive mode via spark-shell (which pre-creates a SparkContext, available as sc):

hadoop@tinylcy:/opt/spark$ bin/spark-shell

 

Test it with a simple word count:

scala> val textFile=sc.textFile("hdfs://localhost:9000/user/hadoop/input/words.txt")

 

scala> val count=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)

 

scala> count.collect()
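
collect() ships the per-word counts back to the driver as an array. With a hypothetical words.txt containing just "hello world hello", the result would look roughly like:

res0: Array[(String, Int)] = Array((hello,2), (world,1))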

 

One more example:

scala> val data = Array(1, 2, 3, 4, 5)      // create the data
data: Array[Int] = Array(1, 2, 3, 4, 5)

scala> val distData = sc.parallelize(data)  // turn the local collection into an RDD
distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14

scala> distData.reduce(_+_)                 // compute on the RDD: sum the elements
15/07/19 14:37:56 INFO spark.SparkContext: Starting job: reduce at <console>:17
15/07/19 14:37:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at <console>:17) with 4 output partitions (allowLocal=false)
... (DAGScheduler/TaskSetManager/Executor INFO lines for the 4 tasks omitted) ...
15/07/19 14:37:56 INFO scheduler.DAGScheduler: Stage 0 (reduce at <console>:17) finished in 0.077 s
15/07/19 14:37:56 INFO scheduler.DAGScheduler: Job 0 finished: reduce at <console>:17, took 0.342516 s
res0: Int = 15                              // the result

scala>
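
The shell is convenient for experiments; to run a packaged application against the same standalone cluster you would use spark-submit instead. A minimal sketch, with a hypothetical main class and application jar (7077 is the standalone master's default port):

hadoop@tinylcy:/opt/spark$ bin/spark-submit \
  --class com.example.WordCount \
  --master spark://127.0.0.1:7077 \
  /path/to/wordcount.jar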

 

Reposted from: https://www.cnblogs.com/Murcielago/p/4657501.html
