Quick Start
Uploading job jars, Python files, and other resources
The analytics cluster uses the HttpFS service to let users upload and manage job jars, Python files, and similar resources on the server side.
1. Get the HttpFS service address from the analytics cluster console, for example:
HttpFS: http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000
2. Usage recommendations:
- You can currently manage these resources through the RESTful API or the command line; see the relevant documentation for details. The examples below use the RESTful API.
- For easier management, we recommend using resource as the HttpFS user name, with /resourcesdir/ as the upload root directory; subdirectories can be created under it later.
3. Upload a local jar or Python file to the Spark server
Create the directory /resourcesdir:
curl -i -X PUT "http://ap-xxx.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir?op=MKDIRS&user.name=resource"
- Upload a jar:
Upload the local ./examples/jars/spark-examples_2.11-2.3.2.jar to /resourcesdir/spark-examples_2.11-2.3.2.jar on HttpFS (this name matches the "file" field in the Livy job description below):
curl -i -X PUT -T ./examples/jars/spark-examples_2.11-2.3.2.jar "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/spark-examples_2.11-2.3.2.jar?op=CREATE&data=true&user.name=resource" -H "Content-Type:application/octet-stream"
- Upload a Python file:
Upload the local ./examples/src/main/python/pi.py to /resourcesdir/pi.py on HttpFS:
curl -i -X PUT -T ./examples/src/main/python/pi.py "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/pi.py?op=CREATE&data=true&user.name=resource" -H "Content-Type:application/octet-stream"
- List files:
List the files under /resourcesdir/ on HttpFS:
curl -i "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/?op=LISTSTATUS&user.name=resource"
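The three HttpFS calls above differ only in the WebHDFS path, the op, and any extra query parameters. As a small sketch, assuming the placeholder host from this article (not a live endpoint), the request URLs can be composed like this:

```python
# Sketch: build WebHDFS v1 URLs for the HttpFS calls shown above.
# HOST is the placeholder address from this article, not a live endpoint.
from urllib.parse import urlencode

HOST = "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000"

def webhdfs_url(path, op, user="resource", **extra):
    """Compose a WebHDFS v1 URL for HttpFS (path must start with '/')."""
    params = {"op": op, "user.name": user, **extra}
    return f"{HOST}/webhdfs/v1{path}?{urlencode(params)}"

# The same three operations as the curl commands above:
mkdirs = webhdfs_url("/resourcesdir", "MKDIRS")
create = webhdfs_url("/resourcesdir/pi.py", "CREATE", data="true")
ls     = webhdfs_url("/resourcesdir/", "LISTSTATUS")
```

These URLs can then be issued with curl (as above) or with any HTTP client; only CREATE needs a request body (the file being uploaded).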
Submitting jobs through the job management service (LivyServer)
The Spark service uses Apache Livy (LivyServer) as its job management service, which supports submitting jar jobs (including streaming jobs) and Python jobs.
1. Get the LivyServer service address from the analytics cluster console, for example:
LivyServer: http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998
2. Submit a job
- Write the job description JSON file livy_pi.json for LivyServer:
{
  "file": "/resourcesdir/spark-examples_2.11-2.3.2.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "driverMemory": "1g",
  "executorMemory": "1g",
  "conf": {
    "spark.executor.instances": "1",
    "spark.executor.cores": "1"
  }
}
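The livy_pi.json payload above can also be built and serialized in code before being POSTed to the /batches endpoint. A minimal Python sketch (the POST itself is only indicated in a comment, since the LivyServer host in this article is a placeholder):

```python
# Sketch: build the Livy batch payload shown above in livy_pi.json.
import json

batch = {
    "file": "/resourcesdir/spark-examples_2.11-2.3.2.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "driverMemory": "1g",
    "executorMemory": "1g",
    "conf": {
        "spark.executor.instances": "1",
        "spark.executor.cores": "1",
    },
}

payload = json.dumps(batch, indent=2)
# POST `payload` to http://<livyserver-host>:8998/batches with the header
# Content-Type: application/json (e.g. via urllib.request, or curl as below).
```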
- Submit the jar job
Command:
curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches |python -m json.tool
Example:
[root@master]# curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches |python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 368 100 145 100 223 4815 7405 --:--:-- --:--:-- --:--:-- 7689
{
"appId": null,
"appInfo": {
"driverLogUrl": null,
"sparkUiUrl": null
},
"id": 1,
"log": [
"stdout: ",
"\nstderr: ",
"\nYARN Diagnostics: "
],
"state": "starting"
}
- Submit a Python job
Command:
curl -X POST --data '{"file": "/resourcesdir/pi.py"}' -H "Content-Type: application/json" http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches
3. Query the job status
Job status can be checked through the LivyServer API or the Spark UI.
Command:
curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
Example:
[root@master t-apsara-spark-2.2.2]# curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 26 100 26 0 0 1904 0 --:--:-- --:--:-- --:--:-- 2000
{
"id": 1,
"state": "success"
}
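A batch passes through states such as starting and running before finishing, so a client typically polls /batches/&lt;id&gt;/state until a terminal state appears. A standard-library sketch, assuming the placeholder LivyServer host from this article and the terminal batch states named in the Livy REST documentation:

```python
# Sketch: poll a Livy batch's state until it reaches a terminal state.
# LIVY is the placeholder address from this article, not a live endpoint.
import json
import time
import urllib.request

LIVY = "http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998"
TERMINAL = {"success", "dead", "killed", "error"}  # assumed terminal states

def poll_batch(batch_id, interval=5.0, fetch=None):
    """Poll /batches/<id>/state until the state is terminal; return it.

    `fetch` can be injected for testing; by default it performs a real
    HTTP GET against the LivyServer endpoint and decodes the JSON body.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    while True:
        state = fetch(f"{LIVY}/batches/{batch_id}/state")["state"]
        if state in TERMINAL:
            return state
        time.sleep(interval)
```

This mirrors what repeatedly running the curl command above does by hand; the hypothetical `fetch` parameter exists only so the loop can be exercised without a live cluster.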
4. References
Livy community documentation: https://livy.incubator.apache.org/
Spark community documentation: http://spark.apache.org/docs/2.3.2/
Aliyun official demos: https://github.com/aliyun/aliyun-apsaradb-hbase-demo/tree/master/spark