
Big Data with Flume: Enterprise Development Case on Replication and Multiplexing
Published: 2021-05-07 07:32:07
Replication and Multiplexing
1) Case requirements: Flume-1 monitors a log file for changes and passes the new content to Flume-2, which stores it on HDFS. At the same time, Flume-1 passes the same content to Flume-3, which writes it to the local file system.
2) Requirements analysis: Flume-1 uses a replicating channel selector to copy every event into two channels; channel c1 feeds an Avro sink pointing at Flume-2 (which writes to HDFS), and channel c2 feeds an Avro sink pointing at Flume-3 (which writes to the local file system).
3) Implementation steps:
0. Preparation
Create a group1 folder under /opt/module/flume/job
[hadoop@hadoop101 job]$ mkdir group1
[hadoop@hadoop101 job]$ cd group1/
Create a flume3 folder under /opt/module/data/
[hadoop@hadoop101 data]$ mkdir flume3
1. Create flume-file-flume.conf
Configure one source that reads the log file, along with two channels and two sinks that feed flume-flume-hdfs and flume-flume-dir respectively. Edit the configuration file: [hadoop@hadoop101 group1]$ vim flume-file-flume.conf
Add the following content:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Replicate the data flow to all channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
# An avro sink acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop101
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop101
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

a1.channels.c2.type = memory
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
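The replicating selector above copies every event into both channels, which is the "replication" half of this case. For the multiplexing half, the selector can instead route each event to a specific channel based on an event header value. A minimal sketch of that variant (the header name "state" and the values CZ/US are hypothetical and would normally be set by an interceptor or the sending client):

# Multiplexing variant: route events by a header value instead of copying them to all channels
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
# Events whose "state" header is CZ go to c1, US goes to c2, everything else falls back to c1
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c1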
2. Create flume-flume-hdfs.conf
Configure a source that receives the output of the upstream Flume and a sink that writes to HDFS. Edit the configuration file: [hadoop@hadoop101 group1]$ vim flume-flume-hdfs.conf
Add the following content:
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
# An avro source acts as a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop101
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop101:9000/flume2/%Y%m%d/%H
# Prefix for uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to roll folders based on time
a2.sinks.k1.hdfs.round = true
# Number of time units before creating a new folder
a2.sinks.k1.hdfs.roundValue = 1
# Redefine the time unit
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type; compression is supported
a2.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) to roll to a new file
a2.sinks.k1.hdfs.rollInterval = 600
# Roll the file at roughly 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 10000
a2.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3. Create flume-flume-dir.conf
Configure a source that receives the output of the upstream Flume and a sink that writes to a local directory. Edit the configuration file: [hadoop@hadoop101 group1]$ vim flume-flume-dir.conf
Add the following content:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop101
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/data/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 10000
a3.channels.c2.transactionCapacity = 1000

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
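One detail worth knowing about the file_roll sink: by default it rolls to a new local file every 30 seconds, even when no events arrive, which is why the listing in step 7 below shows several files, some of them empty. If fewer files are preferred, the interval can be raised; a sketch (3600 is an arbitrary value here, and 0 would disable time-based rolling entirely):

# Roll the local output file once an hour instead of the 30-second default
a3.sinks.k1.sink.rollInterval = 3600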
Note: the local output directory must already exist; if it does not, Flume will not create it.
4. Run the configuration files
Start the corresponding Flume agents: flume-flume-dir, flume-flume-hdfs, and then flume-file-flume (the downstream Avro sources should be listening before the upstream Avro sinks try to connect).
[hadoop@hadoop101 flume]$ bin/flume-ng agent -c conf/ -n a3 -f job/group1/flume-flume-dir.conf
[hadoop@hadoop101 flume]$ bin/flume-ng agent -c conf/ -n a2 -f job/group1/flume-flume-hdfs.conf
[hadoop@hadoop101 flume]$ bin/flume-ng agent -c conf/ -n a1 -f job/group1/flume-file-flume.conf
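Before generating real log traffic, it can help to confirm that the two downstream agents are actually listening on their Avro ports, and optionally push a test event through with the flume-ng avro-client. A quick check under the setup above (the test file path is hypothetical; sending to port 4141 exercises only the a2/HDFS path, while 4142 would exercise a3):

[hadoop@hadoop101 flume]$ netstat -nltp | grep -E '4141|4142'
[hadoop@hadoop101 flume]$ echo "hello flume" > /tmp/flume-test.txt
[hadoop@hadoop101 flume]$ bin/flume-ng avro-client -H hadoop101 -p 4141 -F /tmp/flume-test.txt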
5. Start Hadoop and Hive
[hadoop@hadoop101 hadoop-2.7.2]$ sbin/start-dfs.sh
[hadoop@hadoop102 hadoop-2.7.2]$ sbin/start-yarn.sh
[hadoop@hadoop101 hive]$ bin/hive
hive (default)>
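Flume-1 is tailing /opt/module/hive/logs/hive.log, so some Hive activity is needed before anything shows up downstream. Any statement that appends to the log should do; for example (the table name flume_demo is just an illustration):

hive (default)> show databases;
hive (default)> create table if not exists flume_demo(id int);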
6. Check the data on HDFS
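The HDFS path comes from a2's sink configuration (hdfs://hadoop101:9000/flume2/%Y%m%d/%H), so the uploaded data can also be checked from the command line; the date/hour directory below is illustrative:

[hadoop@hadoop101 flume]$ hadoop fs -ls -R /flume2
[hadoop@hadoop101 flume]$ hadoop fs -cat /flume2/20210507/18/flume2-*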

7. Check the data in the /opt/module/data/flume3 directory
[hadoop@hadoop101 flume3]$ ll
total 64
-rw-rw-r--. 1 hadoop hadoop 0 Jun 22 18:53 1592823192120-1
-rw-rw-r--. 1 hadoop hadoop 0 Jun 22 18:57 1592823192120