How to Write a Custom MySQL Source for Flume
Published: 2019-05-25


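Preface

This article is part of the column "1000个问题搞定大数据技术体系" (Mastering the Big Data Technology Stack in 1,000 Questions). The column is my original work, so please credit the source when quoting it. If you spot omissions or mistakes, please point them out in the comments. Thank you!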

For the column's table of contents and references, see the column index.

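Main Text

Scenario

Flume ships with many source types, but they do not always cover what a real project needs, so sometimes we have to write our own. A typical case is monitoring MySQL in real time and shipping new rows to HDFS or another storage system, which means implementing a MySQL source ourselves.

Steps for a custom MysqlSource

  • 1. According to the official documentation, a custom MySQL source must extend the AbstractSource class and implement the Configurable and PollableSource interfaces.
  • 2. Implement the corresponding methods:
    • configure(Context context)
      • Initializes the source from the context
    • process()
      • Fetches data from the MySQL table, wraps it in Event objects, and writes them to the channel; Flume calls this method over and over
    • stop()
      • Releases the resources in use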
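
The call order of these three methods can be tried out without Flume on the classpath. The sketch below is only an illustration: `MiniSource`, its `Status` enum, and the `rows` setting are made-up stand-ins, not Flume's API, and an in-memory list plays the role of the channel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a pollable source; this is NOT Flume's API.
class MiniSource {
    enum Status { READY, BACKOFF }

    private final List<String> channel = new ArrayList<>(); // stands in for a Flume channel
    private final List<String> pending = new ArrayList<>(); // stands in for unread table rows
    private boolean open;

    // configure(): read the settings once, before polling starts.
    void configure(Map<String, String> settings) {
        pending.clear();
        for (String row : settings.getOrDefault("rows", "").split(",")) {
            if (!row.isEmpty()) {
                pending.add(row);
            }
        }
        open = true;
    }

    // process(): called over and over; ships whatever is pending to the channel.
    Status process() {
        if (!open || pending.isEmpty()) {
            return Status.BACKOFF;
        }
        channel.addAll(pending);
        pending.clear();
        return Status.READY;
    }

    // stop(): release resources; later polls find nothing to do.
    void stop() {
        open = false;
    }

    List<String> channel() {
        return channel;
    }

    public static void main(String[] args) {
        MiniSource source = new MiniSource();
        source.configure(Collections.singletonMap("rows", "zhangsan,lisi"));
        System.out.println(source.process()); // first poll finds two rows
        System.out.println(source.process()); // second poll finds nothing new
        source.stop();
        System.out.println(source.channel());
    }
}
```

Returning a "nothing to do" status from an empty poll is what lets the caller decide how long to wait before polling again.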

Hands-on

  1. Create the MySQL database and tables
-- Create the database
CREATE DATABASE IF NOT EXISTS mysqlsource DEFAULT CHARACTER SET utf8;

-- Create the table that records how far each target table has been pulled
CREATE TABLE mysqlsource.flume_meta (
    source_tab   varchar(255) NOT NULL,
    currentIndex varchar(255) NOT NULL,
    PRIMARY KEY (source_tab)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Insert the initial offset record
INSERT INTO mysqlsource.flume_meta(source_tab, currentIndex) VALUES ('student', '4');

-- Create the table to pull data from
CREATE TABLE mysqlsource.student (
    id   int(11) NOT NULL AUTO_INCREMENT,
    name varchar(255) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;

-- Add test data to the student table
INSERT INTO mysqlsource.student(id, name) VALUES (1,'zhangsan'),(2,'lisi'),(3,'wangwu'),(4,'zhaoliu');
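
To see how the two tables work together, here is a small in-memory sketch with no database involved. `OffsetDemo` and `rowsAfter` are names made up for this illustration. It treats `currentIndex` as "the last id already shipped" and picks only rows with a larger id, which is why the seed value `'4'` means the four test rows count as already read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a TreeMap stands in for the student table (id -> name).
class OffsetDemo {
    // Returns the names of rows whose id is greater than the stored offset.
    static List<String> rowsAfter(Map<Integer, String> student, int currentIndex) {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<Integer, String> row : student.entrySet()) {
            if (row.getKey() > currentIndex) {
                fresh.add(row.getValue());
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<Integer, String> student = new TreeMap<>();
        student.put(1, "zhangsan");
        student.put(2, "lisi");
        student.put(3, "wangwu");
        student.put(4, "zhaoliu");

        int currentIndex = 4; // same value as the flume_meta seed row
        System.out.println(rowsAfter(student, currentIndex)); // nothing new yet

        student.put(5, "tianqi"); // a hypothetical new row
        System.out.println(rowsAfter(student, currentIndex)); // only the new row
    }
}
```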
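  2. Create a Maven project and add the dependencies
<properties>
    <flume.version>1.9.0</flume.version>
    <mysql.version>8.0.24</mysql.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>${flume.version}</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>${mysql.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.12.0</version>
    </dependency>
</dependencies>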
  3. Add flume/jdbc.properties under the resources folder
dbDriver=com.mysql.cj.jdbc.Driver
dbUrl=jdbc:mysql://node1:3306/mysqlsource?useUnicode=true&characterEncoding=utf-8
dbUser=root
dbPassword=123456
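
As a quick check that a file in this format parses the way we expect, the sketch below loads the same kind of text with `java.util.Properties`. `JdbcPropsDemo` is a made-up name, and the text is read from a string instead of the classpath so the example runs on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: parses properties text held in a string.
class JdbcPropsDemo {
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "dbUrl=jdbc:mysql://node1:3306/mysqlsource?useUnicode=true&characterEncoding=utf-8",
                "dbUser=root");
        Properties p = parse(text);
        // Only the first '=' separates key and value, so the URL stays intact.
        System.out.println(p.getProperty("dbUrl"));
        System.out.println(p.getProperty("dbUser"));
    }
}
```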
  4. Define the QueryMysql helper class
package com.shockang.study.bigdata.flume;

import org.apache.flume.Context;
import org.apache.flume.conf.ConfigurationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class QueryMysql {

    private static final Logger LOG = LoggerFactory.getLogger(QueryMysql.class);

    private int runQueryDelay,       // delay between two queries, in milliseconds
            startFrom,               // id to start from
            currentIndex,            // current id (the offset)
            maxRow;                  // maximum number of rows per query
    private String table,            // table to read
            columnsToSelect,         // columns requested by the user
            customQuery,             // query supplied by the user
            query,                   // query that is actually executed
            defaultCharsetResultSet; // character set

    // Context used to read the agent configuration
    private Context context;

    // Defaults; each can be overridden in the Flume agent configuration
    private static final int DEFAULT_QUERY_DELAY = 10000;
    private static final int DEFAULT_START_VALUE = 0;
    private static final int DEFAULT_MAX_ROWS = 2000;
    private static final String DEFAULT_COLUMNS_SELECT = "*";
    private static final String DEFAULT_CHARSET_RESULTSET = "UTF-8";

    private static Connection conn = null;
    private static String connectionURL, connectionUserName, connectionPassword;

    // Load the JDBC defaults from the classpath
    static {
        Properties p = new Properties();
        try {
            p.load(QueryMysql.class.getClassLoader().getResourceAsStream("flume/jdbc.properties"));
            connectionURL = p.getProperty("dbUrl");
            connectionUserName = p.getProperty("dbUser");
            connectionPassword = p.getProperty("dbPassword");
            Class.forName(p.getProperty("dbDriver"));
        } catch (Exception e) {
            LOG.error("Failed to load flume/jdbc.properties", e);
        }
    }

    // Open a JDBC connection; returns null if the connection cannot be made
    private static Connection initConnection(String url, String user, String pw) {
        try {
            return DriverManager.getConnection(url, user, pw);
        } catch (SQLException e) {
            LOG.error("Cannot connect to " + url, e);
            return null;
        }
    }

    // Constructor. The throws clause is kept so callers that catch ParseException still compile.
    QueryMysql(Context context) throws ParseException {
        this.context = context;

        // Parameters with defaults: read from the agent configuration, fall back to the default
        this.columnsToSelect = context.getString("columns.to.select", DEFAULT_COLUMNS_SELECT);
        this.runQueryDelay = context.getInteger("run.query.delay", DEFAULT_QUERY_DELAY);
        this.startFrom = context.getInteger("start.from", DEFAULT_START_VALUE);
        this.defaultCharsetResultSet = context.getString("default.charset.resultset", DEFAULT_CHARSET_RESULTSET);

        // Parameters without defaults
        this.table = context.getString("table");
        this.customQuery = context.getString("custom.query");

        // Connection settings fall back to the values loaded from flume/jdbc.properties
        connectionURL = context.getString("connection.url", connectionURL);
        connectionUserName = context.getString("connection.user", connectionUserName);
        connectionPassword = context.getString("connection.password", connectionPassword);

        // Validate before connecting: a required parameter with no value is an error
        checkMandatoryProperties();

        conn = initConnection(connectionURL, connectionUserName, connectionPassword);
        if (conn == null) {
            throw new ConfigurationException("cannot connect to " + connectionURL);
        }
        try {
            // Reads the current offset and builds the first query
            query = buildQuery();
        } catch (SQLException e) {
            throw new ConfigurationException("cannot read the offset from flume_meta", e);
        }
    }

    // Validate the table name and the connection parameters
    private void checkMandatoryProperties() {
        if (table == null) {
            throw new ConfigurationException("property table not set");
        }
        if (connectionURL == null) {
            throw new ConfigurationException("connection.url property not set");
        }
        if (connectionUserName == null) {
            throw new ConfigurationException("connection.user property not set");
        }
        if (connectionPassword == null) {
            throw new ConfigurationException("connection.password property not set");
        }
    }

    // Build the SQL statement, using id as the offset
    private String buildQuery() throws SQLException {
        currentIndex = getStatusDBIndex(startFrom);
        LOG.info("currentIndex: {}", currentIndex);
        String sql = (customQuery == null)
                ? "SELECT " + columnsToSelect + " FROM " + table
                : customQuery;
        if (!sql.toLowerCase().contains("where")) {
            return sql + " where id>" + currentIndex;
        }
        // A custom query with its own WHERE clause is assumed to end with the previous
        // offset value; replace those trailing digits with the current offset.
        return sql.replaceFirst("\\d+\\s*$", "") + currentIndex;
    }

    // Run the query and return the rows; returns an empty list on failure
    List<List<Object>> executeQuery() {
        List<List<Object>> results = new ArrayList<>();
        try {
            if (conn == null) {
                conn = initConnection(connectionURL, connectionUserName, connectionPassword);
                if (conn == null) {
                    return results;
                }
            }
            // Rebuild the SQL on every call because the offset changes.
            // customQuery is left untouched so the next call starts from the user's query again.
            query = buildQuery();
            try (Statement st = conn.createStatement();
                 ResultSet result = st.executeQuery(query)) {
                int columnCount = result.getMetaData().getColumnCount();
                while (result.next()) {
                    // One row, as a list of column values
                    List<Object> row = new ArrayList<>();
                    for (int i = 1; i <= columnCount; i++) {
                        row.add(result.getObject(i));
                    }
                    results.add(row);
                }
            }
            LOG.info("execSql:{} resultSize:{}", query, results.size());
            return results;
        } catch (SQLException e) {
            LOG.error("Query failed, reconnecting", e);
            conn = initConnection(connectionURL, connectionUserName, connectionPassword);
            return new ArrayList<>();
        }
    }

    // Turn each row (a list of column values) into one comma-separated string
    List<String> getAllRows(List<List<Object>> queryResult) {
        List<String> allRows = new ArrayList<>();
        if (queryResult == null || queryResult.isEmpty()) {
            return allRows;
        }
        for (List<Object> rawRow : queryResult) {
            StringBuilder row = new StringBuilder();
            for (Object value : rawRow) {
                if (value != null) {
                    row.append(value);
                }
                row.append(",");
            }
            allRows.add(row.toString());
        }
        return allRows;
    }

    // Persist the offset after each batch so an interrupted job can resume where it stopped.
    // The new offset is the old one plus the batch size, which assumes ids increase by 1 per row.
    void updateOffset2DB(int size) {
        currentIndex += size;
        // source_tab is the key: insert if missing, otherwise update (one record per source table)
        String sql = "insert into flume_meta(source_tab,currentIndex) values(?,?) "
                + "on duplicate key update currentIndex=values(currentIndex)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, table);
            ps.setString(2, String.valueOf(currentIndex));
            LOG.info("updateStatus Sql: {} [{}, {}]", sql, table, currentIndex);
            ps.execute();
        } catch (SQLException e) {
            LOG.error("Failed to update the offset", e);
        }
    }

    // Read the current offset from flume_meta
    private Integer getStatusDBIndex(int startFrom) throws SQLException {
        String dbIndex = queryOne("select currentIndex from flume_meta where source_tab=?", table);
        if (dbIndex != null) {
            return Integer.parseInt(dbIndex);
        }
        // No record yet: this is the first run, so use the configured start value
        return startFrom;
    }

    // Run a single-value query with one string parameter
    private String queryOne(String sql, String param) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, param);
            try (ResultSet result = ps.executeQuery()) {
                return result.next() ? result.getString(1) : null;
            }
        }
    }

    // Release resources
    void close() {
        try {
            if (conn != null) {
                conn.close();
            }
        } catch (SQLException e) {
            LOG.error("Failed to close the connection", e);
        }
    }

    int getCurrentIndex() { return currentIndex; }

    void setCurrentIndex(int newValue) { currentIndex = newValue; }

    int getRunQueryDelay() { return runQueryDelay; }

    String getQuery() { return query; }

    String getConnectionURL() { return connectionURL; }

    private boolean isCustomQuerySet() { return (customQuery != null); }

    Context getContext() { return context; }

    public String getConnectionUserName() { return connectionUserName; }

    public String getConnectionPassword() { return connectionPassword; }

    String getDefaultCharsetResultSet() { return defaultCharsetResultSet; }
}
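
The offset handling in the query builder is the part that is easiest to get wrong, so here it is in isolation. `OffsetQuery.build` is a hypothetical helper written for this illustration, not part of the class above. It assumes the offset column is `id` and that a query with its own WHERE clause ends with the previous offset value.

```java
import java.util.Locale;

// Illustration only: builds "… where id>N" style queries from a base query.
class OffsetQuery {
    static String build(String baseSql, long currentIndex) {
        if (!baseSql.toLowerCase(Locale.ROOT).contains("where")) {
            // No WHERE clause yet: append one that uses id as the offset.
            return baseSql + " where id>" + currentIndex;
        }
        // Assumption: the query ends with the old offset, so swap the trailing digits.
        return baseSql.replaceFirst("\\d+\\s*$", "") + currentIndex;
    }

    public static void main(String[] args) {
        System.out.println(build("SELECT * FROM student", 4));
        System.out.println(build("SELECT * FROM student where id>9", 10));
    }
}
```

Replacing the trailing digits with a pattern, rather than cutting a fixed number of characters, keeps the query valid when the offset grows from one digit count to the next, for example from 9 to 10.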
  5. Define MysqlSource
package com.shockang.study.bigdata.flume;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.conf.ConfigurationException;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.nio.charset.Charset;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class MysqlSource extends AbstractSource implements Configurable, PollableSource {

    // Logger
    private static final Logger LOG = LoggerFactory.getLogger(MysqlSource.class);

    // Helper that talks to MySQL
    private QueryMysql sqlSourceHelper;

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }

    @Override
    public void configure(Context context) {
        // Initialize the helper; fail fast instead of leaving it null
        try {
            sqlSourceHelper = new QueryMysql(context);
        } catch (ParseException e) {
            throw new ConfigurationException("Failed to initialize QueryMysql", e);
        }
    }

    /**
     * Reads new rows from the MySQL table and hands them to the channel.
     *
     * @return READY after a normal poll, BACKOFF if the thread was interrupted
     * @throws EventDeliveryException if the events cannot be delivered
     */
    @Override
    public PollableSource.Status process() throws EventDeliveryException {
        try {
            // Query the table
            List<List<Object>> result = sqlSourceHelper.executeQuery();
            // Events to send
            List<Event> events = new ArrayList<>();
            // Event headers
            HashMap<String, String> header = new HashMap<>();
            // If rows came back, wrap each one in an event
            if (result != null && !result.isEmpty()) {
                List<String> allRows = sqlSourceHelper.getAllRows(result);
                Charset charset = Charset.forName(sqlSourceHelper.getDefaultCharsetResultSet());
                for (String row : allRows) {
                    Event event = new SimpleEvent();
                    event.setBody(row.getBytes(charset));
                    event.setHeaders(header);
                    events.add(event);
                }
                // Write the events to the channel
                this.getChannelProcessor().processEventBatch(events);
                // Update the offset stored in the database
                sqlSourceHelper.updateOffset2DB(result.size());
            }
            // Wait before the next poll
            Thread.sleep(sqlSourceHelper.getRunQueryDelay());
            return Status.READY;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the runner can shut down cleanly
            Thread.currentThread().interrupt();
            LOG.error("Error processing row", e);
            return Status.BACKOFF;
        }
    }

    @Override
    public synchronized void stop() {
        LOG.info("Stopping sql source {} ...", getName());
        try {
            // Release resources
            if (sqlSourceHelper != null) {
                sqlSourceHelper.close();
            }
        } finally {
            super.stop();
        }
    }
}
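
The step that turns a row into an event body deserves a closer look, because the byte encoding is easy to overlook. The sketch below is a variation for illustration, not the code above: `RowEncoding` is a made-up class, it joins columns without a trailing comma, and it always encodes as UTF-8 instead of relying on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustration only: converts one row of column values into event-body bytes.
class RowEncoding {
    // Joins the columns with commas; a null column becomes an empty field.
    static String toCsvLine(List<Object> row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object value = row.get(i);
            if (value != null) {
                sb.append(value);
            }
        }
        return sb.toString();
    }

    // Encodes the line with an explicit charset so every host produces the same bytes.
    static byte[] toBody(List<Object> row) {
        return toCsvLine(row).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(1, "zhangsan");
        System.out.println(toCsvLine(row));
        System.out.println(toBody(row).length + " bytes");
    }
}
```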
  6. Test

① Package the program as a jar and upload it to Flume's lib directory (the MySQL connector jar has to be on Flume's classpath as well)

② Prepare the configuration file

vim mysqlsource.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.shockang.study.bigdata.flume.MysqlSource
a1.sources.r1.connection.url = jdbc:mysql://node1:3306/mysqlsource
a1.sources.r1.connection.user = root
a1.sources.r1.connection.password = 123456
a1.sources.r1.table = student
a1.sources.r1.columns.to.select = *
a1.sources.r1.start.from = 0
a1.sources.r1.run.query.delay = 3000

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Describe the sink
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
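
Flume hands a source only its own settings, with the `a1.sources.r1.` prefix removed, which is why the helper class reads keys such as `run.query.delay` and `table`. The sketch below imitates that lookup with plain maps. `SourceSettings` and its methods are made up for illustration and are not Flume's `Context` class.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: prefix stripping and defaults, using plain maps.
class SourceSettings {
    // Keeps the entries that start with the prefix and drops the prefix from their keys.
    static Map<String, String> forSource(Map<String, String> all, String prefix) {
        Map<String, String> own = new HashMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                own.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return own;
    }

    // Returns the integer value for the key, or the default when the key is absent.
    static int getInt(Map<String, String> settings, String key, int defaultValue) {
        String value = settings.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> all = new HashMap<>();
        all.put("a1.sources.r1.table", "student");
        all.put("a1.sources.r1.run.query.delay", "3000");
        all.put("a1.channels.c1.type", "memory");

        Map<String, String> own = forSource(all, "a1.sources.r1.");
        System.out.println(own.get("table"));
        System.out.println(getInt(own, "run.query.delay", 10000)); // set in the file
        System.out.println(getInt(own, "start.from", 0));          // falls back to the default
    }
}
```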

③ Start the Flume agent

flume-ng agent -n a1 -c /opt/bigdata/flume/myconf -f /opt/bigdata/flume/myconf/mysqlsource.conf -Dflume.root.logger=info,console

④ Finally, insert rows into the table and watch the console output

Reposted from: http://ikgji.baihongyu.com/
