diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..982eced --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +.idea/* +fire-parent.iml +*.iml +target/ +*.log \ No newline at end of file diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..261eeb9 --- /dev/null +++ b/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." 
+ + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. 
We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md new file mode 100644 index 0000000..4dba045 --- /dev/null +++ b/README.md @@ -0,0 +1,209 @@ +# Fire框架 +Fire框架是由**中通**开源的,专门用于大数据**实时计算**的开发框架。Fire框架具有易学易用,稳定可靠等诸多优点,基于Fire框架可以很简单的进行**Spark&Flink**需求开发。Fire框架在赋能开发者的同时,也对实时平台进行了赋能,正因为有了Fire,才真正的连接了**平台**与**任务**,消除了任务孤岛。 + +![](docs/img/Fire.png) + +## 一、现状 +基于Fire框架的任务在中通每天处理的数据量高达**几千亿以上**,覆盖了**Spark计算**(离线&实时)、**Flink计算**等众多计算场景。 +## 二、赋能开发者 +Fire框架自研发之日起就以简单高效、稳定可靠为目标。通过屏蔽技术细节、提供简洁通用的API的方式,将开发者从技术的大海中拯救出来,让开发者更专注于业务代码开发。Fire框架支持Spark与Flink两大引擎,并且覆盖离线计算与实时计算两大场景,内部提供了丰富的API,许多复杂操作仅需一行代码,大大提升了生产力。 + +接下来以HBase、JDBC、Kafka为例进行简单介绍(connector连接信息均在任务同名的配置文件中): + +### 2.1 HBase 操作 + +```scala +val hTableName = "t_student" +/** HBase get API **/ +// 构建rowKeyRDD +val rowKeyRDD: RDD[String] = this.fire.createRDD(getList, 3) +// 方式一:通过fire变量,批量get到的结果以dataframe形式返回,也支持返回RDD[JavaBean]或Dataset类型 +val studentDF: DataFrame = this.fire.hbaseGetDF(hTableName, classOf[Student], getRDD) +// 方式二:通过RDD[String]对象直接get +val studentDS: Dataset[Student] = rowKeyRDD.hbaseGetDS(hTableName, classOf[Student]) +// 方式三:通过bulk api进行HBase读操作 +val studentDS: Dataset[Student] = rowKeyRdd.hbaseBulkGetDS(hTableName, classOf[Student]) + +/** HBase insert API **/ +val studentDF: DataFrame = this.fire.createDataFrame(studentList, classOf[Student]) +// 方式一:通过fire变量,将指定DataFrame数据插入到HBase中 +this.fire.hbasePutDF(hTableName, studentDF, classOf[Student]) +// 方式二:直接在DataFrame上调用hbasePutDF方法进行HBase写操作 +studentDF.hbasePutDF(hTableName, classOf[Student]) +// 方式三:通过bulk api进行HBase写操作 +studentDF.hbaseBulkPutDF(hTableName, classOf[Student]) + +/** 多HBase集群操作 **/ +studentDF.hbaseBulkPutDF(hTableName, classOf[Student], keyNum = 2) +``` + +### 2.2 JDBC 操作 + +```scala +/** 关系型数据库更新API **/ +// 关系型数据库SQL语句 +val insertSql = s"INSERT INTO table_name(name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" +// 将DataFrame中指定几列插入到关系型数据库中,每100条一插入 +df.jdbcBatchUpdate(insertSql, Seq("name", "age", "createTime", "length", "sex"), batch = 100) + +/** 关系型数据库查询API **/ +val querySql = s"select * from $tableName where id in (?, ?, ?)" +// 将查询结果通过反射映射到DataFrame中 +val df: DataFrame = this.fire.jdbcQueryDF(querySql, Seq(1, 2, 3), classOf[Student]) + +/** 多关系型数据库操作 **/ +val df = this.fire.jdbcQueryDF(querySql, Seq(1, 2, 3), classOf[Student], keyNum=2) +``` +###2.3 Kafka 操作 + +````scala +// 从指定kafka集群消费,该写法支持spark与flink,kafka相关信息在类同名的配置文件中 +val dstream = this.fire.createKafkaDirectStream() +// 通过keyNum指定多kafka集群消费 +val dstream = this.fire.createKafkaDirectStream(keyNum = 2) +```` + +可以看到,Fire框架中的API是以DataFrame、RDD为基础进行了高度抽象,通过引入fire隐式转换,让RDD、DataFrame等对象直接具有了某些能力,进而实现直接调用。目前Fire框架已经覆盖了主流大数据组件的API,基本上都是一行代码搞定。同时,Fire框架让任务具有**多集群的操作能力**,仅需在各API中指定参数**keyNum**,即可同时访问不同集群的不同表。 + +## 三、赋能平台 
+Fire框架可以将**实时任务**与**实时管理平台**进行绑定,实现很多酷炫又实用的功能。比如配置管理、SQL在线调试、任务热重启、配置热更新等,甚至可以直接获取到任务的运行时数据,实现更细粒度的监控管理。 + +### 3.1 配置管理 + +类似于携程开源的apollo,实时任务管理平台可提供任务配置的管理功能,基于Fire的实时任务在启动时会主动拉取配置信息,并覆盖任务jar包中的配置文件,避免重复打包发布,节约时间。 + +### 3.2 SQL在线调试 + +基于该技术,可以在实时任务管理平台中提交SQL语句,交由指定的Spark Streaming任务执行,并将结果返回,该功能的好处是支持Spark内存临时表,便于在web端进行Spark SQL的调试,大幅节省SQL开发时间。 + +### 3.3 定时任务 + +有些实时任务会有定时刷新维表的需求,Fire框架支持这样的功能,类似于Spring的@Scheduled,但Fire框架的定时任务功能更强大,甚至支持指定在driver端运行还是在executor端运行。 + +```scala +/** + * 声明了@Scheduled注解的方法将作为定时任务方法,会被Fire框架周期性调用 + * + * @param cron cron表达式 + * @param scope 默认同时在driver端和executor端执行,如果指定了driver,则只在driver端定时执行 + * @param concurrent 上一个周期定时任务未执行完成时是否允许下一个周期任务开始执行 + * @param startAt 用于指定第一次开始执行的时间 + * @param initialDelay 延迟多长时间开始执行第一次定时任务 + */ +@Scheduled(cron = "0/5 * * * * ?", scope = "driver", concurrent = false, startAt = "2021-01-21 11:30:00", initialDelay = 60000) +def loadTable: Unit = { + this.logger.info("周期性执行") +} +``` + +### 3.4 任务热重启 + +该功能是主要用于Spark Streaming任务,通过热重启技术,可以在不重启Spark Streaming的前提下,实现批次时间的热修改。比如在web端将某个任务的批次时间调整为10s,会立即生效。 + +### 3.5 配置热更新 + +用户仅需在web页面中更新指定的配置信息,就可以让实时任务接收到最新的配置并且立即生效。最典型的应用场景是进行Spark任务的某个算子partition数调整,比如当任务处理的数据量较大时,可以通过该功能将repartition的具体分区数调大,会立即生效。 + +### 3.6 实时血缘 + +基于Fire框架,可获取到任务所使用到的组件信息,包括任务使用到的hive表信息、hbase集群信息、jdbc信息等,可用于实时平台进行实时血缘的构建。 + +### 3.7 RocketMQ支持 + +Fire框架内部集成了rocketmq,甚至率先支持了flink sql任务的sql connector。 + +## 四、程序结构 +​ Fire支持Spark与Flink两大热门计算引擎,对常用的初始化操作进行了大幅度的简化,让业务代码更紧凑更突出更具维护性。 + +###4.1 Spark开发 + +```scala +import com.zto.fire._ +import com.zto.fire.spark.BaseSparkStreaming + +/** + * 基于Fire进行Spark Streaming开发 + */ +object Test extends BaseSparkStreaming { + + /** + * process会被fire框架主动调用 + * 在该方法中编写主要的业务代码,避免main方法过于臃肿 + */ + override def process: Unit = { + // 从配置文件中获取kafka集群信息,并创建KafkaDataStram + val dstream = this.fire.createKafkaDirectStream() + dstream.print + // 提交streaming任务执行 + this.fire.start + } + + def main(args: Array[String]): Unit = { + // 从配置文件中获取必要的配置信息,并初始化SparkSession、StreamingContext等对象 + this.init(10, false) + } +} +``` + +###4.2 Flink开发 +```scala +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming + +/** + * Flink流式计算任务模板 + */ +object Test extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + dstream.print + // 提交flink streaming任务,job名称不指定默认当前类名 + this.fire.start + } + + def main(args: Array[String]): Unit = { + // 根据配置信息自动创建fire变量、StreamExecutionEnvironment、StreamTableEnvironment等 + this.init() + } +} +``` + +## 五、操作手册 + +### [5.1 依赖管理](docs/dependency.md) + +### [5.2 第三方包install](docs/dependency-install.md) + +### [5.3 Fire集成](docs/outline.md) + +### [5.4 配置文件](docs/config.md) + +### [5.5 消费Kafka](/docs/kafka.md) + +### [5.6 消费RocketMQ](docs/rocketmq.md) + +### [5.7 集成Hive](docs/hive.md) + +### [5.8 HBase API手册](docs/hbase.md) + +### [5.9 JDBC API手册](docs/jdbc.md) + +### [5.10 累加器](docs/accumulator.md) + +### [5.11 定时任务](docs/schedule.md) + +### [5.12 线程池与并发计算](docs/threadpool.md) + +### [5.13 Spark DataSource增强](docs/datasource.md) + +## 六、平台建设 + +### [6.1 实时平台集成方案](docs/platform.md) + +### [6.2 内置接口](docs/restful.md) + +## 七、配置与调优 + +### [7.1 Fire配置手册](docs/properties.md) + diff --git a/docs/accumulator.md b/docs/accumulator.md new file mode 100644 index 0000000..c7dfbd3 --- /dev/null +++ b/docs/accumulator.md @@ -0,0 +1,98 @@ + + +# 累加器 + +Fire框架针对spark和flink的累加器进行了深度的定制,该api具有不需要事先声明累加器变量,可到处使用等优点。[示例代码](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/acc/FireAccTest.scala) 
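+
+在进入具体API之前,先给出一个将累加器嵌入完整任务的最小示意(示意代码:假设任务继承自BaseSparkStreaming,this.acc由fire父类直接提供,对象名AccDemo仅为演示,具体包路径与更多用法以下文及示例工程为准):
+
+```scala
+import com.zto.fire._
+import com.zto.fire.spark.BaseSparkStreaming
+
+/**
+ * 累加器最小使用示意:无需事先声明累加器变量,直接通过this.acc调用
+ */
+object AccDemo extends BaseSparkStreaming {
+
+  override def process: Unit = {
+    // kafka集群信息配置在与类同名的配置文件(AccDemo.properties)中
+    val dstream = this.fire.createKafkaDirectStream()
+    dstream.foreachRDD(rdd => {
+      rdd.foreachPartition(_ => {
+        // 单值累加器:全局累加,无需事先声明
+        this.acc.addCounter(1)
+        // 多值累加器:以key区分不同的统计指标
+        this.acc.addMultiCounter("partitions", 1)
+      })
+    })
+    // 提交streaming任务执行
+    this.fire.start
+  }
+
+  def main(args: Array[String]): Unit = {
+    // 初始化SparkSession、StreamingContext等对象
+    this.init(10, false)
+  }
+}
+```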
+ +### 一、累加器的基本使用 + +```scala +// 消息接入 +val dstream = this.fire.createKafkaDirectStream() +dstream.foreachRDD(rdd => { + rdd.coalesce(this.conf.getInt(key, 10)).foreachPartition(t => { + // 单值累加器 + this.acc.addCounter(1) + // 多值累加器,根据key的不同分别进行数据的累加,以下两行代码表示分别对multiCounter + // 和partitions这两个累加器进行累加 + this.acc.addMultiCounter("multiCounter", 1) + this.acc.addMultiCounter("partitions", 1) + // 多时间维度累加器,比多值累加器多了一个时间维度, + // 如:hbaseWriter 2019-09-10 11:00:00 10 + // 如:hbaseWriter 2019-09-10 11:01:00 21 + this.acc.addMultiTimer("multiTimer", 1) + }) +}) +``` + +### 二、累加器类型 + +1. 单值累加器 + + 单值累加器的特点是:只会将数据累加到同一个累加器中,全局唯一。 + +2. 多值累加器 + + 多值累加器的特点是:不同累加器实例使用不同的字符串key作为区分,相同的key的进行统一的累加,比单值累加器更强大。 + +3. 时间维度累加器 + + 时间维度累加器是在多值累加器的基础上进行了进一步的增强,引入了时间维度的概念。它以时间和累加器标识作为联合的累加器key。比如key为hbase_sink,那么统计的数据默认是按分钟进行,下一分钟是一个全新的累加窗口。时间维度累加器可以通过参数修改时间戳的格式,比如按分钟、小时、天、月、年等。 + + ```scala + // 多时间维度累加器,比多值累加器多了一个时间维度, + // 如:hbaseWriter 2019-09-10 11:00:00 10 + // 如:hbaseWriter 2019-09-10 11:01:00 21 + this.acc.addMultiTimer("multiTimer", 1) + // 指定时间戳,以小时作为统计窗口进行累加 + this.acc.addMultiTimer("multiTimer", 1, schema = "YYYY-MM-dd HH") + ``` + +### 三、累加器值的获取 + + 1. 程序中获取 + + ```scala + /** + * 获取累加器中的值 + */ + @Scheduled(fixedInterval = 60 * 1000) + def printAcc: Unit = { + this.acc.getMultiTimer.cellSet().foreach(t => println(s"key:" + t.getRowKey + " 时间:" + t.getColumnKey + " " + t.getValue + "条")) + println("单值:" + this.acc.getCounter) + this.acc.getMultiCounter.foreach(t => { + println("多值:key=" + t._1 + " value=" + t._2) + }) + val size = this.acc.getMultiTimer.cellSet().size() + println(s"===multiTimer.size=${size}==log.size=${this.acc.getLog.size()}===") + } + ``` + + + + 2. 平台接口获取 + + fire框架针对累加器的获取提供了单独的接口,平台可以通过接口调用方式实时获取累加器的最新统计结果。 + + | 接口地址 | 接口用途 | + | -------------------- | -------------------------------- | + | /system/counter | 用于获取累加器的值。 | + | /system/multiCounter | 用于获取多值累加器的值。 | + | /system/multiTimer | 用于获取时间维度多值累加器的值。 | \ No newline at end of file diff --git a/docs/config.md b/docs/config.md new file mode 100644 index 0000000..022ec93 --- /dev/null +++ b/docs/config.md @@ -0,0 +1,97 @@ + + +# 配置管理 + +支持灵活的配置是fire框架一大亮点,fire框架针对不同的引擎,提供了不同的配置文件,并且支持通过调用外部接口实现配置的动态覆盖。 + +### 1. 系统配置 + +fire框架内置了多个配置文件,用于应对多种引擎场景,分别是: + +1)**fire.properties**:该配置文件中fire框架的总配置文件,位于fire-core包中,其中的配置主要是针对fire框架的,不含有spark或flink引擎的配置。 + +2)**cluster.properties:**该配置文件用于存放各公司集群地址相关的映射信息,由于集群地址信息比较敏感,因此单独拿出来作为一个配置文件。 + +3)**spark.properties**:该配置文件是spark引擎的总配置文件,位于fire-spark包中,作为spark引擎任务的总配置文件。 + +4)**spark-core.properties**:该配置文件位于fire-spark包中,该配置文件用于配置spark core任务。 + +5)**spark-streaming.properties**:该配置文件位于fire-spark包中,主要用于spark streaming任务。 + +6)**structured-streaming.properties**:该配置文件位于fire-spark包中,用于进行structured streaming任务的配置。 + +7)**flink.properties**:该配置文件位于fire-flink包中,作为flink引擎的总配置文件。 + +8)**flink-streaming.properties**:该配置文件位于fire-flink包中,用于配置flink streaming任务。 + +9)**flink-batch.properties**:该配置文件位于fire-flink包中,用于配置flink批处理任务。 + +以上配置文件是fire框架内置的,用户无需关心。 + +### 2. 用户配置 + +#### 1)公共配置: + +fire框架支持用户的公共配置,默认名称是**common.properties**。在公共配置文件中配置的内容将对所有任务生效。因此,可以考虑将一些公共配置项放到该文件中。 + +#### 2)任务配置: + +用户配置需存放到代码的**src/main/resources**目录下,支持子目录存放。配置文件与任务通过名称进行自动绑定,一个任务一个配置文件。如果任务类名为:**Test**.scala,则对应的配置文件名称是:**Test**.properties。用户配置暂不支持一个任务多个配置文件。 + +### 3. 配置中心 + +为了提供灵活性,避免因配置修改而重新打包,fire框架提供了从接口获取配置信息,并覆盖用户同名配置的功能。该功能通常需要实时平台提供配置接口,用于重启覆盖参数。 + +### 4. 动态配置 + +动态配置是指可以在运行时动态获取及生效的配置,fire框架提供了相应的接口,平台通过调用该接口即可实现配置的热替换已经分布式分发。该功能目前仅支持spark引擎。 + +### 5. 
优先级 + +fire.properties **<** cluster.properties **<** spark.properties|flink.properties **<** spark-core.properties|spark-streaming.properties|structured-streaming.properties|flink-streaming.properties|flink-batch.properties **<** common.properties **<** 用户配置文件 **<** 配置中心 + +### 6. 任务调优 + +fire支持所有的spark与flink的参数,只需将相应的spark或flink引擎的参数配置到任务同名的配置文件或配置中心中即可通过重启生效。甚至可以**实现任务级别的覆盖flink-conf.yaml**中的配置。 + +### 7. 分布式传递与获取 + +配置文件中的所有配置,会被fire框架分布式的分发给所有的spark executor和flink的task manager。用户无论是进行spark还是flink任务开发,都可以通过以下代码获取到配置信息: + +```scala +// 获取布尔类型配置 +this.conf.getBoolean("xxx", default = false) +// 获取Int类型配置 +this.conf.getInt("xxx", default = -1) +``` + +通过this.conf.getXxx方式,可避免在flink的map等算子中通过open方法获取,同时在JobManager端、TaskManager端都能获取到。fire框架保证了配置的分布式传递与配置的一致性。 + +如果在其他需要非fire子类中获取配置,可通过:PropUtils.getXxx方式获取配置信息。 + +### 8. 配置热更新 + +什么是配置热更新?配置热更新是指集成了fire框架的任务允许在运行时动态修改某些配置信息。比如说spark streaming任务运行中可能需要根据数据量的大小去人为的调优rdd的分区数,那么这种场景下就可以通过热更新做到:**rdd.repartition(this.conf.getInt("user.conf.key", 100))**。当平台通过调用fire内置接口**/system/setConf**传入最新的user.conf.key值时,即可完成动态的配置更新。其中user.conf.key是由用户任意定义的合法字符串,用户自己定义就可以。 + +```scala +// 平台调用fire的/system/setConf接口传入user.conf.key对应的新则,则可达到动态的调整分区数的目的 +rdd.coalesce(this.conf.getInt("user.conf.key", 10)).foreachPartition(t => {/* do something */} +``` + diff --git a/docs/datasource.md b/docs/datasource.md new file mode 100644 index 0000000..12ce781 --- /dev/null +++ b/docs/datasource.md @@ -0,0 +1,91 @@ + + +# Spark DataSource增强 + +Spark DataSource API很强大,为了进一步增强灵活性,Fire框架针对DataSource API做了进一步封装,允许将options等信息放到配置文件中,提高灵活性,如果与实时平台的配置中心集成,可做到重启即完成调优。 + +[示例程序:](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/datasource/DataSourceTest.scala) + +```scala +val ds = this.fire.createDataFrame(Student.newStudentList(), classOf[Student]) +ds.createOrReplaceTempView("test") + +val dataFrame = this.fire.sql("select * from test") + +// 一、 dataFrame.write.format.mode.save中的所有参数均可通过配置文件指定 +// dataFrame.writeEnhance() + +// 二、 dataFrame.write.mode.save中部分参数通过配置文件指定,或全部通过方法硬编码指定 +val savePath = "/user/hive/warehouse/hudi.db/hudi_bill_event_test" + +// 如果代码中与配置文件中均指定了option,则相同的options配置文件优先级更高,不同的option均生效 +val options = Map( + "hoodie.datasource.write.recordkey.field" -> "id", + "hoodie.datasource.write.precombine.field" -> "id" +) + +// 使用keyNum标识读取配置文件中不同配置后缀的options信息 +// dataFrame.writeEnhance("org.apache.hudi", SaveMode.Append, savePath, options = options, keyNum = 2) + +// read.format.mode.load(path) +this.fire.readEnhance(keyNum = 3) +``` + +配置文件: + +```properties +# 一、hudi datasource,全部基于配置文件进行配置 +spark.datasource.format=org.apache.hudi +spark.datasource.saveMode=Append +# 用于区分调用save(path)还是saveAsTable +spark.datasource.isSaveTable=false +# 传入到底层save或saveAsTable方法中 +spark.datasource.saveParam=/user/hive/warehouse/hudi.db/hudi_bill_event_test + +# 以spark.datasource.options.为前缀的配置用于配置hudi相关的参数,可覆盖代码中同名的配置 +spark.datasource.options.hoodie.datasource.write.recordkey.field=id +spark.datasource.options.hoodie.datasource.write.precombine.field=id +spark.datasource.options.hoodie.datasource.write.partitionpath.field=ds +spark.datasource.options.hoodie.table.name=hudi.hudi_bill_event_test +spark.datasource.options.hoodie.datasource.write.hive_style_partitioning=true +spark.datasource.options.hoodie.datasource.write.table.type=MERGE_ON_READ +spark.datasource.options.hoodie.insert.shuffle.parallelism=128 +spark.datasource.options.hoodie.upsert.shuffle.parallelism=128 +spark.datasource.options.hoodie.fail.on.timeline.archiving=false 
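+# 以下以hoodie.clustering.开头的options为hudi clustering相关配置,同样以spark.datasource.options.为前缀透传给hudi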
+spark.datasource.options.hoodie.clustering.inline=true +spark.datasource.options.hoodie.clustering.inline.max.commits=8 +spark.datasource.options.hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824 +spark.datasource.options.hoodie.clustering.plan.strategy.small.file.limit=629145600 +spark.datasource.options.hoodie.clustering.plan.strategy.daybased.lookback.partitions=2 + +# 二、配置第二个数据源,以数字后缀作为区分,部分使用配置文件进行配置 +spark.datasource.format2=org.apache.hudi2 +spark.datasource.saveMode2=Overwrite +# 用于区分调用save(path)还是saveAsTable +spark.datasource.isSaveTable2=false +# 传入到底层save或saveAsTable方法中 +spark.datasource.saveParam2=/user/hive/warehouse/hudi.db/hudi_bill_event_test2 + +# 三、配置第三个数据源,用于代码中进行read操作 +spark.datasource.format3=org.apache.hudi3 +spark.datasource.loadParam3=/user/hive/warehouse/hudi.db/hudi_bill_event_test3 +spark.datasource.options.hoodie.datasource.write.recordkey.field3=id3 +``` + diff --git a/docs/dependency-install.md b/docs/dependency-install.md new file mode 100644 index 0000000..3b32703 --- /dev/null +++ b/docs/dependency-install.md @@ -0,0 +1,75 @@ + + +# 第三方库jar包install + +fire框架引用了众多第三方包,由于很多库已经很久没有更新了,甚至在maven的全球中央仓库中也没有,因此,为了使用方便,特地将基于scala2.12+spark3.0.2编译的jar包提供出来,放在fire框架的/docs/lib目录下。用户可以通过maven的命令将这些库install到本地或deploy到公司的私服。 + +## 一、本地maven install + +### 1.1 hbase-spark包 + +```shell +mvn install:install-file -Dfile=/path/to/hbase-spark3_2.12-1.2.0-cdh5.12.1.jar -DgroupId=org.apache.hbase -DartifactId=hbase-spark3_2.12 -Dversion=1.2.0-cdh5.12.1 -Dpackaging=jar +``` + +### 1.2 hbase-client包 + +```shell +mvn install:install-file -Dfile=/path/to/hbase-client_2.12-1.2.0-cdh5.12.1.jar -DgroupId=org.apache.hbase -DartifactId=hbase-client_2.12 -Dversion=1.2.0-cdh5.12.1 -Dpackaging=jar +``` + +### 1.3 rocketmq-spark包 + +```shell +mvn install:install-file -Dfile=/path/to/rocketmq-spark3_2.12.jar -DgroupId=org.apache.rocketmq -DartifactId=rocketmq-spark3_2.12 -Dversion=0.0.2 -Dpackaging=jar +``` + +### 1.4 rocketmq-flink包 + +```shell +mvn install:install-file -Dfile=/path/to/rocketmq-flink_1.12_2.12.jar -DgroupId=org.apache.rocketmq -DartifactId=rocketmq-flink_1.12_2.12 -Dversion=0.0.2 -Dpackaging=jar +``` + +### 1.5 kudu-spark包 + +```shell +mvn install:install-file -Dfile=/path/to/kudu-spark3_2.12-1.4.0.jar -DgroupId=org.apache.kudu -DartifactId=kudu-spark3_2.12 -Dversion=1.4.0 -Dpackaging=jar +``` + +## 二、deploy到私服 + +以下命令以hbase-client包为例,将该包推送到公司自己的私服中。推送私服,首先需要在settings.xml中配置私服账号、密码等信息,可自行查阅资料。 + +```shell +mvn deploy:deploy-file -Dfile=/path/to/hbase-client_2.12-1.2.0-cdh5.12.1.jar -DgroupId=org.apache.hbase -DartifactId=hbase-client_2.12 -Dversion=1.2.0-cdh5.12.1 -Dpackaging=jar -DrepositoryId=releases -Durl=http://ip:port/nexus/content/repositories/releases/ +``` + +## 三、自行编译 + +为了满足差异化、不同版本的编译需求,用户可以到github上找到相应库官方源码(需要进行定制化开发,比如适配scala2.12以及spark或flink版本差异带来的编译错误),进行编译,github地址如下: + +[hbase-spark库源码地址](https://github.com/cloudera/hbase/tree/cdh5-1.2.0_5.12.2/hbase-spark) + +[kudu-spark库源码地址](https://github.com/apache/kudu/tree/master/java/kudu-spark) + +[rocketmq-spark库源码地址](https://github.com/apache/rocketmq-externals/tree/master/rocketmq-spark) + +[rocketmq-flink库源码地址](https://github.com/apache/rocketmq-externals/tree/master/rocketmq-flink) + diff --git a/docs/dependency.md b/docs/dependency.md new file mode 100644 index 0000000..43daa83 --- /dev/null +++ b/docs/dependency.md @@ -0,0 +1,61 @@ + + +### 依赖管理 + +```xml + + com.zto.fire + fire-common_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-core_${scala.binary.version} + 
${project.version} + + + com.zto.fire + fire-jdbc_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-hbase_${scala.binary.version} + ${project.version} + + + + com.zto.fire + fire-spark_${spark.reference} + ${project.version} + + + + com.zto.fire + fire-flink_${flink.reference} + ${project.version} + +``` + +### 示例pom.xml + +- [spark项目](pom/spark-pom.xml) +- [flink项目](pom/flink-pom.xml) + diff --git a/docs/hbase.md b/docs/hbase.md new file mode 100644 index 0000000..cf38dcb --- /dev/null +++ b/docs/hbase.md @@ -0,0 +1,359 @@ + + +# HBase 读写 + +HBase对更新和点查具有很好的支持,在实时计算场景下也是应用十分广泛的。为了进一步简化HBase读写api,提高开发效率,fire框架对HBase API进行了深度的封装。目前支持3种读写模式,分别是:Java API、Bulk API以及Spark提供的API。另外,fire框架支持在同一个任务中对任意多个hbase集群进行读写。 + +### 一、配置文件 + +```properties +# 方式一:直接指定zkurl +hbase.cluster=zkurl +# 方式二:事先定义好hbase别名与url的映射,然后通过别名配置,以下配置定义了别名test与url的映射关系 +fire.hbase.cluster.map.test=localhost01:2181,localhost02:2181,localhost03:2181 +# 通过别名方式引用 +hbase.cluster=test +``` + +### 二、表与JavaBean映射 + +fire框架通过Javabean与HBase表建立的关系简化读写api: + +```java +/** + * 对应HBase表的JavaBean + * + * @author ChengLong 2019-6-20 16:06:16 + */ +@HConfig(multiVersion = true) +public class Student extends HBaseBaseBean { + private Long id; + private String name; + private Integer age; + // 多列族情况下需使用family单独指定 + private String createTime; + // 若JavaBean的字段名称与HBase中的字段名称不一致,需使用value单独指定 + // 此时hbase中的列名为length1,而不是length + @FieldName(family = "data", value = "length1") + private BigDecimal length; + private Boolean sex; + + /** + * rowkey的构建 + * + * @return + */ + @Override + public Student buildRowKey() { + this.rowKey = this.id.toString(); + return this; + } +} + +``` + +上述代码中定义了名为Student的Javabean,该Javabean需要继承自HBaseBaseBean,并实现buildRowKey方法,这个方法中需要告诉fire框架,rowKey是如何构建的。 + +通过以上两步即可实现Javabean与HBase表的关系绑定。对于个性化需求,如果需要以多版本的方式进行读写,则需在类名上添加@HConfig(multiVersion = true)注解。如果Javabean中的列名与HBase中的字段名不一致,可以通过@FieldName(family = "data", value = "length1")进行单独指定,当然,列族也可以通过这个注解指定。如果不知道列族名称,则默认只有一个名为info的列族。 + +目前暂不支持scala语言的class以及case class,仅支持基本的字段数据类型,不支持嵌套的或者复杂的字段类型。 + +### 三、spark任务 + +#### [1.1 java api](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseConnectorTest.scala) + +```scala +/** + * 使用HBaseConnector插入一个rdd的数据 + * rdd的类型必须为HBaseBaseBean的子类 + */ +def testHbasePutRDD: Unit = { + val studentList = Student.newStudentList() + val studentRDD = this.fire.createRDD(studentList, 2) + // 为空的字段不插入 + studentRDD.hbasePutRDD(this.tableName1) +} + +/** + * 使用HBaseConnector插入一个DataFrame的数据 + */ +def testHBasePutDF: Unit = { + val studentList = Student.newStudentList() + val studentDF = this.fire.createDataFrame(studentList, classOf[Student]) + // 每个批次插100条 + studentDF.hbasePutDF(this.tableName1, classOf[Student]) +} + +/** + * 使用HBaseConnector get数据,并将结果以RDD方式返回 + */ +def testHbaseGetRDD: Unit = { + val getList = Seq("1", "2", "3", "5", "6") + val getRDD = this.fire.createRDD(getList, 2) + // 以多版本方式get,并将结果集封装到rdd中返回 + val studentRDD = this.fire.hbaseGetRDD(this.tableName1, classOf[Student], getRDD) + studentRDD.printEachPartition +} + +/** + * 使用HBaseConnector get数据,并将结果以DataFrame方式返回 + */ +def testHbaseGetDF: Unit = { + val getList = Seq("1", "2", "3", "4", "5", "6") + val getRDD = this.fire.createRDD(getList, 3) + // get到的结果以dataframe形式返回 + val studentDF = this.fire.hbaseGetDF(this.tableName1, classOf[Student], getRDD) + studentDF.show(100, false) +} + +/** + * 使用HBaseConnector scan数据,并以RDD方式返回 + */ +def testHbaseScanRDD: Unit = { + val rdd = this.fire.hbaseScanRDD2(this.tableName1, 
classOf[Student], "1", "6") + rdd.repartition(3).printEachPartition +} + +/** + * 使用HBaseConnector scan数据,并以DataFrame方式返回 + */ +def testHbaseScanDF: Unit = { + val dataFrame = this.fire.hbaseScanDF2(this.tableName1, classOf[Student], "1", "6") + dataFrame.repartition(3).show(100, false) +} +``` + +#### [1.2 bulk api](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HbaseBulkTest.scala) + +```scala +/** + * 使用bulk的方式将rdd写入到hbase + */ +def testHbaseBulkPutRDD: Unit = { + // 方式一:将rdd的数据写入到hbase中,rdd类型必须为HBaseBaseBean的子类 + val rdd = this.fire.createRDD(Student.newStudentList(), 2) + // rdd.hbaseBulkPutRDD(this.tableName2) + // 方式二:使用this.fire.hbaseBulkPut将rdd中的数据写入到hbase + this.fire.hbaseBulkPutRDD(this.tableName2, rdd) + + // 第二个参数指定false表示不插入为null的字段到hbase中 + // rdd.hbaseBulkPutRDD(this.tableName2, insertEmpty = false) + // 第三个参数为true表示以多版本json格式写入 + // rdd.hbaseBulkPutRDD(this.tableName3, false, true) +} + +/** + * 使用bulk的方式将DataFrame写入到hbase + */ +def testHbaseBulkPutDF: Unit = { + // 方式一:将DataFrame的数据写入到hbase中 + val rdd = this.fire.createRDD(Student.newStudentList(), 2) + val studentDF = this.fire.createDataFrame(rdd, classOf[Student]) + // insertEmpty=false表示为空的字段不插入 + studentDF.hbaseBulkPutDF(this.tableName1, classOf[Student], keyNum = 2) + // 方式二: + // this.fire.hbaseBulkPutDF(this.tableName2, studentDF, classOf[Student]) +} + +/** + * 使用bulk方式根据rowKey获取数据,并将结果集以RDD形式返回 + */ +def testHBaseBulkGetRDD: Unit = { + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 3.toString, 5.toString, 6.toString), 2) + val studentRDD = rowKeyRdd.hbaseBulkGetRDD(this.tableName1, classOf[Student], keyNum = 2) + studentRDD.foreach(println) + // 方式二:使用this.fire.hbaseBulkGetRDD + // val studentRDD2 = this.fire.hbaseBulkGetRDD(this.tableName2, rowKeyRdd, classOf[Student]) + // studentRDD2.foreach(println) +} + +/** + * 使用bulk方式根据rowKey获取数据,并将结果集以DataFrame形式返回 + */ +def testHBaseBulkGetDF: Unit = { + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 3.toString, 5.toString, 6.toString), 2) + val studentDF = rowKeyRdd.hbaseBulkGetDF(this.tableName2, classOf[Student]) + studentDF.show(100, false) + // 方式二:使用this.fire.hbaseBulkGetDF + val studentDF2 = this.fire.hbaseBulkGetDF(this.tableName2, rowKeyRdd, classOf[Student]) + studentDF2.show(100, false) +} + +/** + * 使用bulk方式进行scan,并将结果集映射为RDD + */ +def testHbaseBulkScanRDD: Unit = { + // scan操作,指定rowKey的起止或直接传入自己构建的scan对象实例,返回类型为RDD[Student] + val scanRDD = this.fire.hbaseBulkScanRDD2(this.tableName2, classOf[Student], "1", "6") + scanRDD.foreach(println) +} + +/** + * 使用bulk方式进行scan,并将结果集映射为DataFrame + */ +def testHbaseBulkScanDF: Unit = { + // scan操作,指定rowKey的起止或直接传入自己构建的scan对象实例,返回类型为DataFrame + val scanDF = this.fire.hbaseBulkScanDF2(this.tableName2, classOf[Student], "1", "6") + scanDF.show(100, false) +} +``` + + + +#### [1.3 spark api](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseHadoopTest.scala) + +```scala +/** + * 基于saveAsNewAPIHadoopDataset封装,将rdd数据保存到hbase中 + */ +def testHbaseHadoopPutRDD: Unit = { + val studentRDD = this.fire.createRDD(Student.newStudentList(), 2) + this.fire.hbaseHadoopPutRDD(this.tableName2, studentRDD, keyNum = 2) + // 方式二:直接基于rdd进行方法调用 + // studentRDD.hbaseHadoopPutRDD(this.tableName1) +} + +/** + * 基于saveAsNewAPIHadoopDataset封装,将DataFrame数据保存到hbase中 + */ +def testHbaseHadoopPutDF: Unit = { + val studentRDD = 
this.fire.createRDD(Student.newStudentList(), 2) + val studentDF = this.fire.createDataFrame(studentRDD, classOf[Student]) + // 由于DataFrame相较于Dataset和RDD是弱类型的数据集合,所以需要传递具体的类型classOf[Type] + this.fire.hbaseHadoopPutDF(this.tableName3, studentDF, classOf[Student]) + // 方式二:基于DataFrame进行方法调用 + // studentDF.hbaseHadoopPutDF(this.tableName3, classOf[Student]) +} + +/** + * 使用Spark的方式scan海量数据,并将结果集映射为RDD + */ +def testHBaseHadoopScanRDD: Unit = { + val studentRDD = this.fire.hbaseHadoopScanRDD2(this.tableName2, classOf[Student], "1", "6", keyNum = 2) + studentRDD.printEachPartition +} + +/** + * 使用Spark的方式scan海量数据,并将结果集映射为DataFrame + */ +def testHBaseHadoopScanDF: Unit = { + val studentDF = this.fire.hbaseHadoopScanDF2(this.tableName3, classOf[Student], "1", "6") + studentDF.show(100, false) +} +``` + +### 四、flink任务 + +[样例代码:](../fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/HBaseTest.scala) + +```scala +/** + * table的hbase sink + */ +def testTableHBaseSink(stream: DataStream[Student]): Unit = { + stream.createOrReplaceTempView("student") + val table = this.flink.sqlQuery("select id, name, age from student group by id, name, age") + // 方式一、自动将row转为对应的JavaBean + // 注意:table对象上调用hbase api,需要指定泛型 + table.hbasePutTable[Student](this.tableName).setParallelism(1) + this.fire.hbasePutTable[Student](table, this.tableName2, keyNum = 2) + + // 方式二、用户自定义取数规则,从row中创建HBaseBaseBean的子类 + table.hbasePutTable2[Student](this.tableName3)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) + // 或者 + this.fire.hbasePutTable2[Student](table, this.tableName5, keyNum = 2)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) +} + +/** + * table的hbase sink + */ +def testTableHBaseSink2(stream: DataStream[Student]): Unit = { + val table = this.fire.sqlQuery("select id, name, age from student group by id, name, age") + + // 方式二、用户自定义取数规则,从row中创建HBaseBaseBean的子类 + table.hbasePutTable2(this.tableName6)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) + // 或者 + this.flink.hbasePutTable2(table, this.tableName7, keyNum = 2)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) +} + +/** + * stream hbase sink + */ +def testStreamHBaseSink(stream: DataStream[Student]): Unit = { + // 方式一、DataStream中的数据类型为HBaseBaseBean的子类 + // stream.hbasePutDS(this.tableName) + this.fire.hbasePutDS[Student](stream, this.tableName8) + + // 方式二、将value组装为HBaseBaseBean的子类,逻辑用户自定义 + stream.hbasePutDS2(this.tableName9, keyNum = 2)(value => value) + // 或者 + this.fire.hbasePutDS2(stream, this.tableName10)(value => value) +} + +/** + * stream hbase sink + */ +def testStreamHBaseSink2(stream: DataStream[Student]): Unit = { + // 方式二、将value组装为HBaseBaseBean的子类,逻辑用户自定义 + stream.hbasePutDS2(this.tableName11)(value => value) + // 或者 + this.fire.hbasePutDS2(stream, this.tableName12, keyNum = 2)(value => value) +} + +/** + * hbase的基本操作 + */ +def testHBase: Unit = { + // get操作 + val getList = ListBuffer(HBaseConnector.buildGet("1")) + val student = HBaseConnector.get(this.tableName, classOf[Student], getList, 1) + if (student != null) println(JSONUtils.toJSONString(student)) + // scan操作 + val studentList = HBaseConnector.scan(this.tableName, classOf[Student], HBaseConnector.buildScan("0", "9"), 1) + if (studentList != null) println(JSONUtils.toJSONString(studentList)) + // delete操作 + HBaseConnector.deleteRows(this.tableName, Seq("1")) +} +``` + +### 五、多集群读写 + 
+fire框架支持同一个任务中对任意多个hbase集群进行读写,首先要在配置文件中以keyNum进行指定要连接的所有hbase集群的zk地址: + +```properties +hbase.cluster=localhost01:2181 +hbase.cluster3=localhost02:2181 +hbase.cluster8=localhost03:2181 +``` + +在代码中,通过keyNum参数告诉fire这行代码连接的hbase集群是哪个。注意:api中的keyNum要与配置中的数字对应上。 + +```scala +// insert 操作 +studentRDD.hbasePutRDD(this.tableName1) +studentRDD.hbasePutRDD(this.tableName2, keyNum = 3) +studentRDD.hbasePutRDD(this.tableName3, keyNum = 8) +// scan 操作 +this.fire.hbaseScanDF2(this.tableName1, classOf[Student], "1", "6") +this.fire.hbaseScanDF2(this.tableName1, classOf[Student], "1", "6", keyNum = 3) +``` + diff --git a/docs/hive.md b/docs/hive.md new file mode 100644 index 0000000..0a299d6 --- /dev/null +++ b/docs/hive.md @@ -0,0 +1,104 @@ + + +# hive集成与配置 + +fire框架针对hive提供了更友好的支持方式,程序中可以通过配置文件指定读写的hive集群信息。通过fire框架,可以屏蔽spark或flink任务连接hive的具体细节,只需要通过hive.cluster=xxx即可指定任务如何去连hive。 + +### 一、spark任务 + +1. 配置文件 + +```properties +# 方式一:直接指定hive的thrift server地址,多个以逗号分隔 +spark.hive.cluster=thrift://localhost:9083,thrift://localhost02:9083 + +# 方式二(推荐):如果已经通过fire.hive.cluster.map.xxx指定了别名,则可以直接使用别名 +# 公共信息特别是集群信息建议放到commons.properties中 +fire.hive.cluster.map.batch=thrift://localhost:9083,thrift://localhost02:9083 +# batch是上述url的hive别名,支持约定多个hive集群的别名 +spark.hive.cluster=batch +``` + +2. [示例代码](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/HiveClusterReader.scala) + +```scala +// 通过上述配置,代码中就可以直接通过以下方式连接指定的hive +this.fire.sql("select * from hive.tableName").show +``` + +3. NameNode HA + +有时,hadoop集群维护,可能会导致那些读写hive的spark streaming任务挂掉。为了提高灵活性,避免将core-site.xml与hdfs-site.xml放到工程的resources目录下,fire提供了配置的方式,将Name Node HA信息通过配置文件进行指定。每项配置中的batch对应fire.hive.cluster.map.batch所指定的别名:batch,其他信息根据集群不同进行单独配置。如果有多个hive集群,可以配置多套HA配置。 + +```properties +# 用于是否启用HDFS HA +spark.hdfs.ha.enable=true +# 离线hive集群的HDFS HA配置项,规则为统一的ha前缀:spark.hdfs.ha.conf.+hive.cluster名称+hdfs专门的ha配置 +spark.hdfs.ha.conf.batch.fs.defaultFS=hdfs://nameservice1 +spark.hdfs.ha.conf.batch.dfs.nameservices=nameservice1 +spark.hdfs.ha.conf.batch.dfs.ha.namenodes.nameservice1=namenode5231,namenode5229 +spark.hdfs.ha.conf.batch.dfs.namenode.rpc-address.nameservice1.namenode5231= 192.168.1.1:8020 +spark.hdfs.ha.conf.batch.dfs.namenode.rpc-address.nameservice1.namenode5229= 192.168.1.2:8020 +spark.hdfs.ha.conf.batch.dfs.client.failover.proxy.provider.nameservice1= org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider +``` + +4. hive参数设置 + +```properties +# 以spark.hive.conf.为前缀的配置将直接生效,比如开启hive动态分区 +# 原理是被直接执行:this.fire.sql("set hive.exec.dynamic.partition=true") +spark.hive.conf.hive.exec.dynamic.partition=true +spark.hive.conf.hive.exec.dynamic.partition.mode=nonstrict +spark.hive.conf.hive.exec.max.dynamic.partitions=5000 +``` + +### 二、flink任务 + +1. 配置文件 + +```properties +# 方式一:最简配置,指定hive-site.xml所在的目录的绝对路径 +flink.hive.cluster=/path/to/hive-site.xml_path/ + +# 方式二(推荐):通过flink.fire.hive.site.path.map.前缀指定别名 +flink.fire.hive.site.path.map.test=/tmp/hive/ +# 此处的test为hive-site.xml所在目录的别名,引用flink.fire.hive.site.path.map.test,建议放到commons.properties中统一约定 +flink.hive.cluster=test +``` + +2. [示例代码](../fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkHiveTest.scala) + +```scala +this.fire.sql("select name from hive.tableName group by name") +``` + +3. 
个性化配置 + +```properties +# flink所集成的hive版本号 +flink.hive.version = 1.1.0 +# 默认的hive数据库 +flink.default.database.name = tmp +# 默认的hive分区字段名称 +flink.default.table.partition.name = ds +# hive的catalog名称 +flink.hive.catalog.name = hive +``` + diff --git a/docs/img/Fire.png b/docs/img/Fire.png new file mode 100644 index 0000000..56ab75d Binary files /dev/null and b/docs/img/Fire.png differ diff --git a/docs/img/configuration.jpg b/docs/img/configuration.jpg new file mode 100644 index 0000000..90dee08 Binary files /dev/null and b/docs/img/configuration.jpg differ diff --git a/docs/jdbc.md b/docs/jdbc.md new file mode 100644 index 0000000..0c5e968 --- /dev/null +++ b/docs/jdbc.md @@ -0,0 +1,156 @@ + + +# JDBC读写 + +实时任务开发中,对jdbc读写的需求很高。为了简化jdbc开发步骤,fire框架对jdbc操作做了进一步封装,将许多常见操作简化成一行代码。另外,fire框架支持在同一个任务中对任意多个数据源进行读写。 + +### 一、数据源配置 + +数据源包括jdbc的url、driver、username与password等重要信息,建议将这些配置放到commons.properties中,避免每个任务单独配置。fire框架内置了c3p0数据库连接池,在分布式场景下,限制每个container默认最多3个connection,避免申请过多资源时申请太多的数据库连接。 + +```properties +db.jdbc.url = jdbc:derby:memory:fire;create=true +db.jdbc.driver = org.apache.derby.jdbc.EmbeddedDriver +db.jdbc.maxPoolSize = 3 +db.jdbc.user = fire +db.jdbc.password = fire + +# 如果需要多个数据源,则可在每项配置的结尾添加对应的keyNum作为区分 +db.jdbc.url2 = jdbc:mysql://localhost:3306/fire +db.jdbc.driver2 = com.mysql.jdbc.Driver +db.jdbc.user2 = fire +db.jdbc.password2 = fire +``` + +### 二、API使用 + +#### [2.1 spark任务](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcTest.scala) + +```scala +/** + * 使用jdbc方式对关系型数据库进行增删改操作 + */ +def testJdbcUpdate: Unit = { + val timestamp = DateFormatUtils.formatCurrentDateTime() + // 执行insert操作 + val insertSql = s"INSERT INTO $tableName (name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" + this.fire.jdbcUpdate(insertSql, Seq("admin", 12, timestamp, 10.0, 1)) + // 更新配置文件中指定的第二个关系型数据库 + this.fire.jdbcUpdate(insertSql, Seq("admin", 12, timestamp, 10.0, 1), keyNum = 2) + + // 执行更新操作 + val updateSql = s"UPDATE $tableName SET name=? WHERE id=?" + this.fire.jdbcUpdate(updateSql, Seq("root", 1)) + + // 执行批量操作 + val batchSql = s"INSERT INTO $tableName (name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" + + this.fire.jdbcBatchUpdate(batchSql, Seq(Seq("spark1", 21, timestamp, 100.123, 1), + Seq("flink2", 22, timestamp, 12.236, 0), + Seq("flink3", 22, timestamp, 12.236, 0), + Seq("flink4", 22, timestamp, 12.236, 0), + Seq("flink5", 27, timestamp, 17.236, 0))) + + // 执行批量更新 + this.fire.jdbcBatchUpdate(s"update $tableName set sex=? where id=?", Seq(Seq(1, 1), Seq(2, 2), Seq(3, 3), Seq(4, 4), Seq(5, 5), Seq(6, 6))) + + // 方式一:通过this.fire方式执行delete操作 + val sql = s"DELETE FROM $tableName WHERE id=?" 
+ this.fire.jdbcUpdate(sql, Seq(2)) + // 方式二:通过JdbcConnector.executeUpdate + + // 同一个事务 + /*val connection = this.jdbc.getConnection() + this.fire.jdbcBatchUpdate("insert", connection = connection, commit = false, closeConnection = false) + this.fire.jdbcBatchUpdate("delete", connection = connection, commit = false, closeConnection = false) + this.fire.jdbcBatchUpdate("update", connection = connection, commit = true, closeConnection = true)*/ +} + + /** + * 将DataFrame数据写入到关系型数据库中 + */ + def testDataFrameSave: Unit = { + val df = this.fire.createDataFrame(Student.newStudentList(), classOf[Student]) + + val insertSql = s"INSERT INTO spark_test(name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" + // 指定部分DataFrame列名作为参数,顺序要对应sql中问号占位符的顺序,batch用于指定批次大小,默认取spark.db.jdbc.batch.size配置的值 + df.jdbcBatchUpdate(insertSql, Seq("name", "age", "createTime", "length", "sex"), batch = 100) + + df.createOrReplaceTempViewCache("student") + val sqlDF = this.fire.sql("select name, age, createTime from student where id>=1").repartition(1) + // 若不指定字段,则默认传入当前DataFrame所有列,且列的顺序与sql中问号占位符顺序一致 + sqlDF.jdbcBatchUpdate("insert into spark_test(name, age, createTime) values(?, ?, ?)") + // 等同以上方式 + // this.fire.jdbcBatchUpdateDF(sqlDF, "insert into spark_test(name, age, createTime) values(?, ?, ?)") + } +``` + +#### [2.2 flink任务](../fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/JdbcTest.scala) + +```scala +/** + * table的jdbc sink + */ +def testTableJdbcSink(stream: DataStream[Student]): Unit = { + stream.createOrReplaceTempView("student") + val table = this.fire.sqlQuery("select name, age, createTime, length, sex from student group by name, age, createTime, length, sex") + + // 方式一、table中的列顺序和类型需与jdbc sql中的占位符顺序保持一致 + table.jdbcBatchUpdate(sql(this.tableName)).setParallelism(1) + // 或者 + this.fire.jdbcBatchUpdateTable(table, sql(this.tableName), keyNum = 6).setParallelism(1) + + // 方式二、自定义row取数规则,适用于row中的列个数和顺序与sql占位符不一致的情况 + table.jdbcBatchUpdate2(sql(this.tableName), flushInterval = 10000, keyNum = 7)(row => { + Seq(row.getField(0), row.getField(1), row.getField(2), row.getField(3), row.getField(4)) + }) + // 或者 + this.flink.jdbcBatchUpdateTable2(table, sql(this.tableName), keyNum = 8)(row => { + Seq(row.getField(0), row.getField(1), row.getField(2), row.getField(3), row.getField(4)) + }).setParallelism(1) +} + +/** + * stream jdbc sink + */ +def testStreamJdbcSink(stream: DataStream[Student]): Unit = { + // 方式一、指定字段列表,内部根据反射,自动获取DataStream中的数据并填充到sql中的占位符 + // 此处fields有两层含义:1. sql中的字段顺序(对应表) 2. 
DataStream中的JavaBean字段数据(对应JavaBean) + // 注:要保证DataStream中字段名称是JavaBean的名称,非表中字段名称 顺序要与占位符顺序一致,个数也要一致 + stream.jdbcBatchUpdate(sql(this.tableName2), fields).setParallelism(3) + // 或者 + this.fire.jdbcBatchUpdateStream(stream, sql(this.tableName2), fields, keyNum = 6).setParallelism(1) + + // 方式二、通过用户指定的匿名函数方式进行数据的组装,适用于上面方法无法反射获取值的情况,适用面更广 + stream.jdbcBatchUpdate2(sql(this.tableName2), 3, 30000, keyNum = 7) { + // 在此处指定取数逻辑,定义如何将dstream中每列数据映射到sql中的占位符 + value => Seq(value.getName, value.getAge, DateFormatUtils.formatCurrentDateTime(), value.getLength, value.getSex) + }.setParallelism(1) + + // 或者 + this.flink.jdbcBatchUpdateStream2(stream, sql(this.tableName2), keyNum = 8) { + value => Seq(value.getName, value.getAge, DateFormatUtils.formatCurrentDateTime(), value.getLength, value.getSex) + }.setParallelism(2) +} +``` + +### 三、多个数据源读写 + +fire框架支持同一个任务中读写任意个数的数据源,只需要通过keyNum指定即可。配置和使用方式可以参考:HBase、kafka等。 \ No newline at end of file diff --git a/docs/kafka.md b/docs/kafka.md new file mode 100644 index 0000000..10f8a9f --- /dev/null +++ b/docs/kafka.md @@ -0,0 +1,102 @@ + + +# Kafka消息接入 + +### 一、API使用 + +使用fire框架可以很方便的消费kafka中的数据,并且支持在同一任务中消费多个kafka集群的多个topic。核心代码仅一行: + +```scala +// Spark Streaming任务 +val dstream = this.fire.createKafkaDirectStream() +// structured streaming任务 +val kafkaDataset = this.fire.loadKafkaParseJson() +// flink 任务 +val dstream = this.fire.createKafkaDirectStream() +``` + +以上的api均支持kafka相关参数的传入,但fire推荐将这些集群信息放到配置文件中,增强代码可读性,提高代码简洁性与灵活性。 + +### 二、kafka配置 + +你可能会疑惑,kafka的broker与topic信息并没有在代码中指定,程序是如何消费的呢?其实,这些信息都放到了任务同名的配置文件中。当然,你可以选择将这些kafka信息放到代码中指定。如果代码中指定了集群信息,同时配置文件中也有指定,则配置文件的优先级更高。 + +```properties +spark.kafka.brokers.name = localhost:9092,localhost02:9092 +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = topic_name +# 用于指定groupId,如果不指定,则默认为当前类名 +spark.kafka.group.id = fire +``` + +### 三、多kafka多topic消费 + +实际生产场景下,会有同一个任务消费多个kafka集群,多个topic的情况。面对这种需求,fire是如何应对的呢?fire框架约定,配置的key后缀区分不同的kafka配置项,详见以下配置列表: + +```properties +# 以下配置中指定了两个kafka集群信息 +spark.kafka.brokers.name = localhost:9092,localhost02:9092 +# 多个topic以逗号分隔 +spark.kafka.topics = topic_name,topic_name02 +spark.kafka.group.id = fire +# 注意key的数字后缀,对应代码中的keyNum=2 +spark.kafka.brokers.name2 = localhost:9092,localhost02:9092 +spark.kafka.topics2 = fire,flink +spark.kafka.group.id2 = fire +``` + +那么,代码中是如何关联带有数字后缀的key的呢?答案是通过keyNum参数来指定: + +```scala +// 对应spark.kafka.brokers.name这个kafka集群 +val dstream = this.fire.createKafkaDirectStream(keyNum=1) +// 对应spark.kafka.brokers.name2这个kafka集群 +val dstream2 = this.fire.createKafkaDirectStream(keyNum=2) +``` + +### 三、offset提交 + +```scala +dstream.kafkaCommitOffsets() +``` + +### 四、kafka-client参数调优 + +有时,需要对kafka消费进行client端的调优,fire支持所有的kafka-client参数,这些参数只需要添加到配置文件中即可生效: + +```properties +# 用于配置启动时的消费位点,默认取最新 +spark.kafka.starting.offsets = latest +# 数据丢失时执行失败 +spark.kafka.failOnDataLoss = true +# 是否启用自动commit +spark.kafka.enable.auto.commit = false +# 以spark.kafka.conf开头的配置支持所有kafka client的配置 +spark.kafka.conf.session.timeout.ms = 300000 +spark.kafka.conf.request.timeout.ms = 400000 +spark.kafka.conf.session.timeout.ms2 = 300000 +``` + +### 五、代码示例 + +[1. spark消费kafka demo](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/KafkaTest.scala) + +[2. 
flink消费kafka demo](../fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/HBaseTest.scala) + diff --git a/docs/lib/hbase-client_2.12-1.2.0-cdh5.12.1.jar b/docs/lib/hbase-client_2.12-1.2.0-cdh5.12.1.jar new file mode 100644 index 0000000..0fd045c Binary files /dev/null and b/docs/lib/hbase-client_2.12-1.2.0-cdh5.12.1.jar differ diff --git a/docs/lib/hbase-spark3_2.12-1.2.0-cdh5.12.1.jar b/docs/lib/hbase-spark3_2.12-1.2.0-cdh5.12.1.jar new file mode 100644 index 0000000..c545514 Binary files /dev/null and b/docs/lib/hbase-spark3_2.12-1.2.0-cdh5.12.1.jar differ diff --git a/docs/lib/kudu-spark3_2.12-1.4.0.jar b/docs/lib/kudu-spark3_2.12-1.4.0.jar new file mode 100644 index 0000000..2a6f183 Binary files /dev/null and b/docs/lib/kudu-spark3_2.12-1.4.0.jar differ diff --git a/docs/lib/rocketmq-flink_1.12_2.12.jar b/docs/lib/rocketmq-flink_1.12_2.12.jar new file mode 100644 index 0000000..e9dc07f Binary files /dev/null and b/docs/lib/rocketmq-flink_1.12_2.12.jar differ diff --git a/docs/lib/rocketmq-spark3_2.12.jar b/docs/lib/rocketmq-spark3_2.12.jar new file mode 100644 index 0000000..b1c0d36 Binary files /dev/null and b/docs/lib/rocketmq-spark3_2.12.jar differ diff --git a/docs/outline.md b/docs/outline.md new file mode 100644 index 0000000..02e0199 --- /dev/null +++ b/docs/outline.md @@ -0,0 +1,114 @@ + + +# 框架集成 + +## 一、环境准备 + +通过git拉取到fire源码后,需要开发者将集群环境信息配置到fire框架中,该配置文件位于fire-core/src/main/resources/cluster.properties。如何配置请参考:[fire配置手册](./properties.md)。配置完成后,即可通过maven进行编译打包,建议将fire的包deploy到公司的私服,方便团队共用。 + +## 二、开发步骤 + +fire框架提供一种代码结构,基于该结构有助于为spark或flink程序进行一定的代码梳理,便于形成统一的编程风格,利于团队协作、排查问题。 + +```scala +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming + +/** + * Flink流式计算任务模板 + */ +object Test extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + dstream.print + // 提交flink streaming任务,job名称不指定默认当前类名 + this.fire.start + } + + def main(args: Array[String]): Unit = { + // 根据配置信息自动创建fire变量、StreamExecutionEnvironment、StreamTableEnvironment等 + this.init() + } +} +``` + +从以上代码片段中可以看到,引入fire框架大体分为5个步骤: + +### 一、隐式转换 + +无论是spark还是flink任务,都需要引入以下的隐式转换,该隐式转换提供了众多简单易用的api。 + +```scala +import com.zto.fire._ +``` + +### 二、继承父类 + +​ fire框架针对不同的引擎、不同的场景提供了对应的父类,用户需要根据实际情况去继承: + +​ **1. spark引擎父类列表:** + +​ **SparkStreaming**:适用于进行Spark Streaming任务的开发 + +​ **BaseSparkCore**:适用于进行Spark批处理任务的开发 + +​ **BaseStructuredStreaming**:适用于进行Spark Structured Streaming任务的开发 + +​ **2. 
flink引擎父类列表:** + +​ **BaseFlinkStreaming**:适用于进行flink流式计算任务的开发 + +​ **BaseFlinkBatch**:适用于进行flink批处理任务的开发 + +### 三、初始化 + +实时任务有一个特点就是一个任务一个类,由于缺少统一的规范,用户进行实时任务开发时,会将很多业务代码写到main方法中,导致main方法过胖。由此带来的问题是代码难以阅读、难以维护。另外,在进行代码开发时,难以避免重复的写初始化spark或flink引擎相关的上下文信息。为了解决以上问题,fire框架将引擎上下文初始化简化成了一行代码,并建议在main方法中只做初始化动作,业务逻辑则放到process方法中。 + +```scala +def main(args: Array[String]): Unit = { + // 根据任务同名的配置文件进行引擎上下文的初始化 + this.init() +} +``` + +上述代码适用于spark或flink引擎,对于个性化的初始化需求,可以将一些参数信息放到任务同名的配置文件中。该配置文件会在初始化之前自动被加载,然后设置到SparkSession或flink的environment中。 + +### 四、业务逻辑 + +为了解决main方法“过胖”的问题,fire父类中统一约定了process方法,该方法会被fire框架自动调用,用户无需在代码中主动调用该方法。process方法作为业务逻辑的聚集地,是业务逻辑的开始。 + +```scala +override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + dstream.print + // 提交streaming任务 + this.fire.start +} +``` + +当然,如果业务逻辑很复杂,可以进一步抽取,然后在process中调用即可。 + +#### 五、配置文件 + +将配置信息硬编码到代码中是很不好的做法,为了让程序足够灵活,代码足够简洁,fire框架约定,每个任务可以有一个与类名同名的配置文件。比如说类名是:**Test.scala**,则fire框架在init的时候会自动扫描并加载src/main/resources/**Test.properties**文件。支持配置文件的嵌套结构,比如说在resources下可以进一步创建多个子目录,存放不同类别的配置文件,便于管理。![配置文件](D:\project\workspace\fire\docs\img\configuration.jpg) + +​ + diff --git a/docs/platform.md b/docs/platform.md new file mode 100644 index 0000000..3c82fb9 --- /dev/null +++ b/docs/platform.md @@ -0,0 +1,19 @@ + + diff --git a/docs/pom/flink-pom.xml b/docs/pom/flink-pom.xml new file mode 100644 index 0000000..b0cd29a --- /dev/null +++ b/docs/pom/flink-pom.xml @@ -0,0 +1,409 @@ + + + + + 4.0.0 + com.zto.bigdata.flink + flink-demo + 1.0-SNAPSHOT + ${project.artifactId} + + + 2.0.0-SNAPSHOT + 1.12.2 + 2.12 + 12 + 0.11.0.2 + provided + 2.6.0-cdh5.12.1 + 1.1.0-cdh5.12.1 + 1.2.0-cdh5.12.1 + ${flink.version}_${scala.binary.version} + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${fire.version} + + + com.zto.fire + fire-core_${scala.binary.version} + ${fire.version} + + + com.zto.fire + fire-jdbc_${scala.binary.version} + ${fire.version} + + + com.zto.fire + fire-hbase_${scala.binary.version} + ${fire.version} + + + com.zto.fire + fire-flink_${flink.reference} + ${fire.version} + + + + org.apache.flink + flink-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-scala_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-streaming-scala_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-clients_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-runtime-web_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-queryable-state-runtime_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-queryable-state-client-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-statebackend-rocksdb_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-kafka_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.kafka + kafka_${scala.binary.version} + ${kafka.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-java-bridge_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-scala-bridge_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-planner_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + 
flink-table-planner-blink_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-common + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-hive_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-jdbc_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-elasticsearch-base_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-hadoop-compatibility_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-shaded-hadoop-2-uber + 2.6.5-8.0 + ${maven.scope} + + + javax.servlet + servlet-api + + + + + + + org.apache.hive + hive-exec + ${hive.apache.version} + ${maven.scope} + + + calcite-core + org.apache.calcite + + + + + + + org.apache.hbase + hbase-common + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + ${maven.scope} + + + + + org.apache.hudi + hudi-flink-bundle_${scala.binary.version} + ${hudi.version} + ${maven.scope} + + + + org.apache.rocketmq + rocketmq-flink_${flink.major.version}_${scala.binary.version} + ${rocketmq.external.version} + + + + + + + + true + org.apache.maven.plugins + maven-compiler-plugin + + 1.8 + 1.8 + + + + + org.scala-tools + maven-scala-plugin + 2.15.2 + + + + scala-compile-first + process-resources + + compile + + + + + + scala-test-compile + process-test-resources + + testCompile + + + + + + + org.codehaus.mojo + build-helper-maven-plugin + + + + add-source + generate-sources + + add-source + + + + src/main/scala + + + + + + + add-test-source + generate-test-sources + + add-test-source + + + + src/test/scala + + + + + + + + + org.apache.maven.plugins + maven-eclipse-plugin + 2.10 + + true + true + + org.scala-ide.sdt.core.scalanature + org.eclipse.jdt.core.javanature + + + org.scala-ide.sdt.core.scalabuilder + + + org.scala-ide.sdt.launching.SCALA_CONTAINER + org.eclipse.jdt.launching.JRE_CONTAINER + + + + org.scala-lang:scala-library + org.scala-lang:scala-compiler + + + **/*.scala + **/*.java + + + + + + + org.apache.maven.plugins + maven-surefire-plugin + 2.19.1 + + + **/*.java + **/*.scala + + + + + + org.apache.maven.plugins + maven-shade-plugin + 2.4.2 + + + package + + shade + + + + + + + *:* + + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + + + zto-${project.artifactId}-${project.version} + + + + + \ No newline at end of file diff --git a/docs/pom/spark-pom.xml b/docs/pom/spark-pom.xml new file mode 100644 index 0000000..c62281a --- /dev/null +++ b/docs/pom/spark-pom.xml @@ -0,0 +1,522 @@ + + + + + 4.0.0 + com.zto.bigdata.spark + spark-demo + 1.0-SNAPSHOT + 2008 + + + provided + 2.0.0-SNAPSHOT + 2.12.12 + 2.12 + 3.0.2 + 2.6.0-cdh5.12.1 + 1.1.0-cdh5.12.1 + 1.2.0-cdh5.12.1 + 1.4.0 + 2.5.30 + 2.8.10 + 18.0 + 4.0.0-incubating + ${spark.version}_${scala.binary.version} + UTF-8 + + + + + aliyun + https://maven.aliyun.com/repository/central + + true + + + true + + + + huaweicloud + https://mirrors.huaweicloud.com/repository/maven/ + + true + + + true + + + + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${fire.version} + + + com.zto.fire + fire-spark_${spark.reference} + ${fire.version} + + + + com.fasterxml.jackson.core + jackson-databind + 2.10.0 + ${maven.scope} + + + 
com.fasterxml.jackson.core + jackson-core + 2.10.0 + ${maven.scope} + + + + org.apache.spark + spark-core_${scala.binary.version} + + + com.esotericsoftware.kryo + kryo + + + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-sql_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-sql-kafka-0-10_${scala.binary.version} + ${spark.version} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming-kafka-0-10_${scala.binary.version} + ${spark.version} + + + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-hdfs + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-client + ${hadoop.version} + ${maven.scope} + + + + + org.apache.hbase + hbase-common + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + + + org.apache.hbase + hbase-spark${spark.major.version}_${scala.binary.version} + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + + + org.apache.kudu + kudu-spark${spark.major.version}_${scala.binary.version} + ${kudu.version} + ${maven.scope} + + + org.apache.kudu + kudu-client + ${kudu.version} + ${maven.scope} + + + + + org.apache.rocketmq + rocketmq-client + ${rocketmq.version} + + + org.apache.rocketmq + rocketmq-spark${spark.major.version}_${scala.binary.version} + 0.0.2 + + + + org.apache.spark + spark-avro_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hudi + hudi-spark-bundle_${scala.binary.version} + 0.7.0 + ${maven.scope} + + + ru.yandex.clickhouse + clickhouse-jdbc + 0.2.4 + ${maven.scope} + + + com.google.guava + guava + ${guava.version} + + + + + + + hadoop-2.7 + + org.spark-project.hive + 1.2.1.spark2 + + + true + + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hive + hive-common + + + org.apache.hive + hive-exec + + + org.apache.hive + hive-metastore + + + org.apache.hive + hive-serde + + + org.apache.hive + hive-shims + + + + + org.apache.spark + spark-hive-thriftserver_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hive + hive-cli + + + org.apache.hive + hive-jdbc + + + org.apache.hive + hive-beeline + + + + + ${hive.group} + hive-cli + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-jdbc + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-beeline + ${hive.version} + ${maven.scope} + + + + ${hive.group} + hive-common + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-metastore + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-exec + ${hive.version} + ${maven.scope} + + + org.apache.commons + commons-lang3 + + + org.apache.spark + spark-core_2.10 + + + + + + + hadoop-3.2 + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + + + + + + + + true + org.apache.maven.plugins + maven-compiler-plugin + + 1.7 + 1.7 + + + + + org.scala-tools + maven-scala-plugin + 2.15.2 + + + + scala-compile-first + process-resources + + compile + + + + + + scala-test-compile + 
process-test-resources + + testCompile + + + + + + + org.codehaus.mojo + build-helper-maven-plugin + + + + add-source + generate-sources + + add-source + + + + src/main/scala + + + + + + + add-test-source + generate-test-sources + + add-test-source + + + + src/test/scala + + + + + + + + + org.apache.maven.plugins + maven-eclipse-plugin + 2.10 + + true + true + + org.scala-ide.sdt.core.scalanature + org.eclipse.jdt.core.javanature + + + org.scala-ide.sdt.core.scalabuilder + + + org.scala-ide.sdt.launching.SCALA_CONTAINER + org.eclipse.jdt.launching.JRE_CONTAINER + + + + org.scala-lang:scala-library + org.scala-lang:scala-compiler + + + **/*.scala + **/*.java + + + + + + + org.apache.maven.plugins + maven-surefire-plugin + 2.19.1 + + + **/*.java + **/*.scala + + + + + + org.apache.maven.plugins + maven-shade-plugin + 2.4.2 + + + package + + shade + + + + + + + *:* + + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + + + zto-${project.artifactId}-${project.version} + + + + + \ No newline at end of file diff --git a/docs/properties.md b/docs/properties.md new file mode 100644 index 0000000..7ba315d --- /dev/null +++ b/docs/properties.md @@ -0,0 +1,156 @@ + + +# fire框架参数 + +fire框架提供了很多参数,这些参数为程序带来了很大的灵活性。参数大体分为:fire框架参数、spark引擎参数、flink引擎参数、kafka参数、hbase参数等。详见以下列表: + +# 一、fire框架参数 + +| 参数 | 默认值 | 含义 | 生效版本 | 是否废弃 | +| --------------------------------------------------- | ------------------- | ------------------------------------------------------------ | -------- | -------- | +| fire.thread.pool.size | 5 | fire内置线程池大小 | 0.4.0 | 否 | +| fire.thread.pool.schedule.size | 5 | fire内置定时任务线程池大小 | 0.4.0 | 否 | +| fire.rest.enable | true | 用于配置是否启用fire框架内置的restful服务,可用于与平台系统做集成。 | 0.3.0 | 否 | +| fire.conf.show.enable | true | 是否打印非敏感的配置信息 | 0.1.0 | 否 | +| fire.rest.url.show.enable | false | 是否在日志中打印fire框架restful服务地址 | 0.3.0 | 否 | +| fire.rest.url.hostname | false | 是否启用hostname作为rest服务的访问地址 | 2.0.0 | 否 | +| fire.acc.enable | true | 是否启用fire框架内置的所有累加器 | 0.4.0 | 否 | +| fire.acc.log.enable | true | 是否启用fire框架日志累加器 | 0.4.0 | 否 | +| fire.acc.multi.counter.enable | true | 是否启用多值累加器 | 0.4.0 | 否 | +| fire.acc.multi.timer.enable | true | 是否启用时间维度累加器 | 0.4.0 | 否 | +| fire.log.enable | true | fire框架埋点日志开关,关闭以后将不再打印埋点日志 | 0.4.0 | 否 | +| fire.log.sql.length | 100 | 用于限定fire框架中sql日志的字符串长度 | 0.4.1 | 否 | +| fire.jdbc.storage.level | memory_and_disk_ser | fire框架针对jdbc操作后数据集的缓存策略,避免重复查询数据库 | 0.4.0 | 否 | +| fire.jdbc.query.partitions | 10 | 通过JdbcConnector查询后将数据集放到多少个分区中,需根据实际的结果集做配置 | 0.3.0 | 否 | +| fire.task.schedule.enable | true | 是否启用fire框架定时任务,基于quartz实现 | 0.4.0 | 否 | +| fire.dynamic.conf.enable | true | 是否启用动态配置功能,fire框架允许在运行时更新用户配置信息,比如:rdd.repartition(this.conf.getInt(count)),此处可实现动态的改变分区大小,实现动态调优。 | 0.4.0 | 否 | +| fire.restful.max.thread | 8 | fire框架rest接口服务最大线程数,如果平台调用fire接口比较频繁,建议调大。 | 0.4.0 | 否 | +| fire.quartz.max.thread | 8 | quartz最大线程池大小,如果任务中的定时任务比较多,建议调大。 | 0.4.0 | 否 | +| fire.acc.log.min.size | 500 | 收集日志记录保留的最小条数。 | 0.4.0 | 否 | +| fire.acc.log.max.size | 1000 | 收集日志记录保留的最大条数。 | 0.4.0 | 否 | +| fire.acc.timer.max.size | 1000 | timer累加器保留最大的记录数 | 0.4.0 | 否 | +| fire.acc.timer.max.hour | 12 | timer累加器清理几小时之前的记录 | 0.4.0 | 否 | +| fire.acc.env.enable | true | env累加器开关 | 0.4.0 | 否 | +| fire.acc.env.max.size | 500 | env累加器保留最多的记录数 | 0.4.0 | 否 | +| fire.acc.env.min.size | 100 | env累加器保留最少的记录数 | 0.4.0 | 否 | +| fire.scheduler.blacklist | | 定时调度任务黑名单,配置的value为定时任务方法名,多个以逗号分隔,配置黑名单的方法将不会被quartz定时调度。 | 0.4.1 | 否 | +| fire.conf.print.blacklist | .map.,pass,secret | 配置打印黑名单,含有配置中指定的片段将不会被打印,也不会被展示到spark&flink的webui中。 
| 0.4.2 | 否 | +| fire.restful.port.retry_num | 3 | 启用fire restserver可能会因为端口冲突导致失败,通过该参数可允许fire重试几次。 | 1.0.0 | 否 | +| fire.restful.port.retry_duration | 1000 | 端口重试间隔时间(ms) | 1.0.0 | 否 | +| fire.log.level.conf.org.apache.spark | info | 用于设置某个包的日志级别,默认将spark包所有的类日志级别设置为info | 1.0.0 | 否 | +| fire.deploy_conf.enable | true | 是否进行累加器的分布式初始化 | 0.4.0 | 否 | +| fire.exception_bus.size | 1000 | 用于限制每个jvm实例内部queue用于存放异常对象数最大大小,避免队列过大造成内存溢出 | 2.0.0 | 否 | +| fire.buried_point.datasource.enable | true | 是否开启数据源埋点,开启后fire将自动采集任务用到的数据源信息(kafka、jdbc、hbase、hive等)。 | 2.0.0 | 否 | +| fire.buried_point.datasource.max.size | 200 | 用于存放埋点的队列最大大小,超过该大小将会被丢弃 | 2.0.0 | 否 | +| fire.buried_point.datasource.initialDelay | 30 | 定时解析埋点SQL的初始延迟(s) | 2.0.0 | 否 | +| fire.buried_point.datasource.period | 60 | 定时解析埋点SQL的执行频率(s) | 2.0.0 | 否 | +| fire.buried_point.datasource.map.tidb | 4000 | 用于jdbc url的识别,当无法通过driver class识别数据源时,将从url中的端口号进行区分,不同数据配置使用统一的前缀:fire.buried_point.datasource.map. | 2.0.0 | 否 | +| fire.conf.adaptive.prefix | true | 是否开启配置自适应前缀,自动为配置加上引擎前缀(spark.\|flink.) | 2.0.0 | 否 | +| fire.user.common.conf | common.properties | 用户统一配置文件,允许用户在该配置文件中存放公共的配置信息,优先级低于任务配置文件(多个以逗号分隔) | 2.0.0 | 否 | +| fire.shutdown.auto.exit | false | 是否在调用shutdown方法时主动退出jvm进程,如果为true,则执行到this.stop方法,关闭上下文信息,回收线程池后将调用System.exit(0)强制退出进程。 | 2.0.0 | 否 | +| fire.kafka.cluster.map.test | ip1:9092,ip2:9092 | kafka集群名称与集群地址映射,便于用户配置中通过别名即可消费指定的kafka。比如:kafka.brokers.name=test则表明消费ip1:9092,ip2:9092这个kafka集群。当然,也支持直接配置url:kafka.brokers.name=ip1:9092,ip2:9092。 | 0.1.0 | 否 | +| fire.hive.default.database.name | tmp | 默认的hive数据库 | 0.1.0 | 否 | +| fire.hive.table.default.partition.name | ds | 默认的hive分区字段名称 | 0.1.0 | 否 | +| fire.hive.cluster.map.test | thrift://ip:9083 | 测试集群hive metastore地址(别名:test),任务中就可以通过fire.hive.cluster=test这种配置方式指定连接test对应的thrift server地址。 | | | +| fire.hbase.batch.size | 10000 | 单个线程读写HBase的数据量 | 0.1.0 | 否 | +| fire.hbase.storage.level | memory_and_disk_ser | fire框架针对hbase操作后数据集的缓存策略,避免因懒加载或其他原因导致的重复读取hbase问题,降低hbase压力。 | 0.3.2 | 否 | +| fire.hbase.scan.partitions | -1 | 通过HBase scan后repartition的分区数,需根据scan后的数据量做配置,-1表示不生效。 | 0.3.2 | 否 | +| fire.hbase.table.exists.cache.enable | true | 是否开启HBase表存在判断的缓存,开启后表存在判断将避免大量的connection消耗 | 2.0.0 | 否 | +| fire.hbase.table.exists.cache.reload.enable | true | 是否开启HBase表存在列表缓存的定时更新任务,避免hbase表被drop导致报错。 | 2.0.0 | 否 | +| fire.hbase.table.exists.cache.initialDelay | 60 | 定时刷新缓存HBase表任务的初始延迟(s) | 2.0.0 | 否 | +| fire.hbase.table.exists.cache.period | 600 | 定时刷新缓存HBase表任务的执行频率(s) | 2.0.0 | 否 | +| fire.hbase.cluster.map.test | zk1:2181,zk2:2181 | 测试集群hbase的zk地址(别名:test) | 2.0.0 | 否 | +| fire.hbase.conf.hbase.zookeeper.property.clientPort | 2181 | hbase connection 配置,约定以:fire.hbase.conf.开头,比如:fire.hbase.conf.hbase.rpc.timeout对应hbase中的配置为hbase.rpc.timeout | 2.0.0 | 否 | +| fire.config_center.enable | true | 是否在任务启动时从配置中心获取配置文件,以便实现动态覆盖jar包中的配置信息。 | 1.0.0 | 否 | +| fire.config_center.local.enable | false | 本地运行环境下(Windows、Mac)是否调用配置中心接口获取配置信息。 | 1.0.0 | 否 | +| fire.config_center.register.conf.secret | | 配置中心接口调用秘钥 | 1.0.0 | 否 | +| fire.config_center.register.conf.prod.address | | 配置中心接口地址 | 0.4.1 | 否 | + +# 二、Spark引擎参数 + +| 参数 | 默认值 | 含义 | 生效版本 | 是否废弃 | +| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | -------- | +| spark.appName | | spark的应用名称,为空则取默认获取类名 | 0.1.0 | 否 | +| spark.local.cores | * | spark local模式下使用多少core运行,默认为local[*],自动根据当前pc的cpu核心数设置 
| 0.4.1 | 否 | +| spark.chkpoint.dir | | spark checkpoint目录地址 | 0.1.0 | 否 | +| spark.log.level | WARN | spark的日志级别 | 0.1.0 | 否 | +| spark.fire.scheduler.blacklist | jvmMonitor | 定时任务黑名单,指定到@Scheduled所修饰的方法名,多个以逗号分隔。当配置了黑名单后,该定时任务将不会被定时调用。 | 0.4.0 | 否 | +| spark.kafka.group.id | 指定spark消费kafka的groupId | kafka的groupid,为空则取类名 | 0.1.0 | 否 | +| spark.kafka.brokers.name | | 用于配置任务消费的kafka broker地址,如果通过fire.kafka.cluster.map.xxx指定了broker别名,则此处也可以填写别名。 | 0.1.0 | 否 | +| spark.kafka.topics | | 消费的topic列表,多个以逗号分隔 | 0.1.0 | 否 | +| spark.kafka.starting.offsets | latest | 用于配置启动时的消费位点,默认取最新 | 0.1.0 | 否 | +| spark.kafka.failOnDataLoss | true | 数据丢失时执行失败 | 0.1.0 | 否 | +| spark.kafka.enable.auto.commit | false | 是否启用自动commit kafka的offset | 0.4.0 | 否 | +| spark.kafka.conf.xxx | | 以spark.kafka.conf开头加上kafka参数,则可用于设置kafka相关的参数。比如:spark.kafka.conf.request.timeout.ms对应kafka的request.timeout.ms参数。 | 0.4.0 | 否 | +| spark.hive.cluster | | 用于配置spark连接的hive thriftserver地址,支持url和别名两种配置方式。别名需要事先通过fire.hive.cluster.map.别名 = thrift://ip:9083指定。 | 0.1.0 | 否 | +| spark.rocket.cluster.map.别名 | ip:9876 | rocketmq别名列表 | 1.0.0 | 否 | +| spark.rocket.conf.xxx | | 以spark.rocket.conf开头的配置支持所有rocket client的配置 | 1.0.0 | 否 | +| spark.hdfs.ha.enable | true | 是否启用hdfs的ha配置,避免将hdfs-site.xml、core-site.xml放到resources中导致多hadoop集群hdfs不灵活的问题。同时也可以避免引namenode维护导致spark任务挂掉的问题。 | 1.0.0 | 否 | +| spark.hdfs.ha.conf.test.fs.defaultFS | hdfs://nameservice1 | 对应fs.defaultFS,其中test与fire.hive.cluster.map.test中指定的别名test相对应,当通过fire.hive.cluster=test指定读写test这个hive时,namenode的ha将生效。 | 1.0.0 | 否 | +| spark.hdfs.ha.conf.test.dfs.nameservices | nameservice1 | 对应dfs.nameservices | 1.0.0 | 否 | +| spark.hdfs.ha.conf.test.dfs.ha.namenodes.nameservice1 | namenode5231,namenode5229 | 对应dfs.ha.namenodes.nameservice1 | 1.0.0 | 否 | +| spark.hdfs.ha.conf.test.dfs.namenode.rpc-address.nameservice1.namenode5231 | ip:8020 | 对应dfs.namenode.rpc-address.nameservice1.namenode5231 | 1.0.0 | 否 | +| spark.hdfs.ha.conf.test.dfs.namenode.rpc-address.nameservice1.namenode5229 | ip2:8020 | 对应dfs.namenode.rpc-address.nameservice1.namenode5229 | 1.0.0 | 否 | +| spark.hdfs.ha.conf.test.dfs.client.failover.proxy.provider.nameservice1 | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | 对应dfs.client.failover.proxy.provider.nameservice1 | 1.0.0 | 否 | +| spark.impala.connection.url | jdbc:hive2://ip:21050/;auth=noSasl | impala jdbc地址 | 0.1.0 | 否 | +| spark.impala.jdbc.driver.class.name | org.apache.hive.jdbc.HiveDriver | impala jdbc驱动 | 0.1.0 | 否 | +| spark.datasource.options. 
| | 以此开头的配置将被加载到datasource api的options中 | 2.0.0 | 否 | +| spark.datasource.format | | datasource api的format | 2.0.0 | 否 | +| spark.datasource.saveMode | Append | datasource api的saveMode | 2.0.0 | 否 | +| spark.datasource.saveParam | | 用于dataFrame.write.format.save()参数 | 2.0.0 | 否 | +| spark.datasource.isSaveTable | false | 用于决定调用save(path)还是saveAsTable | 2.0.0 | 否 | +| spark.datasource.loadParam | | 用于spark.read.format.load()参数 | 2.0.0 | 否 | + +# 三、Flink引擎参数 + +| 参数 | 默认值 | 含义 | 生效版本 | 是否废弃 | +| ------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | -------- | +| flink.appName | | flink的应用名称,为空则取类名 | 1.0.0 | 否 | +| flink.kafka.group.id | | kafka的groupid,为空则取类名 | 1.0.0 | 否 | +| flink.kafka.brokers.name | | 用于配置任务消费的kafka broker地址,如果通过fire.kafka.cluster.map.xxx指定了broker别名,则此处也可以填写别名。 | 1.0.0 | 否 | +| flink.kafka.topics | | 消费的kafka topic列表,多个以逗号分隔 | 1.0.0 | 否 | +| flink.kafka.starting.offsets | | 用于配置启动时的消费位点,默认取最新 | 1.0.0 | 否 | +| flink.kafka.failOnDataLoss | true | 数据丢失时执行失败 | 1.0.0 | 否 | +| flink.kafka.enable.auto.commit | false | 是否启用自动提交kafka offset | 1.0.0 | 否 | +| flink.kafka.CommitOffsetsOnCheckpoints | true | 是否在checkpoint时记录offset值 | 1.0.0 | 否 | +| flink.kafka.StartFromTimestamp | 0 | 设置从指定时间戳位置开始消费kafka | 1.0.0 | 否 | +| flink.kafka.StartFromGroupOffsets | false | 从topic中指定的group上次消费的位置开始消费,必须配置group.id参数 | 1.0.0 | 否 | +| flink.log.level | WARN | 默认的日志级别 | 1.0.0 | 否 | +| flink.hive.cluster | | 用于配置flink读写的hive集群别名 | 1.0.0 | 否 | +| flink.hive.version | | 指定hive版本号 | 1.0.0 | 否 | +| flink.default.database.name | tmp | 默认的hive数据库 | 1.0.0 | 否 | +| flink.default.table.partition.name | ds | 默认的hive分区字段名称 | 1.0.0 | 否 | +| flink.hive.catalog.name | hive | hive的catalog名称 | 1.0.0 | 否 | +| flink.fire.hive.site.path.map.别名 | test | /path/to/hive-site-path/ | 1.0.0 | 否 | +| flink.hbase.cluster | test | 读写的hbase集群zk地址 | 1.0.0 | 否 | +| flink.max.parallelism | | 用于配置flink的max parallelism | 1.0.0 | 否 | +| flink.default.parallelism | | 用于配置任务默认的parallelism | 1.0.0 | 否 | +| flink.stream.checkpoint.interval | -1 | checkpoint频率,-1表示关闭 | 1.0.0 | 否 | +| flink.stream.checkpoint.mode | EXACTLY_ONCE | checkpoint的模式:EXACTLY_ONCE/AT_LEAST_ONCE | 1.0.0 | 否 | +| flink.stream.checkpoint.timeout | 600000 | checkpoint超时时间,单位:毫秒 | 1.0.0 | 否 | +| flink.stream.checkpoint.max.concurrent | 1 | 同时checkpoint操作的并发数 | 1.0.0 | 否 | +| flink.stream.checkpoint.min.pause.between | 0 | 两次checkpoint的最小停顿时间 | 1.0.0 | 否 | +| flink.stream.checkpoint.prefer.recovery | false | 如果有更近的checkpoint时,是否将作业回退到该检查点 | 1.0.0 | 否 | +| flink.stream.checkpoint.tolerable.failure.number | 0 | 可容忍checkpoint失败的次数,默认不允许失败 | 1.0.0 | 否 | +| flink.stream.checkpoint.externalized | RETAIN_ON_CANCELLATION | 当cancel job时保留checkpoint | 1.0.0 | 否 | +| flink.sql.log.enable | false | 是否打印组装with语句后的flink sql,由于with表达式中可能含有敏感信息,默认为关闭 | 2.0.0 | 否 | +| flink.sql.with.xxx | flink.sql.with.connector=jdbc flink.sql.with.url=jdbc:mysql://ip:3306/db | 以flink.sql.with.开头的配置,用于sql语句的with表达式。通过this.fire.sql(sql, keyNum)即可自动读取并映射成with表达式的sql。避免sql中的with表达式硬编码到代码中,提高灵活性。 | 2.0.0 | 否 | +| flink.sql_with.replaceMode.enable | false | 是否启用配置文件中with强制替换sql中已有的with表达式,如果启用,则会强制替换掉代码中sql的with列表,达到最大的灵活性。 | 2.0.0 | 否 | + diff --git a/docs/restful.md b/docs/restful.md new file mode 100644 index 0000000..2c1c4e6 --- /dev/null +++ b/docs/restful.md @@ -0,0 +1,45 @@ + + +# fire内置的restful接口 + 
+fire框架在提供丰富好用的api给开发者的同时,也提供了大量的restful接口给大数据实时计算平台。通过对外暴露的restful接口,可以将每个任务与实时平台进行深入绑定,为平台建设提供了更大的想象空间。其中包括:**实时热重启接口、动态批次时间调整接口、sql在线调试接口**等。
+
+| 引擎 | 接口 | 含义 |
+| ----- | ---------------------------- | ------------------------------------------------------------ |
+| spark | /system/kill | 用于kill 任务自身。 |
+| spark | /system/cancelJob | 生产环境中,通常会禁用掉spark webui的kill功能,但有时任务owner有kill的需求,为了满足此类需求,fire通过接口的方式将kill功能暴露给平台,由平台控制权限并完成kill job的触发。 |
+| spark | /system/cancelStage | 同job的kill功能,该接口用于kill指定的stage。 |
+| spark | /system/sql | 该接口允许用户传递sql给spark任务执行,可用于sql的动态调试,支持在任务开发阶段spark临时表与hive表的关联,降低sql开发的人力成本。 |
+| spark | /system/sparkInfo | 用于获取当前spark任务的配置信息。 |
+| spark | /system/counter | 用于获取累加器的值。 |
+| spark | /system/multiCounter | 用于获取多值累加器的值。 |
+| spark | /system/multiTimer | 用于获取时间维度多值累加器的值。 |
+| spark | /system/log | 用于获取日志信息,平台可调用该接口获取日志并进行日志展示。 |
+| spark | /system/env | 获取运行时状态信息,包括GC、jvm、thread、memory、cpu等 |
+| spark | /system/listDatabases | 用于列举当前spark任务catalog中所有的数据库,包括hive库等。 |
+| spark | /system/listTables | 用于列举指定库下所有的表信息。 |
+| spark | /system/listColumns | 用于列举某张表的所有字段信息。 |
+| spark | /system/listFunctions | 用于列举当前任务支持的函数。 |
+| spark | /system/setConf | 用于配置热覆盖,在运行时动态修改指定的配置信息。比如动态修改spark streaming某个rdd的分区数,实现动态调优的目的。 |
+| spark | /system/datasource | 用于获取当前任务使用到的数据源信息、表信息等。支持jdbc、hbase、kafka、hive等众多组件,可用于和平台集成,做实时血缘关系。 |
+| spark | /system/streaming/hotRestart | spark streaming热重启接口,可以动态地修改运行中的spark streaming的批次时间。 |
+| flink | /system/flink/kill | 用于kill flink流式任务。 |
+| flink | /system/flink/dataSource | 用于收集flink任务使用到的数据源。 |
+
diff --git a/docs/rocketmq.md b/docs/rocketmq.md
new file mode 100644
index 0000000..cc9ed8a
--- /dev/null
+++ b/docs/rocketmq.md
@@ -0,0 +1,114 @@
+
+
+# RocketMQ消息接入
+
+### 一、API使用
+
+使用fire框架可以很方便地消费rocketmq中的数据,并且支持在同一任务中消费多个rocketmq集群的多个topic。核心代码仅一行:
+
+```scala
+// Spark Streaming或flink streaming任务
+val dstream = this.fire.createRocketMqPullStream()
+```
+
+以上的api均支持rocketmq相关参数的传入,但fire推荐将这些集群信息放到配置文件中,增强代码可读性,提高代码简洁性与灵活性。
+
+### 二、flink sql connector
+
+```scala
+this.fire.sql("""
+ |CREATE table source (
+ | id bigint,
+ | name string,
+ | age int,
+ | length double,
+ | data DECIMAL(10, 5)
+ |) WITH
+ | (
+ | 'connector' = 'fire-rocketmq',
+ | 'format' = 'json',
+ | 'rocket.brokers.name' = 'zms',
+ | 'rocket.topics' = 'fire',
+ | 'rocket.group.id' = 'fire',
+ | 'rocket.consumer.tag' = '*'
+ | )
+ |""".stripMargin)
+```
+
+**with参数的使用:**
+
+rocketmq sql connector中的with参数复用了api中的配置参数,如果需要进行rocketmq-client相关参数设置,可以以rocket.conf.为前缀,后面跟上rocketmq调优参数即可。
+
+### 三、RocketMQ配置
+
+```properties
+spark.rocket.brokers.name = localhost:9876;localhost02:9876
+spark.rocket.topics = topic_name
+spark.rocket.consumer.instance = FireFramework
+spark.rocket.group.id = groupId
+spark.rocket.pull.max.speed.per.partition = 15000
+spark.rocket.consumer.tag = 1||2||3||4||5||8||44||45
+# 以spark.rocket.conf开头的配置支持所有rocket client的配置
+#spark.rocket.conf.pull.max.speed.per.partition = 5000
+```
+
+### 四、多RocketMQ多topic消费
+
+实际生产场景下,会有同一个任务消费多个RocketMQ集群、多个topic的情况。面对这种需求,fire是如何应对的呢?fire框架约定,通过配置key的数字后缀来区分不同的RocketMQ配置项,详见以下配置列表:
+
+```properties
+# 以下配置中指定了两个RocketMQ集群信息
+spark.rocket.brokers.name = localhost:9876;localhost02:9876
+spark.rocket.topics = topic_name
+spark.rocket.consumer.instance = FireFramework
+spark.rocket.group.id = groupId
+# 注意key的数字后缀,对应代码中的keyNum=2
+spark.rocket.brokers.name2 = localhost:9876;localhost02:9876
+spark.rocket.topics2 = topic_name2
+spark.rocket.consumer.instance2 = FireFramework
+spark.rocket.group.id2 = groupId2
+```
+
+那么,代码中是如何关联带有数字后缀的key的呢?答案是通过keyNum参数来指定:
+
+```scala
+// 对应spark.rocket.brokers.name这个RocketMQ集群
+val dstream = this.fire.createRocketMqPullStream(keyNum=1)
+// 对应spark.rocket.brokers.name2这个RocketMQ集群
+val dstream2 = this.fire.createRocketMqPullStream(keyNum=2)
+```
+
+### 五、RocketMQ-client参数调优
+
+有时,需要对RocketMQ消费进行client端的调优,fire支持所有的RocketMQ-client参数,这些参数只需要添加到配置文件中即可生效:
+
+```properties
+# 以spark.rocket.conf开头的配置支持所有rocket client的配置
+spark.rocket.conf.pull.max.speed.per.partition = 5000
+```
+
+### 六、代码示例
+
+[1. spark示例代码](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/RocketTest.scala)
+
+[2. flink streaming示例代码](../fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketTest.scala)
+
+[3. flink sql connector示例代码](../fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketMQConnectorTest.scala)
+
diff --git a/docs/schedule.md b/docs/schedule.md
new file mode 100644
index 0000000..f5d7279
--- /dev/null
+++ b/docs/schedule.md
@@ -0,0 +1,54 @@
+
+
+# 定时任务
+
+fire框架内部进一步封装了quartz进行定时任务的声明与调度,使用方法和spring的@Scheduled注解类似。参考:[示例程序](../fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/ScheduleTest.scala)。基于该功能,可以很容易实现诸如定时加载与更新维表等功能,十分方便。
+
+```scala
+  /**
+   * 声明了@Scheduled注解的方法是定时任务方法,会周期性执行
+   *
+   * @cron cron表达式
+   * @scope 定时任务的作用域(driver、executor、all),默认仅在driver端定时执行
+   * @concurrent 上一个周期定时任务未执行完成时是否允许下一个周期任务开始执行
+   * @startAt 用于指定第一次开始执行的时间
+   * @initialDelay 延迟多长时间开始执行第一次定时任务
+   */
+  @Scheduled(cron = "0/5 * * * * ?", scope = "driver", concurrent = false, startAt = "2021-01-21 11:30:00", initialDelay = 60000)
+  def loadTable: Unit = {
+    this.logger.info("更新维表动作")
+  }
+
+  /**
+   * scope = "all" 表示同时在driver端和executor端执行,concurrent = false 表示不允许同一时刻并发执行该方法
+   */
+  @Scheduled(cron = "0/5 * * * * ?", scope = "all", concurrent = false)
+  def test2: Unit = {
+    this.logger.info("executorId=" + SparkUtils.getExecutorId + "=方法 test2() 每5秒执行" + DateFormatUtils.formatCurrentDateTime())
+  }
+
+  // 每天凌晨4点01将锁标志设置为false,这样下一个批次就可以先更新维表再执行sql
+  @Scheduled(cron = "0 1 4 * * ?")
+  def updateTableJob: Unit = this.lock.compareAndSet(true, false)
+```
+
+**注:**目前定时任务暂不支持在flink任务的每个TaskManager端执行。
+
diff --git a/docs/threadpool.md b/docs/threadpool.md
new file mode 100644
index 0000000..e423dfa
--- /dev/null
+++ b/docs/threadpool.md
@@ -0,0 +1,73 @@
+
+
+# 线程池与并发计算
+
+集成fire后,可以很简单地在程序内部提交多个任务,充分利用申请到的资源。
+
+```scala
+/**
+ * 在driver中启用线程池的示例
+ * 1. 开启子线程执行一个任务
+ * 2. 开启子线程执行周期性任务
+ */
+object ThreadTest extends BaseSparkStreaming {
+
+  def main(args: Array[String]): Unit = {
+    // 第二个参数表示是否开启checkPoint机制,此处传false表示不开启
+    this.init(10L, false)
+  }
+
+  /**
+   * Streaming的处理过程强烈建议放到process中,保持风格统一
+   * 注:此方法会被自动调用,在以下两种情况下,必须将逻辑写在process中
+   * 1. 开启checkpoint
+   * 2.
支持streaming热重启(可在不关闭streaming任务的前提下修改batch时间) + */ + override def process: Unit = { + // 第一次执行时延迟两分钟,每隔1分钟执行一次showSchema函数 + this.runAsSchedule(this.showSchema, 1, 1) + // 以子线程方式执行print方法中的逻辑 + this.runAsThread(this.print) + + val dstream = this.fire.createKafkaDirectStream() + dstream.foreachRDD(rdd => { + println("count--> " + rdd.count()) + }) + + this.fire.start + } + + /** + * 以子线程方式执行一次 + */ + def print: Unit = { + println("==========子线程执行===========") + } + + /** + * 查看表结构信息 + */ + def showSchema: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}--------> atFixRate <----------") + this.fire.sql("use tmp") + this.fire.sql("show tables").show(false) + } +} +``` diff --git a/fire-common/pom.xml b/fire-common/pom.xml new file mode 100644 index 0000000..ab6ccb0 --- /dev/null +++ b/fire-common/pom.xml @@ -0,0 +1,79 @@ + + + + + 4.0.0 + fire-common_${scala.binary.version} + jar + fire-common + + + com.zto.fire + fire-parent_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + org.apache.kafka + kafka_${scala.binary.version} + 0.10.2.0 + ${maven.scope} + + + commons-httpclient + commons-httpclient + 3.1 + + + org.apache.httpcomponents + httpclient + 4.3.3 + + + org.apache.httpcomponents + httpcore + 4.4.3 + + + org.apache.htrace + htrace-core + 3.2.0-incubating + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-common/src/main/java/com/zto/fire/common/anno/FieldName.java b/fire-common/src/main/java/com/zto/fire/common/anno/FieldName.java new file mode 100644 index 0000000..cc5f2e1 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/anno/FieldName.java @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.anno; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * 用于标识该field对应数据库中的名称 + * Created by ChengLong on 2017-03-15. 
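+ * 示例(仅作说明):@FieldName(value = "user_name", family = "info") 表示将该Java字段映射为HBase中info列族下名为user_name的qualifier。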
+ */ +@Retention(RetentionPolicy.RUNTIME) +@Target({ElementType.TYPE, ElementType.FIELD}) +public @interface FieldName { + /** + * fieldName,映射到hbase中作为qualifier名称 + */ + String value() default ""; + + /** + * 列族名称 + */ + String family() default "info"; + + /** + * 不使用该字段,默认为使用 + */ + boolean disuse() default false; + + /** + * 是否可以为空 + */ + boolean nullable() default true; + + /** + * 是否为主键字段 + * @return + */ + boolean id() default false; + + /** + * HBase表的命名空间 + */ + String namespace() default "default"; + + /** + * 字段注释 + */ + String comment() default ""; +} diff --git a/fire-common/src/main/java/com/zto/fire/common/anno/Internal.java b/fire-common/src/main/java/com/zto/fire/common/anno/Internal.java new file mode 100644 index 0000000..87db397 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/anno/Internal.java @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.anno; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * For fire internal use only. + * + * @author ChengLong 2020-11-13 09:39:28 + */ +@Retention(RetentionPolicy.RUNTIME) +@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD}) +public @interface Internal { +} diff --git a/fire-common/src/main/java/com/zto/fire/common/anno/Rest.java b/fire-common/src/main/java/com/zto/fire/common/anno/Rest.java new file mode 100644 index 0000000..21a0328 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/anno/Rest.java @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.anno; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * 用于标识启用restful接口 + * + * @author ChengLong 2019-4-16 11:07:13 + */ +@Retention(RetentionPolicy.RUNTIME) +@Target({ElementType.TYPE, ElementType.FIELD}) +public @interface Rest { + + /** + * restful路径名 + * + * @return + */ + String value() default ""; + + /** + * 接口访问的方式: GET/POST + * + * @return + */ + String method() default "GET"; +} diff --git a/fire-common/src/main/java/com/zto/fire/common/anno/Scheduled.java b/fire-common/src/main/java/com/zto/fire/common/anno/Scheduled.java new file mode 100644 index 0000000..a196664 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/anno/Scheduled.java @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.anno; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * 定时任务注解,放在方法上,要求方法不带参数,且无返回值 + * 优先级:cron > fixedInterval startAt > initialDelay + * @author ChengLong 2019年11月4日 21:12:06 + * @since 0.3.5 + */ +@Retention(RetentionPolicy.RUNTIME) +@Target({ElementType.METHOD}) +public @interface Scheduled { + + /** + * cron表达式 + */ + String cron() default ""; + + /** + * 指定是否允许并发执行同一个任务 + * 默认为true,表示同一时间范围内同一个任务可以有多个实例并行执行 + */ + boolean concurrent() default true; + + /** + * 按照给定的时间间隔(毫秒)周期性执行 + */ + long fixedInterval() default -1; + + /** + * 周期性执行的次数,-1表示无限重复执行 + */ + long repeatCount() default -1; + + /** + * 第一次延迟多久(毫秒)执行,0表示立即执行 + */ + long initialDelay() default -1; + + /** + * 用于指定首次开始执行的时间,优先级高于initialDelay + * 日期的格式为:yyyy-MM-dd HH:mm:ss + */ + String startAt() default ""; + + /** + * 定时任务的作用域,driver、executor、all + * 默认仅driver端执行 + */ + String scope() default "driver"; +} diff --git a/fire-common/src/main/java/com/zto/fire/common/anno/TestStep.java b/fire-common/src/main/java/com/zto/fire/common/anno/TestStep.java new file mode 100644 index 0000000..5b5a711 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/anno/TestStep.java @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.anno; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * 用于标识单元测试的测试步骤 + * + * @author ChengLong 2020-11-13 09:39:28 + */ +@Retention(RetentionPolicy.RUNTIME) +@Target(ElementType.METHOD) +public @interface TestStep { + + /** + * 测试步骤 + */ + int step() default 1; + + /** + * 用于单元测试描述 + */ + String desc() default "单元测试说明"; +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/rest/ResultMsg.java b/fire-common/src/main/java/com/zto/fire/common/bean/rest/ResultMsg.java new file mode 100644 index 0000000..89c250f --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/rest/ResultMsg.java @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.rest; + +import com.zto.fire.common.enu.ErrorCode; +import com.zto.fire.common.util.JSONUtils; + +/** + * 返回消息 + * + * @author ChengLong 2018年6月12日 13:42:23 + */ +public class ResultMsg { + // 消息体 + private Object content; + // 系统错误码 + private ErrorCode code; + // 错误描述 + private String msg; + + /** + * 验证是否成功 + * + * @param resultMsg + * @return true: 成功 false 失败 + */ + public static boolean isSuccess(ResultMsg resultMsg) { + return resultMsg != null && resultMsg.getCode() == ErrorCode.SUCCESS; + } + + /** + * 获取描述信息 + * + * @param resultMsg + * @return 描述信息 + */ + public static String getMsg(ResultMsg resultMsg) { + if (resultMsg != null) { + return resultMsg.getMsg(); + } else { + return ""; + } + } + + /** + * 获取状态码 + * + * @return 状态码 + */ + public static ErrorCode getCode(ResultMsg resultMsg) { + if (resultMsg != null) { + return resultMsg.getCode(); + } + return ErrorCode.ERROR; + } + + public ResultMsg() { + } + + public ResultMsg(String content, ErrorCode code, String msg) { + this.content = content; + this.code = code; + this.msg = msg; + } + + public Object getContent() { + return content; + } + + public void setContent(Object content) { + this.content = content; + } + + public ErrorCode getCode() { + return code; + } + + public void setCode(ErrorCode code) { + this.code = code; + } + + public String getMsg() { + return msg; + } + + public void setMsg(String msg) { + this.msg = msg; + } + + /** + * 构建成功消息 + */ + public String buildSuccess(Object content, String msg) { + this.content = content; + this.code = ErrorCode.SUCCESS; + this.msg = msg; + return this.toString(); + } + + /** + * 构建失败消息 + */ + public String buildError(String msg, ErrorCode errorCode) { + this.content = ""; + this.code = errorCode; + this.msg = msg; + return this.toString(); + } + + @Override + public String toString() { + return JSONUtils.toJSONString(this); + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/ColumnMeta.java b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/ColumnMeta.java new file mode 100644 index 0000000..edd9504 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/ColumnMeta.java @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.rest.spark; + +/** + * 用于封装字段元数据 + * + * @author ChengLong 2019-9-2 13:19:06 + */ +public class ColumnMeta { + // 所在数据库名称 + protected String database; + // 表名 + protected String tableName; + // 字段描述 + protected String description; + // 字段名 + protected String columnName; + // 字段类型 + protected String dataType; + // 是否允许为空 + protected Boolean nullable; + // 是否为分区字段 + protected Boolean isPartition; + // 是否为bucket字段 + protected Boolean isBucket; + + public ColumnMeta() { + } + + private ColumnMeta(Builder builder) { + this.nullable = builder.nullable; + this.tableName = builder.tableName; + this.columnName = builder.columnName; + this.database = builder.database; + this.dataType = builder.dataType; + this.description = builder.description; + this.isBucket = builder.isBucket; + this.isPartition = builder.isPartition; + } + + public String getDatabase() { + return database; + } + + public String getTableName() { + return tableName; + } + + public String getDescription() { + return description; + } + + public String getColumnName() { + return columnName; + } + + public String getDataType() { + return dataType; + } + + public Boolean getNullable() { + return nullable; + } + + public Boolean getPartition() { + return isPartition; + } + + public Boolean getBucket() { + return isBucket; + } + + public static class Builder extends ColumnMeta { + public Builder setDescription(String description) { + this.description = description; + return this; + } + + public Builder setColumnName(String columnName) { + this.columnName = columnName; + return this; + } + + public Builder setDataType(String dataType) { + this.dataType = dataType; + return this; + } + + public Builder setNullable(Boolean nullable) { + this.nullable = nullable; + return this; + } + + public Builder setPartition(Boolean partition) { + isPartition = partition; + return this; + } + + public Builder setBucket(Boolean bucket) { + isBucket = bucket; + return this; + } + + public Builder setDatabase(String database) { + this.database = database; + return this; + } + + public Builder setTableName(String tableName) { + this.tableName = tableName; + return this; + } + + public ColumnMeta build() { + return new ColumnMeta(this); + } + } +} \ No newline at end of file diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/FunctionMeta.java b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/FunctionMeta.java new file mode 100644 index 0000000..c509d47 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/FunctionMeta.java @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.rest.spark; + +/** + * 用于封装函数元数据信息 + * @author ChengLong 2019-9-2 16:50:50 + */ +public class FunctionMeta { + // 函数描述 + private String description; + // 数据库 + private String database; + // 函数名称 + private String name; + // 函数定义的类 + private String className; + // 是否为临时函数 + private Boolean isTemporary; + + public FunctionMeta() { + } + + public FunctionMeta(String description, String database, String name, String className, Boolean isTemporary) { + this.description = description; + this.database = database; + this.name = name; + this.className = className; + this.isTemporary = isTemporary; + } + + public String getDescription() { + return description; + } + + public void setDescription(String description) { + this.description = description; + } + + public String getDatabase() { + return database; + } + + public void setDatabase(String database) { + this.database = database; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public String getClassName() { + return className; + } + + public void setClassName(String className) { + this.className = className; + } + + public Boolean getTemporary() { + return isTemporary; + } + + public void setTemporary(Boolean temporary) { + isTemporary = temporary; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/SparkInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/SparkInfo.java new file mode 100644 index 0000000..af6c6f2 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/SparkInfo.java @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.rest.spark; + +import com.zto.fire.common.util.DateFormatUtils; + +import java.util.Map; +import java.util.Properties; + +/** + * 用于封装spark运行时的信息 + * @author ChengLong 2019-5-13 10:27:33 + */ +public class SparkInfo { + // spark应用名称 + private String appName; + // spark应用的类名 + private String className; + // common包的版本号 + private String fireVersion; + // spark conf信息 + private Map conf; + // 当前spark版本 + private String version; + // spark 运行模式 + private String master; + // spark 的 applicationId + private String applicationId; + // yarn 的 applicationAttemptId + private String applicationAttemptId; + // spark 的 webui地址 + private String ui; + // driver的进程id + private String pid; + // spark的运行时间 + private String uptime; + // 程序启动的起始时间 + private String startTime; + // 申请的每个executor的内存大小 + private String executorMemory; + // 申请的executor个数 + private String executorInstances; + // 申请的每个executor的cpu数 + private String executorCores; + // 申请的driver cpu数量 + private String driverCores; + // 申请的driver内存大小 + private String driverMemory; + // 申请的driver堆外内存大小 + private String driverMemoryOverhead; + // driver所在服务器ip + private String driverHost; + // driver占用的端口号 + private String driverPort; + // restful接口的端口号 + private String restPort; + // 申请的executor堆外内存大小 + private String executorMemoryOverhead; + // 当前spark应用申请的总内存大小(driver+executor+总的堆外内存) + private String memory; + // 当前spark应用申请的总的cpu数量(driver+executor) + private String cpu; + // streaming批次时间 + private String batchDuration; + // 当前driver系统时间 + private String timestamp = DateFormatUtils.formatCurrentDateTime(); + // 配置信息 + private Map properties; + + public String getAppName() { + return appName; + } + + public void setAppName(String appName) { + this.appName = appName; + } + + public String getClassName() { + return className; + } + + public void setClassName(String className) { + this.className = className; + } + + public String getFireVersion() { + return fireVersion; + } + + public void setFireVersion(String fireVersion) { + this.fireVersion = fireVersion; + } + + public Map getConf() { + return conf; + } + + public void setConf(Map conf) { + this.conf = conf; + } + + public String getVersion() { + return version; + } + + public void setVersion(String version) { + this.version = version; + } + + public String getMaster() { + return master; + } + + public void setMaster(String master) { + this.master = master; + } + + public String getApplicationId() { + return applicationId; + } + + public void setApplicationId(String applicationId) { + this.applicationId = applicationId; + } + + public String getApplicationAttemptId() { + return applicationAttemptId; + } + + public void setApplicationAttemptId(String applicationAttemptId) { + this.applicationAttemptId = applicationAttemptId; + } + + public String getUi() { + return ui; + } + + public void setUi(String ui) { + this.ui = ui; + } + + public String getPid() { + return pid; + } + + public void setPid(String pid) { + this.pid = pid; + } + + public String getUptime() { + return uptime; + } + + public void setUptime(String uptime) { + this.uptime = uptime; + } + + public String getStartTime() { + return startTime; + } + + public void setStartTime(String startTime) { + this.startTime = startTime; + } + + public String getExecutorMemory() { + return executorMemory; + } + + public void setExecutorMemory(String executorMemory) { + this.executorMemory = executorMemory; + } + + public String getExecutorInstances() { + return executorInstances; + } + + public void 
setExecutorInstances(String executorInstances) { + this.executorInstances = executorInstances; + } + + public String getExecutorCores() { + return executorCores; + } + + public void setExecutorCores(String executorCores) { + this.executorCores = executorCores; + } + + public String getDriverCores() { + return driverCores; + } + + public void setDriverCores(String driverCores) { + this.driverCores = driverCores; + } + + public String getDriverMemory() { + return driverMemory; + } + + public void setDriverMemory(String driverMemory) { + this.driverMemory = driverMemory; + } + + public String getDriverMemoryOverhead() { + return driverMemoryOverhead; + } + + public void setDriverMemoryOverhead(String driverMemoryOverhead) { + this.driverMemoryOverhead = driverMemoryOverhead; + } + + public String getDriverHost() { + return driverHost; + } + + public void setDriverHost(String driverHost) { + this.driverHost = driverHost; + } + + public String getDriverPort() { + return driverPort; + } + + public void setDriverPort(String driverPort) { + this.driverPort = driverPort; + } + + public String getExecutorMemoryOverhead() { + return executorMemoryOverhead; + } + + public void setExecutorMemoryOverhead(String executorMemoryOverhead) { + this.executorMemoryOverhead = executorMemoryOverhead; + } + + public String getMemory() { + return memory; + } + + public void setMemory(String memory) { + this.memory = memory; + } + + public String getCpu() { + return cpu; + } + + public void setCpu(String cpu) { + this.cpu = cpu; + } + + public String getBatchDuration() { + return batchDuration; + } + + public void setBatchDuration(String batchDuration) { + this.batchDuration = batchDuration; + } + + public String getTimestamp() { + return timestamp; + } + + public void setTimestamp(String timestamp) { + this.timestamp = timestamp; + } + + public String getRestPort() { + return restPort; + } + + public void setRestPort(String restPort) { + this.restPort = restPort; + } + + public Map getProperties() { + return properties; + } + + public void setProperties(Map properties) { + this.properties = properties; + } + + /** + * 计算cpu和内存总数 + */ + public void computeCpuMemory() { + this.memory = (Integer.parseInt(this.driverMemory.replace("g", "")) + Integer.parseInt(this.driverMemoryOverhead.replace("g", "")) + Integer.parseInt(this.executorInstances) * (Integer.parseInt(this.executorMemory.replace("g", "")) + Integer.parseInt(this.executorMemoryOverhead.replace("g", "")))) + "g"; + this.cpu = (Integer.parseInt(this.executorInstances) * Integer.parseInt(this.executorCores) + Integer.parseInt(this.driverCores)) + ""; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/TableMeta.java b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/TableMeta.java new file mode 100644 index 0000000..ed4dfd0 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/rest/spark/TableMeta.java @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.bean.rest.spark; + +/** + * 用于封装表的元数据 + * @author ChengLong 2019-9-2 13:11:56 + */ +public class TableMeta { + // 表的描述 + private String description; + // 所在数据库名称 + private String database; + // 表名 + private String tableName; + // 表的类型 + private String tableType; + // 是否为临时表 + private Boolean isTemporary; + + public String getDescription() { + return description; + } + + public void setDescription(String description) { + this.description = description; + } + + public String getDatabase() { + return database; + } + + public void setDatabase(String database) { + this.database = database; + } + + public String getTableName() { + return tableName; + } + + public void setTableName(String tableName) { + this.tableName = tableName; + } + + public String getTableType() { + return tableType; + } + + public void setTableType(String tableType) { + this.tableType = tableType; + } + + public Boolean getTemporary() { + return isTemporary; + } + + public void setTemporary(Boolean temporary) { + isTemporary = temporary; + } + + public TableMeta() { + } + + public TableMeta(String description, String database, String tableName, String tableType, Boolean isTemporary) { + this.description = description; + this.database = database; + this.tableName = tableName; + this.tableType = tableType; + this.isTemporary = isTemporary; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/rest/yarn/App.java b/fire-common/src/main/java/com/zto/fire/common/bean/rest/yarn/App.java new file mode 100644 index 0000000..79f2004 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/rest/yarn/App.java @@ -0,0 +1,303 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.rest.yarn; + +/** + * 用于解析调用yarn接口返回的json + * @author ChengLong 2019-5-15 17:50:06 + */ +public class App { + // yarn applicationId + private String id; + // yarn程序的启动用户 + private String user; + // yarn程序名称 + private String name; + // yarn的队列名称 + private String queue; + // 程序的状态 + private String state; + // 程序的最终状态 + private String finalStatus; + // 执行进度 + private Double progress; + // 程序的ui + private String trackingUI; + // 程序ui的url地址 + private String trackingUrl; + // 诊断 + private String diagnostics; + // 集群id + private Long clusterId; + // 程序类型(spark、mr) + private String applicationType; + // 程序的标签 + private String applicationTags; + // 程序启动时间 + private Long startedTime; + // 程序结束时间 + private Long finishedTime; + // 程序执行时间 + private Long elapsedTime; + // master 的日志路径 + private String amContainerLogs; + // master所在主机host名称 + private String amHostHttpAddress; + // 已分配的内存大小 + private Long allocatedMB; + // 已分配的cpu数量 + private Long allocatedVCores; + // 运行的container数量 + private Long runningContainers; + // 内存时间 + private Long memorySeconds; + // cpu时间 + private Long vcoreSeconds; + // 占用的内存大小 + private Long preemptedResourceMB; + // 占用的cpu数量 + private Long preemptedResourceVCores; + private Long numNonAMContainerPreempted; + private Long numAMContainerPreempted; + // yarn的日志聚合状态(NOT_START、SUCCEEDED) + private String logAggregationStatus; + + public String getId() { + return id; + } + + public void setId(String id) { + this.id = id; + } + + public String getUser() { + return user; + } + + public void setUser(String user) { + this.user = user; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public String getQueue() { + return queue; + } + + public void setQueue(String queue) { + this.queue = queue; + } + + public String getState() { + return state; + } + + public void setState(String state) { + this.state = state; + } + + public String getFinalStatus() { + return finalStatus; + } + + public void setFinalStatus(String finalStatus) { + this.finalStatus = finalStatus; + } + + public Double getProgress() { + return progress; + } + + public void setProgress(Double progress) { + this.progress = progress; + } + + public String getTrackingUI() { + return trackingUI; + } + + public void setTrackingUI(String trackingUI) { + this.trackingUI = trackingUI; + } + + public String getTrackingUrl() { + return trackingUrl; + } + + public void setTrackingUrl(String trackingUrl) { + this.trackingUrl = trackingUrl; + } + + public String getDiagnostics() { + return diagnostics; + } + + public void setDiagnostics(String diagnostics) { + this.diagnostics = diagnostics; + } + + public Long getClusterId() { + return clusterId; + } + + public void setClusterId(Long clusterId) { + this.clusterId = clusterId; + } + + public String getApplicationType() { + return applicationType; + } + + public void setApplicationType(String applicationType) { + this.applicationType = applicationType; + } + + public String getApplicationTags() { + return applicationTags; + } + + public void setApplicationTags(String applicationTags) { + this.applicationTags = applicationTags; + } + + public Long getStartedTime() { + return startedTime; + } + + public void setStartedTime(Long startedTime) { + this.startedTime = startedTime; + } + + public Long getFinishedTime() { + return finishedTime; + } + + public void setFinishedTime(Long finishedTime) { + this.finishedTime = finishedTime; + } + + public Long getElapsedTime() { + return 
elapsedTime; + } + + public void setElapsedTime(Long elapsedTime) { + this.elapsedTime = elapsedTime; + } + + public String getAmContainerLogs() { + return amContainerLogs; + } + + public void setAmContainerLogs(String amContainerLogs) { + this.amContainerLogs = amContainerLogs; + } + + public String getAmHostHttpAddress() { + return amHostHttpAddress; + } + + public void setAmHostHttpAddress(String amHostHttpAddress) { + this.amHostHttpAddress = amHostHttpAddress; + } + + public Long getAllocatedMB() { + return allocatedMB; + } + + public void setAllocatedMB(Long allocatedMB) { + this.allocatedMB = allocatedMB; + } + + public Long getAllocatedVCores() { + return allocatedVCores; + } + + public void setAllocatedVCores(Long allocatedVCores) { + this.allocatedVCores = allocatedVCores; + } + + public Long getRunningContainers() { + return runningContainers; + } + + public void setRunningContainers(Long runningContainers) { + this.runningContainers = runningContainers; + } + + public Long getMemorySeconds() { + return memorySeconds; + } + + public void setMemorySeconds(Long memorySeconds) { + this.memorySeconds = memorySeconds; + } + + public Long getVcoreSeconds() { + return vcoreSeconds; + } + + public void setVcoreSeconds(Long vcoreSeconds) { + this.vcoreSeconds = vcoreSeconds; + } + + public Long getPreemptedResourceMB() { + return preemptedResourceMB; + } + + public void setPreemptedResourceMB(Long preemptedResourceMB) { + this.preemptedResourceMB = preemptedResourceMB; + } + + public Long getPreemptedResourceVCores() { + return preemptedResourceVCores; + } + + public void setPreemptedResourceVCores(Long preemptedResourceVCores) { + this.preemptedResourceVCores = preemptedResourceVCores; + } + + public Long getNumNonAMContainerPreempted() { + return numNonAMContainerPreempted; + } + + public void setNumNonAMContainerPreempted(Long numNonAMContainerPreempted) { + this.numNonAMContainerPreempted = numNonAMContainerPreempted; + } + + public Long getNumAMContainerPreempted() { + return numAMContainerPreempted; + } + + public void setNumAMContainerPreempted(Long numAMContainerPreempted) { + this.numAMContainerPreempted = numAMContainerPreempted; + } + + public String getLogAggregationStatus() { + return logAggregationStatus; + } + + public void setLogAggregationStatus(String logAggregationStatus) { + this.logAggregationStatus = logAggregationStatus; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/ClassLoaderInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/ClassLoaderInfo.java new file mode 100644 index 0000000..f3bf3f6 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/ClassLoaderInfo.java @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.runtime; + +import java.io.Serializable; +import java.lang.management.ClassLoadingMXBean; +import java.lang.management.ManagementFactory; + +/** + * 获取运行时class loader信息 + * @author ChengLong 2019年9月28日 19:56:18 + */ +public class ClassLoaderInfo implements Serializable { + private static final long serialVersionUID = 4958598582046079565L; + // 获取已加载的类数量 + private long loadedClassCount; + // 获取总的类加载数 + private long totalLoadedClassCount; + // 获取未被加载的类总数 + private long unloadedClassCount; + + private ClassLoaderInfo() {} + + public long getLoadedClassCount() { + return loadedClassCount; + } + + public long getTotalLoadedClassCount() { + return totalLoadedClassCount; + } + + public long getUnloadedClassCount() { + return unloadedClassCount; + } + + /** + * 获取类加载器相关信息 + */ + public static ClassLoaderInfo getClassLoaderInfo() { + ClassLoaderInfo classLoaderInfo = new ClassLoaderInfo(); + // 获取类加载器相关信息 + ClassLoadingMXBean classLoadingMXBean = ManagementFactory.getClassLoadingMXBean(); + classLoaderInfo.loadedClassCount = classLoadingMXBean.getLoadedClassCount(); + classLoaderInfo.totalLoadedClassCount = classLoadingMXBean.getTotalLoadedClassCount(); + classLoaderInfo.unloadedClassCount = classLoadingMXBean.getUnloadedClassCount(); + + return classLoaderInfo; + } + +} \ No newline at end of file diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/CpuInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/CpuInfo.java new file mode 100644 index 0000000..5cf5542 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/CpuInfo.java @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.runtime; + +import com.sun.management.OperatingSystemMXBean; +import com.zto.fire.common.util.MathUtils; +import oshi.SystemInfo; +import oshi.hardware.CentralProcessor; +import oshi.hardware.CentralProcessor.TickType; +import oshi.hardware.HardwareAbstractionLayer; +import oshi.hardware.Sensors; +import oshi.util.FormatUtil; + +import java.io.Serializable; +import java.lang.management.ManagementFactory; + +/** + * 用于封装cpu运行时信息 + * + * @author ChengLong 2019-9-28 19:52:56 + */ +public class CpuInfo implements Serializable { + private static final long serialVersionUID = 7712733535989008368L; + // 系统cpu的负载 + private double cpuLoad; + // 当前jvm可用的处理器数量 + private int availableProcessors; + // 当前jvm占用的cpu时长 + private long processCpuTime; + // 当前jvm占用的cpu负载 + private double processCpuLoad; + // cpu温度 + private double temperature; + // cpu电压 + private double voltage; + // 风扇转速 + private int[] fanSpeeds; + // 物理cpu数 + private int physicalCpu; + // 逻辑cpu数 + private int logicalCpu; + // 运行时间 + private long uptime; + // io等待 + private long ioWait; + // 用户时长 + private long userTick; + // nice时长 + private long niceTick; + // 系统时长 + private long sysTick; + // 空闲时长 + private long idleTick; + // 中断时长 + private long irqTick; + // 软中断时长 + private long softIrqTick; + // cpu steal 时长 + private long stealTick; + // cpu平均负载 + private double[] loadAverage; + // 最近一次平均负载 + private double lastLoadAverage; + + public double[] getLoadAverage() { + return this.loadAverage; + } + + public double getLastLoadAverage() { + return lastLoadAverage; + } + + public double getCpuLoad() { + return MathUtils.doubleScale(cpuLoad, 2); + } + + public int getAvailableProcessors() { + return availableProcessors; + } + + public long getProcessCpuTime() { + return processCpuTime; + } + + public double getProcessCpuLoad() { + return MathUtils.doubleScale(processCpuLoad, 2); + } + + public String getTemperature() { + return temperature + "℃"; + } + + public String getVoltage() { + return voltage + "v"; + } + + public int[] getFanSpeeds() { + return fanSpeeds; + } + + public int getPhysicalCpu() { + return physicalCpu; + } + + public int getLogicalCpu() { + return logicalCpu; + } + + public String getUptime() { + return FormatUtil.formatElapsedSecs(uptime); + } + + public long getIoWait() { + return ioWait; + } + + public long getUserTick() { + return userTick; + } + + public long getNiceTick() { + return niceTick; + } + + public long getSysTick() { + return sysTick; + } + + public long getIdleTick() { + return idleTick; + } + + public long getIrqTick() { + return irqTick; + } + + public long getSoftIrqTick() { + return softIrqTick; + } + + public long getStealTick() { + return stealTick; + } + + private CpuInfo() { + } + + /** + * 获取cpu使用信息 + */ + public static CpuInfo getCpuInfo() { + CpuInfo cpuInfo = new CpuInfo(); + OperatingSystemMXBean osmxb = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean(); + cpuInfo.lastLoadAverage = osmxb.getSystemLoadAverage(); + cpuInfo.cpuLoad = osmxb.getSystemCpuLoad(); + cpuInfo.availableProcessors = osmxb.getAvailableProcessors(); + cpuInfo.processCpuTime = osmxb.getProcessCpuTime(); + cpuInfo.processCpuLoad = osmxb.getProcessCpuLoad(); + SystemInfo systemInfo = new SystemInfo(); + HardwareAbstractionLayer hal = systemInfo.getHardware(); + Sensors sensors = hal.getSensors(); + cpuInfo.temperature = sensors.getCpuTemperature(); + cpuInfo.voltage = sensors.getCpuVoltage(); + cpuInfo.fanSpeeds = sensors.getFanSpeeds(); + CentralProcessor 
centralProcessor = hal.getProcessor(); + cpuInfo.physicalCpu = centralProcessor.getPhysicalProcessorCount(); + cpuInfo.logicalCpu = centralProcessor.getLogicalProcessorCount(); + + CentralProcessor processor = hal.getProcessor(); + cpuInfo.uptime = processor.getSystemUptime(); + // take two tick snapshots a short pause apart; identical snapshots would make every delta below zero + long[] prevTicks = processor.getSystemCpuLoadTicks(); + try { + Thread.sleep(500); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + long[] ticks = processor.getSystemCpuLoadTicks(); + cpuInfo.userTick = ticks[TickType.USER.getIndex()] - prevTicks[TickType.USER.getIndex()]; + cpuInfo.niceTick = ticks[TickType.NICE.getIndex()] - prevTicks[TickType.NICE.getIndex()]; + cpuInfo.sysTick = ticks[TickType.SYSTEM.getIndex()] - prevTicks[TickType.SYSTEM.getIndex()]; + cpuInfo.idleTick = ticks[TickType.IDLE.getIndex()] - prevTicks[TickType.IDLE.getIndex()]; + cpuInfo.ioWait = ticks[TickType.IOWAIT.getIndex()] - prevTicks[TickType.IOWAIT.getIndex()]; + cpuInfo.irqTick = ticks[TickType.IRQ.getIndex()] - prevTicks[TickType.IRQ.getIndex()]; + cpuInfo.softIrqTick = ticks[TickType.SOFTIRQ.getIndex()] - prevTicks[TickType.SOFTIRQ.getIndex()]; + cpuInfo.stealTick = ticks[TickType.STEAL.getIndex()] - prevTicks[TickType.STEAL.getIndex()]; + cpuInfo.loadAverage = processor.getSystemLoadAverage(3); + + return cpuInfo; + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/DiskInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/DiskInfo.java new file mode 100644 index 0000000..51f4cfd --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/DiskInfo.java @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
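The tick fields of CpuInfo are deltas between two snapshots, so an overall busy percentage can be derived from them. A short sketch using only the getters defined above:

import com.zto.fire.common.bean.runtime.CpuInfo;

public class CpuInfoExample {
    public static void main(String[] args) {
        CpuInfo cpu = CpuInfo.getCpuInfo();
        // busy ticks = everything except idle and iowait, following the usual /proc/stat convention
        long busy = cpu.getUserTick() + cpu.getNiceTick() + cpu.getSysTick()
                + cpu.getIrqTick() + cpu.getSoftIrqTick() + cpu.getStealTick();
        long total = busy + cpu.getIdleTick() + cpu.getIoWait();
        double busyPercent = total == 0 ? 0d : 100d * busy / total;
        System.out.println("busy=" + busyPercent + "% systemCpuLoad=" + cpu.getCpuLoad()
                + " cores=" + cpu.getLogicalCpu());
    }
}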
+ */ + +package com.zto.fire.common.bean.runtime; + +import com.google.common.collect.ImmutableMap; +import com.zto.fire.common.util.MathUtils; +import oshi.SystemInfo; +import oshi.hardware.HWDiskStore; +import oshi.hardware.HardwareAbstractionLayer; +import oshi.software.os.FileSystem; +import oshi.software.os.OSFileStore; + +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +/** + * 用于封装系统磁盘信息 + * + * @author ChengLong 2019年9月29日 09:36:57 + */ +public class DiskInfo { + // 磁盘名称 + private String name; + // 磁盘制造商 + private String model; + // 磁盘总空间 + private long total; + // 磁盘读取总量 + private long reads; + // 磁盘写入总量 + private long writes; + // 磁盘读/写花费的毫秒数 + private long transferTime; + + /** + * 磁盘分区信息 + */ + private static class DiskPartitionInfo { + // 分区名称 + private String name; + // 文件系统类型 + private String fileSystem; + // 挂载点 + private String mount; + // 磁盘总空间 + private long total; + // 磁盘可用空间 + private long free; + // 磁盘已使用空间 + private long used; + // 磁盘已用空间的百分比 + private double usedPer; + // 总的inodes数 + private long totalInodes; + // 可用的inodes数 + private long freeInodes; + // 已用的inodes数 + private long usedInodes; + // 已用的inode百分比 + private double usedInodesPer; + + public DiskPartitionInfo() { + } + + public DiskPartitionInfo(String name, String fileSystem, String mount, long total, long free, long totalInodes, long freeInodes) { + this.name = name; + this.fileSystem = fileSystem; + this.mount = mount; + this.total = total; + this.free = free; + this.used = total - free; + this.usedPer = MathUtils.percent(this.used, this.total, 2); + this.totalInodes = totalInodes; + this.freeInodes = freeInodes; + this.usedInodes = totalInodes - freeInodes; + this.usedInodesPer = MathUtils.percent(this.usedInodes, this.totalInodes, 2); + } + + public String getName() { + return name; + } + + public String getFileSystem() { + return fileSystem; + } + + public String getMount() { + return mount; + } + + public long getTotal() { + return total; + } + + public long getFree() { + return free; + } + + public long getUsed() { + return used; + } + + public long getTotalInodes() { + return totalInodes; + } + + public long getFreeInodes() { + return freeInodes; + } + + public long getUsedInodes() { + return usedInodes; + } + + public String getUsedPer() { + return usedPer + "%"; + } + + public String getUsedInodesPer() { + return usedInodesPer + "%"; + } + } + + public String getName() { + return name; + } + + public String getModel() { + return model; + } + + public long getTotal() { + return total; + } + + public long getReads() { + return reads; + } + + public long getWrites() { + return writes; + } + + public long getTransferTime() { + return transferTime; + } + + private DiskInfo() { + } + + private DiskInfo(String name, String model, long total, long reads, long writes, long transferTime) { + this.name = name; + this.model = model; + this.total = total; + this.reads = reads; + this.writes = writes; + this.transferTime = transferTime; + } + + /** + * 获取磁盘与分区信息 + */ + public static Map getDiskInfo() { + SystemInfo systemInfo = new SystemInfo(); + // 获取文件系统信息 + FileSystem fileSystem = systemInfo.getOperatingSystem().getFileSystem(); + OSFileStore[] fileStores = fileSystem.getFileStores(); + List partitionInfoList = new LinkedList<>(); + for (OSFileStore fileStore : fileStores) { + if (fileStore != null) { + partitionInfoList.add(new DiskPartitionInfo(fileStore.getName(), fileStore.getType(), fileStore.getMount(), fileStore.getTotalSpace(), fileStore.getUsableSpace(), 
fileStore.getTotalInodes(), fileStore.getFreeInodes())); + } + } + + // 获取磁盘信息 + HardwareAbstractionLayer hal = systemInfo.getHardware(); + List diskInfoList = new LinkedList<>(); + + for (HWDiskStore disk : hal.getDiskStores()) { + DiskInfo diskInfo = new DiskInfo(disk.getName(), disk.getModel(), disk.getSize(), disk.getReadBytes(), disk.getWriteBytes(), disk.getTransferTime()); + diskInfoList.add(diskInfo); + } + return ImmutableMap.builder().put("disks", diskInfoList).put("partitions", partitionInfoList).build(); + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/DisplayInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/DisplayInfo.java new file mode 100644 index 0000000..1798740 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/DisplayInfo.java @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.bean.runtime; + +import oshi.SystemInfo; +import oshi.hardware.Display; + +/** + * 用于封装显示器相关信息 + * @author ChengLong 2019年9月30日 13:36:16 + */ +public class DisplayInfo { + // 显示器描述信息 + private String display; + + public String getDisplay() { + return display; + } + + private DisplayInfo() { + } + + /** + * 获取显示器信息 + */ + public static DisplayInfo getDisplayInfo() { + SystemInfo systemInfo = new SystemInfo(); + Display[] displays = systemInfo.getHardware().getDisplays(); + + StringBuilder sb = new StringBuilder(); + if (displays != null && displays.length > 0) { + for (Display display : displays) { + sb.append(display); + } + } + DisplayInfo displayInfo = new DisplayInfo(); + displayInfo.display = sb.toString(); + + return displayInfo; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/HardwareInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/HardwareInfo.java new file mode 100644 index 0000000..c5c93b5 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/HardwareInfo.java @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
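DiskInfo.getDiskInfo() above returns a two-entry map rather than a single bean. A sketch of how a caller might unpack it, with the keys taken from the builder calls in the method:

import com.zto.fire.common.bean.runtime.DiskInfo;
import java.util.List;
import java.util.Map;

public class DiskInfoExample {
    public static void main(String[] args) {
        // "disks" holds one DiskInfo per physical disk, "partitions" one entry per mounted file store
        Map<?, ?> info = DiskInfo.getDiskInfo();
        List<?> disks = (List<?>) info.get("disks");
        List<?> partitions = (List<?>) info.get("partitions");
        System.out.println("disks=" + disks.size() + " partitions=" + partitions.size());
    }
}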
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.bean.runtime; + +import com.zto.fire.common.util.MathUtils; +import oshi.SystemInfo; +import oshi.hardware.ComputerSystem; +import oshi.hardware.HardwareAbstractionLayer; +import oshi.hardware.PowerSource; + +/** + * 硬件信息封装类 + * + * @author ChengLong 2019年9月29日 15:52:50 + */ +public class HardwareInfo { + private static HardwareInfo hardwareInfo = new HardwareInfo(); + // 制造商 + private String manufacturer; + // 型号 + private String model; + // 序列号 + private String serialNumber; + // 电源信息 + private String power; + // 电池容量 + private String batteryCapacity; + + public String getManufacturer() { + return manufacturer; + } + + public String getModel() { + return model; + } + + public String getSerialNumber() { + return serialNumber; + } + + public String getPower() { + return power; + } + + public String getBatteryCapacity() { + return batteryCapacity; + } + + private HardwareInfo() { + } + + /** + * 获取硬件设备信息 + */ + public static HardwareInfo getHardwareInfo() { + SystemInfo systemInfo = new SystemInfo(); + HardwareAbstractionLayer hardware = systemInfo.getHardware(); + ComputerSystem computerSystem = hardware.getComputerSystem(); + + if (hardwareInfo.manufacturer == null) { + hardwareInfo.manufacturer = computerSystem.getManufacturer(); + } + + if (hardwareInfo.model == null) { + hardwareInfo.model = computerSystem.getModel(); + } + + if (hardwareInfo.serialNumber == null) { + hardwareInfo.serialNumber = computerSystem.getSerialNumber().trim(); + } + + // 获取电源信息 + PowerSource[] powerSources = hardware.getPowerSources(); + if (powerSources == null || powerSources.length == 0) { + hardwareInfo.power = "Unknown"; + } else { + double timeRemaining = powerSources[0].getTimeRemaining(); + if (timeRemaining < -1d) { + hardwareInfo.power = "充电中"; + } else if (timeRemaining < 0d) { + hardwareInfo.power = "计算剩余时间"; + } else { + hardwareInfo.power = String.format("%d:%02d remaining", (int) (timeRemaining / 3600), + (int) (timeRemaining / 60) % 60); + } + + for (PowerSource pSource : powerSources) { + hardwareInfo.batteryCapacity = MathUtils.doubleScale(pSource.getRemainingCapacity() * 100d, 2) + ""; + } + } + + return hardwareInfo; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/JvmInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/JvmInfo.java new file mode 100644 index 0000000..8bc94cc --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/JvmInfo.java @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.runtime; + +import java.io.Serializable; +import java.lang.management.*; +import java.util.List; + +/** + * Jvm信息包装类,可获取jvm相关信息 + * @author ChengLong 2019-9-28 19:38:36 + */ +public class JvmInfo implements Serializable { + private static final long serialVersionUID = 3857878519712626828L; + // Java版本 + private String javaVersion; + // JavaHome + private String javaHome; + // 类版本 + private String classVersion; + // jvm可从操作系统申请的最大内存 + private long memoryMax; + // jvm已使用操作系统的总内存空间 + private long memoryTotal; + // jvm剩余内存空间 + private long memoryFree; + // jvm已使用内存空间 + private long memoryUsed; + // jvm启动时间,unix时间戳 + private long startTime; + // jvm运行时间 + private long uptime; + // jvm heap 初始内存大小 + private long heapInitSize; + // jvm heap 最大内存空间 + private long heapMaxSize; + // jvm heap 已使用空间大小 + private long heapUseSize; + // jvm heap 已提交的空间大小 + private long heapCommitedSize; + // jvm Non-Heap初始空间 + private long nonHeapInitSize; + // jvm Non-Heap最大空间 + private long nonHeapMaxSize; + // jvm Non-Heap已使用空间 + private long nonHeapUseSize; + // jvm Non-Heap已提交空间 + private long nonHeapCommittedSize; + // minor gc 次数 + private long minorGCCount; + // minor gc 总耗时 + private long minorGCTime; + // full gc 次数 + private long fullGCCount; + // full gc 总耗时 + private long fullGCTime; + // 虚拟机参数 + private List jvmOptions; + + private JvmInfo() {} + + public long getMemoryMax() { + return memoryMax; + } + + public long getMemoryTotal() { + return memoryTotal; + } + + public long getMemoryFree() { + return memoryFree; + } + + public long getMemoryUsed() { + return memoryUsed; + } + + public long getStartTime() { + return startTime; + } + + public long getUptime() { + return uptime; + } + + public long getHeapInitSize() { + return heapInitSize; + } + + public long getHeapMaxSize() { + return heapMaxSize; + } + + public long getHeapUseSize() { + return heapUseSize; + } + + public long getHeapCommitedSize() { + return heapCommitedSize; + } + + public long getNonHeapInitSize() { + return nonHeapInitSize; + } + + public long getNonHeapMaxSize() { + return nonHeapMaxSize; + } + + public long getNonHeapUseSize() { + return nonHeapUseSize; + } + + public long getNonHeapCommittedSize() { + return nonHeapCommittedSize; + } + + public String getJavaVersion() { + return javaVersion; + } + + public String getJavaHome() { + return javaHome; + } + + public String getClassVersion() { + return classVersion; + } + + public long getMinorGCCount() { + return minorGCCount; + } + + public long getMinorGCTime() { + return minorGCTime; + } + + public long getFullGCCount() { + return fullGCCount; + } + + public long getFullGCTime() { + return fullGCTime; + } + + public List getJvmOptions() { + return jvmOptions; + } + + /** + * 获取Jvm、类加载器与线程相关信息 + */ + public static JvmInfo getJvmInfo() { + Runtime runtime = Runtime.getRuntime(); + JvmInfo jvmInfo = new JvmInfo(); + jvmInfo.memoryMax = runtime.maxMemory(); + jvmInfo.memoryTotal = runtime.totalMemory(); + jvmInfo.memoryFree = runtime.freeMemory(); + jvmInfo.memoryUsed = jvmInfo.memoryTotal - jvmInfo.memoryFree; + RuntimeMXBean runtimeMXBean = ManagementFactory.getRuntimeMXBean(); + jvmInfo.startTime = runtimeMXBean.getStartTime(); + jvmInfo.uptime = runtimeMXBean.getUptime(); + + // 获取jvm heap相关信息 + MemoryMXBean memoryMBean = ManagementFactory.getMemoryMXBean(); + MemoryUsage heapUsage = memoryMBean.getHeapMemoryUsage(); + jvmInfo.heapInitSize = heapUsage.getInit(); + jvmInfo.heapMaxSize = heapUsage.getMax(); + jvmInfo.heapUseSize = 
heapUsage.getUsed(); + jvmInfo.heapCommitedSize = heapUsage.getCommitted(); + + // 获取jvm non-heap相关信息 + MemoryUsage nonHeapUsage = memoryMBean.getNonHeapMemoryUsage(); + jvmInfo.nonHeapInitSize = nonHeapUsage.getInit(); + jvmInfo.nonHeapMaxSize = nonHeapUsage.getMax(); + jvmInfo.nonHeapUseSize = nonHeapUsage.getUsed(); + jvmInfo.nonHeapCommittedSize = nonHeapUsage.getCommitted(); + + // 获取jvm版本与安装信息 + jvmInfo.javaVersion = System.getProperty("java.version"); + jvmInfo.javaHome = System.getProperty("java.home"); + jvmInfo.classVersion = System.getProperty("java.class.version"); + + // jvm 参数 + jvmInfo.jvmOptions = ManagementFactory.getRuntimeMXBean().getInputArguments(); + + // 获取gc信息 + List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans(); + for (GarbageCollectorMXBean gc : gcs) { + String gcName = gc.getName(); + // young-generation collectors (e.g. "PS Scavenge", "ParNew", "Copy", "G1 Young Generation") report minor gc + if (gcName.contains("Young") || gcName.contains("Scavenge") || gcName.contains("Copy") || gcName.contains("ParNew")) { + jvmInfo.minorGCCount = gc.getCollectionCount(); + jvmInfo.minorGCTime = gc.getCollectionTime(); + } + // old-generation collectors (e.g. "PS MarkSweep", "ConcurrentMarkSweep", "G1 Old Generation") report full gc + if (gcName.contains("Old") || gcName.contains("MarkSweep")) { + jvmInfo.fullGCCount = gc.getCollectionCount(); + jvmInfo.fullGCTime = gc.getCollectionTime(); + } + } + + return jvmInfo; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/MemoryInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/MemoryInfo.java new file mode 100644 index 0000000..297cd44 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/MemoryInfo.java @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
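A small sketch of reading the JvmInfo bean above, for example to derive heap utilisation; it only uses the getters defined in the class:

import com.zto.fire.common.bean.runtime.JvmInfo;

public class JvmInfoExample {
    public static void main(String[] args) {
        JvmInfo jvm = JvmInfo.getJvmInfo();
        long heapMax = jvm.getHeapMaxSize();
        // MemoryUsage.getMax() may be -1 when no explicit maximum is configured, so guard the division
        double heapUsedPercent = heapMax > 0 ? 100d * jvm.getHeapUseSize() / heapMax : -1d;
        System.out.println("heapUsed%=" + heapUsedPercent
                + " minorGC=" + jvm.getMinorGCCount()
                + " fullGC=" + jvm.getFullGCCount()
                + " opts=" + jvm.getJvmOptions());
    }
}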
+ */ + +package com.zto.fire.common.bean.runtime; + +import com.sun.management.OperatingSystemMXBean; + +import java.io.Serializable; +import java.lang.management.ManagementFactory; + +/** + * 用于封装当前系统内存信息 + * @author ChengLong 2019-9-28 19:50:22 + */ +public class MemoryInfo implements Serializable { + private static final long serialVersionUID = 7803435486311085016L; + // 操作系统总内存空间 + private long total; + // 操作系统内存剩余空间 + private long free; + // 操作系统内存使用空间 + private long used; + // 操作系统提交的虚拟内存大小 + private long commitVirtual; + // 操作系统交换内存总空间 + private long swapTotal; + // 操作系统交换内存剩余空间 + private long swapFree; + // 操作系统交换内存已使用空间 + private long swapUsed; + + private MemoryInfo() {} + + public long getTotal() { + return total; + } + + public long getFree() { + return free; + } + + public long getUsed() { + return used; + } + + public long getCommitVirtual() { + return commitVirtual; + } + + public long getSwapTotal() { + return swapTotal; + } + + public long getSwapFree() { + return swapFree; + } + + public long getSwapUsed() { + return swapUsed; + } + + /** + * 获取内存使用信息 + */ + public static MemoryInfo getMemoryInfo() { + MemoryInfo memoryInfo = new MemoryInfo(); + OperatingSystemMXBean osmxb = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean(); + memoryInfo.total = osmxb.getTotalPhysicalMemorySize(); + memoryInfo.free = osmxb.getFreePhysicalMemorySize(); + memoryInfo.used = memoryInfo.total - memoryInfo.free; + memoryInfo.swapTotal = osmxb.getTotalSwapSpaceSize(); + memoryInfo.swapFree = osmxb.getFreeSwapSpaceSize(); + memoryInfo.swapUsed = memoryInfo.swapTotal - memoryInfo.swapFree; + memoryInfo.commitVirtual = osmxb.getCommittedVirtualMemorySize(); + + return memoryInfo; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/NetworkInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/NetworkInfo.java new file mode 100644 index 0000000..9be33df --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/NetworkInfo.java @@ -0,0 +1,170 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
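A brief usage sketch for MemoryInfo above:

import com.zto.fire.common.bean.runtime.MemoryInfo;

public class MemoryInfoExample {
    public static void main(String[] args) {
        MemoryInfo mem = MemoryInfo.getMemoryInfo();
        // values come from com.sun.management.OperatingSystemMXBean and are expressed in bytes
        System.out.println("ram " + mem.getUsed() + "/" + mem.getTotal()
                + " swap " + mem.getSwapUsed() + "/" + mem.getSwapTotal());
    }
}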
+ */ + +package com.zto.fire.common.bean.runtime; + +import com.zto.fire.common.util.OSUtils; +import oshi.SystemInfo; +import oshi.hardware.HardwareAbstractionLayer; +import oshi.hardware.NetworkIF; +import oshi.software.os.NetworkParams; + +import java.util.LinkedList; +import java.util.List; + +/** + * 网卡信息封装类 + * @author ChengLong 2019年9月30日 10:39:08 + */ +public class NetworkInfo { + // 网卡名称 + private String name; + // 网卡display名称 + private String displayName; + // mac地址 + private String macAddress; + // 最大传输单元 + private int mtu; + // 网卡带宽 + private long speed; + // ip v4 地址 + private String[] ipv4; + // ip v6 地址 + private String[] ipv6; + // ip 地址 + private String ip; + // 接收到的数据报个数 + private long packetsRecv; + // 发送的数据报个数 + private long packetsSent; + // 接收到的数据大小 + private long bytesRecv; + // 发送的数据大小 + private long bytesSent; + // 主机名 + private String hostname; + // 域名称 + private String domainName; + // dns + private String[] dns; + // ip v4 网关 + private String ipv4Gateway; + // ip v6 网关 + private String ipv6Gateway; + + public String getName() { + return name; + } + + public String getDisplayName() { + return displayName; + } + + public String getMacAddress() { + return macAddress; + } + + public int getMtu() { + return mtu; + } + + public long getSpeed() { + return speed; + } + + public String[] getIpv4() { + return ipv4; + } + + public String[] getIpv6() { + return ipv6; + } + + public String getIp() { + return ip; + } + + public long getPacketsRecv() { + return packetsRecv; + } + + public long getPacketsSent() { + return packetsSent; + } + + public long getBytesRecv() { + return bytesRecv; + } + + public long getBytesSent() { + return bytesSent; + } + + public String getHostname() { + return hostname; + } + + public String getDomainName() { + return domainName; + } + + public String[] getDns() { + return dns; + } + + public String getIpv4Gateway() { + return ipv4Gateway; + } + + public String getIpv6Gateway() { + return ipv6Gateway; + } + + private NetworkInfo() {} + + public static List getNetworkInfo() { + SystemInfo systemInfo = new SystemInfo(); + HardwareAbstractionLayer hal = systemInfo.getHardware(); + NetworkIF[] networkIFS = hal.getNetworkIFs(); + List networkInfoList = new LinkedList<>(); + if (networkIFS != null && networkIFS.length > 0) { + NetworkParams networkParams = systemInfo.getOperatingSystem().getNetworkParams(); + for (NetworkIF networkIF : networkIFS) { + NetworkInfo networkInfo = new NetworkInfo(); + networkInfo.name = networkIF.getName(); + networkInfo.displayName = networkIF.getDisplayName(); + networkInfo.bytesRecv = networkIF.getBytesRecv(); + networkInfo.bytesSent = networkIF.getBytesSent(); + networkInfo.packetsRecv = networkIF.getPacketsRecv(); + networkInfo.packetsSent = networkIF.getPacketsSent(); + networkInfo.ip = OSUtils.getIp(); + networkInfo.ipv4 = networkIF.getIPv4addr(); + networkInfo.ipv6 = networkIF.getIPv6addr(); + networkInfo.mtu = networkIF.getMTU(); + networkInfo.speed = networkIF.getSpeed(); + networkInfo.macAddress = networkIF.getMacaddr(); + networkInfo.hostname = networkParams.getHostName(); + networkInfo.domainName = networkParams.getDomainName(); + networkInfo.ipv4Gateway = networkParams.getIpv4DefaultGateway(); + networkInfo.ipv6Gateway = networkParams.getIpv6DefaultGateway(); + networkInfo.dns = networkParams.getDnsServers(); + networkInfoList.add(networkInfo); + } + } + return networkInfoList; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/OSInfo.java 
b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/OSInfo.java new file mode 100644 index 0000000..9497d84 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/OSInfo.java @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.bean.runtime; + +import com.zto.fire.common.util.OSUtils; +import oshi.SystemInfo; +import oshi.software.os.OperatingSystem; +import oshi.util.FormatUtil; + +/** + * 用于封装操作系统信息 + * + * @author ChengLong 2019-9-28 19:56:59 + */ +public class OSInfo { + private static OSInfo osInfo = new OSInfo(); + // 制造商 + private String manufacturer; + // 操作系统名称 + private String name; + // 操作系统架构 + private String arch; + // 操作系统版本 + private String version; + // 当前用户 + private String userName; + // 当前用户家目录 + private String userHome; + // 当前用户工作目录 + private String userDir; + // 机器的ip + private String ip; + // 集群的主机名 + private String hostname; + // 运行时间 + private String uptime; + // 组织信息 + private String family; + + private OSInfo() { + } + + public String getName() { + return name; + } + + public String getArch() { + return arch; + } + + public String getVersion() { + return version; + } + + public String getUserName() { + return userName; + } + + public String getUserHome() { + return userHome; + } + + public String getUserDir() { + return userDir; + } + + public String getIp() { + return ip; + } + + public String getHostname() { + return hostname; + } + + public String getManufacturer() { + return manufacturer; + } + + public String getUptime() { + return uptime; + } + + public String getFamily() { + return family; + } + + /** + * 获取操作系统相关信息 + */ + public static OSInfo getOSInfo() { + SystemInfo systemInfo = new SystemInfo(); + osInfo.name = System.getProperty("os.name"); + osInfo.arch = System.getProperty("os.arch"); + osInfo.version = System.getProperty("os.version"); + osInfo.userName = System.getProperty("user.name"); + osInfo.userHome = System.getProperty("user.home"); + osInfo.userDir = System.getProperty("user.dir"); + osInfo.ip = OSUtils.getIp(); + osInfo.hostname = OSUtils.getHostName(); + OperatingSystem os = systemInfo.getOperatingSystem(); + osInfo.manufacturer = os.getManufacturer(); + osInfo.family = os.getFamily(); + osInfo.uptime = FormatUtil.formatElapsedSecs(systemInfo.getHardware().getProcessor().getSystemUptime()); + return osInfo; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/RuntimeInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/RuntimeInfo.java new file mode 100644 index 0000000..a9ab9d3 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/RuntimeInfo.java @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * 
contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.bean.runtime; + +import com.zto.fire.common.util.OSUtils; +import org.apache.commons.lang3.StringUtils; + +import java.io.Serializable; + +/** + * 用于获取jvm、os、memory等运行时信息 + * + * @author ChengLong 2019年9月28日 16:57:03 + */ +public class RuntimeInfo implements Serializable { + private static final long serialVersionUID = 1960438466835847330L; + private static RuntimeInfo runtimeInfo = new RuntimeInfo(); + // jvm运行时信息 + private JvmInfo jvmInfo; + // 线程运行时信息 + private ThreadInfo threadInfo; + // cpu运行时信息 + private CpuInfo cpuInfo; + // 内存运行时信息 + private MemoryInfo memoryInfo; + // 类加载器运行时信息 + private ClassLoaderInfo classLoaderInfo; + // executor所在ip + private static String ip; + // executor所在主机名 + private static String hostname; + // 当前pid的进程号 + private static String pid; + // executor启动时间(UNIX时间戳) + private long startTime = System.currentTimeMillis(); + + private RuntimeInfo() { + } + + public JvmInfo getJvmInfo() { + return jvmInfo; + } + + public ThreadInfo getThreadInfo() { + return threadInfo; + } + + public CpuInfo getCpuInfo() { + return cpuInfo; + } + + public MemoryInfo getMemoryInfo() { + return memoryInfo; + } + + public ClassLoaderInfo getClassLoaderInfo() { + return classLoaderInfo; + } + + public String getIp() { + return ip; + } + + public String getHostname() { + return hostname; + } + + public String getPid() { + return pid; + } + + public long getStartTime() { + return startTime; + } + + public long getUptime() { + // executor运行时间(毫秒) + return System.currentTimeMillis() - this.startTime; + } + + /** + * 获取运行时信息 + * + * @return 当前运行时信息 + */ + public static RuntimeInfo getRuntimeInfo() { + if (StringUtils.isBlank(ip)) { + ip = OSUtils.getIp(); + } + if (StringUtils.isBlank(hostname)) { + hostname = OSUtils.getHostName(); + } + if (StringUtils.isBlank(pid)) { + pid = OSUtils.getPid(); + } + runtimeInfo.jvmInfo = JvmInfo.getJvmInfo(); + runtimeInfo.classLoaderInfo = ClassLoaderInfo.getClassLoaderInfo(); + runtimeInfo.threadInfo = ThreadInfo.getThreadInfo(); + runtimeInfo.cpuInfo = CpuInfo.getCpuInfo(); + runtimeInfo.memoryInfo = MemoryInfo.getMemoryInfo(); + + return runtimeInfo; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/ThreadInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/ThreadInfo.java new file mode 100644 index 0000000..67f7263 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/ThreadInfo.java @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
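RuntimeInfo above aggregates the jvm/thread/cpu/memory/classloader beans. A sketch of exposing it as JSON, assuming a Jackson ObjectMapper is available (the project may use another mapper):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.zto.fire.common.bean.runtime.RuntimeInfo;

public class RuntimeInfoExample {
    public static void main(String[] args) throws Exception {
        RuntimeInfo info = RuntimeInfo.getRuntimeInfo();
        // the bean exposes getters for every field, which is all Jackson needs for serialization
        System.out.println(new ObjectMapper().writeValueAsString(info));
    }
}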
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.bean.runtime; + +import com.sun.management.ThreadMXBean; + +import java.io.Serializable; +import java.lang.management.ManagementFactory; + +/** + * 用于包装运行时线程信息 + * @author ChengLong 2019-9-28 19:36:52 + */ +public class ThreadInfo implements Serializable { + private static final long serialVersionUID = 7950498675819426939L; + // 当前线程的总 CPU 时间(以毫微秒为单位) + private long cpuTime; + // 当前线程的总用户cpu时间(以毫微秒为单位) + private long userTime; + // 当前守护线程的总数 + private int deamonCount; + // 返回自从 Java 虚拟机启动或峰值重置以来峰值活动线程计数 + private int peakCount; + // 返回当前线程的总数,包括守护线程和非守护线程 + private int totalCount; + // 返回自从 Java 虚拟机启动以来创建和启动的线程总数目 + private long totalStartedCount; + + private ThreadInfo() {} + + public long getCpuTime() { + return cpuTime; + } + + public long getUserTime() { + return userTime; + } + + public int getDeamonCount() { + return deamonCount; + } + + public int getPeakCount() { + return peakCount; + } + + public int getTotalCount() { + return totalCount; + } + + public long getTotalStartedCount() { + return totalStartedCount; + } + + /** + * 获取线程相关信息 + */ + public static ThreadInfo getThreadInfo() { + ThreadInfo threadInfo = new ThreadInfo(); + ThreadMXBean threadMBean = (ThreadMXBean) ManagementFactory.getThreadMXBean(); + threadInfo.cpuTime = threadMBean.getCurrentThreadCpuTime(); + threadInfo.userTime = threadMBean.getCurrentThreadUserTime(); + threadInfo.deamonCount = threadMBean.getDaemonThreadCount(); + threadInfo.peakCount = threadMBean.getPeakThreadCount(); + threadInfo.totalCount = threadMBean.getThreadCount(); + threadInfo.totalStartedCount = threadMBean.getTotalStartedThreadCount(); + + return threadInfo; + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/bean/runtime/UsbInfo.java b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/UsbInfo.java new file mode 100644 index 0000000..0b5be38 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/bean/runtime/UsbInfo.java @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.bean.runtime; + +import oshi.SystemInfo; +import oshi.hardware.UsbDevice; + +import java.util.LinkedList; +import java.util.List; + +/** + * 用于封装usb设备信息 + * @author ChengLong 2019年9月30日 13:33:35 + */ +public class UsbInfo { + // usb 设备名称 + private String name; + // usb设备id + private String productId; + // usb设备制造商 + private String vendor; + // usb设备制造商id + private String vendorId; + // usb设备序列号 + private String serialNumber; + + public String getName() { + return name; + } + + public String getProductId() { + return productId; + } + + public String getVendor() { + return vendor; + } + + public String getVendorId() { + return vendorId; + } + + public String getSerialNumber() { + return serialNumber; + } + + private UsbInfo() {} + + public UsbInfo(String name, String productId, String vendor, String vendorId, String serialNumber) { + this.name = name; + this.productId = productId; + this.vendor = vendor; + this.vendorId = vendorId; + this.serialNumber = serialNumber; + } + + /** + * 获取usb社保信息 + */ + public static List getUsbInfo() { + SystemInfo systemInfo = new SystemInfo(); + UsbDevice[] usbDevices = systemInfo.getHardware().getUsbDevices(true); + List usbInfoList = new LinkedList<>(); + if (usbDevices != null && usbDevices.length > 0) { + for (UsbDevice usbDevice : usbDevices) { + UsbInfo usbInfo = new UsbInfo(usbDevice.getName(), usbDevice.getProductId(), usbDevice.getVendor(), usbDevice.getVendorId(), usbDevice.getSerialNumber()); + usbInfoList.add(usbInfo); + } + } + return usbInfoList; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/enu/Datasource.java b/fire-common/src/main/java/com/zto/fire/common/enu/Datasource.java new file mode 100644 index 0000000..4730989 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/enu/Datasource.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.enu; + +import org.apache.commons.lang3.StringUtils; + +/** + * 数据源类型 + * + * @author ChengLong + * @create 2020-07-07 16:36 + * @since 1.0.0 + */ +public enum Datasource { + HIVE(1), HBASE(2), KAFKA(3), ROCKETMQ(4), REDIS(5), + ES(6), MYSQL(7), TIDB(8), ORACLE(9), SQLSERVER(10), + DB2(11), CLICKHOUSE(12), PRESTO(13), KYLIN(14), DERBY(15), UNKNOWN(20); + + Datasource(int type) { + } + + /** + * 将字符串解析成指定的枚举类型 + */ + public static Datasource parse(String dataSource) { + if (StringUtils.isBlank(dataSource)) return UNKNOWN; + try { + return Enum.valueOf(Datasource.class, dataSource.trim().toUpperCase()); + } catch (Exception e) { + return UNKNOWN; + } + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/enu/ErrorCode.java b/fire-common/src/main/java/com/zto/fire/common/enu/ErrorCode.java new file mode 100644 index 0000000..3f2b51e --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/enu/ErrorCode.java @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.enu; + +/** + * 系统预定义错误码 + * @author ChengLong 2018年6月12日 13:39:50 + */ +public enum ErrorCode { + SUCCESS, ERROR, PARAM_ILLEGAL, NOT_FOUND, IS_EXISTS, NOT_LOGIN, TIME_OUT, GONE, UNAUTHORIZED +} diff --git a/fire-common/src/main/java/com/zto/fire/common/enu/JobType.java b/fire-common/src/main/java/com/zto/fire/common/enu/JobType.java new file mode 100644 index 0000000..fcdb1d9 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/enu/JobType.java @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.enu; + +/** + * Fire任务类型 + * + * @author ChengLong 2019-7-26 11:06:38 + */ +public enum JobType { + SPARK_CORE("spark_core"), SPARK_STREAMING("spark_streaming"), SPARK_STRUCTURED_STREAMING("spark_structured_streaming"), SPARK_SQL("spark_sql"), FLINK_STREAMING("flink_streaming"), FLINK_BATCH("flink_batch"), UNDEFINED("undefined"); + + // 任务类型 + private String jobTypeDesc; + + JobType(String jobType) { + this.jobTypeDesc = jobType; + } + + /** + * 获取当前任务的类型 + * + * @return + */ + public String getJobTypeDesc() { + return this.jobTypeDesc; + } + + /** + * 用于判断当前任务是否为spark任务 + * + * @return true: spark任务 false:非spark任务 + */ + public boolean isSpark() { + return this.jobTypeDesc.contains("spark"); + } + + /** + * 用于判断当前任务是否为flink任务 + * + * @return true: flink任务 false:非flink任务 + */ + public boolean isFlink() { + return this.jobTypeDesc.contains("flink"); + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/enu/RequestMethod.scala b/fire-common/src/main/java/com/zto/fire/common/enu/RequestMethod.scala new file mode 100644 index 0000000..6b94dd0 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/enu/RequestMethod.scala @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.enu + +/** + * 定义http请求的方式枚举 + * + * @author ChengLong 2019-3-16 10:27:11 + */ +object RequestMethod extends Enumeration { + type RequestMethod = Value + + val GET = Value("get") + val POST = Value("post") + val DELETE = Value("delete") + val PUT = Value("put") +} diff --git a/fire-common/src/main/java/com/zto/fire/common/enu/ThreadPoolType.java b/fire-common/src/main/java/com/zto/fire/common/enu/ThreadPoolType.java new file mode 100644 index 0000000..f402cd3 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/enu/ThreadPoolType.java @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.enu; + +/** + * 线程池类型 + * @author ChengLong 2019年10月18日 14:33:52 + */ +public enum ThreadPoolType { + FIXED, SINGLE, CACHED, SCHEDULED, WORK_STEALING +} diff --git a/fire-common/src/main/java/com/zto/fire/common/enu/YarnState.java b/fire-common/src/main/java/com/zto/fire/common/enu/YarnState.java new file mode 100644 index 0000000..d340c92 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/enu/YarnState.java @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.enu; + +import org.apache.commons.lang3.StringUtils; + +/** + * yarn的job状态 + * + * @author ChengLong 2019-5-16 09:19:56 + */ +public enum YarnState { + RUNNING("running"), + ACCEPTED("accepted"), + SUBMITTED("submitted"), + FINISHED("finished"), + FAILED("failed"), + KILLED("killed"), + UNDEFINED("undefined"), + NULL(""), + UNKONOW("unknow"); + + // 状态信息 + private final String state; + + YarnState(String state) { + this.state = state; + } + + public String getState() { + return state; + } + + /** + * 根据状态字符串返回状态枚举 + * + * @param state 状态 + * @return + */ + public static YarnState getState(String state) { + if (StringUtils.isBlank(state)) { + return NULL; + } + + switch (state.toLowerCase()) { + case "running": + return RUNNING; + case "accepted": + return ACCEPTED; + case "submitted": + return SUBMITTED; + case "finished": + return FINISHED; + case "failed": + return FAILED; + case "killed": + return KILLED; + default: + return NULL; + } + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/EncryptUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/EncryptUtils.java new file mode 100644 index 0000000..8b42c6a --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/EncryptUtils.java @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.util; + +import com.zto.fire.common.conf.FireFrameworkConf; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import sun.misc.BASE64Decoder; +import sun.misc.BASE64Encoder; + +import java.math.BigInteger; +import java.nio.charset.StandardCharsets; +import java.security.MessageDigest; +import java.security.NoSuchAlgorithmException; +import java.util.Objects; + +/** + * 各种常用算法加密工具类 + * + * @author ChengLong 2018年7月16日 09:53:59 + */ +public class EncryptUtils { + private static final String ERROR_MESSAGE = "参数不合法"; + private static final Logger logger = LoggerFactory.getLogger(EncryptUtils.class); + + private EncryptUtils() {} + + /** + * BASE64解密 + */ + public static String base64Decrypt(String message) { + Objects.requireNonNull(message, ERROR_MESSAGE); + try { + return new String((new BASE64Decoder()).decodeBuffer(message), StandardCharsets.UTF_8); + } catch (Exception e) { + logger.error("BASE64解密出错", e); + } + return ""; + } + + /** + * BASE64加密 + */ + public static String base64Encrypt(String message) { + Objects.requireNonNull(message, ERROR_MESSAGE); + try { + return new BASE64Encoder().encodeBuffer(message.getBytes()); + } catch (Exception e) { + logger.error("BASE64加密出错", e); + } + return ""; + } + + /** + * 生成32位md5码 + */ + public static String md5Encrypt(String message) { + Objects.requireNonNull(message, ERROR_MESSAGE); + try { + // 得到一个信息摘要器 + MessageDigest digest = MessageDigest.getInstance("md5"); + byte[] result = digest.digest(message.getBytes(StandardCharsets.UTF_8)); + StringBuilder buffer = new StringBuilder(); + for (byte b : result) { + int number = b & 0xff;// 加盐 + String str = Integer.toHexString(number); + if (str.length() == 1) { + buffer.append('0'); + } + buffer.append(str); + } + // 标准的md5加密后的结果 + return buffer.toString(); + } catch (NoSuchAlgorithmException e) { + logger.error("生成32位md5码出错", e); + } + return ""; + } + + /** + * SHA加密 + */ + public static String shaEncrypt(String message, String key) { + Objects.requireNonNull(message, ERROR_MESSAGE); + if(StringUtils.isBlank(key)) { + key = "SHA"; + } + try { + MessageDigest sha = MessageDigest.getInstance(key); + sha.update(message.getBytes(StandardCharsets.UTF_8)); + return new BigInteger(sha.digest()).toString(32); + } catch (Exception e) { + logger.error("生成SHA加密出错", e); + } + return ""; + } + + /** + * header权限校验 + * @param auth + * 请求json + * @return + * true:身份合法 false:身份非法 + */ + public static boolean checkAuth(String auth, String privateKey) { + if (StringUtils.isBlank(auth)) { + return false; + } + String fireAuth = EncryptUtils.shaEncrypt(FireFrameworkConf.restServerSecret() + privateKey, "SHA"); + return fireAuth.equals(auth); + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/FileUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/FileUtils.java new file mode 100644 index 0000000..368be76 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/FileUtils.java @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
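A usage sketch for EncryptUtils above. The second argument of shaEncrypt is passed straight to MessageDigest.getInstance, so any JDK digest name such as "SHA-256" works, and a blank value falls back to "SHA":

import com.zto.fire.common.util.EncryptUtils;

public class EncryptUtilsExample {
    public static void main(String[] args) {
        String text = "hello fire";
        System.out.println(EncryptUtils.md5Encrypt(text));            // 32-character hex digest
        String encoded = EncryptUtils.base64Encrypt(text);
        System.out.println(EncryptUtils.base64Decrypt(encoded));      // round-trips to "hello fire"
        System.out.println(EncryptUtils.shaEncrypt(text, "SHA-256")); // digest rendered in base 32
    }
}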
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import java.io.File; +import java.io.InputStream; +import java.util.List; +import java.util.Objects; + +/** + * 文件操作工具类 + * + * @author ChengLong 2018年8月22日 13:10:03 + */ +public class FileUtils { + private FileUtils() {} + + + /** + * 递归查找指定目录下的文件 + * + * @param path 路径 + * @param fileName 文件名 + * @return 文件全路径 + */ + public static File findFile(String path, String fileName, List fileList) { + File searchFile = null; + File dir = new File(path); + if (dir.exists() && dir.isDirectory()) { + for (File file : Objects.requireNonNull(dir.listFiles())) { + if (file.isDirectory()) { + searchFile = findFile(file.getPath(), fileName, fileList); + } else { + if (file.getName().equals(fileName)) { + searchFile = file; + break; + } + } + } + } + if (searchFile != null) fileList.add(searchFile); + return searchFile; + } + + + /** + * 判断resource路径下的文件是否存在 + * + * @param fileName 配置文件名称 + * @return null: 不存在,否则为存在 + */ + public static InputStream resourceFileExists(String fileName) { + return FileUtils.class.getClassLoader().getResourceAsStream(fileName); + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/FindClassUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/FindClassUtils.java new file mode 100644 index 0000000..d355bc0 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/FindClassUtils.java @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.File; +import java.io.IOException; +import java.io.Serializable; +import java.net.JarURLConnection; +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.util.ArrayList; +import java.util.Enumeration; +import java.util.LinkedList; +import java.util.List; +import java.util.jar.JarEntry; +import java.util.jar.JarFile; + +/** + * 查找指定包下所有的类 + * Created by ChengLong on 2018-03-23. 
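A minimal usage sketch for the FileUtils class above; the directory path, file names and demo class name are assumptions made for illustration.

import com.zto.fire.common.util.FileUtils;
import java.io.File;
import java.io.InputStream;
import java.util.LinkedList;
import java.util.List;

public class FileUtilsDemo {
    public static void main(String[] args) {
        // recursive lookup of a file name under a directory; the match is also collected into the list
        List<File> matches = new LinkedList<>();
        File found = FileUtils.findFile("/tmp", "fire.properties", matches);
        System.out.println(found != null ? found.getPath() : "not found");

        // classpath lookup: a null stream means the resource does not exist
        InputStream in = FileUtils.resourceFileExists("log4j.properties");
        System.out.println(in != null ? "resource exists" : "resource missing");
    }
}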
+ */ +public class FindClassUtils { + // 接口类class 用于过滤 + private static Class superStrategy = Serializable.class; + // 默认使用的类加载器 + private static ClassLoader classLoader = FindClassUtils.class.getClassLoader(); + private static final Logger logger = LoggerFactory.getLogger(FindClassUtils.class); + private static final String CLASS_FILE = ".class"; + + private FindClassUtils() { + } + + /** + * 获取包下所有实现了superStrategy的类并加入list + */ + public static List> listPackageClasses(String... packageNames) { + List> classList = new ArrayList<>(); + if (packageNames != null && packageNames.length > 0) { + for (String packageName : packageNames) { + if (StringUtils.isNotBlank(packageName) && packageName.contains(".")) { + URL url = FindClassUtils.classLoader.getResource(packageName.replace('.', '/')); + String protocol = url.getProtocol(); + if ("file".equals(protocol)) { + // 本地自己可见的代码 + FindClassUtils.findClassLocal(packageName, classList); + } else if ("jar".equals(protocol)) { + // 引用jar包的代码 + FindClassUtils.findClassJar(packageName, classList); + } + } + } + } + return classList; + } + + /** + * 本地查找 + * + * @param packName 包名 + */ + private static void findClassLocal(final String packName, final List> list) { + URI url = null; + try { + url = FindClassUtils.classLoader.getResource(packName.replace('.', '/')).toURI(); + File file = new File(url); + file.listFiles(chiFile -> { + if (chiFile.isDirectory()) { + FindClassUtils.findClassLocal(packName + "." + chiFile.getName(), list); + } + if (chiFile.getName().endsWith(CLASS_FILE)) { + Class clazz = null; + try { + clazz = FindClassUtils.classLoader.loadClass(packName + "." + chiFile.getName().replace(CLASS_FILE, "")); + } catch (ClassNotFoundException e) { + logger.error("未找到类异常", e); + } + if (FindClassUtils.superStrategy.isAssignableFrom(clazz)) { + list.add((Class) clazz); + } + return true; + } + return false; + }); + } catch (URISyntaxException e1) { + logger.error("未找到相关资源", e1); + } + } + + + /** + * 从jar包中查找指定包下的文件 + * + * @param packName 包名 + */ + private static void findClassJar(final String packName, final List> list) { + String pathName = packName.replace('.', '/'); + JarFile jarFile = null; + try { + URL url = FindClassUtils.classLoader.getResource(pathName); + JarURLConnection jarURLConnection = (JarURLConnection) url.openConnection(); + jarFile = jarURLConnection.getJarFile(); + Enumeration jarEntries = jarFile.entries(); + while (jarEntries.hasMoreElements()) { + JarEntry jarEntry = jarEntries.nextElement(); + String jarEntryName = jarEntry.getName(); + + if (jarEntryName.contains(pathName) && !jarEntryName.equals(pathName + "/")) { + // 递归遍历子目录 + if (jarEntry.isDirectory()) { + String clazzName = jarEntry.getName().replace('/', '.'); + int endIndex = clazzName.lastIndexOf('.'); + String prefix = null; + if (endIndex > 0) { + prefix = clazzName.substring(0, endIndex); + } + findClassJar(prefix, list); + } + if (jarEntry.getName().endsWith(CLASS_FILE)) { + Class clazz = FindClassUtils.classLoader.loadClass(jarEntry.getName().replace('/', '.').replace(CLASS_FILE, "")); + if (FindClassUtils.superStrategy.isAssignableFrom(clazz)) { + list.add((Class) clazz); + } + } + } + } + } catch (Exception e) { + logger.error("未在jar包中找到相关文件", e); + } finally { + try { + if (jarFile != null) jarFile.close(); + } catch (Exception e) { + logger.error("关闭jarFile对象失败"); + } + } + } + + /** + * 用于判断当前以jar方式运行还是以idea方式运行 + * + * @return true:jar方式 false:idea运行 + */ + public static boolean isJar() { + URL url = 
FindClassUtils.class.getProtectionDomain().getCodeSource().getLocation(); + return url.getPath().endsWith(".jar"); + } + + /** + * 获取指定文件名在jar包中的位置,兼容非jar包 + * + * @param fileName 文件名 + * @return 路径名+文件名 + */ + public static String findFileInJar(String fileName) { + if (StringUtils.isBlank(fileName)) { + return null; + } + String fullName = ""; + URL url = FindClassUtils.class.getProtectionDomain().getCodeSource().getLocation(); + if (url.getPath().endsWith(".jar")) { + try (JarFile jarFile = new JarFile(url.getFile())) { + Enumeration entrys = jarFile.entries(); + while (entrys.hasMoreElements()) { + JarEntry jar = entrys.nextElement(); + String name = jar.getName(); + if (name.endsWith("/" + fileName)) { + fullName = name; + break; + } + } + } catch (IOException e) { + logger.error("从jar包中查找文件过程中报错", e); + } + } else { + // 在IDEA中执行 + try { + List searchList = new LinkedList<>(); + FileUtils.findFile(FindClassUtils.class.getResource("/").getPath(), fileName, searchList); + if (!searchList.isEmpty()) { + fullName = searchList.get(0).getPath(); + } + } catch (Exception ex) { + logger.error("从project中查找文件过程中报错", ex); + } + } + + return fullName; + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/HttpClientUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/HttpClientUtils.java new file mode 100644 index 0000000..44eb529 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/HttpClientUtils.java @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import org.apache.commons.httpclient.*; +import org.apache.commons.httpclient.methods.*; +import org.apache.commons.httpclient.params.HttpMethodParams; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.InputStreamReader; + +/** + * HTTP接口调用,各模块继承自该类 + * Created by ChengLong on 2017-12-12. + */ +public class HttpClientUtils { + private static final String CHARSET = "UTF-8"; + private static final String HEADER_JSON_VALUE = "application/json"; + private static final Logger logger = LoggerFactory.getLogger(HttpClientUtils.class); + + private HttpClientUtils() { + } + + /** + * 添加header请求信息 + * + * @param method 请求的方式 + * @param headers 请求头信息 + */ + private static void setHeaders(HttpMethodBase method, Header... 
headers) { + if (method != null && headers != null && headers.length > 0) { + for (Header header : headers) { + if (header != null) method.setRequestHeader(header); + } + } + } + + /** + * 以流的方式获取返回的消息体 + */ + private static String responseBody(HttpMethodBase method) throws IOException { + if (method == null) return ""; + + StringBuilder stringBuffer = new StringBuilder(); + BufferedReader reader = new BufferedReader(new InputStreamReader(method.getResponseBodyAsStream())); + String str = ""; + while ((str = reader.readLine()) != null) { + stringBuffer.append(str); + } + return stringBuffer.toString(); + } + + /** + * HTTP通用接口调用(Get请求) + * + * @param url 地址 + * @return 调用结果 + */ + public static String doGet(String url, Header... headers) throws IOException { + String responseBody = ""; + GetMethod getMethod = new GetMethod(); + HttpClient httpClient = new HttpClient(); + // 设置 get 请求超时为 5 秒 + getMethod.getParams().setParameter(HttpMethodParams.SO_TIMEOUT, 3000); + // 设置请求重试处理,用的是默认的重试处理:请求三次 + getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler()); + // 设置请求头 + setHeaders(getMethod, headers); + + getMethod.setURI(new URI(url, true, CHARSET)); + int statusCode = httpClient.executeMethod(getMethod); + // 判断访问的状态码 + if (statusCode != HttpStatus.SC_OK) { + logger.error("请求出错: {}", getMethod.getStatusLine()); + } + // 读取 HTTP 响应内容,这里简单打印网页内容 + responseBody = responseBody(getMethod); + getMethod.releaseConnection(); + httpClient.getHttpConnectionManager().closeIdleConnections(0); + return responseBody; + } + + /** + * HTTP通用接口调用(Post请求) + * + * @param url 地址 + * @return 调用结果 + */ + public static String doPost(String url, String json, Header... headers) throws IOException { + String responses = ""; + PostMethod postMethod = new PostMethod(); + HttpClient httpClient = new HttpClient(); + postMethod.getParams().setParameter(HttpMethodParams.SO_TIMEOUT, 3000); + postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler()); + // 设置请求头 + setHeaders(postMethod, headers); + postMethod.setURI(new URI(url, true, CHARSET)); + postMethod.addRequestHeader("Content-Type", HEADER_JSON_VALUE); + if (json != null && StringUtils.isNotBlank(json.trim())) { + RequestEntity requestEntity = new StringRequestEntity(json, HEADER_JSON_VALUE, CHARSET); + postMethod.setRequestHeader("Content-Length", String.valueOf(requestEntity.getContentLength())); + postMethod.setRequestEntity(requestEntity); + } + httpClient.executeMethod(postMethod); + responses = responseBody(postMethod); + postMethod.releaseConnection(); + httpClient.getHttpConnectionManager().closeIdleConnections(0); + return responses; + } + + /** + * 发送一次post请求到指定的地址,不向上抛出异常 + * + * @param url 接口地址 + * @return 调用结果 + */ + public static String doPut(String url, String json, Header... 
headers) throws IOException { + String responseBody = ""; + PutMethod putMethod = new PutMethod(); + HttpClient htpClient = new HttpClient(); + putMethod.setURI(new URI(url, true, CHARSET)); + putMethod.getParams().setParameter(HttpMethodParams.SO_TIMEOUT, 3000); + putMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler()); + // 设置请求头 + setHeaders(putMethod, headers); + if (json != null && StringUtils.isNotBlank(json.trim())) { + RequestEntity requestEntity = new StringRequestEntity(json, HEADER_JSON_VALUE, CHARSET); + putMethod.setRequestHeader("Content-Length", String.valueOf(requestEntity.getContentLength())); + putMethod.setRequestEntity(requestEntity); + } + int statusCode = htpClient.executeMethod(putMethod); + if (statusCode != HttpStatus.SC_OK) { + return ""; + } + responseBody = responseBody(putMethod); + putMethod.releaseConnection(); + htpClient.getHttpConnectionManager().closeIdleConnections(0); + return responseBody; + } + + /** + * 发送一次get请求到指定的地址,不向上抛出异常 + * + * @param url 接口地址 + * @return 调用结果 + */ + public static String doGetIgnore(String url, Header... headers) { + String response = ""; + try { + response = doGet(url, headers); + } catch (Exception e) { + logger.error("HTTP通用接口调用(Get)失败", e); + } + return response; + } + + /** + * 发送一次post请求到指定的地址,不向上抛出异常 + * + * @param url 接口地址 + * @return 调用结果 + */ + public static String doPostIgnore(String url, String json, Header... headers) { + String response = ""; + try { + response = doPost(url, json, headers); + } catch (Exception e) { + logger.error("HTTP通用接口调用(Post)失败", e); + } + return response; + } + + /** + * 发送一次put请求到指定的地址,不向上抛出异常 + * + * @param url 接口地址 + * @return 调用结果 + */ + public static String doPutIgnore(String url, String json, Header... headers) { + String response = ""; + try { + response = doPut(url, json, headers); + } catch (Exception e) { + logger.error("HTTP通用接口调用(Put)失败", e); + } + + return response; + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/IOUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/IOUtils.java new file mode 100644 index 0000000..a82a164 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/IOUtils.java @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.Closeable; + +/** + * io流工具类 + * + * @author ChengLong 2019-3-27 11:17:56 + */ +public class IOUtils { + private static final Logger logger = LoggerFactory.getLogger(IOUtils.class); + + private IOUtils() {} + + /** + * 关闭多个流 + */ + public static void close(Closeable... 
closeables) { + if (closeables != null && closeables.length > 0) { + for (Closeable io : closeables) { + try { + if (io != null) { + io.close(); + } + } catch (Exception e) { + logger.error("close 对象失败", e); + } + } + } + } + + /** + * 关闭多个process对象 + */ + public static void close(Process... process) { + if (process != null && process.length > 0) { + for (Process pro : process) { + try { + if (pro != null) { + pro.destroy(); + } + } catch (Exception e) { + logger.error("close process 对象失败", e); + } + } + } + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/MathUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/MathUtils.java new file mode 100644 index 0000000..b2a323d --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/MathUtils.java @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import java.math.*; + +/** + * 数据计算工具类 + * + * @author ChengLong 2019年9月29日 13:50:31 + */ +public class MathUtils { + + private MathUtils() {} + + /** + * 计算百分比,并保留指定的小数位 + * + * @param molecule 分子 + * @param denominator 分母 + * @param scale 精度 + * @return 百分比 + */ + public static double percent(long molecule, long denominator, int scale) { + if (molecule == 0 || denominator == 0) { + return 0.00; + } + return BigDecimal.valueOf(100.00 * molecule / denominator).setScale(scale, RoundingMode.HALF_UP).doubleValue(); + } + + /** + * 将指定double类型数据以四舍五入的方式保留指定的精度 + * + * @param data 数据 + * @param scale 精度 + * @return 四舍五入后的数据 + */ + public static double doubleScale(double data, int scale) { + return BigDecimal.valueOf(data).setScale(scale, RoundingMode.HALF_UP).doubleValue(); + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/OSUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/OSUtils.java new file mode 100644 index 0000000..19955df --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/OSUtils.java @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
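A minimal usage sketch for the IOUtils and MathUtils classes above; the sample values and demo class name are illustrative.

import com.zto.fire.common.util.IOUtils;
import com.zto.fire.common.util.MathUtils;

import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class MathUtilsDemo {
    public static void main(String[] args) {
        // 37 out of 120 as a percentage with two decimal places: 30.83
        System.out.println(MathUtils.percent(37, 120, 2));
        // round 3.14159 to two decimal places: 3.14
        System.out.println(MathUtils.doubleScale(3.14159, 2));

        // IOUtils.close swallows close() failures and tolerates null elements
        InputStream in = new ByteArrayInputStream(new byte[0]);
        IOUtils.close(in, null);
    }
}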
+ */ + +package com.zto.fire.common.util; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.lang.management.ManagementFactory; +import java.net.InetAddress; +import java.net.NetworkInterface; +import java.net.ServerSocket; +import java.util.Enumeration; +import java.util.Random; + +/** + * 用于获取服务器负载信息,包括磁盘io、cpu负载、内存使用、网络使用等等 + * 注:使用此工具需预先安装:sudo yum install sysstat + * + * @author ChengLong 2019-04-08 13:57:31 + */ +public class OSUtils { + private static String ip; + private static String hostname; + private static String pid; + private static Random random = new Random(); + private static final String OSNAME = "os.name"; + private static final Logger logger = LoggerFactory.getLogger(OSUtils.class); + + private OSUtils() { + } + + /** + * 获取主机地址信息 + */ + public static InetAddress getHostLANAddress() { + try { + InetAddress candidateAddress = null; + // 遍历所有的网络接口 + for (Enumeration ifaces = NetworkInterface.getNetworkInterfaces(); ifaces.hasMoreElements(); ) { + NetworkInterface iface = ifaces.nextElement(); + // 在所有的接口下再遍历IP + for (Enumeration inetAddrs = iface.getInetAddresses(); inetAddrs.hasMoreElements(); ) { + InetAddress inetAddr = inetAddrs.nextElement(); + if (!inetAddr.isLoopbackAddress()) { + // 排除loopback类型地址 + if (inetAddr.isSiteLocalAddress()) { + // 如果是site-local地址 + return inetAddr; + } else if (candidateAddress == null) { + // site-local类型的地址未被发现,先记录候选地址 + candidateAddress = inetAddr; + } + } + } + } + if (candidateAddress != null) { + return candidateAddress; + } + // 如果没有发现 non-loopback地址.只能用最次选的方案 + return InetAddress.getLocalHost(); + } catch (Exception e) { + logger.error("获取主机地址信息失败", e); + } + return null; + } + + /** + * 获取本机的ip地址 + * + * @return ip地址 + */ + public static String getIp() { + if (StringUtils.isBlank(ip)) { + InetAddress inetAddress = getHostLANAddress(); + if (inetAddress != null) { + ip = inetAddress.getHostAddress(); + } + } + return ip; + } + + /** + * 获取本机的hostname + * + * @return hostname + */ + public static String getHostName() { + if (StringUtils.isBlank(hostname)) { + InetAddress inetAddress = getHostLANAddress(); + if (inetAddress != null) { + hostname = inetAddress.getHostName(); + } + } + return hostname; + } + + + /** + * 随机获取系统未被使用的端口号 + */ + public static int getRundomPort() { + int port = 0; + try (ServerSocket socket = new ServerSocket(0)) { + port = socket.getLocalPort(); + logger.debug("成功获取随机端口号:{}", port); + } catch (Exception e) { + logger.error("端口号{}已被占用,尝试扫描新的未被占用的端口号."); + } + return port; + } + + /** + * 获取当前进程的pid + * + * @return pid + */ + public static String getPid() { + if (StringUtils.isBlank(pid)) { + pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0]; + } + return pid; + } + + /** + * 判断当前运行环境是否为linux + */ + public static boolean isLinux() { + return !isWindows() && !isMac(); + } + + /** + * 判断当前运行环境是否为windows + */ + public static boolean isWindows() { + String os = System.getProperty(OSNAME); + return os.toLowerCase().startsWith("windows"); + } + + /** + * 判断当前是否运行在本地环境下 + * 本地环境包括:Windows、Mac OS + */ + public static boolean isLocal() { + return isWindows() || isMac(); + } + + /** + * 是否为mac os环境 + */ + public static boolean isMac() { + String os = System.getProperty(OSNAME); + return os.toLowerCase().contains("mac"); + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/ProcessUtil.java b/fire-common/src/main/java/com/zto/fire/common/util/ProcessUtil.java new file mode 100644 index 0000000..3ab2032 --- 
/dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/ProcessUtil.java @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.BufferedReader; +import java.io.InputStreamReader; +import java.util.Objects; + +/** + * 执行命令的工具 + * + * @author ChengLong 2019-4-10 15:50:23 + */ +public class ProcessUtil { + private static final Logger logger = LoggerFactory.getLogger(ProcessUtil.class); + + private ProcessUtil() {} + + /** + * 执行多条linux命令,不返回命令执行日志 + * + * @param commands linux命令 + * @return 命令执行结果的一行数据 + */ + public static void executeCmds(String... commands) { + Objects.requireNonNull(commands, "命令不能为空"); + for (String command : commands) { + executeCmdForLine(command); + } + } + + /** + * 执行一条linux命令,仅返回命令的一行 + * + * @param cmd linux命令 + * @return 命令执行结果的一行数据 + */ + public static String executeCmdForLine(String cmd) { + if (!OSUtils.isLinux() || StringUtils.isBlank(cmd)) { + // 如果是windows环境 + return " "; + } + Process process = null; + BufferedReader reader = null; + String result = ""; + try { + process = Runtime.getRuntime().exec(cmd); + reader = new BufferedReader(new InputStreamReader(process.getInputStream())); + String line = ""; + while ((line = reader.readLine()) != null) { + if (StringUtils.isNotBlank(line)) { + result = line; + } + } + } catch (Exception e) { + logger.error("执行命令报错", e); + } finally { + IOUtils.close(process); + IOUtils.close(reader); + } + return result; + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/ReflectionUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/ReflectionUtils.java new file mode 100644 index 0000000..e02e1b9 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/ReflectionUtils.java @@ -0,0 +1,299 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
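A minimal usage sketch for the OSUtils and ProcessUtil classes above, assuming a Linux host for the command execution (on other systems executeCmdForLine returns a blank placeholder); the command and demo class name are illustrative.

import com.zto.fire.common.util.OSUtils;
import com.zto.fire.common.util.ProcessUtil;

public class ProcessUtilDemo {
    public static void main(String[] args) {
        // basic host and process information
        System.out.println("ip=" + OSUtils.getIp()
                + ", hostname=" + OSUtils.getHostName()
                + ", pid=" + OSUtils.getPid());
        // a free port picked by binding a temporary ServerSocket
        System.out.println("free port: " + OSUtils.getRundomPort());
        // returns the last non-blank line of the command output
        System.out.println(ProcessUtil.executeCmdForLine("hostname"));
    }
}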
+ */ + +package com.zto.fire.common.util; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.lang.annotation.Annotation; +import java.lang.annotation.ElementType; +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.util.*; +import java.util.concurrent.ConcurrentHashMap; + +/** + * 反射工具类,获取各元素信息后缓存到map中 + * Created by ChengLong on 2017-03-30. + */ +public class ReflectionUtils { + private static final Map, Map> cacheFieldMap = new ConcurrentHashMap<>(); + private static final Map, Map> cacheMethodMap = new ConcurrentHashMap<>(); + private static final Logger logger = LoggerFactory.getLogger(ReflectionUtils.class); + + private ReflectionUtils() { + } + + public static void setAccessible(Field field) { + if (field != null) field.setAccessible(true); + } + + public static void setAccessible(Method method) { + if (method != null) method.setAccessible(true); + } + + /** + * 根据类名反射获取Class对象 + */ + public static Class forName(String className) { + try { + return Class.forName(className); + } catch (Exception e) { + logger.error("未找到类信息:" + className, e); + return null; + } + } + + /** + * 用于判断某类是否存在指定的字段 + */ + public static boolean containsField(Class clazz, String fieldName) { + Field field = getFieldByName(clazz, fieldName); + return field != null ? true : false; + } + + /** + * 获取所有公有字段,并返回Map + */ + private static Map getFields(Class clazz) { + if (clazz == null) { + return Collections.emptyMap(); + } + Field[] fields = clazz.getFields(); + Map fieldMap = new HashMap<>(fields.length); + for (Field field : fields) { + fieldMap.put(field.getName(), field); + } + return fieldMap; + } + + /** + * 获取所有声明字段,并返回Map + */ + private static Map getDeclaredFields(Class clazz) { + if (clazz == null) { + return Collections.emptyMap(); + } + Field[] fields = clazz.getDeclaredFields(); + Map fieldMap = new HashMap<>(fields.length); + for (Field field : fields) { + setAccessible(field); + fieldMap.put(field.getName(), field); + } + return fieldMap; + } + + /** + * 获取所有字段,含私有和继承而来的,并返回Map + */ + public static Map getAllFields(Class clazz) { + if (!cacheFieldMap.containsKey(clazz)) { + Map fieldMap = new HashMap<>(); + fieldMap.putAll(getFields(clazz)); + fieldMap.putAll(getDeclaredFields(clazz)); + cacheFieldMap.put(clazz, fieldMap); + } + + return cacheFieldMap.get(clazz); + } + + /** + * 根据成员变量名称获取Filed类型(从缓存中获取) + */ + public static Field getFieldByName(Class clazz, String fieldName) { + return getAllFields(clazz).get(fieldName); + } + + /** + * 获取所有方法,含私有和继承而来的,并返回Map + */ + public static Map getAllMethods(Class clazz) { + if (!cacheMethodMap.containsKey(clazz)) { + Map methodMap = new HashMap<>(); + methodMap.putAll(getMethods(clazz)); + methodMap.putAll(getDeclaredMethods(clazz)); + cacheMethodMap.put(clazz, methodMap); + } + + return cacheMethodMap.get(clazz); + } + + /** + * 根据方法名称获取Method类型(从缓存中获取) + * + * @param clazz 类类型 + * @param methodName 方法名称 + * @return Method + */ + public static Method getMethodByName(Class clazz, String methodName) { + return getAllMethods(clazz).get(methodName); + } + + /** + * 用于判断某类是否存在指定的方法名 + */ + public static boolean containsMethod(Class clazz, String methodName) { + Method method = getMethodByName(clazz, methodName); + return method != null ? 
true : false; + } + + /** + * 获取所有公有方法,并返回Map + */ + private static Map getMethods(Class clazz) { + if (clazz == null) { + return Collections.emptyMap(); + } + Method[] methods = clazz.getMethods(); + Map methodMap = new HashMap<>(methods.length); + for (Method method : methods) { + methodMap.put(method.getName(), method); + } + return methodMap; + } + + /** + * 获取所有声明方法,并返回Map + */ + private static Map getDeclaredMethods(Class clazz) { + if (clazz == null) { + return Collections.emptyMap(); + } + Method[] methods = clazz.getDeclaredMethods(); + Map methodMap = new HashMap<>(methods.length); + for (Method method : methods) { + setAccessible(method); + methodMap.put(method.getName(), method); + } + return methodMap; + } + + /** + * 获取指定field的类型 + */ + public static Class getFieldType(Class clazz, String fieldName) { + if (clazz == null || StringUtils.isBlank(fieldName)) return null; + + try { + Map fieldMap = getAllFields(clazz); + if (fieldMap == null) { + return null; + } + Field field = fieldMap.get(fieldName); + if (field != null) { + return field.getType(); + } + } catch (Exception e) { + logger.error("指定的Field:" + fieldName + "不存在,请检查", e); + } + return null; + } + + /** + * 获取指定的annotation + * + * @param scope annotation所在的位置 + * @param memberName 成员名称,指定获取指定成员的Annotation实例 + */ + private static Annotation getAnnotation(Class clazz, ElementType scope, String memberName, Class annoClass) { + try { + if (ElementType.FIELD == scope) { + Field field = clazz.getDeclaredField(memberName); + setAccessible(field); + return field.getAnnotation(annoClass); + } else if (ElementType.METHOD == scope) { + Method method = clazz.getDeclaredMethod(memberName); + setAccessible(method); + return method.getAnnotation(annoClass); + } else if (ElementType.TYPE == scope) { + return clazz.getAnnotation(annoClass); + } + } catch (Exception e) { + logger.error("获取annotation出现异常", e); + } + return null; + } + + /** + * 获取指定的annotation + * + * @param scope annotation所在的位置 + * @param memberName 成员名称,指定获取指定成员的Annotation实例 + */ + private static List getAnnotations(Class clazz, ElementType scope, String memberName) { + try { + if (ElementType.FIELD == scope) { + Field field = clazz.getDeclaredField(memberName); + setAccessible(field); + return Arrays.asList(field.getDeclaredAnnotations()); + } else if (ElementType.METHOD == scope) { + Method method = clazz.getDeclaredMethod(memberName); + setAccessible(method); + return Arrays.asList(method.getDeclaredAnnotations()); + } else if (ElementType.TYPE == scope) { + return Arrays.asList(clazz.getDeclaredAnnotations()); + } + } catch (Exception e) { + logger.error("获取annotation出现异常", e); + } + return Collections.emptyList(); + } + + /** + * 获取Field指定的annotation + */ + public static Annotation getFieldAnnotation(Class clazz, String fieldName, Class annoClass) { + return getAnnotation(clazz, ElementType.FIELD, fieldName, annoClass); + } + + /** + * 获取Field所有annotation + */ + public static List getFieldAnnotations(Class clazz, String fieldName) { + return getAnnotations(clazz, ElementType.FIELD, fieldName); + } + + /** + * 获取Method指定的annotation + */ + public static Annotation getMethodAnnotation(Class clazz, String methodName, Class annoClass) { + return getAnnotation(clazz, ElementType.METHOD, methodName, annoClass); + } + + /** + * 获取Method所有annotation + */ + public static List getMethodAnnotations(Class clazz, String methodName) { + return getAnnotations(clazz, ElementType.METHOD, methodName); + } + + /** + * 获取类指定annotation + */ + public static Annotation 
getClassAnnotation(Class clazz, Class annoClass) { + return getAnnotation(clazz, ElementType.TYPE, clazz.getName(), annoClass); + } + + /** + * 获取类所有annotation + */ + public static List getClassAnnotations(Class clazz) { + return getAnnotations(clazz, ElementType.TYPE, clazz.getName()); + } +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/StringsUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/StringsUtils.java new file mode 100644 index 0000000..3b63eb8 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/StringsUtils.java @@ -0,0 +1,262 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util; + +import org.apache.commons.lang3.StringUtils; + +import java.util.Map; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * 字符串工具类 + * + * @author ChengLong 2019-4-11 09:06:26 + */ +public class StringsUtils { + private StringsUtils() { + } + + /** + * 处理成超链接 + * + * @param str + * @return + */ + public static String hrefTag(String str) { + return append("", str, ""); + } + + /** + * 追加换行 + * + * @param str + * @return + */ + public static String brTag(String str) { + return append(str, "
"); + } + + /** + * 字符串拼接 + * + * @param strs 多个字符串 + * @return 拼接结果 + */ + public static String append(String... strs) { + StringBuilder sb = new StringBuilder(); + if (null != strs && strs.length > 0) { + for (String str : strs) { + sb.append(str); + } + } + + return sb.toString(); + } + + /** + * replace多组字符串中的数据 + * + * @param map + * @return + * @apiNote replace(str, ImmutableMap.of ( " # ", " ", ", ", " ")) + */ + public static String replace(String str, Map map) { + if (StringUtils.isNotBlank(str) && null != map && map.size() > 0) { + for (Map.Entry entry : map.entrySet()) { + str = str.replace(entry.getKey(), entry.getValue()); + } + } + return str; + } + + /** + * 16进制的字符串表示转成字节数组 + * + * @param hexString 16进制格式的字符串 + * @return 转换后的字节数组 + **/ + public static byte[] toByteArray(String hexString) { + if (StringUtils.isEmpty(hexString)) + throw new IllegalArgumentException("this hexString must not be empty"); + + hexString = hexString.toLowerCase(); + final byte[] byteArray = new byte[hexString.length() / 2]; + int k = 0; + for (int i = 0; i < byteArray.length; i++) {//因为是16进制,最多只会占用4位,转换成字节需要两个16进制的字符,高位在先 + byte high = (byte) (Character.digit(hexString.charAt(k), 16) & 0xff); + byte low = (byte) (Character.digit(hexString.charAt(k + 1), 16) & 0xff); + byteArray[i] = (byte) (high << 4 | low); + k += 2; + } + return byteArray; + } + + /** + * 字节数组转成16进制表示格式的字符串 + * + * @param byteArray 需要转换的字节数组 + * @return 16进制表示格式的字符串 + **/ + public static String toHexString(byte[] byteArray) { + if (byteArray == null || byteArray.length < 1) + throw new IllegalArgumentException("this byteArray must not be null or empty"); + + final StringBuilder hexString = new StringBuilder(); + for (int i = 0; i < byteArray.length; i++) { + if ((byteArray[i] & 0xff) < 0x10)//0~F前面不零 + hexString.append('0'); + hexString.append(Integer.toHexString(0xFF & byteArray[i])); + } + return hexString.toString().toLowerCase(); + } + + /** + * 具有容错功能的substring,如果下标越界,则默认取到尾部 + * + * @param str 原字符串 + * @param start 索引起始 + * @param end 索引结束 + * @return 截取后的子字符串 + */ + public static String substring(String str, int start, int end) { + if (StringUtils.isBlank(str) || Math.abs(start) > Math.abs(end)) { + return ""; + } + int length = str.length(); + if (length >= Math.abs(end)) { + return str.substring(Math.abs(start), Math.abs(end)); + } else { + return str.substring(Math.abs(start), Math.abs(length)); + } + } + + /** + * 判断一个字符串是否为整型 + * 1. 包号空字符串的不能看作是整数 + * 2. 
超过Int最大值的不能作为整数 + */ + public static boolean isInt(String str) { + if (StringUtils.isBlank(str)) return false; + try { + Integer.parseInt(str); + return true; + } catch (Exception e) { + // 如果超过精度,则不能看做是整型 + return false; + } + } + + /** + * 判断字符串是否为整数(前面是数值类型,最后是L或l结尾,也认为是长整数) + */ + public static boolean isLong(String str) { + if (StringUtils.isBlank(str)) return false; + str = str.toUpperCase(); + if (str.endsWith("L")) { + try { + Long.parseLong(str.replace("L", "")); + return true; + } catch (Exception e) { + return false; + } + } + + return false; + } + + /** + * 用于判断字符串是否为布尔类型 + */ + public static boolean isBoolean(String str) { + if (StringUtils.isBlank(str)) return false; + return "true".equalsIgnoreCase(str) || "false".equalsIgnoreCase(str); + } + + /** + * 用于判断字符串是否为float类型 + * 以字母F或f结尾的合法数值型字符串认为是float类型 + */ + public static boolean isFloat(String str) { + if (StringUtils.isBlank(str)) return false; + str = str.toUpperCase(); + if (str.endsWith("F")) { + try { + Float.parseFloat(str.replace("F", "")); + return true; + } catch (Exception e) { + return false; + } + } + return false; + } + + /** + * 用于判断字符串是否为float类型 + * 以字母F或f结尾的合法数值型字符串认为是float类型 + */ + public static boolean isDouble(String str) { + if (StringUtils.isBlank(str)) return false; + str = str.toUpperCase(); + if (str.endsWith("D")) { + try { + Double.parseDouble(str.replace("D", "")); + return true; + } catch (Exception e) { + return false; + } + } + return false; + } + + /** + * 根据字符串具体的类型进行转换,返回转换类型之后的数据 + */ + public static Object parseString(String str) { + if (StringsUtils.isLong(str)) { + String longStr = str.toUpperCase().replace("L", ""); + return Long.valueOf(longStr); + } else if (StringsUtils.isInt(str)) { + return Integer.valueOf(str); + } else if (StringsUtils.isBoolean(str)) { + return Boolean.valueOf(str); + } else if (StringsUtils.isFloat(str)) { + String floatStr = str.toUpperCase().replace("F", ""); + return Float.valueOf(floatStr); + } else if (StringsUtils.isDouble(str)) { + String doubleStr = str.toUpperCase().replace("D", ""); + return Double.valueOf(doubleStr); + } else { + return str; + } + } + + /** + * 用于判断给定的字符串是否为数值类型 + * @param str + * 字符串 + * @return + * true:数值类型 false:非数值类型 + */ + public static boolean isNumeric(String str) { + Pattern pattern = Pattern.compile("(^[1-9]\\d*\\.?\\d*$)|(^0\\.\\d*[1-9]$)"); + Matcher matcher = pattern.matcher(str); + return matcher.matches(); + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/UnitFormatUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/UnitFormatUtils.java new file mode 100644 index 0000000..66b983d --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/UnitFormatUtils.java @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
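A minimal usage sketch for the StringsUtils class above; the sample inputs and demo class name are illustrative.

import com.zto.fire.common.util.StringsUtils;

public class StringsUtilsDemo {
    public static void main(String[] args) {
        // type-aware parsing: the suffix decides the boxed type
        System.out.println(StringsUtils.parseString("123").getClass().getSimpleName());  // Integer
        System.out.println(StringsUtils.parseString("123L").getClass().getSimpleName()); // Long
        System.out.println(StringsUtils.parseString("1.5F").getClass().getSimpleName()); // Float

        // fault-tolerant substring: an out-of-range end index falls back to the string length
        System.out.println(StringsUtils.substring("fire", 0, 100)); // fire

        // hex string <-> byte[] round trip
        byte[] bytes = StringsUtils.toByteArray("0a1b2c");
        System.out.println(StringsUtils.toHexString(bytes)); // 0a1b2c
    }
}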
+ */ + +package com.zto.fire.common.util; + +import java.math.BigDecimal; +import java.math.RoundingMode; +import java.util.Arrays; +import java.util.LinkedList; +import java.util.List; + +/** + * 通用的计量单位转换工具 + * + * @author ChengLong 2019年9月29日 18:05:56 + */ +public class UnitFormatUtils { + + /** + * 磁盘数据单位体系中的单位的枚举 + */ + public enum DateUnitEnum { + // 数据类型的内容 + BYTE, KB, MB, GB, TB, PB, EB; + // 对数据类型进行排序 + private static List orderList = Arrays.asList(BYTE, KB, MB, GB, TB, PB, EB); + // 定义计量单位换算关系 + private static List metric = init(1024, 1024, 1024, 1024, 1024, 1024, 1); + } + + /** + * 时间单位体系中的单位的枚举 + */ + public enum TimeUnitEnum { + // 数据类型的内容 + US, MS, S, MIN, H, D; + // 对数据类型进行排序 + private static List orderList = Arrays.asList(US, MS, S, MIN, H, D); + // 定义计量单位换算关系 + private static List metric = init(1000, 1000, 60, 60, 24, 1); + } + + + /** + * 获取当前单位在list中的索引值 + * + * @param unit 要查询的单位 + * @return 索引值 + */ + private static int getIndex(List orderList, T unit) { + for (int i = 0; i < orderList.size(); i++) { + if (orderList.get(i) == unit) { + return i; + } + } + return 0; + } + + /** + * 初始化计量单位列表 + */ + private static List init(int ... metrics) { + List list = new LinkedList<>(); + for (int metric : metrics) { + list.add(new BigDecimal(metric)); + } + return list; + } + + /** + * 将传入磁盘数据的大数/小数等等转换为易读的形式 + * 易读的标准是可以展示为某一单位区间内的大于1的数,自动取两位小数 + * + * @param data 传入的初始数值 + * @param unit 传入数值的单位 + * @return 转换过后的易读字符串,带单位 + */ + public static String readable(Number data, DateUnitEnum unit) { + BigDecimal data1 = new BigDecimal(data.toString()); + // 获取初始参数的索引值 + int index = getIndex(DateUnitEnum.orderList, unit); + // 判定传入参数在当前单位下,是否超出其数值区间 + if (data.longValue() < DateUnitEnum.metric.get(DateUnitEnum.orderList.indexOf(unit)).longValue() || unit == DateUnitEnum.orderList.get(DateUnitEnum.orderList.size() - 1)) { + // 判定传入的数值是否小于1,如果小于1,则进入 + if (data.longValue() < 1 && unit != DateUnitEnum.orderList.get(0)) { + // 对小于1的参数进行放大,向上进一位:数值放大相应进制,进制下调一位 + return readable(data1.multiply(DateUnitEnum.metric.get(index - 1)), DateUnitEnum.orderList.get(index - 1)); + } + // 如果是本单位区间的大于1的值,进行返回处理 + return data1.divide(new BigDecimal(1), 2, RoundingMode.HALF_UP) + unit.toString(); + } + // 超出了当前单位的取值范围 + else { + // 对数值升位:数值除以相应的进制,单位上调一位 + return readable(data1.divide(DateUnitEnum.metric.get(index), 2, RoundingMode.HALF_UP), DateUnitEnum.orderList.get(index + 1)); + } + } + + /** + * 将磁盘数据大小从一种单位转换为传入的单位 + * + * @param data 输入的初始参数 + * @param fromUnit 输入的初始参数的单位 + * @param toUnit 要转换的目标单位 + */ + public static String format(Number data, DateUnitEnum fromUnit, DateUnitEnum toUnit) { + BigDecimal data1 = new BigDecimal(data.toString()); + // 获取初始参数的索引值 + int index = getIndex(DateUnitEnum.orderList, fromUnit); + // 判别初始参数索引是否高于目标参数索引 + if (DateUnitEnum.orderList.indexOf(fromUnit) > DateUnitEnum.orderList.indexOf(toUnit)) { + // 递归调用方法,对参数放大相应进制倍数,将单位下调一位 + return format(data1.multiply(DateUnitEnum.metric.get(index - 1)), DateUnitEnum.orderList.get(index - 1), toUnit); + // 判别初始参数索引是否低于目标参数索引 + } else if (DateUnitEnum.orderList.indexOf(fromUnit) < DateUnitEnum.orderList.indexOf(toUnit)) { + // 递归调用方法,对参数缩小相应进制倍数,将单位上调一位 + return format(data1.divide(DateUnitEnum.metric.get(index), 2, RoundingMode.HALF_UP), DateUnitEnum.orderList.get(index + 1), toUnit); + // 取得fromUnit与toUnit的索引值相同的情况 + } else { + // 进行数据处理,返回相应结果 + return data1.divide(new BigDecimal(1), 2, RoundingMode.HALF_UP) + fromUnit.toString(); + } + } + + /** + * 将传入时间的大数/小数等等转换为易读的形式 + * 易读的标准是可以展示为某一单位区间内的大于1的数,自动取两位小数 + * + * @param 
data 传入的初始数值 + * @param unit 传入数值的单位 + * @return 转换过后的易读字符串,带单位 + */ + public static String readable(Number data, TimeUnitEnum unit) { + BigDecimal data1 = new BigDecimal(data.toString()); + // 获取初始参数的索引值 + int index = getIndex(TimeUnitEnum.orderList, unit); + // 判定传入参数在当前单位下,是否超出其数值区间 + if (data.longValue() < TimeUnitEnum.metric.get(TimeUnitEnum.orderList.indexOf(unit)).longValue() || unit == TimeUnitEnum.orderList.get(TimeUnitEnum.orderList.size() - 1)) { + // 判定传入的数值是否小于1,如果小于1,则进入 + if (data.longValue() < 1 && unit != TimeUnitEnum.orderList.get(0)) { + // 对小于1的参数进行放大,向上进一位:数值放大相应进制,进制下调一位 + return readable(data1.multiply(TimeUnitEnum.metric.get(index - 1)), TimeUnitEnum.orderList.get(index - 1)); + } + // 如果是本单位区间的大于1的值,进行返回处理 + return data1.divide(new BigDecimal(1), 2, RoundingMode.HALF_UP) + unit.toString().toLowerCase(); + } + // 超出了当前单位的取值范围 + else { + // 对数值升位:数值除以相应的进制,单位上调一位 + return readable(data1.divide(TimeUnitEnum.metric.get(index), 2, RoundingMode.HALF_UP), TimeUnitEnum.orderList.get(index + 1)); + } + } + + /** + * 将时间从一种单位转换为传入的单位 + * + * @param data 输入的初始参数 + * @param fromUnit 输入的初始参数的单位 + * @param toUnit 要转换的目标单位 + */ + public static String format(Number data, TimeUnitEnum fromUnit, TimeUnitEnum toUnit) { + BigDecimal data1 = new BigDecimal(data.toString()); + // 获取初始参数的索引值 + int index = getIndex(TimeUnitEnum.orderList, fromUnit); + // 判别初始参数索引是否高于目标参数索引 + if (TimeUnitEnum.orderList.indexOf(fromUnit) > TimeUnitEnum.orderList.indexOf(toUnit)) { + // 递归调用方法,对参数放大相应进制倍数,将单位下调一位 + return format(data1.multiply(TimeUnitEnum.metric.get(index - 1)), TimeUnitEnum.orderList.get(index - 1), toUnit); + // 判别初始参数索引是否低于目标参数索引 + } else if (TimeUnitEnum.orderList.indexOf(fromUnit) < TimeUnitEnum.orderList.indexOf(toUnit)) { + // 递归调用方法,对参数缩小相应进制倍数,将单位上调一位 + return format(data1.divide(TimeUnitEnum.metric.get(index), 2, RoundingMode.HALF_UP), TimeUnitEnum.orderList.get(index + 1), toUnit); + // 取得fromUnit与toUnit的索引值相同的情况 + } else { + // 进行数据处理,返回相应结果 + return data1.divide(new BigDecimal(1), 2, RoundingMode.HALF_UP) + fromUnit.toString(); + } + } + +} diff --git a/fire-common/src/main/java/com/zto/fire/common/util/YarnUtils.java b/fire-common/src/main/java/com/zto/fire/common/util/YarnUtils.java new file mode 100644 index 0000000..92d54b9 --- /dev/null +++ b/fire-common/src/main/java/com/zto/fire/common/util/YarnUtils.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
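A minimal usage sketch for the UnitFormatUtils class above; the numbers and demo class name are illustrative.

import com.zto.fire.common.util.UnitFormatUtils;
import com.zto.fire.common.util.UnitFormatUtils.DateUnitEnum;
import com.zto.fire.common.util.UnitFormatUtils.TimeUnitEnum;

public class UnitFormatDemo {
    public static void main(String[] args) {
        // 1536 MB is promoted one unit and printed with two decimals: 1.50GB
        System.out.println(UnitFormatUtils.readable(1536, DateUnitEnum.MB));
        // explicit conversion: 2 GB expressed in KB -> 2097152.00KB
        System.out.println(UnitFormatUtils.format(2, DateUnitEnum.GB, DateUnitEnum.KB));
        // 90 seconds reads as 1.50min
        System.out.println(UnitFormatUtils.readable(90, TimeUnitEnum.S));
    }
}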
+ */ + +package com.zto.fire.common.util; + +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * Yarn相关工具类 + * @author ChengLong 2018年8月10日 16:03:29 + */ +public class YarnUtils { + + private YarnUtils() {} + /** + * 使用正则提取日志中的applicationId + * @param log + * @return + */ + public static String getAppId(String log) { + // 正则表达式规则 + String regEx = "application_[0-9]+_[0-9]+"; + // 编译正则表达式 + Pattern pattern = Pattern.compile(regEx); + // 忽略大小写的写法 + Matcher matcher = pattern.matcher(log); + // 查找字符串中是否有匹配正则表达式的字符/字符串 + if(matcher.find()) { + return matcher.group(); + } else { + return ""; + } + } +} diff --git a/fire-common/src/main/resources/log4j.properties b/fire-common/src/main/resources/log4j.properties new file mode 100644 index 0000000..e7faf53 --- /dev/null +++ b/fire-common/src/main/resources/log4j.properties @@ -0,0 +1,32 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +log4j.rootLogger = INFO, stdout, D + +### 输出到控制台 ### +log4j.appender.stdout = org.apache.log4j.ConsoleAppender +log4j.appender.stdout.Target = System.out +log4j.appender.stdout.layout = org.apache.log4j.PatternLayout +log4j.appender.stdout.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss.SSS} [%thread]-[%p]-[%c] %m%n + +### 输出到日志文件 ### +log4j.appender.D = org.apache.log4j.DailyRollingFileAppender +log4j.appender.D.File = ./fire.log +log4j.appender.D.Append = true +log4j.appender.D.Threshold = INFO +log4j.appender.D.layout = org.apache.log4j.PatternLayout +log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss.SSS} [%thread]-[%p]-[%c]-[%l] %m%n \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FireConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireConf.scala new file mode 100644 index 0000000..c827298 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireConf.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
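A minimal usage sketch for the YarnUtils class above; the log line (including the applicationId embedded in it) and the demo class name are made up for illustration.

import com.zto.fire.common.util.YarnUtils;

public class YarnUtilsDemo {
    public static void main(String[] args) {
        String log = "INFO Client: Submitted application application_1602213442141_0042 to ResourceManager";
        // extracts the first applicationId-shaped token, or "" when none is present
        System.out.println(YarnUtils.getAppId(log));                       // application_1602213442141_0042
        System.out.println("'" + YarnUtils.getAppId("no id here") + "'");  // ''
    }
}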
+ */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.PropUtils + +/** + * 常量配置类 + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 15:00 + */ +private[fire] class FireConf { + // 用于区分不同的流计算引擎类型 + private[fire] lazy val engine = PropUtils.engine + + // Fire框架相关配置 + val frameworkConf = FireFrameworkConf + // kafka相关配置 + val kafkaConf = FireKafkaConf + // rocketMQ相关配置 + val rocketMQConf = FireRocketMQConf + // impala相关配置 + val kuduConf = FireKuduConf + // 颜色预定义 + val ps1Conf = FirePS1Conf + // hive相关配置 + val hiveConf = FireHiveConf +} + +object FireConf extends FireConf \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FireFrameworkConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireFrameworkConf.scala new file mode 100644 index 0000000..71be803 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireFrameworkConf.scala @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.PropUtils +import org.apache.commons.lang3.StringUtils + +/** + * Fire框架相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:54 + */ +private[fire] object FireFrameworkConf { + // fire版本号 + lazy val FIRE_VERSION = "fire.version" + lazy val DRIVER_CLASS_NAME = "driver.class.name" + // fire内置线程池大小 + lazy val FIRE_THREAD_POOL_SIZE = "fire.thread.pool.size" + // fire内置定时任务线程池大小 + lazy val FIRE_THREAD_POOL_SCHEDULE_SIZE = "fire.thread.pool.schedule.size" + // 是否启用fire框架restful服务 + lazy val FIRE_REST_ENABLE = "fire.rest.enable" + lazy val FIRE_REST_URL_HOSTNAME = "fire.rest.url.hostname" + lazy val FIRE_CONF_DEPLOY_ENGINE = "fire.conf.deploy.engine" + lazy val FIRE_ENGINE_CONF_HELPER = "com.zto.fire.core.conf.EngineConfHelper" + // rest接口权限认证 + lazy val FIRE_REST_FILTER_ENABLE = "fire.rest.filter.enable" + // 用于配置是否关闭fire内置的所有累加器 + lazy val FIRE_ACC_ENABLE = "fire.acc.enable" + // 日志累加器开关 + lazy val FIRE_ACC_LOG_ENABLE = "fire.acc.log.enable" + // 多值累加器开关 + lazy val FIRE_ACC_MULTI_COUNTER_ENABLE = "fire.acc.multi.counter.enable" + // 多时间维度累加器开关 + lazy val FIRE_ACC_MULTI_TIMER_ENABLE = "fire.acc.multi.timer.enable" + // env累加器开关 + lazy val FIRE_ACC_ENV_ENABLE = "fire.acc.env.enable" + // fire框架埋点日志开关,当关闭后,埋点的日志将不再被记录到日志累加器中,并且也不再打印 + lazy val FIRE_LOG_ENABLE = "fire.log.enable" + // 用于限定fire框架中sql日志的字符串长度 + lazy val FIRE_LOG_SQL_LENGTH = "fire.log.sql.length" + // fire框架rest接口服务最大线程数 + lazy val FIRE_RESTFUL_MAX_THREAD = "fire.restful.max.thread" + lazy val FIRE_CONNECTOR_SHUTDOWN_HOOK_ENABLE = "fire.connector.shutdown_hook.enable" + // 用于配置是否抛弃配置中心独立运行 + lazy val FIRE_CONFIG_CENTER_ENABLE = "fire.config_center.enable" + // 本地运行环境下(Windows、Mac)是否调用配置中心接口获取配置信息 + lazy 
val FIRE_CONFIG_CENTER_LOCAL_ENABLE = "fire.config_center.local.enable" + // 配置中心接口调用秘钥 + lazy val FIRE_CONFIG_CENTER_SECRET = "fire.config_center.register.conf.secret" + // fire框架restful端口冲突重试次数 + lazy val FIRE_RESTFUL_PORT_RETRY_NUM = "fire.restful.port.retry_num" + // fire框架restful端口冲突重试时间(ms) + lazy val FIRE_RESTFUL_PORT_RETRY_DURATION = "fire.restful.port.retry_duration" + lazy val FIRE_REST_SERVER_SECRET = "fire.rest.server.secret" + lazy val FIRE_LOG_LEVEL_CONF_PREFIX = "fire.log.level.conf." + lazy val FIRE_USER_COMMON_CONF = "fire.user.common.conf" + // 日志记录器保留最少的记录数 + lazy val FIRE_ACC_LOG_MIN_SIZE = "fire.acc.log.min.size" + // 日志记录器保留最多的记录数 + lazy val FIRE_ACC_LOG_MAX_SIZE = "fire.acc.log.max.size" + // env累加器保留最多的记录数 + lazy val FIRE_ACC_ENV_MAX_SIZE = "fire.acc.env.max.size" + // env累加器保留最少的记录数 + lazy val FIRE_ACC_ENV_MIN_SIZE = "fire.acc.env.min.size" + // timer累加器保留最大的记录数 + lazy val FIRE_ACC_TIMER_MAX_SIZE = "fire.acc.timer.max.size" + // timer累加器清理几小时之前的记录 + lazy val FIRE_ACC_TIMER_MAX_HOUR = "fire.acc.timer.max.hour" + // 定时调度任务黑名单(定时任务方法名),以逗号分隔 + lazy val FIRE_SCHEDULER_BLACKLIST = "fire.scheduler.blacklist" + // 用于配置是否启用任务定时调度 + lazy val FIRE_TASK_SCHEDULE_ENABLE = "fire.task.schedule.enable" + // quartz最大线程池大小 + lazy val FIRE_QUARTZ_MAX_THREAD = "fire.quartz.max.thread" + // fire框架restful地址 + lazy val FIRE_REST_URL = "fire.rest.url" + lazy val FIRE_SHUTDOWN_EXIT = "fire.shutdown.auto.exit" + // 配置中心生产环境注册地址 + lazy val FIRE_CONFIG_CENTER_REGISTER_CONF_PROD_ADDRESS = "fire.config_center.register.conf.prod.address" + // 配置中心测试环境注册地址 + lazy val FIRE_CONFIG_CENTER_REGISTER_CONF_TEST_ADDRESS = "fire.config_center.register.conf.test.address" + // 配置打印黑名单,配置项以逗号分隔 + lazy val FIRE_CONF_PRINT_BLACKLIST = "fire.conf.print.blacklist" + // 是否启用动态配置功能 + lazy val FIRE_DYNAMIC_CONF_ENABLE = "fire.dynamic.conf.enable" + // 是否打印配置信息 + lazy val FIRE_CONF_SHOW_ENABLE = "fire.conf.show.enable" + // 是否将fire restful地址以日志形式打印 + lazy val FIRE_REST_URL_SHOW_ENABLE = "fire.rest.url.show.enable" + lazy val SPARK_STREAMING_CONF_FILE = "spark-streaming" + lazy val SPARK_STRUCTURED_STREAMING_CONF_FILE = "structured-streaming" + lazy val SPARK_CORE_CONF_FILE = "spark-core" + lazy val FLINK_CONF_FILE = "flink" + lazy val FLINK_STREAMING_CONF_FILE = "flink-streaming" + lazy val FLINK_BATCH_CONF_FILE = "flink-batch" + lazy val FIRE_DEPLOY_CONF_ENABLE = "fire.deploy_conf.enable" + lazy val FIRE_EXCEPTION_BUS_SIZE = "fire.exception_bus.size" + lazy val FIRE_BURIED_POINT_DATASOURCE_ENABLE = "fire.buried_point.datasource.enable" + lazy val FIRE_BURIED_POINT_DATASOURCE_MAX_SIZE = "fire.buried_point.datasource.max.size" + lazy val FIRE_BURIED_POINT_DATASOURCE_INITIAL_DELAY = "fire.buried_point.datasource.initialDelay" + lazy val FIRE_BURIED_POINT_DATASOURCE_PERIOD = "fire.buried_point.datasource.period" + lazy val FIRE_BURIED_POINT_DATASOURCE_MAP = "fire.buried_point.datasource.map." + lazy val FIRE_CONF_ADAPTIVE_PREFIX = "fire.conf.adaptive.prefix" + + /** + * 用于jdbc url的识别,当无法通过driver class识别数据源时,将从url中的端口号进行区分 + * 不同数据配置使用统一的前缀:fire.buried_point.datasource.map. 
+ */ + def buriedPointDatasourceMap: Map[String, String] = PropUtils.sliceKeys(this.FIRE_BURIED_POINT_DATASOURCE_MAP) + // 获取当前任务的rest server访问地址 + lazy val fireRestUrl = PropUtils.getString(this.FIRE_REST_URL, "") + // 是否启用hostname作为fire rest url + lazy val restUrlHostname = PropUtils.getBoolean(this.FIRE_REST_URL_HOSTNAME, false) + // 不同引擎配置获取具体的实现 + lazy val confDeployEngine = PropUtils.getString(this.FIRE_CONF_DEPLOY_ENGINE, "") + // 定时解析埋点SQL的执行频率(s) + lazy val buriedPointDatasourcePeriod = PropUtils.getInt(this.FIRE_BURIED_POINT_DATASOURCE_PERIOD, 60) + // 定时解析埋点SQL的初始延迟(s) + lazy val buriedPointDatasourceInitialDelay = PropUtils.getInt(this.FIRE_BURIED_POINT_DATASOURCE_INITIAL_DELAY, 30) + // 用于存放埋点的队列最大大小,超过该大小将会被丢弃 + lazy val buriedPointDatasourceMaxSize = PropUtils.getInt(this.FIRE_BURIED_POINT_DATASOURCE_MAX_SIZE, 300) + // 是否开启数据源埋点 + lazy val buriedPointDatasourceEnable = PropUtils.getBoolean(this.FIRE_BURIED_POINT_DATASOURCE_ENABLE, true) + // 每个jvm实例内部queue用于存放异常对象数最大大小,避免队列过大造成内存溢出 + lazy val exceptionBusSize = PropUtils.getInt(this.FIRE_EXCEPTION_BUS_SIZE, 1000) + // 是否将配置同步到executor、taskmanager端 + lazy val deployConf = PropUtils.getBoolean(this.FIRE_DEPLOY_CONF_ENABLE, true) + // fire内置线程池大小 + lazy val threadPoolSize = PropUtils.getInt(this.FIRE_THREAD_POOL_SIZE, 5) + // fire内置定时任务线程池大小 + lazy val threadPoolSchedulerSize = PropUtils.getInt(this.FIRE_THREAD_POOL_SCHEDULE_SIZE, 5) + // 自适应前缀,调用getOriginalProperty避免栈溢出 + lazy val adaptivePrefix = PropUtils.getOriginalProperty(this.FIRE_CONF_ADAPTIVE_PREFIX).toBoolean + // 用户公共配置文件列表 + lazy val userCommonConf = PropUtils.getString(this.FIRE_USER_COMMON_CONF, "").split(",").map(conf => conf.trim).toList + // fire接口认证秘钥 + lazy val restServerSecret = PropUtils.getString(this.FIRE_REST_SERVER_SECRET) + // 用于配置是否在调用shutdown后主动退出jvm进程 + lazy val shutdownExit = PropUtils.getBoolean(this.FIRE_SHUTDOWN_EXIT, false) + // 是否启用为connector注册shutdown hook,当jvm退出前close + lazy val connectorShutdownHookEnable = PropUtils.getBoolean(this.FIRE_CONNECTOR_SHUTDOWN_HOOK_ENABLE, false) + + // fire日志打印黑名单 + lazy val fireConfBlackList: Set[String] = { + val blacklist = PropUtils.getString(this.FIRE_CONF_PRINT_BLACKLIST, "") + if (StringUtils.isNotBlank(blacklist)) blacklist.split(",").toSet else Set.empty + } + + // 获取driver的class name + lazy val driverClassName = PropUtils.getString(this.DRIVER_CLASS_NAME) + // 是否打印配置信息 + lazy val fireConfShow: Boolean = PropUtils.getBoolean(this.FIRE_CONF_SHOW_ENABLE, true) + // 是否将restful地址以日志方式打印 + lazy val fireRestUrlShow: Boolean = PropUtils.getBoolean(this.FIRE_REST_URL_SHOW_ENABLE, false) + // 获取动态配置参数 + lazy val dynamicConf: Boolean = PropUtils.getBoolean(this.FIRE_DYNAMIC_CONF_ENABLE, true) + + // 用于获取fire版本号 + lazy val fireVersion = PropUtils.getString(this.FIRE_VERSION, "1.0.0") + // quartz最大线程池大小 + lazy val quartzMaxThread = PropUtils.getString(this.FIRE_QUARTZ_MAX_THREAD, "8") + // 用于设置是否启用任务定时调度 + lazy val scheduleEnable = PropUtils.getBoolean(this.FIRE_TASK_SCHEDULE_ENABLE, true) + // 定时任务黑名单,配置的value为方法名,多个以逗号分隔 + def schedulerBlackList: String = PropUtils.getString(this.FIRE_SCHEDULER_BLACKLIST, "") + // env累加器开关 + lazy val accEnvEnable = PropUtils.getBoolean(this.FIRE_ACC_ENV_ENABLE, true) + // 是否启用Fire内置的restful服务 + lazy val restEnable = PropUtils.getBoolean(this.FIRE_REST_ENABLE, true) + // rest接口权限认证 + lazy val restFilter = PropUtils.getBoolean(this.FIRE_REST_FILTER_ENABLE, true) + // 是否关闭fire内置的所有累加器 + lazy val accEnable = PropUtils.getBoolean(this.FIRE_ACC_ENABLE, true) + // 日志累加器开关 + lazy val 
accLogEnable = PropUtils.getBoolean(this.FIRE_ACC_LOG_ENABLE, true) + // 多值累加器开关 + lazy val accMultiCounterEnable = PropUtils.getBoolean(this.FIRE_ACC_MULTI_COUNTER_ENABLE, true) + // 多时间维度累加器开关 + lazy val accMultiTimerEnable = PropUtils.getBoolean(this.FIRE_ACC_MULTI_TIMER_ENABLE, true) + // fire框架埋点日志开关 + lazy val logEnable = PropUtils.getBoolean(this.FIRE_LOG_ENABLE, true) + // 用于限定fire框架中sql日志的字符串长度 + lazy val logSqlLength = PropUtils.getInt(this.FIRE_LOG_SQL_LENGTH, 50) + // 配置中心生产环境注册地址 + lazy val configCenterProdAddress = PropUtils.getString(this.FIRE_CONFIG_CENTER_REGISTER_CONF_PROD_ADDRESS, "") + // 配置中心测试环境注册地址 + lazy val configCenterTestAddress = PropUtils.getString(this.FIRE_CONFIG_CENTER_REGISTER_CONF_TEST_ADDRESS) + + + // fire框架rest接口服务最大线程数 + lazy val restfulMaxThread = PropUtils.getInt(this.FIRE_RESTFUL_MAX_THREAD, 8) + // 用于配置是否抛弃配置中心独立运行 + lazy val configCenterEnable = PropUtils.getBoolean(this.FIRE_CONFIG_CENTER_ENABLE, true) + // 本地运行环境下(Windows、Mac)是否调用配置中心接口获取配置信息 + lazy val configCenterLocalEnable = PropUtils.getBoolean(this.FIRE_CONFIG_CENTER_LOCAL_ENABLE, false) + // 配置中心接口调用秘钥 + lazy val configCenterSecret = PropUtils.getString(this.FIRE_CONFIG_CENTER_SECRET, "") + // fire框架restful端口冲突重试次数 + lazy val restfulPortRetryNum = PropUtils.getInt(this.FIRE_RESTFUL_PORT_RETRY_NUM, 3) + // fire框架restful端口冲突重试时间(ms) + lazy val restfulPortRetryDuration = PropUtils.getLong(this.FIRE_RESTFUL_PORT_RETRY_DURATION, 1000L) + // 用于限定日志最少保存量,防止当日志量达到maxLogSize时频繁的进行clear操作 + lazy val minLogSize = PropUtils.getInt(this.FIRE_ACC_LOG_MIN_SIZE, 500).abs + // 用于限定日志最大保存量,防止日志量过大,撑爆driver + lazy val maxLogSize = PropUtils.getInt(this.FIRE_ACC_LOG_MAX_SIZE, 1000).abs + // 用于限定运行时信息最少保存量,防止当运行时信息量达到maxEnvSize时频繁的进行clear操作 + lazy val minEnvSize = PropUtils.getInt(this.FIRE_ACC_ENV_MIN_SIZE, 100).abs + // 用于限定运行时信息最大保存量,防止过大撑爆driver + lazy val maxEnvSize = PropUtils.getInt(this.FIRE_ACC_ENV_MAX_SIZE, 500).abs + // 用于限定最大保存量,防止数据量过大,撑爆driver + lazy val maxTimerSize = PropUtils.getInt(this.FIRE_ACC_TIMER_MAX_SIZE, 1000).abs + // 用于指定清理指定小时数之前的记录 + lazy val maxTimerHour = PropUtils.getInt(this.FIRE_ACC_TIMER_MAX_HOUR, 12).abs +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FireHDFSConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireHDFSConf.scala new file mode 100644 index 0000000..8fb16b6 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireHDFSConf.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.PropUtils + +/** + * HDFS配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 15:07 + */ +private[fire] object FireHDFSConf { + // 是否启用高可用 + lazy val HDFS_HA = "hdfs.ha.enable" + lazy val HDFS_HA_PREFIX = "hdfs.ha.conf." + + + // 配置是否启用hdfs HA + lazy val hdfsHAEnable = PropUtils.getBoolean(this.HDFS_HA, true) + + /** + * 读取HDFS高可用相关配置信息 + */ + def hdfsHAConf: Map[String, String] = { + if (FireHDFSConf.hdfsHAEnable) { + PropUtils.sliceKeys(s"${this.HDFS_HA_PREFIX}${FireHiveConf.hiveCluster}.") + } else Map.empty + } +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FireHiveConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireHiveConf.scala new file mode 100644 index 0000000..37f1ba7 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireHiveConf.scala @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.PropUtils + +/** + * hive相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 15:02 + */ +private[fire] object FireHiveConf { + lazy val HIVE_CLUSTER = "hive.cluster" + // hive版本号 + lazy val HIVE_VERSION = "hive.version" + // hive的catalog名称 + lazy val HIVE_CATALOG_NAME = "hive.catalog.name" + lazy val HIVE_CLUSTER_MAP_PREFIX = "fire.hive.cluster.map." + lazy val HIVE_SITE_PATH_MAP_PREFIX = "fire.hive.site.path.map." + lazy val HIVE_CONF_PREFIX = "hive.conf." 
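+
+  /*
+   * Usage sketch for the prefix-based hive configuration above. The property values are
+   * illustrative only, and it is assumed that PropUtils.sliceKeys returns the sub-map of
+   * properties starting with the given prefix, with that prefix stripped:
+   *
+   *   hive.cluster                          = batch
+   *   fire.hive.cluster.map.batch           = thrift://hive-metastore:9083   <- hypothetical metastore address
+   *   hive.conf.hive.exec.dynamic.partition = true
+   *
+   *   FireHiveConf.getMetastoreUrl  // resolves the "batch" alias to "thrift://hive-metastore:9083"
+   *   FireHiveConf.hiveConfMap      // Map("hive.exec.dynamic.partition" -> "true"), used as hive set-options
+   */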
+ // 默认的库名 + lazy val DEFAULT_DATABASE_NAME = "fire.hive.default.database.name" + // 默认的数据库名称 + lazy val dbName = "tmp" + // 默认的分区名称 + lazy val DEFAULT_TABLE_PARTITION_NAME = "fire.hive.table.default.partition.name" + // 默认的partition名称 + lazy val defaultPartitionName = "ds" + + // hive集群标识(batch/streaming/test) + lazy val hiveCluster = PropUtils.getString(this.HIVE_CLUSTER, "") + // 初始化hive集群名称与metastore映射 + private lazy val hiveMetastoreMap = PropUtils.sliceKeys(this.HIVE_CLUSTER_MAP_PREFIX) + // hive-site.xml存放路径映射 + private lazy val hiveSiteMap = PropUtils.sliceKeys(this.HIVE_SITE_PATH_MAP_PREFIX) + // hive版本号 + lazy val hiveVersion = PropUtils.getString(this.HIVE_VERSION, "1.1.0") + // hive catalog名称 + lazy val hiveCatalogName = PropUtils.getString(this.HIVE_CATALOG_NAME, "hive") + // hive的set配置,如:this.spark.sql("set hive.exec.dynamic.partition=true") + lazy val hiveConfMap = PropUtils.sliceKeys(this.HIVE_CONF_PREFIX) + lazy val defaultDB = PropUtils.getString(this.DEFAULT_DATABASE_NAME, this.dbName) + lazy val partitionName = PropUtils.getString(this.DEFAULT_TABLE_PARTITION_NAME, this.defaultPartitionName) + + /** + * 根据hive集群名称获取metastore地址 + */ + def getMetastoreUrl: String = { + this.hiveMetastoreMap.getOrElse(hiveCluster, hiveCluster) + } + + /** + * 获取hive-site.xml的存放路径 + * + * @return + * /path/to/hive-site.xml + */ + def getHiveConfDir: String = { + this.hiveSiteMap.getOrElse(hiveCluster, hiveCluster) + } +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FireKafkaConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireKafkaConf.scala new file mode 100644 index 0000000..9413f8b --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireKafkaConf.scala @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.{PropUtils, StringsUtils} +import org.apache.commons.lang3.StringUtils + +/** + * kafka相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:58 + */ +private[fire] object FireKafkaConf { + lazy val offsetLargest = "latest" + lazy val offsetSmallest = "earliest" + lazy val offsetNone = "none" + lazy val clusterMapConfStart = "fire.kafka.cluster.map." + lazy val kafkaConfStart = "kafka.conf." 
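+
+  /*
+   * Usage sketch for the alias and prefix conventions above. The property values are
+   * illustrative only; it is assumed that keyNum selects the N-th configuration group and
+   * that kafka.conf.* keys are handed to the kafka client with the prefix stripped:
+   *
+   *   fire.kafka.cluster.map.test = broker01:9092,broker02:9092   <- hypothetical broker list
+   *   kafka.brokers.name          = test
+   *   kafka.topics                = fire_topic
+   *   kafka.conf.max.poll.records = 500
+   *
+   *   FireKafkaConf.kafkaBrokers()  // "broker01:9092,broker02:9092" (falls back to the raw value if no alias matches)
+   *   FireKafkaConf.kafkaTopics()   // "fire_topic"
+   *   FireKafkaConf.kafkaConfMap()  // Map("max.poll.records" -> "500"), passed through to the kafka client
+   */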
+ lazy val KAFKA_BROKERS_NAME = "kafka.brokers.name" + // kafka的topic列表,以逗号分隔 + lazy val KAFKA_TOPICS = "kafka.topics" + // group.id + lazy val KAFKA_GROUP_ID = "kafka.group.id" + // kafka起始消费位点 + lazy val KAFKA_STARTING_OFFSET = "kafka.starting.offsets" + // kafka结束消费位点 + lazy val KAFKA_ENDING_OFFSET = "kafka.ending.offsets" + // 是否自动维护offset + lazy val KAFKA_ENABLE_AUTO_COMMIT = "kafka.enable.auto.commit" + // 丢失数据是否失败 + lazy val KAFKA_FAIL_ON_DATA_LOSS = "kafka.failOnDataLoss" + // kafka session超时时间 + lazy val KAFKA_SESSION_TIMEOUT_MS = "kafka.session.timeout.ms" + // kafka request超时时间 + lazy val KAFKA_REQUEST_TIMEOUT_MS = "kafka.request.timeout.ms" + lazy val KAFKA_MAX_POLL_INTERVAL_MS = "kafka.max.poll.interval.ms" + lazy val KAFKA_COMMIT_OFFSETS_ON_CHECKPOINTS = "kafka.CommitOffsetsOnCheckpoints" + lazy val KAFKA_START_FROM_TIMESTAMP = "kafka.StartFromTimestamp" + lazy val KAFKA_START_FROM_GROUP_OFFSETS = "kafka.StartFromGroupOffsets" + + // 初始化kafka集群名称与地址映射 + private[fire] lazy val kafkaMap = PropUtils.sliceKeys(clusterMapConfStart) + + // kafka消费起始位点 + def kafkaStartingOffset(keyNum: Int = 1): String = PropUtils.getString(this.KAFKA_STARTING_OFFSET, "", keyNum) + + // kafka消费结束位点 + def kafkaEndingOffsets(keyNum: Int = 1): String = PropUtils.getString(this.KAFKA_ENDING_OFFSET, "", keyNum) + + // 丢失数据时是否失败 + def kafkaFailOnDataLoss(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.KAFKA_FAIL_ON_DATA_LOSS, true, keyNum) + + // enable.auto.commit + def kafkaEnableAutoCommit(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.KAFKA_ENABLE_AUTO_COMMIT, false, keyNum) + + // 获取topic列表 + def kafkaTopics(keyNum: Int = 1): String = PropUtils.getString(this.KAFKA_TOPICS, "", keyNum) + + // kafka session超时时间,默认5分钟 + def kafkaSessionTimeOut(keyNum: Int = 1): java.lang.Integer = PropUtils.getInt(this.KAFKA_SESSION_TIMEOUT_MS, 300000, keyNum) + + // kafka request超时时间,默认10分钟 + def kafkaPollInterval(keyNum: Int = 1): java.lang.Integer = PropUtils.getInt(this.KAFKA_MAX_POLL_INTERVAL_MS, 600000, keyNum) + + // kafka request超时时间 + def kafkaRequestTimeOut(keyNum: Int = 1): java.lang.Integer = PropUtils.getInt(this.KAFKA_REQUEST_TIMEOUT_MS, 400000, keyNum) + + // 配置文件中的groupId + def kafkaGroupId(keyNum: Int = 1): String = PropUtils.getString(this.KAFKA_GROUP_ID, "", keyNum) + + // 是否在checkpoint时记录offset值 + def kafkaCommitOnCheckpoint(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.KAFKA_COMMIT_OFFSETS_ON_CHECKPOINTS, true, keyNum) + + // 设置从指定时间戳位置开始消费kafka + def kafkaStartFromTimeStamp(keyNum: Int = 1): java.lang.Long = PropUtils.getLong(this.KAFKA_START_FROM_TIMESTAMP, 0L, keyNum) + + // 从topic中指定的group上次消费的位置开始消费,必须配置group.id参数 + def kafkaStartFromGroupOffsets(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.KAFKA_START_FROM_GROUP_OFFSETS, false, keyNum) + + // kafka-client配置信息 + def kafkaConfMap(keyNum: Int = 1): collection.immutable.Map[String, String] = PropUtils.sliceKeysByNum(kafkaConfStart, keyNum) + + def kafkaConfMapWithType(keyNum: Int = 1): collection.immutable.Map[String, Object] = { + val map = new collection.mutable.HashMap[String, Object]() + this.kafkaConfMap(keyNum).foreach(kv => { + map.put(kv._1, StringsUtils.parseString(kv._2)) + }) + map.toMap + } + + /** + * 根据名称获取kafka broker地址 + */ + def kafkaBrokers(keyNum: Int = 1): String = { + val brokerName = PropUtils.getString(this.KAFKA_BROKERS_NAME, "", keyNum) + this.kafkaMap.getOrElse(brokerName, brokerName) + } +} \ No newline at end of file diff --git 
a/fire-common/src/main/scala/com/zto/fire/common/conf/FireKuduConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireKuduConf.scala new file mode 100644 index 0000000..7402eed --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireKuduConf.scala @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.PropUtils + +/** + * kudu & impala相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:59 + */ +private[fire] object FireKuduConf { + lazy val KUDU_MASTER_URL = "kudu.master" + lazy val IMPALA_CONNECTION_URL_KEY = "impala.connection.url" + lazy val IMPALA_JDBC_DRIVER_NAME_KEY = "impala.jdbc.driver.class.name" + lazy val IMPALA_DAEMONS_URL = "impala.daemons.url" + + lazy val kuduMaster = PropUtils.getString(this.KUDU_MASTER_URL) + lazy val impalaConnectionUrl: String = PropUtils.getString(this.IMPALA_CONNECTION_URL_KEY) + lazy val impalaJdbcDriverName: String = PropUtils.getString(this.IMPALA_JDBC_DRIVER_NAME_KEY) + lazy val impalaDaemons: String = PropUtils.getString(this.IMPALA_DAEMONS_URL, "") +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FirePS1Conf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FirePS1Conf.scala new file mode 100644 index 0000000..e90a00b --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FirePS1Conf.scala @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.conf + +/** + * 颜色预定义 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 15:01 + */ +private[fire] object FirePS1Conf { + // 颜色相关 + lazy val GREEN = "\u001B[32m" + lazy val DEFAULT = "\u001B[0m" + lazy val RED = "\u001B[31m" + lazy val YELLOW = "\u001B[33m" + lazy val BLUE = "\u001B[34m" + lazy val PURPLE = "\u001B[35m" + lazy val PINK = "\u001B[35m" + // 字体相关 + lazy val HIGH_LIGHT = "\u001B[1m" + lazy val ITALIC = "\u001B[3m" + lazy val UNDER_LINE = "\u001B[4m" + lazy val FLICKER = "\u001B[5m" + + /** + * 包裹处理 + * + * @param str + * 原字符串 + * @param ps1 + * ps1 + * @return + * wrap后的字符串 + */ + def wrap(str: String, ps1: String*): String = { + val printStr = new StringBuilder() + ps1.foreach(ps => { + printStr.append(ps) + }) + printStr.append(str + DEFAULT).toString() + } +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/conf/FireRocketMQConf.scala b/fire-common/src/main/scala/com/zto/fire/common/conf/FireRocketMQConf.scala new file mode 100644 index 0000000..083291b --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/conf/FireRocketMQConf.scala @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.conf + +import com.zto.fire.common.util.PropUtils +import org.apache.commons.lang3.StringUtils + +/** + * RocketMQ相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:58 + */ +private[fire] object FireRocketMQConf { + lazy val rocketOffsetLargest = "latest" + lazy val rocketOffsetSmallest = "earliest" + lazy val rocketConsumerTag = "*" + lazy val rocketClusterMapConfStart = "rocket.cluster.map." + // 初始化kafka集群名称与地址映射 + private[fire] lazy val rocketClusterMap = PropUtils.sliceKeys(rocketClusterMapConfStart) + lazy val rocketConfStart = "rocket.conf." 
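+
+  /*
+   * Usage sketch, analogous to FireKafkaConf: the name server alias is resolved through
+   * rocket.cluster.map.* (values below are illustrative only):
+   *
+   *   rocket.cluster.map.zto = nameserver01:9876;nameserver02:9876   <- hypothetical name servers
+   *   rocket.brokers.name    = zto
+   *   rocket.topics          = fire_topic
+   *   rocket.group.id        = fire_consumer
+   *
+   *   FireRocketMQConf.rocketNameServer()  // "nameserver01:9876;nameserver02:9876" (raw value if no alias matches)
+   *   FireRocketMQConf.rocketTopics()      // "fire_topic"
+   *   FireRocketMQConf.rocketGroupId()     // "fire_consumer"
+   */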
+ // rocketMQ name server + lazy val ROCKET_BROKERS_NAME = "rocket.brokers.name" + // rocketMQ topic信息,多个以逗号分隔 + lazy val ROCKET_TOPICS = "rocket.topics" + // rocketMQ groupId + val ROCKET_GROUP_ID = "rocket.group.id" + // 丢失数据是否失败 + lazy val ROCKET_FAIL_ON_DATA_LOSS = "rocket.failOnDataLoss" + lazy val ROCKET_FORCE_SPECIAL = "rocket.forceSpecial" + // 是否自动维护offset + lazy val ROCKET_ENABLE_AUTO_COMMIT = "rocket.enable.auto.commit" + // RocketMQ起始消费位点 + lazy val ROCKET_STARTING_OFFSET = "rocket.starting.offsets" + // rocketMq订阅的tag + lazy val ROCKET_CONSUMER_TAG = "rocket.consumer.tag" + // 每次拉取每个partition的消息数 + lazy val ROCKET_PULL_MAX_SPEED_PER_PARTITION = "rocket.pull.max.speed.per.partition" + lazy val ROCKET_INSTANCE_ID = "rocket.consumer.instance" + + // 用于标识消费者的名称 + def rocketInstanceId(keyNum: Int = 1): String = PropUtils.getString(this.ROCKET_INSTANCE_ID, "", keyNum) + // rocket-client配置信息 + def rocketConfMap(keyNum: Int = 1): collection.immutable.Map[String, String] = PropUtils.sliceKeysByNum(rocketConfStart, keyNum) + // 获取消费位点 + def rocketStartingOffset(keyNum: Int = 1): String = PropUtils.getString(this.ROCKET_STARTING_OFFSET, "", keyNum) + // 丢失数据时是否失败 + def rocketFailOnDataLoss(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.ROCKET_FAIL_ON_DATA_LOSS, true, keyNum) + // spark.rocket.forceSpecial + def rocketForceSpecial(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.ROCKET_FORCE_SPECIAL, false, keyNum) + // enable.auto.commit + def rocketEnableAutoCommit(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.ROCKET_ENABLE_AUTO_COMMIT, false, keyNum) + // 获取rocketMQ 订阅的tag + def rocketConsumerTag(keyNum: Int = 1): String = PropUtils.getString(this.ROCKET_CONSUMER_TAG, "", keyNum) + // 获取groupId + def rocketGroupId(keyNum: Int = 1): String = PropUtils.getString(this.ROCKET_GROUP_ID, "", keyNum) + // 获取rocket topic列表 + def rocketTopics(keyNum: Int = 1): String = PropUtils.getString(this.ROCKET_TOPICS, null, keyNum) + // 每次拉取每个partition的消息数 + def rocketPullMaxSpeedPerPartition(keyNum: Int = 1): String = PropUtils.getString(this.ROCKET_PULL_MAX_SPEED_PER_PARTITION, "", keyNum) + + // 获取rocketMQ name server 地址 + def rocketNameServer(keyNum: Int = 1): String = { + val brokerName = PropUtils.getString(this.ROCKET_BROKERS_NAME, "", keyNum) + this.rocketClusterMap.getOrElse(brokerName, brokerName) + } +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/ext/JavaExt.scala b/fire-common/src/main/scala/com/zto/fire/common/ext/JavaExt.scala new file mode 100644 index 0000000..e36082e --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/ext/JavaExt.scala @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.ext + +import com.zto.fire.predef._ + +/** + * Java语法扩展 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-01-04 13:50 + */ +trait JavaExt { + + + /** + * Java map API扩展 + */ + implicit class MapExt[K, V](map: JMap[K, V]) { + + /** + * map的get操作,如果map中存在则直接返回,否则会根据fun定义的逻辑进行value的初始化 + * 注:fun中定义的逻辑仅会在key对应的value不存在时被调用一次 + * + * @param key map的key + * @param fun 用于定义key对应value的初始化逻辑 + * @return map中key对应的value + */ + def mergeGet(key: K)(fun: => V): V = { + requireNonEmpty(key) + if (!map.containsKey(key)) map.put(key, fun) + map.get(key) + } + } + +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/ext/ScalaExt.scala b/fire-common/src/main/scala/com/zto/fire/common/ext/ScalaExt.scala new file mode 100644 index 0000000..93fde7f --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/ext/ScalaExt.scala @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.ext + +import java.util.regex.Pattern + +/** + * scala相关扩展 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-01-04 10:32 + */ +trait ScalaExt { + // 用于缓存转为驼峰标识的字符串与转换前的字符串的映射关系 + private[this] lazy val humpMap = collection.mutable.Map[String, String]() + + /** + * String API扩展 + */ + implicit class StringExt[K, V](str: String) { + // 用于匹配带有下划线字符串的正则 + private[this] lazy val humpPattern = Pattern.compile("(.*)_(\\w)(.*)") + private[this] lazy val maxHumpMapSize = 10000 + + /** + * 数据表字段名转换为驼峰式名字的实体类属性名 + * + * @return 转换后的驼峰式命名 + */ + def toHump: String = { + val matcher = humpPattern.matcher(str) + val humpStr = if (matcher.find) { + (matcher.group(1) + matcher.group(2).toUpperCase + matcher.group(3)).toHump + } else str + if (humpMap.size <= this.maxHumpMapSize) humpMap += (humpStr -> str) + humpStr + } + + /** + * 驼峰式的实体类属性名转换为数据表字段名 + * + * @return 转换后的以"_"分隔的数据表字段名 + */ + def unHump: String = humpMap.getOrElse(str, str.replaceAll("[A-Z]", "_$0").toLowerCase) + } + +} + diff --git a/fire-common/src/main/scala/com/zto/fire/common/package.scala b/fire-common/src/main/scala/com/zto/fire/common/package.scala new file mode 100644 index 0000000..0a5064c --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/package.scala @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire + +import com.zto.fire.common.util.Tools + +/** + * 预定义通用常用的api + * + * @author ChengLong 2020-12-8 15:15:00 + */ +package object predef extends Tools diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/ConfigurationCenterManager.scala b/fire-common/src/main/scala/com/zto/fire/common/util/ConfigurationCenterManager.scala new file mode 100644 index 0000000..1a42490 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/ConfigurationCenterManager.scala @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.slf4j.LoggerFactory + +/** + * 配置中心管理器,用于读取配置中心中的配置信息 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-03-12 13:35 + */ +private[fire] object ConfigurationCenterManager extends Serializable { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 构建配置中心请求参数 + * + * @param className + * 当前任务主类名 + */ + private[this] def buildRequestParam(className: String): String = { + val rest = FireFrameworkConf.fireRestUrl + if (StringUtils.isBlank(rest)) this.logger.warn("Fire Rest Server 地址为空,将无法完成注册") + s""" + |{"className": "$className", "url": "$rest", "fireVersion": "${FireFrameworkConf.fireVersion}", "zrcKey": "${FireFrameworkConf.configCenterSecret}"} + """.stripMargin + } + + /** + * 调用外部配置中心接口获取配合信息 + */ + def invokeConfigCenter(className: String): Unit = { + if (!FireFrameworkConf.configCenterEnable || (OSUtils.isLocal && !FireFrameworkConf.configCenterLocalEnable)) return + + val param = buildRequestParam(className) + var conf = "" + try { + conf = HttpClientUtils.doPost(FireFrameworkConf.configCenterProdAddress, param) + } catch { + case e: Exception => { + this.logger.error("调用配置中心接口失败,开始尝试调用测试环境配置中心接口。", e) + try { + conf = HttpClientUtils.doPost(FireFrameworkConf.configCenterTestAddress, param) + } catch { + case e: Exception => { + this.logger.error("无法从配置中心获取到该任务的配置信息,如遇配置中心注册接口不可用,仍需紧急发布,请将配置中心中的配置复制到当前任务的配置文件中,并通过以下配置关闭获取配置中心配置的接口,并重启任务:spark.fire.config_center.enable=false", e) + throw e + } + } + } + } finally { + if (StringUtils.isNotBlank(conf)) { + this.logger.info(s"成功获取配置中心配置信息:${conf}") + val map = JSONUtils.parseObject(conf, 
classOf[JMap[String, Object]]) + if (map.containsKey("code") && map.get("code").asInstanceOf[Int] == 200) { + if (map.containsKey("content")) { + val contentMap = map.get("content").asInstanceOf[JMap[String, String]] + if (contentMap != null && contentMap.nonEmpty) { + PropUtils.setProperties(contentMap) + } + } + } + } + } + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/DatasourceManager.scala b/fire-common/src/main/scala/com/zto/fire/common/util/DatasourceManager.scala new file mode 100644 index 0000000..38fd7e1 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/DatasourceManager.scala @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import java.util +import java.util.concurrent.{ConcurrentHashMap, ScheduledExecutorService, TimeUnit} + +import com.google.common.collect.EvictingQueue +import com.zto.fire.common.conf.FireFrameworkConf._ +import com.zto.fire.common.enu.{Datasource, ThreadPoolType} +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.slf4j.LoggerFactory + +/** + * 用于统计当前任务使用到的数据源信息,包括MQ、DB等连接信息等 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-11-26 15:30 + */ +private[fire] class DatasourceManager { + private[this] lazy val logger = LoggerFactory.getLogger(this.getClass) + // 用于存放当前任务用到的数据源信息 + private[this] lazy val datasourceMap = new ConcurrentHashMap[Datasource, util.HashSet[DatasourceDesc]]() + // 用于收集来自不同数据源的sql语句,后续会异步进行SQL解析,考虑到分布式场景下会有很多重复的SQL执行,因此使用了线程不安全的队列即可满足需求 + private lazy val sqlQueue = EvictingQueue.create[DBSqlSource](buriedPointDatasourceMaxSize) + private[this] lazy val threadPool = ThreadUtils.createThreadPool("DatasourceManager", ThreadPoolType.SCHEDULED) + this.sqlParse() + + /** + * 用于异步解析sql中使用到的表,并放到datasourceMap中 + */ + private[this] def sqlParse(): Unit = { + if (buriedPointDatasourceEnable && threadPool != null) { + threadPool.asInstanceOf[ScheduledExecutorService].scheduleWithFixedDelay(new Runnable { + override def run(): Unit = { + val start = currentTime + if (sqlQueue != null) { + for (i <- 1 until sqlQueue.size()) { + val sqlSource = sqlQueue.poll() + if (sqlSource != null) { + val tableNames = SQLUtils.tableParse(sqlSource.sql) + if (tableNames != null && tableNames.nonEmpty) { + tableNames.filter(StringUtils.isNotBlank).foreach(tableName => { + add(Datasource.parse(sqlSource.datasource), DBDatasource(sqlSource.datasource, sqlSource.cluster, tableName, sqlSource.username, sqlSource.sink)) + }) + } + } + } + logger.debug(s"异步解析SQL埋点中的表信息,耗时:${timecost(start)}") + } + } + }, buriedPointDatasourceInitialDelay, buriedPointDatasourcePeriod, TimeUnit.SECONDS) + } + } + + /** + * 添加一个数据源描述信息 + */ + private[fire] def add(sourceType: Datasource, 
datasourceDesc: DatasourceDesc): Unit = { + var set = this.datasourceMap.get(sourceType) + if (set == null) { + set = new util.HashSet[DatasourceDesc]() + } + set.add(datasourceDesc) + this.datasourceMap.put(sourceType, set) + } + + /** + * 向队列中添加一条sql类型的数据源,用于后续异步解析 + */ + private[fire] def addSql(source: DBSqlSource): Unit = if (buriedPointDatasourceEnable) this.sqlQueue.offer(source) + + /** + * 获取所有使用到的数据源 + */ + private[fire] def get: util.Map[Datasource, util.HashSet[DatasourceDesc]] = this.datasourceMap +} + +/** + * 对外暴露API,用于收集并处理各种埋点信息 + */ +private[fire] object DatasourceManager { + private lazy val manager = new DatasourceManager + + /** + * 添加一条sql记录到队列中 + * + * @param datasource + * 数据源类型 + * @param cluster + * 集群信息 + * @param sink source or sink + * @param username + * 用户名 + * @param sql + * 待解析的sql语句 + */ + private[fire] def addSql(datasource: String, cluster: String, username: String, sql: String, sink: Boolean = true): Unit = { + this.manager.addSql(DBSqlSource(datasource, cluster, username, sql, sink)) + } + + /** + * 添加一条DB的埋点信息 + * + * @param datasource + * 数据源类型 + * @param cluster + * 集群信息 + * @param sink + * source or sink + * @param tableName + * 表名 + * @param username + * 连接用户名 + */ + private[fire] def addDBDatasource(datasource: String, cluster: String, tableName: String, username: String = "", sink: Boolean = true): Unit = { + this.manager.add(Datasource.parse(datasource), DBDatasource(datasource, cluster, tableName, username, sink)) + } + + /** + * 添加一条MQ的埋点信息 + * + * @param datasource + * 数据源类型 + * @param cluster + * 集群标识 + * @param sink + * product or consumer + * @param topics + * 主题列表 + * @param groupId + * 消费组标识 + */ + private[fire] def addMQDatasource(datasource: String, cluster: String, topics: String, groupId: String, sink: Boolean = false): Unit = { + this.manager.add(Datasource.parse(datasource), MQDatasource(datasource, cluster, topics, groupId, sink)) + } + + /** + * 获取所有使用到的数据源 + */ + private[fire] def get: util.Map[Datasource, util.HashSet[DatasourceDesc]] = this.manager.get +} + +/** + * 数据源描述 + */ +trait DatasourceDesc + +/** + * 面向数据库类型的数据源,带有tableName + * + * @param datasource + * 数据源类型,参考DataSource枚举 + * @param cluster + * 数据源的集群标识 + * @param sink + * true: sink false: source + * @param tableName + * 表名 + * @param username + * 使用关系型数据库时作为jdbc的用户名,HBase留空 + */ +case class DBDatasource(datasource: String, cluster: String, tableName: String, username: String = "", sink: Boolean = true) extends DatasourceDesc + +/** + * 面向数据库类型的数据源,需将SQL中的tableName主动解析 + * + * @param datasource + * 数据源类型,参考DataSource枚举 + * @param cluster + * 数据源的集群标识 + * @param sink + * true: sink false: source + * @param username + * 使用关系型数据库时作为jdbc的用户名,HBase留空 + * @param sql 执行的SQL语句 + */ +case class DBSqlSource(datasource: String, cluster: String, username: String, sql: String, sink: Boolean = true) extends DatasourceDesc + +/** + * MQ类型数据源,如:kafka、RocketMQ等 + * + * @param datasource + * 数据源类型,参考DataSource枚举 + * @param cluster + * 数据源的集群标识 + * @param sink + * true: sink false: source + * @param topics + * 使用到的topic列表 + * @param groupId + * 任务的groupId + */ +case class MQDatasource(datasource: String, cluster: String, topics: String, groupId: String, sink: Boolean = false) extends DatasourceDesc \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/DateFormatUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/DateFormatUtils.scala new file mode 100644 index 0000000..1d8a5c6 --- /dev/null +++ 
b/fire-common/src/main/scala/com/zto/fire/common/util/DateFormatUtils.scala @@ -0,0 +1,862 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import org.apache.commons.lang3.StringUtils +import org.apache.commons.lang3.time.DateUtils +import org.slf4j.{Logger, LoggerFactory} + +import java.text.SimpleDateFormat +import java.util.{Calendar, Date, TimeZone} +import scala.collection.mutable.ArrayBuffer + +/** + * 日期格式化工具类 + * Created by ChengLong on 2016-11-24. + */ +object DateFormatUtils { + lazy val yyyyMMdd = "yyyyMMdd" + lazy val yyyy_MM_dd = "yyyy-MM-dd" + lazy val yyyyMMddHH = "yyyyMMddHH" + lazy val yyyy_MM_ddHHmmss = "yyyy-MM-dd HH:mm:ss" + lazy val TRUNCATE_MIN = "yyyy-MM-dd HH:mm:00" + private val timeZoneShangHai = "Asia/Shanghai" + private lazy val logger: Logger = LoggerFactory.getLogger(this.getClass) + lazy val HOUR = "hour" + lazy val DAY = "day" + lazy val WEEK = "week" + lazy val MONTH = "month" + lazy val YEAR = "year" + lazy val MINUTE = "minute" + lazy val SECOND = "second" + lazy val enumSet = Set(HOUR, DAY, WEEK, MONTH, YEAR, MINUTE, SECOND) + + /** + * 将日期格式化为 yyyy-MM-dd HH:mm:ss + */ + def getTimeFormat(): SimpleDateFormat = { + val timeFormat: SimpleDateFormat = new SimpleDateFormat(DateFormatUtils.yyyy_MM_ddHHmmss) + timeFormat.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + timeFormat + } + + /** + * 给定yyyy-MM-dd HH:mm:ss 格式数据,返回yyyy-MM-dd + */ + def getDateFromDateTimeStr(dateTime: String) = { + if (StringUtils.isNotBlank(dateTime) && dateTime.length() > 10) { + dateTime.substring(0, 10) + } else { + dateTime + } + } + + /** + * 给定yyyy-MM-dd HH:mm:ss 格式数据,返回yyyyMMdd格式的时间分区 + */ + def getPartitionDate(dateTime: String): String = { + this.getDateFromDateTimeStr(dateTime).replace("-", "") + } + + /** + * 将日期格式化为 yyyy-MM-dd + */ + def getDateFormat(): SimpleDateFormat = { + this.getSchemaFormat() + } + + /** + * 将日期格式化为 yyyy-MM-dd + */ + def getSchemaFormat(schema: String = DateFormatUtils.yyyy_MM_dd): SimpleDateFormat = { + val dateFormat: SimpleDateFormat = new SimpleDateFormat(schema) + dateFormat.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + dateFormat + } + + /** + * 格式化Date为yyyy-MM-dd格式的字符串 + */ + def formatDate(date: Date): String = { + this.getDateFormat().format(date) + } + + /** + * 将日期格式化为 yyyy-MM-dd hh:mm:ss 格式的字符串 + */ + def formatDateTime(dateTime: Date): String = { + if (dateTime != null) this.getTimeFormat().format(dateTime) else "" + } + + /** + * 将指定时间转为指定schema的格式 + * + * @param dateTime + * 指定时间 + * @return + */ + def formatBySchema(dateTime: Date, schema: String): String = { + if (dateTime != null) this.getSchemaFormat(schema).format(dateTime) else "" + } + + + /** + * 将字符串格式化为yyyy-MM-dd的日期 + */ + def formatDate(date: String): Date = { + 
this.getDateFormat().parse(date) + } + + /** + * 将字符串格式化为yyyy-MM-dd hh:mm:ss的日期 + */ + def formatDateTime(dateTime: String): Date = { + this.getTimeFormat().parse(dateTime) + } + + /** + * 将当期系统时间格式化为yyyy-MM-dd 并返回字符串 + */ + def formatCurrentDate(): String = { + this.formatDate(new Date) + } + + /** + * 将当期系统时间格式化为yyyy-MM-dd hh:mm:ss并返回字符串 + */ + def formatCurrentDateTime(): String = { + this.formatDateTime(new Date) + } + + /** + * 转换当前时间为指定的时间格式 + * + * @param schema + * 指定的schema + */ + def formatCurrentBySchema(schema: String): String = { + this.formatBySchema(new Date, schema) + } + + /** + * 将指定的unix元年时间转为yyyy-MM-dd 的字符串 + */ + def formatUnixDate(date: Long): String = { + this.formatDate(new Date(date)) + } + + /** + * 将指定的unix元年时间转为yyyy-MM-dd hh:mm:ss 的字符串 + */ + def formatUnixDateTime(dateTime: Long): String = { + this.formatDateTime(new Date(dateTime)) + } + + /** + * 对日期进行格式转换 + */ + def dateSchemaFormat(dateTimeStr: String, srcSchema: String, destSchema: String): String = { + if (StringUtils.isBlank(dateTimeStr)) { + return dateTimeStr + } + val timeFormat: SimpleDateFormat = new SimpleDateFormat(srcSchema) + timeFormat.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + val datetime = timeFormat.parse(dateTimeStr) + timeFormat.applyPattern(destSchema) + timeFormat.format(datetime) + } + + /** + * 对日期进行格式转换 + */ + def dateSchemaFormat(dateTime: Date, srcSchema: String, destSchema: String): Date = { + val timeFormat: SimpleDateFormat = new SimpleDateFormat(srcSchema) + timeFormat.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + val dateTimeStr = timeFormat.format(dateTime) + timeFormat.applyPattern(destSchema) + timeFormat.parse(dateTimeStr) + } + + /** + * 判断两个日期是否为同一天 + */ + def isSameDay(day1: String, day2: String): Boolean = { + if (StringUtils.isNotBlank(day1) && StringUtils.isNotBlank(day2)) { + val format = this.getTimeFormat() + DateUtils.isSameDay(format.parse(day1), format.parse(day2)) + } else { + false + } + } + + /** + * 判断两个日期是否为同一天 + */ + def isSameDay(day1: Date, day2: Date): Boolean = { + DateUtils.isSameDay(day1, day2) + } + + /** + * 用于判断给定的时间是否和系统时间处于同一天 + */ + def isSameDay(date: String): Boolean = { + try { + DateUtils.isSameDay(new Date(), this.getTimeFormat().parse(date)) + } catch { + case e: Exception => { + logger.error("isSameDay判断失败", e) + false + } + } + } + + /** + * 判断两个日期是否为同一小时(前提是同一天) + */ + def isSameHour(day1: String, day2: String): Boolean = { + if (StringUtils.isNotBlank(day1) && StringUtils.isNotBlank(day2)) { + val format = this.getTimeFormat() + val d1 = format.parse(day1) + val d2 = format.parse(day2) + if (this.isSameDay(d1, d2)) { + d1.getHours == d2.getHours + } else { + false + } + } else { + false + } + } + + /** + * 判断两个日期是否为同一小时(前提是同一天) + */ + def isSameHour(day1: Date, day2: Date): Boolean = { + if (this.isSameDay(day1, day2)) { + day1.getHours == day2.getHours + } else { + false + } + } + + /** + * 判断两个日期是否为同一星期(必须是同年同月) + */ + def isSameWeek(day1: Date, day2: Date): Boolean = { + if (this.isSameYear(day1, day2) && this.isSameMonth(day1, day2)) { + val cal = Calendar.getInstance() + cal.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + cal.setTime(day1) + val week1 = cal.get(Calendar.DAY_OF_WEEK_IN_MONTH) + cal.setTime(day2) + week1 == cal.get(Calendar.DAY_OF_WEEK_IN_MONTH) + } else { + false + } + } + + /** + * 判断两个日期是否为同一星期(必须是同年同月) + */ + def isSameWeek(day1: String, day2: String): Boolean = { + if (StringUtils.isNotBlank(day1) && StringUtils.isNotBlank(day2)) { + val format = this.getTimeFormat() + val d1 = 
format.parse(day1) + val d2 = format.parse(day2) + this.isSameWeek(d1, d2) + } else { + false + } + } + + /** + * 判断两个日期是否为同一月份 + */ + def isSameMonth(day1: Date, day2: Date): Boolean = { + day1.getMonth == day2.getMonth + } + + /** + * 判断两个日期是否为同一月份 + */ + def isSameMonth(day1: String, day2: String): Boolean = { + val format = this.getTimeFormat() + val d1 = format.parse(day1) + val d2 = format.parse(day2) + this.isSameMonth(d1, d2) + } + + /** + * 判断两个日期是否为同一年 + */ + def isSameYear(day1: Date, day2: Date): Boolean = { + day1.getYear == day2.getYear + } + + /** + * 判断两个日期是否为同一年 + */ + def isSameYear(day1: String, day2: String): Boolean = { + val format = this.getTimeFormat() + val d1 = format.parse(day1) + val d2 = format.parse(day2) + this.isSameYear(d1, d2) + } + + /** + * day1是否大于day2 + */ + def isBig(day1: String, day2: String): Boolean = { + if (StringUtils.isNotBlank(day1) && StringUtils.isNotBlank(day2)) { + DateFormatUtils.formatDateTime(day1).after(DateFormatUtils.formatDateTime(day2)) + } else if (StringUtils.isNotBlank(day1) && StringUtils.isBlank(day2)) { + true + } else if (StringUtils.isBlank(day1) && StringUtils.isNotBlank(day2)) { + false + } else { + true + } + } + + /** + * day1是否小于day2 + */ + def isSmall(day1: String, day2: String): Boolean = { + !this.isBig(day1, day2) + } + + /** + * day 是否介于day1与day2之间 + */ + def isBetween(day: String, day1: String, day2: String) = { + this.isSmall(day, day2) && this.isBig(day, day1) + } + + /** + * 指定时间字段,对日期进行加减 + * + * @param field + * 'year'、'month'、'day'、'hour'、'minute'、'second' + * @param dateTimeStr + * 格式:yyyy-MM-dd hh:mm:ss + * @param count + * 正负数 + * @return + * 计算后的日期 + */ + def addTimer(field: String, dateTimeStr: String, count: Int): String = { + if (this.YEAR.equalsIgnoreCase(field)) { + this.addYears(dateTimeStr, count) + } else if (this.MONTH.equalsIgnoreCase(field)) { + this.addMons(dateTimeStr, count) + } else if (this.DAY.equalsIgnoreCase(field)) { + this.addDays(dateTimeStr, count) + } else if (this.HOUR.equalsIgnoreCase(field)) { + this.addHours(dateTimeStr, count) + } else if (this.MINUTE.equalsIgnoreCase(field)) { + this.addMins(dateTimeStr, count) + } else if (this.SECOND.equalsIgnoreCase(field)) { + this.addSecs(dateTimeStr, count) + } else { + "" + } + } + + /** + * 对指定的时间字段进行年度加减 + */ + def addYears(dateTimeStr: String, years: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && !"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addYears(datetime, years)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行年度加减 + */ + def addYears(dateTime: Date, years: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addYears(dateTime, years)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行月份加减 + */ + def addMons(dateTimeStr: String, mons: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && !"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addMonths(datetime, mons)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行月份加减 + */ + def addMons(dateTime: Date, mons: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addMonths(dateTime, mons)) + } else { + "" + } + } + + /** + * 对指定日期增加天 + */ + def addDays(dateTimeStr: String, days: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && 
!"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addDays(datetime, days)) + } else { + "" + } + } + + /** + * 对指定日期增加天 + */ + def addDays(dateTime: Date, days: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addDays(dateTime, days)) + } else { + "" + } + } + + /** + * 对指定日期增加天,并以指定的格式返回 + */ + def addPartitionDays(dateTime: Date, days: Int, schema: String = "yyyyMMdd"): String = { + if (dateTime != null) { + DateFormatUtils.formatBySchema(DateUtils.addDays(dateTime, days), schema) + } else { + "" + } + } + + /** + * 对指定的时间字段进行天加减 + */ + def addWeeks(dateTimeStr: String, weeks: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && !"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addWeeks(datetime, weeks)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行天加减 + */ + def addWeeks(dateTime: Date, weeks: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addWeeks(dateTime, weeks)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行小时加减 + */ + def addHours(dateTimeStr: String, hours: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && !"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addHours(datetime, hours)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行小时加减 + */ + def addHours(dateTime: Date, hours: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addHours(dateTime, hours)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行分钟加减 + */ + def addMins(dateTimeStr: String, minutes: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && !"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addMinutes(datetime, minutes)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行分钟加减 + */ + def addMins(dateTime: Date, minutes: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addMinutes(dateTime, minutes)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行秒钟加减 + */ + def addSecs(dateTimeStr: String, seconds: Int): String = { + if (StringUtils.isNotBlank(dateTimeStr) && !"null".equals(dateTimeStr) && !"NULL".equals(dateTimeStr)) { + val datetime = DateFormatUtils.formatDateTime(dateTimeStr) + DateFormatUtils.formatDateTime(DateUtils.addSeconds(datetime, seconds)) + } else { + "" + } + } + + /** + * 对指定的时间字段进行秒钟加减 + */ + def addSecs(dateTime: Date, seconds: Int): String = { + if (dateTime != null) { + DateFormatUtils.formatDateTime(DateUtils.addSeconds(dateTime, seconds)) + } else { + "" + } + } + + /** + * 获取day1到day2之间的所有日期 + * + * @param prefix + * 指定拼接前缀 + */ + def getBetweenDate(prefix: String, day1: String, day2: String): Array[String] = { + val dates = ArrayBuffer[String]() + var nextDay = this.addDays(day1, 1) + if (this.isBetween(nextDay, day1, day2)) { + dates += s"$prefix >= to_date('$day1','yyyy-mm-dd hh24:mi:ss') and $prefix < to_date('$nextDay','yyyy-mm-dd hh24:mi:ss')" + } + while (this.isBetween(nextDay, day1, day2)) { + var tmpDay = "" + tmpDay = this.addDays(nextDay, 1) + dates += s"$prefix >= to_date('$nextDay','yyyy-mm-dd hh24:mi:ss') and $prefix < to_date('$tmpDay','yyyy-mm-dd hh24:mi:ss')" + nextDay = tmpDay + } + 
dates.toArray + } + + /** + * 计算date1与date2之间相差的小时数 + * @return + * 相差的小时数 + */ + def betweenHours(date1: Date, date2: Date): Double = { + (date1.getTime - date2.getTime) / 3600000.0 + } + + /** + * 将yyyy-MM-dd hh:mm:ss类型日期truncate为月初零点 + */ + def truncateMonth(dateTime: Date): String = { + val cal = Calendar.getInstance() + if (dateTime != null) cal.setTime(dateTime) + val year = cal.get(Calendar.YEAR) + val month = cal.get(Calendar.MONTH) + 1 + if (month < 10) + year + "-0" + month + "-01 00:00:00" + else + year + "-" + month + "-01 00:00:00" + } + + /** + * 取年月日 + */ + def getyyyyMMdd(dataTime: String): String = { + if (StringUtils.isNotBlank(dataTime) && dataTime.length >= 10) { + dataTime.substring(0, 10) + } else { + dataTime + } + } + + /** + * 取年月日 + */ + def getyyyyMM(dataTime: String): String = { + if (StringUtils.isNotBlank(dataTime) && dataTime.length >= 7) { + dataTime.substring(0, 7) + } else { + dataTime + } + } + + /** + * 取年月日 + */ + def getyyyy(dataTime: String): String = { + if (StringUtils.isNotBlank(dataTime) && dataTime.length >= 4) { + dataTime.substring(0, 4) + } else { + dataTime + } + } + + /** + * 获取指定日期的月初时间,如为空则返回系统当前时间对应的月初 + */ + def truncateMonthStr(dateTime: String): String = { + var dateTimeStr = dateTime + if (StringUtils.isBlank(dateTimeStr)) { + dateTimeStr = this.getTimeFormat().format(new Date) + } + this.truncateMonth(this.formatDate(dateTimeStr)) + } + + /** + * 根据指定的时间和格式,将时间格式化为hive分区格式 + */ + def getPartitionTime(dateTime: String = this.formatCurrentDateTime(), schema: String = DateFormatUtils.yyyyMMdd): String = { + this.dateSchemaFormat(dateTime, DateFormatUtils.yyyy_MM_ddHHmmss, schema) + } + + /** + * 将当前系统时间格式化为指定的格式作为分区 + */ + def getCurrentPartitionTime(schema: String = DateFormatUtils.yyyyMMdd): String = { + getPartitionTime(this.formatCurrentDateTime(), schema) + } + + /** + * 获取两个时间间隔的毫秒数 + */ + def interval(before: Date, after: Date): Long = { + after.getTime - before.getTime + } + + /** + * 获取两个时间间隔的毫秒数 + */ + def interval(before: String, after: String): Long = { + this.formatDateTime(after).getTime - this.formatDateTime(before).getTime + } + + /** + * 将yyyy-MM-dd hh:mm:ss类型日期truncate为整点分钟 + */ + def truncateMinute(dateTime: String): String = { + val date = this.formatDateTime(dateTime) + val prefix = this.dateSchemaFormat(dateTime, DateFormatUtils.yyyy_MM_ddHHmmss, "yyyy-MM-dd HH") + val minute = date.getMinutes + if (minute >= 0 && minute < 10) { + s"$prefix:00" + } else if (minute >= 10 && minute < 20) { + s"$prefix:10" + } else if (minute >= 20 && minute < 30) { + s"$prefix:20" + } else if (minute >= 30 && minute < 40) { + s"$prefix:30" + } else if (minute >= 40 && minute < 50) { + s"$prefix:40" + } else { + s"$prefix:50" + } + } + + /** + * 将yyyy-MM-dd hh:mm:ss类型日期truncate为整点分钟 + */ + def truncateMinute(dateTime: Date): String = { + this.truncateMinute(this.formatDateTime(dateTime)) + } + + /** + * 获取整点小时 + */ + def truncateHour(dateStr: String): String = { + this.dateSchemaFormat(dateStr, DateFormatUtils.yyyy_MM_ddHHmmss, DateFormatUtils.yyyyMMddHH) + } + + /** + * 截取指定时间指定的位数 + * + * @param date + * 日期 + * @param cron + * 切分的范围 + * @param replace + * 是否替换掉日期字符串中的特殊字符 + * @return + */ + def truncate(date: String, cron: String = this.DAY, replace: Boolean = true): String = { + if (StringUtils.isBlank(date) || StringUtils.isBlank(cron) || date.length != 19) { + throw new IllegalArgumentException("日期不能为空,格式为yyyy-MM-dd HH:mm:ss") + } + if (!this.enumSet.contains(cron)) { + throw new 
IllegalArgumentException("where参数必须是hour/day/week/month/year中的一个") + } + val index: Int = if (this.HOUR.equals(cron)) { + 13 + } else if (this.DAY.equals(cron)) { + 10 + } else if (this.MONTH.equals(cron)) { + 7 + } else if (this.MINUTE.equals(cron)) { + 15 + } else { + 4 + } + if (replace) date.substring(0, index).replace("-", "").replace(":", "").replace(" ", "") else date.substring(0, index) + } + + /** + * 截取指定时间指定的位数 + * + * @param date + * 日期 + * @param cron + * 切分的范围 + * @param replace + * 是否替换掉日期字符串中的特殊字符 + * @return + */ + def truncate(date: Date, cron: String, replace: Boolean): String = { + this.truncate(this.formatDateTime(date), cron, replace) + } + + /** + * 截取系统时间指定的位数 + * + * @param cron + * 切分的范围 + * @param replace + * 是否替换掉日期字符串中的特殊字符 + * @return + */ + def truncate(cron: String, replace: Boolean): String = { + this.truncate(this.formatCurrentDateTime(), cron, replace) + } + + /** + * 判断给定的时间的秒位的个位是否为0秒,如00/10/20/30/40/60/60 + */ + def isSecondDivisibleZero(date: Date = new Date): Boolean = { + val cal = Calendar.getInstance() + cal.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + cal.setTime(date) + cal.get(Calendar.SECOND) % 10 == 0 + } + + /** + * 判断给定的时间的秒位的个位是否为0秒,如00/10/20/30/40/60/60 + */ + def isSecondDivisibleZero(dateTime: String): Boolean = { + this.isSecondDivisibleZero(this.formatDateTime(dateTime)) + } + + /** + * 判断给定的时间的秒位是否为00秒 + */ + def isZeroSecond(date: Date = new Date): Boolean = { + val cal = Calendar.getInstance() + cal.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + cal.setTime(date) + cal.get(Calendar.SECOND) == 0 + } + + /** + * 判断给定的时间的秒位是否为00秒 + */ + def isZeroSecond(dateTime: String): Boolean = { + this.isZeroSecond(this.formatDateTime(dateTime)) + } + + /** + * 判断给定的时间的分钟位是否为00分 + */ + def isZeroMinute(date: Date = new Date): Boolean = { + if (this.isZeroSecond(date)) { + val cal = Calendar.getInstance() + cal.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + cal.setTime(date) + cal.get(Calendar.MINUTE) == 0 + } else { + false + } + } + + /** + * 判断给定的时间的分钟位是否为00分 + */ + def isZeroMinute(dateTime: String): Boolean = { + this.isZeroMinute(this.formatDateTime(dateTime)) + } + + /** + * 判断给定的时间的小时位是否为00时 + */ + def isZeroHour(date: Date = new Date): Boolean = { + if (this.isZeroMinute(date)) { + val cal = Calendar.getInstance() + cal.setTimeZone(TimeZone.getTimeZone(timeZoneShangHai)) + cal.setTime(date) + cal.get(Calendar.HOUR_OF_DAY) == 0 + } else { + false + } + } + + /** + * 判断给定的时间的小时位是否为00时 + */ + def isZeroHour(dateTime: String): Boolean = { + this.isZeroHour(this.formatDateTime(dateTime)) + } + + /** + * 获取系统当前时间,精确到秒 + */ + def currentTime: Long = { + System.currentTimeMillis() / 1000 + } + + /** + * 计算运行时长 + */ + def runTime(startTime: Long): String = { + val currentTime = this.currentTime + val apartTime = currentTime - startTime + val hours = apartTime / 3600 + val hoursStr = if (hours < 10) s"0${hours}" else s"${hours}" + val minutes = apartTime / 60 - hours * 60 + val minutesStr = if (minutes < 10) s"0${minutes}" else s"${minutes}" + val seconds = apartTime - minutes * 60 - hours * 60 * 60 + val secondsStr = if (seconds < 10) s"0${seconds}" else s"${seconds}" + + s"${hoursStr}时 ${minutesStr}分 ${secondsStr}秒" + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/ExceptionBus.scala b/fire-common/src/main/scala/com/zto/fire/common/util/ExceptionBus.scala new file mode 100644 index 0000000..35a441b --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/ExceptionBus.scala @@ -0,0 +1,90 
@@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import java.util.concurrent.atomic.{AtomicInteger, AtomicLong} + +import com.google.common.collect.EvictingQueue +import com.zto.fire.common.anno.Internal +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.predef._ +import org.slf4j.{Logger, LoggerFactory} + + +/** + * Fire框架异常总线,用于收集各引擎执行task过程中发生的异常信息 + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-16 09:33 + */ +object ExceptionBus { + private[this] lazy val logger = LoggerFactory.getLogger(this.getClass) + // 用于保存收集而来的异常对象 + @transient + private[this] lazy val queue = EvictingQueue.create[(String, Throwable)](FireFrameworkConf.exceptionBusSize) + // 队列大小,对比queue.size有性能优势 + private[fire] lazy val queueSize = new AtomicInteger(0) + // 异常总数计数器 + private[fire] lazy val exceptionCount = new AtomicLong(0) + + /** + * 向异常总线中投递异常对象 + */ + def post(t: Throwable): Boolean = this.synchronized { + exceptionCount.incrementAndGet() + this.queue.offer((DateFormatUtils.formatCurrentDateTime(), t)) + } + + /** + * 获取并清空queue + * + * @return 异常集合 + */ + @Internal + private[fire] def getAndClear: (List[(String, Throwable)], Long) = this.synchronized { + val list = this.queue.toList + this.queue.clear() + queueSize.set(0) + this.logger.warn(s"成功收集异常总线中的异常对象共计:${list.size}条,异常总线将会被清空.") + (list, this.exceptionCount.get()) + } + + /** + * 工具方法,用于打印异常信息 + */ + @Internal + private[fire] def offAndLogError(logger: Logger, msg: String, t: Throwable): Unit = { + this.post(t) + if (noEmpty(msg)) { + if (logger != null) logger.error(msg, t) else t.printStackTrace() + } + } + + /** + * 获取Throwable的堆栈信息 + */ + def stackTrace(t: Throwable): String = { + if (t == null) return "" + val stackTraceInfo = new StringBuilder() + stackTraceInfo.append(t.toString + "\n") + t.getStackTrace.foreach(trace => stackTraceInfo.append("\tat " + trace + "\n")) + stackTraceInfo.toString + } + +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/FireFunctions.scala b/fire-common/src/main/scala/com/zto/fire/common/util/FireFunctions.scala new file mode 100644 index 0000000..254270e --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/FireFunctions.scala @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
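// Usage sketch (illustrative, not part of the patch above): posting a caught exception to
// ExceptionBus and rendering its stack trace as text. The thrown exception is hypothetical.
import com.zto.fire.common.util.ExceptionBus

object ExceptionBusExample {
  def main(args: Array[String]): Unit = {
    try {
      throw new IllegalStateException("simulated task failure")
    } catch {
      case t: Throwable =>
        ExceptionBus.post(t)                    // enqueue into the evicting queue for later collection
        println(ExceptionBus.stackTrace(t))     // human-readable stack trace string
    }
  }
}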
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.common.conf.FirePS1Conf +import com.zto.fire.common.util.UnitFormatUtils.{TimeUnitEnum, readable} +import org.apache.commons.lang3.StringUtils +import org.slf4j.{Logger, LoggerFactory} + +import scala.util.Try + +/** + * 常用的函数库 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-12-16 15:45 + */ +trait FireFunctions extends Serializable { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + private[this] lazy val tryLog = "" + private[this] lazy val catchLog = "执行try的过程中发生异常" + private[this] lazy val finallyCatchLog = "执行finally过程中发生异常" + + /** + * 重试指定的函数fn retryNum次 + * 当fn执行失败时,会根据设置的重试次数自动重试retryNum次 + * 每次重试间隔等待duration(毫秒) + * + * @param retryNum + * 指定重试的次数 + * @param duration + * 重试的间隔时间(ms) + * @param fun + * 重试的函数或方法 + * @tparam T + * fn执行后返回的数据类型 + * @return + * 返回fn执行结果 + */ + def retry[T](retryNum: Long = 3, duration: Long = 3000)(fun: => T): T = { + var count = 1L + + def redo[T](retryNum: Long, duration: Long)(fun: => T): T = { + Try { + fun + } match { + case util.Success(x) => x + case _ if retryNum > 1 => { + Thread.sleep(duration) + count += 1 + logger.info(s"${FirePS1Conf.RED}第${count}次执行. 时间:${DateFormatUtils.formatCurrentDateTime()}. 间隔:${duration}.${FirePS1Conf.DEFAULT}") + redo(retryNum - 1, duration)(fun) + } + case util.Failure(e) => throw e + } + } + + redo(retryNum, duration)(fun) + } + + /** + * 尝试执行block中的逻辑,如果出现异常,则记录日志 + * + * @param block + * try的具体逻辑 + * @param logger + * 日志记录器 + * @param catchLog + * 日志内容 + */ + def tryWithLog(block: => Unit)(logger: Logger = this.logger, tryLog: String = tryLog, catchLog: String = catchLog, isThrow: Boolean = true): Unit = { + try { + timecost(tryLog, logger)(block) + } catch { + case t: Throwable => { + ExceptionBus.offAndLogError(logger, catchLog, t) + if (isThrow) throw t + } + } + } + + /** + * 尝试执行block中的逻辑,如果出现异常,则记录日志,并将执行结果返回 + * + * @param block + * try的具体逻辑 + * @param logger + * 日志记录器 + * @param catchLog + * 日志内容 + */ + def tryWithReturn[T](block: => T)(logger: Logger = this.logger, tryLog: String = tryLog, catchLog: String = catchLog): T = { + try { + timecost[T](tryLog, logger)(block) + } catch { + case t: Throwable => { + ExceptionBus.offAndLogError(logger, catchLog, t) + throw t + } + } + } + + /** + * 执行用户指定的try/catch/finally逻辑 + * + * @param block + * try 代码块 + * @param finallyBlock + * finally 代码块 + * @param logger + * 日志记录器 + * @param catchLog + * 当执行try过程中发生异常时打印的日志内容 + * @param finallyCatchLog + * 当执行finally代码块过程中发生异常时打印的日志内容 + */ + def tryWithFinally[T](block: => T)(finallyBlock: => Unit)(logger: Logger = this.logger, tryLog: String = tryLog, catchLog: String = catchLog, finallyCatchLog: String = finallyCatchLog): T = { + try { + timecost[T](tryLog, logger)(block) + } catch { + case t: Throwable => + ExceptionBus.offAndLogError(logger, catchLog, t) + throw t + } finally { + try { + finallyBlock + } catch { + case t: Throwable => { + ExceptionBus.offAndLogError(logger, catchLog, t) + throw t + } + } + } + } + + /** + * 获取当前系统时间(ms) + */ + def currentTime: Long = System.currentTimeMillis + + /** + * 
以人类可读的方式计算耗时(ms) + * + * @param beginTime + * 开始时间 + * @return + * 耗时 + */ + def timecost(beginTime: Long): String = readable(currentTime - beginTime, TimeUnitEnum.MS) + + /** + * 用于统计指定代码块执行的耗时时间 + * + * @param msg + * 用于描述当前代码块的用户 + * @param logger + * 日志记录器 + * @param block + * try的具体逻辑 + */ + def timecost[T](msg: String, logger: Logger = this.logger)(block: => T): T = { + val startTime = this.currentTime + val retVal = block + if (StringUtils.isNotBlank(msg)) logger.info(s"${msg}, 耗时:${timecost(startTime)}") + retVal + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/FireUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/FireUtils.scala new file mode 100644 index 0000000..a40add1 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/FireUtils.scala @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.common.conf.{FireFrameworkConf, FirePS1Conf} +import org.slf4j.LoggerFactory + +/** + * fire框架通用的工具方法 + * 注:该工具类中不可包含Spark或Flink的依赖 + * + * @author ChengLong + * @since 1.0.0 + * @create: 2020-05-17 10:17 + */ +private[fire] object FireUtils extends Serializable { + private var isSplash = false + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 判断是否为spark引擎 + */ + def isSparkEngine: Boolean = "spark".equals(PropUtils.engine) + + /** + * 判断是否为flink引擎 + */ + def isFlinkEngine: Boolean = "flink".equals(PropUtils.engine) + + /** + * 获取fire版本号 + */ + def fireVersion: String = FireFrameworkConf.fireVersion + + /** + * 用于在fire框架启动时展示信息 + */ + private[fire] def splash: Unit = { + if (!isSplash) { + val info = + """ + | ___ ___ ___ + | /\ \ ___ /\ \ /\ \ + | /::\ \ /\ \ /::\ \ /::\ \ + | /:/\:\ \ \:\ \ /:/\:\ \ /:/\:\ \ + | /::\~\:\ \ /::\__\ /::\~\:\ \ /::\~\:\ \ + | /:/\:\ \:\__\ __/:/\/__/ /:/\:\ \:\__\ /:/\:\ \:\__\ + | \/__\:\ \/__/ /\/:/ / \/_|::\/:/ / \:\~\:\ \/__/ + | \:\__\ \::/__/ |:|::/ / \:\ \:\__\ + | \/__/ \:\__\ |:|\/__/ \:\ \/__/ + | \/__/ |:| | \:\__\ + | \|__| \/__/ version + | + |""".stripMargin.replace("version", s"version ${FirePS1Conf.PINK + this.fireVersion}") + + this.logger.warn(FirePS1Conf.GREEN + info + FirePS1Conf.DEFAULT) + this.isSplash = true + } + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/JSONUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/JSONUtils.scala new file mode 100644 index 0000000..7fffcdb --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/JSONUtils.scala @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
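// Usage sketch (illustrative): mixing in the FireFunctions trait to retry a flaky call and to
// time a code block. fetchValue is a hypothetical operation that fails at random.
import com.zto.fire.common.util.FireFunctions

object RetryExample extends FireFunctions {
  private def fetchValue(): Int = {
    if (scala.util.Random.nextDouble() < 0.5) throw new RuntimeException("transient failure")
    42
  }

  def main(args: Array[String]): Unit = {
    // retry up to 3 times, sleeping 1000ms between attempts
    val value = retry(retryNum = 3, duration = 1000)(fetchValue())
    // logs the elapsed time of the block once it completes
    val squared = timecost("compute square")(value * value)
    println(squared)
  }
}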
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.apache.htrace.fasterxml.jackson.annotation.JsonAutoDetect.Visibility +import org.apache.htrace.fasterxml.jackson.annotation.JsonInclude.Include +import org.apache.htrace.fasterxml.jackson.annotation.PropertyAccessor +import org.apache.htrace.fasterxml.jackson.core.JsonParser +import org.apache.htrace.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper, SerializationFeature} + +import scala.reflect.ClassTag +import scala.util.Try + +/** + * json处理工具类,基于jackson封装 + * + * @author ChengLong 2021年4月14日09:27:37 + * @since fire 2.0.0 + */ +object JSONUtils { + + private[this] lazy val objectMapperLocal = new ThreadLocal[ObjectMapper]() { + override def initialValue(): ObjectMapper = newObjectMapperWithDefaultConf + } + + /** + * 创建一个新的ObjectMapper实例 + */ + def newObjectMapper: ObjectMapper = new ObjectMapper + + /** + * 创建一个新的ObjectMapper实例,并设置一系列默认的属性 + */ + def newObjectMapperWithDefaultConf: ObjectMapper = { + this.newObjectMapper + .configure(DeserializationFeature.FAIL_ON_IGNORED_PROPERTIES, false) + .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) + .configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true) + .configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true) + .configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, true) + .configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true) + .configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false) + .setSerializationInclusion(Include.ALWAYS) + .setVisibility(PropertyAccessor.ALL, Visibility.ANY) + } + + /** + * 从线程局部变量中获取对应的ObjectMapper对象实例 + */ + def getObjectMapper: ObjectMapper = this.objectMapperLocal.get() + + /** + * 将给定的对象解析成json字符串 + * + * @param obj 任意的对象实例 + * @return json字符串 + */ + def toJSONString(obj: Object): String = this.getObjectMapper.writeValueAsString(obj) + + /** + * 将给定的json字符串转为T类型对象实例 + * + * @param json + * json字符串 + * @tparam T + * 目标泛型类型 + * @return + * 目标对象实例 + */ + def parseObject[T: ClassTag](json: String): T = this.getObjectMapper.readValue[T](json, getParamType[T]) + + /** + * 将给定的json字符串转为T类型对象实例 + * + * @param json + * json字符串 + * @param valueType + * 目标类型 + * @tparam T + * 目标泛型类型 + * @return + * 目标对象实例 + */ + def parseObject[T](json: String, valueType: Class[T]): T = this.getObjectMapper.readValue[T](json, valueType) + + + /** + * 用于判断给定的字符串是否为合法的json + * + * @param json + * 待校验的字符串 + * @param strictMode + * 检查模式,如果是true则会进行严格的检查,会牺牲部分性能,如果为false,则只进行简单的检查,性能较好 + * @return + * true: 合法的字符串 false:非法的json字符串 + */ + def isJson(json: String, strictMode: Boolean = true): Boolean = { + if (strictMode) { + Try { + try parseObject[JMap[Object, Object]](json) + }.isSuccess + } else { + val jsonStr = StringUtils.trim(json) + if (StringUtils.isBlank(jsonStr)) return false + jsonStr.startsWith("{") && jsonStr.endsWith("}") + } + } + + /** + * 
用于判断给定的字符串是否为合法的jsonarray + * + * @param jsonArray + * 待校验的字符串 + * @param strictMode + * 检查模式,如果是true则会进行严格的检查,会牺牲部分性能,如果为false,则只进行简单的检查,性能较好 + * @return + * true: 合法的字符串 false:非法的json字符串 + */ + def isJsonArray(jsonArray: String, strictMode: Boolean = true): Boolean = { + if (strictMode) { + Try { + try parseObject[JList[Object]](jsonArray) + }.isSuccess + } else { + val jsonArrayStr = StringUtils.trim(jsonArray) + if (StringUtils.isBlank(jsonArrayStr)) return false + jsonArrayStr.startsWith("[") && jsonArrayStr.endsWith("]") + } + } + + /** + * 用于快速判断给定的字符串是否为合法的JsonArray或json + * 注:不会验证每个field的合法性,仅做简单校验 + * + * @param json + * 待校验的字符串 + * @return + * true: 合法的字符串 false:非法的json字符串 + */ + def isLegal(json: String, strictMode: Boolean = true): Boolean = this.isJson(json, strictMode) || this.isJsonArray(json, strictMode) + + /** + * 用于快速判断给定的字符串是否为合法的JsonArray或json + * 注:不会验证每个field的合法性,仅做简单校验 + * + * @param json + * 待校验的字符串 + * @return + * true: 合法的字符串 false:非法的json字符串 + */ + def checkJson(json: String, strictMode: Boolean = true): Boolean = this.isLegal(json, strictMode) + + /** + * 解析JSON,并获取指定key对应的值 + * + * @param json json字符串 + * @param key json的key + * @return value + */ + def getValue[T: ClassTag](json: String, key: String, defaultValue: T): T = { + if (!this.isLegal(json)) return defaultValue + val map = this.parseObject[JHashMap[String, Object]](json) + map.getOrElse(key, defaultValue).asInstanceOf[T] + } + +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/JavaTypeMap.scala b/fire-common/src/main/scala/com/zto/fire/common/util/JavaTypeMap.scala new file mode 100644 index 0000000..4f887dd --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/JavaTypeMap.scala @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
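// Usage sketch (illustrative): serializing a simple bean with JSONUtils and reading one field
// back out. Person is a hypothetical bean with a no-arg constructor so Jackson can instantiate it.
import com.zto.fire.common.util.JSONUtils

class Person() {
  var name: String = _
  var age: Int = 0
}

object JsonExample {
  def main(args: Array[String]): Unit = {
    val person = new Person()
    person.name = "fire"
    person.age = 3
    val json = JSONUtils.toJSONString(person)
    println(JSONUtils.isJson(json))                        // true
    val back = JSONUtils.parseObject(json, classOf[Person])
    println(back.name)                                     // fire
    println(JSONUtils.getValue(json, "age", 0))            // 3, or the default 0 if the key is missing
  }
}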
+ */ + +package com.zto.fire.common.util + +/** + * Java类型映射 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-16 15:40 + */ +trait JavaTypeMap { + // Java API库映射 + type JInt = java.lang.Integer + type JLong = java.lang.Long + type JBoolean = java.lang.Boolean + type JChar = java.lang.Character + type JFloat = java.lang.Float + type JShort = java.lang.Short + type JDouble = java.lang.Double + type JBigDecimal = java.math.BigDecimal + type JString = java.lang.String + type JStringBuilder = java.lang.StringBuilder + type JStringBuffer = java.lang.StringBuffer + type JMap[K, V] = java.util.Map[K, V] + type JHashMap[K, V] = java.util.HashMap[K, V] + type JLinkedHashMap[K, V] = java.util.LinkedHashMap[K, V] + type JConcurrentHashMap[K, V] = java.util.concurrent.ConcurrentHashMap[K, V] + type JSet[E] = java.util.Set[E] + type JHashSet[E] = java.util.HashSet[E] + type JLinkedHashSet[E] = java.util.LinkedHashSet[E] + type JList[E] = java.util.List[E] + type JArrayList[E] = java.util.ArrayList[E] + type JLinkedList[E] = java.util.LinkedList[E] + type JQueue[E] = java.util.Queue[E] + type JPriorityQueue[E] = java.util.PriorityQueue[E] + type JCollections = java.util.Collections +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/KafkaUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/KafkaUtils.scala new file mode 100644 index 0000000..a43aea7 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/KafkaUtils.scala @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
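// Usage sketch (illustrative): the JavaTypeMap aliases let Scala code reference java.util
// collection types by short names; the map contents here are arbitrary.
import com.zto.fire.common.util.JavaTypeMap

object TypeAliasExample extends JavaTypeMap {
  def main(args: Array[String]): Unit = {
    val counters: JMap[JString, JInt] = new JHashMap[JString, JInt]()
    counters.put("requests", 1)
    println(counters)     // {requests=1}
  }
}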
+ */ + +package com.zto.fire.common.util + +import java.util +import java.util.Properties + +import com.zto.fire.common.conf.FireKafkaConf +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer, OffsetAndTimestamp} +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.common.serialization.StringDeserializer +import org.slf4j.LoggerFactory + +/** + * Kafka工具类 + * + * @author ChengLong 2020-4-17 09:50:50 + */ +object KafkaUtils { + private lazy val kafkaMonitor = "fire_kafka_consumer" + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 根据kafka集群名称获取broker地址 + * + * @param clusterName 集群名称 + * @return broker地址 + */ + def getBorkers(clusterName: String): String = FireKafkaConf.kafkaMap.getOrElse(clusterName, "") + + /** + * 创建新的kafka consumer + * + * @param host kafka broker地址 + * @param groupId 对应的groupId + * @return KafkaConsumer + */ + def createNewConsumer(host: String, groupId: String): KafkaConsumer[String, String] = { + val properties = new Properties + properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, host) + properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId) + properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false") + properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName) + properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName) + properties.put("auto.offset.reset", "earliest") + new KafkaConsumer[String, String](properties) + } + + /** + * 获取大于指定时间戳的一条消息 + * + * @param host broker地址 + * @param topic topic信息 + * @param timestamp 消息时间戳 + * @return 一条消息记录 + */ + def getMsg(host: String, topic: String, timestamp: java.lang.Long): String = { + var kafkaConsumer: KafkaConsumer[String, String] = null + var msg = "" + try { + kafkaConsumer = createNewConsumer(host, kafkaMonitor) + // 如果指定了时间戳,则取大于该时间戳的消息 + if (timestamp != null) { // 获取topic的partition信息 + val partitionInfos = kafkaConsumer.partitionsFor(topic) + val topicPartitions = new util.ArrayList[TopicPartition] + val timestampsToSearch = new util.HashMap[TopicPartition, java.lang.Long] + for (partitionInfo <- partitionInfos) { + topicPartitions.add(new TopicPartition(partitionInfo.topic, partitionInfo.partition)) + timestampsToSearch.put(new TopicPartition(partitionInfo.topic, partitionInfo.partition), timestamp) + } + // 手动指定各分区offset + kafkaConsumer.assign(topicPartitions) + // 获取每个partition指定时间戳的偏移量 + val map = kafkaConsumer.offsetsForTimes(timestampsToSearch) + this.logger.info("根据时间戳获取偏移量:map.size={}", map.size()) + var offsetTimestamp: OffsetAndTimestamp = null + this.logger.info("开始设置各分区初始偏移量...") + for (entry <- map.entrySet) { // 如果设置的查询偏移量的时间点大于最大的索引记录时间,那么value就为空 + offsetTimestamp = entry.getValue + if (offsetTimestamp != null) { // 设置读取消息的偏移量 + val offset: java.lang.Long = offsetTimestamp.offset + kafkaConsumer.seek(entry.getKey, offset) + this.logger.info("seek: id=" + entry.getKey.partition + " offset=" + offset) + } + } + } else { // 如果未指定时间戳,则直接获取消息 + kafkaConsumer.subscribe(util.Arrays.asList(topic)) + } + // 消费消息 + val records = kafkaConsumer.poll(10000) + for (record <- records if StringUtils.isBlank(msg)) { + if (timestamp == null) { + msg = record.value + } + else { // 如果指定时间戳,则取大于指定时间戳的消息 + if (record.timestamp >= timestamp) { + msg = record.value + } + } + } + } catch { + case e: Exception => logger.error("获取消息失败", e) + } finally { + if (kafkaConsumer != null) 
kafkaConsumer.close() + } + msg + } + + /** + * kafka配置信息 + * + * @param kafkaParams + * 代码中指定的kafka配置信息,如果配置文件中也有配置,则配置文件中的优先级高 + * @param groupId + * 消费组 + * @param offset + * smallest、largest + * @return + * kafka相关配置 + */ + def kafkaParams(kafkaParams: Map[String, Object] = null, + groupId: String = null, + kafkaBrokers: String = null, + offset: String = FireKafkaConf.offsetLargest, + autoCommit: Boolean = false, + keyNum: Int = 1): Map[String, Object] = { + + val consumerMap = collection.mutable.Map[String, Object]() + // 代码中指定的kafka配置优先级最低 + if (kafkaParams != null && kafkaParams.nonEmpty) consumerMap ++= kafkaParams + + // 如果没有在配置文件中指定brokers,则认为从代码中获取,此处返回空的map,用于上层判断 + val confBrokers = FireKafkaConf.kafkaBrokers(keyNum) + val finalKafkaBrokers = if (StringUtils.isNotBlank(confBrokers)) confBrokers else kafkaBrokers + if (StringUtils.isNotBlank(finalKafkaBrokers)) consumerMap += ("bootstrap.servers" -> finalKafkaBrokers) + + // 如果配置文件中没有指定spark.kafka.group.id,则默认获取用户指定的groupId + val confGroupId = FireKafkaConf.kafkaGroupId(keyNum) + val finalKafkaGroupId = if (StringUtils.isNotBlank(confGroupId)) confGroupId else groupId + if (StringUtils.isNotBlank(finalKafkaGroupId)) consumerMap += ("group.id" -> finalKafkaGroupId) + + val confOffset = FireKafkaConf.kafkaStartingOffset(keyNum) + val finalOffset = if (StringUtils.isNotBlank(confOffset)) confOffset else offset + if (StringUtils.isNotBlank(finalOffset)) consumerMap += ("auto.offset.reset" -> finalOffset) + + val confAutoCommit = FireKafkaConf.kafkaEnableAutoCommit(keyNum) + val finalAutoCommit = if (confAutoCommit != null) confAutoCommit else autoCommit + if (finalAutoCommit != null) consumerMap += ("enable.auto.commit" -> (finalAutoCommit: java.lang.Boolean)) + + // 最基本的配置项 + consumerMap ++= collection.mutable.Map[String, Object]( + "key.deserializer" -> classOf[StringDeserializer], + "value.deserializer" -> classOf[StringDeserializer], + "session.timeout.ms" -> FireKafkaConf.kafkaSessionTimeOut(keyNum), + "request.timeout.ms" -> FireKafkaConf.kafkaRequestTimeOut(keyNum), + "max.poll.interval.ms" -> FireKafkaConf.kafkaPollInterval(keyNum) + ) + + // 以spark.kafka.conf.开头的配置优先级最高 + val configMap = FireKafkaConf.kafkaConfMapWithType(keyNum) + if (configMap.nonEmpty) consumerMap ++= configMap + // 日志记录最终生效的kafka配置 + LogUtils.logMap(this.logger, consumerMap.toMap, s"Kafka client configuration. keyNum=$keyNum.") + + consumerMap.toMap + } + +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/LogUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/LogUtils.scala new file mode 100644 index 0000000..6560982 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/LogUtils.scala @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
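// Usage sketch (illustrative): assembling a consumer configuration with KafkaUtils.kafkaParams.
// The broker address and group id are placeholders; values from the properties files take
// precedence, as described in the method above.
import com.zto.fire.common.util.KafkaUtils

object KafkaParamsExample {
  def main(args: Array[String]): Unit = {
    val params = KafkaUtils.kafkaParams(
      groupId = "fire_example_group",
      kafkaBrokers = "localhost:9092")
    params.foreach { case (k, v) => println(s"$k -> $v") }
  }
}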
+ */ + +package com.zto.fire.common.util + +import com.zto.fire.common.conf.FirePS1Conf +import org.apache.commons.lang3.StringUtils +import org.slf4j.event.Level +import org.slf4j.{Logger, LoggerFactory} + +/** + * 日志工具类 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-07-01 10:23 + */ +object LogUtils { + + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 以固定的开始与结束风格打日志 + * + * @param logger + * 日志记录器 + * @param title + * 日志开始标题 + * @param style + * 日志开始标题类型 + * @param level + * 日志的级别 + * @param fun + * 用户自定义的操作 + */ + def logStyle(logger: Logger, title: String = "", style: String = "-", level: Level = Level.INFO)(fun: Logger => Unit): Unit = { + if (logger != null) { + val styleRepeat = StringUtils.repeat(style, 19) + val titleStart = styleRepeat + s"${FirePS1Conf.GREEN}> start: " + title + s" <${FirePS1Conf.DEFAULT}" + styleRepeat + this.logLevel(logger, titleStart, level) + fun(logger) + val titleEnd = styleRepeat + s"${FirePS1Conf.GREEN}> end: " + title + s" <${FirePS1Conf.DEFAULT}" + styleRepeat + this.logLevel(logger, titleEnd, level) + } + } + + /** + * 以固定的风格打印map中的内容 + */ + def logMap(logger: Logger = this.logger, map: Map[_, _], title: String): Unit = { + if (logger != null && map != null && map.nonEmpty) { + LogUtils.logStyle(logger, title)(logger => { + map.foreach(kv => logger.info(s"---> ${kv._1} = ${kv._2}")) + }) + } + } + + /** + * 根据指定的基本进行日志记录 + * + * @param logger + * 日志记录器 + * @param log + * 日志内容 + * @param level + * 日志的级别 + */ + def logLevel(logger: Logger, log: String, level: Level = Level.INFO, ps: String = null): Unit = { + val logMsg = if (StringUtils.isNotBlank(ps)) s"$ps $log ${FirePS1Conf.DEFAULT}" else log + level match { + case Level.DEBUG => logger.debug(logMsg) + case Level.INFO => logger.info(logMsg) + case Level.WARN => logger.warn(logMsg) + case Level.ERROR => logger.error(logMsg) + case Level.TRACE => logger.trace(logMsg) + } + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/NumberFormatUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/NumberFormatUtils.scala new file mode 100644 index 0000000..437b375 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/NumberFormatUtils.scala @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import java.math.BigDecimal + +/** + * 数值类型常用操作工具类 + * Created by ChengLong on 2018-06-01. 
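// Usage sketch (illustrative): printing a configuration map between styled start/end markers
// with the LogUtils helpers; the map contents are arbitrary.
import com.zto.fire.common.util.LogUtils
import org.slf4j.LoggerFactory

object LogExample {
  private val logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    LogUtils.logMap(logger, Map("batch.size" -> "100", "parallelism" -> "4"), "Job configuration")
    LogUtils.logStyle(logger, "startup checks")(log => log.info("all checks passed"))
  }
}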
+ */ +object NumberFormatUtils { + + /** + * floor操作 + * + * @param field + * @return + */ + def floor(field: Double): Int = { + if (field == null) 0 else Math.floor(field).toInt + } + + /** + * 将Long转为Integer + * + * @param field + * @return + */ + def long2Int(field: java.lang.Long): java.lang.Integer = { + if (field != null) { + field.toInt + } else { + 0 + } + } + + /** + * 将BigDecimal转为Long类型 + * + * @param field + * @return + */ + def bigDecimal2Long(field: java.math.BigDecimal): java.lang.Long = { + if (field != null) { + field.longValue() + } else { + 0L + } + } + + /** + * 判断是否为空 + * + * @param decimal + * @return + */ + def ifnull(decimal: java.math.BigDecimal, defaultVal: java.math.BigDecimal): java.math.BigDecimal = { + if (decimal == null) defaultVal else decimal + } + + /** + * 类似于round,但不会四舍五入 + * + * @param value + * 目标值 + * @param scale + * 精度 + * @return + */ + def truncate(value: Double, scale: Int): Double = { + if (value == null) { + 0.0 + } else { + new BigDecimal(value).setScale(Math.abs(scale), BigDecimal.ROUND_HALF_UP).doubleValue() + } + } + + def truncate2(value: Double, scale: Int): Double = { + if (value == null) { + 0.0 + } else if (scale == 0) { + value.toLong + } else { + val tmp = Math.pow(10, Math.abs(scale)) + (value * tmp).asInstanceOf[Int] / tmp + } + } + + /** + * 截取精度 + * + * @param bigDecimal + * @param scale + * 精度 + * @return + */ + def truncateDecimal(bigDecimal: java.math.BigDecimal, scale: Int): java.math.BigDecimal = { + if (bigDecimal == null) { + new java.math.BigDecimal("0").setScale(scale, BigDecimal.ROUND_HALF_UP) + } else { + bigDecimal.setScale(scale, BigDecimal.ROUND_HALF_UP) + } + } + +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/PropUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/PropUtils.scala new file mode 100644 index 0000000..0da762f --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/PropUtils.scala @@ -0,0 +1,447 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.common.conf._ +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.slf4j.LoggerFactory + +import java.io.{FileInputStream, InputStream} +import java.util.Properties +import java.util.concurrent.atomic.AtomicBoolean +import scala.collection.mutable.Map +import scala.collection.{immutable, mutable} +import scala.reflect.ClassTag + +/** + * 读取配置文件工具类 + * Created by ChengLong on 2016-11-22. 
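// Usage sketch (illustrative): the null-safe numeric helpers above in action.
import com.zto.fire.common.util.NumberFormatUtils

object NumberFormatExample {
  def main(args: Array[String]): Unit = {
    println(NumberFormatUtils.floor(3.9))                                        // 3
    println(NumberFormatUtils.truncate(3.14159, 2))                              // 3.14 (HALF_UP at scale 2)
    println(NumberFormatUtils.truncateDecimal(null, 2))                          // 0.00 when the input is null
    println(NumberFormatUtils.bigDecimal2Long(new java.math.BigDecimal("12.7"))) // 12
  }
}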
+ */ +object PropUtils { + private val props = new Properties() + private val configurationFiles = Array[String]("fire", "cluster", "spark", "flink") + // 用于判断是否merge过 + private[fire] val isMerge = new AtomicBoolean(false) + // 引擎类型判断,当前阶段仅支持spark与flink,未来若支持新的引擎,则需在此处做支持 + private[fire] val engine = if (this.isExists("spark")) "spark" else "flink" + // 加载默认配置文件 + this.load(this.configurationFiles: _*) + // 避免已被加载的配置文件被重复加载 + private[this] lazy val alreadyLoadMap = new mutable.HashMap[String, String]() + // 用于存放所有的配置信息 + private[fire] lazy val settingsMap = new mutable.HashMap[String, String]() + // 用于存放固定前缀,而后缀不同的配置信息 + private[this] lazy val cachedConfMap = new mutable.HashMap[String, collection.immutable.Map[String, String]]() + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 判断指定的配置文件是否存在 + * + * @param fileName + * 配置文件名称 + */ + def isExists(fileName: String): Boolean = { + var resource: InputStream = null + try { + resource = this.getInputStream(fileName) + if (resource == null) false else true + } finally { + if (resource != null) { + IOUtils.close(resource) + } + } + } + + /** + * 获取完整的配置文件名称 + */ + private[this] def getFullName(fileName: String): String = if (fileName.endsWith(".properties")) fileName else s"$fileName.properties" + + /** + * 获取指定配置文件的输入流 + * 注:此api调用者需主动关闭输入流 + * + * @param fileName + * 配置文件名称 + */ + private[this] def getInputStream(fileName: String): InputStream = { + val fullName = this.getFullName(fileName) + var resource: InputStream = null + try { + resource = FileUtils.resourceFileExists(fullName) + if (resource == null) { + val findFileName = FindClassUtils.findFileInJar(fullName) + if (StringUtils.isNotBlank(findFileName)) { + if (FindClassUtils.isJar) { + resource = FileUtils.resourceFileExists(findFileName) + } else { + resource = new FileInputStream(findFileName) + } + } + } + resource + } + } + + /** + * 加载指定配置文件,resources根目录下优先级最高,其次是按字典顺序的目录 + * + * @param fileName + * 配置文件名称 + */ + def loadFile(fileName: String): this.type = this.synchronized { + val fullName = this.getFullName(fileName) + if (StringUtils.isNotBlank(fullName) && !this.alreadyLoadMap.contains(fullName)) { + var resource: InputStream = null + try { + resource = this.getInputStream(fullName) + if (resource == null && !this.configurationFiles.contains(fileName)) this.logger.warn(s"未找到配置文件[ $fullName ],请核实!") + if (resource != null) { + this.logger.warn(s"${FirePS1Conf.YELLOW} -------------> loaded ${fullName} <------------- ${FirePS1Conf.DEFAULT}") + props.load(resource) + // 将所有的配置信息存放到settings中,并统一添加key的引擎前缀,如: + // 如果是spark引擎,则key前缀统一添加spark. 如果是flink引擎,则统一添加flink. 
+ props.foreach(prop => this.settingsMap.put(this.adaptiveKey(prop._1), prop._2)) + props.clear() + this.alreadyLoadMap.put(fullName, fullName) + } + } finally { + if (resource != null) { + IOUtils.close(resource) + } + } + } + this + } + + /** + * 加载多个指定配置文件,resources根目录下优先级最高,其次是按字典顺序的目录 + * + * @param fileNames + * 配置文件名称 + */ + def load(fileNames: String*): this.type = { + if (noEmpty(fileNames)) fileNames.foreach(this.loadFile) + this + } + + /** + * 自适应key的前缀 + */ + private[this] def adaptiveKey(key: String): String = { + if (!key.startsWith(s"${this.engine}.")) s"${this.engine}.$key" else key + } + + /** + * 根据key获取配置信息 + * 注:其他均需要通过该API进行配置的获取,禁止直接调用:props.getProperty + * + * @param key + * 配置的key + * @return + * 配置的value + */ + def getProperty(key: String): String = { + if (this.isMerge.compareAndSet(false, true)) this.mergeEngineConf + this.getOriginalProperty(this.adaptiveKey(key)) + } + + /** + * 获取原生的配置信息 + */ + private[fire] def getOriginalProperty(key: String): String = this.settingsMap.getOrElse(key, "") + + /** + * 将给定的配置中的值与计量单位拆分开 + * + * @param value + * 配置的值,形如:10.3min + * @return + * 拆分单位后的tuple,形如:(10.3, min) + */ + def splitUnit(value: String): (String, String) = { + val numericPrefix = RegularUtils.numericPrefix.findFirstIn(value) + val unitSuffix = RegularUtils.unitSuffix.findFirstIn(value) + if (numericPrefix.isEmpty || unitSuffix.isEmpty) throw new IllegalArgumentException("配置中不包含数值或计量单位,请检查配置") + + (numericPrefix.get.trim, unitSuffix.get.trim) + } + + /** + * 获取字符串 + */ + def getString(key: String): String = this.getProperty(key) + + /** + * 获取字符串,为空则取默认值 + */ + def getString(key: String, default: String): String = { + val value = this.getProperty(key) + if (StringUtils.isNotBlank(value)) value else default + } + + /** + * 获取拼接后数值的配置字符串 + * + * @param key 配置的前缀 + * @param keyNum 拼接到key后的数值后缀 + * @return + * 对应的配置信息 + */ + def getString(key: String, default: String, keyNum: Int = 1): String = { + if (keyNum <= 1) { + var value = this.getProperty(key) + if (StringUtils.isBlank(value)) { + value = this.getString(key + "1", default) + } + value + } else { + this.getString(key + keyNum, default) + } + } + + /** + * 获取拼接后数值的配置整数 + * + * @param key 配置的前缀 + * @param keyNum 拼接到key后的数值后缀 + * @return + * 对应的配置信息 + */ + def getInt(key: String, default: Int, keyNum: Int = 1): Int = { + val value = this.getString(key, default + "", keyNum) + if (StringUtils.isNotBlank(value)) value.toInt else default + } + + + /** + * 获取拼接后数值的配置长整数 + * + * @param key 配置的前缀 + * @param keyNum 拼接到key后的数值后缀 + * @return + * 对应的配置信息 + */ + def getLong(key: String, default: Long, keyNum: Int = 1): Long = { + this.get[Long](key, Some(default), keyNum) + } + + /** + * 获取float型数据 + */ + def getFloat(key: String, default: Float, keyNum: Int = 1): Float = { + this.get[Float](key, Some(default), keyNum) + } + + /** + * 获取Double型数据 + */ + def getDouble(key: String, default: Double, keyNum: Int = 1): Double = { + this.get[Double](key, Some(default), keyNum) + } + + + /** + * 获取拼接后数值的配置布尔值 + * + * @param key 配置的前缀 + * @param keyNum 拼接到key后的数值后缀 + * @return + * 对应的配置信息 + */ + def getBoolean(key: String, default: Boolean, keyNum: Int = 1): Boolean = { + this.get[Boolean](key, Some(default), keyNum) + } + + /** + * 根据指定的key与key的num,获取对应的配置信息 + * 1. 如果配置存在,则进行类型转换,返回T类型数据 + * 2. 
如果配置不存在,则取default参数作为默认值返回 + * + * @param key + * 配置的key + * @param default + * 如果配置不存在,则取default只 + * @param keyNum + * 配置key的后缀编号 + * @tparam T + * 返回配置的类型 + * @return + */ + def get[T: ClassTag](key: String, default: Option[T] = Option.empty, keyNum: Int = 1): T = { + val value = this.getString(key, if (default.isDefined) default.get.toString else "", keyNum = keyNum) + val paramType = getParamType[T] + val property = tryWithReturn { + paramType match { + case _ if paramType eq classOf[Int] => value.toInt + case _ if paramType eq classOf[Long] => value.toLong + case _ if paramType eq classOf[Float] => value.toFloat + case _ if paramType eq classOf[Double] => value.toDouble + case _ if paramType eq classOf[Boolean] => value.toBoolean + case _ => value + } + } (this.logger, catchLog = s"为找到配置信息:${key},请检查!") + property.asInstanceOf[T] + } + + /** + * 使用map设置多个值 + * + * @param map + * java map,存放多个配置信息 + */ + def setProperties(map: mutable.Map[String, String]): Unit = this.synchronized { + if (map != null) map.foreach(kv => this.setProperty(kv._1, kv._2)) + } + + /** + * 使用map设置多个值 + * + * @param map + * java map,存放多个配置信息 + */ + def setProperties(map: JMap[String, Object]): Unit = this.synchronized { + if (map != null) { + map.foreach(kv => { + if (StringUtils.isNotBlank(kv._1) && kv._2 != null) { + this.setProperty(kv._1, kv._2.toString) + } + }) + } + } + + /** + * 设置指定的配置 + * 注:其他均需要通过该API进行配置的设定,禁止直接调用:props.setProperty + * + * @param key + * 配置的key + * @param value + * 配置的value + */ + def setProperty(key: String, value: String): Unit = this.synchronized { + if (StringUtils.isNotBlank(key) && StringUtils.isNotBlank(value)) { + this.setOriginalProperty(this.adaptiveKey(key), value) + } + } + + /** + * 添加原生的配置信息 + */ + private[fire] def setOriginalProperty(key: String, value: String): Unit = this.synchronized(this.settingsMap.put(key, value)) + + /** + * 隐蔽密码信息后返回 + */ + def cover: Map[String, String] = this.settingsMap.filter(t => !t._1.contains("pass")) + + /** + * 打印配置文件中的kv + */ + def show(): Unit = { + if (!FireFrameworkConf.fireConfShow) return + LogUtils.logStyle(this.logger, "Fire configuration.")(logger => { + this.settingsMap.foreach(key => { + // 如果包含配置黑名单,则不打印 + if (key != null && !FireFrameworkConf.fireConfBlackList.exists(conf => key.toString.contains(conf))) { + logger.info(s">>${FirePS1Conf.PINK} ${key._1} --> ${key._2} ${FirePS1Conf.DEFAULT}") + } + }) + }) + } + + /** + * 将配置信息转为Map,并设置到SparkConf中 + * + * @return + * confMap + */ + def settings: Map[String, String] = { + val map = Map[String, String]() + map.putAll(this.settingsMap) + map + } + + /** + * 指定key的前缀获取所有该前缀的key与value + */ + def sliceKeys(keyStart: String): immutable.Map[String, String] = { + if (!this.cachedConfMap.contains(keyStart)) { + val confMap = new mutable.HashMap[String, String]() + this.settingsMap.foreach(key => { + val adaptiveKeyStar = this.adaptiveKey(keyStart) + if (key._1.contains(adaptiveKeyStar)) { + val keySuffix = key._1.substring(adaptiveKeyStar.length) + confMap.put(keySuffix, key._2) + } + }) + this.cachedConfMap.put(keyStart, confMap.toMap) + } + this.cachedConfMap(keyStart) + } + + /** + * 根据keyNum选择对应的kafka配置 + */ + def sliceKeysByNum(keyStart: String, keyNum: Int = 1): collection.immutable.Map[String, String] = { + // 用于匹配以指定keyNum结尾的key + val reg = "\\D" + keyNum + "$" + val map = new mutable.HashMap[String, String]() + this.sliceKeys(keyStart).foreach(kv => { + val keyLength = kv._1.length + val keyNumStr = keyNum.toString + // 末尾匹配keyNum并且keyNum的前一位非整数 + val isMatch = 
reg.r.findFirstMatchIn(kv._1).isDefined + // 提前key,如key=session.timeout.ms33,则提前后的key=session.timeout.ms + val trimKey = if (isMatch) kv._1.substring(0, keyLength - keyNumStr.length) else kv._1 + + // 配置的key的末尾与keyNum匹配 + if (isMatch) { + map += (trimKey -> kv._2) + } else if (keyNum <= 1) { + // 匹配没有数字后缀的key,session.timeout.ms与session.timeout.ms1认为是同一个配置 + val lastChar = kv._1.substring(keyLength - 1, keyLength) + // 如果配置的结尾是字母 + if (!StringsUtils.isInt(lastChar)) { + map += (kv._1 -> kv._2) + } + } + }) + map.toMap + } + + /** + * 合并Conf中的配置信息 + */ + private[this] def mergeEngineConf: Unit = { + val clazz = Class.forName(FireFrameworkConf.FIRE_ENGINE_CONF_HELPER) + val method = clazz.getDeclaredMethod("getEngineConf") + val map = method.invoke(null).asInstanceOf[immutable.Map[String, String]] + if (map.nonEmpty) { + this.setProperties(map) + logger.info(s"完成计算引擎配置信息的同步,总计:${map.size}条") + map.foreach(k => logger.debug("合并:k=" + k._1 + " v=" + k._2)) + } + } + + /** + * 调用外部配置中心接口获取配合信息 + */ + def invokeConfigCenter(className: String): Unit = ConfigurationCenterManager.invokeConfigCenter(className) +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/RegularUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/RegularUtils.scala new file mode 100644 index 0000000..0923c5b --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/RegularUtils.scala @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +/** + * 常用的正则表达式 + * + * @author ChengLong 2021-5-28 11:14:19 + * @since fire 2.0.0 + */ +object RegularUtils { + // 用于匹配纯数值的表达式 + lazy val numeric = "(^[1-9]\\d*\\.?\\d*$)|(^0\\.\\d*[1-9]$)".r + // 用于匹配字符串中以数值开头的数值 + lazy val numericPrefix = "(^[1-9]\\d*\\.?\\d*)|(^0\\.\\d*[1-9])".r + // 用于匹配字符串中以固定的字母+空白符结尾 + lazy val unitSuffix = "[a-zA-Z]+\\s*$".r +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/SQLUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/SQLUtils.scala new file mode 100644 index 0000000..49cca4c --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/SQLUtils.scala @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
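// Usage sketch (illustrative): splitting a value such as "10.5min" into its numeric part and
// its unit with PropUtils.splitUnit, which relies on the RegularUtils patterns above.
import com.zto.fire.common.util.{PropUtils, RegularUtils}

object UnitSplitExample {
  def main(args: Array[String]): Unit = {
    val (num, unit) = PropUtils.splitUnit("10.5min")
    println(s"value=$num unit=$unit")                              // value=10.5 unit=min
    println(RegularUtils.numeric.findFirstIn(num).isDefined)       // true: the prefix is purely numeric
  }
}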
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import org.apache.commons.lang3.StringUtils + +import scala.collection.mutable.ListBuffer + +/** + * SQL相关工具类 + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-26 15:09 + */ +object SQLUtils { + private[this] val beforeWorld = "(?i)(from|join|update|into table|table|into|exists|desc|like|if)" + private[this] val reg = s"${beforeWorld}\\s+(\\w+\\.\\w+|\\w+)".r + + /** + * 利用正则表达式解析SQL中用到的表名 + */ + def tableParse(sql: String): ListBuffer[String] = { + require(StringUtils.isNotBlank(sql), "sql语句不能为空") + + val tables = ListBuffer[String]() + // 找出所有beforeWorld中定义的关键字匹配到的后面的表名 + reg.findAllMatchIn(sql.replace("""`""", "")).foreach(tableName => { + // 将匹配到的数据剔除掉beforeWorld中定义的关键字 + val name = tableName.toString().replaceAll(s"${beforeWorld}\\s+", "").trim + if (StringUtils.isNotBlank(name)) tables += name + }) + + tables + } + +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/ScalaUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/ScalaUtils.scala new file mode 100644 index 0000000..9aab61b --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/ScalaUtils.scala @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import scala.reflect.{ClassTag, classTag} +import scala.runtime.Nothing$ + +/** + * scala工具类 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-01-04 14:06 + */ +trait ScalaUtils { + + /** + * 获取泛型具体的类型 + * + * @tparam T + * 泛型类型 + * @return + * Class[T] + */ + def getParamType[T: ClassTag]: Class[T] = { + val paramType = classTag[T].runtimeClass.asInstanceOf[Class[T]] + if (paramType == classOf[Nothing$]) throw new IllegalArgumentException("不合法的方法调用,请在方法调用时指定泛型!") + paramType + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/ShutdownHookManager.scala b/fire-common/src/main/scala/com/zto/fire/common/util/ShutdownHookManager.scala new file mode 100644 index 0000000..e1f1044 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/ShutdownHookManager.scala @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
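// Usage sketch (illustrative): extracting table names from a SQL statement with
// SQLUtils.tableParse; the statement and table names are made up.
import com.zto.fire.common.util.SQLUtils

object SqlParseExample {
  def main(args: Array[String]): Unit = {
    val sql = "insert into table dw.order_detail select * from ods.order o join dim.city c on o.city_id = c.id"
    // expected to print dw.order_detail, ods.order and dim.city
    SQLUtils.tableParse(sql).foreach(println)
  }
}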
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import java.util.PriorityQueue +import java.util.concurrent.atomic.AtomicBoolean + +import com.zto.fire.predef._ +import org.slf4j.LoggerFactory + +/** + * Fire框架统一的shutdown hook管理器,所有注册了的hook将会在jvm退出前根据优先级依次调用 + * + * @author ChengLong + * @create 2020-11-20 14:06 + * @since 1.1.2 + */ +private[fire] class ShutdownHookManager { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + // 具有优先级的队列,存放各处注册的hook信息,在jvm退出前根据优先级依次调用 + private[this] val hooks = new PriorityQueue[HookEntry]() + private[this] val shuttingDown = new AtomicBoolean(false) + + /** + * 执行所有的hook + */ + def runAll: Unit = { + if (this.shuttingDown.compareAndSet(false, true)) { + var nextHook: HookEntry = null + while ( { + nextHook = hooks.synchronized { + hooks.poll() + }; + nextHook != null + }) { + // 调用每一个hook的run方法 + tryWithLog(nextHook.run())(this.logger, catchLog = "执行hook过程中发生例外.") + } + } + } + + /** + * install所有的hook + */ + def install: Unit = { + Runtime.getRuntime.addShutdownHook(new Thread() { + // 调用hooks中的所有hook的run方法,每个run都会被try/cache包围 + override def run(): Unit = runAll + }) + } + + /** + * 添加指定优先级的hook + */ + def add(priority: Int, hook: () => Unit): Unit = { + this.hooks.synchronized { + if (this.shuttingDown.get()) throw new IllegalStateException("Shutdown hooks 在关闭过程中无法注册新的hook") + this.hooks.add(new HookEntry(priority, hook)) + } + } + + /** + * 移除指定的hook + */ + def remove(ref: AnyRef): Unit = { + this.hooks.synchronized { + this.hooks.remove(ref) + } + } +} + +/** + * hook项,包含优先级与具体的hook逻辑 + * + * @param priority + * hook优先级,优先级高的会先被调用 + * @param hook + * hook具体的执行逻辑,比如用于关闭数据库连接等 + */ +private[fire] class HookEntry(private val priority: Int, hook: () => Unit) extends Comparable[HookEntry] { + + /** + * hook执行顺序的优先级比较 + */ + override def compareTo(o: HookEntry): Int = o.priority - this.priority + + /** + * run方法中调用hook函数 + */ + def run(): Unit = hook() +} + +/** + * Fire框架统一的shutdown hook管理器 + * 调用者可以基于提供的api进行hook的注册 + */ +object ShutdownHookManager { + // 优先级定义 + private val DEFAULT_PRIORITY = 10 + private val HIGHT_PRIORITY = 100 + private val LOW_PRIORITY = 5 + private[this] lazy val hookManager = new ShutdownHookManager() + + this.hookManager.install + + def addShutdownHook(priority: Int = DEFAULT_PRIORITY)(hook: () => Unit): Unit = { + hookManager.add(priority, hook) + } + + def removeShutdownHook(ref: AnyRef): Unit = this.hookManager.remove(ref) +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/ThreadUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/ThreadUtils.scala new file mode 100644 index 0000000..ebb41f8 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/ThreadUtils.scala @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
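// Usage sketch (illustrative): registering JVM shutdown hooks through the manager above;
// hooks with a higher priority value run first.
import com.zto.fire.common.util.ShutdownHookManager

object ShutdownHookExample {
  def main(args: Array[String]): Unit = {
    ShutdownHookManager.addShutdownHook(priority = 100)(() => println("closing connections"))
    ShutdownHookManager.addShutdownHook()(() => println("flushing buffers"))
    println("job body runs here; the hooks fire when the JVM exits")
  }
}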
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import java.util.concurrent._ + +import com.zto.fire.predef._ +import com.zto.fire.common.conf.FirePS1Conf +import com.zto.fire.common.enu.ThreadPoolType +import org.apache.commons.lang3.StringUtils +import org.slf4j.LoggerFactory + + +/** + * 线程相关工具类 + * + * @author ChengLong 2019-4-25 15:17:55 + */ +object ThreadUtils { + // 用于维护使用ThreadUtils创建的线程池对象,并进行统一的关闭 + private val threadPoolMap = new ConcurrentHashMap[String, ExecutorService]() + private val logger = LoggerFactory.getLogger(this.getClass) + private[this] lazy val paramErrorMsg = "线程池不能为空" + + /** + * 以子线程方式执行函数调用 + * + * @param threadPool + * 线程池 + * @param fun + * 用于指定以多线程方式执行的函数 + * @param threadCount + * 表示开启多少个线程执行该fun任务 + */ + def runAsThread(threadPool: ExecutorService, fun: => Unit, threadCount: Int = 1): Unit = { + require(threadPool != null, paramErrorMsg) + + (1 to threadCount).foreach(_ => { + threadPool.execute(new Runnable { + override def run(): Unit = { + fun + logger.debug(s"Invoke runAsThread as ${Thread.currentThread().getName}.") + } + }) + }) + } + + /** + * 以子线程while方式循环执行函数调用 + * + * @param fun + * 用于指定以多线程方式执行的函数 + * @param delay + * 循环调用间隔时间(单位s) + */ + def runAsThreadLoop(threadPool: ExecutorService, fun: => Unit, delay: Long = 10, threadCount: Int = 1): Unit = { + require(threadPool != null, paramErrorMsg) + + (1 to threadCount).foreach(_ => { + threadPool.execute(new Runnable { + override def run(): Unit = { + while (true) { + fun + logger.debug(s"Loop invoke runAsThreadLoop as ${Thread.currentThread().getName}. Delay is ${delay}s.") + Thread.sleep(delay * 1000) + } + } + }) + }) + } + + /** + * 定时调度给定的函数 + * + * @param threadPoolSchedule + * 定时调度线程池 + * @param fun + * 定时执行的任务函数引用 + * @param initialDelay + * 第一次延迟执行的时长 + * @param period + * 每隔指定的时长执行一次 + * @param rate + * true:表示周期性的执行,不受上一个定时任务的约束 + * false:表示当上一次周期性任务执行成功后,period后开始执行 + * @param timeUnit + * 时间单位,默认分钟 + * @param threadCount + * 表示开启多少个线程执行该fun任务 + */ + def runAsSchedule(threadPoolSchedule: ScheduledExecutorService, fun: => Unit, initialDelay: Long, period: Long, rate: Boolean = true, timeUnit: TimeUnit = TimeUnit.MINUTES, threadCount: Int = 1): Unit = { + require(threadPoolSchedule != null, paramErrorMsg) + + (1 to threadCount).foreach(_ => { + if (rate) { + // 表示周期性的执行,不受上一个定时任务的约束 + threadPoolSchedule.scheduleAtFixedRate(new Runnable { + override def run(): Unit = { + wrapFun() + } + }, initialDelay, period, timeUnit) + } else { + // 表示当上一次周期性任务执行成功后,period后开始执行 + threadPoolSchedule.scheduleWithFixedDelay(new Runnable { + override def run(): Unit = { + wrapFun() + } + }, initialDelay, period, timeUnit) + } + + // 处理传入的函数 + def wrapFun(): Unit = { + fun + logger.debug(s"Loop invoke runAsSchedule as ${Thread.currentThread().getName}. 
Delay is ${period}${timeUnit.name()}.") + } + }) + } + + /** + * 表示当上一次周期性任务执行成功后 + * period后开始执行给定的函数fun + * + * @param threadPoolSchedule + * 定时调度线程池 + * @param fun + * 定时执行的任务函数引用 + * @param initialDelay + * 第一次延迟执行的时长 + * @param period + * 每隔指定的时长执行一次 + * @param timeUnit + * 时间单位,默认分钟 + * @param threadCount + * 表示开启多少个线程执行该fun任务 + */ + def runAsScheduleAtFixedRate(threadPoolSchedule: ScheduledExecutorService, fun: => Unit, initialDelay: Long, period: Long, rate: Boolean = true, timeUnit: TimeUnit = TimeUnit.MINUTES, threadCount: Int = 1): Unit = { + this.runAsSchedule(threadPoolSchedule, fun, initialDelay, period, true, timeUnit, threadCount) + } + + /** + * 表示当上一次周期性任务执行成功后,period后开始执行fun函数 + * 注:受上一个定时任务的影响 + * + * @param threadPoolSchedule + * 定时调度线程池 + * @param fun + * 定时执行的任务函数引用 + * @param initialDelay + * 第一次延迟执行的时长 + * @param period + * 每隔指定的时长执行一次 + * @param timeUnit + * 时间单位,默认分钟 + * @param threadCount + * 表示开启多少个线程执行该fun任务 + */ + def runAsScheduleWithFixedDelay(threadPoolSchedule: ScheduledExecutorService, fun: => Unit, initialDelay: Long, period: Long, rate: Boolean = true, timeUnit: TimeUnit = TimeUnit.MINUTES, threadCount: Int = 1): Unit = { + this.runAsSchedule(threadPoolSchedule, fun, initialDelay, period, false, timeUnit, threadCount) + } + + /** + * 创建一个新的指定大小的调度线程池 + * 如果名称已存在,则直接返回对应的线程池 + * + * @param poolName + * 线程池标识 + * @param poolType + * 线程池类型 + * @param poolSize + * 线程池大小 + */ + def createThreadPool(poolName: String, poolType: ThreadPoolType = ThreadPoolType.FIXED, poolSize: Int = 1): ExecutorService = { + require(StringUtils.isNotBlank(poolName), "线程池名称不能为空") + if (this.threadPoolMap.containsKey(poolName)) { + this.threadPoolMap.get(poolName) + } else { + val threadPool = poolType match { + case ThreadPoolType.FIXED => Executors.newFixedThreadPool(poolSize) + case ThreadPoolType.SCHEDULED => Executors.newScheduledThreadPool(poolSize) + case ThreadPoolType.SINGLE => Executors.newSingleThreadExecutor() + case ThreadPoolType.CACHED => Executors.newCachedThreadPool() + case ThreadPoolType.WORK_STEALING => Executors.newWorkStealingPool() + case _ => Executors.newFixedThreadPool(poolSize) + } + this.threadPoolMap.put(poolName, threadPool) + threadPool + } + } + + /** + * 用于释放指定的线程池 + * + * @param poolName + * 线程池标识 + */ + def shutdown(poolName: String): Unit = { + if (StringUtils.isNotBlank(poolName) && this.threadPoolMap.containsKey(poolName)) { + val threadPool = this.threadPoolMap.get(poolName) + if (threadPool != null && !threadPool.isShutdown) { + threadPool.shutdownNow() + this.logger.debug(s"关闭线程池:${poolName}") + } + } + } + + /** + * 用于释放指定的线程池 + */ + def shutdown(pool: ExecutorService): Unit = { + if (pool != null && !pool.isShutdown) { + pool.shutdown() + this.logger.debug(s"关闭线程池:${pool}") + } + } + + /** + * 用于释放所有线程池 + */ + private[fire] def shutdown: Unit = { + val poolNum = this.threadPoolMap.size() + if (this.threadPoolMap.size() > 0) { + this.threadPoolMap.foreach(pool => { + if (pool != null && pool._2 != null && !pool._2.isShutdown) { + pool._2.shutdownNow() + logger.info(s"${FirePS1Conf.GREEN}---> 完成线程池[ ${pool._1} ]的资源回收. <---${FirePS1Conf.DEFAULT}") + } + }) + } + logger.info(s"${FirePS1Conf.PINK}---> 完成所有线程池回收,总计:${poolNum}个. 
<---${FirePS1Conf.DEFAULT}") + } +} diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/Tools.scala b/fire-common/src/main/scala/com/zto/fire/common/util/Tools.scala new file mode 100644 index 0000000..06c4ab8 --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/Tools.scala @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.common.ext.{JavaExt, ScalaExt} + +import scala.collection.convert.{WrapAsJava, WrapAsScala} +import scala.util.control.Breaks + +/** + * 各种工具API的集合类 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-12-16 16:23 + */ +trait Tools extends Breaks with JavaTypeMap with ValueCheck with FireFunctions with JavaExt with ScalaExt with ScalaUtils with WrapAsScala with WrapAsJava { + +} \ No newline at end of file diff --git a/fire-common/src/main/scala/com/zto/fire/common/util/ValueUtils.scala b/fire-common/src/main/scala/com/zto/fire/common/util/ValueUtils.scala new file mode 100644 index 0000000..b9a7f2f --- /dev/null +++ b/fire-common/src/main/scala/com/zto/fire/common/util/ValueUtils.scala @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.util + +import java.util + +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils + +/** + * 值校验工具,支持任意对象、字符串、集合、map、rdd、dataset是否为空的校验 + * + * @since 0.4.1 + * @author ChengLong 2019-9-4 13:39:16 + */ +private[fire] trait ValueCheck { + + /** + * 值为空判断,支持任意类型 + * + * @param params + * 参数值 + * @return + * true:empty false:not empty + */ + def isEmpty(params: Any *): Boolean = { + if (params == null || params.isEmpty) return true + params.map { + case null => true + case str: String => StringUtils.isBlank(str) + case array: Array[_] => array.isEmpty + case collection: util.Collection[_] => collection.isEmpty + case it: Iterable[_] => it.isEmpty + case map: JMap[_, _] => map.isEmpty + case _ => false + }.count(_ == true) > 0 + } + + /** + * 值为非空判断,支持任意类型 + * + * @param param + * 参数值 + * @return + * true:not empty false:empty + */ + def noEmpty(param: Any *): Boolean = !this.isEmpty(param: _*) + + /** + * 参数非空约束(严格模式,进一步验证集合是否有元素) + * + * @param params 参数列表信息 + * @param message 异常信息 + */ + def requireNonEmpty(params: Any*)(implicit message: String = "参数不能为空,请检查."): Unit = { + require(params != null && params.nonEmpty, message) + + var index = 0 + params.foreach(param => { + index += 1 + param match { + case null => require(param != null, msg(index, message)) + case str: String => require(StringUtils.isNotBlank(str), msg(index, message)) + case array: Array[_] => require(array.nonEmpty, msg(index, message)) + case collection: util.Collection[_] => require(!collection.isEmpty, msg(index, message)) + case it: Iterable[_] => require(it.nonEmpty, msg(index, message)) + case map: JMap[_, _] => require(map.nonEmpty, msg(index, message)) + case _ => + } + }) + + /** + * 构建异常信息 + */ + def msg(index: Int, msg: String): String = s"第[ ${index} ]参数为空,异常信息:$message" + } +} + +/** + * 用于单独调用的值校验工具类 + */ +object ValueUtils extends ValueCheck diff --git a/fire-common/src/test/scala/com/zto/fire/common/util/DatasourceManagerTest.scala b/fire-common/src/test/scala/com/zto/fire/common/util/DatasourceManagerTest.scala new file mode 100644 index 0000000..6468448 --- /dev/null +++ b/fire-common/src/test/scala/com/zto/fire/common/util/DatasourceManagerTest.scala @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.util + +import org.junit.Test + +/** + * DatasourceManager单元测试 + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-26 16:32 + */ +class DatasourceManagerTest { + + @Test + def testAddGet: Unit = { + /*val t1 = DataSourceDesc(DataSource.TIDB, "test", "sjznb", "rtdb.base") + val t2 = DataSourceDesc(DataSource.TIDB, "test", "sjznb", "rtdb.base") + val t3 = DataSourceDesc(DataSource.ORACLE, "test", "sjznb", "rtdb.oracle") + val t4 = DataSourceDesc(DataSource.KAFKA, "test", "sjznb", "rtdb.kafka") + val t5 = DataSourceDesc(DataSource.KAFKA, "test", "sjznb", "rtdb.kafka") + DataSourceManager.add(t1) + DataSourceManager.add(t2) + DataSourceManager.add(t3) + DataSourceManager.add(t4) + DataSourceManager.add(t5) + DataSourceManager.get.foreach(t => { + t._2.foreach(t => println(t)) + }) + assertEquals(DataSourceManager.get.size(), 3)*/ + } +} diff --git a/fire-common/src/test/scala/com/zto/fire/common/util/ExceptionBusTest.scala b/fire-common/src/test/scala/com/zto/fire/common/util/ExceptionBusTest.scala new file mode 100644 index 0000000..3c40004 --- /dev/null +++ b/fire-common/src/test/scala/com/zto/fire/common/util/ExceptionBusTest.scala @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import com.zto.fire.common.util.ExceptionBus.stackTrace +import com.zto.fire.predef._ +import org.junit.Assert._ +import org.junit.Test + +/** + * 用于ExceptionBus的单元测试 + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-16 14:42 + */ +class ExceptionBusTest { + + /** + * 用于测试queue大小限制与exception的存入和获取 + */ + @Test + def testTry: Unit = { + (1 to 10020).foreach(i => { + tryWithLog { + val a = 1 / 0 + } (isThrow = false) + }) + + val t = ExceptionBus.getAndClear + assertEquals(t._1.size, 1000) + t._1.foreach(t => stackTrace(t._2)) + + // 上一次获取后queue中的记录数为0 + assertEquals(ExceptionBus.queueSize.get(), 0) + assertEquals(ExceptionBus.exceptionCount.get(), 10020) + } + +} diff --git a/fire-common/src/test/scala/com/zto/fire/common/util/SQLUtilsTest.scala b/fire-common/src/test/scala/com/zto/fire/common/util/SQLUtilsTest.scala new file mode 100644 index 0000000..dacf664 --- /dev/null +++ b/fire-common/src/test/scala/com/zto/fire/common/util/SQLUtilsTest.scala @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.common.util + +import org.junit.Test +import com.zto.fire.common.util.SQLUtils._ + +/** + * SQLUtils单元测试 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-11-26 15:11 + */ +class SQLUtilsTest { + + @Test + def testParse: Unit = { + val selectSql = + """ + | select * FROM + | student1 s join dev.teacher2 b + |""".stripMargin + tableParse(selectSql).foreach(tableName => println("匹配:" + tableName)) + + val insertSQL = + """ + |insert into dev.student3(id,name) values(1, 'root'); + |insert into teacher4(id,name) values(1, 'root'); + |""".stripMargin + tableParse(insertSQL).foreach(tableName => println("匹配:" + tableName)) + + val deleteSQL = + """ + |delete from teacher5 where id=10; + |delete from dev.teacher6 where id=10; + |""".stripMargin + tableParse(deleteSQL).foreach(tableName => println("匹配:" + tableName)) + + val createSQL = + """ + |create table hello7(idxxx); + |create table if not EXISTS hello8; + |CREATE TABLE student9 LIKE tmp.student10 + |""".stripMargin + tableParse(createSQL).foreach(tableName => println("匹配:" + tableName)) + + val alterSQL = + """ + |LOAD DATA LOCAL INPATH '/home/hadoop/data/student1.txt' INTO TABLE student11 + |""".stripMargin + tableParse(alterSQL).foreach(tableName => println("匹配:" + tableName)) + + val testSQL = + """ + |create table table_student12 + |insert into dev.student13_from + |delete from `from_student14_from` + |select * from (select * from student15) + |select * from (select * from + |student16) + |""".stripMargin + tableParse(testSQL).foreach(tableName => println("匹配:" + tableName)) + + val start = System.currentTimeMillis() + (1 to 1000).foreach(i => tableParse(selectSql)) + println("耗时:" + (System.currentTimeMillis() - start)) + } +} diff --git a/fire-common/src/test/scala/com/zto/fire/common/util/ShutdownHookManagerTest.scala b/fire-common/src/test/scala/com/zto/fire/common/util/ShutdownHookManagerTest.scala new file mode 100644 index 0000000..0fbe8c8 --- /dev/null +++ b/fire-common/src/test/scala/com/zto/fire/common/util/ShutdownHookManagerTest.scala @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.util + +import org.apache.log4j.{Level, Logger} +import org.junit.Test + +/** + * shutdown hook管理器单元测试 + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-20 14:45 + */ +class ShutdownHookManagerTest { + Logger.getLogger(classOf[ShutdownHookManagerTest]).setLevel(Level.toLevel("INFO")) + + @Test + def testRegister: Unit = { + ShutdownHookManager.addShutdownHook(1) { + () => println("1. 执行逻辑") + } + ShutdownHookManager.addShutdownHook(3) { + () => println("3. 执行逻辑") + } + ShutdownHookManager.addShutdownHook(2) { + () => println("2. 执行逻辑") + } + ShutdownHookManager.addShutdownHook(5) { + () => println("5. 执行逻辑") + } + println("=========main method==========") + } +} diff --git a/fire-common/src/test/scala/com/zto/fire/common/util/ValueUtilsTest.scala b/fire-common/src/test/scala/com/zto/fire/common/util/ValueUtilsTest.scala new file mode 100644 index 0000000..2ca6667 --- /dev/null +++ b/fire-common/src/test/scala/com/zto/fire/common/util/ValueUtilsTest.scala @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.common.util + +import com.zto.fire.predef._ +import org.junit.Test + +/** + * ValueUtils工具类单元测试 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-12-16 13:21 + */ +class ValueUtilsTest { + + /** + * 测试isEmpty、isNotEmpty等API + */ + @Test + def testIsEmpty(): Unit = { + val str = "" + assert(isEmpty(str), "字符串不能为空") + val map = new JHashMap[String, Integer]() + assert(isEmpty(str, map), "存在为空的值") + map.put("1", 1) + assert(noEmpty("123", map), "都不为空") + assert(!noEmpty("123", map, ""), "存在为空的") + } + + /** + * 测试参数检测API + */ + @Test + def testRequireNonEmpty(): Unit = { + val arr = new Array[Int](1) + val map = Map("str" -> 1) + val mutableMap = scala.collection.mutable.Map("str" -> 1) + val jmap = new JHashMap[String, Integer]() + jmap.put("str", 1) + val jset = new JHashSet[Int]() + jset.add(1) + requireNonEmpty(arr, map, mutableMap, jmap, jset)("参数不合法") + } +} diff --git a/fire-connectors/fire-hbase/pom.xml b/fire-connectors/fire-hbase/pom.xml new file mode 100644 index 0000000..f12f205 --- /dev/null +++ b/fire-connectors/fire-hbase/pom.xml @@ -0,0 +1,104 @@ + + + + + 4.0.0 + fire-hbase_${scala.binary.version} + jar + fire-hbase + + + com.zto.fire + fire-connectors_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-hdfs + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-client + ${hadoop.version} + ${maven.scope} + + + + + org.apache.hbase + hbase-common + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-connectors/fire-hbase/src/main/java/com/zto/fire/hbase/anno/HConfig.java b/fire-connectors/fire-hbase/src/main/java/com/zto/fire/hbase/anno/HConfig.java new file mode 100644 index 0000000..37d5923 --- /dev/null +++ b/fire-connectors/fire-hbase/src/main/java/com/zto/fire/hbase/anno/HConfig.java @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.hbase.anno; + +import java.lang.annotation.ElementType; +import java.lang.annotation.Retention; +import java.lang.annotation.RetentionPolicy; +import java.lang.annotation.Target; + +/** + * HBase相关的配置 + * @author ChengLong 2020-11-16 16:03:08 + */ +@Target(ElementType.TYPE) +@Retention(RetentionPolicy.RUNTIME) +public @interface HConfig { + + /** + * 是否允许空字段插入HBase + */ + boolean nullable() default true; + + /** + * 是否以多版本方式插入 + * 注:fire中将数据转为json后以多版本方式插入,因此多列数据最终存放到HBase中只是一列json数据 + */ + boolean multiVersion() default false; +} diff --git a/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/HBaseConnector.scala b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/HBaseConnector.scala new file mode 100644 index 0000000..7370b21 --- /dev/null +++ b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/HBaseConnector.scala @@ -0,0 +1,1044 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.hbase + +import com.google.common.collect.Maps +import com.zto.fire.common.anno.{FieldName, Internal} +import com.zto.fire.common.enu.ThreadPoolType +import com.zto.fire.common.util.{DatasourceManager, _} +import com.zto.fire.core.connector.{ConnectorFactory, FireConnector} +import com.zto.fire.hbase.anno.HConfig +import com.zto.fire.hbase.bean.{HBaseBaseBean, MultiVersionsBean} +import com.zto.fire.hbase.conf.FireHBaseConf +import com.zto.fire.hbase.conf.FireHBaseConf.{familyName, _} +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hbase._ +import org.apache.hadoop.hbase.client.{Durability, _} +import org.apache.hadoop.hbase.io.ImmutableBytesWritable +import org.apache.hadoop.hbase.io.compress.Compression +import org.apache.hadoop.hbase.util.Bytes + +import java.lang.reflect.Field +import java.lang.{Boolean => JBoolean, Double => JDouble, Float => JFloat, Integer => JInt, Long => JLong, Short => JShort, String => JString} +import java.math.{BigDecimal => JBigDecimal} +import java.nio.charset.StandardCharsets +import java.util.concurrent.{ScheduledExecutorService, TimeUnit, ConcurrentHashMap => JConcurrentHashMap} +import java.util.{Map => JMap} +import scala.collection.Iterator +import scala.collection.mutable.ListBuffer +import scala.reflect.{ClassTag, classTag} + +/** + * HBase操作工具类,除了涵盖CRUD等常用操作外,还提供以下功能: + * 1. static void insert(String tableName, String family, List list) + * 将自定义的javabean集合批量插入到表中 + * 2. scan[T <: HBaseBaseBean[T]](tableName: String, scan: Scan, clazz: Class[T], keyNum: Int = 1): ListBuffer[T] + * 指定查询条件,将查询结果以List[T]形式返回 + * 注:自定义bean中的field需与hbase中的qualifier对应 + *
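+ * A rough usage sketch (the Student bean and table name below are hypothetical; any subclass of
+ * HBaseBaseBean that implements buildRowKey() works the same way):
+ * {{{
+ *   val connector = HBaseConnector(keyNum = 1)
+ *   connector.insert("fire_test", new Student(1L, "admin"))
+ *   val students = connector.get("fire_test", classOf[Student], "1")
+ * }}}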

+ * + * @param conf + * 代码级别的配置信息,允许为空,配置文件会覆盖相同配置项,也就是说配置文件拥有着跟高的优先级 + * @param keyNum + * 用于区分连接不同的数据源,不同配置源对应不同的Connector实例 + * @since 2.0.0 + * @author ChengLong 2020-11-11 + */ +private[fire] class HBaseConnector(val conf: Configuration = null, val keyNum: Int = 1) extends FireConnector(keyNum = keyNum) { + // --------------------------------------- 反射缓存 --------------------------------------- // + private[this] var configuration: Configuration = _ + private[this] lazy val cacheFieldMap = new JConcurrentHashMap[Class[_], JMap[String, Field]]() + private[this] lazy val cacheHConfigMap = new JConcurrentHashMap[Class[_], HConfig]() + private[this] lazy val cacheTableExistsMap = new JConcurrentHashMap[String, Boolean]() + private[this] lazy val connection: Connection = this.initConnection + private[this] lazy val durability = this.initDurability + private[this] lazy val threadPool = ThreadUtils.createThreadPool("HBaseConnectorPool", ThreadPoolType.SCHEDULED) + // ------------------------------------ 表存在判断缓存 ------------------------------------ // + private[this] lazy val tableExistsCacheEnable = tableExistsCache(this.keyNum) + private[this] lazy val closeAdminError = "close admin执行失败" + this.registerReload + + /** + * 批量插入多行多列,自动将HBaseBaseBean子类转为Put集合 + * + * @param tableName 表名 + * @param beans HBaseBaseBean子类集合 + */ + def insert[T <: HBaseBaseBean[T] : ClassTag](tableName: String, beans: T*): Unit = { + requireNonEmpty(tableName, beans)("参数不合法,批量HBase insert失败") + var table: Table = null + tryWithFinally { + table = this.getTable(tableName) + val beanList = if (this.getMultiVersion[T]) beans.filter(_ != null).map((bean: T) => new MultiVersionsBean(bean)) else beans + val putList = beanList.map(bean => convert2Put(bean.asInstanceOf[T], this.getNullable[T])) + this.insert(tableName, putList: _*) + } { + this.closeTable(table) + }(this.logger, catchLog = s"HBase insert ${hbaseCluster(keyNum)}.${tableName}执行失败, 总计${beans.size}条", finallyCatchLog = "close HBase table失败") + } + + /** + * 批量插入多行多列 + * + * @param tableName 表名 + * @param puts Put集合 + */ + def insert(tableName: String, puts: Put*): Unit = { + requireNonEmpty(tableName, puts)("参数不合法,批量HBase insert失败") + + var table: Table = null + tryWithFinally { + table = this.getTable(tableName) + table.put(puts) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + this.logger.info(s"HBase insert ${hbaseCluster(keyNum)}.${tableName}执行成功, 总计${puts.size}条") + } { + this.closeTable(table) + }(this.logger, "HBase insert", + s"HBase insert ${hbaseCluster(keyNum)}.${tableName}执行失败, 总计${puts.size}条", + "close HBase table失败") + } + + /** + * 从HBase批量Get数据,并将结果封装到JavaBean中 + * + * @param tableName 表名 + * @param rowKeys 指定的多个rowKey + * @param clazz 目标类类型,必须是HBaseBaseBean的子类 + * @return 目标对象实例 + */ + def get[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rowKeys: String*): ListBuffer[T] = { + val getList = for (rowKey <- rowKeys) yield HBaseConnector.buildGet(rowKey) + this.get[T](tableName, clazz, getList: _*) + } + + /** + * 从HBase批量Get数据,并将结果封装到JavaBean中 + * + * @param tableName 表名 + * @param clazz 目标类类型,必须是HBaseBaseBean的子类 + * @param gets 指定的多个get对象 + * @return 目标对象实例 + */ + def get[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], gets: Get*)(implicit canOverload: Boolean = true): ListBuffer[T] = { + requireNonEmpty(tableName, clazz, gets)("参数不合法,无法进行HBase Get操作") + tryWithReturn { + val resultList = this.getResult(tableName, gets: _*) + if (this.getMultiVersion[T]) 
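+ // beans annotated with @HConfig(multiVersion = true) are stored as a single JSON column, so results must go through the multi-version parsing path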
this.hbaseMultiRow2Bean[T](resultList, clazz) else this.hbaseRow2Bean(resultList, clazz) + }(this.logger, catchLog = s"批量 get ${hbaseCluster(keyNum)}.${tableName}执行失败") + } + + /** + * 通过HBase Seq[Get]获取多条数据 + * + * @param tableName 表名 + * @param getList HBase的get对象实例 + * @return + * HBase Result + */ + def getResult(tableName: String, getList: Get*): ListBuffer[Result] = { + requireNonEmpty(tableName, getList)("参数不合法,执行HBase 批量get失败") + + var table: Table = null + val list = ListBuffer[Result]() + tryWithFinally { + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName, sink = false) + table = this.getTable(tableName) + list ++= table.get(getList) + this.logger.info(s"HBase 批量get ${hbaseCluster(keyNum)}.${tableName}执行成功, 总计${list.size}条") + list + } { + this.closeTable(table) + }(this.logger, "HBase get", + s"get ${hbaseCluster(keyNum)}.${tableName}执行失败", "close HBase table对象失败.") + } + + /** + * 通过HBase Get对象获取一条数据 + * + * @param tableName 表名 + * @return + * HBase Result + */ + def getResult[T: ClassTag](tableName: String, rowKeyList: String*): ListBuffer[Result] = { + requireNonEmpty(tableName, rowKeyList)("参数不合法,rowKey集合不能为空.") + val getList = for (rowKey <- rowKeyList) yield HBaseConnector.buildGet(rowKey) + val starTime = currentTime + val resultList = this.getResult(tableName, getList: _*) + logger.info(s"HBase 批量get ${hbaseCluster(keyNum)}.${tableName}执行成功, 总计${resultList.size}条, 耗时:${timecost(starTime)}") + resultList + } + + /** + * 表扫描,将scan后得到的ResultScanner对象直接返回 + * 注:调用者需手动关闭ResultScanner对象实例 + * + * @param tableName 表名 + * @param scan HBase scan对象 + * @return 指定类型的List + */ + def scanResultScanner(tableName: String, scan: Scan): ResultScanner = { + requireNonEmpty(tableName, scan)(s"参数不合法,scan ${hbaseCluster(keyNum)}.${tableName}失败.") + + var table: Table = null + var rsScanner: ResultScanner = null + try { + table = this.getTable(tableName) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName, sink = false) + rsScanner = table.getScanner(scan) + } catch { + case e: Exception => { + // 当执行scan失败时,向上抛异常之前,避免ResultScanner对象因异常无法得到有效的关闭 + // 因此在发生异常时会尝试关闭ResultScanner对象 + logger.error(s"执行scan ${hbaseCluster(keyNum)}.${tableName}失败", e) + try { + this.closeResultScanner(rsScanner) + } finally { + throw e + } + } + } finally { + this.closeTable(table) + } + + rsScanner + } + + /** + * 表扫描,将scan后得到的ResultScanner对象直接返回 + * 注:调用者需手动关闭ResultScanner对象实例 + * + * @param tableName 表名 + * @param startRow 开始行 + * @param endRow 结束行 + * @return 指定类型的List + */ + def scanResultScanner(tableName: String, startRow: String, endRow: String): ResultScanner = { + requireNonEmpty(tableName, startRow, endRow) + val scan = HBaseConnector.buildScan(startRow, endRow) + this.scanResultScanner(tableName, scan) + } + + /** + * 表扫描,将查询后的数据转为JavaBean并放到List中 + * + * @param tableName 表名 + * @param startRow 开始行 + * @param endRow 结束行 + * @param clazz 类型 + * @return 指定类型的List + */ + def scan[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, endRow: String): ListBuffer[T] = { + requireNonEmpty(tableName, clazz, startRow, endRow) + val scan = HBaseConnector.buildScan(startRow, endRow) + this.scan[T](tableName, clazz, scan) + } + + /** + * 表扫描,将查询后的数据转为JavaBean并放到List中 + * + * @param tableName 表名 + * @param scan HBase scan对象 + * @param clazz 类型 + * @return 指定类型的List + */ + def scan[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): ListBuffer[T] = { + requireNonEmpty(tableName, clazz, 
scan)(s"参数不合法,scan ${hbaseCluster(keyNum)}.${tableName}失败.") + + val list = ListBuffer[T]() + var rsScanner: ResultScanner = null + tryWithFinally { + rsScanner = this.scanResultScanner(tableName, scan) + if (rsScanner != null) { + rsScanner.foreach(rs => { + if (this.getMultiVersion[T]) { + val objList = this.hbaseMultiRow2Bean[T](rs, clazz) + if (objList != null && objList.nonEmpty) list ++= objList + } else { + val obj = hbaseRow2Bean(rs, clazz) + if (obj != null) list += obj + } + }) + } + this.logger.info(s"HBase scan ${hbaseCluster(keyNum)}.${tableName}执行成功, 总计${list.size}条") + list + } { + this.closeResultScanner(rsScanner) + }(this.logger, "HBase scan", + s"scan ${hbaseCluster(keyNum)}.${tableName}执行失败", + "关闭HBase table对象或ResultScanner失败") + } + + /** + * 获取Configuration实例 + * + * @return HBase Configuration对象 + */ + def getConfiguration: Configuration = this.configuration + + /** + * 用于初始化全局唯一的HBase connection + */ + @Internal + def initConnection: Connection = { + tryWithReturn { + ConnectionFactory.createConnection(this.getConfiguration) + }(logger, s"成功创建HBase ${hbaseCluster(keyNum)}集群connection.", s"获取HBase ${hbaseCluster(keyNum)}集群connection失败.") + } + + /** + * 根据keyNum获取指定HBase集群的connection + */ + def getConnection: Connection = this.connection + + /** + * 将class中的field转为map映射 + * + * @param clazz Class类型 + * @return 名称与字段的映射map + */ + @Internal + private[this] def getFieldNameMap[T <: HBaseBaseBean[T]](clazz: Class[T]): JMap[String, Field] = { + if (!this.cacheFieldMap.containsKey(clazz)) { + val allFields = ReflectionUtils.getAllFields(clazz) + if (allFields != null) { + val fieldMap = Maps.newHashMapWithExpectedSize[String, Field](allFields.size()) + + if (allFields != null) { + allFields.values.filter(_ != null).foreach(field => { + val fieldName = field.getAnnotation(classOf[FieldName]) + var family = "" + var qualifier = "" + if (fieldName != null) { + family = fieldName.family + qualifier = fieldName.value + } + + if (StringUtils.isBlank(family)) family = familyName(keyNum) + if (StringUtils.isBlank(qualifier)) qualifier = field.getName + fieldMap.put(family + ":" + qualifier, field) + }) + } + cacheFieldMap.put(clazz, fieldMap) + } + } + + this.cacheFieldMap.get(clazz) + } + + /** + * 为指定对象的field赋值 + * + * @param obj 目标对象 + * @param field 指定filed + * @param value byte类型的数据 + */ + @Internal + private def setFieldBytesValue[T <: HBaseBaseBean[T]](obj: T, field: Field, value: Array[Byte]): Unit = { + tryWithLog { + if (field != null && value != null && value.nonEmpty) { + ReflectionUtils.setAccessible(field) + val toValue = field.getType match { + case fieldType if fieldType eq classOf[JString] => Bytes.toString(value) + case fieldType if fieldType eq classOf[JInt] => Bytes.toInt(value) + case fieldType if fieldType eq classOf[JDouble] => Bytes.toDouble(value) + case fieldType if fieldType eq classOf[JLong] => Bytes.toLong(value) + case fieldType if fieldType eq classOf[JBigDecimal] => Bytes.toBigDecimal(value) + case fieldType if fieldType eq classOf[JFloat] => Bytes.toFloat(value) + case fieldType if fieldType eq classOf[JBoolean] => Bytes.toBoolean(value) + case fieldType if fieldType eq classOf[JShort] => Bytes.toShort(value) + } + field.set(obj, toValue) + } else if (field != null) field.set(obj, null) + }(this.logger, catchLog = s"为filed ${field}设置赋值过程中出现异常") + } + + /** + * 将含有多版本的cell映射为field + * + * @param rs hbase 结果集 + * @param clazz 目标类型 + * @param fieldMap 字段映射信息 + */ + @Internal + private[this] def multiCell2Field[T <: HBaseBaseBean[T] : ClassTag](rs: 
Result, clazz: Class[T], fieldMap: JMap[String, Field]): ListBuffer[T] = { + val objList = ListBuffer[T]() + tryWithLog { + if (rs != null) { + rs.rawCells.filter(_ != null).foreach(cell => { + val obj = new MultiVersionsBean + val rowKey = new String(CellUtil.cloneRow(cell), StandardCharsets.UTF_8) + val family = new String(CellUtil.cloneFamily(cell), StandardCharsets.UTF_8) + val qualifier = new String(CellUtil.cloneQualifier(cell), StandardCharsets.UTF_8) + val value = CellUtil.cloneValue(cell) + val field = fieldMap.get(family + ":" + qualifier) + this.setFieldBytesValue(obj, field, value) + val idField = ReflectionUtils.getFieldByName(clazz, "rowKey") + requireNonEmpty(idField)(s"${clazz}中必须有名为rowKey的成员变量") + idField.set(obj, rowKey) + if (StringUtils.isNotBlank(obj.getMultiFields)) objList.add(JSONUtils.parseObject[T](obj.getMultiFields)) + }) + } + }(this.logger, catchLog = s"将多版本json数据转为类型${clazz}过程中发生失败.") + objList + } + + /** + * 将cell中的值转为File的值 + * + * @param clazz 类类型 + * @param fieldMap 成员变量信息 + * @param rs hbase查询结果集 + * @return clazz对应的结果实例 + */ + @Internal + private[this] def cell2Field[T <: HBaseBaseBean[T]](clazz: Class[T], fieldMap: JMap[String, Field], rs: Result): T = { + val obj = clazz.newInstance + + tryWithLog { + val cells = rs.rawCells + val rowKey = convertCells2Fields(fieldMap, obj, cells) + val idField = ReflectionUtils.getFieldByName(clazz, "rowKey") + requireNonEmpty(idField)(s"${clazz}中必须有名为rowKey的成员变量") + ReflectionUtils.setAccessible(idField) + idField.set(obj, rowKey) + }(this.logger, catchLog = "将HBase cell中的值转换并赋值给field过程中报错.") + + obj + } + + /** + * 一次循环取出cell中的值赋值给各个field + * + * @param obj 对象实例 + * @param cells hbase结果集中的cells集合 + * @return rowkey + */ + @Internal + private[this] def convertCells2Fields[T <: HBaseBaseBean[T]](fieldMap: JMap[String, Field], obj: T, cells: Array[Cell]): String = { + requireNonEmpty(fieldMap, obj, cells) + + var rowKey = "" + if (cells != null) { + cells.filter(_ != null).foreach(cell => { + rowKey = new String(CellUtil.cloneRow(cell), StandardCharsets.UTF_8) + val family = new String(CellUtil.cloneFamily(cell), StandardCharsets.UTF_8) + val qualifier = new String(CellUtil.cloneQualifier(cell), StandardCharsets.UTF_8) + val value = CellUtil.cloneValue(cell) + val field = fieldMap.get(family + ":" + qualifier) + this.setFieldBytesValue(obj, field, value) + }) + } + rowKey + } + + /** + * 将结果映射到自定义bean中 + * + * @param rs HBase查询结果集 + * @param clazz 映射的目标Class类型 + * @return 目标类型实例 + */ + @Internal + private[fire] def hbaseRow2Bean[T <: HBaseBaseBean[T]](rs: Result, clazz: Class[T]): T = { + requireNonEmpty(rs, clazz)("参数不合法,HBase Row转为JavaBean失败.") + val fieldMap = this.getFieldNameMap(clazz) + requireNonEmpty(fieldMap)(s"${clazz}中未声明任何成员变量或成员变量未声明注解@FieldName") + this.cell2Field(clazz, fieldMap, rs) + } + + /** + * 将结果映射到自定义bean中 + * + * @param rsArr HBase查询结果集 + * @param clazz 映射的目标Class类型 + * @return 目标类型实例 + */ + @Internal + private[fire] def hbaseRow2Bean[T <: HBaseBaseBean[T]](rsArr: ListBuffer[Result], clazz: Class[T]): ListBuffer[T] = { + requireNonEmpty(rsArr, clazz)("参数不合法,HBase Row转为JavaBean失败.") + val fieldMap = this.getFieldNameMap(clazz) + requireNonEmpty(fieldMap)(s"${clazz}中未声明任何成员变量或成员变量未声明注解@FieldName") + val objList = ListBuffer[T]() + rsArr.filter(rs => rs != null && !rs.isEmpty).foreach(rs => objList += this.cell2Field(clazz, fieldMap, rs)) + objList + } + + /** + * 将结果映射到自定义bean中 + * + * @param rs HBase查询结果集 + * @param clazz 映射的目标Class类型 + * @return 目标类型实例 + */ + @Internal + private[fire] def 
hbaseMultiRow2Bean[T <: HBaseBaseBean[T] : ClassTag](rs: Result, clazz: Class[T]): ListBuffer[T] = { + requireNonEmpty(rs, clazz)("参数不合法,HBase MultiRow转为JavaBean失败.") + val fieldMap = this.getFieldNameMap(classOf[MultiVersionsBean]) + requireNonEmpty(fieldMap)(s"${clazz}中未声明任何成员变量或成员变量未声明注解@FieldName") + this.multiCell2Field[T](rs, clazz, fieldMap) + } + + /** + * 将结果映射到自定义bean中 + * + * @param rsArr HBase查询结果集 + * @param clazz 映射的目标Class类型 + * @return 目标类型实例 + */ + @Internal + private[fire] def hbaseMultiRow2Bean[T <: HBaseBaseBean[T] : ClassTag](rsArr: ListBuffer[Result], clazz: Class[T]): ListBuffer[T] = { + requireNonEmpty(rsArr, clazz)("参数不合法,HBase Row转为JavaBean失败.") + val fieldMap = getFieldNameMap(classOf[MultiVersionsBean]) + requireNonEmpty(fieldMap)(s"${clazz}中未声明任何成员变量或成员变量未声明注解@FieldName") + val objList = ListBuffer[T]() + rsArr.filter(rs => rs != null && !rs.isEmpty).foreach(rs => objList ++= this.multiCell2Field[T](rs, clazz, fieldMap)) + objList + } + + /** + * 将结果映射到自定义bean中 + * + * @param it HBase查询结果集 + * @param clazz 映射的目标Class类型 + * @return 目标类型实例 + */ + @Internal + private[fire] def hbaseRow2BeanList[T <: HBaseBaseBean[T]](it: Iterator[(ImmutableBytesWritable, Result)], clazz: Class[T]): Iterator[T] = { + requireNonEmpty(it, clazz) + val fieldMap = this.getFieldNameMap(clazz) + requireNonEmpty(fieldMap)(s"${clazz}中未声明任何成员变量或成员变量未声明注解@FieldName") + val beanList = ListBuffer[T]() + tryWithLog { + it.foreach(t => { + val obj = clazz.newInstance() + val cells = t._2.rawCells() + val rowKey = this.convertCells2Fields(fieldMap, obj, cells) + val idField = ReflectionUtils.getFieldByName(clazz, "rowKey") + requireNonEmpty(idField)(s"${clazz}中必须有名为rowKey的成员变量") + idField.set(obj, rowKey) + beanList += obj + }) + }(this.logger, catchLog = "执行hbaseRow2BeanList过程中出现异常") + beanList.iterator + } + + /** + * 将多版本结果映射到自定义bean中 + * + * @param it HBase查询结果集 + * @param clazz 映射的目标Class类型 + * @return 目标类型实例 + */ + @Internal + private[fire] def hbaseMultiVersionRow2BeanList[T <: HBaseBaseBean[T] : ClassTag](it: Iterator[(ImmutableBytesWritable, Result)], clazz: Class[T]): Iterator[T] = { + requireNonEmpty(it, clazz) + val beanList = ListBuffer[T]() + tryWithLog { + it.foreach(t => { + beanList ++= this.hbaseMultiRow2Bean[T](t._2, clazz) + }) + }(this.logger, catchLog = "将HBase多版本Row转为JavaBean过程中出现异常.") + + beanList.iterator + } + + /** + * 将Javabean转为put对象 + * + * @param obj 对象 + * @param insertEmpty true:插入null字段,false:不插入空字段 + * @return put对象实例 + */ + @Internal + private[fire] def convert2Put[T <: HBaseBaseBean[T]](obj: T, insertEmpty: Boolean): Put = { + requireNonEmpty(obj, insertEmpty)("参数不能为空,无法将对象转为HBase Put对象") + tryWithReturn { + var tmpObj = obj + val clazz = tmpObj.getClass + val rowKeyField = ReflectionUtils.getFieldByName(clazz, "rowKey") + var rowKeyObj = rowKeyField.get(tmpObj) + if (rowKeyObj == null) { + val method = ReflectionUtils.getMethodByName(clazz, "buildRowKey") + tmpObj = method.invoke(tmpObj).asInstanceOf[T] + rowKeyObj = rowKeyField.get(tmpObj) + requireNonEmpty(rowKeyObj)(s"rowKey不能为空,请检查${clazz}中是否实现buildRowKey()方法!") + } + + val allFields = ReflectionUtils.getAllFields(clazz) + requireNonEmpty(allFields)(s"在${clazz}中未找到任何成员变量,请检查!") + val rowKey = rowKeyObj.toString.getBytes(StandardCharsets.UTF_8) + val put = new Put(rowKey) + put.setDurability(this.durability) + allFields.values().foreach(field => { + val objValue = field.get(obj) + // 将objValue插入的两种情况:1. 允许插入为空的值;2. 
不允许插入为空的值,并且objValue不为空 + if (insertEmpty || (!insertEmpty && objValue != null)) { + val fieldName = field.getAnnotation(classOf[FieldName]) + var name = "" + var familyName = "" + if (fieldName != null && !fieldName.disuse) { + familyName = fieldName.family + name = fieldName.value + } + + // 如果未声明@FieldName注解或者声明了@FieldName注解但同时在注解中的disuse指定为false,则进行字段的转换 + // 如果不满足以上两个条件,则任务当前字段不需要转为Put对象中的qualifier + if (fieldName == null || (fieldName != null && !fieldName.disuse())) { + if (StringUtils.isBlank(familyName)) familyName = FireHBaseConf.familyName(keyNum) + if (StringUtils.isBlank(name)) name = field.getName + val famliyByte = familyName.getBytes(StandardCharsets.UTF_8) + val qualifierByte = name.getBytes(StandardCharsets.UTF_8) + if (objValue != null) { + val objValueStr = objValue.toString + val toBytes = field.getType match { + case fieldType if fieldType eq classOf[JString] => Bytes.toBytes(objValueStr) + case fieldType if fieldType eq classOf[JInt] => Bytes.toBytes(JInt.parseInt(objValueStr)) + case fieldType if fieldType eq classOf[JDouble] => Bytes.toBytes(JDouble.parseDouble(objValueStr)) + case fieldType if fieldType eq classOf[JLong] => Bytes.toBytes(JLong.parseLong(objValueStr)) + case fieldType if fieldType eq classOf[JBigDecimal] => Bytes.toBytes(new JBigDecimal(objValueStr)) + case fieldType if fieldType eq classOf[JFloat] => Bytes.toBytes(JFloat.parseFloat(objValueStr)) + case fieldType if fieldType eq classOf[JBoolean] => Bytes.toBytes(JBoolean.parseBoolean(objValueStr)) + case fieldType if fieldType eq classOf[JShort] => Bytes.toBytes(JShort.parseShort(objValueStr)) + } + put.addColumn(famliyByte, qualifierByte, toBytes) + } else { + put.addColumn(famliyByte, qualifierByte, null) + } + } + } + }) + put + }(this.logger, catchLog = "将JavaBean转为HBase Put对象过程中出现异常.") + } + + /** + * 提供给fire-spark引擎的工具方法 + * + * @param obj 继承自HBaseBaseBean的子类实例 + * @return HBaseBaseBean的子类实例 + */ + @Internal + private[fire] def convert2PutTuple[T <: HBaseBaseBean[T]](obj: T, insertEmpty: Boolean = true): (ImmutableBytesWritable, Put) = { + (new ImmutableBytesWritable(), convert2Put(obj, insertEmpty)) + } + + /** + * 获取类注解HConfig中的nullable + */ + @Internal + private[fire] def getNullable[T <: HBaseBaseBean[T] : ClassTag]: Boolean = { + val hConfig = this.getHConfig[T] + if (hConfig == null) return true + hConfig.nullable() + } + + /** + * 获取类注解HConfig中的multiVersion + */ + @Internal + private[fire] def getMultiVersion[T <: HBaseBaseBean[T] : ClassTag]: Boolean = { + val hConfig = this.getHConfig[T] + if (hConfig == null) return false + hConfig.multiVersion() + } + + /** + * 获取类上声明的HConfig注解 + */ + @Internal + private[fire] def getHConfig[T <: HBaseBaseBean[T] : ClassTag]: HConfig = { + val clazz = classTag[T].runtimeClass + if (!this.cacheHConfigMap.containsKey(clazz)) { + val hConfig = clazz.getAnnotation(classOf[HConfig]) + if (hConfig != null) { + this.cacheHConfigMap.put(clazz, hConfig) + } + } + this.cacheHConfigMap.get(clazz) + } + + /** + * 根据keyNum获取对应配置的durability + */ + @Internal + private[this] def initDurability: Durability = { + val hbaseDurability = FireHBaseConf.hbaseDurability(keyNum) + + // 将匹配到的配置转为Durability对象 + hbaseDurability.toUpperCase match { + case "ASYNC_WAL" => Durability.ASYNC_WAL + case "FSYNC_WAL" => Durability.FSYNC_WAL + case "SKIP_WAL" => Durability.SKIP_WAL + case "SYNC_WAL" => Durability.SYNC_WAL + case _ => Durability.USE_DEFAULT + } + } + + /** + * 创建HBase表 + * + * @param tableName + * 表名 + * @param families + * 列族 + */ + private[fire] def 
createTable(tableName: String, families: String*): Unit = { + requireNonEmpty(tableName, families)("执行createTable失败") + var admin: Admin = null + tryWithFinally { + admin = this.getConnection.getAdmin + val tbName = TableName.valueOf(tableName) + if (!admin.tableExists(tbName)) { + val tableDesc = new HTableDescriptor(tbName) + // 在描述里添加列族 + for (columnFamily <- families) { + val desc = new HColumnDescriptor(columnFamily) + // 启用压缩 + desc.setCompressionType(Compression.Algorithm.SNAPPY) + tableDesc.addFamily(desc) + } + admin.createTable(tableDesc) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + // 如果开启表缓存,则更新缓存信息 + if (this.tableExistsCacheEnable && this.tableExists(tableName)) this.cacheTableExistsMap.update(tableName, true) + } + } { + this.closeAdmin(admin) + }(logger, s"HBase createTable ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"创建HBase表${hbaseCluster(keyNum)}.${tableName}失败.", closeAdminError) + } + + /** + * 删除指定的HBase表 + * + * @param tableName 表名 + */ + private[fire] def dropTable(tableName: String): Unit = { + requireNonEmpty(tableName)("执行dropTable失败") + var admin: Admin = null + tryWithFinally { + admin = this.getConnection.getAdmin + val tbName = TableName.valueOf(tableName) + if (admin.tableExists(tbName)) { + admin.disableTable(tbName) + admin.deleteTable(tbName) + // 如果开启表缓存,则更新缓存信息 + if (this.tableExistsCacheEnable && !this.tableExists(tableName)) this.cacheTableExistsMap.update(tableName, false) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } + } { + this.closeAdmin(admin) + }(this.logger, s"HBase createTable ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"drop ${hbaseCluster(keyNum)}.${tableName}表操作失败", closeAdminError) + } + + /** + * 启用指定的HBase表 + * + * @param tableName 表名 + */ + private[fire] def enableTable(tableName: String): Unit = { + requireNonEmpty(tableName)("执行enableTable失败") + var admin: Admin = null + tryWithFinally { + admin = this.getConnection.getAdmin + val tbName = TableName.valueOf(tableName) + if (admin.tableExists(tbName) && !admin.isTableEnabled(tbName)) { + admin.enableTable(tbName) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } + } { + this.closeAdmin(admin) + }(this.logger, s"HBase enableTable ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"enable ${hbaseCluster(keyNum)}.${tableName}表失败", closeAdminError) + } + + /** + * disable指定的HBase表 + * + * @param tableName 表名 + */ + private[fire] def disableTable(tableName: String): Unit = { + requireNonEmpty(tableName)("执行disableTable失败") + var admin: Admin = null + tryWithFinally { + admin = this.getConnection.getAdmin + val tbName = TableName.valueOf(tableName) + if (admin.tableExists(tbName) && admin.isTableEnabled(tbName)) { + admin.disableTable(tbName) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } + } { + this.closeAdmin(admin) + }(this.logger, s"HBase disableTable ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"disable ${hbaseCluster(keyNum)}.${tableName}表失败", closeAdminError) + } + + /** + * 清空指定的HBase表 + * + * @param tableName HBase表名 + * @param preserveSplits 是否保留所有的split信息 + */ + private[fire] def truncateTable(tableName: String, preserveSplits: Boolean = true): Unit = { + requireNonEmpty(tableName, preserveSplits)("执行truncateTable失败") + var admin: Admin = null + tryWithFinally { + admin = this.getConnection.getAdmin + val tbName = TableName.valueOf(tableName) + if (admin.tableExists(tbName)) { + this.disableTable(tableName) + admin.truncateTable(tbName, 
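+ // preserveSplits = true keeps the table's existing region split points after truncation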
preserveSplits) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } + } { + this.closeAdmin(admin) + }(this.logger, s"HBase truncateTable ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"truncate ${hbaseCluster(keyNum)}.${tableName}表失败", closeAdminError) + } + + /** + * 释放对象 + * + * @param admin admin对象实例 + */ + @Internal + private[this] def closeAdmin(admin: Admin): Unit = { + tryWithLog { + if (admin != null) admin.close() + }(logger, catchLog = "关闭HBase admin对象失败") + } + + /** + * 关闭ResultScanner对象 + */ + @Internal + private[this] def closeResultScanner(rs: ResultScanner): Unit = { + tryWithLog { + if (rs != null) rs.close() + }(this.logger, catchLog = "关闭ResultScanner对象失败", isThrow = false) + } + + /** + * 关闭table对象 + */ + def closeTable(table: Table): Unit = { + tryWithLog { + if (table != null) table.close() + }(logger, catchLog = "关闭HBase table对象失败", isThrow = true) + } + + /** + * 根据表名获取Table实例 + * + * @param tableName 表名 + */ + def getTable(tableName: String): Table = { + tryWithReturn { + require(this.isExists(tableName), s"表${tableName}不存在,请检查") + this.getConnection.getTable(TableName.valueOf(tableName)) + }(logger, catchLog = s"HBase getTable操作失败. ${hbaseCluster(keyNum)}.${tableName}") + } + + /** + * 判断给定的表名是否存在 + * + * @param tableName + * HBase表名 + */ + def isExists(tableName: String): Boolean = { + if (StringUtils.isBlank(tableName)) return false + if (this.tableExistsCacheEnable) { + // 如果走缓存 + if (!this.cacheTableExistsMap.containsKey(tableName)) { + this.logger.debug(s"已缓存${tableName}是否存在信息,后续将走缓存.") + this.cacheTableExistsMap.put(tableName, this.tableExists(tableName)) + } + this.cacheTableExistsMap.get(tableName) + } else { + // 不走缓存则每次连接HBase获取表是否存在的信息 + this.tableExists(tableName) + } + } + + /** + * 用于判断HBase表是否存在 + * 注:内部api,每次需连接HBase获取表信息 + */ + @Internal + private[fire] def tableExists(tableName: String): Boolean = { + if (StringUtils.isBlank(tableName)) return false + var admin: Admin = null + tryWithFinally { + admin = this.getConnection.getAdmin + val isExists = admin.tableExists(TableName.valueOf(tableName)) + this.logger.debug(s"HBase tableExists ${hbaseCluster(keyNum)}.${tableName}获取成功") + isExists + } { + closeAdmin(admin) + }(logger, catchLog = s"判断HBase表${hbaseCluster(keyNum)}.${tableName}是否存在失败") + } + + /** + * 根据多个rowKey删除对应的整行记录 + * + * @param tableName 表名 + * @param rowKeys 待删除的rowKey集合 + */ + def deleteRows(tableName: String, rowKeys: String*): Unit = { + if (noEmpty(tableName, rowKeys)) { + var table: Table = null + tryWithFinally { + table = this.getTable(tableName) + + val deletes = ListBuffer[Delete]() + rowKeys.filter(StringUtils.isNotBlank).foreach(rowKey => { + deletes += new Delete(rowKey.getBytes(StandardCharsets.UTF_8)) + }) + + table.delete(deletes) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } { + this.closeTable(table) + }(this.logger, s"HBase deleteRows ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"执行${tableName}表rowKey删除失败", "close HBase table对象失败") + } + } + + /** + * 批量删除指定RowKey的多个列族 + * + * @param tableName 表名 + * @param rowKey rowKey + * @param families 多个列族 + */ + @Internal + private[fire] def deleteFamilies(tableName: String, rowKey: String, families: String*): Unit = { + if (noEmpty(tableName, rowKey, families)) { + val delete = new Delete(rowKey.getBytes(StandardCharsets.UTF_8)) + families.filter(StringUtils.isNotBlank).foreach(family => delete.addFamily(family.getBytes(StandardCharsets.UTF_8))) + + var table: Table = null + tryWithFinally { + table = 
this.getTable(tableName) + table.delete(delete) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } { + this.closeTable(table) + }(this.logger, s"HBase deleteFamilies ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"delete ${hbaseCluster(keyNum)}.${tableName} families failed. RowKey is ${rowKey}, families is ${families}", + "close HBase table对象出现异常.") + } + } + + /** + * 批量删除指定列族下的多个字段 + * + * @param tableName 表名 + * @param rowKey rowKey字段 + * @param family 列族 + * @param qualifiers 列名 + */ + @Internal + private[fire] def deleteQualifiers(tableName: String, rowKey: String, family: String, qualifiers: String*): Unit = { + if (noEmpty(tableName, rowKey, family, qualifiers)) { + val delete = new Delete(rowKey.getBytes(StandardCharsets.UTF_8)) + qualifiers.foreach(qualifier => delete.addColumns(family.getBytes(StandardCharsets.UTF_8), qualifier.getBytes(StandardCharsets.UTF_8))) + var table: Table = null + + tryWithFinally { + table = this.getTable(tableName) + table.delete(delete) + DatasourceManager.addDBDatasource("HBase", hbaseCluster(keyNum), tableName) + } { + this.closeTable(table) + }(this.logger, s"HBase deleteQualifiers ${hbaseCluster(keyNum)}.${tableName}执行成功", + s"delete ${hbaseCluster(keyNum)}.${tableName} qualifiers failed. RowKey is ${rowKey}, qualifiers is ${qualifiers}", "close HBase table对象出现异常.") + } + } + + /** + * 用于定时reload表是否存在的数据 + */ + @Internal + private[this] def registerReload(): Unit = { + if (tableExistsCacheReload(this.keyNum)) { + threadPool.asInstanceOf[ScheduledExecutorService].scheduleWithFixedDelay(new Runnable { + override def run(): Unit = { + val start = currentTime + cacheTableExistsMap.foreach(kv => { + cacheTableExistsMap.update(kv._1, tableExists(kv._1)) + // 将用到的表信息加入到数据源管理器中 + logger.debug(s"定时reload HBase表:${kv._1} 信息成功.") + }) + logger.debug(s"定时reload HBase耗时:${timecost(start)}") + } + }, tableExistCacheInitialDelay(this.keyNum), tableExistCachePeriod(this.keyNum), TimeUnit.SECONDS) + } + } + + /** + * 用于初始化单例的configuration + */ + @Internal + override protected[fire] def open(): Unit = { + val finalConf = if (this.conf != null) this.conf else HBaseConfiguration.create() + + val url = hbaseClusterUrl(keyNum) + if (StringUtils.isNotBlank(url)) finalConf.set("hbase.zookeeper.quorum", url) + + // 以spark.fire.hbase.conf.xxx[keyNum]开头的配置信息 + PropUtils.sliceKeysByNum(hbaseConfPrefix, keyNum).foreach(kv => { + logger.info(s"hbase configuration: key=${kv._1} value=${kv._2}") + finalConf.set(kv._1, kv._2) + }) + + requireNonEmpty(finalConf.get("hbase.zookeeper.quorum"))(s"未配置HBase集群信息,请通过以下参数指定:spark.hbase.cluster[$keyNum]=xxx") + this.configuration = finalConf + } + + /** + * connector关闭 + */ + override protected def close(): Unit = { + if (this.connection != null && !this.connection.isClosed) { + this.connection.close() + logger.debug(s"释放HBase connection成功. keyNum=$keyNum") + } + } +} + +/** + * 用于单例构建伴生类HBaseConnector的实例对象 + * 每个HBaseConnector实例使用keyNum作为标识,并且与每个HBase集群一一对应 + */ +object HBaseConnector extends ConnectorFactory[HBaseConnector] with HBaseFunctions { + + /** + * 创建HBaseConnector + */ + override protected def create(conf: Any = null, keyNum: Int = 1): HBaseConnector = { + requireNonEmpty(keyNum) + val connector = new HBaseConnector(conf.asInstanceOf[Configuration], keyNum) + logger.debug(s"创建HBaseConnector实例成功. 
keyNum=$keyNum") + connector + } +} diff --git a/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/HBaseFunctions.scala b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/HBaseFunctions.scala new file mode 100644 index 0000000..4fbfd15 --- /dev/null +++ b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/HBaseFunctions.scala @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.hbase + +import java.nio.charset.StandardCharsets + +import com.zto.fire.predef._ +import com.zto.fire.common.anno.Internal +import com.zto.fire.hbase.bean.HBaseBaseBean +import org.apache.commons.lang3.StringUtils +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hbase.client.{Connection, Get, Put, Result, ResultScanner, Scan} +import org.apache.hadoop.hbase.filter.{Filter, FilterList} + +import scala.collection.mutable.ListBuffer +import scala.reflect.ClassTag + +/** + * HBase API库 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 15:44 + */ +private[hbase] trait HBaseFunctions { + + /** + * 构建Get对象 + * + * @param rowKey rowKey + * @param family 列族名称 + * @param qualifier 表的qualifier名称 + */ + def buildGet(rowKey: String, + family: String = null, + qualifier: String = "", + maxVersions: Int = 1, + filter: Filter = null): Get = { + require(StringUtils.isNotBlank(rowKey), "buildGet执行失败,rowKey不能为空!") + val get = new Get(rowKey.getBytes(StandardCharsets.UTF_8)) + if (StringUtils.isNotBlank(family) && StringUtils.isNotBlank(qualifier)) { + get.addColumn(family.getBytes(StandardCharsets.UTF_8), qualifier.getBytes(StandardCharsets.UTF_8)) + } else if (StringUtils.isNotBlank(family)) { + get.addFamily(family.getBytes(StandardCharsets.UTF_8)) + } + if (filter != null) get.setFilter(filter) + if (maxVersions > 0) get.setMaxVersions(maxVersions) + get + } + + /** + * 构建Scan对象 + * + * @param startRow 指定起始rowkey + * @param endRow 指定结束rowkey + * @param filterList 过滤器 + * @return scan实例 + */ + def buildScan(startRow: String, endRow: String, + family: String = null, + qualifier: String = "", + maxVersions: Int = 1, + filterList: FilterList = null, + batch: Int = -1): Scan = { + val scan = new Scan + if (StringUtils.isNotBlank(startRow)) scan.setStartRow(startRow.getBytes(StandardCharsets.UTF_8)) + if (StringUtils.isNotBlank(endRow)) scan.setStopRow(endRow.getBytes(StandardCharsets.UTF_8)) + if (StringUtils.isNotBlank(family) && StringUtils.isNotBlank(qualifier)) { + scan.addColumn(family.getBytes(StandardCharsets.UTF_8), qualifier.getBytes(StandardCharsets.UTF_8)) + } else if (StringUtils.isNotBlank(family)) { + scan.addFamily(family.getBytes(StandardCharsets.UTF_8)) + } + if (filterList != null) scan.setFilter(filterList) + if (maxVersions > 0) scan.setMaxVersions(maxVersions) 
+ if (batch > 0) scan.setBatch(batch) + scan + } + + /** + * 批量插入多行多列,自动将HBaseBaseBean子类转为Put集合 + * + * @param tableName 表名 + * @param beans HBaseBaseBean子类集合 + */ + def insert[T <: HBaseBaseBean[T] : ClassTag](tableName: String, beans: Seq[T], keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).insert[T](tableName, beans: _*) + } + + /** + * 批量插入多行多列 + * + * @param tableName 表名 + * @param puts Put集合 + */ + def insert(tableName: String, puts: Seq[Put], keyNum: Int): Unit = { + HBaseConnector(keyNum = keyNum).insert(tableName, puts: _*) + } + + /** + * 从HBase批量Get数据,并将结果封装到JavaBean中 + * + * @param tableName 表名 + * @param rowKeys 指定的多个rowKey + * @param clazz 目标类类型,必须是HBaseBaseBean的子类 + * @return 目标对象实例 + */ + def get[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rowKeys: Seq[String], keyNum: Int = 1): ListBuffer[T] = { + HBaseConnector(keyNum = keyNum).get[T](tableName, clazz, rowKeys: _*) + } + + /** + * 从HBase批量Get数据,并将结果封装到JavaBean中 + * + * @param tableName 表名 + * @param clazz 目标类类型,必须是HBaseBaseBean的子类 + * @param gets 指定的多个get对象 + * @return 目标对象实例 + */ + def get[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], gets: ListBuffer[Get], keyNum: Int): ListBuffer[T] = { + HBaseConnector(keyNum = keyNum).get[T](tableName, clazz, gets: _*) + } + + /** + * 通过HBase Seq[Get]获取多条数据 + * + * @param tableName 表名 + * @param getList HBase的get对象实例 + * @return + * HBase Result + */ + def getResult(tableName: String, getList: Seq[Get], keyNum: Int): ListBuffer[Result] = { + HBaseConnector(keyNum = keyNum).getResult(tableName, getList: _*) + } + + /** + * 通过HBase Get对象获取一条数据 + * + * @param tableName 表名 + * @return + * HBase Result + */ + def getResult[T: ClassTag](tableName: String, rowKeyList: Seq[String], keyNum: Int = 1): ListBuffer[Result] = { + HBaseConnector(keyNum = keyNum).getResult[T](tableName, rowKeyList: _*) + } + + /** + * 表扫描,将scan后得到的ResultScanner对象直接返回 + * 注:调用者需手动关闭ResultScanner对象实例 + * + * @param tableName 表名 + * @param scan HBase scan对象 + * @return 指定类型的List + */ + def scanResultScanner(tableName: String, scan: Scan, keyNum: Int): ResultScanner = { + HBaseConnector(keyNum = keyNum).scanResultScanner(tableName, scan) + } + + /** + * 表扫描,将scan后得到的ResultScanner对象直接返回 + * 注:调用者需手动关闭ResultScanner对象实例 + * + * @param tableName 表名 + * @param startRow 开始行 + * @param endRow 结束行 + * @return 指定类型的List + */ + def scanResultScanner(tableName: String, startRow: String, endRow: String, keyNum: Int = 1): ResultScanner = { + HBaseConnector(keyNum = keyNum).scanResultScanner(tableName, startRow, endRow) + } + + /** + * 表扫描,将查询后的数据转为JavaBean并放到List中 + * + * @param tableName 表名 + * @param startRow 开始行 + * @param endRow 结束行 + * @param clazz 类型 + * @return 指定类型的List + */ + def scan[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, endRow: String, keyNum: Int = 1): ListBuffer[T] = { + HBaseConnector(keyNum = keyNum).scan[T](tableName, clazz, startRow, endRow) + } + + /** + * 表扫描,将查询后的数据转为JavaBean并放到List中 + * + * @param tableName 表名 + * @param scan HBase scan对象 + * @param clazz 类型 + * @return 指定类型的List + */ + def scan[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int): ListBuffer[T] = { + HBaseConnector(keyNum = keyNum).scan[T](tableName, clazz, scan) + } + + /** + * 根据keyNum获取指定HBase集群的connection + */ + def getConnection(keyNum: Int = 1): Connection = HBaseConnector(keyNum = keyNum).getConnection + + /** + * 创建HBase表 + * + * @param tableName + * 表名 + * @param families + * 列族 + */ 
+ private[fire] def createTable(tableName: String, families: Seq[String], keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).createTable(tableName, families: _*) + } + + /** + * 删除指定的HBase表 + * + * @param tableName 表名 + */ + private[fire] def dropTable(tableName: String, keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).dropTable(tableName) + } + + /** + * 启用指定的HBase表 + * + * @param tableName 表名 + */ + private[fire] def enableTable(tableName: String, keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).enableTable(tableName) + } + + /** + * disable指定的HBase表 + * + * @param tableName 表名 + */ + private[fire] def disableTable(tableName: String, keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).disableTable(tableName) + } + + /** + * 清空指定的HBase表 + * + * @param tableName + * 表名 + * @param preserveSplits 是否保留所有的split信息 + */ + private[fire] def truncateTable(tableName: String, preserveSplits: Boolean = true, keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).truncateTable(tableName, preserveSplits) + } + + /** + * 用于判断HBase表是否存在 + */ + def tableExists(tableName: String, keyNum: Int = 1): Boolean = { + HBaseConnector(keyNum = keyNum).tableExists(tableName) + } + + /** + * 根据多个rowKey删除对应的整行记录 + * + * @param tableName 表名 + * @param rowKeys 待删除的rowKey集合 + */ + def deleteRows(tableName: String, rowKeys: Seq[String], keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).deleteRows(tableName, rowKeys: _*) + } + + /** + * 批量删除指定RowKey的多个列族 + * + * @param tableName 表名 + * @param rowKey rowKey + * @param families 多个列族 + */ + @Internal + private[fire] def deleteFamilies(tableName: String, rowKey: String, families: Seq[String], keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).deleteFamilies(tableName, rowKey, families: _*) + } + + /** + * 批量删除指定列族下的多个字段 + * + * @param tableName 表名 + * @param rowKey rowKey字段 + * @param family 列族 + * @param qualifiers 列名 + */ + @Internal + private[fire] def deleteQualifiers(tableName: String, rowKey: String, family: String, qualifiers: Seq[String], keyNum: Int = 1): Unit = { + HBaseConnector(keyNum = keyNum).deleteQualifiers(tableName, rowKey, family, qualifiers: _*) + } + + /** + * 获取Configuration实例 + * + * @return HBase Configuration对象 + */ + def getConfiguration(keyNum: Int = 1): Configuration = HBaseConnector(keyNum = keyNum).getConfiguration + + /** + * 校验类型合法性,class必须是HBaseBaseBean的子类 + */ + def checkClass[T: ClassTag](clazz: Class[_] = null): Unit = { + val finalClazz = if (clazz != null) clazz else getParamType[T] + if (finalClazz == null || finalClazz.getSuperclass != classOf[HBaseBaseBean[_]]) throw new IllegalArgumentException("请指定泛型类型,该泛型必须是HBaseBaseBean的子类,如:this.fire.hbasePutTable[JavaBean]") + } +} diff --git a/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/bean/HBaseBaseBean.java b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/bean/HBaseBaseBean.java new file mode 100644 index 0000000..d59a533 --- /dev/null +++ b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/bean/HBaseBaseBean.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.zto.fire.hbase.bean;
+
+import com.zto.fire.common.anno.FieldName;
+
+import java.io.Serializable;
+
+/**
+ * JavaBeans that are mapped to HBase must extend this abstract class.
+ * Created by ChengLong on 2017-03-27.
+ */
+public abstract class HBaseBaseBean<T> implements Serializable {
+    /**
+     * rowKey field
+     */
+    @FieldName(value = "rowKey", disuse = true)
+    public String rowKey;
+
+    /**
+     * Fully qualified class name of the concrete subclass
+     * (MultiVersionsBean relies on it to rebuild the bean via Class.forName)
+     */
+    @FieldName(value = "className", disuse = true)
+    public final String className = this.getClass().getName();
+
+    /**
+     * Builds the rowKey according to the business rules of the subclass
+     */
+    public abstract T buildRowKey();
+
+    public String getRowKey() {
+        return rowKey;
+    }
+
+    public void setRowKey(String rowKey) {
+        this.rowKey = rowKey;
+    }
+}
diff --git a/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/bean/MultiVersionsBean.java b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/bean/MultiVersionsBean.java
new file mode 100644
index 0000000..cfd74f6
--- /dev/null
+++ b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/bean/MultiVersionsBean.java
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.zto.fire.hbase.bean;
+
+import com.zto.fire.common.anno.FieldName;
+import com.zto.fire.common.util.JSONUtils;
+import org.apache.commons.beanutils.BeanUtils;
+import org.apache.commons.beanutils.ConvertUtils;
+import org.apache.commons.beanutils.converters.BigDecimalConverter;
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.math.BigDecimal;
+import java.util.Map;
+
+/**
+ * Multi-version HBase entity bean
+ * Created by ChengLong on 2017-08-17.
+ */ +public class MultiVersionsBean extends HBaseBaseBean { + @FieldName(value = "logger", disuse = true) + private static final transient Logger logger = LoggerFactory.getLogger(MultiVersionsBean.class); + @FieldName("multiFields") + private String multiFields; + + @FieldName(value = "HBaseBaseBean", disuse = true) + private HBaseBaseBean target; + + @FieldName(value = "BIGDECIMAL_ZERO", disuse = true) + private static final BigDecimal BIGDECIMAL_ZERO = new BigDecimal("0"); + + static { + // 这里一定要注册默认值,使用null也可以 + BigDecimalConverter bd = new BigDecimalConverter(BIGDECIMAL_ZERO); + ConvertUtils.register(bd, java.math.BigDecimal.class); + } + + public String getMultiFields() { + return multiFields; + } + + public void setMultiFields(String multiFields) { + this.multiFields = multiFields; + } + + public HBaseBaseBean getTarget() { + return target; + } + + public void setTarget(HBaseBaseBean target) { + this.target = target; + } + + public MultiVersionsBean(HBaseBaseBean target) { + this.target = (HBaseBaseBean) target.buildRowKey(); + this.multiFields = JSONUtils.toJSONString(this.target); + } + + public MultiVersionsBean() { + + } + + @Override + public MultiVersionsBean buildRowKey() { + try { + if (this.target == null && StringUtils.isNotBlank(this.multiFields)) { + Map map = JSONUtils.parseObject(this.multiFields, Map.class); + Class clazz = Class.forName(map.get("className").toString()); + HBaseBaseBean bean = (HBaseBaseBean) clazz.newInstance(); + BeanUtils.populate(bean, map); + this.target = (HBaseBaseBean) bean.buildRowKey(); + } + + if (this.target != null) { + this.target = (HBaseBaseBean) this.target.buildRowKey(); + this.rowKey = this.target.rowKey; + } + } catch (Exception e) { + logger.error("执行buildRowKey()方法失败", e); + } + + return this; + } +} diff --git a/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/conf/FireHBaseConf.scala b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/conf/FireHBaseConf.scala new file mode 100644 index 0000000..7f1889e --- /dev/null +++ b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/conf/FireHBaseConf.scala @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
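Taken together, the two bean classes above define the persistence contract: a concrete bean extends HBaseBaseBean, declares its columns, and implements buildRowKey(), while HBaseConnector (whose bridge API appears earlier in this patch) turns beans into Put objects on insert and maps Result rows back to beans on get/scan. The sketch below is illustrative only: the Order bean and its fields are hypothetical, and only the HBaseConnector calls mirror those exercised in HBaseConnectorTest later in this patch.

import com.zto.fire.hbase.HBaseConnector
import com.zto.fire.hbase.bean.HBaseBaseBean

// hypothetical bean: the only requirements imposed above are extending HBaseBaseBean and implementing buildRowKey()
class Order(var id: java.lang.Long) extends HBaseBaseBean[Order] {
  def this() = this(null) // no-arg constructor so the bean can be instantiated reflectively
  override def buildRowKey(): Order = {
    this.rowKey = this.id.toString // rowKey construction is business-specific; here it is simply the id
    this
  }
}

val hbase = HBaseConnector()                                     // default data source (keyNum = 1)
val orders = Seq(new Order(1L), new Order(2L))
hbase.insert("fire_order", orders: _*)                           // beans become Puts, rowKeys come from buildRowKey()
val fetched = hbase.get("fire_order", classOf[Order], "1", "2")  // Results mapped back into Order instances
val scanned = hbase.scan("fire_order", classOf[Order], "1", "3") // [startRow, endRow) range scan

For multi-version tables the bean additionally carries @HConfig(multiVersion = true) (see the Student test bean below); MultiVersionsBean above stores the wrapped bean as JSON in its multiFields column and rebuilds it in buildRowKey().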
+ */ + +package com.zto.fire.hbase.conf + +import java.util + +import com.zto.fire.common.util.PropUtils +import com.zto.fire.predef._ + + +/** + * hbase相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 15:08 + */ +private[fire] object FireHBaseConf { + lazy val HBASE_BATCH = "fire.hbase.batch.size" + lazy val HBBASE_COLUMN_FAMILY_KEY = "hbase.column.family" + lazy val HBASE_MAX_RETRY = "hbase.max.retry" + lazy val HBASE_CLUSTER_URL = "hbase.cluster" + lazy val HBASE_DURABILITY = "hbase.durability" + // fire框架针对hbase操作后数据集的缓存策略,配置列表详见:StorageLevel.scala(配置不区分大小写) + lazy val FIRE_HBASE_STORAGE_LEVEL = "fire.hbase.storage.level" + // 通过HBase scan后repartition的分区数 + @deprecated("use fire.hbase.scan.partitions", "v1.0.0") + lazy val FIRE_HBASE_SCAN_REPARTITIONS = "fire.hbase.scan.repartitions" + lazy val FIRE_HBASE_SCAN_PARTITIONS = "fire.hbase.scan.partitions" + // hbase集群映射配置前缀 + lazy val hbaseClusterMapPrefix = "fire.hbase.cluster.map." + // 是否开启HBase表存在判断的缓存 + lazy val TABLE_EXISTS_CACHE_ENABLE = "fire.hbase.table.exists.cache.enable" + // 是否开启HBase表存在列表缓存的定时更新任务 + lazy val TABLE_EXISTS_CACHE_RELOAD_ENABLE = "fire.hbase.table.exists.cache.reload.enable" + // 定时刷新缓存HBase表任务的初始延迟 + lazy val TABLE_EXISTS_CACHE_INITIAL_DELAY = "fire.hbase.table.exists.cache.initialDelay" + // 定时刷新缓存HBase表任务的执行频率 + lazy val TABLE_EXISTS_CACHE_PERIOD = "fire.hbase.table.exists.cache.period" + + // hbase集群映射地址 + lazy val hbaseClusterMap: util.Map[String, String] = PropUtils.sliceKeys(this.hbaseClusterMapPrefix) + // hbase java api 配置前缀 + lazy val hbaseConfPrefix = "fire.hbase.conf." + + // 是否开启HBase表存在判断的缓存 + def tableExistsCache(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.TABLE_EXISTS_CACHE_ENABLE, true, keyNum) + // 是否开启HBase表存在列表缓存的定时更新任务 + def tableExistsCacheReload(keyNum: Int = 1): Boolean = PropUtils.getBoolean(this.TABLE_EXISTS_CACHE_RELOAD_ENABLE, true, keyNum) + // 定时刷新缓存HBase表任务的初始延迟 + def tableExistCacheInitialDelay(keyNum: Int = 1): Long = PropUtils.getLong(this.TABLE_EXISTS_CACHE_INITIAL_DELAY, 60, keyNum) + // 定时刷新缓存HBase表任务的执行频率 + def tableExistCachePeriod(keyNum: Int = 1): Long = PropUtils.getLong(this.TABLE_EXISTS_CACHE_PERIOD, 600, keyNum) + // HBase操作默认的批次大小 + def hbaseBatchSize(keyNum: Int = 1): Int = PropUtils.getInt(this.HBASE_BATCH, 10000, keyNum) + // hbase默认的列族名称,如果使用FieldName指定,则会被覆盖 + def familyName(keyNum: Int = 1): String = PropUtils.getString(this.HBBASE_COLUMN_FAMILY_KEY, "info", keyNum) + // hbase操作失败最大重试次数 + def hbaseMaxRetry(keyNum: Int = 1): Long = PropUtils.getLong(this.HBASE_MAX_RETRY, 3, keyNum) + // hbase集群名称 + def hbaseCluster(keyNum: Int = 1): String = PropUtils.getString(this.HBASE_CLUSTER_URL, "", keyNum) + + /** + * 根据给定的HBase集群别名获取对应的hbase.zookeeper.quorum地址 + */ + def hbaseClusterUrl(keyNum: Int = 1): String = { + val clusterName = this.hbaseCluster(keyNum) + this.hbaseClusterMap.getOrElse(clusterName, clusterName) + } + + def hbaseDurability(keyNum: Int = 1): String = PropUtils.getString(this.HBASE_DURABILITY, "", keyNum) + + // HBase结果集的缓存策略配置 + def hbaseStorageLevel: String = PropUtils.getString(this.FIRE_HBASE_STORAGE_LEVEL, "memory_and_disk_ser").toUpperCase + + // 通过HBase scan后repartition的分区数,默认1200 + def hbaseHadoopScanPartitions: Int = { + val partitions = PropUtils.getInt(this.FIRE_HBASE_SCAN_PARTITIONS, -1) + if (partitions != -1) partitions else PropUtils.getInt(this.FIRE_HBASE_SCAN_REPARTITIONS, 1200) + } +} \ No newline at end of file diff --git 
a/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/utils/HBaseUtils.scala b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/utils/HBaseUtils.scala new file mode 100644 index 0000000..f5b6220 --- /dev/null +++ b/fire-connectors/fire-hbase/src/main/scala/com/zto/fire/hbase/utils/HBaseUtils.scala @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.hbase.utils + +import org.apache.commons.lang3.StringUtils +import org.apache.hadoop.hbase.client.Scan +import org.apache.hadoop.hbase.protobuf.ProtobufUtil +import org.apache.hadoop.hbase.util.Base64 + +/** + * HBase 操作工具类 + * + * @author ChengLong 2019-6-23 13:36:16 + */ +private[fire] object HBaseUtils { + + /** + * 将scan对象转为String + * + * @param scan + * @return + */ + def convertScanToString(scan: Scan): String = { + val proto = ProtobufUtil.toScan(scan) + Base64.encodeBytes(proto.toByteArray) + } + + /** + * 将给定的字符串补齐指定的位数 + * + * @param str + * @param length + * @return + */ + def appendString(str: String, char: String, length: Int): String = { + if (StringUtils.isNotBlank(str) && StringUtils.isNotBlank(char) && length > str.length) { + val sb: StringBuilder = new StringBuilder(str) + var i: Int = 0 + while (i < length - str.length) { + sb.append(char) + i += 1 + } + sb.toString + } else if (length == str.length) { + str + } else if (length < str.length && length > 0) { + str.substring(0, length) + } else { + "" + } + } +} diff --git a/fire-connectors/fire-hbase/src/test/java/com/zto/fire/common/db/bean/Student.java b/fire-connectors/fire-hbase/src/test/java/com/zto/fire/common/db/bean/Student.java new file mode 100644 index 0000000..638ad35 --- /dev/null +++ b/fire-connectors/fire-hbase/src/test/java/com/zto/fire/common/db/bean/Student.java @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
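As a brief aside on the HBaseUtils helpers just shown: convertScanToString serializes a Scan into a Base64-encoded protobuf string (the form typically handed to HBase's Hadoop/MapReduce input formats), and appendString pads, keeps, or truncates a string to the requested length. The expected values below are inferred from the branches of appendString above and are illustrative only; note that HBaseUtils is private[fire], so these calls are only possible from within the framework's own packages.

import com.zto.fire.hbase.utils.HBaseUtils

HBaseUtils.appendString("86", "0", 5)     // "86000" -> padded with "0" up to length 5
HBaseUtils.appendString("abcdef", "0", 3) // "abc"   -> longer than requested, truncated to 3
HBaseUtils.appendString("abc", "0", 3)    // "abc"   -> already the requested length
HBaseUtils.appendString("abc", "0", 0)    // ""      -> non-positive target length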
+ */ + +package com.zto.fire.common.db.bean; + +import com.zto.fire.common.anno.FieldName; +import com.zto.fire.common.util.JSONUtils; +import com.zto.fire.hbase.anno.HConfig; +import com.zto.fire.hbase.bean.HBaseBaseBean; +import com.zto.fire.common.util.DateFormatUtils; + +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.List; + +/** + * @author ChengLong + * @create 2020-11-13 17:46 + * @since 1.0.0 + */ +@HConfig(nullable = true, multiVersion = true) +public class Student extends HBaseBaseBean { + private Long id; + private String name; + private Integer age; + private BigDecimal height; + @FieldName(family = "data", value = "timestamp") + private String createTime; + private String nullField; + + public Student() { + } + + public Student(Long id, String name, Integer age, BigDecimal height) { + this.id = id; + this.name = name; + this.age = age; + this.height = height; + this.createTime = DateFormatUtils.formatCurrentDateTime(); + } + + public static List build(int count) { + List list = new ArrayList<>(count); + try { + for (int i = 1; i <= count; i++) { + list.add(new Student(Long.parseLong(i + ""), "root_" + i, i, new BigDecimal(i + "" + i + "." + i))); + Thread.sleep(500); + } + } catch (Exception e) { + e.printStackTrace(); + } + return list; + } + + @Override + public Student buildRowKey() { + this.rowKey = this.id.toString(); + return this; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (!(o instanceof Student)) return false; + Student student = (Student) o; + return id.equals(student.id); + } + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public Integer getAge() { + return age; + } + + public void setAge(Integer age) { + this.age = age; + } + + public BigDecimal getHeight() { + return height; + } + + public void setHeight(BigDecimal height) { + this.height = height; + } + + public String getCreateTime() { + return createTime; + } + + public void setCreateTime(String createTime) { + this.createTime = createTime; + } + + public String getNullField() { + return nullField; + } + + public void setNullField(String nullField) { + this.nullField = nullField; + } + + @Override + public String toString() { + return JSONUtils.toJSONString(this); + } + +} diff --git a/fire-connectors/fire-hbase/src/test/resources/HBaseConnectorTest.properties b/fire-connectors/fire-hbase/src/test/resources/HBaseConnectorTest.properties new file mode 100644 index 0000000..96c2f1a --- /dev/null +++ b/fire-connectors/fire-hbase/src/test/resources/HBaseConnectorTest.properties @@ -0,0 +1,19 @@ +spark.hbase.cluster=test +spark.fire.hbase.conf.hbase.zookeeper.property.clientPort = 2181 +spark.log.level = info + +spark.hbase.cluster2=test +spark.fire.hbase.conf.hbase.zookeeper.property.clientPort2 = 2181 +spark.fire.hbase.conf.zookeeper.znode.parent2 = /hbase +spark.fire.hbase.conf.hbase.rpc.timeout2 = 600000 +spark.fire.hbase.conf.hbase.snapshot.master.timeoutMillis2 = 400000 +spark.fire.hbase.conf.hbase.snapshot.region.timeout2 = 300000 + +# 是否开启HBase表存在判断的缓存,开启后表存在判断将避免大量的connection消耗 +spark.fire.hbase.table.exists.cache.enable = true +# 是否开启HBase表存在列表缓存的定时更新任务 +spark.fire.hbase.table.exists.cache.reload.enable = false +# 定时刷新缓存HBase表任务的初始延迟(s) +spark.fire.hbase.table.exists.cache.initialDelay = 6 +# 定时刷新缓存HBase表任务的执行频率(s) +spark.fire.hbase.table.exists.cache.period = 10 \ No 
newline at end of file diff --git a/fire-connectors/fire-hbase/src/test/scala/com/zto/fire/hbase/HBaseConnectorTest.scala b/fire-connectors/fire-hbase/src/test/scala/com/zto/fire/hbase/HBaseConnectorTest.scala new file mode 100644 index 0000000..46d4b46 --- /dev/null +++ b/fire-connectors/fire-hbase/src/test/scala/com/zto/fire/hbase/HBaseConnectorTest.scala @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.hbase + +import com.zto.fire.common.anno.{Internal, TestStep} +import com.zto.fire.common.db.bean.Student +import com.zto.fire.common.util.{DatasourceManager, PropUtils} +import com.zto.fire.predef._ +import org.junit.Assert._ +import org.junit.{Before, Test} + +/** + * 用于单元测试HBaseConnector中的API + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-13 15:06 + */ +class HBaseConnectorTest { + val tableName = "fire_test_1" + val tableName2 = "fire_test_2" + var hbase: HBaseConnector = null + var hbase2: HBaseConnector = null + + @Before + def init: Unit = { + PropUtils.load("HBaseConnectorTest") + this.hbase = HBaseConnector() + this.hbase2 = HBaseConnector() + this.hbase2 = HBaseConnector(keyNum = 2) + } + + /** + * 用于测试以下api: + * 1. 判断表是否存在 + * 2. disable 表 + * 3. 
create 表 + */ + @Test + @TestStep(step = 1, desc = "创建表API测试") + def testDDL: Unit = this.createTestTable + + /** + * 测试表是否存在的缓存功能 + */ + @Test + @TestStep(step = 2, desc = "增删改查API测试") + def testTableExists: Unit = { + val starTime = currentTime + (1 to 10).foreach(i => { + this.hbase.tableExists(this.tableName) + }) + println("未开启缓存总耗时:" + (timecost(starTime))) + + val starTime2 = currentTime + (1 to 10).foreach(i => { + this.hbase.isExists(this.tableName) + }) + println("开启缓存总耗时:" + (timecost(starTime2))) + } + + /** + * 测试插入多条记录 + */ + @Test + @TestStep(step = 3, desc = "增删改查API测试") + def testInsert: Unit = { + this.hbase.truncateTable(this.tableName) + // 批量插入 + val studentList = Student.build(5) + this.hbase.insert(this.tableName, studentList: _*) + + // get操作 + println("===========get=============") + val rowKeyList = (1 to 5).map(i => i.toString) + val getStudentList = this.hbase.get(this.tableName, classOf[Student], rowKeyList: _*) + assertEquals(getStudentList.size, 5) + getStudentList.foreach(println) + val getOne = this.hbase.get(this.tableName, classOf[Student], HBaseConnector.buildGet("1")) + assertEquals(getOne.size, 1) + + println("===========scan=============") + val scanList = this.hbase.scan(this.tableName, classOf[Student], "1", "3") + assertEquals(scanList.size, 2) + scanList.foreach(println) + + for (i <- 1 to 5) { + DatasourceManager.get.foreach(t => { + t._2.foreach(source => { + println("数据源:" + t._1.toString + " " + source) + }) + }) + println("=====================================") + Thread.sleep(10000) + } + } + + /** + * 测试跨集群支持 + */ + @Test + @TestStep(step = 4, desc = "多集群测试") + def testMultiCluster: Unit = { + this.hbase.truncateTable(this.tableName) + this.hbase2.truncateTable(this.tableName2) + val studentList1 = Student.build(5) + this.hbase.insert(this.tableName, studentList1: _*) + val scanStudentList1 = this.hbase.scan(this.tableName, classOf[Student], "1", "6") + assertEquals(scanStudentList1.size, 5) + val studentList2 = Student.build(3) + this.hbase2.insert(this.tableName2, studentList2: _*) + val scanStudentList2 = this.hbase2.scan(this.tableName2, classOf[Student], "1", "6") + assertEquals(scanStudentList2.size, 3) + + assertEquals(DatasourceManager.get.size(), 1) + DatasourceManager.get.foreach(t => { + t._2.foreach(println) + }) + } + + /** + * 测试多版本插入 + * 注:多版本需要在Student类上声明@HConfig注解:@HConfig(nullable = true, multiVersion = true) + */ + @Test + @TestStep(step = 5, desc = "多版本测试") + def testMultiInsert: Unit = { + this.hbase2.truncateTable(this.tableName2) + val studentList = Student.build(5) + this.hbase2.insert(this.tableName2, studentList: _*) + val students = this.hbase2.get(this.tableName2, classOf[Student], "1", "2") + students.foreach(println) + } + + /** + * 测试老的api使用方式 + */ + @Test + @TestStep(step = 6, desc = "静态类型API测试") + def testOldStyle: Unit = { + val hbaseConn1 = HBaseConnector(keyNum = 2) + val hbaseConn2 = HBaseConnector(keyNum = 2) + assertEquals(hbaseConn1 == hbaseConn2, true) + println(HBaseConnector.tableExists("fire_test_1")) + println(HBaseConnector.tableExists("fire_test_1")) + } + + /** + * 创建必要的表信息 + */ + @Internal + private def createTestTable: Unit = { + if (this.hbase.isExists(this.tableName)) this.hbase.dropTable(this.tableName) + assertEquals(this.hbase.isExists(this.tableName), false) + this.hbase.createTable(this.tableName, "info", "data") + assertEquals(this.hbase.isExists(this.tableName), true) + + if (this.hbase2.isExists(this.tableName2)) this.hbase2.dropTable(this.tableName2) + 
assertEquals(this.hbase2.isExists(this.tableName2), false) + this.hbase2.createTable(this.tableName2, "info", "data") + assertEquals(this.hbase2.isExists(this.tableName2), true) + } + +} diff --git a/fire-connectors/fire-jdbc/pom.xml b/fire-connectors/fire-jdbc/pom.xml new file mode 100644 index 0000000..7217cae --- /dev/null +++ b/fire-connectors/fire-jdbc/pom.xml @@ -0,0 +1,69 @@ + + + + + 4.0.0 + fire-jdbc_${scala.binary.version} + jar + fire-jdbc + + + com.zto.fire + fire-connectors_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + mysql + mysql-connector-java + ${mysql.version} + + + c3p0 + c3p0 + 0.9.1.2 + + + org.apache.derby + derby + 10.13.1.1 + test + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcConnector.scala b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcConnector.scala new file mode 100644 index 0000000..2809913 --- /dev/null +++ b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcConnector.scala @@ -0,0 +1,353 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
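Before the connector implementation continues below, it is worth spelling out the keyNum convention that both the HBase and JDBC sides of this patch rely on: configuration keys may carry a numeric suffix, and passing the matching keyNum to the connector factory selects that data source (keyNum = 1 maps to the un-suffixed keys). The following illustration is modeled on the test property files included in this patch; the concrete URLs are placeholders only.

# data source 1 (no suffix, keyNum = 1 is the default)
spark.db.jdbc.url    = jdbc:derby:memory:fire;create=true
spark.db.jdbc.driver = org.apache.derby.jdbc.EmbeddedDriver

# data source 2 (suffix 2 corresponds to keyNum = 2)
spark.db.jdbc.url2    = jdbc:derby:memory:fire2;create=true
spark.db.jdbc.driver2 = org.apache.derby.jdbc.EmbeddedDriver

val jdbc  = JdbcConnector()            // reads the un-suffixed keys
val jdbc2 = JdbcConnector(keyNum = 2)  // reads the keys ending in 2

The HBase side follows the same rule, e.g. spark.hbase.cluster2 in HBaseConnectorTest.properties above together with HBaseConnector(keyNum = 2).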
+ */ + +package com.zto.fire.jdbc + +import java.sql.{Connection, PreparedStatement, ResultSet, SQLException, Statement} + +import com.mchange.v2.c3p0.ComboPooledDataSource +import com.zto.fire.common.anno.Internal +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.util.{DatasourceManager, StringsUtils} +import com.zto.fire.core.connector.{ConnectorFactory, FireConnector} +import com.zto.fire.jdbc.conf.FireJdbcConf +import com.zto.fire.jdbc.util.DBUtils +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils + +import scala.collection.mutable.ListBuffer +import scala.reflect.ClassTag + +/** + * 数据库连接池(c3p0)工具类 + * 封装了数据库常用的操作方法 + * + * @param conf + * 代码级别的配置信息,允许为空,配置文件会覆盖相同配置项,也就是说配置文件拥有着跟高的优先级 + * @param keyNum + * 用于区分连接不同的数据源,不同配置源对应不同的Connector实例 + * @author ChengLong 2020-11-27 10:31:03 + */ +private[fire] class JdbcConnector(conf: JdbcConf = null, keyNum: Int = 1) extends FireConnector(keyNum = keyNum) { + private[this] var connPool: ComboPooledDataSource = _ + // 日志中sql截取的长度 + private lazy val logSqlLength = FireFrameworkConf.logSqlLength + private[this] var username: String = _ + private[this] var url: String = _ + private[this] var dbType: String = "unknown" + private[this] lazy val finallyCatchLog = "释放jdbc资源失败" + + /** + * c3p0线程池初始化 + */ + override protected[fire] def open(): Unit = { + tryWithLog { + // 从配置文件中读取配置信息,并设置到ComboPooledDataSource对象中 + this.logger.info(s"准备初始化数据库连接池[ ${FireJdbcConf.jdbcUrl(keyNum)} ]") + // 支持url和别名两种配置方式 + this.url = if (StringUtils.isBlank(FireJdbcConf.jdbcUrl(keyNum)) && this.conf != null && StringUtils.isNotBlank(this.conf.url)) this.conf.url else FireJdbcConf.jdbcUrl(keyNum) + require(StringUtils.isNotBlank(this.url), s"数据库url不能为空,keyNum=${this.keyNum}") + val driverClass = if (StringUtils.isBlank(FireJdbcConf.driverClass(keyNum)) && this.conf != null && StringUtils.isNotBlank(this.conf.driverClass)) this.conf.driverClass else FireJdbcConf.driverClass(keyNum) + require(StringUtils.isNotBlank(driverClass), s"数据库driverClass不能为空,keyNum=${this.keyNum}") + this.username = if (StringUtils.isBlank(FireJdbcConf.user(keyNum)) && this.conf != null && StringUtils.isNotBlank(this.conf.username)) this.conf.username else FireJdbcConf.user(keyNum) + val password = if (StringUtils.isBlank(FireJdbcConf.password(keyNum)) && this.conf != null && StringUtils.isNotBlank(this.conf.password)) this.conf.password else FireJdbcConf.password(keyNum) + // 识别数据源类型是oracle、mysql等 + this.dbType = DBUtils.dbTypeParser(driverClass, this.url) + logger.info(s"Fire框架识别到当前jdbc数据源标识为:${this.dbType},keyNum=${this.keyNum}") + + // 创建c3p0数据库连接池实例 + val pool = new ComboPooledDataSource(true) + pool.setJdbcUrl(this.url) + pool.setDriverClass(driverClass) + if (StringUtils.isNotBlank(this.username)) pool.setUser(this.username) + if (StringUtils.isNotBlank(password)) pool.setPassword(password) + pool.setMaxPoolSize(FireJdbcConf.maxPoolSize(keyNum)) + pool.setMinPoolSize(FireJdbcConf.minPoolSize(keyNum)) + pool.setAcquireIncrement(FireJdbcConf.acquireIncrement(keyNum)) + pool.setInitialPoolSize(FireJdbcConf.initialPoolSize(keyNum)) + pool.setMaxStatements(0) + pool.setMaxStatementsPerConnection(0) + pool.setMaxIdleTime(FireJdbcConf.maxIdleTime(keyNum)) + this.connPool = pool + this.logger.info(s"创建数据库连接池[ $keyNum ] driver: ${this.dbType}") + }(this.logger, s"数据库连接池创建成功", s"初始化数据库连接池[ $keyNum ]失败") + } + + /** + * 关闭c3p0数据库连接池 + */ + override protected def close(): Unit = { + if (this.connPool != null) { + this.connPool.close() + 
logger.debug(s"释放jdbc 连接池成功. keyNum=$keyNum") + } + } + + + /** + * 从指定的连接池中获取一个连接 + * + * @return + * 对应配置项的数据库连接 + */ + def getConnection: Connection = { + tryWithReturn { + val connection = this.connPool.getConnection + this.logger.debug(s"获取数据库连接[ ${keyNum} ]成功") + connection + }(this.logger, catchLog = s"获取数据库连接[ ${FireJdbcConf.jdbcUrl(keyNum)} ]发生异常,请检查配置文件") + } + + /** + * 更新操作 + * + * @param sql + * 待执行的sql语句 + * @param params + * sql中的参数 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + * @param commit + * 是否自动提交事务,默认为自动提交 + * @param closeConnection + * 是否关闭connection,默认关闭 + * @return + * 影响的记录数 + */ + def executeUpdate(sql: String, params: Seq[Any] = null, connection: Connection = null, commit: Boolean = true, closeConnection: Boolean = true): Long = { + val conn = if (connection == null) this.getConnection else connection + var retVal: Long = 0L + var stat: PreparedStatement = null + tryWithFinally { + conn.setAutoCommit(false) + stat = conn.prepareStatement(sql) + + // 设置值参数 + if (params != null && params.nonEmpty) { + var i: Int = 1 + params.foreach(param => { + stat.setObject(i, param) + i += 1 + }) + } + retVal = stat.executeUpdate + if (commit) conn.commit() + this.logger.info(s"executeUpdate success. keyNum: ${keyNum} count: $retVal") + retVal + } { + this.release(sql, conn, stat, null, closeConnection) + }(this.logger, s"${this.sqlBuriedPoint(sql)}", + s"executeUpdate failed. keyNum:${keyNum}\n${this.sqlBuriedPoint(sql)}", finallyCatchLog) + } + + /** + * 执行批量更新操作 + * + * @param sql + * 待执行的sql语句 + * @param paramsList + * sql的参数列表 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + * @param commit + * 是否自动提交事务,默认为自动提交 + * @param closeConnection + * 是否关闭connection,默认关闭 + * @return + * 影响的记录数 + */ + def executeBatch(sql: String, paramsList: Seq[Seq[Any]] = null, connection: Connection = null, commit: Boolean = true, closeConnection: Boolean = true): Array[Int] = { + val conn = if (connection == null) this.getConnection else connection + var stat: PreparedStatement = null + + var batch = 0 + var count = 0 + tryWithFinally { + conn.setAutoCommit(false) + stat = conn.prepareStatement(sql) + if (paramsList != null && paramsList.nonEmpty) { + paramsList.foreach(params => { + var i = 1 + params.foreach(param => { + stat.setObject(i, param) + i += 1 + }) + batch += 1 + stat.addBatch() + if (batch % FireJdbcConf.batchSize(keyNum) == 0) { + stat.executeBatch() + stat.clearBatch() + } + }) + } + // 执行批量更新 + val retVal = stat.executeBatch + if (commit) conn.commit() + count = retVal.sum + this.logger.info(s"executeBatch success. keyNum: ${keyNum} count: $count") + retVal + } { + this.release(sql, conn, stat, null, closeConnection) + }(this.logger, s"${this.sqlBuriedPoint(sql)}", + s"executeBatch failed. 
keyNum:${keyNum}\n${this.sqlBuriedPoint(sql)}", finallyCatchLog) + } + + /** + * 执行查询操作,以JavaBean方式返回结果集 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param clazz + * JavaBean类型 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + */ + def executeQuery[T <: Object : ClassTag](sql: String, params: Seq[Any] = null, clazz: Class[T], connection: Connection = null): List[T] = { + val listBuffer = ListBuffer[T]() + + this.executeQueryCall(sql, params, rs => { + listBuffer ++= DBUtils.dbResultSet2Bean(rs, clazz) + listBuffer.size + }, connection) + + listBuffer.toList + } + + /** + * 执行查询操作 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param callback + * 查询回调 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + */ + def executeQueryCall(sql: String, params: Seq[Any] = null, callback: ResultSet => Int = null, connection: Connection = null): Unit = { + val conn = if (connection == null) this.getConnection else connection + var stat: PreparedStatement = null + var rs: ResultSet = null + var count: Long = 0 + + tryWithFinally { + stat = conn.prepareStatement(sql) + if (params != null && params.nonEmpty) { + var i = 1 + params.foreach(param => { + stat.setObject(i, param) + i += 1 + }) + } + rs = stat.executeQuery + + if (rs != null && callback != null) { + count = callback(rs) + } + this.logger.info(s"executeQueryCall success. keyNum: ${keyNum} count: $count") + } { + this.release(sql, conn, stat, rs) + }(this.logger, s"${this.sqlBuriedPoint(sql, false)}", + s"executeQueryCall failed. keyNum:${keyNum}\n${this.sqlBuriedPoint(sql, false)}", finallyCatchLog) + } + + /** + * 释放jdbc资源的工具类 + * + * @param sql + * 对应的sql语句 + * @param conn + * 数据库连接 + * @param rs + * 查询结果集 + * @param stat + * jdbc statement + */ + def release(sql: String, conn: Connection, stat: Statement, rs: ResultSet, closeConnection: Boolean = true): Unit = { + try { + if (rs != null) rs.close() + } catch { + case e: SQLException => { + this.logger.error(s"close jdbc ResultSet failed. keyNum: ${keyNum}", e) + throw e + } + } finally { + try { + if (stat != null) stat.close() + } catch { + case e: SQLException => { + this.logger.error(s"close jdbc statement failed. keyNum: ${keyNum}", e) + throw e + } + } finally { + try { + if (conn != null && closeConnection) conn.close() + } catch { + case e: SQLException => { + this.logger.error(s"close jdbc connection failed. keyNum: ${keyNum}", e) + throw e + } + } + } + } + } + + /** + * 工具方法,截取给定的SQL语句 + */ + @Internal + private[this] def sqlBuriedPoint(sql: String, sink: Boolean = true): String = { + DatasourceManager.addSql(this.dbType, this.url, this.username, sql, sink) + StringsUtils.substring(sql, 0, this.logSqlLength) + } + +} + + +/** + * jdbc最基本的配置信息,如果配置文件中有,则会覆盖代码中的配置 + * + * @param url + * 数据库的url + * @param driverClass + * jdbc驱动名称 + * @param username + * 数据库用户名 + * @param password + * 数据库密码 + */ +case class JdbcConf(url: String, driverClass: String, username: String, password: String) + +/** + * 用于单例构建伴生类JdbcConnector的实例对象 + * 每个JdbcConnector实例使用keyNum作为标识,并且与每个关系型数据库一一对应 + */ +object JdbcConnector extends ConnectorFactory[JdbcConnector] with JdbcFunctions { + + /** + * 约定创建connector子类实例的方法 + */ + override protected def create(conf: Any = null, keyNum: Int = 1): JdbcConnector = { + requireNonEmpty(keyNum) + val connector = new JdbcConnector(conf.asInstanceOf[JdbcConf], keyNum) + logger.debug(s"创建JdbcConnector实例成功. 
keyNum=$keyNum") + connector + } +} \ No newline at end of file diff --git a/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcConnectorBridge.scala b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcConnectorBridge.scala new file mode 100644 index 0000000..c927200 --- /dev/null +++ b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcConnectorBridge.scala @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.jdbc + +import java.sql.{Connection, ResultSet} + +import scala.reflect.ClassTag + +/** + * jdbc操作简单封装 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-05-22 15:55 + */ +private[fire] trait JdbcConnectorBridge { + + /** + * 关系型数据库插入、删除、更新操作 + * + * @param sql + * 待执行的sql语句 + * @param params + * sql中的参数 + * @param connection + * 传递已有的数据库连接 + * @param commit + * 是否自动提交事务,默认为自动提交 + * @param closeConnection + * 是否关闭connection,默认关闭 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 影响的记录数 + */ + def jdbcUpdate(sql: String, params: Seq[Any] = null, connection: Connection = null, commit: Boolean = true, closeConnection: Boolean = true, keyNum: Int = 1): Long = { + JdbcConnector.executeUpdate(sql, params, connection, commit, closeConnection, keyNum) + } + + /** + * 关系型数据库批量插入、删除、更新操作 + * + * @param sql + * 待执行的sql语句 + * @param paramsList + * sql的参数列表 + * @param connection + * 传递已有的数据库连接 + * @param commit + * 是否自动提交事务,默认为自动提交 + * @param closeConnection + * 是否关闭connection,默认关闭 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 影响的记录数 + */ + def jdbcBatchUpdate(sql: String, paramsList: Seq[Seq[Any]] = null, connection: Connection = null, commit: Boolean = true, closeConnection: Boolean = true, keyNum: Int = 1): Array[Int] = { + JdbcConnector.executeBatch(sql, paramsList, connection, commit, closeConnection, keyNum) + } + + /** + * 执行查询操作,以JavaBean方式返回结果集 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param clazz + * JavaBean类型 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 查询结果集 + */ + def jdbcQuery[T <: Object : ClassTag](sql: String, params: Seq[Any] = null, clazz: Class[T], connection: Connection = null, keyNum: Int = 1): List[T] = { + JdbcConnector.executeQuery[T](sql, params, clazz, connection, keyNum) + } + + /** + * 执行查询操作,并在QueryCallback对结果集进行处理 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param callback + * 查询回调 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 
比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + */ + def jdbcQueryCall(sql: String, params: Seq[Any] = null, callback: ResultSet => Int = null, connection: Connection = null, keyNum: Int = 1): Unit = { + JdbcConnector.executeQueryCall(sql, params, callback, connection, keyNum) + } +} diff --git a/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcFunctions.scala b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcFunctions.scala new file mode 100644 index 0000000..7041619 --- /dev/null +++ b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/JdbcFunctions.scala @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.jdbc + +import java.sql.{Connection, ResultSet} + +import scala.reflect.ClassTag + +/** + * Jdbc api集合 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 15:49 + */ +private[fire] trait JdbcFunctions { + + /** + * 根据指定的keyNum获取对应的数据库连接 + */ + def getConnection(keyNum: Int = 1): Connection = JdbcConnector(keyNum = keyNum).getConnection + + /** + * 更新操作 + * + * @param sql + * 待执行的sql语句 + * @param params + * sql中的参数 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + * @param commit + * 是否自动提交事务,默认为自动提交 + * @param closeConnection + * 是否关闭connection,默认关闭 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 影响的记录数 + */ + def executeUpdate(sql: String, params: Seq[Any] = null, connection: Connection = null, commit: Boolean = true, closeConnection: Boolean = true, keyNum: Int = 1): Long = { + JdbcConnector(keyNum = keyNum).executeUpdate(sql, params, connection, commit, closeConnection) + } + + /** + * 执行批量更新操作 + * + * @param sql + * 待执行的sql语句 + * @param paramsList + * sql的参数列表 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + * @param commit + * 是否自动提交事务,默认为自动提交 + * @param closeConnection + * 是否关闭connection,默认关闭 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 影响的记录数 + */ + def executeBatch(sql: String, paramsList: Seq[Seq[Any]] = null, connection: Connection = null, commit: Boolean = true, closeConnection: Boolean = true, keyNum: Int = 1): Array[Int] = { + JdbcConnector(keyNum = keyNum).executeBatch(sql, paramsList, connection, commit, closeConnection) + } + + /** + * 执行查询操作,以JavaBean方式返回结果集 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param clazz + * JavaBean类型 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + */ + def executeQuery[T <: Object : 
ClassTag](sql: String, params: Seq[Any] = null, clazz: Class[T], connection: Connection = null, keyNum: Int = 1): List[T] = { + JdbcConnector(keyNum = keyNum).executeQuery(sql, params, clazz, connection) + } + + /** + * 执行查询操作 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param callback + * 查询回调 + * @param connection + * 传递已有的数据库连接,可满足跨api的同一事务提交的需求 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + */ + def executeQueryCall(sql: String, params: Seq[Any] = null, callback: ResultSet => Int = null, connection: Connection = null, keyNum: Int = 1): Unit = { + JdbcConnector(keyNum = keyNum).executeQueryCall(sql, params, callback, connection) + } +} diff --git a/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/conf/FireJdbcConf.scala b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/conf/FireJdbcConf.scala new file mode 100644 index 0000000..59d4969 --- /dev/null +++ b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/conf/FireJdbcConf.scala @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.jdbc.conf + +import com.zto.fire.common.util.PropUtils + +/** + * 关系型数据库连接池相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:56 + */ +private[fire] object FireJdbcConf { + // c3p0连接池相关配置 + lazy val JDBC_URL = "db.jdbc.url" + lazy val JDBC_URL_PREFIX = "db.jdbc.url.map." 
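To make the update and query entry points above concrete, here is a hedged sketch of parameterized usage through a JdbcConnector instance; the table and SQL are illustrative, while the call shapes mirror executeUpdate, executeBatch and executeQueryCall as defined in this patch.

import com.zto.fire.jdbc.JdbcConnector

val jdbc = JdbcConnector()

// single parameterized update
jdbc.executeUpdate("insert into t_student(name, age) values(?, ?)", Seq("root", 10))

// batched update: one Seq of parameters per statement, flushed every db.jdbc.batch.size rows
jdbc.executeBatch(
  "insert into t_student(name, age) values(?, ?)",
  Seq(Seq("a", 1), Seq("b", 2), Seq("c", 3))
)

// low-level query with a callback over the raw ResultSet; the callback returns the number of rows it processed
jdbc.executeQueryCall("select count(*) from t_student", callback = rs => {
  var rows = 0
  while (rs.next()) {
    println(rs.getLong(1))
    rows += 1
  }
  rows
})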
+ lazy val JDBC_DRIVER = "db.jdbc.driver" + lazy val JDBC_USER = "db.jdbc.user" + lazy val JDBC_PASSWORD = "db.jdbc.password" + lazy val JDBC_ISOLATION_LEVEL = "db.jdbc.isolation.level" + lazy val JDBC_MAX_POOL_SIZE = "db.jdbc.maxPoolSize" + lazy val JDBC_MIN_POOL_SIZE = "db.jdbc.minPoolSize" + lazy val JDBC_ACQUIRE_INCREMENT = "db.jdbc.acquireIncrement" + lazy val JDBC_INITIAL_POOL_SIZE = "db.jdbc.initialPoolSize" + lazy val JDBC_MAX_IDLE_TIME = "db.jdbc.maxIdleTime" + lazy val JDBC_BATCH_SIZE = "db.jdbc.batch.size" + lazy val JDBC_FLUSH_INTERVAL = "db.jdbc.flushInterval" + lazy val JDBC_MAX_RETRY = "db.jdbc.max.retry" + // fire框架针对jdbc操作后数据集的缓存策略 + lazy val FIRE_JDBC_STORAGE_LEVEL = "fire.jdbc.storage.level" + // 通过JdbcConnector查询后将数据集放到多少个分区中,需根据实际的结果集做配置 + lazy val FIRE_JDBC_QUERY_REPARTITION = "fire.jdbc.query.partitions" + + // 默认的事务隔离级别 + lazy val jdbcIsolationLevel = "READ_UNCOMMITTED" + // 数据库批量操作的记录数 + lazy val jdbcBatchSize = 1000 + // fire框架针对jdbc操作后数据集的缓存策略 + lazy val jdbcStorageLevel = PropUtils.getString(this.FIRE_JDBC_STORAGE_LEVEL, "memory_and_disk_ser").toUpperCase + // 通过JdbcConnector查询后将数据集放到多少个分区中,需根据实际的结果集做配置 + lazy val jdbcQueryPartition = PropUtils.getInt(this.FIRE_JDBC_QUERY_REPARTITION, 10) + + // db.jdbc.url + def url(keyNum: Int = 1): String = PropUtils.getString(this.JDBC_URL, "", keyNum) + // jdbc url与别名映射 + lazy val jdbcUrlMap = PropUtils.sliceKeys(this.JDBC_URL_PREFIX) + // db.jdbc.driver + def driverClass(keyNum: Int = 1): String = PropUtils.getString(this.JDBC_DRIVER,"", keyNum) + // db.jdbc.user + def user(keyNum: Int = 1): String = PropUtils.getString(this.JDBC_USER, "", keyNum = keyNum) + // db.jdbc.password + def password(keyNum: Int = 1): String = PropUtils.getString(this.JDBC_PASSWORD, "", keyNum = keyNum) + // 事务的隔离级别:NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, SERIALIZABLE,默认为READ_UNCOMMITTED + def isolationLevel(keyNum: Int = 1): String = PropUtils.getString(this.JDBC_ISOLATION_LEVEL, this.jdbcIsolationLevel, keyNum) + // 批量操作的记录数 + def batchSize(keyNum: Int = 1): Int = PropUtils.getInt(this.JDBC_BATCH_SIZE, this.jdbcBatchSize, keyNum) + // 默认多少毫秒flush一次 + def jdbcFlushInterval(keyNum: Int = 1): Long = PropUtils.getLong(this.JDBC_FLUSH_INTERVAL, 1000, keyNum) + // jdbc失败最大重试次数 + def maxRetry(keyNum: Int = 1): Long = PropUtils.getLong(this.JDBC_MAX_RETRY, 3, keyNum) + // 连接池最小连接数 + def minPoolSize(keyNum: Int = 1): Int = PropUtils.getInt(this.JDBC_MIN_POOL_SIZE, 1, keyNum) + // 连接池初始化连接数 + def initialPoolSize(keyNum: Int = 1): Int = PropUtils.getInt(this.JDBC_INITIAL_POOL_SIZE, 1, keyNum) + // 连接池最大连接数 + def maxPoolSize(keyNum: Int = 1): Int = PropUtils.getInt(this.JDBC_MAX_POOL_SIZE, 5, keyNum) + // 连接池每次自增连接数 + def acquireIncrement(keyNum: Int = 1): Int = PropUtils.getInt(this.JDBC_ACQUIRE_INCREMENT, 1, keyNum) + // 多久释放没有用到的连接 + def maxIdleTime(keyNum: Int = 1): Int = PropUtils.getInt(this.JDBC_MAX_IDLE_TIME, 30, keyNum) + + /** + * 根据给定的jdbc url别名获取对应的jdbc地址 + */ + def jdbcUrl(keyNum: Int = 1): String = { + val url = this.url(keyNum) + this.jdbcUrlMap.getOrElse(url, url) + } +} diff --git a/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/util/DBUtils.scala b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/util/DBUtils.scala new file mode 100644 index 0000000..993d167 --- /dev/null +++ b/fire-connectors/fire-jdbc/src/main/scala/com/zto/fire/jdbc/util/DBUtils.scala @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.jdbc.util + +import java.sql.ResultSet +import java.util.{Date, Properties} + +import com.zto.fire.common.anno.FieldName +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.enu.Datasource +import com.zto.fire.common.util.ReflectionUtils +import com.zto.fire.jdbc.conf.FireJdbcConf +import org.apache.commons.lang3.StringUtils + +import scala.collection.mutable.ListBuffer +import scala.util.Try + +/** + * 关系型数据库操作工具类 + * + * @author ChengLong 2019-6-23 11:16:18 + */ +object DBUtils { + + /** + * 将row结果转为javabean + * + * @param row 数据库中的一条记录 + * @param clazz + * @tparam T + * @return + */ + def dbRow2Bean[T](row: ResultSet, clazz: Class[T]): T = { + val obj = clazz.newInstance() + clazz.getDeclaredFields.foreach(field => { + ReflectionUtils.setAccessible(field) + val fieldType = field.getType + val anno = field.getAnnotation(classOf[FieldName]) + val fieldName = if (anno != null && StringUtils.isNotBlank(anno.value())) anno.value() else field.getName + if (this.containsColumn(row, fieldName)) { + if (fieldType eq classOf[String]) field.set(obj, row.getString(fieldName)) + else if (fieldType eq classOf[java.lang.Integer]) field.set(obj, row.getInt(fieldName)) + else if (fieldType eq classOf[java.lang.Double]) field.set(obj, row.getDouble(fieldName)) + else if (fieldType eq classOf[java.lang.Long]) field.set(obj, row.getLong(fieldName)) + else if (fieldType eq classOf[java.math.BigDecimal]) field.set(obj, row.getBigDecimal(fieldName)) + else if (fieldType eq classOf[java.lang.Float]) field.set(obj, row.getFloat(fieldName)) + else if (fieldType eq classOf[java.lang.Boolean]) field.set(obj, row.getBoolean(fieldName)) + else if (fieldType eq classOf[java.lang.Short]) field.set(obj, row.getShort(fieldName)) + else if (fieldType eq classOf[java.util.Date]) field.set(obj, row.getDate(fieldName)) + } + }) + obj + } + + /** + * 将ResultSet结果转为javabean + * + * @param rs 数据库中的查询结果集 + * @param clazz + * @tparam T + * @return + */ + def dbResultSet2Bean[T](rs: ResultSet, clazz: Class[T]): ListBuffer[T] = { + val list = ListBuffer[T]() + val fields = clazz.getDeclaredFields + try { + while (rs.next()) { + var obj = clazz.newInstance() + fields.foreach(field => { + ReflectionUtils.setAccessible(field) + val fieldType = field.getType + val anno = field.getAnnotation(classOf[FieldName]) + if (!(anno != null && anno.disuse())) { + val fieldName = if (anno != null && StringUtils.isNotBlank(anno.value())) anno.value() else field.getName + if (this.containsColumn(rs, fieldName)) { + if (fieldType eq classOf[String]) field.set(obj, rs.getString(fieldName)) + else if (fieldType eq classOf[java.lang.Integer]) field.set(obj, rs.getInt(fieldName)) + else if (fieldType eq classOf[java.lang.Double]) field.set(obj, rs.getDouble(fieldName)) + else if (fieldType eq 
classOf[java.lang.Long]) field.set(obj, rs.getLong(fieldName)) + else if (fieldType eq classOf[java.math.BigDecimal]) field.set(obj, rs.getBigDecimal(fieldName)) + else if (fieldType eq classOf[java.lang.Float]) field.set(obj, rs.getFloat(fieldName)) + else if (fieldType eq classOf[java.lang.Boolean]) field.set(obj, rs.getBoolean(fieldName)) + else if (fieldType eq classOf[java.lang.Short]) field.set(obj, rs.getShort(fieldName)) + else if (fieldType eq classOf[Date]) field.set(obj, rs.getDate(fieldName)) + } + } + }) + list += obj + } + } catch { + case e: Exception => e.printStackTrace() + } + list + } + + /** + * 判断指定的结果集中是否包含指定的列名 + * + * @param rs + * 关系型数据库查询结果集 + * @param columnName + * 列名 + * @return + * true: 存在 false:不存在 + */ + def containsColumn(rs: ResultSet, columnName: String): Boolean = { + Try { + try { + rs.findColumn(columnName) + } + }.isSuccess + } + + /** + * 获取jdbc连接信息,若调用者指定,以调用者为准,否则读取配置文件 + * + * @param jdbcProps + * 调用者传入的jdbc配置信息 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * jdbc配置信息 + */ + def getJdbcProps(jdbcProps: Properties = null, keyNum: Int = 1): Properties = { + if (jdbcProps == null || jdbcProps.size() == 0) { + val defaultProps = new Properties() + defaultProps.setProperty("user", FireJdbcConf.user(keyNum)) + defaultProps.setProperty("password", FireJdbcConf.password(keyNum)) + defaultProps.setProperty("driver", FireJdbcConf.driverClass(keyNum)) + defaultProps.setProperty("batchsize", FireJdbcConf.batchSize(keyNum).toString) + defaultProps.setProperty("isolationLevel", FireJdbcConf.isolationLevel(keyNum).toUpperCase) + defaultProps + } else { + jdbcProps + } + } + + /** + * 根据jdbc驱动包名或数据库url区分连接的不同的数据库厂商标识 + */ + def dbTypeParser(driverClass: String, url: String): String = { + var dbType = "unknown" + Datasource.values().map(_.toString).foreach(datasource => { + if (driverClass.toUpperCase.contains(datasource)) dbType = datasource + }) + + // 尝试从url中的端口号解析,对结果进行校正,因为有些数据库使用的是mysql驱动,可以通过url中的端口号区分 + if (StringUtils.isNotBlank(url)) { + FireFrameworkConf.buriedPointDatasourceMap.foreach(kv => { + if (url.contains(kv._2)) dbType = kv._1.toUpperCase + }) + } + dbType + } + +} diff --git a/fire-connectors/fire-jdbc/src/test/java/com/zto/fire/common/db/bean/Student.java b/fire-connectors/fire-jdbc/src/test/java/com/zto/fire/common/db/bean/Student.java new file mode 100644 index 0000000..5a3ea0b --- /dev/null +++ b/fire-connectors/fire-jdbc/src/test/java/com/zto/fire/common/db/bean/Student.java @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
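The executeQuery path above hands the ResultSet to DBUtils.dbResultSet2Bean, which instantiates the target class and fills each field whose column name matches the field name (or its @FieldName value), skipping fields marked disuse = true and columns absent from the result set. A hedged sketch, reusing the Student test bean defined just below and the t_student table from JdbcConnectorTest:

import com.zto.fire.common.db.bean.Student
import com.zto.fire.jdbc.JdbcConnector

val jdbc = JdbcConnector()
val students: List[Student] =
  jdbc.executeQuery("select * from t_student where age > ?", Seq(18), classOf[Student])
students.foreach(s => println(s.getName + " -> " + s.getAge))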
+ */ + +package com.zto.fire.common.db.bean; + +import com.zto.fire.common.anno.FieldName; +import com.zto.fire.common.util.DateFormatUtils; +import com.zto.fire.common.util.JSONUtils; + +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.List; + +/** + * For test only. + * @author ChengLong + * @create 2020-11-13 17:46 + * @since 1.0.0 + */ +public class Student { + private Long id; + private String name; + private Integer age; + private BigDecimal height; + @FieldName(family = "data", value = "timestamp") + private String createTime; + private String nullField; + + public Student() { + } + + public Student(Long id, String name, Integer age, BigDecimal height) { + this.id = id; + this.name = name; + this.age = age; + this.height = height; + this.createTime = DateFormatUtils.formatCurrentDateTime(); + } + + public static List build(int count) { + List list = new ArrayList<>(count); + try { + for (int i = 1; i <= count; i++) { + list.add(new Student(Long.parseLong(i + ""), "root_" + i, i, new BigDecimal(i + "" + i + "." + i))); + Thread.sleep(500); + } + } catch (Exception e) { + e.printStackTrace(); + } + return list; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (!(o instanceof Student)) return false; + Student student = (Student) o; + return id.equals(student.id); + } + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public Integer getAge() { + return age; + } + + public void setAge(Integer age) { + this.age = age; + } + + public BigDecimal getHeight() { + return height; + } + + public void setHeight(BigDecimal height) { + this.height = height; + } + + public String getCreateTime() { + return createTime; + } + + public void setCreateTime(String createTime) { + this.createTime = createTime; + } + + public String getNullField() { + return nullField; + } + + public void setNullField(String nullField) { + this.nullField = nullField; + } + + @Override + public String toString() { + return JSONUtils.toJSONString(this); + } + +} diff --git a/fire-connectors/fire-jdbc/src/test/resources/JdbcConnectorTest.properties b/fire-connectors/fire-jdbc/src/test/resources/JdbcConnectorTest.properties new file mode 100644 index 0000000..57b0f3f --- /dev/null +++ b/fire-connectors/fire-jdbc/src/test/resources/JdbcConnectorTest.properties @@ -0,0 +1,25 @@ +spark.log.level = INFO +spark.log.level.fire_conf.com.zto.fire= info +# fire框架埋点日志开关,关闭以后将不再打印埋点日志 +spark.fire.log.enable = true +# 用于限定fire框架中sql日志的字符串长度 +spark.fire.log.sql.length = 100 + +# 定时解析埋点SQL的初始延迟(s) +spark.fire.buried_point.datasource.initialDelay = 1 +# 定时解析埋点SQL的执行频率(s) +spark.fire.buried_point.datasource.period = 5 + +# 关系型数据库连接信息 +spark.db.jdbc.url = jdbc:derby:memory:fire;create=true +spark.db.jdbc.driver = org.apache.derby.jdbc.EmbeddedDriver +spark.db.jdbc.maxPoolSize = 1 +spark.db.jdbc.user = fire +spark.db.jdbc.password = fire + +# 配置另一个数据源,对应的操作需对应加数字后缀,如:this.spark.jdbcQueryDF2(sql, Seq(1, 2, 3), classOf[Student]) +spark.db.jdbc.url3 = jdbc:derby:memory:fire2;create=true +spark.db.jdbc.driver3 = org.apache.derby.jdbc.EmbeddedDriver +spark.db.jdbc.maxPoolSize3 = 1 +spark.db.jdbc.user3 = fire +spark.db.jdbc.password3 = fire \ No newline at end of file diff --git a/fire-connectors/fire-jdbc/src/test/scala/com/zto/fire/jdbc/JdbcConnectorTest.scala 
b/fire-connectors/fire-jdbc/src/test/scala/com/zto/fire/jdbc/JdbcConnectorTest.scala new file mode 100644 index 0000000..c6b8807 --- /dev/null +++ b/fire-connectors/fire-jdbc/src/test/scala/com/zto/fire/jdbc/JdbcConnectorTest.scala @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.jdbc + +import com.zto.fire.common.anno.TestStep +import com.zto.fire.common.db.bean.Student +import com.zto.fire.common.util.{DatasourceManager, PropUtils} +import com.zto.fire.predef._ +import org.junit.Assert._ +import org.junit.{After, Before, Test} + +/** + * 用于测试JdbcConnector相关API + * + * @author ChengLong + * @since 1.1.2 + * @create 2020-11-30 14:23 + */ +class JdbcConnectorTest { + private var jdbc: JdbcConnector = _ + private var jdbc3: JdbcConnector = _ + private val tableName = "t_student" + private val createTable = + s""" + |CREATE TABLE $tableName( + | id BIGINT, + | name VARCHAR(100), + | age INT, + | createTime VARCHAR(20), + | length double, + | sex CHAR, + | rowkey VARCHAR(100) + |) + |""".stripMargin + + + @Before + def init: Unit = { + PropUtils.load("JdbcConnectorTest") + this.jdbc = JdbcConnector() + this.jdbc.executeUpdate(this.createTable) + this.jdbc3 = JdbcConnector(keyNum = 3) + this.jdbc3.executeUpdate(this.createTable) + } + + + @Test + @TestStep(step = 1, desc = "jdbc CRUD测试") + def testCRUD: Unit = { + val studentName = "root" + + val deleteSql = s"delete from $tableName where name=?" + this.jdbc.executeUpdate(deleteSql, Seq(studentName)) + this.jdbc3.executeUpdate(deleteSql, Seq(studentName)) + + val selectSql = s"select * from $tableName where name=?" 
+ val studentList1 = this.jdbc.executeQuery(selectSql, Seq(studentName), classOf[Student]) + val studentList3 = this.jdbc3.executeQuery(selectSql, Seq(studentName), classOf[Student]) + assertEquals(studentList1.size, 0) + studentList1.foreach(println) + assertEquals(studentList3.size, 0) + studentList3.foreach(println) + + val insertSql = s"insert into $tableName(name, age, length) values(?, ?, ?)" + this.jdbc.executeUpdate(insertSql, Seq(studentName, 10, 10.3)) + this.jdbc3.executeUpdate(insertSql, Seq(studentName, 10, 10.3)) + + val studentList11 = this.jdbc.executeQuery(selectSql, Seq(studentName), classOf[Student]) + val studentList33 = this.jdbc3.executeQuery(selectSql, Seq(studentName), classOf[Student]) + assertEquals(studentList11.size, 1) + studentList11.foreach(println) + assertEquals(studentList33.size, 1) + studentList33.foreach(println) + + for (i <- 1 to 5) { + DatasourceManager.get.foreach(t => { + t._2.foreach(source => { + println("数据源:" + t._1.toString + " " + source) + }) + }) + println("=====================================") + Thread.sleep(1000) + } + } + + @After + def close: Unit = { + this.jdbc.executeUpdate(s"drop table $tableName") + this.jdbc3.executeUpdate(s"drop table $tableName") + } +} diff --git a/fire-connectors/pom.xml b/fire-connectors/pom.xml new file mode 100644 index 0000000..b97e1eb --- /dev/null +++ b/fire-connectors/pom.xml @@ -0,0 +1,74 @@ + + + + + 4.0.0 + fire-connectors_2.12 + pom + fire-connectors + + + com.zto.fire + fire-parent_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + fire-hbase + fire-jdbc + + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-core_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-metrics_${scala.binary.version} + ${project.version} + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-core/pom.xml b/fire-core/pom.xml new file mode 100644 index 0000000..3b7305d --- /dev/null +++ b/fire-core/pom.xml @@ -0,0 +1,60 @@ + + + + + 4.0.0 + fire-core_${scala.binary.version} + jar + fire-core + + + com.zto.fire + fire-parent_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${project.version} + + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + + diff --git a/fire-core/src/main/java/com/zto/fire/core/TimeCost.java b/fire-core/src/main/java/com/zto/fire/core/TimeCost.java new file mode 100644 index 0000000..a50c290 --- /dev/null +++ b/fire-core/src/main/java/com/zto/fire/core/TimeCost.java @@ -0,0 +1,273 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
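The keyNum convention exercised by JdbcConnectorTest above: keyNum = 1 reads the unsuffixed spark.db.jdbc.* keys, while keyNum = N reads the keys ending in N, so the jdbc3 instance resolves the *3 entries of JdbcConnectorTest.properties. A minimal sketch restating the test's setup (JdbcConnector's full factory signature is not shown in this patch):

val jdbc = JdbcConnector()              // keyNum = 1 -> spark.db.jdbc.url / driver / user / password
val jdbc3 = JdbcConnector(keyNum = 3)   // keyNum = 3 -> spark.db.jdbc.url3 / driver3 / user3 / password3
val students = jdbc3.executeQuery("select * from t_student where name=?", Seq("root"), classOf[Student])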
+ */ + +package com.zto.fire.core; + +import com.zto.fire.common.util.DateFormatUtils; +import com.zto.fire.common.util.ExceptionBus; +import com.zto.fire.common.util.OSUtils; +import org.apache.htrace.fasterxml.jackson.annotation.JsonIgnore; + +import java.io.Serializable; +import java.util.UUID; + +/** + * 用于记录任务的执行时间 + * + * @author ChengLong 2019-6-10 16:16:16 + */ +public class TimeCost implements Serializable { + // 异常信息 + private String msg; + // 耗时 + private Long timeCost; + private String ip; + private String load; + // 多核cpu使用率 + private String cpuUsage; + // 用于区分埋点日志和用户日志 + private boolean isFire = false; + private String id = UUID.randomUUID().toString(); + // 任务的applicationId + private static String applicationId; + // 任务的main方法 + private static String mainClass; + // executorId + private static String executorId; + private Integer stageId; + private Long taskId; + private Integer partitionId; + @JsonIgnore + private Throwable exception; + private String stackTraceInfo; + private String level = "WARN"; + private String module; + private Integer io; + private Long start; + private String startTime; + private String endTime; + + public String getId() { + return id; + } + + public String getLoad() { + return load; + } + + public String getMsg() { + return msg; + } + + public Long getTimeCost() { + if (this.timeCost == null) { + return System.currentTimeMillis() - this.start; + } + return timeCost; + } + + public String getStartTime() { + return startTime; + } + + public void setStartTime(String startTime) { + this.startTime = startTime; + } + + public String getEndTime() { + return endTime; + } + + public void setEndTime(String endTime) { + this.endTime = endTime; + } + + public String getIp() { + return ip; + } + + public Integer getStageId() { + return stageId; + } + + public Long getTaskId() { + return taskId; + } + + public Integer getPartitionId() { + return partitionId; + } + + public Boolean getIsFire() { + return isFire; + } + + public static String getApplicationId() { + return applicationId; + } + + public static void setApplicationId(String applicationId) { + TimeCost.applicationId = applicationId; + } + + public static String getExecutorId() { + return executorId; + } + + public static String getMainClass() { + return mainClass; + } + + public static void setExecutorId(String executorId) { + TimeCost.executorId = executorId; + } + + public static void setMainClass(String mainClass) { + TimeCost.mainClass = mainClass; + } + + public void setMsg(String msg) { + this.msg = msg; + } + + public void setTimeCost(Long timeCost) { + this.timeCost = timeCost; + } + + public Boolean getFire() { + return isFire; + } + + public void setFire(Boolean fire) { + isFire = fire; + } + + public void setIp(String ip) { + this.ip = ip; + } + + public void setLoad(String load) { + this.load = load; + } + + public void setStageId(Integer stageId) { + this.stageId = stageId; + } + + public void setTaskId(Long taskId) { + this.taskId = taskId; + } + + public void setPartitionId(Integer partitionId) { + this.partitionId = partitionId; + } + + public Long getStart() { + return start; + } + + public void setStart(Long start) { + this.start = start; + } + + public String getStackTraceInfo() { + return stackTraceInfo; + } + + public void setStackTraceInfo(String stackTraceInfo) { + this.stackTraceInfo = stackTraceInfo; + } + + public String getModule() { + return module; + } + + public Integer getIo() { + return io; + } + + public String getLevel() { + return level; + } + + public void 
setLevel(String level) { + this.level = level; + } + + public String getCpuUsage() { + return cpuUsage; + } + + public void setCpuUsage(String cpuUsage) { + this.cpuUsage = cpuUsage; + } + + private String lable() { + if (this.isFire) { + return "fire"; + } else { + return "user"; + } + } + + @Override + public String toString() { + String baseInfo = "【" + this.lable() + "Log】 〖" + this.msg + "〗 start:" + this.startTime + " end:" + this.endTime + " cost:" + this.getTimeCost() + " ip:" + this.ip + " load:" + this.load + " cpuUsage:" + this.cpuUsage + " executor:" + this.executorId; + if (!"driver".equalsIgnoreCase(this.executorId)) { + baseInfo += " stage:" + this.stageId + " task:" + this.taskId; + } + if (this.isFire) { + baseInfo += " module:" + this.module + " io:" + this.io; + } + return baseInfo; + } + + private TimeCost() { + this.start = System.currentTimeMillis(); + this.startTime = DateFormatUtils.formatCurrentDateTime(); + this.ip = OSUtils.getIp(); + } + + /** + * 构建一个TimCost对象 + * + * @return 返回TimeCost对象实例 + */ + public static TimeCost build() { + return new TimeCost(); + } + + /** + * 设置必要的参数 + * + * @return 当前对象 + */ + public TimeCost info(String msg, String module, Integer io, Boolean isFire, Throwable exception) { + this.timeCost = System.currentTimeMillis() - this.start; + this.endTime = DateFormatUtils.formatCurrentDateTime(); + this.exception = exception; + this.msg = msg; + this.module = module; + this.io = io; + if (isFire != null) this.isFire = isFire; + if (exception != null) { + this.stackTraceInfo = ExceptionBus.stackTrace(exception); + this.level = "ERROR"; + } + return this; + } +} \ No newline at end of file diff --git a/fire-core/src/main/java/com/zto/fire/core/task/SchedulerManager.java b/fire-core/src/main/java/com/zto/fire/core/task/SchedulerManager.java new file mode 100644 index 0000000..1b6d085 --- /dev/null +++ b/fire-core/src/main/java/com/zto/fire/core/task/SchedulerManager.java @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
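A minimal usage sketch for the TimeCost bean above. The API calls are exactly those defined in the class; the measured operation and the output destination are placeholders.

val cost = TimeCost.build()   // the private constructor captures start, startTime and ip
// ... run the operation being measured ...
val record = cost.info("jdbc batch insert", "jdbc", 1000, true, null)
println(record)               // toString renders the "【fireLog】 ..." summary line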
+ */ + +package com.zto.fire.core.task; + +import com.google.common.collect.Maps; +import com.zto.fire.common.anno.Scheduled; +import com.zto.fire.common.conf.FireFrameworkConf; +import com.zto.fire.common.util.DateFormatUtils; +import com.zto.fire.common.util.ReflectionUtils; +import org.apache.commons.lang3.StringUtils; +import org.apache.commons.lang3.time.DateUtils; +import org.quartz.*; +import org.quartz.impl.StdSchedulerFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.Serializable; +import java.lang.reflect.Method; +import java.util.Date; +import java.util.Map; +import java.util.Properties; +import java.util.concurrent.atomic.AtomicBoolean; + +/** + * 定时任务管理器,内部使用Quartz框架 + * 为了适用于Spark,没有采用按包扫描的方式去扫描标记有@Scheduled的方法 + * 而是要主动通过TaskManager.registerTasks注册,然后扫描该实例中所有标记 + * 有@Scheduled的方法,并根据cron表达式定时执行 + * + * @author ChengLong 2019年11月4日 18:06:21 + * @since 0.3.5 + */ +public abstract class SchedulerManager implements Serializable { + // 用于指定当前spark任务的main方法所在的对象实例 + private static Map taskMap; + // 已注册的task列表 + private static Map alreadyRegisteredTaskMap; + // 定时调度实例 + private static Scheduler scheduler; + // 初始化标识 + private static AtomicBoolean isInit = new AtomicBoolean(false); + // 定时任务黑名单,存放带有@Scheduler标识的方法名 + private static Map blacklistMap = Maps.newHashMap(); + protected static final String DRIVER = "driver"; + protected static final String EXECUTOR = "executor"; + private static final String DEFAULT_COLOR = "\u001B[0m ] "; + private static final Logger logger = LoggerFactory.getLogger(SchedulerManager.class); + + static { + String blacklistMethod = FireFrameworkConf.schedulerBlackList(); + if (StringUtils.isNotBlank(blacklistMethod)) { + String[] methods = blacklistMethod.split(","); + for (String method : methods) { + if (StringUtils.isNotBlank(method)) { + blacklistMap.put(method.trim(), method); + } + } + } + } + + protected SchedulerManager() {} + + /** + * 初始化quartz + */ + protected static void init() { + if (isInit.compareAndSet(false, true)) { + taskMap = Maps.newConcurrentMap(); + alreadyRegisteredTaskMap = Maps.newConcurrentMap(); + try { + StdSchedulerFactory factory = new StdSchedulerFactory(); + Properties quartzProp = new Properties(); + quartzProp.setProperty("org.quartz.threadPool.threadCount", FireFrameworkConf.quartzMaxThread()); + factory.initialize(quartzProp); + scheduler = factory.getScheduler(); + } catch (Exception e) { + logger.error("初始化quartz发生异常", e); + } + } + } + + /** + * 添加待执行的任务列表类实例 + * + * @param tasks 带有@Scheduled的类的实例 + */ + protected void addScanTask(Object... tasks) { + if (tasks != null && tasks.length > 0) { + for (Object task : tasks) { + if (task != null) { + taskMap.put(task.getClass().getName(), task); + } + } + } + } + + + /** + * 判断当前是否为driver + * @return + */ + protected abstract String label(); + + /** + * 将标记有@Scheduled的类实例注册给定时调度管理器 + * 注:参数是类的实例而不是Class类型,是由于像Spark所在的object类型传入后,会被反射调用构造器创建另一个实例 + * 为了保证当前Spark任务所在的Object实例只有一个,约定传入的参数必须是类的实例而不是Class类型 + * + * @param taskInstances 具有@Scheduled注解类的实例 + */ + public synchronized void registerTasks(Object... 
taskInstances) { + try { + if (!FireFrameworkConf.scheduleEnable()) return; + SchedulerManager.init(); + addScanTask(taskInstances); + if (!taskMap.isEmpty()) { + for (Map.Entry entry : taskMap.entrySet()) { + // 已经注册过的任务不再重复注册 + if (alreadyRegisteredTaskMap.containsKey(entry.getKey())) continue; + + Class clazz = entry.getValue().getClass(); + if (clazz != null) { + Method[] methods = clazz.getDeclaredMethods(); + for (Method method : methods) { + if (method != null) { + ReflectionUtils.setAccessible(method); + if (blacklistMap.containsKey(method.getName())) continue; + Scheduled anno = method.getAnnotation(Scheduled.class); + String label = label(); + if (anno != null && StringUtils.isNotBlank(anno.scope()) && ("all".equalsIgnoreCase(anno.scope()) || anno.scope().equalsIgnoreCase(label))) { + // 通过anno.concurrent判断是否使用并发任务实例 + JobDetail job = (anno.concurrent() ? JobBuilder.newJob(TaskRunner.class) : JobBuilder.newJob(TaskRunnerQueue.class)).usingJobData(clazz.getName() + "#" + method.getName(), anno.cron()).build(); + TriggerBuilder triggerBuilder = TriggerBuilder.newTrigger(); + + if (StringUtils.isNotBlank(anno.cron())) { + // 优先执行cron表达式 + triggerBuilder.withSchedule(CronScheduleBuilder.cronSchedule(anno.cron())); + } else if (anno.fixedInterval() != -1) { + // 固定频率的调度器 + SimpleScheduleBuilder simpleScheduleBuilder = SimpleScheduleBuilder + .simpleSchedule().withIntervalInMilliseconds(anno.fixedInterval()); + // 设定重复执行的次数 + long repeatCount = anno.repeatCount(); + if (repeatCount == -1) { + simpleScheduleBuilder.repeatForever(); + } else { + simpleScheduleBuilder.withRepeatCount((int) repeatCount - 1); + } + triggerBuilder.withSchedule(simpleScheduleBuilder); + } + // 用于指定任务首次执行的时间 + if (StringUtils.isNotBlank(anno.startAt())) { + // startAt优先级较高 + triggerBuilder.startAt(DateFormatUtils.formatDateTime(anno.startAt())); + } else { + // 首次延迟多久(毫秒)开始执行 + if (anno.initialDelay() == 0) triggerBuilder.startNow(); + if (anno.initialDelay() != 0 && anno.initialDelay() != -1) + triggerBuilder.startAt(DateUtils.addMilliseconds(new Date(), (int) anno.initialDelay())); + } + // 添加到调度任务中 + if (scheduler == null) scheduler = StdSchedulerFactory.getDefaultScheduler(); + scheduler.scheduleJob(job, triggerBuilder.build()); + // 将已注册的task放到已注册标记列表中,防止重复注册同一个类的同一个定时方法 + alreadyRegisteredTaskMap.put(entry.getKey(), entry.getValue()); + String schedulerInfo = buildSchedulerInfo(anno); + logger.info("\u001B[33m---> 已注册定时任务[ {}.{} ],{}. 
\u001B[33m<---\u001B[0m", entry.getKey(), method.getName(), schedulerInfo); + } + } + } + } + } + if (alreadyRegisteredTaskMap.size() > 0) + scheduler.start(); + } + } catch (Exception e) { + logger.error("定时任务注册失败:作为定时任务的类必须可序列化,并且标记有@Scheduled的方法必须是无参的!", e); + } + } + + /** + * 用于描述定时任务的详细信息 + * + * @param anno Scheduled注解 + * @return 描述信息 + */ + protected String buildSchedulerInfo(Scheduled anno) { + if (anno == null) return "Scheduled为空"; + StringBuilder schedulerInfo = new StringBuilder("\u001B[31m调度信息\u001B[0m"); + if (StringUtils.isNotBlank(anno.scope())) { + schedulerInfo.append("[ 范围=\u001B[32m").append(anno.scope()).append(DEFAULT_COLOR); + } + if (StringUtils.isNotBlank(anno.cron())) { + schedulerInfo.append("[ 频率=\u001B[33m").append(anno.cron()).append(DEFAULT_COLOR); + } else if (anno.fixedInterval() != -1) { + schedulerInfo.append("[ 频率=\u001B[34m").append(anno.fixedInterval()).append(DEFAULT_COLOR); + } + if (anno.initialDelay() != -1) { + schedulerInfo.append("[ 延迟=\u001B[35m").append(anno.initialDelay()).append(DEFAULT_COLOR); + } + if (StringUtils.isNotBlank(anno.startAt())) { + schedulerInfo.append("[ 启动时间=\u001B[36m").append(anno.startAt()).append(DEFAULT_COLOR); + } + if (anno.repeatCount() != -1) { + schedulerInfo.append("[ 重复=\u001B[32m").append(anno.repeatCount()).append("\u001B[0m次 ] "); + } + return schedulerInfo.toString(); + } + + /** + * 通过execute方法调用传入的指定类的指定方法 + */ + public static void execute(JobExecutionContext context) { + try { + JobDataMap dataMap = context.getJobDetail().getJobDataMap(); + for (Map.Entry entry : dataMap.entrySet()) { + String key = entry.getKey(); + // 定时调用指定类的指定方法 + if (StringUtils.isNotBlank(key) && key.contains("#")) { + String[] classMethod = key.split("#"); + Class clazz = Class.forName(classMethod[0]); + Method method = clazz.getMethod(classMethod[1]); + Object instance = taskMap.get(classMethod[0]); + if (instance != null) method.invoke(instance); + } + } + } catch (Exception e) { + logger.error("执行execute发生异常", e); + } + } + + /** + * 用于判断当前的定时调度器是否已启动 + */ + public synchronized boolean schedulerIsStarted() { + if (scheduler == null) { + return false; + } + try { + return scheduler.isStarted(); + } catch (Exception e) { + logger.error("获取调度器是否启用失败", e); + } + return false; + } + + /** + * 关闭定时调度 + * + * @param waitForJobsToComplete 是否等待所有job全部执行完成再关闭 + */ + public static synchronized void shutdown(boolean waitForJobsToComplete) { + try { + if (scheduler != null && !scheduler.isShutdown()) { + scheduler.shutdown(waitForJobsToComplete); + scheduler = null; + taskMap.clear(); + alreadyRegisteredTaskMap.clear(); + logger.info("\u001B[33m---> 完成定时任务的资源回收. <---\u001B[0m"); + } + } catch (Exception e) { + logger.error("定时任务注册失败:作为定时任务的类必须可序列化,并且标记有@Scheduled的方法必须是无参的!", e); + } + } +} + diff --git a/fire-core/src/main/java/com/zto/fire/core/task/TaskRunner.java b/fire-core/src/main/java/com/zto/fire/core/task/TaskRunner.java new file mode 100644 index 0000000..0106238 --- /dev/null +++ b/fire-core/src/main/java/com/zto/fire/core/task/TaskRunner.java @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
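A minimal sketch of wiring a task into the SchedulerManager above. The @Scheduled attributes (cron, scope, concurrent) and registerTasks come from this patch; the task class, the concrete manager instance, and the assumption that the remaining annotation attributes have defaults are illustrative only.

import com.zto.fire.common.anno.Scheduled

class CacheReloadTask extends Serializable {
  // driver-only, every 10 seconds; concurrent = false makes the manager schedule it via TaskRunnerQueue
  @Scheduled(cron = "0/10 * * * * ?", scope = "driver", concurrent = false)
  def reload(): Unit = println("reload local cache")
}

// schedulerManager stands for an engine-specific SchedulerManager subclass instance
schedulerManager.registerTasks(new CacheReloadTask)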
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.core.task; + +import org.quartz.Job; +import org.quartz.JobExecutionContext; +import org.quartz.JobExecutionException; + +import java.io.Serializable; + +/** + * Scheduler TaskRunner + * @author ChengLong 2019年11月5日 09:59:33 + * @since 0.3.5 + */ +public class TaskRunner implements Job, Serializable { + @Override + public void execute(JobExecutionContext context) throws JobExecutionException { + SchedulerManager.execute(context); + } +} diff --git a/fire-core/src/main/java/com/zto/fire/core/task/TaskRunnerQueue.java b/fire-core/src/main/java/com/zto/fire/core/task/TaskRunnerQueue.java new file mode 100644 index 0000000..c63c162 --- /dev/null +++ b/fire-core/src/main/java/com/zto/fire/core/task/TaskRunnerQueue.java @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.core.task; + +import org.quartz.DisallowConcurrentExecution; + +/** + * 线程安全的方式执行定时任务,同一实例同一时刻只能有一个任务 + * @author ChengLong 2019年11月5日 09:59:33 + * @since 0.3.5 + */ +@DisallowConcurrentExecution +public class TaskRunnerQueue extends TaskRunner { +} diff --git a/fire-core/src/main/resources/cluster.properties b/fire-core/src/main/resources/cluster.properties new file mode 100644 index 0000000..985a0f2 --- /dev/null +++ b/fire-core/src/main/resources/cluster.properties @@ -0,0 +1,53 @@ +# ----------------------------------------------- < 集群 配置 > ------------------------------------------------ # +# ----------------------------------------------- < kafka 配置 > ----------------------------------------------- # +# kafka集群名称与集群地址映射:kafka.brokers.name=bigdata | kafka.brokers.name=zms +fire.kafka.cluster.map.bigdata = 192.168.0.1:9092,192.168.0.2:9092 +fire.kafka.cluster.map.zms = 192.168.0.3:9092,192.168.0.4:9092 + +# --------------------------------------------- < RocketMQ 配置 > ---------------------------------------------- # +rocket.cluster.map.bigdata = 192.168.0.1:9876;192.168.0.2:9876 +rocket.cluster.map.zms = 192.168.0.3;192.168.0.4:9876 + +# -------------------------------------------- < spark-hive 配置 > --------------------------------------------- # +# 离线集群hive metastore地址(别名:batch) +fire.hive.cluster.map.batch = thrift://192.168.0.1:9083,thrift://192.168.0.2:9083 +# 测试集群hive metastore地址(别名:test) +fire.hive.cluster.map.test = thrift://192.168.0.3:9083,thrift://192.168.0.4:9083 + +# -------------------------------------------- < flink-hive 配置 > --------------------------------------------- # +# 离线集群hive-site.xml存放路径(别名:batch) +flink.fire.hive.site.path.map.batch = /opt/apache/flink/conf/hive/batch +# 测试集群hive-site.xml存放路径(别名:test) +flink.fire.hive.site.path.map.test = /opt/apache/flink/conf/hive/test + +# ----------------------------------------------- < HDFS 配置 > ------------------------------------------------ # +# 用于是否启用HDFS HA +hdfs.ha.enable = true +# 离线hive集群的HDFS HA配置项,规则为统一的ha前缀:spark.hdfs.ha.conf.+hive.cluster名称+hdfs专门的ha配置 +hdfs.ha.conf.batch.fs.defaultFS = hdfs://nameservice1 +hdfs.ha.conf.batch.dfs.nameservices = nameservice1 +hdfs.ha.conf.batch.dfs.ha.namenodes.nameservice1 = namenode5231,namenode5229 +hdfs.ha.conf.batch.dfs.namenode.rpc-address.nameservice1.namenode5231 = 192.168.0.1:8020 +hdfs.ha.conf.batch.dfs.namenode.rpc-address.nameservice1.namenode5229 = 192.168.0.2:8020 +hdfs.ha.conf.batch.dfs.client.failover.proxy.provider.nameservice1 = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider + +hdfs.ha.conf.batch_new.fs.defaultFS = hdfs://nameservice1 +hdfs.ha.conf.batch_new.dfs.nameservices = nameservice1 +hdfs.ha.conf.batch_new.dfs.ha.namenodes.nameservice1 = namenode5231,namenode5229 +hdfs.ha.conf.batch_new.dfs.namenode.rpc-address.nameservice1.namenode5231 = 192.168.0.3:8020 +hdfs.ha.conf.batch_new.dfs.namenode.rpc-address.nameservice1.namenode5229 = 192.168.0.4:8020 +hdfs.ha.conf.batch_new.dfs.client.failover.proxy.provider.nameservice1 = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider + +# ----------------------------------------------- < HBase 配置 > ----------------------------------------------- # +# 离线集群hbase的zk地址(别名:batch) +fire.hbase.cluster.map.batch = 192.168.0.1:2181,192.168.0.2:2181 +# 测试集群hbase的zk地址(别名:test) +fire.hbase.cluster.map.test = 192.168.0.3:2181,192.168.0.4:2181 + +# --------------------------------------------- < 配置中心配置 > --------------------------------------------- # 
+# 配置中心接口调用秘钥 +fire.config_center.register.conf.secret = fire +# 配置中心注册与配置接口生产地址 +fire.config_center.register.conf.prod.address = http://192.168.0.1:8080/restUrl/xxx +# 配置中心注册与配置接口测试地址 +fire.config_center.register.conf.test.address = http://192.168.0.2:8080/restUrl/xxx \ No newline at end of file diff --git a/fire-core/src/main/resources/fire.properties b/fire-core/src/main/resources/fire.properties new file mode 100644 index 0000000..288e2f1 --- /dev/null +++ b/fire-core/src/main/resources/fire.properties @@ -0,0 +1,157 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# ----------------------------------------------- < fire 配置 > ------------------------------------------------ # +# 当前fire框架的版本号 +fire.version = ${project.version} +# fire内置线程池大小 +fire.thread.pool.size = 5 +# fire内置定时任务线程池大小 +fire.thread.pool.schedule.size = 5 +# 是否启用restful服务 +fire.rest.enable = true +# 用于设置是否做接口校验 +fire.rest.filter.enable = true +# 是否打印配置信息 +fire.conf.show.enable = true +# fire.conf.deploy.engine=className,在不同引擎实现模块中,指定具体可获取配置信息的EngineConf子类实现,用于同步配置到各container节点 +# 是否打印restful地址 +fire.rest.url.show.enable = false +# 是否启用hostname作为rest服务的访问地址 +fire.rest.url.hostname = false +# 是否关闭fire内置的所有累加器 +fire.acc.enable = true +# 日志累加器开关 +fire.acc.log.enable = true +# 多值累加器开关 +fire.acc.multi.counter.enable = true +# 多时间维度累加器开关 +fire.acc.multi.timer.enable = true +# fire框架埋点日志开关,关闭以后将不再打印埋点日志 +fire.log.enable = true +# 用于限定fire框架中sql日志的字符串长度 +fire.log.sql.length = 100 +# 是否启用为connector注册shutdown hook,当jvm退出前close +fire.connector.shutdown_hook.enable = false +# fire框架针对jdbc操作后数据集的缓存策略 +fire.jdbc.storage.level = memory_and_disk_ser +# 通过JdbcConnector查询后将数据集放到多少个分区中,需根据实际的结果集做配置 +fire.jdbc.query.partitions = 10 +# 是否启用定时调度 +fire.task.schedule.enable = true +# 是否启用动态配置 +fire.dynamic.conf.enable = true +# fire框架rest接口服务最大线程数 +fire.restful.max.thread = 8 +# quartz最大线程池大小 +fire.quartz.max.thread = 8 +# fire收集日志保留的最少记录数 +fire.acc.log.min.size = 500 +# fire收集日志保留的最多记录数 +fire.acc.log.max.size = 1000 +# timer累加器保留最大的记录数 +fire.acc.timer.max.size = 1000 +# timer累加器清理几小时之前的记录 +fire.acc.timer.max.hour = 12 +# env累加器开关 +fire.acc.env.enable = true +# env累加器保留最多的记录数 +fire.acc.env.max.size = 500 +# env累加器保留最少的记录数 +fire.acc.env.min.size = 100 +# 定时调度任务黑名单,配置的value为定时任务方法名,多个以逗号分隔 +fire.scheduler.blacklist = +# 配置打印黑名单,包含该配置将不被打印 +fire.conf.print.blacklist = .map.,pass,secret,zrc,connection,hdfs.ha,print.blacklist,yarn,namenode,metastore,address,redaction +# fire框架restful端口冲突重试次数 +fire.restful.port.retry_num = 3 +# fire框架restful端口冲突重试时间(ms) +fire.restful.port.retry_duration = 1000 +# 日志的级别,统一前缀为:fire.log.level.conf. 
+fire.log.level.conf.org.apache.spark = WARN +fire.log.level.conf.org.spark_project = WARN +fire.log.level.conf.org.apache.kafka = WARN +fire.log.level.conf.org.apache.zookeeper = WARN +fire.log.level.conf.com.zto.fire = INFO +fire.log.level.conf.org.eclipse.jetty.server = ERROR +# 是否将配置同步到executor、taskmanager端 +fire.deploy_conf.enable = true +# 每个jvm实例内部queue用于存放异常对象数最大大小,避免队列过大造成内存溢出 +fire.exception_bus.size = 1000 +# 是否开启数据源埋点 +fire.buried_point.datasource.enable = true +# 用于存放埋点的队列最大大小,超过该大小将会被丢弃 +fire.buried_point.datasource.max.size = 200 +# 定时解析埋点SQL的初始延迟(s) +fire.buried_point.datasource.initialDelay = 30 +# 定时解析埋点SQL的执行频率(s) +fire.buried_point.datasource.period = 60 +# 用于jdbc url的识别,当无法通过driver class识别数据源时,将从url中的端口号进行区分,不同数据配置使用统一的前缀:fire.buried_point.datasource.map. +fire.buried_point.datasource.map.tidb = 4000 +# 是否开启配置自适应前缀,自动为配置加上引擎前缀(spark.|flink.) +fire.conf.adaptive.prefix = true +# 用户统一配置文件,允许用户在该配置文件中存放公共的配置信息,优先级低于任务配置文件(多个以逗号分隔) +fire.user.common.conf = common.properties +# fire接口认证秘钥 +fire.rest.server.secret = fire +# 是否在调用shutdown方法时主动退出jvm进程 +fire.shutdown.auto.exit = false + +# ----------------------------------------------- < kafka 配置 > ----------------------------------------------- # +# kafka集群名称与集群地址映射,任务中通过kafka.brokers.name=local即可连到以下配置的broker地址 +# fire.kafka.cluster.map.local = localhost:9092,localhost02:9092 + +# ----------------------------------------------- < hive 配置 > ------------------------------------------------ # +# 默认的hive数据库 +fire.hive.default.database.name = tmp +# 默认的hive分区字段名称 +fire.hive.table.default.partition.name = ds +# 离线集群hive metastore地址(别名:local),任务中通过fire.hive.cluster=local即可连到一下配置的thrift地址 +# fire.hive.cluster.map.local = thrift://localhost:9083,thrift://localhost02:9083 + +# ----------------------------------------------- < HBase 配置 > ----------------------------------------------- # +# 一次读写HBase的数据量 +fire.hbase.batch.size = 10000 +# fire框架针对hbase操作后数据集的缓存策略 +fire.hbase.storage.level = memory_and_disk_ser +# 通过HBase scan后repartition的分区数,需根据scan后的数据量做配置 +fire.hbase.scan.partitions = -1 +# 后续版本会废弃,废弃后fire.hbase.scan.partitions默认值改为1200 +fire.hbase.scan.repartitions = 1200 +# 是否开启HBase表存在判断的缓存,开启后表存在判断将避免大量的connection消耗 +fire.hbase.table.exists.cache.enable = true +# 是否开启HBase表存在列表缓存的定时更新任务 +fire.hbase.table.exists.cache.reload.enable = true +# 定时刷新缓存HBase表任务的初始延迟(s) +fire.hbase.table.exists.cache.initialDelay = 60 +# 定时刷新缓存HBase表任务的执行频率(s) +fire.hbase.table.exists.cache.period = 600 +# hbase集群的zk地址(别名:local),任务中通过hbase.cluster=local即可连到对应的hbase集群 +# fire.hbase.cluster.map.local = localhost:2181,localhost02:2181 + +# hbase connection 配置,约定以:fire.hbase.conf.开头,比如:fire.hbase.conf.hbase.rpc.timeout对应hbase中的配置为hbase.rpc.timeout +fire.hbase.conf.hbase.zookeeper.property.clientPort = 2181 +fire.hbase.conf.zookeeper.znode.parent = /hbase +fire.hbase.conf.hbase.rpc.timeout = 600000 +fire.hbase.conf.hbase.snapshot.master.timeoutMillis = 600000 +fire.hbase.conf.hbase.snapshot.region.timeout = 600000 + +# --------------------------------------------- < 配置中心配置 > --------------------------------------------- # +# 注:配置中心系统异常时可设置为false,不受配置中心影响,可正常发布和运行 +fire.config_center.enable = false +# 本地运行环境下(Windows、Mac)是否调用配置中心接口获取配置信息 +fire.config_center.local.enable = false \ No newline at end of file diff --git a/fire-core/src/main/scala/com/zto/fire/core/Api.scala b/fire-core/src/main/scala/com/zto/fire/core/Api.scala new file mode 100644 index 0000000..f3a330a --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/Api.scala @@ -0,0 +1,33 @@ +/* + 
* Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.core + +/** + * Fire变量API + * + * @author ChengLong + * @since 1.0.0 + * @create 2021-01-12 17:16 + */ +private[fire] trait Api { + + /** + * 流的启动 + */ + def start: Any +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/BaseFire.scala b/fire-core/src/main/scala/com/zto/fire/core/BaseFire.scala new file mode 100644 index 0000000..ec3cd17 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/BaseFire.scala @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.core + +import java.util.concurrent.atomic.AtomicBoolean +import java.util.concurrent.{ExecutorService, ScheduledExecutorService, TimeUnit} + +import com.zto.fire.predef._ +import com.zto.fire.common.conf.{FireFrameworkConf, FirePS1Conf} +import com.zto.fire.common.enu.{JobType, ThreadPoolType} +import com.zto.fire.common.util.{FireUtils, _} +import com.zto.fire.core.rest.{RestServerManager, SystemRestful} +import com.zto.fire.core.task.SchedulerManager +import org.apache.log4j.{Level, Logger} +import org.slf4j +import org.slf4j.LoggerFactory +import spark.Spark + +/** + * 通用的父接口,提供通用的生命周期方法约束 + * + * @author ChengLong 2020年1月7日 09:20:02 + * @since 0.4.1 + */ +trait BaseFire { + // 任务启动时间戳 + protected[fire] val startTime: Long = currentTime + // web ui地址 + protected[fire] var webUI, applicationId: String = _ + // main方法参数 + protected[fire] var args: Array[String] = _ + // 当前任务的类型标识 + protected[fire] val jobType = JobType.UNDEFINED + // fire框架内置的restful接口 + private[fire] var systemRestful: SystemRestful = _ + // restful接口注册 + private[fire] var restfulRegister: RestServerManager = _ + // 用于子类的锁状态判断,默认关闭状态 + protected[fire] lazy val lock = new AtomicBoolean(false) + // 是否已停止 + protected[fire] lazy val isStoped = new AtomicBoolean(false) + // 当前任务的类名(包名+类名) + protected[fire] lazy val className: JString = this.getClass.getName.replace("$", "") + // 当前任务的类名 + protected[fire] lazy val driverClass: JString = this.getClass.getSimpleName.replace("$", "") + protected[fire] lazy val logger: slf4j.Logger = LoggerFactory.getLogger(this.getClass) + // 默认的任务名称为类名 + protected[fire] var appName: JString = this.driverClass + // 配置信息 + protected lazy val conf = PropUtils + // fire内置线程池 + protected[fire] lazy val threadPool: ExecutorService = ThreadUtils.createThreadPool("FireThreadPool", ThreadPoolType.FIXED, FireFrameworkConf.threadPoolSize) + protected[fire] lazy val threadPoolSchedule: ScheduledExecutorService = ThreadUtils.createThreadPool("FireThreadPoolSchedule", ThreadPoolType.SCHEDULED, FireFrameworkConf.threadPoolSchedulerSize).asInstanceOf[ScheduledExecutorService] + this.boot() + + /** + * 生命周期方法:初始化fire框架必要的信息 + * 注:该方法会同时在driver端与executor端执行 + */ + private[fire] def boot(): Unit = { + FireUtils.splash + PropUtils.sliceKeys(FireFrameworkConf.FIRE_LOG_LEVEL_CONF_PREFIX).foreach(kv => Logger.getLogger(kv._1).setLevel(Level.toLevel(kv._2))) + } + + /** + * 在加载任务配置文件前将被加载 + */ + private[fire] def loadConf(): Unit = { + // 加载配置文件 + } + + /** + * 用于将不同引擎的配置信息、累计器信息等传递到executor端或taskmanager端 + */ + protected def deployConf(): Unit = { + // 用于在分布式环境下分发配置信息 + } + + /** + * 生命周期方法:用于在SparkSession初始化之前完成用户需要的动作 + * 注:该方法会在进行init之前自动被系统调用 + * + * @param args + * main方法参数 + */ + def before(args: Array[String]): Unit = { + // 生命周期方法,在init之前被调用 + } + + /** + * 生命周期方法:初始化运行信息 + * + * @param conf 配置信息 + * @param args main方法参数 + */ + def init(conf: Any = null, args: Array[String] = null): Unit = { + this.before(args) + this.logger.info(s" ${FirePS1Conf.YELLOW}---> 完成用户资源初始化,任务类型:${this.jobType.getJobTypeDesc} <--- ${FirePS1Conf.DEFAULT}") + this.args = args + this.createContext(conf) + } + + /** + * 创建计算引擎运行时环境 + * + * @param conf + * 配置信息 + */ + private[fire] def createContext(conf: Any): Unit + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + def process(): Unit + + /** + * 生命周期方法:用于资源回收与清理,子类复写实现具体逻辑 + * 注:该方法会在进行destroy之前自动被系统调用 + */ + def after(args: Array[String] = null): Unit = { + // 子类复写该方法,在destroy之前被调用 + } + + /** + * 生命周期方法:用于回收资源 + */ + def stop(): Unit 
+ + /** + * 生命周期方法:进行fire框架的资源回收 + */ + protected[fire] def shutdown(stopGracefully: Boolean = true): Unit = { + if (this.isStoped.compareAndSet(false, true)) { + ThreadUtils.shutdown + Spark.stop() + SchedulerManager.shutdown(stopGracefully) + this.logger.info(s" ${FirePS1Conf.YELLOW}---> 完成fire资源回收 <---${FirePS1Conf.DEFAULT}") + this.logger.info(s"总耗时:${FirePS1Conf.RED}${timecost(startTime)}${FirePS1Conf.DEFAULT} The end...${FirePS1Conf.DEFAULT}") + if (FireFrameworkConf.shutdownExit) System.exit(0) + } + } + + /** + * 以子线程方式执行函数调用 + * + * @param fun + * 用于指定以多线程方式执行的函数 + * @param threadCount + * 表示开启多少个线程执行该fun任务 + */ + @deprecated + def runAsThread(fun: => Unit, threadCount: Int = 1, threadPool: ExecutorService = this.threadPool): Unit = { + ThreadUtils.runAsThread(threadPool, fun, threadCount) + } + + /** + * 以子线程while循环方式循环执行函数调用 + * + * @param fun + * 用于指定以多线程方式执行的函数 + * @param delay + * 循环调用间隔时间(单位s) + */ + @deprecated + def runAsThreadLoop(fun: => Unit, delay: Long = 10, threadCount: Int = 1, threadPool: ExecutorService = this.threadPool): Unit = { + ThreadUtils.runAsThreadLoop(threadPool, fun, delay, threadCount) + } + + /** + * 定时调度给定的函数 + * + * @param fun + * 定时执行的任务函数引用 + * @param initialDelay + * 第一次延迟执行的时长 + * @param period + * 每隔指定的时长执行一次 + * @param rate + * true:表示周期性的执行,不受上一个定时任务的约束 + * false:表示当上一次周期性任务执行成功后,period后开始执行 + * @param timeUnit + * 时间单位,默认分钟 + * @param threadCount + * 表示开启多少个线程执行该fun任务 + */ + @deprecated + def runAsSchedule(fun: => Unit, initialDelay: Long, period: Long, rate: Boolean = true, timeUnit: TimeUnit = TimeUnit.MINUTES, threadCount: Int = 1, threadPoolSchedule: ScheduledExecutorService = this.threadPoolSchedule): Unit = { + ThreadUtils.runAsSchedule(threadPoolSchedule, fun, initialDelay, period, rate, timeUnit, threadCount) + } + +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/conf/EngineConf.scala b/fire-core/src/main/scala/com/zto/fire/core/conf/EngineConf.scala new file mode 100644 index 0000000..53b0395 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/conf/EngineConf.scala @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
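The BaseFire trait above pins down the lifecycle: boot() runs at construction, init() calls before(args) and then createContext(conf), process() carries the user logic, and after()/stop()/shutdown() handle teardown. A heavily hedged sketch of what a job built on an engine-specific subclass might look like (BaseSpark and the main-entry wiring are not part of this diff):

object HelloFire extends BaseSpark {   // hypothetical concrete subclass of BaseFire
  override def process(): Unit = {
    // business logic only; the surrounding lifecycle methods are invoked by the framework
    this.logger.info(s"running $appName")
  }
}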
+ */ + +package com.zto.fire.core.conf + +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.util.ReflectionUtils +import org.slf4j.LoggerFactory + +import scala.collection.immutable + +/** + * 用于获取不同计算引擎的全局配置信息,同步到fire框架中,并传递到每一个分布式实例 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-03-02 10:48 + */ +private[fire] trait EngineConf { + protected lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 获取引擎的所有配置信息 + */ + def getEngineConf: Map[String, String] +} + +/** + * 用于获取不同引擎的配置信息 + */ +private[fire] object EngineConfHelper extends EngineConf { + + /** + * 通过反射获取不同引擎的配置信息 + */ + override def getEngineConf: Map[String, String] = { + var clazz: Class[_] = null + try { + clazz = Class.forName(FireFrameworkConf.confDeployEngine) + } catch { + case e: Exception => logger.error(s"未找到引擎配置获取实现类${FireFrameworkConf.confDeployEngine},无法进行配置同步", e) + } + + if (clazz != null) { + val method = clazz.getDeclaredMethod("getEngineConf") + ReflectionUtils.setAccessible(method) + method.invoke(clazz.newInstance()).asInstanceOf[immutable.Map[String, String]] + } else Map.empty + } + +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/connector/Connector.scala b/fire-core/src/main/scala/com/zto/fire/core/connector/Connector.scala new file mode 100644 index 0000000..cfba277 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/connector/Connector.scala @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
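EngineConfHelper above instantiates whatever class fire.conf.deploy.engine points to (see the commented hint in fire.properties) and invokes its getEngineConf reflectively. A minimal sketch of such an implementation; the class name and the returned keys are illustrative, and the class is assumed to live under com.zto.fire so the private[fire] trait is visible:

private[fire] class SparkEngineConf extends EngineConf {
  // gather whatever engine-level settings should be shipped to every container
  override def getEngineConf: Map[String, String] =
    Map("spark.app.name" -> "hello-fire", "spark.executor.memory" -> "4g")
}
// enabled via: fire.conf.deploy.engine = com.zto.fire.xxx.SparkEngineConf (hypothetical FQCN)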
+ */ + +package com.zto.fire.core.connector + +import com.zto.fire.common.conf.FireFrameworkConf + +import java.util.concurrent.ConcurrentHashMap +import com.zto.fire.predef._ +import com.zto.fire.common.util.ShutdownHookManager +import org.slf4j.{Logger, LoggerFactory} + +/** + * connector父接口,约定了open与close方法,子类需要根据具体 + * 情况覆盖这两个方法。这两个方法不需要子类主动调用,会被自动调用 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-11-27 10:32 + */ +private[fire] trait Connector extends Serializable { + protected lazy val logger: Logger = LoggerFactory.getLogger(this.getClass) + this.hook() + + /** + * 用于注册释放资源 + */ + private[this] def hook(): Unit = { + if (FireFrameworkConf.connectorShutdownHookEnable) { + ShutdownHookManager.addShutdownHook() { () => { + this.close() + logger.info("release connector successfully.") + } + } + } + } + + /** + * connector资源初始化 + */ + protected[fire] def open(): Unit = { + this.logger.debug("init connector.") + } + + /** + * connector资源释放 + */ + protected def close(): Unit = { + this.logger.debug("close connector.") + } +} + +/** + * 支持多集群的connector + * + * @param keyNum + * 对应的connector实例标识,不同的keyNum对应不同的集群连接实例 + */ +private[fire] abstract class FireConnector(keyNum: Int = 1) extends Connector + +/** + * 用于根据指定的keyNum创建不同的connector实例 + */ +private[fire] abstract class ConnectorFactory[T <: Connector] extends Serializable { + @transient + private[fire] lazy val instanceMap = new ConcurrentHashMap[Int, T]() + @transient + protected lazy val logger: Logger = LoggerFactory.getLogger(this.getClass) + + /** + * 约定创建connector子类实例的方法 + */ + protected def create(conf: Any = null, keyNum: Int = 1): T + + /** + * 根据指定的keyNum返回单例的HBaseConnector实例 + */ + def getInstance(keyNum: Int = 1): T = this.instanceMap.get(keyNum) + + /** + * 创建指定集群标识的connector对象实例 + */ + def apply(conf: Any = null, keyNum: Int = 1): T = { + this.instanceMap.mergeGet(keyNum) { + val instance: T = this.create(conf, keyNum) + instance.open() + instance + } + } +} \ No newline at end of file diff --git a/fire-core/src/main/scala/com/zto/fire/core/ext/BaseFireExt.scala b/fire-core/src/main/scala/com/zto/fire/core/ext/BaseFireExt.scala new file mode 100644 index 0000000..585895d --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/ext/BaseFireExt.scala @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
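The Connector/ConnectorFactory pair above gives each keyNum its own lazily created singleton whose open() is called exactly once. A minimal sketch of a custom connector built on that pattern; DemoConnector is invented here and, like FireConnector itself, would have to sit inside the com.zto.fire package because of the private[fire] modifiers:

class DemoConnector(keyNum: Int = 1) extends FireConnector(keyNum) {
  override protected[fire] def open(): Unit = logger.info(s"open demo connection for keyNum=$keyNum")
  override protected def close(): Unit = logger.info("close demo connection")
}

object DemoConnector extends ConnectorFactory[DemoConnector] {
  // called once per keyNum from apply(); open() is invoked right after creation
  override protected def create(conf: Any = null, keyNum: Int = 1): DemoConnector = new DemoConnector(keyNum)
}

// a second call with the same keyNum returns the cached instance
val demo = DemoConnector(keyNum = 2)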
+ */ + +package com.zto.fire.core.ext + +import com.zto.fire.common.util.Tools + +/** + * 隐式转换基类 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-16 15:55 + */ +trait BaseFireExt extends Tools diff --git a/fire-core/src/main/scala/com/zto/fire/core/ext/Provider.scala b/fire-core/src/main/scala/com/zto/fire/core/ext/Provider.scala new file mode 100644 index 0000000..b325ddd --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/ext/Provider.scala @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.core.ext + +import org.slf4j.LoggerFactory + +/** + * 为上层扩展层提供api集合 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:52 + */ +trait Provider { + protected lazy val logger = LoggerFactory.getLogger(this.getClass) +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/rest/RestCase.scala b/fire-core/src/main/scala/com/zto/fire/core/rest/RestCase.scala new file mode 100644 index 0000000..98bf989 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/rest/RestCase.scala @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.core.rest + +import spark.{Request, Response} + +/** + * 用于封装rest的相关信息 + * + * @param method + * rest的提交方式:GET/POST/PUT/DELETE等 + * @param path + * rest服务地址 + * @author ChengLong 2019-3-16 09:58:06 + */ +private[fire] case class RestCase(method: String, path: String, fun: (Request, Response) => AnyRef) diff --git a/fire-core/src/main/scala/com/zto/fire/core/rest/RestServerManager.scala b/fire-core/src/main/scala/com/zto/fire/core/rest/RestServerManager.scala new file mode 100644 index 0000000..2a40691 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/rest/RestServerManager.scala @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.core.rest + +import java.net.ServerSocket + +import com.zto.fire.common.bean.rest.ResultMsg +import com.zto.fire.common.conf.{FireFrameworkConf, FirePS1Conf} +import com.zto.fire.common.enu.ErrorCode +import com.zto.fire.common.util.{EncryptUtils, OSUtils, PropUtils, ThreadUtils} +import com.zto.fire.predef._ +import org.slf4j.LoggerFactory +import spark._ + +import scala.collection.mutable._ + +/** + * Fire框架的rest服务管理器 + * + * @author ChengLong 2019-3-16 09:56:56 + */ +private[fire] class RestServerManager { + private[this] var port: JInt = null + private[this] var restPrefix: String = _ + private[this] var socket: ServerSocket = _ + private[this] lazy val restList = ListBuffer[RestCase]() + private[this] lazy val logger = LoggerFactory.getLogger(this.getClass) + private[this] lazy val mainClassName: String = FireFrameworkConf.driverClassName + private[this] lazy val threadPool = ThreadUtils.createThreadPool("FireRestServerPool") + + /** + * 注册新的rest接口 + * + * @param rest + * rest的封装信息 + * @return + */ + private[fire] def addRest(rest: RestCase): this.type = { + this.restList += rest + this + } + + /** + * 获取Fire RestServer占用的端口号 + */ + def restPort: Int = this.port + + /** + * 为rest服务指定监听端口 + */ + private[fire] def startRestPort(port: Int = 0): this.type = this.synchronized { + if (this.port == null && !RestServerManager.isStarted) { + Spark.threadPool(FireFrameworkConf.restfulMaxThread, 2, -1) + // 端口占用失败默认重试3次 + if (port == 0) { + retry(FireFrameworkConf.restfulPortRetryNum, FireFrameworkConf.restfulPortRetryDuration) { + val randomPort = OSUtils.getRundomPort + Spark.port(randomPort) + this.port = randomPort + } + } else { + Spark.port(port) + this.port = port + } + // 获取到未被占用的端口后,rest server不会立即绑定,为了避免被其他应用占用 + // 此处使用ServerSocket占用该端口,等真正启动rest server前再关闭该ServerSocket以便释放端口 + this.socket = new ServerSocket(this.port) + // 接口地址:hostname还是以ip地址 + val address = if (FireFrameworkConf.restUrlHostname) OSUtils.getHostName else OSUtils.getIp + this.restPrefix = s"http://$address:${this.port}" + PropUtils.setProperty(FireFrameworkConf.FIRE_REST_URL, s"$restPrefix") + } + this + } + + /** + * 注册并以子线程方式开启rest服务 + */ + private[fire] def startRestServer: Unit = this.synchronized { + if (!FireFrameworkConf.restEnable || RestServerManager.isStarted) return + RestServerManager.isStarted = true + if (this.port == null) this.startRestPort() + // 批量注册接口地址 + this.threadPool.execute(new Runnable { + override def run(): Unit = { + // 释放Socket占用的端口给RestServer使用,避免被其他服务所占用 + if (socket != null && !socket.isClosed) socket.close() + restList.filter(_ != null).foreach(rest => { + if (FireFrameworkConf.fireRestUrlShow) logger.info(s"---------> start rest: ${FirePS1Conf.wrap(restPrefix + rest.path, FirePS1Conf.BLUE, FirePS1Conf.UNDER_LINE)} successfully. 
<---------") + rest.method match { + case "get" | "GET" => Spark.get(rest.path, new Route { + override def handle(request: Request, response: Response): AnyRef = { + rest.fun(request, response) + } + }) + case "post" | "POST" => Spark.post(rest.path, new Route { + override def handle(request: Request, response: Response): AnyRef = { + rest.fun(request, response) + } + }) + case "put" | "PUT" => Spark.put(rest.path, new Route { + override def handle(request: Request, response: Response): AnyRef = { + rest.fun(request, response) + } + }) + case "delete" | "DELETE" => Spark.delete(rest.path, new Route { + override def handle(request: Request, response: Response): AnyRef = { + rest.fun(request, response) + } + }) + } + }) + + // 注册过滤器,用于进行权限校验 + Spark.before(new Filter { + override def handle(request: Request, response: Response): Unit = { + if (FireFrameworkConf.restFilter) { + val msg = checkAuth(request) + if (msg.getCode != null && ErrorCode.UNAUTHORIZED == msg.getCode) { + Spark.halt(401, msg.toString) + } + } + } + }) + } + }) + } + + /** + * 通过header进行用户权限校验 + */ + private[fire] def checkAuth(request: Request): ResultMsg = { + val msg = new ResultMsg + val auth = request.headers("Authorization") + try { + if (!EncryptUtils.checkAuth(auth, this.mainClassName)) { + this.logger.warn(s"非法请求:用户身份校验失败!ip=${request.ip()} auth=$auth") + msg.buildError(s"非法请求:用户身份校验失败!ip=${request.ip()}", ErrorCode.UNAUTHORIZED) + } + } catch { + case e: Exception => { + this.logger.error(s"非法请求:请检查请求参数!ip=${request.ip()} auth=$auth", e) + msg.buildError(s"非法请求:请检查请求参数!ip=${request.ip()}", ErrorCode.UNAUTHORIZED) + } + } + msg + } +} + +private[fire] object RestServerManager { + private[RestServerManager] var isStarted = false + + /** + * 用于判断fire rest是否启动 + */ + def serverStarted:Boolean = this.isStarted +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/rest/SystemRestful.scala b/fire-core/src/main/scala/com/zto/fire/core/rest/SystemRestful.scala new file mode 100644 index 0000000..76a1cb9 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/rest/SystemRestful.scala @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.core.rest + +import com.zto.fire.common.anno.Rest +import com.zto.fire.common.bean.rest.ResultMsg +import com.zto.fire.common.enu.{Datasource, ErrorCode} +import com.zto.fire.common.util.{DatasourceDesc, JSONUtils} +import com.zto.fire.core.BaseFire +import com.zto.fire.predef.{JHashMap, JHashSet, _} +import org.slf4j.{Logger, LoggerFactory} +import spark.{Request, Response} + +/** + * 系统预定义的restful服务抽象 + * + * @author ChengLong 2020年4月2日 13:58:08 + */ +protected[fire] abstract class SystemRestful(engine: BaseFire) { + protected lazy val logger: Logger = LoggerFactory.getLogger(this.getClass) + // 用于记录当前任务所访问的数据源 + private lazy val datasourceMap = new JConcurrentHashMap[Datasource, JHashSet[DatasourceDesc]]() + this.register + + /** + * 注册接口 + */ + protected def register: Unit + + /** + * 获取当前任务所使用到的数据源信息 + * + * @return + * 数据源列表 + */ + @Rest("/system/datasource") + protected def datasource(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + try { + val dataSource = JSONUtils.toJSONString(this.datasourceMap) + this.logger.info(s"[DataSource] 获取数据源列表成功:counter=$dataSource") + msg.buildSuccess(dataSource, "获取数据源列表成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取数据源列表失败", e) + msg.buildError("获取数据源列表失败", ErrorCode.ERROR) + } + } + } + + @Rest("/system/collectDatasource") + def collectDatasource(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + try { + val json = request.body() + val datasource = JSONUtils.parseObject[JHashMap[Datasource, JHashSet[DatasourceDesc]]](json) + if (datasource.nonEmpty) this.datasourceMap.putAll(datasource) + msg.buildSuccess(datasource, "添加数据源列表成功") + }catch { + case e: Exception => { + this.logger.error(s"[log] 添加数据源列表失败", e) + msg.buildError("添加数据源列表失败", ErrorCode.ERROR) + } + } + } +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/task/FireInternalTask.scala b/fire-core/src/main/scala/com/zto/fire/core/task/FireInternalTask.scala new file mode 100644 index 0000000..0e95665 --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/task/FireInternalTask.scala @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
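// Illustrative sketch (not part of this patch): the aggregation behind the
// /system/collectDatasource endpoint above, shown with plain String keys/values instead
// of the Datasource/DatasourceDesc beans. Each batch posted by an executor/TaskManager
// is merged set-wise into a ConcurrentHashMap (the patch itself calls putAll, which
// replaces the per-key sets instead of merging them).
object DatasourceMergeSketch {
  import java.util.concurrent.ConcurrentHashMap
  import java.util.{HashSet => JHashSet, Map => JMap, Set => JSet}

  // task-local registry: datasource type -> descriptions of every instance seen so far
  val datasourceMap = new ConcurrentHashMap[String, JSet[String]]()

  def merge(batch: JMap[String, JSet[String]]): Unit = {
    val it = batch.entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      datasourceMap.putIfAbsent(entry.getKey, new JHashSet[String]())
      datasourceMap.get(entry.getKey).addAll(entry.getValue)
    }
  }
}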
+ */ + +package com.zto.fire.core.task + +import com.zto.fire.common.bean.runtime.RuntimeInfo +import com.zto.fire.common.conf.{FireFrameworkConf, FirePS1Conf} +import com.zto.fire.common.util.UnitFormatUtils.DateUnitEnum +import com.zto.fire.common.util._ +import com.zto.fire.core.BaseFire +import com.zto.fire.predef._ +import org.apache.commons.httpclient.Header +import org.slf4j.LoggerFactory + +/** + * Fire框架内部的定时任务 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-07-14 11:02 + */ +private[fire] class FireInternalTask(baseFire: BaseFire) extends Serializable { + protected lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * fire框架内部接口调用工具 + * + * @param urlSuffix + * 接口后缀 + * @param json + * 请求参数 + * @return + * 接口响应结果 + */ + protected def restInvoke(urlSuffix: String, json: String): String = { + var response: String = "" + if (FireFrameworkConf.restEnable && noEmpty(FireFrameworkConf.fireRestUrl, urlSuffix)) { + val restful = FireFrameworkConf.fireRestUrl + urlSuffix + try { + val secret = EncryptUtils.md5Encrypt(FireFrameworkConf.restServerSecret + this.baseFire.className + DateFormatUtils.formatCurrentDate) + response = if (noEmpty(json)) { + HttpClientUtils.doPost(restful, json, new Header("Content-Type", "application/json"), new Header("Authorization", secret)) + } else { + HttpClientUtils.doGet(restful, new Header("Content-Type", "application/json"), new Header("Authorization", secret)) + } + } catch { + case e: Exception => logger.warn(s"fire内部接口自调用失败,对用户任务无影响,可忽略。异常描述:${e.getMessage}") + } + } + response + } + + /** + * 定时采集运行时的jvm、gc、thread、cpu、memory、disk等信息 + * 并将采集到的数据存放到EnvironmentAccumulator中 + */ + def jvmMonitor: Unit = { + val runtimeInfo = RuntimeInfo.getRuntimeInfo + if (runtimeInfo != null && logger != null) { + LogUtils.logStyle(this.logger, s"Jvm信息:${runtimeInfo.getIp}")(logger => { + val jvmInfo = runtimeInfo.getJvmInfo + val cpuInfo = runtimeInfo.getCpuInfo + val threadInfo = runtimeInfo.getThreadInfo + logger.info( + s"""${FirePS1Conf.PINK} + |GC -> YGC: ${jvmInfo.getMinorGCCount} YGCT: ${UnitFormatUtils.readable(jvmInfo.getMinorGCTime, UnitFormatUtils.TimeUnitEnum.MS)} FGC: ${jvmInfo.getFullGCCount} FGCT: ${UnitFormatUtils.readable(jvmInfo.getFullGCTime, UnitFormatUtils.TimeUnitEnum.MS)} + |OnHeap -> Total: ${UnitFormatUtils.readable(jvmInfo.getMemoryTotal, DateUnitEnum.BYTE)} Used: ${UnitFormatUtils.readable(jvmInfo.getMemoryUsed, DateUnitEnum.BYTE)} Free: ${UnitFormatUtils.readable(jvmInfo.getMemoryFree, DateUnitEnum.BYTE)} HeapMax: ${UnitFormatUtils.readable(jvmInfo.getHeapMaxSize, DateUnitEnum.BYTE)} HeapUsed: ${UnitFormatUtils.readable(jvmInfo.getHeapUseSize, DateUnitEnum.BYTE)} Committed: ${UnitFormatUtils.readable(jvmInfo.getHeapCommitedSize, DateUnitEnum.BYTE)} + |OffHeap -> Total: ${UnitFormatUtils.readable(jvmInfo.getNonHeapMaxSize, DateUnitEnum.BYTE)} Used: ${UnitFormatUtils.readable(jvmInfo.getNonHeapUseSize, DateUnitEnum.BYTE)} Committed: ${UnitFormatUtils.readable(jvmInfo.getNonHeapCommittedSize, DateUnitEnum.BYTE)} + |CPUInfo -> Load: ${cpuInfo.getCpuLoad} LoadAverage: ${cpuInfo.getLoadAverage.mkString(",")} IoWait: ${cpuInfo.getIoWait} IrqTick: ${cpuInfo.getIrqTick} + |Thread -> Total: ${threadInfo.getTotalCount} TotalStarted: ${threadInfo.getTotalStartedCount} Peak: ${threadInfo.getPeakCount} Deamon: ${threadInfo.getDeamonCount} CpuTime: ${UnitFormatUtils.readable(threadInfo.getCpuTime, UnitFormatUtils.TimeUnitEnum.MS)} UserTime: ${UnitFormatUtils.readable(threadInfo.getUserTime, UnitFormatUtils.TimeUnitEnum.MS)} 
${FirePS1Conf.DEFAULT} + |""".stripMargin) + }) + } + } +} diff --git a/fire-core/src/main/scala/com/zto/fire/core/util/SingletonFactory.scala b/fire-core/src/main/scala/com/zto/fire/core/util/SingletonFactory.scala new file mode 100644 index 0000000..7bf20dc --- /dev/null +++ b/fire-core/src/main/scala/com/zto/fire/core/util/SingletonFactory.scala @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.core.util + +import com.zto.fire.common.util.ValueUtils + +/** + * 单例工厂 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-18 14:02 + */ +private[fire] trait SingletonFactory { + @transient protected[this] var appName: String = _ + + /** + * 设置TableEnv实例 + */ + protected[fire] def setAppName(appName: String): this.type = { + if (ValueUtils.noEmpty(appName) && ValueUtils.isEmpty(this.appName)) this.appName = appName + this + } +} diff --git a/fire-engines/fire-flink/pom.xml b/fire-engines/fire-flink/pom.xml new file mode 100644 index 0000000..1ca24c4 --- /dev/null +++ b/fire-engines/fire-flink/pom.xml @@ -0,0 +1,250 @@ + + + + + 4.0.0 + fire-flink_${flink.reference} + jar + fire-flink + + + com.zto.fire + fire-engines_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + com.sparkjava + spark-core + ${sparkjava.version} + + + javax.servlet + javax.servlet-api + 3.1.0 + + + + org.apache.flink + flink-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-scala_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-streaming-scala_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-clients_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-runtime-web_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-queryable-state-runtime_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-queryable-state-client-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-statebackend-rocksdb_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-kafka_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.kafka + kafka_${scala.binary.version} + ${kafka.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-java-bridge_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-scala-bridge_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-planner_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + 
flink-table-planner-blink_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-common + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-hive_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-jdbc_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-elasticsearch-base_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-hadoop-compatibility_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-shaded-hadoop-2-uber + 2.6.5-8.0 + ${maven.scope} + + + + + org.apache.hive + hive-exec + ${hive.apache.version} + ${maven.scope} + + + + + org.apache.hbase + hbase-common + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + ${maven.scope} + + + + + org.apache.rocketmq + rocketmq-flink_${flink.major.version}_${scala.binary.version} + ${rocketmq.external.version} + ${maven.scope} + + + org.apache.rocketmq + rocketmq-client + ${rocketmq.version} + ${maven.scope} + + + org.apache.rocketmq + rocketmq-acl + ${rocketmq.version} + ${maven.scope} + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/bean/FlinkTableSchema.java b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/bean/FlinkTableSchema.java new file mode 100644 index 0000000..6d20557 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/bean/FlinkTableSchema.java @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.bean; + +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.TableException; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.types.DataType; +import org.apache.flink.table.types.utils.TypeConversions; +import org.apache.flink.util.Preconditions; + +import java.io.Serializable; +import java.util.*; + +import static org.apache.flink.table.types.utils.TypeConversions.fromDataTypeToLegacyInfo; + +/** + * flink表模式,支持序列化 + * @author ChengLong 2020年1月16日 16:56:23 + */ +public class FlinkTableSchema implements Serializable { + private final String[] fieldNames; + private final DataType[] fieldDataTypes; + private final Map fieldNameToIndex; + + public FlinkTableSchema(TableSchema schema) { + this(schema.getFieldNames(), schema.getFieldDataTypes()); + } + + private FlinkTableSchema(String[] fieldNames, DataType[] fieldDataTypes) { + this.fieldNames = Preconditions.checkNotNull(fieldNames); + this.fieldDataTypes = Preconditions.checkNotNull(fieldDataTypes); + + fieldNameToIndex = new HashMap<>(); + final Set duplicateNames = new HashSet<>(); + final Set uniqueNames = new HashSet<>(); + for (int i = 0; i < fieldNames.length; i++) { + // check for null + Preconditions.checkNotNull(fieldDataTypes[i]); + final String fieldName = Preconditions.checkNotNull(fieldNames[i]); + + // collect indices + fieldNameToIndex.put(fieldName, i); + + // check uniqueness of field names + if (uniqueNames.contains(fieldName)) { + duplicateNames.add(fieldName); + } else { + uniqueNames.add(fieldName); + } + } + if (!duplicateNames.isEmpty()) { + throw new TableException( + "Field names must be unique.\n" + + "List of duplicate fields: " + duplicateNames.toString() + "\n" + + "List of all fields: " + Arrays.toString(fieldNames)); + } + } + + + /** + * Returns all field data types as an array. + */ + public DataType[] getFieldDataTypes() { + return fieldDataTypes; + } + + /** + * This method will be removed in future versions as it uses the old type system. It + * is recommended to use {@link #getFieldDataTypes()} instead which uses the new type + * system based on {@link DataTypes}. Please make sure to use either the old or the new + * type system consistently to avoid unintended behavior. See the website documentation + * for more information. + */ + public TypeInformation[] getFieldTypes() { + return fromDataTypeToLegacyInfo(fieldDataTypes); + } + + /** + * Returns the specified data type for the given field index. + * + * @param fieldIndex the index of the field + */ + public Optional getFieldDataType(int fieldIndex) { + if (fieldIndex < 0 || fieldIndex >= fieldDataTypes.length) { + return Optional.empty(); + } + return Optional.of(fieldDataTypes[fieldIndex]); + } + + /** + * This method will be removed in future versions as it uses the old type system. It + * is recommended to use {@link #getFieldDataType(int)} instead which uses the new type + * system based on {@link DataTypes}. Please make sure to use either the old or the new + * type system consistently to avoid unintended behavior. See the website documentation + * for more information. + */ + public Optional> getFieldType(int fieldIndex) { + return getFieldDataType(fieldIndex) + .map(TypeConversions::fromDataTypeToLegacyInfo); + } + + /** + * Returns the specified data type for the given field name. 
+ * + * @param fieldName the name of the field + */ + public Optional getFieldDataType(String fieldName) { + if (fieldNameToIndex.containsKey(fieldName)) { + return Optional.of(fieldDataTypes[fieldNameToIndex.get(fieldName)]); + } + return Optional.empty(); + } + + /** + * This method will be removed in future versions as it uses the old type system. It + * is recommended to use {@link #getFieldDataType(String)} instead which uses the new type + * system based on {@link DataTypes}. Please make sure to use either the old or the new + * type system consistently to avoid unintended behavior. See the website documentation + * for more information. + */ + public Optional> getFieldType(String fieldName) { + return getFieldDataType(fieldName) + .map(TypeConversions::fromDataTypeToLegacyInfo); + } + + /** + * Returns the number of fields. + */ + public int getFieldCount() { + return fieldNames.length; + } + + /** + * Returns all field names as an array. + */ + public String[] getFieldNames() { + return fieldNames; + } + + /** + * Returns the specified name for the given field index. + * + * @param fieldIndex the index of the field + */ + public Optional getFieldName(int fieldIndex) { + if (fieldIndex < 0 || fieldIndex >= fieldNames.length) { + return Optional.empty(); + } + return Optional.of(fieldNames[fieldIndex]); + } + + @Override + public String toString() { + final StringBuilder sb = new StringBuilder(); + sb.append("root\n"); + for (int i = 0; i < fieldNames.length; i++) { + sb.append(" |-- ").append(fieldNames[i]).append(": ").append(fieldDataTypes[i]).append('\n'); + } + return sb.toString(); + } + + @Override + public boolean equals(Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + FlinkTableSchema schema = (FlinkTableSchema) o; + return Arrays.equals(fieldNames, schema.fieldNames) && + Arrays.equals(fieldDataTypes, schema.fieldDataTypes); + } + + @Override + public int hashCode() { + int result = Arrays.hashCode(fieldNames); + result = 31 * result + Arrays.hashCode(fieldDataTypes); + return result; + } +} diff --git a/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/ext/watermark/FirePeriodicWatermarks.java b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/ext/watermark/FirePeriodicWatermarks.java new file mode 100644 index 0000000..3eecb21 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/ext/watermark/FirePeriodicWatermarks.java @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
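// Illustrative sketch (not part of this patch): wrapping a Flink TableSchema in the
// serializable FlinkTableSchema defined above so the schema can travel inside a sink or
// process-function closure. The builder-style construction assumes the Flink 1.x
// TableSchema/DataTypes API pulled in by this module's pom.
object SchemaWrapperSketch {
  import com.zto.fire.flink.bean.FlinkTableSchema
  import org.apache.flink.table.api.{DataTypes, TableSchema}

  def build(): FlinkTableSchema = {
    val schema = TableSchema.builder()
      .field("order_id", DataTypes.STRING())
      .field("amount", DataTypes.DECIMAL(10, 2))
      .build()
    // the wrapper captures only field names and DataTypes, which is what the class
    // comment above cites as the reason it can be serialized
    new FlinkTableSchema(schema)
  }
}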
+ */ + +package com.zto.fire.flink.ext.watermark; + +import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks; +import org.apache.flink.streaming.api.watermark.Watermark; + +/** + * 基于AssignerWithPeriodicWatermarks的封装 + * 有参的构造方法允许定义允许的最大乱序时间(单位ms) + * maxTimestamp用于自定义水位线的时间,若不指定,则以系统当前时间为水位线值 + * + * @author ChengLong 2020-4-17 17:18:33 + */ +public abstract class FirePeriodicWatermarks implements AssignerWithPeriodicWatermarks { + // 用于计算水位线值,若为0则取当前系统时间 + protected long maxTimestamp = 0; + // 允许最大的乱序时间,默认10s + protected long maxOutOfOrder = 10 * 1000L; + // 当前水位线的引用 + protected transient Watermark watermark = new Watermark(System.currentTimeMillis()); + + protected FirePeriodicWatermarks() { + } + + /** + * 用于自定义允许最大的乱序时间 + * + * @param maxOutOfOrder 用户定义的最大乱序时间 + */ + protected FirePeriodicWatermarks(long maxOutOfOrder) { + if (maxOutOfOrder > 0) { + this.maxOutOfOrder = maxOutOfOrder; + } + } + + /** + * 计算并返回当前的水位线 + * 如果未指定水位线的时间戳,则默认获取当前系统时间 + */ + @Override + public Watermark getCurrentWatermark() { + if (this.maxTimestamp == 0) { + this.watermark = new Watermark(System.currentTimeMillis() - this.maxOutOfOrder); + } else { + this.watermark = new Watermark(this.maxTimestamp - this.maxOutOfOrder); + } + + return this.watermark; + } +} diff --git a/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/BaseSink.scala b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/BaseSink.scala new file mode 100644 index 0000000..49c4cb8 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/BaseSink.scala @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
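// Illustrative sketch (not part of this patch): a concrete FirePeriodicWatermarks for a
// hypothetical ClickEvent stream, allowing 5 seconds of out-of-orderness. Only
// extractTimestamp has to be provided; getCurrentWatermark in the base class subtracts
// maxOutOfOrder from maxTimestamp (or from the wall clock while maxTimestamp stays 0).
import com.zto.fire.flink.ext.watermark.FirePeriodicWatermarks

case class ClickEvent(ts: Long, url: String)

class ClickWatermarks extends FirePeriodicWatermarks[ClickEvent](5000L) {
  override def extractTimestamp(element: ClickEvent, previousElementTimestamp: Long): Long = {
    // track the largest event time seen so far so the watermark can advance
    this.maxTimestamp = math.max(this.maxTimestamp, element.ts)
    element.ts
  }
}
// usage (pre-1.11 style API): stream.assignTimestampsAndWatermarks(new ClickWatermarks)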
+ */ + +package com.zto.fire.flink.sink + +import org.apache.flink.configuration.Configuration +import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext} +import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction +import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction} +import org.slf4j.LoggerFactory + +import java.util.concurrent.atomic.AtomicBoolean +import java.util.concurrent._ +import scala.util.control._ + +/** + * fire框架基础的flink sink类 + * 提供按批次、固定频率定时flush、checkpoint等功能 + * JDBC sink、HBase sink可继承自此类,并实现自己的flush方法,完成数据的sink + * + * @param batch 每批大小,达到该阈值将批量sink到目标组件 + * @param flushInterval 每隔多久刷新一次到目标组件(ms) + * @author ChengLong + * @since 1.1.0 + * @create 2020-05-21 15:27 + */ +abstract class BaseSink[IN, OUT](batch: Int, flushInterval: Long) extends RichSinkFunction[IN] with CheckpointedFunction { + protected var maxRetry: Long = 3 + private var flushException: Exception = _ + @transient protected var scheduler: ScheduledExecutorService = _ + @transient protected var scheduledFuture: ScheduledFuture[_] = _ + protected lazy val closed = new AtomicBoolean(false) + protected lazy val logger = LoggerFactory.getLogger(this.getClass) + @transient protected lazy val buffer = new CopyOnWriteArrayList[OUT]() + + /** + * 初始化定时调度器,用于定时flush数据到目标组件 + */ + override def open(parameters: Configuration): Unit = { + if (this.flushInterval > 0 && batch > 0) { + this.scheduler = Executors.newScheduledThreadPool(1) + if (this.scheduler != null) { + this.scheduledFuture = this.scheduler.scheduleWithFixedDelay(new Runnable { + override def run(): Unit = this.synchronized { + if (closed.get()) return + flush + } + }, this.flushInterval, this.flushInterval, TimeUnit.MILLISECONDS) + } + } + } + + /** + * 将数据sink到目标组件 + * 不同的组件需定义该flush逻辑实现不同组件的flush操作 + */ + def sink: Unit = { + // sink逻辑 + } + + /** + * 将数据构建成sink的格式 + */ + def map(value: IN): OUT + + /** + * 关闭资源 + * 1. 关闭定时flush线程池 + * 2. 
将缓冲区中的数据flush到目标组件 + */ + override def close(): Unit = { + if (closed.get()) return + closed.compareAndSet(false, true) + + this.checkFlushException + + if (this.scheduledFuture != null) { + scheduledFuture.cancel(false) + this.scheduler.shutdown() + } + + if (this.buffer.size > 0) { + this.flush + } + } + + /** + * 将数据sink到缓冲区中 + */ + override def invoke(value: IN, context: SinkFunction.Context): Unit = { + this.checkFlushException + + val out = this.map(value) + if (out != null) this.buffer.add(out) + if (this.buffer.size >= this.batch) { + this.flush + } + } + + /** + * 内部的flush,调用用户定义的flush方法 + * 并清空缓冲区,将缓冲区大小归零 + */ + def flush: Unit = this.synchronized { + this.checkFlushException + + if (this.buffer != null && this.buffer.size > 0) { + this.logger.info(s"执行flushInternal操作 sink.size=${this.buffer.size()} batch=${this.batch} flushInterval=${this.flushInterval}") + val loop = new Breaks + loop.breakable { + if (this.maxRetry < 1) this.maxRetry = 1 + for (i <- 1L to this.maxRetry) { + try { + this.sink + this.buffer.clear() + loop.break + } catch { + case e: Exception => { + this.logger.error(s"执行flushInternal操作失败,正在进行第${i}次重试。", e) + if (i >= this.maxRetry) { + this.flushException = e + } + Thread.sleep(1000 * i) + } + } + } + } + } + } + + /** + * checkpoint时将数据全部flush + */ + override def snapshotState(context: FunctionSnapshotContext): Unit = { + this.flush + } + + override def initializeState(context: FunctionInitializationContext): Unit = { + // initializeState + } + + /** + * 用于检测在flush过程中是否有异常,如果存在异常,则不再flush + */ + private def checkFlushException: Unit = { + if (flushException != null) throw new RuntimeException(s"${this.getClass.getSimpleName} writing records failed.", flushException) + } +} diff --git a/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/HBaseSink.scala b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/HBaseSink.scala new file mode 100644 index 0000000..aeb407e --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/HBaseSink.scala @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
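// Illustrative sketch (not part of this patch): the smallest useful BaseSink subclass.
// map() converts each incoming record into the buffered representation and sink() drains
// the buffer; batching, the periodic flush timer, retries and the checkpoint-time flush
// all come from BaseSink above, so a console sink needs nothing else.
import com.zto.fire.flink.sink.BaseSink

class ConsoleSink[T](batch: Int = 100, flushInterval: Long = 3000)
  extends BaseSink[T, String](batch, flushInterval) {

  override def map(value: T): String = if (value == null) null else value.toString

  override def sink: Unit = this.buffer.forEach(line => println(line))
}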
+ */ + +package com.zto.fire.flink.sink + +import com.zto.fire._ +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.hbase.conf.FireHBaseConf + +import scala.reflect.ClassTag + + +/** + * flink HBase sink组件,底层基于HBaseConnector + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-5-25 16:06:15 + */ +abstract class HBaseSink[IN, T <: HBaseBaseBean[T] : ClassTag](tableName: String, + batch: Int = 100, + flushInterval: Long = 10000, + keyNum: Int = 1) extends BaseSink[IN, T](batch, flushInterval) { + + // hbase操作失败时允许最大重试次数 + this.maxRetry = FireHBaseConf.hbaseMaxRetry() + + /** + * 将数据sink到hbase + * 该方法会被flush方法自动调用 + */ + override def sink: Unit = { + HBaseConnector.insert(this.tableName, this.buffer, this.keyNum) + } +} diff --git a/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/JdbcSink.scala b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/JdbcSink.scala new file mode 100644 index 0000000..3531ab3 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/sink/JdbcSink.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.sink + +import com.zto.fire.predef._ +import com.zto.fire.jdbc.JdbcConnector +import com.zto.fire.jdbc.conf.FireJdbcConf + +/** + * flink jdbc sink组件,底层基于JdbcConnector + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-05-22 10:37 + */ +abstract class JdbcSink[IN](sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + keyNum: Int = 1) extends BaseSink[IN, Seq[Any]](batch, flushInterval) { + + // jdbc操作失败时允许最大重试次数 + this.maxRetry = FireJdbcConf.maxRetry(keyNum) + + /** + * 将数据sink到jdbc + * 该方法会被flush方法自动调用 + */ + override def sink: Unit = { + JdbcConnector.executeBatch(sql, this.buffer, keyNum = keyNum) + } +} diff --git a/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/task/FlinkSchedulerManager.java b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/task/FlinkSchedulerManager.java new file mode 100644 index 0000000..5f151b2 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/com/zto/fire/flink/task/FlinkSchedulerManager.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
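// Illustrative sketch (not part of this patch): wiring the JdbcSink defined above into a
// job. The table, SQL and the (orderId, amount) tuple type are made-up example values;
// map() just returns the bind parameters in placeholder order and JdbcConnector handles
// the batched execution.
import com.zto.fire.flink.sink.JdbcSink

class OrderJdbcSink extends JdbcSink[(String, Double)](
  sql = "INSERT INTO t_order(order_id, amount) VALUES (?, ?)",
  batch = 100,
  flushInterval = 3000) {

  override def map(value: (String, Double)): Seq[Any] = Seq(value._1, value._2)
}
// usage: stream.addSink(new OrderJdbcSink)   // stream: DataStream[(String, Double)]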
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.task; + +import com.zto.fire.core.task.SchedulerManager; + +/** + * Flink 定时调度任务管理器 + * + * @author ChengLong + * @create 2020-12-18 17:20 + * @since 1.0.0 + */ +public class FlinkSchedulerManager extends SchedulerManager { + // 单例对象 + private static SchedulerManager instance = null; + + static { + instance = new FlinkSchedulerManager(); + } + + private FlinkSchedulerManager() { + } + + /** + * 获取单例实例 + */ + public static SchedulerManager getInstance() { + return instance; + } + + @Override + protected String label() { + return DRIVER; + } +} diff --git a/fire-engines/fire-flink/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java b/fire-engines/fire-flink/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java new file mode 100644 index 0000000..b54b306 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/org/apache/flink/configuration/GlobalConfiguration.java @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.configuration; + +import com.zto.fire.common.conf.FireFrameworkConf; +import com.zto.fire.common.util.OSUtils; +import com.zto.fire.common.util.PropUtils; +import org.apache.flink.annotation.Internal; +import org.apache.flink.util.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import scala.collection.JavaConversions; + +import javax.annotation.Nullable; +import java.io.*; +import java.lang.reflect.Method; +import java.net.ServerSocket; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.atomic.AtomicBoolean; + +/** + * Global configuration object for Flink. Similar to Java properties configuration + * objects it includes key-value pairs which represent the framework's configuration. 
+ */ +@Internal +public final class GlobalConfiguration { + + private static final Logger LOG = LoggerFactory.getLogger(GlobalConfiguration.class); + private static AtomicBoolean isStart = new AtomicBoolean(false); + + public static final String FLINK_CONF_FILENAME = "flink-conf.yaml"; + + // the hidden content to be displayed + public static final String HIDDEN_CONTENT = "******"; + // 用于判断是JobManager还是TaskManager + private static boolean isJobManager = false; + // fire rest服务占用端口 + private static ServerSocket restServerSocket; + // 任务的运行模式 + private static String runMode; + private static final Map settings = new HashMap<>(); + + static { + try { + restServerSocket = new ServerSocket(0); + } catch (Exception e) { + LOG.error("创建Socket失败", e); + } + } + + /** + * 获取配置信息 + */ + public static Map getSettings() { + return settings; + } + + /** + * 获取随机分配的Rest端口号 + */ + public static int getRestPort() { + return restServerSocket.getLocalPort(); + } + + /** + * 获取rest服务端口号,并关闭Socket + */ + public static int getRestPortAndClose() { + int port = restServerSocket.getLocalPort(); + if (restServerSocket != null && !restServerSocket.isClosed()) { + try { + restServerSocket.close(); + } catch (Exception e) { + LOG.error("关闭Rest Socket失败", e); + } + } + return port; + } + + // -------------------------------------------------------------------------------------------- + + private GlobalConfiguration() { + } + + // -------------------------------------------------------------------------------------------- + + /** + * Loads the global configuration from the environment. Fails if an error occurs during loading. Returns an + * empty configuration object if the environment variable is not set. In production this variable is set but + * tests and local execution/debugging don't have this environment variable set. That's why we should fail + * if it is not set. + * + * @return Returns the Configuration + */ + public static Configuration loadConfiguration() { + return loadConfiguration(new Configuration()); + } + + /** + * Loads the global configuration and adds the given dynamic properties + * configuration. + * + * @param dynamicProperties The given dynamic properties + * @return Returns the loaded global configuration with dynamic properties + */ + public static Configuration loadConfiguration(Configuration dynamicProperties) { + final String configDir = System.getenv(ConfigConstants.ENV_FLINK_CONF_DIR); + if (configDir == null) { + return new Configuration(dynamicProperties); + } + + return loadConfiguration(configDir, dynamicProperties); + } + + /** + * Loads the configuration files from the specified directory. + * + *

YAML files are supported as configuration files. + * + * @param configDir the directory which contains the configuration files + */ + public static Configuration loadConfiguration(final String configDir) { + isJobManager = true; + return loadConfiguration(configDir, null); + } + + /** + * Loads the configuration files from the specified directory. If the dynamic properties + * configuration is not null, then it is added to the loaded configuration. + * + * @param configDir directory to load the configuration from + * @param dynamicProperties configuration file containing the dynamic properties. Null if none. + * @return The configuration loaded from the given configuration directory + */ + public static Configuration loadConfiguration(final String configDir, @Nullable final Configuration dynamicProperties) { + + if (configDir == null) { + throw new IllegalArgumentException("Given configuration directory is null, cannot load configuration"); + } + + final File confDirFile = new File(configDir); + if (!(confDirFile.exists())) { + throw new IllegalConfigurationException( + "The given configuration directory name '" + configDir + + "' (" + confDirFile.getAbsolutePath() + ") does not describe an existing directory."); + } + + // get Flink yaml configuration file + final File yamlConfigFile = new File(confDirFile, FLINK_CONF_FILENAME); + + if (!yamlConfigFile.exists()) { + throw new IllegalConfigurationException( + "The Flink config file '" + yamlConfigFile + + "' (" + confDirFile.getAbsolutePath() + ") does not exist."); + } + + Configuration configuration = loadYAMLResource(yamlConfigFile); + + if (dynamicProperties != null) { + configuration.addAll(dynamicProperties); + } + + return configuration; + } + + /** + * Loads a YAML-file of key-value pairs. + * + *

Colon and whitespace ": " separate key and value (one per line). The hash tag "#" starts a single-line comment. + * + *

Example: + * + *

+     * jobmanager.rpc.address: localhost # network address for communication with the job manager
+     * jobmanager.rpc.port   : 6123      # network port to connect to for communication with the job manager
+     * taskmanager.rpc.port  : 6122      # network port the task manager expects incoming IPC connections
+     * 
+ * + *

This does not span the whole YAML specification, but only the *syntax* of simple YAML key-value pairs (see issue + * #113 on GitHub). If at any point in time, there is a need to go beyond simple key-value pairs syntax + * compatibility will allow to introduce a YAML parser library. + * + * @param file the YAML file to read from + * @see YAML 1.2 specification + */ + private static Configuration loadYAMLResource(File file) { + final Configuration config = new Configuration(); + + try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) { + + String line; + int lineNo = 0; + while ((line = reader.readLine()) != null) { + lineNo++; + // 1. check for comments + String[] comments = line.split("#", 2); + String conf = comments[0].trim(); + + // 2. get key and value + if (conf.length() > 0) { + String[] kv = conf.split(": ", 2); + + // skip line with no valid key-value pair + if (kv.length == 1) { + LOG.warn("Error while trying to split key and value in configuration file {}:{}: {}", file, lineNo, line); + continue; + } + + String key = kv[0].trim(); + String value = kv[1].trim(); + + // sanity check + if (key.length() == 0 || value.length() == 0) { + LOG.warn("Error after splitting key and value in configuration file {}:{}:{}", file, lineNo, line); + continue; + } + + LOG.info("Loading configuration property: {}, {}", key, isSensitive(key) ? HIDDEN_CONTENT : value); + config.setString(key, value); + } + } + } catch (IOException e) { + throw new RuntimeException("Error parsing YAML configuration.", e); + } + + fireBootstrap(config); + + return config; + } + + /** + * fire框架相关初始化动作 + */ + private static void fireBootstrap(Configuration config) { + if (isStart.compareAndSet(false, true)) { + // 加载必要的配置文件 + loadTaskConfiguration(config); + } + } + + /** + * 获取当前任务运行模式 + */ + public static String getRunMode() { + return runMode; + } + + /** + * 加载必要的配置文件 + */ + private static void loadTaskConfiguration(Configuration config) { + // 二次开发代码,用于加载任务同名配置文件中的flink参数 + // 获取当前任务的类名称 + String className = config.getString("$internal.application.main", config.getString("flink.fire.className", "")); + // 获取当前任务的运行模式:yarn-application或yarn-per-job + runMode = config.getString("flink.execution.target", config.getString("execution.target", "")); + + try { + Class env = Class.forName("org.apache.flink.runtime.util.EnvironmentInformation"); + Method method = env.getMethod("isJobManager"); + isJobManager = Boolean.valueOf(method.invoke(null) + ""); + } catch (Exception e) { + LOG.error("调用EnvironmentInformation.isJobManager()失败", e); + } + + // 配置信息仅在JobManager端进行加载,TaskManager端会被主动的merge + if (isJobManager && className != null && className.contains(".")) { + String simpleClassName = className.substring(className.lastIndexOf('.') + 1); + if (simpleClassName.length() > 0) { + // TODO: 判断批处理模式,并加载对应配置文件 + // PropUtils.load(FireFrameworkConf.FLINK_BATCH_CONF_FILE) + PropUtils.loadFile(FireFrameworkConf.FLINK_STREAMING_CONF_FILE()); + // 将所有configuration信息同步到PropUtils中 + PropUtils.setProperties(config.confData); + // 加载用户公共配置文件 + PropUtils.load(FireFrameworkConf.userCommonConf()); + // 加载任务同名的配置文件 + PropUtils.loadFile(simpleClassName); + // 构建fire rest接口地址 + PropUtils.setProperty(FireFrameworkConf.FIRE_REST_URL(), "http://" + OSUtils.getIp() + ":" + getRestPort()); + // 加载外部系统配置信息,覆盖同名配置文件中的配置,实现动态替换 + PropUtils.invokeConfigCenter(className); + PropUtils.setProperty("flink.run.mode", runMode); + + JavaConversions.mapAsJavaMap(PropUtils.settings()).forEach((k, v) -> { + 
config.setString(k, v); + settings.put(k, v); + }); + } + } + } + + /** + * Check whether the key is a hidden key. + * + * @param key the config key + */ + public static boolean isSensitive(String key) { + Preconditions.checkNotNull(key, "key is null"); + final String keyInLower = key.toLowerCase(); + // 二次开发代码,用于隐藏webui中敏感信息 + String hideKeys = JavaConversions.mapAsJavaMap(PropUtils.settings()).getOrDefault("spark.fire.conf.print.blacklist", "password,secret,fs.azure.account.key"); + if (hideKeys != null && hideKeys.length() > 0) { + String[] hideKeyArr = hideKeys.split(","); + for (String hideKey : hideKeyArr) { + if (keyInLower.length() >= hideKey.length() + && keyInLower.contains(hideKey)) { + return true; + } + } + } + return false; + } +} diff --git a/fire-engines/fire-flink/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java b/fire-engines/fire-flink/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java new file mode 100644 index 0000000..1583e40 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java @@ -0,0 +1,556 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.util; + +import org.apache.flink.configuration.GlobalConfiguration; +import org.apache.flink.util.OperatingSystem; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.InputStream; +import java.lang.management.ManagementFactory; +import java.lang.management.RuntimeMXBean; +import java.lang.reflect.Method; +import java.time.Instant; +import java.time.ZoneId; +import java.time.format.DateTimeFormatter; +import java.time.format.DateTimeParseException; +import java.util.*; +import java.util.concurrent.ConcurrentHashMap; + +/** + * Utility class that gives access to the execution environment of the JVM, like the executing user, + * startup options, or the JVM version. + */ +public class EnvironmentInformation { + + private static final Logger LOG = LoggerFactory.getLogger(EnvironmentInformation.class); + + public static final String UNKNOWN = ""; + + // 用于判断是否为JobManager + private static Boolean IS_JOBMANAGER = true; + private static final Map settings = new ConcurrentHashMap<>(); + + /** + * 用不判断当前组件是否为JobManager + */ + public static boolean isJobManager() { + return IS_JOBMANAGER; + } + + /** + * 获取配置信息 + */ + public static Map getSettings() { + return settings; + } + + /** + * Returns the version of the code as String. + * + * @return The project version string. + */ + public static String getVersion() { + return getVersionsInstance().projectVersion; + } + + /** + * Returns the version of the used Scala compiler as String. + * + * @return The scala version string. 
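// Illustrative sketch (not part of this patch): the masking rule behind the modified
// GlobalConfiguration.isSensitive above, as a standalone helper. The blacklist is a
// comma-separated property; any key containing one of its entries (case-insensitively)
// is replaced with "******" before being logged or rendered in the web UI.
object SensitiveKeySketch {
  val HiddenContent = "******"

  def mask(key: String, value: String,
           blacklist: String = "password,secret,fs.azure.account.key"): String = {
    val lowerKey = key.toLowerCase
    val sensitive = blacklist.split(",")
      .map(_.trim.toLowerCase)
      .exists(entry => entry.nonEmpty && lowerKey.contains(entry))
    if (sensitive) HiddenContent else value
  }
}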
+ */ + public static String getScalaVersion() { + return getVersionsInstance().scalaVersion; + } + + /** @return The Instant this version of the software was built. */ + public static Instant getBuildTime() { + return getVersionsInstance().gitBuildTime; + } + + /** + * @return The Instant this version of the software was built as a String using the + * Europe/Berlin timezone. + */ + public static String getBuildTimeString() { + return getVersionsInstance().gitBuildTimeStr; + } + + /** @return The last known commit id of this version of the software. */ + public static String getGitCommitId() { + return getVersionsInstance().gitCommitId; + } + + /** @return The last known abbreviated commit id of this version of the software. */ + public static String getGitCommitIdAbbrev() { + return getVersionsInstance().gitCommitIdAbbrev; + } + + /** @return The Instant of the last commit of this code. */ + public static Instant getGitCommitTime() { + return getVersionsInstance().gitCommitTime; + } + + /** + * @return The Instant of the last commit of this code as a String using the Europe/Berlin + * timezone. + */ + public static String getGitCommitTimeString() { + return getVersionsInstance().gitCommitTimeStr; + } + + /** + * Returns the code revision (commit and commit date) of Flink, as generated by the Maven + * builds. + * + * @return The code revision. + */ + public static RevisionInformation getRevisionInformation() { + return new RevisionInformation(getGitCommitIdAbbrev(), getGitCommitTimeString()); + } + + private static final class Versions { + private static final Instant DEFAULT_TIME_INSTANT = Instant.EPOCH; + private static final String DEFAULT_TIME_STRING = "1970-01-01T00:00:00+0000"; + private static final String UNKNOWN_COMMIT_ID = "DecafC0ffeeD0d0F00d"; + private static final String UNKNOWN_COMMIT_ID_ABBREV = "DeadD0d0"; + private String projectVersion = UNKNOWN; + private String scalaVersion = UNKNOWN; + private Instant gitBuildTime = DEFAULT_TIME_INSTANT; + private String gitBuildTimeStr = DEFAULT_TIME_STRING; + private String gitCommitId = UNKNOWN_COMMIT_ID; + private String gitCommitIdAbbrev = UNKNOWN_COMMIT_ID_ABBREV; + private Instant gitCommitTime = DEFAULT_TIME_INSTANT; + private String gitCommitTimeStr = DEFAULT_TIME_STRING; + + private static final String PROP_FILE = ".flink-runtime.version.properties"; + + private static final String FAIL_MESSAGE = + "The file " + + PROP_FILE + + " has not been generated correctly. 
You MUST run 'mvn generate-sources' in the flink-runtime module."; + + private String getProperty(Properties properties, String key, String defaultValue) { + String value = properties.getProperty(key); + if (value == null || value.charAt(0) == '$') { + return defaultValue; + } + return value; + } + + public Versions() { + ClassLoader classLoader = EnvironmentInformation.class.getClassLoader(); + try (InputStream propFile = classLoader.getResourceAsStream(PROP_FILE)) { + if (propFile != null) { + Properties properties = new Properties(); + properties.load(propFile); + + projectVersion = getProperty(properties, "project.version", UNKNOWN); + scalaVersion = getProperty(properties, "scala.binary.version", UNKNOWN); + + gitCommitId = getProperty(properties, "git.commit.id", UNKNOWN_COMMIT_ID); + gitCommitIdAbbrev = + getProperty( + properties, "git.commit.id.abbrev", UNKNOWN_COMMIT_ID_ABBREV); + + // This is to reliably parse the datetime format configured in the + // git-commit-id-plugin + DateTimeFormatter gitDateTimeFormatter = + DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ"); + + // Default format is in Berlin timezone because that is where Flink originated. + DateTimeFormatter berlinDateTime = + DateTimeFormatter.ISO_OFFSET_DATE_TIME.withZone( + ZoneId.of("Europe/Berlin")); + + try { + String propGitCommitTime = + getProperty(properties, "git.commit.time", DEFAULT_TIME_STRING); + gitCommitTime = + gitDateTimeFormatter.parse(propGitCommitTime, Instant::from); + gitCommitTimeStr = berlinDateTime.format(gitCommitTime); + + String propGitBuildTime = + getProperty(properties, "git.build.time", DEFAULT_TIME_STRING); + gitBuildTime = gitDateTimeFormatter.parse(propGitBuildTime, Instant::from); + gitBuildTimeStr = berlinDateTime.format(gitBuildTime); + } catch (DateTimeParseException dtpe) { + LOG.error("{} : {}", FAIL_MESSAGE, dtpe); + throw new IllegalStateException(FAIL_MESSAGE); + } + } + } catch (IOException ioe) { + LOG.info( + "Cannot determine code revision: Unable to read version property file.: {}", + ioe.getMessage()); + } + } + } + + private static final class VersionsHolder { + static final Versions INSTANCE = new Versions(); + } + + private static Versions getVersionsInstance() { + return VersionsHolder.INSTANCE; + } + + /** + * Gets the name of the user that is running the JVM. + * + * @return The name of the user that is running the JVM. + */ + public static String getHadoopUser() { + try { + Class ugiClass = + Class.forName( + "org.apache.hadoop.security.UserGroupInformation", + false, + EnvironmentInformation.class.getClassLoader()); + + Method currentUserMethod = ugiClass.getMethod("getCurrentUser"); + Method shortUserNameMethod = ugiClass.getMethod("getShortUserName"); + Object ugi = currentUserMethod.invoke(null); + return (String) shortUserNameMethod.invoke(ugi); + } catch (ClassNotFoundException e) { + return ""; + } catch (LinkageError e) { + // hadoop classes are not in the classpath + LOG.debug( + "Cannot determine user/group information using Hadoop utils. " + + "Hadoop classes not loaded or compatible", + e); + } catch (Throwable t) { + // some other error occurred that we should log and make known + LOG.warn("Error while accessing user/group information via Hadoop utils.", t); + } + + return UNKNOWN; + } + + /** + * The maximum JVM heap size, in bytes. + * + *

This method uses the -Xmx value of the JVM, if set. If not set, it returns (as a + * heuristic) 1/4th of the physical memory size. + * + * @return The maximum JVM heap size, in bytes. + */ + public static long getMaxJvmHeapMemory() { + final long maxMemory = Runtime.getRuntime().maxMemory(); + if (maxMemory != Long.MAX_VALUE) { + // we have the proper max memory + return maxMemory; + } else { + // max JVM heap size is not set - use the heuristic to use 1/4th of the physical memory + final long physicalMemory = Hardware.getSizeOfPhysicalMemory(); + if (physicalMemory != -1) { + // got proper value for physical memory + return physicalMemory / 4; + } else { + throw new RuntimeException( + "Could not determine the amount of free memory.\n" + + "Please set the maximum memory for the JVM, e.g. -Xmx512M for 512 megabytes."); + } + } + } + + /** + * Gets an estimate of the size of the free heap memory. + * + *

NOTE: This method is heavy-weight. It triggers a garbage collection to reduce + * fragmentation and get a better estimate at the size of free memory. It is typically more + * accurate than the plain version {@link #getSizeOfFreeHeapMemory()}. + * + * @return An estimate of the size of the free heap memory, in bytes. + */ + public static long getSizeOfFreeHeapMemoryWithDefrag() { + // trigger a garbage collection, to reduce fragmentation + System.gc(); + + return getSizeOfFreeHeapMemory(); + } + + /** + * Gets an estimate of the size of the free heap memory. The estimate may vary, depending on the + * current level of memory fragmentation and the number of dead objects. For a better (but more + * heavy-weight) estimate, use {@link #getSizeOfFreeHeapMemoryWithDefrag()}. + * + * @return An estimate of the size of the free heap memory, in bytes. + */ + public static long getSizeOfFreeHeapMemory() { + Runtime r = Runtime.getRuntime(); + return getMaxJvmHeapMemory() - r.totalMemory() + r.freeMemory(); + } + + /** + * Gets the version of the JVM in the form "VM_Name - Vendor - Spec/Version". + * + * @return The JVM version. + */ + public static String getJvmVersion() { + try { + final RuntimeMXBean bean = ManagementFactory.getRuntimeMXBean(); + return bean.getVmName() + + " - " + + bean.getVmVendor() + + " - " + + bean.getSpecVersion() + + '/' + + bean.getVmVersion(); + } catch (Throwable t) { + return UNKNOWN; + } + } + + /** + * Gets the system parameters and environment parameters that were passed to the JVM on startup. + * + * @return The options passed to the JVM on startup. + */ + public static String getJvmStartupOptions() { + try { + final RuntimeMXBean bean = ManagementFactory.getRuntimeMXBean(); + final StringBuilder bld = new StringBuilder(); + + for (String s : bean.getInputArguments()) { + bld.append(s).append(' '); + } + + return bld.toString(); + } catch (Throwable t) { + return UNKNOWN; + } + } + + /** + * Gets the system parameters and environment parameters that were passed to the JVM on startup. + * + * @return The options passed to the JVM on startup. + */ + public static String[] getJvmStartupOptionsArray() { + try { + RuntimeMXBean bean = ManagementFactory.getRuntimeMXBean(); + List options = bean.getInputArguments(); + return options.toArray(new String[options.size()]); + } catch (Throwable t) { + return new String[0]; + } + } + + /** + * Gets the directory for temporary files, as returned by the JVM system property + * "java.io.tmpdir". + * + * @return The directory for temporary files. + */ + public static String getTemporaryFileDirectory() { + return System.getProperty("java.io.tmpdir"); + } + + /** + * Tries to retrieve the maximum number of open file handles. This method will only work on + * UNIX-based operating systems with Sun/Oracle Java versions. + * + *

If the number of max open file handles cannot be determined, this method returns {@code + * -1}. + * + * @return The limit of open file handles, or {@code -1}, if the limit could not be determined. + */ + public static long getOpenFileHandlesLimit() { + if (OperatingSystem + .isWindows()) { // getMaxFileDescriptorCount method is not available on Windows + return -1L; + } + Class sunBeanClass; + try { + sunBeanClass = Class.forName("com.sun.management.UnixOperatingSystemMXBean"); + } catch (ClassNotFoundException e) { + return -1L; + } + + try { + Method fhLimitMethod = sunBeanClass.getMethod("getMaxFileDescriptorCount"); + Object result = fhLimitMethod.invoke(ManagementFactory.getOperatingSystemMXBean()); + return (Long) result; + } catch (Throwable t) { + LOG.warn("Unexpected error when accessing file handle limit", t); + return -1L; + } + } + + /** + * 解析命令并判断是否为JobManager + */ + private static void parseCommand(String[] commandLineArgs) { + if (commandLineArgs != null) { + for (String command : commandLineArgs) { + if (command != null && command.length() > 0) { + if (command.contains("resource-id")) { + IS_JOBMANAGER = false; + } + if (!"-D".equals(command)) { + String[] properties = command.replace("-D", "").split("=", 2); + if (properties != null && properties.length == 2 && properties[0] != null && properties[1] != null) { + settings.put(properties[0], properties[1]); + } + } + } + } + } + } + + /** + * Logs information about the environment, like code revision, current user, Java version, and + * JVM parameters. + * + * @param log The logger to log the information to. + * @param componentName The component name to mention in the log. + * @param commandLineArgs The arguments accompanying the starting the component. + */ + public static void logEnvironmentInfo( + Logger log, String componentName, String[] commandLineArgs) { + parseCommand(commandLineArgs); + if (log.isInfoEnabled()) { + RevisionInformation rev = getRevisionInformation(); + String version = getVersion(); + String scalaVersion = getScalaVersion(); + + String jvmVersion = getJvmVersion(); + String[] options = getJvmStartupOptionsArray(); + + String javaHome = System.getenv("JAVA_HOME"); + + String inheritedLogs = System.getenv("FLINK_INHERITED_LOGS"); + + long maxHeapMegabytes = getMaxJvmHeapMemory() >>> 20; + + if (inheritedLogs != null) { + log.info( + "--------------------------------------------------------------------------------"); + log.info(" Preconfiguration: "); + log.info(inheritedLogs); + } + + log.info( + "--------------------------------------------------------------------------------"); + log.info( + " Starting " + + componentName + + " (Version: " + + version + + ", Scala: " + + scalaVersion + + ", " + + "Rev:" + + rev.commitId + + ", " + + "Date:" + + rev.commitDate + + ")"); + log.info(" OS current user: " + System.getProperty("user.name")); + log.info(" Current Hadoop/Kerberos user: " + getHadoopUser()); + log.info(" JVM: " + jvmVersion); + log.info(" Maximum heap size: " + maxHeapMegabytes + " MiBytes"); + log.info(" JAVA_HOME: " + (javaHome == null ? 
"(not set)" : javaHome)); + + String hadoopVersionString = getHadoopVersionString(); + if (hadoopVersionString != null) { + log.info(" Hadoop version: " + hadoopVersionString); + } else { + log.info(" No Hadoop Dependency available"); + } + + if (options.length == 0) { + log.info(" JVM Options: (none)"); + } else { + log.info(" JVM Options:"); + for (String s : options) { + log.info(" " + s); + } + } + + if (commandLineArgs == null || commandLineArgs.length == 0) { + log.info(" Program Arguments: (none)"); + } else { + log.info(" Program Arguments:"); + for (String s : commandLineArgs) { + if (GlobalConfiguration.isSensitive(s)) { + log.info( + " " + + GlobalConfiguration.HIDDEN_CONTENT + + " (sensitive information)"); + } else { + log.info(" " + s); + } + } + } + + log.info(" Classpath: " + System.getProperty("java.class.path")); + + log.info( + "--------------------------------------------------------------------------------"); + } + } + + public static String getHadoopVersionString() { + try { + Class versionInfoClass = + Class.forName( + "org.apache.hadoop.util.VersionInfo", + false, + EnvironmentInformation.class.getClassLoader()); + Method method = versionInfoClass.getMethod("getVersion"); + return (String) method.invoke(null); + } catch (ClassNotFoundException | NoSuchMethodException e) { + return null; + } catch (Throwable e) { + LOG.error("Cannot invoke VersionInfo.getVersion reflectively.", e); + return null; + } + } + + // -------------------------------------------------------------------------------------------- + + /** Don't instantiate this class */ + private EnvironmentInformation() {} + + // -------------------------------------------------------------------------------------------- + + /** + * Revision information encapsulates information about the source code revision of the Flink + * code. + */ + public static class RevisionInformation { + + /** The git commit id (hash) */ + public final String commitId; + + /** The git commit date */ + public final String commitDate; + + public RevisionInformation(String commitId, String commitDate) { + this.commitId = commitId; + this.commitDate = commitDate; + } + } +} diff --git a/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/RocketMQSource.java b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/RocketMQSource.java new file mode 100644 index 0000000..51db161 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/RocketMQSource.java @@ -0,0 +1,434 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE + * file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the + * License. You may obtain a copy of the License at + *
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0 + *
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on + * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the + * specific language governing permissions and limitations under the License. + */ + +package org.apache.rocketmq.flink; + +import org.apache.commons.collections.map.LinkedMap; +import org.apache.commons.lang.Validate; +import org.apache.flink.api.common.functions.RuntimeContext; +import org.apache.flink.api.common.state.ListState; +import org.apache.flink.api.common.state.ListStateDescriptor; +import org.apache.flink.api.common.typeinfo.TypeHint; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.metrics.Counter; +import org.apache.flink.metrics.Meter; +import org.apache.flink.metrics.MeterView; +import org.apache.flink.metrics.SimpleCounter; +import org.apache.flink.runtime.state.CheckpointListener; +import org.apache.flink.runtime.state.FunctionInitializationContext; +import org.apache.flink.runtime.state.FunctionSnapshotContext; +import org.apache.flink.shaded.curator4.org.apache.curator.shaded.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; +import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; +import org.apache.flink.streaming.api.operators.StreamingRuntimeContext; +import org.apache.flink.util.Preconditions; +import org.apache.rocketmq.client.consumer.DefaultMQPullConsumer; +import org.apache.rocketmq.client.consumer.PullResult; +import org.apache.rocketmq.client.exception.MQClientException; +import org.apache.rocketmq.common.message.MessageExt; +import org.apache.rocketmq.common.message.MessageQueue; +import org.apache.rocketmq.flink.common.util.MetricUtils; +import org.apache.rocketmq.flink.common.util.RetryUtil; +import org.apache.rocketmq.flink.common.util.RocketMQUtils; +import org.apache.rocketmq.flink.common.watermark.WaterMarkForAll; +import org.apache.rocketmq.flink.common.watermark.WaterMarkPerQueue; +import org.apache.rocketmq.flink.serialization.TagKeyValueDeserializationSchema; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.lang.management.ManagementFactory; +import java.nio.charset.StandardCharsets; +import java.util.*; +import java.util.concurrent.*; +import java.util.concurrent.locks.ReentrantLock; + +import static org.apache.rocketmq.flink.RocketMQConfig.*; +import static org.apache.rocketmq.flink.common.util.RocketMQUtils.getInteger; +import static org.apache.rocketmq.flink.common.util.RocketMQUtils.getLong; + +/** + * The RocketMQSource is based on RocketMQ pull consumer mode, and provides exactly once reliability guarantees when + * checkpoints are enabled. Otherwise, the source doesn't provide any reliability guarantees. 
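+ *
+ * <p>A minimal usage sketch (illustrative only: the {@code env} stream environment and the topic/group
+ * values are assumptions, and the name server address plus any further consumer settings are expected
+ * to be provided through the same {@code Properties} instance):
+ *
+ * <pre>{@code
+ * Properties props = new Properties();
+ * props.setProperty(RocketMQConfig.CONSUMER_TOPIC, "example-topic");
+ * props.setProperty(RocketMQConfig.CONSUMER_GROUP, "example-group");
+ *
+ * DataStream<Tuple3<String, String, String>> stream = env.addSource(
+ *         new RocketMQSource<>(new SimpleTagKeyValueDeserializationSchema(), props));
+ * }</pre>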
+ */ +public class RocketMQSource extends RichParallelSourceFunction + implements CheckpointedFunction, CheckpointListener, ResultTypeQueryable { + + private static final long serialVersionUID = 1L; + + private static final Logger log = LoggerFactory.getLogger(RocketMQSource.class); + private static final String OFFSETS_STATE_NAME = "topic-partition-offset-states"; + private RunningChecker runningChecker; + private transient DefaultMQPullConsumer consumer; + private TagKeyValueDeserializationSchema schema; + private transient ListState> unionOffsetStates; + private Map offsetTable; + private Map restoredOffsets; + private List messageQueues; + private ExecutorService executor; + + // watermark in source + private WaterMarkPerQueue waterMarkPerQueue; + private WaterMarkForAll waterMarkForAll; + + private ScheduledExecutorService timer; + /** + * Data for pending but uncommitted offsets. + */ + private LinkedMap pendingOffsetsToCommit; + private Properties props; + private String topic; + private String group; + private transient volatile boolean restored; + private transient boolean enableCheckpoint; + private volatile Object checkPointLock; + + private Meter tpsMetric; + + public RocketMQSource(TagKeyValueDeserializationSchema schema, Properties props) { + this.schema = schema; + this.props = props; + } + + @Override + public void open(Configuration parameters) throws Exception { + log.debug("source open...."); + Validate.notEmpty(props, "Consumer properties can not be empty"); + + this.topic = props.getProperty(RocketMQConfig.CONSUMER_TOPIC); + this.group = props.getProperty(RocketMQConfig.CONSUMER_GROUP); + + Validate.notEmpty(topic, "Consumer topic can not be empty"); + Validate.notEmpty(group, "Consumer group can not be empty"); + + this.enableCheckpoint = ((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled(); + + if (offsetTable == null) { + offsetTable = new ConcurrentHashMap<>(); + } + if (restoredOffsets == null) { + restoredOffsets = new ConcurrentHashMap<>(); + } + + //use restoredOffsets to init offset table. 
+ initOffsetTableFromRestoredOffsets(); + + if (pendingOffsetsToCommit == null) { + pendingOffsetsToCommit = new LinkedMap(); + } + if (checkPointLock == null) { + checkPointLock = new ReentrantLock(); + } + if (waterMarkPerQueue == null) { + waterMarkPerQueue = new WaterMarkPerQueue(5000); + } + if (waterMarkForAll == null) { + waterMarkForAll = new WaterMarkForAll(5000); + } + if (timer == null) { + timer = Executors.newSingleThreadScheduledExecutor(); + } + + runningChecker = new RunningChecker(); + runningChecker.setRunning(true); + + final ThreadFactory threadFactory = new ThreadFactoryBuilder() + .setDaemon(true).setNameFormat("rmq-pull-thread-%d").build(); + executor = Executors.newCachedThreadPool(threadFactory); + + int indexOfThisSubTask = getRuntimeContext().getIndexOfThisSubtask(); + consumer = new DefaultMQPullConsumer(group, RocketMQConfig.buildAclRPCHook(props)); + RocketMQConfig.buildConsumerConfigs(props, consumer); + + // set unique instance name, avoid exception: https://help.aliyun.com/document_detail/29646.html + String runtimeName = ManagementFactory.getRuntimeMXBean().getName(); + String instanceName = RocketMQUtils.getInstanceName(runtimeName, topic, group, + String.valueOf(indexOfThisSubTask), String.valueOf(System.nanoTime())); + consumer.setInstanceName(instanceName); + consumer.start(); + + Counter outputCounter = getRuntimeContext().getMetricGroup() + .counter(MetricUtils.METRICS_TPS + "_counter", new SimpleCounter()); + tpsMetric = getRuntimeContext().getMetricGroup() + .meter(MetricUtils.METRICS_TPS, new MeterView(outputCounter, 60)); + } + + @Override + public void run(SourceContext context) throws Exception { + String tag = props.getProperty(RocketMQConfig.CONSUMER_TAG, RocketMQConfig.DEFAULT_CONSUMER_TAG); + int pullBatchSize = getInteger(props, CONSUMER_BATCH_SIZE, DEFAULT_CONSUMER_BATCH_SIZE); + + final RuntimeContext ctx = getRuntimeContext(); + // The lock that guarantees that record emission and state updates are atomic, + // from the view of taking a checkpoint. + int taskNumber = ctx.getNumberOfParallelSubtasks(); + int taskIndex = ctx.getIndexOfThisSubtask(); + log.info("Source run, NumberOfTotalTask={}, IndexOfThisSubTask={}", taskNumber, taskIndex); + + + timer.scheduleAtFixedRate(() -> { + // context.emitWatermark(waterMarkPerQueue.getCurrentWatermark()); + context.emitWatermark(waterMarkForAll.getCurrentWatermark()); + }, 5, 5, TimeUnit.SECONDS); + + Collection totalQueues = consumer.fetchSubscribeMessageQueues(topic); + messageQueues = RocketMQUtils.allocate(totalQueues, taskNumber, ctx.getIndexOfThisSubtask()); + for (MessageQueue mq : messageQueues) { + this.executor.execute(() -> { + RetryUtil.call(() -> { + while (runningChecker.isRunning()) { + try { + long offset = getMessageQueueOffset(mq); + PullResult pullResult = consumer.pullBlockIfNotFound(mq, tag, offset, pullBatchSize); + + boolean found = false; + switch (pullResult.getPullStatus()) { + case FOUND: + List messages = pullResult.getMsgFoundList(); + for (MessageExt msg : messages) { + byte[] tags = msg.getTags() != null ? msg.getTags().getBytes(StandardCharsets.UTF_8) : null; + byte[] key = msg.getKeys() != null ? 
msg.getKeys().getBytes(StandardCharsets.UTF_8) : null; + byte[] value = msg.getBody(); + OUT data = schema.deserializeTagKeyAndValue(tags, key, value); + + // output and state update are atomic + synchronized (checkPointLock) { + log.debug(msg.getMsgId() + "_" + msg.getBrokerName() + " " + msg.getQueueId() + " " + msg.getQueueOffset()); + context.collectWithTimestamp(data, msg.getBornTimestamp()); + + // update max eventTime per queue + // waterMarkPerQueue.extractTimestamp(mq, msg.getBornTimestamp()); + waterMarkForAll.extractTimestamp(msg.getBornTimestamp()); + tpsMetric.markEvent(); + } + } + found = true; + break; + case NO_MATCHED_MSG: + log.debug("No matched message after offset {} for queue {}", offset, mq); + break; + case NO_NEW_MSG: + log.debug("No new message after offset {} for queue {}", offset, mq); + break; + case OFFSET_ILLEGAL: + log.warn("Offset {} is illegal for queue {}", offset, mq); + break; + default: + break; + } + + synchronized (checkPointLock) { + updateMessageQueueOffset(mq, pullResult.getNextBeginOffset()); + } + + if (!found) { + RetryUtil.waitForMs(RocketMQConfig.DEFAULT_CONSUMER_DELAY_WHEN_MESSAGE_NOT_FOUND); + } + } catch (Exception e) { + throw new RuntimeException(e); + } + } + return true; + }, "RuntimeException"); + }); + } + + awaitTermination(); + } + + private void awaitTermination() throws InterruptedException { + while (runningChecker.isRunning()) { + Thread.sleep(50); + } + } + + private long getMessageQueueOffset(MessageQueue mq) throws MQClientException { + Long offset = offsetTable.get(mq); + // restoredOffsets(unionOffsetStates) is the restored global union state; + // should only snapshot mqs that actually belong to us + if (offset == null) { + // fetchConsumeOffset from broker + offset = consumer.fetchConsumeOffset(mq, false); + if (!restored || offset < 0) { + String initialOffset = props.getProperty(RocketMQConfig.CONSUMER_OFFSET_RESET_TO, CONSUMER_OFFSET_LATEST); + switch (initialOffset) { + case CONSUMER_OFFSET_EARLIEST: + offset = consumer.minOffset(mq); + break; + case CONSUMER_OFFSET_LATEST: + offset = consumer.maxOffset(mq); + break; + case CONSUMER_OFFSET_TIMESTAMP: + offset = consumer.searchOffset(mq, getLong(props, + RocketMQConfig.CONSUMER_OFFSET_FROM_TIMESTAMP, System.currentTimeMillis())); + break; + default: + throw new IllegalArgumentException("Unknown value for CONSUMER_OFFSET_RESET_TO."); + } + } + } + offsetTable.put(mq, offset); + return offsetTable.get(mq); + } + + private void updateMessageQueueOffset(MessageQueue mq, long offset) throws MQClientException { + offsetTable.put(mq, offset); + if (!enableCheckpoint) { + consumer.updateConsumeOffset(mq, offset); + } + } + + @Override + public void cancel() { + log.debug("cancel ..."); + runningChecker.setRunning(false); + + if (consumer != null) { + consumer.shutdown(); + } + + if (offsetTable != null) { + offsetTable.clear(); + } + if (restoredOffsets != null) { + restoredOffsets.clear(); + } + if (pendingOffsetsToCommit != null) { + pendingOffsetsToCommit.clear(); + } + } + + @Override + public void close() throws Exception { + log.debug("close ..."); + // pretty much the same logic as cancelling + try { + cancel(); + } finally { + super.close(); + } + } + + public void initOffsetTableFromRestoredOffsets() { + Preconditions.checkNotNull(restoredOffsets, "restoredOffsets can't be null"); + restoredOffsets.forEach((mq, offset) -> { + if (!offsetTable.containsKey(mq) || offsetTable.get(mq) < offset) { + offsetTable.put(mq, offset); + } + }); + log.info("init offset table from 
restoredOffsets successful.", offsetTable); + } + + @Override + public void snapshotState(FunctionSnapshotContext context) throws Exception { + // called when a snapshot for a checkpoint is requested + log.info("Snapshotting state {} ...", context.getCheckpointId()); + if (!runningChecker.isRunning()) { + log.info("snapshotState() called on closed source; returning null."); + return; + } + + // Discovery topic Route change when snapshot + RetryUtil.call(() -> { + Collection totalQueues = consumer.fetchSubscribeMessageQueues(topic); + int taskNumber = getRuntimeContext().getNumberOfParallelSubtasks(); + int taskIndex = getRuntimeContext().getIndexOfThisSubtask(); + List newQueues = RocketMQUtils.allocate(totalQueues, taskNumber, taskIndex); + Collections.sort(newQueues); + log.debug(taskIndex + " Topic route is same."); + if (!messageQueues.equals(newQueues)) { + throw new RuntimeException(); + } + return true; + }, "RuntimeException due to topic route changed"); + + unionOffsetStates.clear(); + HashMap currentOffsets = new HashMap<>(offsetTable.size()); + for (Map.Entry entry : offsetTable.entrySet()) { + unionOffsetStates.add(Tuple2.of(entry.getKey(), entry.getValue())); + currentOffsets.put(entry.getKey(), entry.getValue()); + } + pendingOffsetsToCommit.put(context.getCheckpointId(), currentOffsets); + log.info("Snapshotted state, last processed offsets: {}, checkpoint id: {}, timestamp: {}", + offsetTable, context.getCheckpointId(), context.getCheckpointTimestamp()); + } + + /** + * called every time the user-defined function is initialized, + * be that when the function is first initialized or be that + * when the function is actually recovering from an earlier checkpoint. + * Given this, initializeState() is not only the place where different types of state are initialized, + * but also where state recovery logic is included. + */ + @Override + public void initializeState(FunctionInitializationContext context) throws Exception { + log.info("initialize State ..."); + + this.unionOffsetStates = context.getOperatorStateStore().getUnionListState(new ListStateDescriptor<>( + OFFSETS_STATE_NAME, TypeInformation.of(new TypeHint>() { + }))); + this.restored = context.isRestored(); + + if (restored) { + if (restoredOffsets == null) { + restoredOffsets = new ConcurrentHashMap<>(); + } + for (Tuple2 mqOffsets : unionOffsetStates.get()) { + if (!restoredOffsets.containsKey(mqOffsets.f0) || restoredOffsets.get(mqOffsets.f0) < mqOffsets.f1) { + restoredOffsets.put(mqOffsets.f0, mqOffsets.f1); + } + } + log.info("Setting restore state in the consumer. 
Using the following offsets: {}", restoredOffsets); + } else { + log.info("No restore state for the consumer."); + } + } + + @Override + public TypeInformation getProducedType() { + return schema.getProducedType(); + } + + @Override + public void notifyCheckpointComplete(long checkpointId) throws Exception { + // callback when checkpoint complete + if (!runningChecker.isRunning()) { + log.info("notifyCheckpointComplete() called on closed source; returning null."); + return; + } + + final int posInMap = pendingOffsetsToCommit.indexOf(checkpointId); + if (posInMap == -1) { + log.warn("Received confirmation for unknown checkpoint id {}", checkpointId); + return; + } + + Map offsets = (Map) pendingOffsetsToCommit.remove(posInMap); + + // remove older checkpoints in map + for (int i = 0; i < posInMap; i++) { + pendingOffsetsToCommit.remove(0); + } + + if (offsets == null || offsets.size() == 0) { + log.debug("Checkpoint state was empty."); + return; + } + + for (Map.Entry entry : offsets.entrySet()) { + consumer.updateConsumeOffset(entry.getKey(), entry.getValue()); + } + } +} diff --git a/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/JsonDeserializationSchema.java b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/JsonDeserializationSchema.java new file mode 100644 index 0000000..10036e7 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/JsonDeserializationSchema.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.rocketmq.flink.serialization; + +import org.apache.flink.api.common.serialization.DeserializationSchema; +import org.apache.flink.api.common.typeinfo.TypeHint; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.data.RowData; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; + +/** + * 将rocketmq消息反序列化成RowData + * @author ChengLong 2021-5-9 13:40:17 + */ +public class JsonDeserializationSchema implements TagKeyValueDeserializationSchema { + private DeserializationSchema key; + private DeserializationSchema value; + + public JsonDeserializationSchema(DeserializationSchema key, DeserializationSchema value) { + this.key = key; + this.value = value; + } + + @Override + public RowData deserializeTagKeyAndValue(byte[] tag, byte[] key, byte[] value) { + /*String keyString = key != null ? new String(key, StandardCharsets.UTF_8) : null; + String valueString = value != null ? 
new String(value, StandardCharsets.UTF_8) : null;*/ + if (value != null) { + try { + // 调用sql connector的format进行反序列化 + return this.value.deserialize(value); + } catch (IOException e) { + e.printStackTrace(); + } + } + return null; + } + + @Override + public TypeInformation getProducedType() { + return TypeInformation.of(new TypeHint(){}); + } +} diff --git a/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/SimpleTagKeyValueDeserializationSchema.java b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/SimpleTagKeyValueDeserializationSchema.java new file mode 100644 index 0000000..43211c3 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/SimpleTagKeyValueDeserializationSchema.java @@ -0,0 +1,46 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.rocketmq.flink.serialization; + +import org.apache.flink.api.common.typeinfo.TypeHint; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import scala.Tuple3; + +import java.nio.charset.StandardCharsets; + +/** + * 反序列化MessageExt,将tag、key、value以tuple3方式返回 + * + * @author ChengLong 2021-5-10 09:44:55 + */ +public class SimpleTagKeyValueDeserializationSchema implements TagKeyValueDeserializationSchema> { + + @Override + public Tuple3 deserializeTagKeyAndValue(byte[] tag, byte[] key, byte[] value) { + String tagString = tag != null ? new String(tag, StandardCharsets.UTF_8) : null; + String keyString = key != null ? new String(key, StandardCharsets.UTF_8) : null; + String valueString = value != null ? new String(value, StandardCharsets.UTF_8) : null; + return new Tuple3<>(tagString, keyString, valueString); + } + + @Override + public TypeInformation> getProducedType() { + return TypeInformation.of(new TypeHint>(){}); + } +} diff --git a/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/TagKeyValueDeserializationSchema.java b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/TagKeyValueDeserializationSchema.java new file mode 100644 index 0000000..01571d6 --- /dev/null +++ b/fire-engines/fire-flink/src/main/java/org/apache/rocketmq/flink/serialization/TagKeyValueDeserializationSchema.java @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.rocketmq.flink.serialization; + +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; + +import java.io.Serializable; + +/** + * 反序列化,携带tag信息 + * @author ChengLong 2021-5-10 09:43:35 + */ +public interface TagKeyValueDeserializationSchema extends ResultTypeQueryable, Serializable { + + T deserializeTagKeyAndValue(byte[] tag, byte[] key, byte[] value); +} \ No newline at end of file diff --git a/fire-engines/fire-flink/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory b/fire-engines/fire-flink/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory new file mode 100644 index 0000000..a49aff8 --- /dev/null +++ b/fire-engines/fire-flink/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory @@ -0,0 +1 @@ +com.zto.fire.flink.sql.connector.rocketmq.RocketMQDynamicTableFactory \ No newline at end of file diff --git a/fire-engines/fire-flink/src/main/resources/flink-batch.properties b/fire-engines/fire-flink/src/main/resources/flink-batch.properties new file mode 100644 index 0000000..d93aaa3 --- /dev/null +++ b/fire-engines/fire-flink/src/main/resources/flink-batch.properties @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.fire.config_center.enable = false \ No newline at end of file diff --git a/fire-engines/fire-flink/src/main/resources/flink-streaming.properties b/fire-engines/fire-flink/src/main/resources/flink-streaming.properties new file mode 100644 index 0000000..cce3aca --- /dev/null +++ b/fire-engines/fire-flink/src/main/resources/flink-streaming.properties @@ -0,0 +1,16 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# diff --git a/fire-engines/fire-flink/src/main/resources/flink.properties b/fire-engines/fire-flink/src/main/resources/flink.properties new file mode 100644 index 0000000..7316954 --- /dev/null +++ b/fire-engines/fire-flink/src/main/resources/flink.properties @@ -0,0 +1,118 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# ----------------------------------------------- < flink 配置 > ----------------------------------------------- # +# flink的应用名称,为空则取类名 +flink.appName = +# kafka的groupid,为空则取类名 +flink.kafka.group.id = +# bigdata表示连接大数据的kafka,zms表示连接zms的kafka集群 +flink.kafka.brokers.name = +# topic列表 +flink.kafka.topics = +# 用于配置启动时的消费位点,默认取最新 +flink.kafka.starting.offsets = +# 数据丢失时执行失败 +flink.kafka.failOnDataLoss = true +# 是否启用自动commit +flink.kafka.enable.auto.commit = false +# 是否在checkpoint时记录offset值 +flink.kafka.CommitOffsetsOnCheckpoints = true +# 设置从指定时间戳位置开始消费kafka +flink.kafka.StartFromTimestamp = 0 +# 从topic中指定的group上次消费的位置开始消费,必须配置group.id参数 +flink.kafka.StartFromGroupOffsets = false +# flink.kafka.conf开头的配置支持所有kafka client的配置 +#flink.kafka.conf.session.timeout.ms = 300000 +#flink.kafka.conf.request.timeout.ms = 400000 +# 默认的日志级别 +flink.log.level = WARN +# flink sql配置项,以flink.sql.conf.开头将会被自动加载 +#flink.sql.conf.table.exec.mini-batch.enabled = false +#flink.sql.conf.table.exec.state.ttl = 0 ms +# flink sql udf注册,以flink.sql.udf.开头,以下配置的含义是:CREATE FUNCTION fireUdf AS 'com.zto.fire.examples.flink.stream.Udf' +flink.sql.udf.fireUdf = com.zto.fire.examples.flink.stream.Udf +flink.sql.udf.fireUdf.enable = false +# 指定在flink引擎下,可进行配置同步的子类实现 +flink.fire.conf.deploy.engine = com.zto.fire.flink.conf.FlinkEngineConf +# 是否打印组装with语句后的flink sql,由于with表达式中可能含有敏感信息,默认为关闭 +flink.sql.log.enable = false +# 是否启用配置文件中with强制替换sql中已有的with表达式,如果启用,并且配置文件中有指定with配置信息,则会强制替换掉代码中sql的with列表 +flink.sql_with.replaceMode.enable = false + +# ----------------------------------------------- < hive 配置 > ----------------------------------------------- # +# hive 集群名称(batch离线hive/streaming 180集群hive/test本地测试hive),用于flink跨集群读取hive元数据信息 +flink.hive.cluster = +# flink所集成的hive版本号 +flink.hive.version = 1.1.0 +# 默认的hive数据库 +flink.default.database.name = tmp +# 默认的hive分区字段名称 +flink.default.table.partition.name = ds +# hive的catalog名称 +flink.hive.catalog.name = hive + +# ----------------------------------------------- < HBase 配置 > ----------------------------------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +flink.hbase.cluster = batch +# 一次读写HBase的数据量 +flink.hbase.batch.size = 10000 + + +# ----------------------------------------------- < flink 参数 > ----------------------------------------------- # +# flink相关优化参数列在下面会自动被fire加载生效 +flink.auto.generate.uid.enable = true 
+flink.auto.type.registration.enable = true +flink.force.avro.enable = false +flink.force.kryo.enable = false +flink.generic.types.enable = true +flink.object.reuse.enable = false +flink.auto.watermark.interval = -1 +# 默认值为:RECURSIVE,包括:RECURSIVE、NONE、TOP_LEVEL +flink.closure.cleaner.level = recursive +flink.default.input.dependency.constraint = any +# 默认值:PIPELINED,包括:PIPELINED、PIPELINED_FORCED、BATCH、BATCH_FORCED +flink.execution.mode = pipelined +flink.latency.tracking.interval = +flink.max.parallelism = -1 +flink.default.parallelism = +flink.task.cancellation.interval = +flink.task.cancellation.timeout.millis = +flink.use.snapshot.compression = false +flink.stream.buffer.timeout.millis = +flink.stream.number.execution.retries = +flink.stream.time.characteristic = + +# checkpoint相关配置 +# checkpoint频率,-1表示关闭 +flink.stream.checkpoint.interval = -1 +# EXACTLY_ONCE/AT_LEAST_ONCE +flink.stream.checkpoint.mode = EXACTLY_ONCE +# checkpoint超时时间,单位:毫秒 +flink.stream.checkpoint.timeout = 600000 +# 同时checkpoint操作的并发数 +flink.stream.checkpoint.max.concurrent = 1 +# 两次checkpoint的最小停顿时间 +flink.stream.checkpoint.min.pause.between = 0 +# 如果有更近的checkpoint时,是否将作业回退到该检查点 +flink.stream.checkpoint.prefer.recovery = false +# 可容忍checkpoint失败的次数,默认不允许失败 +flink.stream.checkpoint.tolerable.failure.number = 0 +# 当cancel job时保留checkpoint +flink.stream.checkpoint.externalized = RETAIN_ON_CANCELLATION +# 是否将配置同步到taskmanager端 +flink.fire.deploy_conf.enable = false \ No newline at end of file diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire.scala new file mode 100644 index 0000000..18def70 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire.scala @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto + +import com.zto.fire.core.ext.BaseFireExt +import com.zto.fire.flink.ext.batch.{BatchExecutionEnvExt, BatchTableEnvExt, DataSetExt} +import com.zto.fire.flink.ext.stream._ +import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment} +import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment} +import org.apache.flink.table.api.Table +import org.apache.flink.table.api.bridge.scala.{BatchTableEnvironment, StreamTableEnvironment} +import org.apache.flink.types.Row + +/** + * 预定义fire框架中的扩展工具 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-12-22 13:51 + */ +package object fire extends BaseFireExt { + + /** + * StreamExecutionEnvironment扩展 + * + * @param env + * StreamExecutionEnvironment对象 + */ + implicit class StreamExecutionEnvExtBridge(env: StreamExecutionEnvironment) extends StreamExecutionEnvExt(env) { + + } + + /** + * StreamTableEnvironment扩展 + * + * @param tableEnv + * StreamTableEnvironment对象 + */ + implicit class StreamTableEnvExtBridge(tableEnv: StreamTableEnvironment) extends StreamTableEnvExt(tableEnv) { + + } + + + /** + * DataStream扩展 + * + * @param dataStream + * DataStream对象 + */ + implicit class DataStreamExtBridge[T](dataStream: DataStream[T]) extends DataStreamExt(dataStream) { + + } + + /** + * KeyedStream扩展 + * + * @param keyedStream + * KeyedStream对象 + */ + implicit class KeyedStreamExtBridge[T, K](keyedStream: KeyedStream[T, K]) extends KeyedStreamExt[T, K](keyedStream) { + + } + + /** + * Table扩展 + * + * @param table + * Table对象 + */ + implicit class StreamTableExtBridge(table: Table) extends TableExt(table) { + + } + + /** + * BatchTableEnvironment扩展 + * + * @param tableEnv + * BatchTableEnvironment对象 + */ + implicit class BatchTableEnvExtBridge(tableEnv: BatchTableEnvironment) extends BatchTableEnvExt(tableEnv) { + + } + + + /** + * ExecutionEnvironment扩展 + * + * @param env + * ExecutionEnvironment对象 + */ + implicit class BatchExecutionEnvExtBridge(env: ExecutionEnvironment) extends BatchExecutionEnvExt(env) { + + } + + /** + * DataSet扩展 + * + * @param dataSet + * DataSet对象 + */ + implicit class DataSetExtBridge[T](dataSet: DataSet[T]) extends DataSetExt(dataSet) { + + } + + /** + * Row扩展 + */ + implicit class RowExtBridge(row: Row) extends RowExt(row) { + + } + + /** + * Flink SQL扩展 + */ + implicit class SQLExtBridge(sql: String) extends SQLExt(sql) { + + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlink.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlink.scala new file mode 100644 index 0000000..bab9dd9 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlink.scala @@ -0,0 +1,170 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink + +import com.zto.fire._ +import com.zto.fire.common.conf.{FireFrameworkConf, FireHDFSConf, FireHiveConf} +import com.zto.fire.common.util.{OSUtils, PropUtils} +import com.zto.fire.core.BaseFire +import com.zto.fire.core.rest.RestServerManager +import com.zto.fire.flink.conf.FireFlinkConf +import com.zto.fire.flink.rest.FlinkSystemRestful +import com.zto.fire.flink.task.FlinkSchedulerManager +import com.zto.fire.flink.util.{FlinkSingletonFactory, FlinkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.common.ExecutionConfig +import org.apache.flink.api.scala.ExecutionEnvironment +import org.apache.flink.configuration.{Configuration, GlobalConfiguration} +import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup +import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment +import org.apache.flink.streaming.api.{CheckpointingMode, TimeCharacteristic} +import org.apache.flink.table.catalog.hive.HiveCatalog +import org.apache.hadoop.hive.conf.HiveConf + + +/** + * Flink引擎通用父接口 + * + * @author ChengLong 2020年1月7日 09:31:09 + */ +trait BaseFlink extends BaseFire { + protected[fire] var _conf: Configuration = _ + protected var hiveCatalog: HiveCatalog = _ + + /** + * 生命周期方法:初始化fire框架必要的信息 + * 注:该方法会同时在driver端与executor端执行 + */ + override private[fire] def boot: Unit = { + PropUtils.load(FireFrameworkConf.FLINK_CONF_FILE) + // flink引擎无需主动在父类中主动加载配置信息,配置加载在GlobalConfiguration中完成 + if (OSUtils.isLocal) { + this.loadConf + PropUtils.load(FireFrameworkConf.userCommonConf: _*).load(this.appName) + } + PropUtils.setProperty(FireFlinkConf.FLINK_DRIVER_CLASS_NAME, this.className) + PropUtils.setProperty(FireFlinkConf.FLINK_CLIENT_SIMPLE_CLASS_NAME, this.driverClass) + FlinkSingletonFactory.setAppName(this.appName) + super.boot + } + + /** + * 初始化flink运行时环境 + */ + override private[fire] def createContext(conf: Any): Unit = { + if (FlinkUtils.isYarnApplicationMode) { + // fire rest 服务仅支持flink的yarn-application模式 + this.restfulRegister = new RestServerManager().startRestPort(GlobalConfiguration.getRestPortAndClose) + this.systemRestful = new FlinkSystemRestful(this, this.restfulRegister) + } + PropUtils.show() + FlinkSchedulerManager.getInstance().registerTasks(this) + // 创建HiveCatalog + val metastore = FireHiveConf.getMetastoreUrl + if (StringUtils.isNotBlank(metastore)) { + val hiveConf = new HiveConf() + hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, metastore) + // 根据所选的hive,进行对应hdfs的HA参数设置 + FireHDFSConf.hdfsHAConf.foreach(prop => hiveConf.set(prop._1, prop._2)) + this.hiveCatalog = new HiveCatalog(FireHiveConf.hiveCatalogName, FireHiveConf.defaultDB, hiveConf, FireHiveConf.hiveVersion) + this.logger.info("enabled flink-hive support.") + } + } + + /** + * 构建或合并Configuration + * 注:不同的子类需根据需要复写该方法 + * + * @param conf + * 在conf基础上构建 + * @return + * 合并后的Configuration对象 + */ + def buildConf(conf: Configuration): Configuration + + /** + * 生命周期方法:用于回收资源 + */ + override def stop: Unit = { + try { + this.after() + } finally { + this.shutdown() + } + } + + /** + * 生命周期方法:进行fire框架的资源回收 + * 注:不允许子类覆盖 + */ + override protected[fire] final def shutdown(stopGracefully: Boolean = true): Unit = { + super.shutdown(stopGracefully) + System.exit(0) + } + + /** + * 用于解析configuration中的配置,识别flink参数(非用户自定义参数),并设置到env中 + */ + private[fire] def configParse(env: Any): ExecutionConfig = { + requireNonEmpty(env)("Environment对象不能为空") + val config = if (env.isInstanceOf[ExecutionEnvironment]) { + val batchEnv = 
env.asInstanceOf[ExecutionEnvironment] + // flink.default.parallelism + if (FireFlinkConf.defaultParallelism != -1) batchEnv.setParallelism(FireFlinkConf.defaultParallelism) + batchEnv.getConfig + } else { + val streamEnv = env.asInstanceOf[StreamExecutionEnvironment] + // flink.max.parallelism + if (FireFlinkConf.maxParallelism != -1) streamEnv.setMaxParallelism(FireFlinkConf.maxParallelism) + // flink.default.parallelism + if (FireFlinkConf.defaultParallelism != -1) streamEnv.setParallelism(FireFlinkConf.defaultParallelism) + // flink.stream.buffer.timeout.millis + if (FireFlinkConf.streamBufferTimeoutMillis != -1) streamEnv.setBufferTimeout(FireFlinkConf.streamBufferTimeoutMillis) + // flink.stream.number.execution.retries + if (FireFlinkConf.streamNumberExecutionRetries != -1) streamEnv.setNumberOfExecutionRetries(FireFlinkConf.streamNumberExecutionRetries) + // flink.stream.time.characteristic + if (StringUtils.isNotBlank(FireFlinkConf.streamTimeCharacteristic)) streamEnv.setStreamTimeCharacteristic(TimeCharacteristic.valueOf(FireFlinkConf.streamTimeCharacteristic)) + + // checkPoint相关参数 + val ckConfig = streamEnv.getCheckpointConfig + if (ckConfig != null && FireFlinkConf.streamCheckpointInterval != -1) { + // flink.stream.checkpoint.interval 单位:毫秒 默认:-1 关闭 + streamEnv.enableCheckpointing(FireFlinkConf.streamCheckpointInterval) + // flink.stream.checkpoint.mode EXACTLY_ONCE/AT_LEAST_ONCE 默认:EXACTLY_ONCE + if (StringUtils.isNotBlank(FireFlinkConf.streamCheckpointMode)) ckConfig.setCheckpointingMode(CheckpointingMode.valueOf(FireFlinkConf.streamCheckpointMode.trim.toUpperCase)) + // flink.stream.checkpoint.timeout 单位:毫秒 默认:10 * 60 * 1000 + if (FireFlinkConf.streamCheckpointTimeout > 0) ckConfig.setCheckpointTimeout(FireFlinkConf.streamCheckpointTimeout) + // flink.stream.checkpoint.max.concurrent 默认:1 + if (FireFlinkConf.streamCheckpointMaxConcurrent > 0) ckConfig.setMaxConcurrentCheckpoints(FireFlinkConf.streamCheckpointMaxConcurrent) + // flink.stream.checkpoint.min.pause.between 默认:0 + if (FireFlinkConf.streamCheckpointMinPauseBetween >= 0) ckConfig.setMinPauseBetweenCheckpoints(FireFlinkConf.streamCheckpointMinPauseBetween) + // flink.stream.checkpoint.prefer.recovery 默认:false + ckConfig.setPreferCheckpointForRecovery(FireFlinkConf.streamCheckpointPreferRecovery) + // flink.stream.checkpoint.tolerable.failure.number 默认:0 + if (FireFlinkConf.streamCheckpointTolerableTailureNumber >= 0) ckConfig.setTolerableCheckpointFailureNumber(FireFlinkConf.streamCheckpointTolerableTailureNumber) + // flink.stream.checkpoint.externalized + if (StringUtils.isNotBlank(FireFlinkConf.streamCheckpointExternalized)) ckConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.valueOf(FireFlinkConf.streamCheckpointExternalized.trim)) + } + + streamEnv.getConfig + } + FlinkUtils.parseConf(config) + + config + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlinkBatch.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlinkBatch.scala new file mode 100644 index 0000000..a74b11c --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlinkBatch.scala @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink + +import com.zto.fire.common.conf.{FireFrameworkConf, FireHiveConf} +import com.zto.fire.common.enu.JobType +import com.zto.fire.common.util.{OSUtils, PropUtils} +import com.zto.fire.flink.util.FlinkSingletonFactory +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.java.utils.ParameterTool +import org.apache.flink.api.scala.ExecutionEnvironment +import org.apache.flink.configuration.{ConfigConstants, Configuration} +import org.apache.flink.table.api.bridge.scala.BatchTableEnvironment + +/** + * flink batch通用父接口 + * @author ChengLong 2020年1月7日 15:15:56 + */ +trait BaseFlinkBatch extends BaseFlink { + override val jobType: JobType = JobType.FLINK_BATCH + protected var env, flink, fire: ExecutionEnvironment = _ + protected var tableEnv: BatchTableEnvironment = _ + + /** + * 构建或合并Configuration + * 注:不同的子类需根据需要复写该方法 + * + * @param conf + * 在conf基础上构建 + * @return + * 合并后的Configuration对象 + */ + override def buildConf(conf: Configuration): Configuration = { + val finalConf = if (conf != null) conf else { + val tmpConf = new Configuration() + PropUtils.settings.foreach(t => tmpConf.setString(t._1, t._2)) + tmpConf + } + finalConf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true) + + this._conf = finalConf + finalConf + } + + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param conf + * 用户指定的配置信息 + * @param args + * main方法参数列表 + */ + override def init(conf: Any = null, args: Array[String] = null): Unit = { + super.init(conf, args) + if (conf != null) conf.asInstanceOf[Configuration].setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true) + + this.process + } + + /** + * 创建计算引擎运行时环境 + * + * @param conf + * 配置信息 + */ + override private[fire] def createContext(conf: Any): Unit = { + super.createContext(conf) + val finalConf = this.buildConf(conf.asInstanceOf[Configuration]) + if (OSUtils.isLocal) { + this.env = ExecutionEnvironment.createLocalEnvironmentWithWebUI(finalConf) + } else { + this.env = ExecutionEnvironment.getExecutionEnvironment + } + this.env.getConfig.setGlobalJobParameters(ParameterTool.fromMap(finalConf.toMap)) + this.configParse(this.env) + this.tableEnv = BatchTableEnvironment.create(this.env) + if (StringUtils.isNotBlank(FireHiveConf.getHiveConfDir)) { + this.tableEnv.registerCatalog(FireHiveConf.hiveCatalogName, this.hiveCatalog) + } + this.flink = this.env + this.fire = this.flink + FlinkSingletonFactory.setEnv(this.env).setTableEnv(this.tableEnv) + } + + /** + * 在加载任务配置文件前将被加载 + */ + override private[fire] def loadConf(): Unit = { + // 加载配置文件 + PropUtils.load(FireFrameworkConf.FLINK_BATCH_CONF_FILE) + } + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + // 子类复写该方法实现业务处理逻辑 + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlinkStreaming.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlinkStreaming.scala new file mode 100644 index 
0000000..df5f20e --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/BaseFlinkStreaming.scala @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink + +import com.zto.fire._ +import com.zto.fire.common.conf.{FireFrameworkConf, FireHiveConf} +import com.zto.fire.common.enu.JobType +import com.zto.fire.common.util.{OSUtils, PropUtils} +import com.zto.fire.flink.conf.FireFlinkConf +import com.zto.fire.flink.util.{FlinkSingletonFactory, FlinkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.java.utils.ParameterTool +import org.apache.flink.api.scala._ +import org.apache.flink.configuration.{ConfigConstants, Configuration} +import org.apache.flink.streaming.api.scala.{OutputTag, StreamExecutionEnvironment} +import org.apache.flink.table.api.EnvironmentSettings +import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment + +/** + * flink streaming通用父接口 + * + * @author ChengLong 2020年1月7日 10:50:19 + */ +trait BaseFlinkStreaming extends BaseFlink { + protected var env, senv, flink, fire: StreamExecutionEnvironment = _ + protected var tableEnv: StreamTableEnvironment = _ + override val jobType: JobType = JobType.FLINK_STREAMING + // 用于存放延期的数据 + protected lazy val outputTag = new OutputTag[Any]("later_data") + + + /** + * 构建或合并Configuration + * 注:不同的子类需根据需要复写该方法 + * + * @param conf + * 在conf基础上构建 + * @return + * 合并后的Configuration对象 + */ + override def buildConf(conf: Configuration): Configuration = { + val finalConf = if (conf != null) conf else { + val tmpConf = new Configuration() + PropUtils.settings.foreach(t => tmpConf.setString(t._1, t._2)) + tmpConf + } + finalConf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true) + + this._conf = finalConf + finalConf + } + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param conf + * 用户指定的配置信息 + * @param args + * main方法参数列表 + */ + override def init(conf: Any = null, args: Array[String] = null): Unit = { + super.init(conf, args) + this.process + } + + /** + * 初始化flink运行时环境 + */ + override def createContext(conf: Any): Unit = { + super.createContext(conf) + if (FlinkUtils.isYarnApplicationMode) this.restfulRegister.startRestServer + val finalConf = this.buildConf(conf.asInstanceOf[Configuration]) + if (OSUtils.isLocal) { + this.env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(finalConf) + } else { + this.env = StreamExecutionEnvironment.getExecutionEnvironment + } + this.env.getConfig.setGlobalJobParameters(ParameterTool.fromMap(finalConf.toMap)) + this.configParse(this.env) + this.senv = this.env + val settings = EnvironmentSettings.newInstance.useBlinkPlanner.inStreamingMode.build + this.tableEnv = StreamTableEnvironment.create(this.env, settings) + val tableConfig = 
this.tableEnv.getConfig.getConfiguration + FireFlinkConf.flinkSqlConfig.filter(kv => noEmpty(kv, kv._1, kv._2)).foreach(kv => tableConfig.setString(kv._1, kv._2)) + if (StringUtils.isNotBlank(FireHiveConf.getMetastoreUrl)) { + this.tableEnv.registerCatalog(FireHiveConf.hiveCatalogName, this.hiveCatalog) + } + this.flink = this.env + this.fire = this.flink + FlinkSingletonFactory.setStreamEnv(this.env).setStreamTableEnv(this.tableEnv) + FlinkUtils.loadUdfJar + // 自动注册配置文件中指定的udf函数 + if (FireFlinkConf.flinkUdfEnable) { + FireFlinkConf.flinkUdfList.filter(udf => noEmpty(udf, udf._1, udf._2)).foreach(udf => { + val createFunction = s"CREATE FUNCTION ${udf._1} AS '${udf._2}'" + this.tableEnv.executeSql(createFunction) + logger.info(s"execute sql: $createFunction") + }) + } + } + + /** + * 在加载任务配置文件前将被加载 + */ + override private[fire] def loadConf(): Unit = { + // 加载配置文件 + PropUtils.load(FireFrameworkConf.FLINK_STREAMING_CONF_FILE) + } + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + // 子类复写该方法实现业务处理逻辑 + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/acc/MultiCounterAccumulator.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/acc/MultiCounterAccumulator.scala new file mode 100644 index 0000000..e0826bb --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/acc/MultiCounterAccumulator.scala @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.acc + +import java.util.concurrent.ConcurrentHashMap + +import com.zto.fire.predef._ +import org.apache.flink.api.common.accumulators.{Accumulator, SimpleAccumulator} + + +/** + * flink 自定义多值累加器 + * + * @author ChengLong 2020年1月11日 13:58:15 + * @since 0.4.1 + */ +private[fire] class MultiCounterAccumulator extends SimpleAccumulator[ConcurrentHashMap[String, Long]] { + private[fire] val multiCounter = new ConcurrentHashMap[String, Long]() + + /** + * 向累加器中添加新的值 + * + * @param value + */ + override def add(value: ConcurrentHashMap[String, Long]): Unit = { + this.mergeMap(value) + } + + /** + * 添加一个值到累加器中 + */ + def add(kv: (String, Long)): Unit = { + if (kv != null) { + this.multiCounter.put(kv._1, this.multiCounter.getOrDefault(kv._1, 0) + kv._2) + } + } + + /** + * 获取当前本地的累加器中的值 + * + * @return + * 当前jvm中的累加器值,非全局 + */ + override def getLocalValue: ConcurrentHashMap[String, Long] = { + this.multiCounter + } + + /** + * 清空当前本地的累加值 + */ + override def resetLocal(): Unit = { + this.multiCounter.clear() + } + + /** + * 合并两个累加器中的值 + */ + override def merge(other: Accumulator[ConcurrentHashMap[String, Long], ConcurrentHashMap[String, Long]]): Unit = { + this.mergeMap(other.getLocalValue) + } + + /** + * 用于合并数据到累加器的map中 + * 存在的累加,不存在的直接添加 + */ + private[this] def mergeMap(value: ConcurrentHashMap[String, Long]): Unit = { + if (noEmpty(value)) { + value.foreach(kv => { + this.multiCounter.put(kv._1, this.multiCounter.getOrDefault(kv._1, 0) + kv._2) + }) + } + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/conf/FireFlinkConf.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/conf/FireFlinkConf.scala new file mode 100644 index 0000000..8335f6c --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/conf/FireFlinkConf.scala @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.conf + +import com.zto.fire.common.util.PropUtils + +/** + * flink相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:55 + */ +private[fire] object FireFlinkConf { + lazy val FLINK_AUTO_GENERATE_UID_ENABLE = "flink.auto.generate.uid.enable" + lazy val FLINK_AUTO_TYPE_REGISTRATION_ENABLE = "flink.auto.type.registration.enable" + lazy val FLINK_FORCE_AVRO_ENABLE = "flink.force.avro.enable" + lazy val FLINK_FORCE_KRYO_ENABLE = "flink.force.kryo.enable" + lazy val FLINK_GENERIC_TYPES_ENABLE = "flink.generic.types.enable" + lazy val FLINK_OBJECT_REUSE_ENABLE = "flink.object.reuse.enable" + lazy val FLINK_AUTO_WATERMARK_INTERVAL = "flink.auto.watermark.interval" + lazy val FLINK_CLOSURE_CLEANER_LEVEL = "flink.closure.cleaner.level" + lazy val FLINK_DEFAULT_INPUT_DEPENDENCY_CONSTRAINT = "flink.default.input.dependency.constraint" + lazy val FLINK_EXECUTION_MODE = "flink.execution.mode" + lazy val FLINK_LATENCY_TRACKING_INTERVAL = "flink.latency.tracking.interval" + lazy val FLINK_MAX_PARALLELISM = "flink.max.parallelism" + lazy val FLINK_DEFAULT_PARALLELISM = "flink.default.parallelism" + lazy val FLINK_TASK_CANCELLATION_INTERVAL = "flink.task.cancellation.interval" + lazy val FLINK_TASK_CANCELLATION_TIMEOUT_MILLIS = "flink.task.cancellation.timeout.millis" + lazy val FLINK_USE_SNAPSHOT_COMPRESSION = "flink.use.snapshot.compression" + lazy val FLINK_STREAM_BUFFER_TIMEOUT_MILLIS = "flink.stream.buffer.timeout.millis" + lazy val FLINK_STREAM_NUMBER_EXECUTION_RETRIES = "flink.stream.number.execution.retries" + lazy val FLINK_STREAM_TIME_CHARACTERISTIC = "flink.stream.time.characteristic" + lazy val FLINK_DRIVER_CLASS_NAME = "flink.driver.class.name" + lazy val FLINK_CLIENT_SIMPLE_CLASS_NAME = "flink.client.simple.class.name" + lazy val FLINK_SQL_CONF_UDF_JARS = "flink.sql.conf.pipeline.jars" + lazy val FLINK_SQL_LOG_ENABLE = "flink.sql.log.enable" + + // checkpoint相关配置项 + lazy val FLINK_STREAM_CHECKPOINT_INTERVAL = "flink.stream.checkpoint.interval" + lazy val FLINK_STREAM_CHECKPOINT_MODE = "flink.stream.checkpoint.mode" + lazy val FLINK_STREAM_CHECKPOINT_TIMEOUT = "flink.stream.checkpoint.timeout" + lazy val FLINK_STREAM_CHECKPOINT_MAX_CONCURRENT = "flink.stream.checkpoint.max.concurrent" + lazy val FLINK_STREAM_CHECKPOINT_MIN_PAUSE_BETWEEN = "flink.stream.checkpoint.min.pause.between" + lazy val FLINK_STREAM_CHECKPOINT_PREFER_RECOVERY = "flink.stream.checkpoint.prefer.recovery" + lazy val FLINK_STREAM_CHECKPOINT_TOLERABLE_FAILURE_NUMBER = "flink.stream.checkpoint.tolerable.failure.number" + lazy val FLINK_STREAM_CHECKPOINT_EXTERNALIZED = "flink.stream.checkpoint.externalized" + lazy val FLINK_SQL_WITH_REPLACE_MODE_ENABLE = "flink.sql_with.replaceMode.enable" + + // flink sql相关配置 + lazy val FLINK_SQL_CONF_PREFIX = "flink.sql.conf." + // udf自动注册 + lazy val FLINK_SQL_UDF = "flink.sql.udf." + lazy val FLINK_SQL_UDF_ENABLE = "flink.sql.udf.fireUdf.enable" + lazy val FLINK_SQL_WITH_PREFIX = "flink.sql.with." 
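+  // Illustrative mapping (taken from the sample flink.properties in this change set): a user entry such as
+  //   flink.sql.udf.fireUdf = com.zto.fire.examples.flink.stream.Udf
+  // is collected into flinkUdfList below and, when flinkUdfEnable is true, registered by
+  // BaseFlinkStreaming as: CREATE FUNCTION fireUdf AS 'com.zto.fire.examples.flink.stream.Udf'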
+ + lazy val sqlWithReplaceModeEnable = PropUtils.getBoolean(this.FLINK_SQL_WITH_REPLACE_MODE_ENABLE, false) + lazy val autoGenerateUidEnable = PropUtils.getBoolean(this.FLINK_AUTO_GENERATE_UID_ENABLE, true) + lazy val autoTypeRegistrationEnable = PropUtils.getBoolean(this.FLINK_AUTO_TYPE_REGISTRATION_ENABLE, true) + lazy val forceAvroEnable = PropUtils.getBoolean(this.FLINK_FORCE_AVRO_ENABLE, false) + lazy val forceKryoEnable = PropUtils.getBoolean(this.FLINK_FORCE_KRYO_ENABLE, false) + lazy val genericTypesEnable = PropUtils.getBoolean(this.FLINK_GENERIC_TYPES_ENABLE, false) + lazy val objectReuseEnable = PropUtils.getBoolean(this.FLINK_OBJECT_REUSE_ENABLE, false) + lazy val autoWatermarkInterval = PropUtils.getLong(this.FLINK_AUTO_WATERMARK_INTERVAL, -1) + lazy val closureCleanerLevel = PropUtils.getString(this.FLINK_CLOSURE_CLEANER_LEVEL) + lazy val defaultInputDependencyConstraint = PropUtils.getString(this.FLINK_DEFAULT_INPUT_DEPENDENCY_CONSTRAINT) + lazy val executionMode = PropUtils.getString(this.FLINK_EXECUTION_MODE) + lazy val latencyTrackingInterval = PropUtils.getLong(this.FLINK_LATENCY_TRACKING_INTERVAL, -1) + lazy val maxParallelism = PropUtils.getInt(this.FLINK_MAX_PARALLELISM, 8) + lazy val defaultParallelism = PropUtils.getInt(this.FLINK_DEFAULT_PARALLELISM, -1) + lazy val taskCancellationInterval = PropUtils.getLong(this.FLINK_TASK_CANCELLATION_INTERVAL, -1) + lazy val taskCancellationTimeoutMillis = PropUtils.getLong(this.FLINK_TASK_CANCELLATION_TIMEOUT_MILLIS, -1) + lazy val useSnapshotCompression = PropUtils.getBoolean(this.FLINK_USE_SNAPSHOT_COMPRESSION, false) + lazy val streamBufferTimeoutMillis = PropUtils.getLong(this.FLINK_STREAM_BUFFER_TIMEOUT_MILLIS, -1) + lazy val streamNumberExecutionRetries = PropUtils.getInt(this.FLINK_STREAM_NUMBER_EXECUTION_RETRIES, -1) + lazy val streamTimeCharacteristic = PropUtils.getString(this.FLINK_STREAM_TIME_CHARACTERISTIC, "") + lazy val sqlLogEnable = PropUtils.getBoolean(this.FLINK_SQL_LOG_ENABLE, false) + + // checkpoint相关配置项 + lazy val streamCheckpointInterval = PropUtils.getLong(this.FLINK_STREAM_CHECKPOINT_INTERVAL, -1) + lazy val streamCheckpointMode = PropUtils.getString(this.FLINK_STREAM_CHECKPOINT_MODE, "EXACTLY_ONCE") + lazy val streamCheckpointTimeout = PropUtils.getLong(this.FLINK_STREAM_CHECKPOINT_TIMEOUT, 600000L) + lazy val streamCheckpointMaxConcurrent = PropUtils.getInt(this.FLINK_STREAM_CHECKPOINT_MAX_CONCURRENT, 1) + lazy val streamCheckpointMinPauseBetween = PropUtils.getInt(this.FLINK_STREAM_CHECKPOINT_MIN_PAUSE_BETWEEN, 0) + lazy val streamCheckpointPreferRecovery = PropUtils.getBoolean(this.FLINK_STREAM_CHECKPOINT_PREFER_RECOVERY, false) + lazy val streamCheckpointTolerableTailureNumber = PropUtils.getInt(this.FLINK_STREAM_CHECKPOINT_TOLERABLE_FAILURE_NUMBER, 0) + lazy val streamCheckpointExternalized = PropUtils.getString(this.FLINK_STREAM_CHECKPOINT_EXTERNALIZED, "RETAIN_ON_CANCELLATION") + + // flink sql相关配置 + lazy val flinkSqlConfig = PropUtils.sliceKeys(this.FLINK_SQL_CONF_PREFIX) + // 用于自动注册udf jar包中的函数 + lazy val flinkUdfList = PropUtils.sliceKeys(this.FLINK_SQL_UDF) + // 是否启用fire udf注册功能 + lazy val flinkUdfEnable = PropUtils.getBoolean(this.FLINK_SQL_UDF_ENABLE, true) +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/conf/FlinkEngineConf.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/conf/FlinkEngineConf.scala new file mode 100644 index 0000000..148e182 --- /dev/null +++ 
b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/conf/FlinkEngineConf.scala @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.conf + +import com.zto.fire.common.util.ReflectionUtils +import com.zto.fire.core.conf.EngineConf +import com.zto.fire.flink.util.FlinkUtils +import com.zto.fire.predef._ + +/** + * 获取Flink引擎的所有配置信息 + * + * @author ChengLong + * @since 1.0.0 + * @create 2021-03-02 11:12 + */ +private[fire] class FlinkEngineConf extends EngineConf { + + /** + * 获取Flink引擎的所有配置信息 + */ + override def getEngineConf: Map[String, String] = { + if (FlinkUtils.isJobManager) { + // 如果是JobManager端,则需将flink参数和用户参数进行合并,并从合并后的settings中获取 + val clazz = Class.forName("org.apache.flink.configuration.GlobalConfiguration") + if (ReflectionUtils.containsMethod(clazz, "getSettings")) { + return clazz.getMethod("getSettings").invoke(null).asInstanceOf[JMap[String, String]].toMap + } + } else if (FlinkUtils.isTaskManager) { + // 如果是TaskManager端,则flink会通过EnvironmentInformation将参数进行传递 + val clazz = Class.forName("org.apache.flink.runtime.util.EnvironmentInformation") + if (ReflectionUtils.containsMethod(clazz, "getSettings")) { + return clazz.getMethod("getSettings").invoke(null).asInstanceOf[JMap[String, String]].toMap + } + } + new JHashMap[String, String]().toMap + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/BatchExecutionEnvExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/BatchExecutionEnvExt.scala new file mode 100644 index 0000000..6531dc7 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/BatchExecutionEnvExt.scala @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package com.zto.fire.flink.ext.batch + +import com.zto.fire.common.util.ValueUtils +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment} + +import scala.reflect.ClassTag + +/** + * 用于flink ExecutionEnvironment API库扩展 + * + * @author ChengLong 2020年1月9日 13:52:16 + * @since 0.4.1 + */ +class BatchExecutionEnvExt(env: ExecutionEnvironment) { + + /** + * 提交job执行 + * + * @param jobName + * job名称 + */ + def start(jobName: String = ""): Unit = { + if (ValueUtils.isEmpty(jobName)) this.env.execute() else this.env.execute(jobName) + } + + /** + * 使用集合元素创建DataSet + * @param seq + * 元素集合 + * @tparam T + * 元素的类型 + */ + def parallelize[T: TypeInformation: ClassTag](seq: Seq[T], parallelism: Int = this.env.getParallelism): DataSet[T] = { + this.env.fromCollection[T](seq).setParallelism(parallelism) + } + + /** + * 使用集合元素创建DataSet + * @param seq + * 元素集合 + * @tparam T + * 元素的类型 + */ + def createCollectionDataSet[T: TypeInformation: ClassTag](seq: Seq[T], parallelism: Int = this.env.getParallelism): DataSet[T] = this.parallelize[T](seq, parallelism) +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/BatchTableEnvExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/BatchTableEnvExt.scala new file mode 100644 index 0000000..48b4ab2 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/BatchTableEnvExt.scala @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.batch + +import com.zto.fire.jdbc.JdbcConnectorBridge +import org.apache.flink.table.api.Table +import org.apache.flink.table.api.bridge.scala.BatchTableEnvironment + +/** + * 用于flink BatchTableEnvironment API库扩展 + * + * @author ChengLong 2020年1月9日 13:52:16 + * @since 0.4.1 + */ +class BatchTableEnvExt(env: BatchTableEnvironment) extends JdbcConnectorBridge { + + /** + * 执行sql query操作 + * + * @param sql + * sql语句 + * @return + * table对象 + */ + def sql(sql: String): Table = { + this.env.sqlQuery(sql) + } + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/DataSetExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/DataSetExt.scala new file mode 100644 index 0000000..3126475 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/batch/DataSetExt.scala @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.batch + +import com.zto.fire.flink.util.FlinkSingletonFactory +import org.apache.flink.api.scala.DataSet +import org.apache.flink.table.api.Table + +/** + * 用于对Flink DataSet的API库扩展 + * + * @author ChengLong 2020年1月15日 16:35:03 + * @since 0.4.1 + */ +class DataSetExt[T](dataSet: DataSet[T]){ + lazy val tableEnv = FlinkSingletonFactory.getBatchTableEnv + + /** + * 将DataSet注册为临时表 + * + * @param tableName + * 临时表的表名 + */ + def createOrReplaceTempView(tableName: String): Table = { + val table = this.tableEnv.fromDataSet(this.dataSet) + this.tableEnv.createTemporaryView(tableName, table) + table + } + + /** + * 设置并行度 + */ + def repartition(parallelism: Int): DataSet[T] = { + this.dataSet.setParallelism(parallelism) + } + + /** + * 将DataSet转为Table + */ + def toTable: Table = { + this.tableEnv.fromDataSet(this.dataSet) + } + + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/function/FireMapFunction.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/function/FireMapFunction.scala new file mode 100644 index 0000000..84db3b2 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/function/FireMapFunction.scala @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.ext.function + +import java.io.File +import java.lang + +import com.zto.fire._ +import org.apache.flink.api.common.functions._ +import org.apache.flink.api.common.state._ +import org.apache.flink.util.Collector +import org.slf4j.LoggerFactory + +import scala.reflect.ClassTag + +/** + * 增强的MapFunction + * + * @tparam I 输入数据类型 + * @tparam O 输出数据类型(map后) + * @author ChengLong 2021-01-04 09:39:55 + */ +abstract class FireMapFunction[I, O] extends AbstractRichFunction with MapFunction[I, O] with MapPartitionFunction[I, O] with FlatMapFunction[I, O] { + protected lazy val logger = LoggerFactory.getLogger(this.getClass) + protected lazy val runtimeContext = this.getRuntimeContext() + private[this] lazy val stateMap = new JConcurrentHashMap[String, State]() + + /** + * 根据name获取ValueState + */ + protected def getState[T: ClassTag](name: String, ttlConfig: StateTtlConfig = null): ValueState[T] = { + this.stateMap.mergeGet(name) { + val desc = new ValueStateDescriptor[T](name, getParamType[T]) + if (ttlConfig != null) desc.enableTimeToLive(ttlConfig) + this.runtimeContext.getState[T](desc) + }.asInstanceOf[ValueState[T]] + } + + /** + * 根据name获取ListState + */ + protected def getListState[T: ClassTag](name: String, ttlConfig: StateTtlConfig = null): ListState[T] = { + this.stateMap.mergeGet(name) { + val desc = new ListStateDescriptor[T](name, getParamType[T]) + if (ttlConfig != null) desc.enableTimeToLive(ttlConfig) + this.runtimeContext.getListState[T](desc) + }.asInstanceOf[ListState[T]] + } + + /** + * 根据name获取MapState + */ + protected def getMapState[K: ClassTag, V: ClassTag](name: String, ttlConfig: StateTtlConfig = null): MapState[K, V] = { + this.stateMap.mergeGet(name) { + val desc = new MapStateDescriptor[K, V](name, getParamType[K], getParamType[V]) + if (ttlConfig != null) desc.enableTimeToLive(ttlConfig) + this.runtimeContext.getMapState[K, V](desc) + }.asInstanceOf[MapState[K, V]] + } + + /** + * 根据name获取ReducingState + */ + protected def getReducingState[T: ClassTag](name: String, reduceFun: (T, T) => T, ttlConfig: StateTtlConfig = null): ReducingState[T] = { + this.stateMap.mergeGet(name) { + val desc = new ReducingStateDescriptor[T](name, new ReduceFunction[T] { + override def reduce(value1: T, value2: T): T = reduceFun(value1, value2) + }, getParamType[T]) + if (ttlConfig != null) desc.enableTimeToLive(ttlConfig) + this.runtimeContext.getReducingState[T](desc) + }.asInstanceOf[ReducingState[T]] + } + + /** + * 根据name获取AggregatingState + */ + protected def getAggregatingState[T: ClassTag](name: String, aggFunction: AggregateFunction[I, T, O], ttlConfig: StateTtlConfig = null): AggregatingState[I, O] = { + this.stateMap.mergeGet(name) { + val desc = new AggregatingStateDescriptor[I, T, O](name, aggFunction, getParamType[T]) + if (ttlConfig != null) desc.enableTimeToLive(ttlConfig) + this.runtimeContext.getAggregatingState(desc) + }.asInstanceOf[AggregatingState[I, O]] + } + + /** + * 根据name获取广播变量 + * + * @param name 广播变量名称 + * @tparam T + * 广播变量的类型 + * @return + * 广播变量引用 + */ + protected def getBroadcastVariable[T](name: String): Seq[T] = { + requireNonEmpty(name)("广播变量名称不能为空") + this.runtimeContext.getBroadcastVariable[T](name) + } + + /** + * 将值添加到指定的累加器中 + * + * @param name + * 累加器名称 + * @param value + * 待累加的值 + * @tparam T + * 累加值的类型(Int/Long/Double) + */ + protected def addCounter[T: ClassTag](name: String, value: T): Unit = { + requireNonEmpty(name, value) + getParamType[T] match { + case valueType if valueType eq classOf[Int] => 
this.runtimeContext.getIntCounter(name).add(value.asInstanceOf[Int]) + case valueType if valueType eq classOf[Long] => this.runtimeContext.getLongCounter(name).add(value.asInstanceOf[Long]) + case valueType if valueType eq classOf[Double] => this.runtimeContext.getDoubleCounter(name).add(value.asInstanceOf[Double]) + } + } + + + /** + * 根据文件名获取分布式缓存文件 + * + * @param fileName 缓存文件名称 + * @return 被缓存的文件 + */ + protected def DistributedCache(fileName: String): File = { + requireNonEmpty(fileName)("分布式缓存文件名称不能为空!") + this.runtimeContext.getDistributedCache.getFile(fileName) + } + + + override def map(t: I): O = null.asInstanceOf[O] + + override def mapPartition(iterable: lang.Iterable[I], collector: Collector[O]): Unit = {} + + override def flatMap(t: I, collector: Collector[O]): Unit = {} +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/provider/HBaseConnectorProvider.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/provider/HBaseConnectorProvider.scala new file mode 100644 index 0000000..36116a1 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/provider/HBaseConnectorProvider.scala @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.ext.provider + +import com.zto.fire._ +import com.zto.fire.hbase.bean.HBaseBaseBean +import org.apache.flink.streaming.api.datastream.DataStreamSink +import org.apache.flink.streaming.api.scala.DataStream +import org.apache.flink.table.api.Table +import org.apache.flink.types.Row + +import scala.reflect.ClassTag + +/** + * 为上层扩展层提供HBaseConnector API + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-24 10:16 + */ +trait HBaseConnectorProvider { + + /** + * hbase批量sink操作,DataStream[T]中的T必须是HBaseBaseBean的子类 + * + * @param tableName + * hbase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def hbasePutDS[T <: HBaseBaseBean[T]: ClassTag](stream: DataStream[T], + tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1): DataStreamSink[_] = { + stream.hbasePutDS(tableName, batch, flushInterval, keyNum) + } + + /** + * hbase批量sink操作,DataStream[T]中的T必须是HBaseBaseBean的子类 + * + * @param tableName + * hbase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + * @param fun + * 将dstream中的数据映射为该sink组件所能处理的数据 + */ + def hbasePutDS2[T <: HBaseBaseBean[T] : ClassTag](stream: DataStream[T], + tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1)(fun: T => T): DataStreamSink[_] = { + stream.hbasePutDS2[T](tableName, batch, flushInterval, keyNum)(fun) + } + + /** + * table的hbase批量sink操作,该api需用户定义row的取数规则,并映射到对应的HBaseBaseBean的子类中 + * + * @param tableName + * HBase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def hbasePutTable[T <: HBaseBaseBean[T]: ClassTag](table: Table, + tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1): DataStreamSink[_] = { + table.hbasePutTable[T](tableName, batch, flushInterval, keyNum) + } + + /** + * table的hbase批量sink操作,该api需用户定义row的取数规则,并映射到对应的HBaseBaseBean的子类中 + * + * @param tableName + * HBase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def hbasePutTable2[T <: HBaseBaseBean[T]: ClassTag](table: Table, + tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1)(fun: Row => T): DataStreamSink[_] = { + table.hbasePutTable2[T](tableName, batch, flushInterval, keyNum)(fun) + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/provider/JdbcFlinkProvider.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/provider/JdbcFlinkProvider.scala new file mode 100644 index 0000000..6d9d607 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/provider/JdbcFlinkProvider.scala @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.provider + +import com.zto.fire._ +import org.apache.flink.streaming.api.datastream.DataStreamSink +import org.apache.flink.streaming.api.scala.DataStream +import org.apache.flink.table.api.Table +import org.apache.flink.types.Row + +/** + * 为上层扩展层提供JDBC相关API + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-24 10:18 + */ +trait JdbcFlinkProvider { + + /** + * jdbc批量sink操作,根据用户指定的DataStream中字段的顺序,依次填充到sql中的占位符所对应的位置 + * 注: + * 1. fieldList指定DataStream中JavaBean的字段名称,非jdbc表中的字段名称 + * 2. fieldList多个字段使用逗号分隔 + * 3. fieldList中的字段顺序要与sql中占位符顺序保持一致,数量一致 + * + * @param sql + * 增删改sql + * @param fields + * DataStream中数据的每一列的列名(非数据库中的列名,需与sql中占位符的顺序一致) + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def jdbcBatchUpdateStream[T](stream: DataStream[T], + sql: String, + fields: Seq[String], + batch: Int = 10, + flushInterval: Long = 1000, + keyNum: Int = 1): DataStreamSink[T] = { + stream.jdbcBatchUpdate(sql, fields, batch, flushInterval, keyNum) + } + + /** + * jdbc批量sink操作 + * + * @param sql + * 增删改sql + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + * @param fun + * 将dstream中的数据映射为该sink组件所能处理的数据 + */ + def jdbcBatchUpdateStream2[T](stream: DataStream[T], + sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + keyNum: Int = 1)(fun: T => Seq[Any]): DataStreamSink[T] = { + stream.jdbcBatchUpdate2(sql, batch, flushInterval, keyNum)(fun) + } + + /** + * table的jdbc批量sink操作,根据用户指定的Row中字段的顺序,依次填充到sql中的占位符所对应的位置 + * 注: + * 1. Row中的字段顺序要与sql中占位符顺序保持一致,数量一致 + * 2. 目前仅处理Retract中的true消息,用户需手动传入merge语句 + * + * @param sql + * 增删改sql + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def jdbcBatchUpdateTable(table: Table, + sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + isMerge: Boolean = true, + keyNum: Int = 1): DataStreamSink[Row] = { + table.jdbcBatchUpdate(sql, batch, flushInterval, isMerge, keyNum) + } + + /** + * table的jdbc批量sink操作,该api需用户定义row的取数规则,并与sql中的占位符对等 + * + * @param sql + * 增删改sql + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def jdbcBatchUpdateTable2(table: Table, + sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + isMerge: Boolean = true, + keyNum: Int = 1)(fun: Row => Seq[Any]): DataStreamSink[Row] = { + table.jdbcBatchUpdate2(sql, batch, flushInterval, isMerge, keyNum)(fun) + } + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/DataStreamExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/DataStreamExt.scala new file mode 100644 index 0000000..baaa66e --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/DataStreamExt.scala @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.stream + +import java.lang.reflect.Field + +import com.zto.fire.common.util.ReflectionUtils +import com.zto.fire.flink.sink.{HBaseSink, JdbcSink} +import com.zto.fire.flink.util.FlinkSingletonFactory +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire._ +import com.zto.fire.hbase.HBaseConnector +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.common.accumulators.SimpleAccumulator +import org.apache.flink.api.common.functions.RichMapFunction +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.datastream.DataStreamSink +import org.apache.flink.streaming.api.scala.function.AllWindowFunction +import org.apache.flink.streaming.api.scala.{DataStream, _} +import org.apache.flink.streaming.api.windowing.windows.GlobalWindow +import org.apache.flink.table.api.Table +import org.apache.flink.table.api.bridge.scala._ +import org.apache.flink.types.Row +import org.apache.flink.util.Collector + +import scala.collection.mutable.ListBuffer +import scala.reflect.ClassTag + +/** + * 用于对Flink DataStream的API库扩展 + * + * @author ChengLong 2020年1月7日 09:18:21 + * @since 0.4.1 + */ +class DataStreamExt[T](stream: DataStream[T]) { + lazy val tableEnv = FlinkSingletonFactory.getStreamTableEnv + + /** + * 将流注册为临时表 + * + * @param tableName + * 临时表的表名 + */ + def createOrReplaceTempView(tableName: String): Table = { + val table = this.stream.toTable(this.tableEnv) + this.tableEnv.createTemporaryView(tableName, table) + table + } + + /** + * 为当前DataStream设定uid与name + * + * @param uid + * uid + * @param name + * name + * @return + * 当前实例 + */ + def uname(uid: String, name: String = ""): DataStream[T] = { + if (StringUtils.isNotBlank(uid)) stream.uid(uid) + if (StringUtils.isNotBlank(name)) stream.name(name) + this.stream + } + + /** + * 预先注册flink累加器 + * + * @param acc + * 累加器实例 + * @param name + * 累加器名称 + * @return + * 注册累加器之后的流 + */ + def registerAcc(acc: SimpleAccumulator[_], name: String): DataStream[String] = { + this.stream.map(new RichMapFunction[T, String] { + override def open(parameters: Configuration): Unit = { + this.getRuntimeContext.addAccumulator(name, acc) + } + + override def map(value: T): String = value.toString + }) + } + + /** + * 将流映射为批流 + * + * @param count + * 将指定数量的合并为一个集合 + */ + def countWindowSimple[T: ClassTag](count: Long): DataStream[List[T]] = { + implicit val typeInfo = TypeInformation.of(classOf[List[T]]) + stream.asInstanceOf[DataStream[T]].countWindowAll(Math.abs(count)).apply(new AllWindowFunction[T, List[T], GlobalWindow]() { + override def apply(window: GlobalWindow, input: Iterable[T], out: Collector[List[T]]): Unit = { + out.collect(input.toList) + } + })(typeInfo) + } + + /** + * 设置并行度 + */ + def repartition(parallelism: 
Int): DataStream[T] = { + this.stream.setParallelism(parallelism) + } + + /** + * 将DataStream转为Table + */ + def toTable: Table = { + this.tableEnv.fromDataStream(this.stream) + } + + /** + * jdbc批量sink操作,根据用户指定的DataStream中字段的顺序,依次填充到sql中的占位符所对应的位置 + * 若DataStream为DataStream[Row]类型,则fields可以为空,但此时row中每列的顺序要与sql占位符顺序一致,数量和类型也要一致 + * 注: + * 1. fieldList指定DataStream中JavaBean的字段名称,非jdbc表中的字段名称 + * 2. fieldList多个字段使用逗号分隔 + * 3. fieldList中的字段顺序要与sql中占位符顺序保持一致,数量一致 + * + * @param sql + * 增删改sql + * @param fields + * DataStream中数据的每一列的列名(非数据库中的列名,需与sql中占位符的顺序一致) + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def jdbcBatchUpdate(sql: String, + fields: Seq[String], + batch: Int = 10, + flushInterval: Long = 1000, + keyNum: Int = 1): DataStreamSink[T] = { + this.stream.addSink(new JdbcSink[T](sql, batch = batch, flushInterval = flushInterval, keyNum = keyNum) { + var fieldMap: java.util.Map[String, Field] = _ + var clazz: Class[_] = _ + + override def map(value: T): Seq[Any] = { + requireNonEmpty(sql)("sql语句不能为空") + + val params = ListBuffer[Any]() + if (value.isInstanceOf[Row] || value.isInstanceOf[Tuple2[Boolean, Row]]) { + // 如果是Row类型的DataStream[Row] + val row = if (value.isInstanceOf[Row]) value.asInstanceOf[Row] else value.asInstanceOf[Tuple2[Boolean, Row]]._2 + for (i <- 0 until row.getArity) { + params += row.getField(i) + } + } else { + requireNonEmpty(fields)("字段列表不能为空!需按照sql中的占位符顺序依次指定当前DataStream中数据字段的名称") + + if (clazz == null && value != null) { + clazz = value.getClass + fieldMap = ReflectionUtils.getAllFields(clazz) + } + + fields.foreach(fieldName => { + val field = this.fieldMap.get(StringUtils.trim(fieldName)) + requireNonEmpty(field)(s"当前DataStream中不存在该列名$fieldName,请检查!") + params += field.get(value) + }) + } + params + } + }).name("fire jdbc stream sink") + } + + /** + * jdbc批量sink操作 + * + * @param sql + * 增删改sql + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + * @param fun + * 将dstream中的数据映射为该sink组件所能处理的数据 + */ + def jdbcBatchUpdate2(sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + keyNum: Int = 1)(fun: T => Seq[Any]): DataStreamSink[T] = { + this.stream.addSink(new JdbcSink[T](sql, batch = batch, flushInterval = flushInterval, keyNum = keyNum) { + override def map(value: T): Seq[Any] = { + fun(value) + } + }).name("fire jdbc stream sink") + } + + /** + * hbase批量sink操作,DataStream[T]中的T必须是HBaseBaseBean的子类 + * + * @param tableName + * hbase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def hbasePutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1): DataStreamSink[_] = { + this.hbasePutDS2[E](tableName, batch, flushInterval, keyNum) { + value => { + value.asInstanceOf[E] + } + } + } + + /** + * hbase批量sink操作,DataStream[T]中的T必须是HBaseBaseBean的子类 + * + * @param tableName + * hbase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + * @param fun + * 将dstream中的数据映射为该sink组件所能处理的数据 + */ + def hbasePutDS2[E <: HBaseBaseBean[E] : ClassTag](tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1)(fun: T => E): DataStreamSink[_] = { + HBaseConnector.checkClass[E]() + this.stream.addSink(new HBaseSink[T, E](tableName, batch, flushInterval, keyNum) { + /** + * 将数据构建成sink的格式 + */ + override def map(value: T): E = 
fun(value) + }).name("fire hbase stream sink") + } + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/KeyedStreamExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/KeyedStreamExt.scala new file mode 100644 index 0000000..aa75f2a --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/KeyedStreamExt.scala @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.stream + +import org.apache.flink.streaming.api.TimeCharacteristic +import org.apache.flink.streaming.api.scala.{KeyedStream, WindowedStream} +import org.apache.flink.streaming.api.windowing.assigners._ +import org.apache.flink.streaming.api.windowing.time.Time +import org.apache.flink.streaming.api.windowing.windows.Window + +/** + * 用于对Flink KeyedStream的API库扩展 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-01-15 10:20 + */ +class KeyedStreamExt[T, K](keyedStream: KeyedStream[T, K]) { + + /** + * 创建滑动窗口 + * + * @param size + * 窗口的大小 + * @param slide + * 窗口滑动间隔 + * @param offset + * 时区 + * @param timeCharacteristic 时间类别 + */ + def slidingTimeWindow[W <: Window](size: Time, slide: Time, offset: Time = Time.milliseconds(0), timeCharacteristic: TimeCharacteristic = TimeCharacteristic.ProcessingTime): WindowedStream[T, K, W] = { + if (timeCharacteristic == TimeCharacteristic.EventTime) { + keyedStream.window(SlidingEventTimeWindows.of(size, slide, offset).asInstanceOf[WindowAssigner[T, W]]) + } else { + keyedStream.window(SlidingProcessingTimeWindows.of(size, slide, offset).asInstanceOf[WindowAssigner[T, W]]) + } + } + + /** + * 创建滚动窗口窗口 + * + * @param size + * 窗口的大小 + * @param offset + * 时区 + * @param timeCharacteristic 时间类别 + */ + def tumblingTimeWindow[W <: Window](size: Time, offset: Time = Time.milliseconds(0), timeCharacteristic: TimeCharacteristic = TimeCharacteristic.ProcessingTime): WindowedStream[T, K, W] = { + if (timeCharacteristic == TimeCharacteristic.EventTime) { + keyedStream.window(TumblingEventTimeWindows.of(size, offset).asInstanceOf[WindowAssigner[T, W]]) + } else { + keyedStream.window(TumblingProcessingTimeWindows.of(size, offset).asInstanceOf[WindowAssigner[T, W]]) + } + } + + /** + * 创建session会话窗口 + * + * @param size + * 超时时间 + * @param timeCharacteristic 时间类别 + */ + def sessionTimeWindow[W <: Window](size: Time, timeCharacteristic: TimeCharacteristic = TimeCharacteristic.ProcessingTime): WindowedStream[T, K, W] = { + if (timeCharacteristic == TimeCharacteristic.EventTime) { + keyedStream.window(EventTimeSessionWindows.withGap(size).asInstanceOf[WindowAssigner[T, W]]) + } else { + keyedStream.window(ProcessingTimeSessionWindows.withGap(size).asInstanceOf[WindowAssigner[T, W]]) + } + } +} diff --git 
a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/RowExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/RowExt.scala new file mode 100644 index 0000000..8af28c1 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/RowExt.scala @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.stream + +import com.zto.fire.flink.bean.FlinkTableSchema +import com.zto.fire.flink.util.FlinkUtils +import org.apache.flink.types.Row + +/** + * 用于flink Row API库扩展 + * + * @author ChengLong 2020年3月30日 17:00:05 + * @since 0.4.1 + */ +class RowExt(row: Row) { + + /** + * 将flink的row转为指定类型的JavaBean + * @param schema + * 表的schema + * @param clazz + * 目标JavaBean类型 + */ + def rowToBean[T](schema: FlinkTableSchema, clazz: Class[T]): T = { + FlinkUtils.rowToBean(schema, row, clazz) + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/SQLExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/SQLExt.scala new file mode 100644 index 0000000..95169ce --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/SQLExt.scala @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.ext.stream + +import com.zto.fire.common.util.PropUtils +import com.zto.fire.flink.conf.FireFlinkConf +import com.zto.fire.{noEmpty, requireNonEmpty} +import org.slf4j.LoggerFactory + +/** + * Flink SQL扩展类 + * + * @author ChengLong 2021-4-23 10:36:49 + * @since 2.0.0 + */ +class SQLExt(sql: String) { + // 用于匹配Flink SQL中的with表达式 + private[this] lazy val withPattern = """(with|WITH)\s*\(([\s\S]*)(\)|;)$""".r + // 用于匹配Flink SQL中的create语句 + private[this] lazy val createTablePattern = """^\s*(create|CREATE)\s+(table|TABLE)""".r + + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 将给定的不包含with表达式的Flink SQL添加with表达式 + * + * @param keyNum + * with表达式在配置文件中声明的keyNum,小于零时,则表示不拼接with表达式 + * @return + * 组装了with表达式的Flink SQL文本 + */ + def with$(keyNum: Int = 1): String = { + requireNonEmpty(sql, "sql语句不能为空!") + + if (keyNum < 1) return sql + val withMatcher = withPattern.findFirstIn(sql) + + // 如果SQL中已有with表达式,并且未开启with替换功能,则直接返回传入的sql + if (withMatcher.isDefined && !FireFlinkConf.sqlWithReplaceModeEnable) { + logger.warn(s"sql中已经包含with表达式,请移除后再使用动态with替换功能,或将[${FireFlinkConf.FLINK_SQL_WITH_REPLACE_MODE_ENABLE}]置为true进行强制覆盖,当前with表达式:\n${withMatcher.get}") + if (FireFlinkConf.sqlLogEnable) logger.info(s"完整SQL语句:$sql") + return sql + } + + // 仅匹配create table语句,进行with表达式处理 + val createTableMatcher = this.createTablePattern.findFirstIn(sql) + if (createTableMatcher.isEmpty) return sql + + // 从配置文件中获取指定keyNum的with参数 + val withMap = PropUtils.sliceKeysByNum(FireFlinkConf.FLINK_SQL_WITH_PREFIX, keyNum) + if (withMap.isEmpty) throw new IllegalArgumentException(s"配置文件中未找到以${FireFlinkConf.FLINK_SQL_WITH_PREFIX}开头以${keyNum}结尾的配置信息!") + + // 如果开启with表达式强制替换功能,则将sql中with表达式移除 + val fixSql = if (withMatcher.isDefined && FireFlinkConf.sqlWithReplaceModeEnable) withPattern.replaceAllIn(sql, "") else sql + val finalSQL = buildWith(fixSql, withMap) + if (FireFlinkConf.sqlLogEnable) logger.info(s"完整SQL语句:$finalSQL") + finalSQL + } + + /** + * 根据给定的配置列表构建Flink SQL with表达式 + * + * @param map + * Flink SQL with配置列表 + * @return + * with sql表达式 + */ + private[fire] def buildWith(sql: String, map: Map[String, String]): String = { + val withSql = new StringBuilder() + withSql.append(sql).append("WITH (\n") + map.filter(conf => noEmpty(conf, conf._1, conf._2)).foreach(conf => { + withSql + .append(s"""\t'${conf._1}'=""") + .append(s"'${conf._2}'") + .append(",\n") + }) + withSql.substring(0, withSql.length - 2) + "\n)" + } +} \ No newline at end of file diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/StreamExecutionEnvExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/StreamExecutionEnvExt.scala new file mode 100644 index 0000000..e307747 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/StreamExecutionEnvExt.scala @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.stream + +import java.util.Properties +import com.zto.fire._ +import com.zto.fire.common.conf.{FireKafkaConf, FireRocketMQConf} +import com.zto.fire.common.util.{KafkaUtils, ValueUtils} +import com.zto.fire.core.Api +import com.zto.fire.flink.ext.provider.{HBaseConnectorProvider, JdbcFlinkProvider} +import com.zto.fire.flink.util.{FlinkSingletonFactory, RocketMQUtils} +import com.zto.fire.jdbc.JdbcConnectorBridge +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.common.JobExecutionResult +import org.apache.flink.api.common.functions.RuntimeContext +import org.apache.flink.api.common.serialization.SimpleStringSchema +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.tuple.Tuple2 +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.datastream.DataStreamSource +import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment} +import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer +import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition +import org.apache.flink.table.api.{Table, TableResult} +import org.apache.rocketmq.flink.{RocketMQConfig, RocketMQSource} +import org.apache.rocketmq.flink.common.serialization.{KeyValueDeserializationSchema, SimpleTupleDeserializationSchema} +import org.apache.rocketmq.flink.serialization.SimpleTagKeyValueDeserializationSchema + +import scala.collection.JavaConversions + +/** + * 用于对Flink StreamExecutionEnvironment的API库扩展 + * + * @author ChengLong 2020年1月7日 09:18:21 + * @since 0.4.1 + */ +class StreamExecutionEnvExt(env: StreamExecutionEnvironment) extends Api with JdbcConnectorBridge + with HBaseConnectorProvider with JdbcFlinkProvider { + private[fire] lazy val tableEnv = FlinkSingletonFactory.getStreamTableEnv + + /** + * 创建Socket流 + */ + def createSocketTextStream(hostname: String, port: Int, delimiter: Char = '\n', maxRetry: Long = 0): DataStream[String] = { + this.env.socketTextStream(hostname, port, delimiter, maxRetry) + } + + /** + * 根据配置信息创建Kafka Consumer + * + * @param kafkaParams + * kafka相关的配置参数 + * @return + * FlinkKafkaConsumer011 + */ + def createKafkaConsumer(kafkaParams: Map[String, Object] = null, + topics: Set[String] = null, + keyNum: Int = 1): FlinkKafkaConsumer[String] = { + val confTopics = FireKafkaConf.kafkaTopics(keyNum) + val topicList = if (StringUtils.isNotBlank(confTopics)) confTopics.split(",") else topics.toArray + require(topicList != null && topicList.nonEmpty, s"kafka topic不能为空,请在配置文件中指定:flink.kafka.topics$keyNum") + + val confKafkaParams = KafkaUtils.kafkaParams(kafkaParams, FlinkSingletonFactory.getAppName, keyNum = keyNum) + // 配置文件中相同的key优先级高于代码中的 + require(confKafkaParams.nonEmpty, "kafka相关配置不能为空!") + val properties = new Properties() + confKafkaParams.foreach(t => properties.setProperty(t._1, t._2.toString)) + + val kafkaConsumer = new FlinkKafkaConsumer[String](JavaConversions.seqAsJavaList(topicList.map(topic => StringUtils.trim(topic))), + new SimpleStringSchema(), properties) + kafkaConsumer + } + + /** + * 
创建DStream流 + * + * @param kafkaParams + * kafka相关的配置参数 + * @return + * DStream + */ + def createDirectStream(kafkaParams: Map[String, Object] = null, + topics: Set[String] = null, + specificStartupOffsets: Map[KafkaTopicPartition, java.lang.Long] = null, + runtimeContext: RuntimeContext = null, + keyNum: Int = 1): DataStream[String] = { + + val kafkaConsumer = this.createKafkaConsumer(kafkaParams, topics, keyNum) + + if (runtimeContext != null) kafkaConsumer.setRuntimeContext(runtimeContext) + if (specificStartupOffsets != null) kafkaConsumer.setStartFromSpecificOffsets(specificStartupOffsets) + // 设置从指定时间戳位置开始消费kafka + val startFromTimeStamp = FireKafkaConf.kafkaStartFromTimeStamp(keyNum) + if (startFromTimeStamp > 0) kafkaConsumer.setStartFromTimestamp(FireKafkaConf.kafkaStartFromTimeStamp(keyNum)) + // 是否在checkpoint时记录offset值 + kafkaConsumer.setCommitOffsetsOnCheckpoints(FireKafkaConf.kafkaCommitOnCheckpoint(keyNum)) + // 设置从最早的位置开始消费 + if (FireKafkaConf.offsetSmallest.equalsIgnoreCase(FireKafkaConf.kafkaStartingOffset(keyNum))) kafkaConsumer.setStartFromEarliest() + // 设置从最新位置开始消费 + if (FireKafkaConf.offsetLargest.equalsIgnoreCase(FireKafkaConf.kafkaStartingOffset(keyNum))) kafkaConsumer.setStartFromLatest() + // 从topic中指定的group上次消费的位置开始消费,必须配置group.id参数 + if (FireKafkaConf.kafkaStartFromGroupOffsets(keyNum)) kafkaConsumer.setStartFromGroupOffsets() + + this.env.addSource(kafkaConsumer) + } + + /** + * 创建DStream流 + * + * @param kafkaParams + * kafka相关的配置参数 + * @return + * DStream + */ + def createKafkaDirectStream(kafkaParams: Map[String, Object] = null, + topics: Set[String] = null, + specificStartupOffsets: Map[KafkaTopicPartition, java.lang.Long] = null, + runtimeContext: RuntimeContext = null, + keyNum: Int = 1): DataStream[String] = { + this.createDirectStream(kafkaParams, topics, specificStartupOffsets, runtimeContext, keyNum) + } + + /** + * 构建RocketMQ拉取消息的DStream流,获取消息中的tag、key以及value + * + * @param rocketParam + * rocketMQ相关消费参数 + * @param groupId + * groupId + * @param topics + * topic列表 + * @return + * rocketMQ DStream + */ + def createRocketMqPullStreamWithTag(rocketParam: Map[String, String] = null, + groupId: String = null, + topics: String = null, + tag: String = null, + keyNum: Int = 1): DataStream[(String, String, String)] = { + // 获取topic信息,配置文件优先级高于代码中指定的 + val confTopics = FireRocketMQConf.rocketTopics(keyNum) + val finalTopics = if (StringUtils.isNotBlank(confTopics)) confTopics else topics + require(StringUtils.isNotBlank(finalTopics), s"RocketMQ的Topics不能为空,请在配置文件中指定:rocket.topics$keyNum") + + // groupId信息 + val confGroupId = FireRocketMQConf.rocketGroupId(keyNum) + val finalGroupId = if (StringUtils.isNotBlank(confGroupId)) confGroupId else groupId + require(StringUtils.isNotBlank(finalGroupId), s"RocketMQ的groupId不能为空,请在配置文件中指定:rocket.group.id$keyNum") + + // 详细的RocketMQ配置信息 + val finalRocketParam = RocketMQUtils.rocketParams(rocketParam, finalTopics, finalGroupId, rocketNameServer = null, tag = tag, keyNum) + require(!finalRocketParam.isEmpty, "RocketMQ相关配置不能为空!") + require(finalRocketParam.containsKey(RocketMQConfig.NAME_SERVER_ADDR), s"RocketMQ nameserver.address不能为空,请在配置文件中指定:rocket.brokers.name$keyNum") + // require(finalRocketParam.containsKey(RocketMQConfig.CONSUMER_TAG), s"RocketMQ tag不能为空,请在配置文件中指定:rocket.consumer.tag$keyNum") + + val props = new Properties() + props.putAll(finalRocketParam) + + this.env.addSource(new RocketMQSource[(String, String, String)](new SimpleTagKeyValueDeserializationSchema, props)).name("RocketMQ Source") + } + + /** + * 
构建RocketMQ拉取消息的DStream流,仅获取消息体中的key和value + * + * @param rocketParam + * rocketMQ相关消费参数 + * @param groupId + * groupId + * @param topics + * topic列表 + * @return + * rocketMQ DStream + */ + def createRocketMqPullStreamWithKey(rocketParam: Map[String, String] = null, + groupId: String = null, + topics: String = null, + tag: String = null, + keyNum: Int = 1): DataStream[(String, String)] = { + this.createRocketMqPullStreamWithTag(rocketParam, groupId, topics, tag, keyNum).map(t => (t._2, t._3)) + } + + /** + * 构建RocketMQ拉取消息的DStream流,仅获取消息体中的value + * + * @param rocketParam + * rocketMQ相关消费参数 + * @param groupId + * groupId + * @param topics + * topic列表 + * @return + * rocketMQ DStream + */ + def createRocketMqPullStream(rocketParam: Map[String, String] = null, + groupId: String = null, + topics: String = null, + tag: String = null, + keyNum: Int = 1): DataStream[String] = { + this.createRocketMqPullStreamWithTag(rocketParam, groupId, topics, tag, keyNum).map(t => t._3) + } + + /** + * 执行sql query操作 + * + * @param sql + * sql语句 + * @param keyNum + * 指定sql的with列表对应的配置文件中key的值,如果为<0则表示不从配置文件中读取with表达式 + * @return + * table对象 + */ + def sqlQuery(sql: String, keyNum: Int = 0): Table = { + require(StringUtils.isNotBlank(sql), "待执行的sql语句不能为空") + this.tableEnv.sqlQuery(sql.with$(keyNum)) + } + + /** + * 执行sql语句 + * 支持DDL、DML + * @param keyNum + * 指定sql的with列表对应的配置文件中key的值,如果为<0则表示不从配置文件中读取with表达式 + */ + def sql(sql: String, keyNum: Int = 0): TableResult = { + require(StringUtils.isNotBlank(sql), "待执行的sql语句不能为空") + this.tableEnv.executeSql(sql.with$(keyNum)) + } + + /** + * 使用集合元素创建DataStream + * + * @param seq + * 元素集合 + * @tparam T + * 元素的类型 + */ + def parallelize[T: TypeInformation](seq: Seq[T]): DataStream[T] = { + this.env.fromCollection[T](seq) + } + + /** + * 使用集合元素创建DataStream + * + * @param seq + * 元素集合 + * @tparam T + * 元素的类型 + */ + def createCollectionStream[T: TypeInformation](seq: Seq[T]): DataStream[T] = this.env.fromCollection[T](seq) + + /** + * 提交job执行 + * + * @param jobName + * job名称 + */ + def startAwaitTermination(jobName: String = ""): JobExecutionResult = { + if (ValueUtils.isEmpty(jobName)) this.env.execute() else this.env.execute(jobName) + } + + /** + * 提交Flink Streaming Graph并执行 + */ + def start(jobName: String): JobExecutionResult = this.startAwaitTermination(jobName) + + /** + * 流的启动 + */ + override def start: JobExecutionResult = this.env.execute(FlinkSingletonFactory.getAppName) +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/StreamTableEnvExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/StreamTableEnvExt.scala new file mode 100644 index 0000000..87477f6 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/StreamTableEnvExt.scala @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.ext.stream + +import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment +import org.apache.flink.table.functions.ScalarFunction + +/** + * 用于对Flink StreamTableEnvironment的API库扩展 + * + * @author ChengLong 2020年1月7日 09:18:21 + * @since 0.4.1 + */ +class StreamTableEnvExt(tableEnv: StreamTableEnvironment) { + + /** + * 注册自定义udf函数 + * + * @param name + * 函数名 + * @param function + * 函数的实例 + */ + def udf(name: String, function: ScalarFunction): Unit = { + this.tableEnv.registerFunction(name, function) + } + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/TableExt.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/TableExt.scala new file mode 100644 index 0000000..ce904f4 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/ext/stream/TableExt.scala @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.ext.stream + +import com.zto.fire.flink.bean.FlinkTableSchema +import com.zto.fire.flink.sink.HBaseSink +import com.zto.fire.flink.util.FlinkSingletonFactory +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.bean.HBaseBaseBean +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.datastream.DataStreamSink +import org.apache.flink.streaming.api.scala.DataStream +import org.apache.flink.table.api.Table +import org.apache.flink.table.api.bridge.scala._ +import org.apache.flink.types.Row + +import scala.collection.mutable.ListBuffer +import scala.reflect.ClassTag + + +/** + * 用于flink StreamTable API库扩展 + * + * @author ChengLong 2020年1月9日 13:52:16 + * @since 0.4.1 + */ +class TableExt(table: Table) { + lazy val streamTableEnv = FlinkSingletonFactory.getStreamTableEnv + lazy val batchTableEnv = FlinkSingletonFactory.getBatchTableEnv + + /** + * 逐条打印每行记录 + */ + def show(): Unit = { + this.table.addSink(row => println(row)) + } + + /** + * 获取表的schema包装类,用于flinkRowToBean + * + * @return + * fire包装后的表schema信息 + */ + def getTableSchema: FlinkTableSchema = { + new FlinkTableSchema(table.getSchema) + } + + /** + * 将Table转为追加流 + */ + def toAppendStream[T]: DataStream[Row] = { + this.streamTableEnv.toAppendStream[Row](this.table) + } + + /** + * 将Table转为Retract流 + */ + def toRetractStream[T]: DataStream[(Boolean, Row)] = { + this.streamTableEnv.toRetractStream[Row](this.table) + } + + /** + * 将Table转为DataSet + */ + def toDataSet[T]: DataSet[Row] = { + require(this.batchTableEnv != null) + this.batchTableEnv.toDataSet[Row](this.table) + } + + /** + * 将流注册为临时表 + * + * @param tableName + * 临时表的表名 + */ + def createOrReplaceTempView(tableName: String): Table = { + if (this.streamTableEnv != null) { + this.streamTableEnv.createTemporaryView(tableName, table) + } else if (this.batchTableEnv != null) { + this.batchTableEnv.createTemporaryView(tableName, table) + } else { + throw new NullPointerException("table environment对象实例为空,请检查") + } + table + } + + /** + * 将table映射为Retract流,仅保留新增数据和变更数据,忽略变更前为false的数据 + */ + def toRetractStreamSingle: DataStream[Row] = { + this.table.toRetractStream[Row].filter(t => t._1).map(t => t._2) + } + + /** + * table的jdbc批量sink操作,根据用户指定的Row中字段的顺序,依次填充到sql中的占位符所对应的位置 + * 注: + * 1. Row中的字段顺序要与sql中占位符顺序保持一致,数量一致 + * 2. 
目前仅处理Retract中的true消息,用户需手动传入merge语句 + * + * @param sql + * 增删改sql + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def jdbcBatchUpdate(sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + isMerge: Boolean = true, + keyNum: Int = 1): DataStreamSink[Row] = { + + this.jdbcBatchUpdate2(sql, batch, flushInterval, isMerge, keyNum) { + row => { + val param = ListBuffer[Any]() + for (i <- 0 until row.getArity) { + param += row.getField(i) + } + param + } + } + } + + /** + * table的jdbc批量sink操作,该api需用户定义row的取数规则,并与sql中的占位符对等 + * + * @param sql + * 增删改sql + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def jdbcBatchUpdate2(sql: String, + batch: Int = 10, + flushInterval: Long = 1000, + isMerge: Boolean = true, + keyNum: Int = 1)(fun: Row => Seq[Any]): DataStreamSink[Row] = { + import com.zto.fire._ + if (!isMerge) throw new IllegalArgumentException("该jdbc sink api暂不支持非merge语义,delete操作需单独实现") + this.table.toRetractStreamSingle.jdbcBatchUpdate2(sql, batch, flushInterval, keyNum) { + row => fun(row) + }.name("fire jdbc sink") + } + + /** + * table的hbase批量sink操作,该api需用户定义row的取数规则,并映射到对应的HBaseBaseBean的子类中 + * + * @param tableName + * HBase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def hbasePutTable[T <: HBaseBaseBean[T]: ClassTag](tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1): DataStreamSink[_] = { + import com.zto.fire._ + this.table.hbasePutTable2[T](tableName, batch, flushInterval, keyNum) { + val schema = table.getTableSchema + row => { + // 将row转为clazz对应的JavaBean + val hbaseBean = row.rowToBean(schema, getParamType[T]) + if (!hbaseBean.isInstanceOf[HBaseBaseBean[T]]) throw new IllegalArgumentException("clazz参数必须是HBaseBaseBean的子类") + hbaseBean + } + } + } + + /** + * table的hbase批量sink操作,该api需用户定义row的取数规则,并映射到对应的HBaseBaseBean的子类中 + * + * @param tableName + * HBase表名 + * @param batch + * 每次sink最大的记录数 + * @param flushInterval + * 多久flush一次(毫秒) + * @param keyNum + * 配置文件中的key后缀 + */ + def hbasePutTable2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, + batch: Int = 100, + flushInterval: Long = 3000, + keyNum: Int = 1)(fun: Row => T): DataStreamSink[_] = { + import com.zto.fire._ + HBaseConnector.checkClass[T]() + this.table.toRetractStreamSingle.addSink(new HBaseSink[Row, T](tableName, batch, flushInterval, keyNum) { + override def map(value: Row): T = fun(value) + }).name("fire hbase sink") + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/rest/FlinkSystemRestful.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/rest/FlinkSystemRestful.scala new file mode 100644 index 0000000..bc4f6d5 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/rest/FlinkSystemRestful.scala @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.rest + +import com.zto.fire.common.anno.Rest +import com.zto.fire.common.bean.rest.ResultMsg +import com.zto.fire.common.enu.{ErrorCode, RequestMethod} +import com.zto.fire.common.util.{ExceptionBus, _} +import com.zto.fire.core.rest.{RestCase, RestServerManager, SystemRestful} +import com.zto.fire.flink.BaseFlink +import org.apache.commons.lang3.StringUtils +import spark._ + +/** + * 系统预定义的restful服务,为Flink计算引擎提供接口服务 + * + * @author ChengLong 2020年4月2日 13:50:01 + */ +private[fire] class FlinkSystemRestful(var baseFlink: BaseFlink, val restfulRegister: RestServerManager) extends SystemRestful(baseFlink) { + + /** + * 注册Flink引擎restful接口 + */ + override protected def register: Unit = { + this.restfulRegister + .addRest(RestCase(RequestMethod.GET.toString, s"/system/flink/kill", kill)) + .addRest(RestCase(RequestMethod.GET.toString, s"/system/flink/datasource", datasource)) + } + + /** + * 设置baseFlink实例 + */ + private[fire] def setBaseFlink(baseFlink: BaseFlink): Unit = this.baseFlink = baseFlink + + /** + * kill 当前 Flink 任务 + */ + @Rest("/system/flink/kill") + def kill(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数校验与参数获取 + this.baseFlink.shutdown() + this.logger.info(s"[kill] kill任务成功:json=$json") + msg.buildSuccess("任务停止成功", ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[kill] 执行kill任务失败:json=$json", e) + msg.buildError("执行kill任务失败", ErrorCode.ERROR) + } + } + } + + + /** + * 用于执行sql语句 + */ + @Rest(value = "/system/sql", method = "post") + def sql(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数校验与参数获取 + val sql = JSONUtils.getValue(json, "sql", "") + + // sql合法性检查 + if (StringUtils.isBlank(sql) || !sql.toLowerCase.trim.startsWith("select ")) { + this.logger.warn(s"[sql] sql不合法,在线调试功能只支持查询操作:json=$json") + return msg.buildError(s"sql不合法,在线调试功能只支持查询操作", ErrorCode.ERROR) + } + + if (this.baseFlink == null) { + this.logger.warn(s"[sql] 系统正在初始化,请稍后再试:json=$json") + return "系统正在初始化,请稍后再试" + } + + "" + } catch { + case e: Exception => { + this.logger.error(s"[sql] 执行用户sql失败:json=$json", e) + msg.buildError("执行用户sql失败,异常堆栈:" + ExceptionBus.stackTrace(e), ErrorCode.ERROR) + } + } + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQDynamicTableFactory.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQDynamicTableFactory.scala new file mode 100644 index 0000000..6477039 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQDynamicTableFactory.scala @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.sql.connector.rocketmq + +import com.zto.fire._ +import com.zto.fire.common.conf.FireRocketMQConf +import com.zto.fire.flink.sql.connector.rocketmq.RocketMQOptions._ +import org.apache.flink.api.common.serialization.DeserializationSchema +import org.apache.flink.configuration.ConfigOption +import org.apache.flink.table.connector.format.DecodingFormat +import org.apache.flink.table.connector.source.DynamicTableSource +import org.apache.flink.table.data.RowData +import org.apache.flink.table.factories.{DeserializationFormatFactory, DynamicTableFactory, DynamicTableSourceFactory, FactoryUtil} + +import java.util +import java.util.Properties + +/** + * sql connector的source与sink创建工厂 + * + * @author ChengLong 2021-5-7 15:48:03 + */ +class RocketMQDynamicTableFactory extends DynamicTableSourceFactory { + val IDENTIFIER = "fire-rocketmq" + + override def factoryIdentifier(): String = this.IDENTIFIER + + private def getValueDecodingFormat(helper: FactoryUtil.TableFactoryHelper): DecodingFormat[DeserializationSchema[RowData]] = { + helper.discoverDecodingFormat(classOf[DeserializationFormatFactory], FactoryUtil.FORMAT) + } + + private def getKeyDecodingFormat(helper: FactoryUtil.TableFactoryHelper): DecodingFormat[DeserializationSchema[RowData]] = { + helper.discoverDecodingFormat(classOf[DeserializationFormatFactory], FactoryUtil.FORMAT) + } + + /** + * 必填参数列表 + */ + override def requiredOptions(): JSet[ConfigOption[_]] = { + val set = new JHashSet[ConfigOption[_]] + set.add(TOPIC) + set.add(PROPS_BOOTSTRAP_SERVERS) + set.add(PROPS_GROUP_ID) + set + } + + /** + * 可选的参数列表 + */ + override def optionalOptions(): JSet[ConfigOption[_]] = { + val optionalOptions = new JHashSet[ConfigOption[_]] + optionalOptions + } + + + /** + * 创建rocketmq table source + */ + override def createDynamicTableSource(context: DynamicTableFactory.Context): DynamicTableSource = { + val helper = FactoryUtil.createTableFactoryHelper(this, context) + + val tableOptions = helper.getOptions + val keyDecodingFormat = this.getKeyDecodingFormat(helper) + val valueDecodingFormat = this.getValueDecodingFormat(helper) + val withOptions = context.getCatalogTable.getOptions + val physicalDataType = context.getCatalogTable.getSchema.toPhysicalRowDataType + val keyProjection = createKeyFormatProjection(tableOptions, physicalDataType) + val valueProjection = createValueFormatProjection(tableOptions, physicalDataType) + val keyPrefix = tableOptions.getOptional(KEY_FIELDS_PREFIX).orElse(null) + + + new RocketMQDynamicTableSource(physicalDataType, + keyDecodingFormat, + valueDecodingFormat, + keyProjection, + valueProjection, + keyPrefix, + withOptions) + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQDynamicTableSource.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQDynamicTableSource.scala new file mode 100644 index 
0000000..1b6c96f --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQDynamicTableSource.scala @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.sql.connector.rocketmq + +import com.zto.fire.common.conf.FireRocketMQConf +import com.zto.fire.flink.sql.connector.rocketmq.RocketMQOptions.{TOPIC, getRocketMQProperties} +import com.zto.fire.noEmpty +import com.zto.fire.predef._ +import org.apache.flink.api.common.serialization.DeserializationSchema +import org.apache.flink.table.connector.ChangelogMode +import org.apache.flink.table.connector.format.DecodingFormat +import org.apache.flink.table.connector.source.{DynamicTableSource, ScanTableSource, SourceFunctionProvider} +import org.apache.flink.table.data.RowData +import org.apache.flink.table.types.DataType +import org.apache.flink.table.types.utils.DataTypeUtils +import org.apache.rocketmq.flink.serialization.JsonDeserializationSchema +import org.apache.rocketmq.flink.{RocketMQConfig, RocketMQSource} + +import java.util.Properties + +/** + * 定义source table + * + * @author ChengLong 2021-5-7 15:48:03 + */ +class RocketMQDynamicTableSource(physicalDataType: DataType, + keyDecodingFormat: DecodingFormat[DeserializationSchema[RowData]], + valueDecodingFormat: DecodingFormat[DeserializationSchema[RowData]], + keyProjection: Array[Int], + valueProjection: Array[Int], + keyPrefix: String, + tableOptions: JMap[String, String]) extends ScanTableSource { + + override def getChangelogMode: ChangelogMode = ChangelogMode.insertOnly() + + override def copy(): DynamicTableSource = new RocketMQDynamicTableSource(physicalDataType, keyDecodingFormat, valueDecodingFormat, keyProjection, valueProjection, keyPrefix, tableOptions) + + override def asSummaryString(): String = "fire-rocketmq" + + /** + * 创建反序列化器 + */ + def createDeserialization(context: DynamicTableSource.Context, format: DecodingFormat[DeserializationSchema[RowData]], projection: Array[Int], prefix: String): DeserializationSchema[RowData] = { + if (format == null) return null + + var physicalFormatDataType = DataTypeUtils.projectRow(this.physicalDataType, projection) + if (noEmpty(prefix)) { + physicalFormatDataType = DataTypeUtils.stripRowPrefix(physicalFormatDataType, prefix) + } + format.createRuntimeDecoder(context, physicalFormatDataType) + } + + /** + * 消费rocketmq中的数据,并反序列化为RowData对象实例 + */ + override def getScanRuntimeProvider(context: ScanTableSource.ScanContext): ScanTableSource.ScanRuntimeProvider = { + // 获取以rocket.conf.为前缀的配置 + val properties = getRocketMQProperties(this.tableOptions) + + // 获取rocket.brokers.name对应的nameserver地址 + val brokerName = tableOptions.get(FireRocketMQConf.ROCKET_BROKERS_NAME) + val nameserver = 
FireRocketMQConf.rocketClusterMap.getOrElse(brokerName, brokerName) + if (noEmpty(nameserver)) properties.setProperty(RocketMQConfig.NAME_SERVER_ADDR, nameserver) + assert(noEmpty(properties.getProperty(RocketMQConfig.NAME_SERVER_ADDR)), s"""nameserver不能为空,请在with中使用 '${FireRocketMQConf.ROCKET_BROKERS_NAME}'='ip:port' 指定""") + + // 获取topic信息 + val topic = tableOptions.get(FireRocketMQConf.ROCKET_TOPICS) + if (noEmpty(topic)) properties.setProperty(RocketMQConfig.CONSUMER_TOPIC, topic) + assert(noEmpty(properties.getProperty(RocketMQConfig.CONSUMER_TOPIC)), s"""topic不能为空,请在with中使用 '${FireRocketMQConf.ROCKET_TOPICS}'='topicName' 指定""") + + // 获取groupId信息 + val groupId = tableOptions.get(FireRocketMQConf.ROCKET_GROUP_ID) + if (noEmpty(groupId)) properties.setProperty(RocketMQConfig.CONSUMER_GROUP, groupId) + assert(noEmpty(properties.getProperty(RocketMQConfig.CONSUMER_GROUP)), s"""group.id不能为空,请在with中使用 '${FireRocketMQConf.ROCKET_GROUP_ID}'='groupId' 指定""") + + // 获取tag信息 + val tag = tableOptions.get(FireRocketMQConf.ROCKET_CONSUMER_TAG) + if (noEmpty(tag)) properties.setProperty(RocketMQConfig.CONSUMER_TAG, tag) else properties.setProperty(RocketMQConfig.CONSUMER_TAG, "*") + + val keyDeserialization = createDeserialization(context, keyDecodingFormat, keyProjection, keyPrefix) + val valueDeserialization = createDeserialization(context, valueDecodingFormat, valueProjection, null) + + SourceFunctionProvider.of(new RocketMQSource(new JsonDeserializationSchema(keyDeserialization, valueDeserialization), properties), false) + } + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQOptions.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQOptions.scala new file mode 100644 index 0000000..fd05d13 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/sql/connector/rocketmq/RocketMQOptions.scala @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.sql.connector.rocketmq + +import com.zto.fire._ +import com.zto.fire.flink.sql.connector.rocketmq.RocketMQOptions.ValueFieldsStrategy.ValueFieldsStrategy +import org.apache.flink.configuration.{ConfigOption, ConfigOptions, ReadableConfig} +import org.apache.flink.table.api.{TableException, ValidationException} +import org.apache.flink.table.types.DataType +import org.apache.flink.table.types.logical.utils.LogicalTypeChecks + +import java.util +import java.util.Properties +import java.util.stream.IntStream + +/** + * RocketMQ connector支持的with参数 + * + * @author ChengLong 2021-5-7 15:48:03 + */ +object RocketMQOptions { + val PROPERTIES_PREFIX = "rocket.conf." 
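+
+  // A minimal usage sketch, assuming the 'fire-rocketmq' identifier and the required options
+  // declared below ('topic', 'properties.bootstrap.servers', 'properties.group.id'); the table
+  // name, columns, 'format' choice and addresses are illustrative only. Options prefixed with
+  // PROPERTIES_PREFIX ("rocket.conf.") are forwarded to the RocketMQ client via getRocketMQProperties.
+  //
+  //   CREATE TABLE t_rocketmq_source (
+  //     id BIGINT,
+  //     order_no STRING
+  //   ) WITH (
+  //     'connector' = 'fire-rocketmq',
+  //     'format' = 'json',
+  //     'topic' = 'fire_topic',
+  //     'properties.bootstrap.servers' = '192.168.1.169:9876',
+  //     'properties.group.id' = 'fire_group'
+  //   );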
+ + val TOPIC: ConfigOption[String] = ConfigOptions + .key("topic") + .stringType + .noDefaultValue + .withDescription("Topic names from which the table is read. Either 'topic' or 'topic-pattern' must be set for source. Option 'topic' is required for sink.") + + val PROPS_BOOTSTRAP_SERVERS: ConfigOption[String] = ConfigOptions + .key("properties.bootstrap.servers") + .stringType + .noDefaultValue + .withDescription("Required RocketMQ server connection string") + + val PROPS_GROUP_ID: ConfigOption[String] = ConfigOptions + .key("properties.group.id") + .stringType.noDefaultValue + .withDescription("Required consumer group for the RocketMQ consumer; not needed for the producer.") + + val KEY_FIELDS_PREFIX: ConfigOption[String] = + ConfigOptions.key("key.fields-prefix") + .stringType() + .noDefaultValue() + .withDescription( + s""" + |Defines a custom prefix for all fields of the key format to avoid name clashes with fields of the value format. + |By default, the prefix is empty. If a custom prefix is defined, both the table schema and '${ValueFieldsStrategy.ALL}' + |will work with prefixed names. When constructing the data type of the key format, the prefix will be removed and the + |non-prefixed names will be used within the key format. Please note that this option requires that 'value.fields-include' must be set to '${ValueFieldsStrategy.EXCEPT_KEY}'. + |""".stripMargin) + + val KEY_FIELDS: ConfigOption[JList[String]] = + ConfigOptions.key("key.fields") + .stringType() + .asList() + .defaultValues() + .withDescription( + """ + |Defines an explicit list of physical columns from the table schema that configure the data type for the key format. + |By default, this list is empty and thus a key is undefined. + |""".stripMargin) + + val VALUE_FIELDS_INCLUDE: ConfigOption[ValueFieldsStrategy] = + ConfigOptions.key("value.fields-include") + .defaultValue(ValueFieldsStrategy.ALL) + .withDescription( + """ + |Defines a strategy for how to deal with key columns in the data type of + |the value format. By default, 'ValueFieldsStrategy.ALL' physical + |columns of the table schema will be included in the value format which + |means that key columns appear in the data type for both the key and value format. + |""".stripMargin) + + val FORMAT_SUFFIX = ".format" + + val KEY_FORMAT: ConfigOption[String] = + ConfigOptions.key("key" + FORMAT_SUFFIX) + .stringType() + .noDefaultValue() + .withDescription("Defines the format identifier for encoding key data. The identifier is used to discover a suitable format factory.") + + val VALUE_FORMAT: ConfigOption[String] = + ConfigOptions.key("value" + FORMAT_SUFFIX) + .stringType() + .noDefaultValue() + .withDescription("Defines the format identifier for encoding value data.
The identifier is used to discover a suitable format factory.") + + object ValueFieldsStrategy extends Enumeration { + type ValueFieldsStrategy = Value + val ALL, EXCEPT_KEY = Value + } + + def createKeyFormatProjection(options: ReadableConfig, physicalDataType: DataType): Array[Int] = { + val physicalType = physicalDataType.getLogicalType + + val optionalKeyFormat = options.getOptional(RocketMQOptions.KEY_FORMAT) + val optionalKeyFields = options.getOptional(RocketMQOptions.KEY_FIELDS) + + if (!optionalKeyFormat.isPresent && optionalKeyFields.isPresent) { + throw new ValidationException(s"The option '${RocketMQOptions.KEY_FIELDS.key}' can only be declared if a key format is defined using '${RocketMQOptions.KEY_FORMAT.key}'.") + } else if (optionalKeyFormat.isPresent && (!optionalKeyFields.isPresent || optionalKeyFields.get.size == 0)) { + throw new ValidationException(s"A key format '${RocketMQOptions.KEY_FORMAT.key}' requires the declaration of one or more of key fields using '${RocketMQOptions.KEY_FIELDS.key}'.") + } + + if (!optionalKeyFormat.isPresent) return new Array[Int](0) + val keyPrefix = options.getOptional(RocketMQOptions.KEY_FIELDS_PREFIX).orElse("") + val keyFields = optionalKeyFields.get + val physicalFields = LogicalTypeChecks.getFieldNames(physicalType) + keyFields.stream.mapToInt((keyField: String) => { + def foo(keyField: String): Int = { + val pos = physicalFields.indexOf(keyField) + // check that field name exists + if (pos < 0) throw new ValidationException(s"Could not find the field '${keyField}' in the table schema for usage in the key format. A key field must be a regular, physical column. The following columns can be selected in the '${RocketMQOptions.KEY_FIELDS.key}' option:\n${physicalFields}") + // check that field name is prefixed correctly + if (!keyField.startsWith(keyPrefix)) throw new ValidationException(s"All fields in '${RocketMQOptions.KEY_FIELDS.key}' must be prefixed with '${keyPrefix}' when option '${RocketMQOptions.KEY_FIELDS_PREFIX.key}' is set but field '${keyField}' is not prefixed.") + pos + } + + foo(keyField) + }).toArray + } + + def createValueFormatProjection(options: ReadableConfig, physicalDataType: DataType): Array[Int] = { + val physicalType = physicalDataType.getLogicalType + + val physicalFieldCount = LogicalTypeChecks.getFieldCount(physicalType) + val physicalFields = IntStream.range(0, physicalFieldCount) + + val keyPrefix = options.getOptional(KEY_FIELDS_PREFIX).orElse("") + + val strategy = options.get(VALUE_FIELDS_INCLUDE); + if (strategy == ValueFieldsStrategy.ALL) { + if (keyPrefix.nonEmpty) { + throw new ValidationException(s"A key prefix is not allowed when option '${VALUE_FIELDS_INCLUDE.key()}' is set to '${ValueFieldsStrategy.ALL}'. 
Set it to '${ValueFieldsStrategy.EXCEPT_KEY}' instead to avoid field overlaps.") + } + return physicalFields.toArray + } else if (strategy == ValueFieldsStrategy.EXCEPT_KEY) { + val keyProjection = createKeyFormatProjection(options, physicalDataType); + return physicalFields + .filter(pos => IntStream.of(keyProjection: _*).noneMatch(k => k == pos)) + .toArray + } + throw new TableException(s"Unknown value fields strategy:$strategy"); + } + + /** + * 是否存在以properties.开头的参数 + */ + private def hasRocketMQClientProperties(tableOptions: util.Map[String, String]) = tableOptions + .keySet + .stream + .anyMatch((k: String) => k.startsWith(PROPERTIES_PREFIX)) + + /** + * 获取以rocket.conf.开头的所有的参数 + */ + def getRocketMQProperties(tableOptions: util.Map[String, String]): Properties = { + val rocketMQProperties = new Properties + if (hasRocketMQClientProperties(tableOptions)) tableOptions.keySet.stream.filter((key: String) => key.startsWith(PROPERTIES_PREFIX)).forEach((key: String) => { + def foo(key: String): Unit = { + val value = tableOptions.get(key) + val subKey = key.substring(PROPERTIES_PREFIX.length) + rocketMQProperties.put(subKey, value) + } + + foo(key) + }) + rocketMQProperties + } +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/task/FlinkInternalTask.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/task/FlinkInternalTask.scala new file mode 100644 index 0000000..7dac71e --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/task/FlinkInternalTask.scala @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.task + +import com.zto.fire.core.task.FireInternalTask +import com.zto.fire.flink.BaseFlink + +/** + * 定时任务调度器,用于定时执行Flink框架内部指定的任务 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-07-14 11:04 + */ +private[fire] class FlinkInternalTask(baseFlink: BaseFlink) extends FireInternalTask(baseFlink) { + +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/FlinkSingletonFactory.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/FlinkSingletonFactory.scala new file mode 100644 index 0000000..7d12a29 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/FlinkSingletonFactory.scala @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.util + +import com.zto.fire.core.util.SingletonFactory +import org.apache.flink.api.scala.ExecutionEnvironment +import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment +import org.apache.flink.table.api.bridge.scala.{BatchTableEnvironment, StreamTableEnvironment} + +/** + * 单例工厂,用于创建单例的对象 + * Created by ChengLong on 2020年1月6日 16:50:56. + */ +object FlinkSingletonFactory extends SingletonFactory { + @transient private[this] var streamEnv: StreamExecutionEnvironment = _ + @transient private[this] var streamTableEnv: StreamTableEnvironment = _ + @transient private[this] var env: ExecutionEnvironment = _ + @transient private[this] var tableEnv: BatchTableEnvironment = _ + + /** + * 设置TableEnv实例 + */ + private[fire] def setStreamEnv(env: StreamExecutionEnvironment): this.type = { + if (env != null && this.streamEnv == null) this.streamEnv = env + this + } + + /** + * 设置TableEnv实例 + */ + private[fire] def setStreamTableEnv(tableEnv: StreamTableEnvironment): this.type = { + if (tableEnv != null && this.streamTableEnv == null) this.streamTableEnv = tableEnv + this + } + + /** + * 设置ExecutionEnvironment实例 + */ + private[fire] def setEnv(env: ExecutionEnvironment): this.type = { + if (env != null && this.env == null) this.env = env + this + } + + + /** + * 设置TableEnv实例 + */ + private[fire] def setTableEnv(tableEnv: BatchTableEnvironment): this.type = { + if (tableEnv != null && this.tableEnv == null) this.tableEnv = tableEnv + this + } + + /** + * 获取appName + * + * @return + * TableEnv实例 + */ + private[fire] def getAppName: String = this.appName + + /** + * 获取StreamTableEnv实例 + * + * @return + * TableEnv实例 + */ + private[fire] def getStreamTableEnv: StreamTableEnvironment = { + require(this.streamTableEnv != null, "StreamTableEnvironment仍未被实例化,请稍后再试") + this.streamTableEnv + } + + /** + * 获取TableEnv实例 + * + * @return + * TableEnv实例 + */ + private[fire] def getBatchTableEnv: BatchTableEnvironment = this.tableEnv +} diff --git a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/FlinkUtils.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/FlinkUtils.scala new file mode 100644 index 0000000..e8b9549 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/FlinkUtils.scala @@ -0,0 +1,313 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.flink.util + +import com.google.common.collect.HashBasedTable +import com.zto.fire.common.anno.FieldName +import com.zto.fire.common.util.{PropUtils, ReflectionUtils, ValueUtils} +import com.zto.fire.flink.bean.FlinkTableSchema +import com.zto.fire.flink.conf.FireFlinkConf +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.predef._ +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.common.ExecutionConfig.ClosureCleanerLevel +import org.apache.flink.api.common.{ExecutionConfig, ExecutionMode, InputDependencyConstraint} +import org.apache.flink.table.data.binary.BinaryStringData +import org.apache.flink.table.data.{DecimalData, GenericRowData, RowData} +import org.apache.flink.table.types.logical.RowType +import org.apache.flink.types.Row +import org.slf4j.LoggerFactory + +import java.net.{URL, URLClassLoader} + +/** + * flink相关工具类 + * + * @author ChengLong 2020年1月16日 16:28:23 + * @since 0.4.1 + */ +object FlinkUtils extends Serializable { + // 维护schema、fieldName与fieldIndex关系 + private[this] val schemaTable = HashBasedTable.create[FlinkTableSchema, String, Int] + private lazy val logger = LoggerFactory.getLogger(this.getClass) + private var jobManager: Option[Boolean] = None + private var mode: Option[String] = None + + /** + * 将schema、fieldName与fieldIndex信息维护到table中 + */ + private[this] def extendSchemaTable(schema: FlinkTableSchema): Unit = { + if (schema != null && !schemaTable.containsRow(schema)) { + for (i <- 0 until schema.getFieldCount) { + schemaTable.put(schema, schema.getFieldName(i).get(), i) + } + } + } + + /** + * 将Row转为自定义bean,以JavaBean中的Field为基准 + * bean中的field名称要与DataFrame中的field名称保持一致 + * + * @return + */ + def rowToBean[T](schema: FlinkTableSchema, row: Row, clazz: Class[T]): T = { + requireNonEmpty(schema, row, clazz) + val obj = clazz.newInstance() + tryWithLog { + this.extendSchemaTable(schema) + clazz.getDeclaredFields.foreach(field => { + ReflectionUtils.setAccessible(field) + val anno = field.getAnnotation(classOf[FieldName]) + val begin = if (anno == null) true else !anno.disuse() + if (begin) { + val fieldName = if (anno != null && ValueUtils.noEmpty(anno.value())) anno.value().trim else field.getName + if (this.schemaTable.contains(schema, fieldName)) { + val fieldIndex = this.schemaTable.get(schema, fieldName) + field.set(obj, row.getField(fieldIndex)) + } + } + }) + if (obj.isInstanceOf[HBaseBaseBean[T]]) { + val method = ReflectionUtils.getMethodByName(clazz, "buildRowKey") + if (method != null) method.invoke(obj) + } + }(this.logger, catchLog = "flink row转为JavaBean过程中发生异常.") + obj + } + + /** + * 解析并设置配置文件中的配置信息 + */ + def parseConf(config: ExecutionConfig): ExecutionConfig = { + requireNonEmpty(config)("Flink配置实例不能为空") + + // flink.auto.generate.uid.enable=true 默认为:true + if (FireFlinkConf.autoGenerateUidEnable) { + config.enableAutoGeneratedUIDs() + } else { + config.disableAutoGeneratedUIDs() + } + + // flink.auto.type.registration.enable=true 默认为:true + if (!FireFlinkConf.autoTypeRegistrationEnable) { + config.disableAutoTypeRegistration() + } + + // flink.force.avro.enable=true 默认值为:false + if (FireFlinkConf.forceAvroEnable) { + config.enableForceAvro() + } else { + config.disableForceAvro() + } + + // flink.force.kryo.enable=true 默认值为:false + if (FireFlinkConf.forceKryoEnable) { + config.enableForceKryo() + } else { + config.disableForceKryo() + } + + // flink.generic.types.enable=true 默认值为:false + if (FireFlinkConf.genericTypesEnable) { + config.enableGenericTypes() + } else { + 
config.disableGenericTypes() + } + + // flink.object.reuse.enable=true 默认值为:false + if (FireFlinkConf.objectReuseEnable) { + config.enableObjectReuse() + } else { + config.disableObjectReuse() + } + + // flink.auto.watermark.interval=0 默认值为:0 + if (FireFlinkConf.autoWatermarkInterval != -1) config.setAutoWatermarkInterval(FireFlinkConf.autoWatermarkInterval) + + // flink.closure.cleaner.level=recursive 默认值为:RECURSIVE,包括:RECURSIVE、NONE、TOP_LEVEL + if (StringUtils.isNotBlank(FireFlinkConf.closureCleanerLevel)) config.setClosureCleanerLevel(ClosureCleanerLevel.valueOf(FireFlinkConf.closureCleanerLevel.toUpperCase)) + + // flink.default.input.dependency.constraint=any 默认值:ANY,包括:ANY、ALL + if (StringUtils.isNotBlank(FireFlinkConf.defaultInputDependencyConstraint)) config.setDefaultInputDependencyConstraint(InputDependencyConstraint.valueOf(FireFlinkConf.defaultInputDependencyConstraint.toUpperCase)) + + // flink.execution.mode=pipelined 默认值:PIPELINED,包括:PIPELINED、PIPELINED_FORCED、BATCH、BATCH_FORCED + if (StringUtils.isNotBlank(FireFlinkConf.executionMode)) config.setExecutionMode(ExecutionMode.valueOf(FireFlinkConf.executionMode.toUpperCase)) + + // flink.latency.tracking.interval=0 默认值:0 + if (FireFlinkConf.latencyTrackingInterval != -1) config.setLatencyTrackingInterval(FireFlinkConf.latencyTrackingInterval) + + // flink.max.parallelism=1 没有默认值 + if (FireFlinkConf.maxParallelism != -1) config.setMaxParallelism(FireFlinkConf.maxParallelism) + + // flink.task.cancellation.interval=1 无默认值 + if (FireFlinkConf.taskCancellationInterval != -1) config.setTaskCancellationInterval(FireFlinkConf.taskCancellationInterval) + + // flink.task.cancellation.timeout.millis=1000 无默认值 + if (FireFlinkConf.taskCancellationTimeoutMillis != -1) config.setTaskCancellationTimeout(FireFlinkConf.taskCancellationTimeoutMillis) + + // flink.use.snapshot.compression=false 默认值:false + config.setUseSnapshotCompression(FireFlinkConf.useSnapshotCompression) + + config + } + + /** + * 加载指定路径下的udf jar包 + */ + def loadUdfJar: Unit = { + val udfJarUrl = PropUtils.getString(FireFlinkConf.FLINK_SQL_CONF_UDF_JARS, "") + if (StringUtils.isBlank(udfJarUrl)) { + logger.warn(udfJarUrl, s"flink udf jar包路径不能为空,请在配置文件中通过:${FireFlinkConf.FLINK_SQL_CONF_UDF_JARS}=/path/to/udf.jar 指定") + return + } + + val method = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL]) + method.setAccessible(true) + val classLoader = ClassLoader.getSystemClassLoader.asInstanceOf[URLClassLoader] + method.invoke(classLoader, new URL(udfJarUrl)) + } + + /** + * 判断当前环境是否为JobManager + */ + def isJobManager: Boolean = { + if (this.jobManager.isEmpty) { + val envClass = Class.forName("org.apache.flink.runtime.util.EnvironmentInformation") + if (ReflectionUtils.containsMethod(envClass, "isJobManager")) { + val method = envClass.getMethod("isJobManager") + jobManager = Some((method.invoke(null) + "").toBoolean) + } else { + logger.error("未找到方法:EnvironmentInformation.isJobManager()") + } + } + jobManager.getOrElse(true) + } + + /** + * 判断当前环境是否为TaskManager + */ + def isTaskManager: Boolean = !this.isJobManager + + /** + * 获取flink的运行模式 + */ + def runMode: String = { + if (this.mode.isEmpty) { + val globalConfClass = Class.forName("org.apache.flink.configuration.GlobalConfiguration") + if (ReflectionUtils.containsMethod(globalConfClass, "getRunMode")) { + val method = globalConfClass.getMethod("getRunMode") + this.mode = Some(method.invoke(null) + "") + } else { + logger.error("未找到方法:GlobalConfiguration.getRunMode()") + } + } + 
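+      // Fall back to "yarn-per-job" when the run mode cannot be resolved via reflection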
this.mode.getOrElse("yarn-per-job") + } + + /** + * 判断当前运行模式是否为yarn-application模式 + */ + def isYarnApplicationMode: Boolean = "yarn-application".equalsIgnoreCase(this.runMode) + + /** + * 判断当前运行模式是否为yarn-per-job模式 + */ + def isYarnPerJobMode: Boolean = "yarn-per-job".equalsIgnoreCase(this.runMode) + + /** + * 将Javabean中匹配的field值转为RowData + * + * @param bean + * 任意符合JavaBean规范的实体对象 + * @return + * RowData实例 + */ + def bean2RowData(bean: Object, rowType: RowType): RowData = { + requireNonEmpty(bean, rowType) + + val genericRowData = new GenericRowData(rowType.getFieldCount) + val fieldNames = rowType.getFieldNames + val clazz = bean.getClass + + // 以建表语句中声明的字段列表为标准进行循环 + for (pos <- 0 until rowType.getFieldCount) { + // 根据临时表的字段名称获取JavaBean中对应的同名的field的值 + val field = ReflectionUtils.getFieldByName(clazz, fieldNames.get(pos)) + requireNonEmpty(field, s"JavaBean中未找到名为${fieldNames.get(pos)}的field,请检查sql建表语句或JavaBean的声明!") + + val value = field.get(bean).toString + // 进行类型匹配,将获取到的JavaBean中的字段值映射为SQL建表语句中所指定的类型,并设置到对应的field中 + rowType.getTypeAt(pos).toString match { + case "INT" | "TINYINT" | "SMALLINT" | "INTEGER" => genericRowData.setField(pos, value.toInt) + case "BIGINT" => genericRowData.setField(pos, value.toLong) + case "DOUBLE" => genericRowData.setField(pos, value.toDouble) + case "FLOAT" => genericRowData.setField(pos, value.toFloat) + case "BOOLEAN" => genericRowData.setField(pos, value.toBoolean) + case "BYTE" => genericRowData.setField(pos, value.toByte) + case "SHORT" => genericRowData.setField(pos, value.toShort) + case fieldType if fieldType.contains("DECIMAL") => { + // 获取SQL建表语句中的DECIMAL字段的精度 + val accuracy = rowType.getTypeAt(pos).toString.replace("DECIMAL(", "").replace(")", "").split(",") + genericRowData.setField(pos, DecimalData.fromBigDecimal(new JBigDecimal(value), accuracy(0).trim.toInt, accuracy(1).trim.toInt)) + } + case _ => genericRowData.setField(pos, new BinaryStringData(value)) + } + } + genericRowData + } + + /** + * 将RowData中匹配的field值转为Javabean + * + * @param clazz + * 任意符合JavaBean规范的Class类型 + * @return + * JavaBean实例 + */ + def rowData2Bean[T](clazz: Class[T], rowType: RowType, rowData: RowData): T = { + requireNonEmpty(clazz, rowData) + val bean = clazz.newInstance() + + val fieldNames = rowType.getFieldNames + + // 以建表语句中声明的字段列表为标准进行循环 + for (pos <- 0 until rowType.getFieldCount) { + // 根据临时表的字段名称获取JavaBean中对应的同名的field的值 + val field = ReflectionUtils.getFieldByName(clazz, fieldNames.get(pos)) + requireNonEmpty(field, s"JavaBean中未找到名为${fieldNames.get(pos)}的field,请检查sql建表语句或JavaBean的声明!") + + // 进行类型匹配,将获取到的JavaBean中的字段值映射为SQL建表语句中所指定的类型,并设置到对应的field中 + rowType.getTypeAt(pos).toString match { + case "INT" | "TINYINT" | "SMALLINT" | "INTEGER" => field.setInt(bean, rowData.getInt(pos)) + case "BIGINT" => field.setLong(bean, rowData.getLong(pos)) + case "DOUBLE" => field.setDouble(bean, rowData.getDouble(pos)) + case "FLOAT" => field.setFloat(bean, rowData.getFloat(pos)) + case "BOOLEAN" => field.setBoolean(bean, rowData.getBoolean(pos)) + case "BYTE" => field.setByte(bean, rowData.getByte(pos)) + case "SHORT" => field.setShort(bean, rowData.getShort(pos)) + case fieldType if fieldType.contains("DECIMAL") => { + // 获取SQL建表语句中的DECIMAL字段的精度 + val accuracy = rowType.getTypeAt(pos).toString.replace("DECIMAL(", "").replace(")", "").split(",") + field.set(bean, rowData.getDecimal(pos, accuracy(0).trim.toInt, accuracy(1).trim.toInt)) + } + case _ => field.set(bean, rowData.getString(pos).toString) + } + } + bean + } +} diff --git 
a/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/RocketMQUtils.scala b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/RocketMQUtils.scala new file mode 100644 index 0000000..c410379 --- /dev/null +++ b/fire-engines/fire-flink/src/main/scala/com/zto/fire/flink/util/RocketMQUtils.scala @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.flink.util + +import com.zto.fire._ +import com.zto.fire.common.conf.FireRocketMQConf +import com.zto.fire.common.util.{LogUtils, StringsUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.rocketmq.flink.RocketMQConfig +import org.slf4j.LoggerFactory + +/** + * RocketMQ相关工具类 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-5-6 14:04:53 + */ +object RocketMQUtils { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * rocketMQ配置信息 + * + * @param groupId + * 消费组 + * @return + * rocketMQ相关配置 + */ + def rocketParams(rocketParam: JMap[String, String] = null, + topics: String = null, + groupId: String = null, + rocketNameServer: String = null, + tag: String = null, + keyNum: Int = 1): JMap[String, String] = { + + val optionParams = if (rocketParam != null) rocketParam else new JHashMap[String, String]() + if (StringUtils.isNotBlank(topics)) optionParams.put(RocketMQConfig.CONSUMER_TOPIC, topics) + if (StringUtils.isNotBlank(groupId)) optionParams.put(RocketMQConfig.CONSUMER_GROUP, groupId) + + // rocket name server 配置 + val confNameServer = FireRocketMQConf.rocketNameServer(keyNum) + val finalNameServer = if (StringUtils.isNotBlank(confNameServer)) confNameServer else rocketNameServer + if (StringUtils.isNotBlank(finalNameServer)) optionParams.put(RocketMQConfig.NAME_SERVER_ADDR, finalNameServer) + + // tag配置 + val confTag = FireRocketMQConf.rocketConsumerTag(keyNum) + val finalTag = if (StringUtils.isNotBlank(confTag)) confTag else tag + if (StringUtils.isNotBlank(finalTag)) optionParams.put(RocketMQConfig.CONSUMER_TAG, finalTag) + + // 以rocket.conf.开头的配置优先级最高 + val confMap = FireRocketMQConf.rocketConfMap(keyNum) + if (confMap.nonEmpty) optionParams.putAll(confMap) + + // 日志记录RocketMQ的配置信息 + LogUtils.logMap(this.logger, optionParams.toMap, s"RocketMQ configuration. 
keyNum=$keyNum.") + + optionParams + } + +} diff --git a/fire-engines/fire-spark/pom.xml b/fire-engines/fire-spark/pom.xml new file mode 100644 index 0000000..17e199c --- /dev/null +++ b/fire-engines/fire-spark/pom.xml @@ -0,0 +1,309 @@ + + + + + 4.0.0 + fire-spark_${spark.reference} + jar + fire-spark + + + com.zto.fire + fire-engines_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + + org.apache.spark + spark-core_${scala.binary.version} + + + com.esotericsoftware.kryo + kryo + + + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-sql_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-sql-kafka-0-10_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming-kafka-0-10_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-hdfs + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-client + ${hadoop.version} + ${maven.scope} + + + + + org.apache.hbase + hbase-common + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + + + org.apache.hbase + hbase-spark${spark.major.version}_${scala.binary.version} + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + + + org.apache.kudu + kudu-spark${spark.major.version}_${scala.binary.version} + ${kudu.version} + ${maven.scope} + + + org.apache.kudu + kudu-client + ${kudu.version} + ${maven.scope} + + + + + org.apache.rocketmq + rocketmq-client + ${rocketmq.version} + ${maven.scope} + + + org.apache.rocketmq + rocketmq-spark${spark.major.version}_${scala.binary.version} + ${rocketmq.external.version} + ${maven.scope} + + + + + + + hadoop-2.7 + + org.spark-project.hive + 1.2.1.spark2 + + + true + + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hive + hive-common + + + org.apache.hive + hive-exec + + + org.apache.hive + hive-metastore + + + org.apache.hive + hive-serde + + + org.apache.hive + hive-shims + + + + + org.apache.spark + spark-hive-thriftserver_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hive + hive-cli + + + org.apache.hive + hive-jdbc + + + org.apache.hive + hive-beeline + + + + + ${hive.group} + hive-cli + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-jdbc + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-beeline + ${hive.version} + ${maven.scope} + + + + ${hive.group} + hive-common + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-metastore + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-exec + ${hive.version} + ${maven.scope} + + + org.apache.commons + commons-lang3 + + + org.apache.spark + spark-core_2.10 + + + + + + + hadoop-3.2 + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/bean/KuduBaseBean.java 
b/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/bean/KuduBaseBean.java new file mode 100644 index 0000000..e88fa3f --- /dev/null +++ b/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/bean/KuduBaseBean.java @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.bean; + +import java.io.Serializable; + +/** + * kudu实体bean + * Created by ChengLong on 2017-09-22. + */ +public class KuduBaseBean implements Serializable { + +} diff --git a/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/bean/RestartParams.java b/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/bean/RestartParams.java new file mode 100644 index 0000000..cf5fa7f --- /dev/null +++ b/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/bean/RestartParams.java @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.bean; + +import java.util.Map; + +/** + * 重启streaming参数 + * {"batchDuration":10,"restartSparkContext":false,"stopGracefully": false,"sparkConf":{"spark.streaming.concurrentJobs":"2"}} + * @author ChengLong 2019-5-5 16:57:49 + */ +public class RestartParams { + // 批次时间(秒) + private long batchDuration; + // 是否重启SparkContext对象 + private boolean restartSparkContext; + // 是否等待数据全部处理完成再重启 + private boolean stopGracefully; + // 是否做checkPoint + private boolean isCheckPoint; + // 附加的conf信息 + private Map<String, String> sparkConf; + + public long getBatchDuration() { + return batchDuration; + } + + public void setBatchDuration(long batchDuration) { + this.batchDuration = batchDuration; + } + + public boolean isRestartSparkContext() { + return restartSparkContext; + } + + public void setRestartSparkContext(boolean restartSparkContext) { + this.restartSparkContext = restartSparkContext; + } + + public Map<String, String> getSparkConf() { + return sparkConf; + } + + public void setSparkConf(Map<String, String> sparkConf) { + this.sparkConf = sparkConf; + } + + public RestartParams() { + } + + public boolean isStopGracefully() { + return stopGracefully; + } + + public void setStopGracefully(boolean stopGracefully) { + this.stopGracefully = stopGracefully; + } + + public boolean isCheckPoint() { + return isCheckPoint; + } + + public void setCheckPoint(boolean checkPoint) { + isCheckPoint = checkPoint; + } + + public RestartParams(long batchDuration, boolean restartSparkContext, boolean stopGracefully, boolean isCheckPoint, Map<String, String> sparkConf) { + this.batchDuration = batchDuration; + this.restartSparkContext = restartSparkContext; + this.stopGracefully = stopGracefully; + this.isCheckPoint = isCheckPoint; + this.sparkConf = sparkConf; + } +} diff --git a/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/task/SparkSchedulerManager.java b/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/task/SparkSchedulerManager.java new file mode 100644 index 0000000..a06018e --- /dev/null +++ b/fire-engines/fire-spark/src/main/java/com/zto/fire/spark/task/SparkSchedulerManager.java @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package com.zto.fire.spark.task; + +import com.zto.fire.core.task.SchedulerManager; +import org.apache.spark.SparkEnv; + +/** + * Spark 定时调度任务管理器 + * + * @author ChengLong + * @create 2020-12-18 17:00 + * @since 1.0.0 + */ +public class SparkSchedulerManager extends SchedulerManager { + // 单例对象 + private static SchedulerManager instance = null; + + static { + instance = new SparkSchedulerManager(); + } + + private SparkSchedulerManager() {} + + /** + * 获取单例实例 + */ + public static SchedulerManager getInstance() { + return instance; + } + + @Override + protected String label() { + SparkEnv sparkEnv = SparkEnv.get(); + if (sparkEnv == null || DRIVER.equalsIgnoreCase(sparkEnv.executorId())) { + return DRIVER; + } else { + return EXECUTOR; + } + } +} diff --git a/fire-engines/fire-spark/src/main/resources/spark-core.properties b/fire-engines/fire-spark/src/main/resources/spark-core.properties new file mode 100644 index 0000000..cb907a3 --- /dev/null +++ b/fire-engines/fire-spark/src/main/resources/spark-core.properties @@ -0,0 +1,29 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.fire.config_center.enable = false +# 是否关闭fire内置的所有累加器 +spark.fire.acc.enable = false +spark.fire.rest.enable = false +spark.ui.killEnabled = false +spark.port.maxRetries = 200 +spark.default.parallelism = 1000 +spark.sql.broadcastTimeout = 3000 +spark.ui.timeline.tasks.maximum = 300 +spark.sql.parquet.writeLegacyFormat = true +spark.scheduler.listenerbus.eventqueue.size = 130000 +spark.serializer = org.apache.spark.serializer.KryoSerializer \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/resources/spark-streaming.properties b/fire-engines/fire-spark/src/main/resources/spark-streaming.properties new file mode 100644 index 0000000..78518c5 --- /dev/null +++ b/fire-engines/fire-spark/src/main/resources/spark-streaming.properties @@ -0,0 +1,40 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + + # spark streaming的remember时间,-1表示不生效(ms) +spark.streaming.remember = -1 +spark.fire.hbase.scan.partitions = 20 +# spark streaming批次时间,可覆盖代码中所指定的时间 +# spark.streaming.batch.duration = +# 用于在消费多个topic时区分实例 +# spark.rocket.consumer.instance = driver + +# 以下是Spark引擎调优参数 +spark.port.maxRetries = 200 +spark.ui.retainedJobs = 500 +spark.ui.killEnabled = false +spark.ui.retainedStages = 300 +spark.default.parallelism = 300 +spark.sql.broadcastTimeout = 3000 +spark.streaming.concurrentJobs = 1 +spark.ui.timeline.tasks.maximum = 300 +# 任务通过提交脚本提交到yarn后主动退出提交脚本进程,降低提交节点资源占用(注:此配置需要放到spark-default或提交任务通过--conf指定才会生效) +spark.yarn.submit.waitAppCompletion = false +spark.sql.parquet.writeLegacyFormat = true +spark.streaming.backpressure.enabled = true +spark.streaming.stopGracefullyOnShutdown = true +spark.serializer = org.apache.spark.serializer.KryoSerializer \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/resources/spark.properties b/fire-engines/fire-spark/src/main/resources/spark.properties new file mode 100644 index 0000000..7fab779 --- /dev/null +++ b/fire-engines/fire-spark/src/main/resources/spark.properties @@ -0,0 +1,72 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# + +# spark的应用名称,为空则取类名 +spark.appName = +# spark local模式下使用多少core运行,默认为local[*],自动根据当前pc的cpu核心数设置 +spark.local.cores = * +# spark checkpoint目录地址 +spark.chkpoint.dir = hdfs://appcluster/user/spark/ckpoint/ +# 默认的spark日志级别 +spark.log.level = WARN +spark.redaction.regex = (?i)secret|password|map|address|namenode|connection|metastore +spark.fire.scheduler.blacklist = jvmMonitor +# 指定在spark引擎下,可进行配置同步的子类实现 +spark.fire.conf.deploy.engine = com.zto.fire.spark.conf.SparkEngineConf + +# ----------------------------------------------- < kafka 配置 > ----------------------------------------------- # +# kafka的groupid,为空则取类名 +spark.kafka.group.id = +# bigdata表示连接大数据的kafka,zms表示连接zms的kafka集群 +# spark.kafka.brokers.name = bigdata +# topic列表 +spark.kafka.topics = +# 用于配置启动时的消费位点,默认取最新 +spark.kafka.starting.offsets = latest +# 数据丢失时执行失败 +spark.kafka.failOnDataLoss = true +# 是否启用自动commit +spark.kafka.enable.auto.commit = false +# 以spark.kafka.conf开头的配置支持所有kafka client的配置 +#spark.kafka.conf.session.timeout.ms = 300000 +#spark.kafka.conf.request.timeout.ms = 400000 + +# ----------------------------------------------- < hive 配置 > ------------------------------------------------ # +# hive 集群名称(batch离线hive/streaming 180集群hive/test本地测试hive),用于spark跨集群读取hive元数据信息 +spark.hive.cluster = +# 以spark.hive.conf.为前缀的配置将直接生效,比如开启hive动态分区 +# this.spark.sql("set hive.exec.dynamic.partition=true") +#spark.hive.conf.hive.exec.dynamic.partition = true +# spark.sqlContext.sql("set hive.exec.dynamic.partition.mode=nonstrict") +#spark.hive.conf.hive.exec.dynamic.partition.mode = nonstrict +#spark.hive.conf.hive.exec.max.dynamic.partitions = 5000 + +# ----------------------------------------------- < HBase 配置 > ----------------------------------------------- # +# 用于区分不同的hbase集群: batch/streaming/old/test +spark.hbase.cluster = + +# --------------------------------------------- < RocketMQ 配置 > ---------------------------------------------- # +spark.rocket.cluster.map.test = 192.168.1.169:9876;192.168.1.170:9876 +# 以spark.rocket.conf开头的配置支持所有rocket client的配置 +#spark.rocket.conf.pull.max.speed.per.partition = 5000 + +# ----------------------------------------------- < impala 配置 > ---------------------------------------------- # +spark.impala.connection.url = jdbc:hive2://192.168.25.37:21050/;auth=noSasl +spark.impala.jdbc.driver.class.name = org.apache.hive.jdbc.HiveDriver + +# ----------------------------------------------- < spark 参数 > ----------------------------------------------- # +# Spark相关优化参数列在下面会自动被fire加载生效 \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/resources/structured-streaming.properties b/fire-engines/fire-spark/src/main/resources/structured-streaming.properties new file mode 100644 index 0000000..6eeb918 --- /dev/null +++ b/fire-engines/fire-spark/src/main/resources/structured-streaming.properties @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.port.maxRetries = 200 +spark.ui.killEnabled = false +spark.default.parallelism = 1000 +spark.sql.broadcastTimeout = 3000 +spark.ui.timeline.tasks.maximum = 300 +spark.scheduler.listenerbus.eventqueue.size = 130000 +spark.serializer = org.apache.spark.serializer.KryoSerializer \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire.scala new file mode 100644 index 0000000..fa59853 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire.scala @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto + +import com.zto.fire.core.ext.BaseFireExt +import com.zto.fire.spark.ext.core._ +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, SQLContext, SparkSession} +import org.apache.spark.streaming.StreamingContext +import org.apache.spark.streaming.dstream.DStream +import org.apache.spark.{SparkConf, SparkContext} + +import scala.reflect.ClassTag + +/** + * 预定义fire框架中的扩展工具 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-12-22 13:41 + */ +package object fire extends BaseFireExt { + + /** + * SparkContext扩展 + * + * @param spark + * sparkSession对象 + */ + implicit class SparkSessionExtBridge(spark: SparkSession) extends SparkSessionExt(spark) { + + } + + /** + * SparkContext扩展 + * + * @param sc + * SparkContext对象 + */ + implicit class SparkContextExtBridge(sc: SparkContext) extends SparkContextExt(sc) { + + } + + + /** + * RDD相关的扩展 + * + * @param rdd + * rdd + */ + implicit class RDDExtBridge[T: ClassTag](rdd: RDD[T]) extends RDDExt[T](rdd) { + + } + + /** + * SparkConf扩展 + * + * @param sparkConf + * sparkConf对象 + */ + implicit class SparkConfExtBridge(sparkConf: SparkConf) extends SparkConfExt(sparkConf) { + + } + + /** + * SQLContext与HiveContext扩展 + * + * @param sqlContext + * sqlContext对象 + */ + implicit class SQLContextExtBridge(sqlContext: SQLContext) extends SQLContextExt(sqlContext) { + + } + + /** + * DataFrame扩展 + * + * @param dataFrame + * dataFrame实例 + */ + implicit class DataFrameExtBridge(dataFrame: DataFrame) extends DataFrameExt(dataFrame) { + + } + + /** + * Dataset扩展 + * + * @param dataset + * dataset对象 + */ + implicit class DatasetExtBridge[T: ClassTag](dataset: Dataset[T]) extends DatasetExt[T](dataset) { + + } + + /** + * StreamingContext扩展 + * + * @param ssc + * StreamingContext对象 + */ + implicit class StreamingContextExtBridge(ssc: StreamingContext) extends StreamingContextExt(ssc){ + + } + + + /** + * DStream扩展 + * + * @param stream + * stream对象 + */ + implicit class DStreamExtBridge[T: ClassTag](stream: DStream[T]) extends 
DStreamExt[T](stream) { + + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSpark.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSpark.scala new file mode 100644 index 0000000..81bc4e7 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSpark.scala @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark + +import com.zto.fire._ +import com.zto.fire.common.conf.{FireFrameworkConf, FireHDFSConf, FireHiveConf} +import com.zto.fire.common.util.{OSUtils, PropUtils} +import com.zto.fire.core.BaseFire +import com.zto.fire.core.rest.RestServerManager +import com.zto.fire.spark.acc.AccumulatorManager +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.ext.module.KuduContextExt +import com.zto.fire.spark.rest.SparkSystemRestful +import com.zto.fire.spark.task.{SparkInternalTask, SparkSchedulerManager} +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.spark.scheduler.SparkListener +import org.apache.spark.sql.catalog.Catalog +import org.apache.spark.sql.{SQLContext, SparkSession} +import org.apache.spark.streaming.StreamingContext +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.internal.Logging + +/** + * Spark通用父类 + * Created by ChengLong on 2018-03-06. 
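+ * + * 编者注:以下为根据下文方法推断的生命周期概述,细节以实际实现为准。 + * 子类任务大致经历 boot(加载引擎与用户配置)-> createContext(构建SparkSession、StreamingContext等运行时对象)-> process(子类业务逻辑)-> shutdown(资源回收)几个阶段; + * 如需自定义SparkConf,可覆写buildConf方法并在其基础上合并配置。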
+ */ +trait BaseSpark extends SparkListener with BaseFire with Logging with Serializable { + private[fire] var _conf: SparkConf = _ + protected[fire] var _spark: SparkSession = _ + protected lazy val spark, fire: SparkSession = _spark + protected[fire] var sc: SparkContext = _ + protected[fire] var catalog: Catalog = _ + protected[fire] var ssc: StreamingContext = _ + protected[fire] var hiveContext, sqlContext: SQLContext = _ + protected[fire] var kuduContext: KuduContextExt = _ + protected[fire] val acc = AccumulatorManager + protected[fire] var batchDuration: Long = _ + protected[fire] var listener: SparkListener = _ + + /** + * 生命周期方法:初始化fire框架必要的信息 + * 注:该方法会同时在driver端与executor端执行 + */ + override private[fire] final def boot: Unit = { + // 进Driver端进行引擎配置与用户配置的加载,executor端会通过fire进行分发,应避免在executor端加载引擎和用户配置文件 + if (SparkUtils.isDriver) { + this.loadConf + PropUtils.load(FireFrameworkConf.userCommonConf: _*).load(this.appName) + } + PropUtils.setProperty(FireFrameworkConf.DRIVER_CLASS_NAME, this.className) + if (StringUtils.isNotBlank(FireSparkConf.appName)) { + this.appName = FireSparkConf.appName + } + SparkSingletonFactory.setAppName(this.appName) + super.boot + this.logger.info("<-- 完成fire框架初始化 -->") + } + + /** + * 生命周期方法:用于关闭SparkContext + */ + override final def stop: Unit = { + if (this._spark != null && this.sc != null && !this.sc.isStopped) { + this._spark.stop() + } + } + + /** + * 生命周期方法:进行fire框架的资源回收 + * 注:不允许子类覆盖 + */ + override protected[fire] final def shutdown(stopGracefully: Boolean = true): Unit = { + try { + this.logger.info("<-- 完成用户资源回收 -->") + + if (stopGracefully) { + if (this.sqlContext != null) this.sqlContext.clearCache + if (this.ssc != null) { + this.ssc.stop(true, stopGracefully) + this.ssc = null + this.sc = null + } + if (this.sc != null && !this.sc.isStopped) { + this.sc.stop() + this.sc = null + } + } + + } finally { + super.shutdown(stopGracefully) + } + } + + /** + * 构建或合并SparkConf + * 注:不同的子类需根据需要复写该方法 + * + * @param conf + * 在conf基础上构建 + * @return + * 合并后的SparkConf对象 + */ + def buildConf(conf: SparkConf): SparkConf = { + if (conf == null) new SparkConf().setAppName(this.appName) else conf + } + + + /** + * 构建一系列context对象 + */ + override private[fire] final def createContext(conf: Any): Unit = { + this.restfulRegister = new RestServerManager().startRestPort() + this.systemRestful = new SparkSystemRestful(this) + // 注册到实时平台,并覆盖配置信息 + PropUtils.invokeConfigCenter(this.className) + PropUtils.show() + + // 构建SparkConf信息 + val tmpConf = if (conf == null) this.buildConf(null) else conf.asInstanceOf[SparkConf] + tmpConf.setAll(PropUtils.settings) + tmpConf.set("spark.driver.class.simple.name", this.driverClass) + + // 设置hive metastore地址 + val hiveMetastoreUrl = FireHiveConf.getMetastoreUrl + if (StringUtils.isBlank(hiveMetastoreUrl)) this.logger.warn("当前任务未指定hive连接信息,将不会连接hive metastore。如需使用hive,请通过spark.hive.cluster=xxx指定。") + if (StringUtils.isNotBlank(hiveMetastoreUrl)) tmpConf.set("hive.metastore.uris", hiveMetastoreUrl) + + // 构建SparkSession对象 + val sessionBuilder = SparkSession.builder().config(tmpConf) + if (StringUtils.isNotBlank(hiveMetastoreUrl)) sessionBuilder.enableHiveSupport() + // 在mac或windows环境下执行local模式,cpu数通过spark.local.cores指定,默认local[*] + if (OSUtils.isLocal) sessionBuilder.master(s"local[${FireSparkConf.localCores}]") + this._spark = sessionBuilder.getOrCreate() + SparkSingletonFactory.setSparkSession(this._spark) + this._spark.registerUDF() + this.sc = this._spark.sparkContext + // 关联所连接的hive集群,根据预制方案启用HDFS HA + 
FireHDFSConf.hdfsHAConf.foreach(t => this.sc.hadoopConfiguration.set(t._1, t._2)) + this.catalog = this._spark.catalog + this.sc.setLogLevel(FireSparkConf.logLevel) + this.listener = new BaseSparkListener(this) + this.sc.addSparkListener(listener) + // this.initLogging(this.className) + this.hiveContext = this._spark.sqlContext + this.sqlContext = this.hiveContext + this.kuduContext = SparkSingletonFactory.getKuduContextInstance(this.sc) + this.applicationId = SparkUtils.getApplicationId(this._spark) + this.webUI = SparkUtils.getWebUI(this._spark) + this._conf = tmpConf + this.deployConf + this.logger.info("<-- 完成Spark运行时信息初始化 -->") + SparkUtils.executeHiveConfSQL(this._spark) + } + + /** + * 用于fire框架初始化,传递累加器与配置信息到executor端 + */ + override protected def deployConf: Unit = { + if (!FireFrameworkConf.deployConf) return + // 向driver和executor注册定时任务 + val taskSchedule = new SparkInternalTask(this) + // driver端注册定时任务 + SparkSchedulerManager.getInstance().registerTasks(this, taskSchedule, this.listener) + // executor端与自定义累加器一同完成定时任务注册 + AccumulatorManager.registerTasks(this, taskSchedule) + // 向executor端注册自定义累加器 + if (FireFrameworkConf.accEnable) this.acc.registerAccumulators(this.sc) + } + + /** + * 用于注册定时任务实例 + * + * @param instances + * 标记有@Scheduled类的实例 + */ + def registerSchedule(instances: Object*): Unit = { + try { + // 向driver端注册定时任务 + SparkSchedulerManager.getInstance().registerTasks(instances: _*) + // 向executor端注册定时任务 + val executors = this._conf.get("spark.executor.instances").toInt + if (executors > 0 && this.sc != null) { + this.sc.parallelize(1 to executors, executors).foreachPartition(i => SparkSchedulerManager.getInstance().registerTasks(instances: _*)) + } + } catch { + case e: Throwable => this.logger.error("定时任务注册失败.", e) + } + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkCore.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkCore.scala new file mode 100644 index 0000000..c21d510 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkCore.scala @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark + +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.enu.JobType +import com.zto.fire.common.util.PropUtils + +/** + * 实时平台Spark通用父类 + * Created by ChengLong on 2018-03-28. 
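+ * + * 编者补充的假设性使用示意(DemoJob、示例SQL均为虚构,API细节以实际代码为准): + * {{{ + *   object DemoJob extends BaseSparkCore { + *     // 业务逻辑写在process中,init后会被自动调用 + *     override def process: Unit = { + *       // this.fire即已初始化完成的SparkSession + *       this.fire.sql("show databases").show() + *     } + *     def main(args: Array[String]): Unit = this.init(null, args) + *   } + * }}}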
+ */ +class BaseSparkCore extends BaseSpark { + override val jobType = JobType.SPARK_CORE + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param conf + * Spark配置信息 + */ + override def init(conf: Any = null, args: Array[String] = null): Unit = { + super.init(conf, args) + this.process + } + + /** + * 在加载任务配置文件前将被加载 + */ + override private[fire] def loadConf: Unit = { + PropUtils.load(FireFrameworkConf.SPARK_CORE_CONF_FILE) + } + + /** + * Spark处理逻辑 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + // 子类复写该方法实现业务处理逻辑 + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkListener.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkListener.scala new file mode 100644 index 0000000..663444f --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkListener.scala @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark + +import java.util.concurrent.atomic.AtomicBoolean + +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.common.enu.JobType +import com.zto.fire.spark.acc.AccumulatorManager +import org.apache.spark.internal.Logging +import org.apache.spark.scheduler._ +import org.slf4j.LoggerFactory + +/** + * Spark事件监听器桥 + * Created by ChengLong on 2018-05-19. + */ +class BaseSparkListener(baseSpark: BaseSpark) extends SparkListener with Logging { + private[this] val module = "listener" + private[this] val needRegister = new AtomicBoolean(false) + private[this] lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 当SparkContext启动时触发 + */ + override def onApplicationStart(applicationStart: SparkListenerApplicationStart): Unit = { + this.logger.info(s"Spark 初始化完成.") + this.baseSpark.onApplicationStart(applicationStart) + } + + /** + * 当Spark运行结束时执行 + */ + override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = { + try { + this.baseSpark.after() + } finally { + this.baseSpark.shutdown() + } + super.onApplicationEnd(applicationEnd) + } + + /** + * 当executor metrics更新时触发 + */ + override def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate): Unit = this.baseSpark.onExecutorMetricsUpdate(executorMetricsUpdate) + + /** + * 当添加新的executor时,重新初始化内置的累加器 + */ + override def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit = { + this.baseSpark.onExecutorAdded(executorAdded) + if (this.baseSpark.jobType != JobType.SPARK_CORE) this.needRegister.compareAndSet(false, true) + this.logger.debug(s"executor[${executorAdded.executorId}] added. 
host: [${executorAdded.executorInfo.executorHost}].", this.module) + } + + /** + * 当移除已有的executor时,executor数递减 + */ + override def onExecutorRemoved(executorRemoved: SparkListenerExecutorRemoved): Unit = { + this.baseSpark.onExecutorRemoved(executorRemoved) + this.logger.debug(s"executor[${executorRemoved.executorId}] removed. reason: [${executorRemoved.reason}].", this.module) + } + + /** + * 当环境信息更新时触发 + */ + override def onEnvironmentUpdate(environmentUpdate: SparkListenerEnvironmentUpdate): Unit = this.baseSpark.onEnvironmentUpdate(environmentUpdate) + + /** + * 当BlockManager添加时触发 + */ + override def onBlockManagerAdded(blockManagerAdded: SparkListenerBlockManagerAdded): Unit = this.baseSpark.onBlockManagerAdded(blockManagerAdded) + + /** + * 当BlockManager移除时触发 + */ + override def onBlockManagerRemoved(blockManagerRemoved: SparkListenerBlockManagerRemoved): Unit = this.baseSpark.onBlockManagerRemoved(blockManagerRemoved) + + /** + * 当block更新时触发 + */ + override def onBlockUpdated(blockUpdated: SparkListenerBlockUpdated): Unit = this.baseSpark.onBlockUpdated(blockUpdated) + + /** + * 当job开始执行时触发 + */ + override def onJobStart(jobStart: SparkListenerJobStart): Unit = this.baseSpark.onJobStart(jobStart) + + /** + * 当job执行完成时触发 + */ + override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = { + this.baseSpark.onJobEnd(jobEnd) + if (jobEnd != null && jobEnd.jobResult == JobSucceeded) { + AccumulatorManager.addMultiTimer(module, "onJobEnd", "onJobEnd", "", "INFO", "", 1) + } else { + AccumulatorManager.addMultiTimer(module, "onJobEnd", "onJobEnd", "", "ERROR", "", 1) + this.logger.error(s"job failed.", this.module) + } + } + + /** + * 当stage提交以后触发 + */ + override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit = this.baseSpark.onStageSubmitted(stageSubmitted) + + /** + * 当stage执行完成以后触发 + */ + override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = { + this.baseSpark.onStageCompleted(stageCompleted) + if (stageCompleted != null && stageCompleted.stageInfo.failureReason.isEmpty) { + AccumulatorManager.addMultiTimer(module, "onStageCompleted", "onStageCompleted", "", "INFO", "", 1) + } else { + AccumulatorManager.addMultiTimer(module, "onStageCompleted", "onStageCompleted", "", "ERROR", "", 1) + this.logger.error(s"stage failed. 
reason: " + stageCompleted.stageInfo.failureReason, this.module) + } + } + + /** + * 当task开始执行时触发 + */ + override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = this.baseSpark.onTaskStart(taskStart) + + /** + * 当从task获取计算结果时触发 + */ + override def onTaskGettingResult(taskGettingResult: SparkListenerTaskGettingResult): Unit = this.baseSpark.onTaskGettingResult(taskGettingResult) + + /** + * 当task执行完成以后触发 + */ + override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { + this.baseSpark.onTaskEnd(taskEnd) + if (taskEnd != null && taskEnd.reason != null && "Success".equalsIgnoreCase(taskEnd.reason.toString)) { + AccumulatorManager.addMultiTimer(module, "onTaskEnd", "onTaskEnd", "", "INFO", "", 1) + } else { + AccumulatorManager.addMultiTimer(module, "onTaskEnd", "onTaskEnd", "", "ERROR", "", 1) + this.logger.error(s"task failed.", this.module) + } + } + + /** + * 当取消缓存RDD时触发 + */ + override def onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD): Unit = this.baseSpark.onUnpersistRDD(unpersistRDD) + + /** + * 用于注册内置累加器,每隔1分钟执行一次,延迟1分钟执行,默认执行10次 + */ + @Scheduled(fixedInterval = 60 * 1000, initialDelay = 60 * 1000, concurrent = false, repeatCount = 10) + private[fire] def registerAcc: Unit = { + if (this.needRegister.compareAndSet(true, false)) { + AccumulatorManager.registerAccumulators(this.baseSpark.sc) + AccumulatorManager.broadcastNewConf(this.baseSpark.sc, this.baseSpark._conf) + this.logger.info(s"完成系统累加器注册.", this.module) + } + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkStreaming.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkStreaming.scala new file mode 100644 index 0000000..fd2ef5b --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseSparkStreaming.scala @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark + + +import com.zto.fire.common.anno.Rest +import com.zto.fire.common.bean.rest.ResultMsg +import com.zto.fire.common.conf.{FireFrameworkConf, FireKafkaConf} +import com.zto.fire.common.enu.{ErrorCode, JobType, RequestMethod} +import com.zto.fire.common.util.{JSONUtils, KafkaUtils, PropUtils} +import com.zto.fire.core.rest.RestCase +import com.zto.fire.spark.bean.RestartParams +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.spark.SparkConf +import org.apache.spark.streaming.{Milliseconds, Seconds, StreamingContext} +import spark.{Request, Response} +import com.zto.fire._ +import com.zto.fire.spark.conf.FireSparkConf + + +/** + * 实时平台Spark通用父类 + * Created by ChengLong on 2018-03-28. 
+ */ +trait BaseSparkStreaming extends BaseSpark { + var checkPointDir: String = _ + var externalConf: RestartParams = _ + override val jobType = JobType.SPARK_STREAMING + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param batchDuration + * Streaming每个批次间隔时间 + * @param isCheckPoint + * 是否做checkpoint + */ + def init(batchDuration: Long, isCheckPoint: Boolean): Unit = { + this.init(batchDuration, isCheckPoint, null) + } + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param batchDuration + * Streaming每个批次间隔时间 + * @param isCheckPoint + * 是否做checkpoint + */ + def init(batchDuration: Long, isCheckPoint: Boolean, args: Array[String]): Unit = { + this.init(batchDuration, isCheckPoint, null, args) + } + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param batchDuration + * Streaming每个批次间隔时间 + * @param isCheckPoint + * 是否做checkpoint + * @param conf + * 传入自己构建的sparkConf对象,可以为空 + */ + def init(batchDuration: Long, isCheckPoint: Boolean, conf: SparkConf, args: Array[String]): Unit = { + val tmpConf = buildConf(conf) + if (this.sc == null) { + // 添加streaming相关的restful接口,并启动 + this.init(tmpConf, args) + this.restfulRegister + .addRest(RestCase(RequestMethod.POST.toString, "/system/streaming/hotRestart", this.hotRestart)) + .startRestServer + } + // 判断是否为热重启,batchDuration优先级分别为 [ 代码<配置文件<热重启 ] + this.batchDuration = SparkUtils.overrideBatchDuration(batchDuration, this.externalConf != null) + if (!isCheckPoint) { + if (this.externalConf != null && this.externalConf.isRestartSparkContext) { + // 重启SparkContext对象 + this.ssc = new StreamingContext(tmpConf, Seconds(Math.abs(this.batchDuration))) + this.sc = this.ssc.sparkContext + } else { + this.ssc = new StreamingContext(this.sc, Seconds(Math.abs(this.batchDuration))) + } + val rememberTime = FireSparkConf.streamingRemember + if (rememberTime > 0) this.ssc.remember(Milliseconds(Math.abs(rememberTime))) + SparkSingletonFactory.setStreamingContext(this.ssc) + this.process + } else { + this.checkPointDir = FireSparkConf.chkPointDirPrefix + this.appName + this.ssc = StreamingContext.getOrCreate(this.checkPointDir, createStreamingContext _) + + // 初始化Streaming + def createStreamingContext(): StreamingContext = { + tmpConf.set("spark.streaming.receiver.writeAheadLog.enable", "true") + if (this.externalConf != null && this.externalConf.isRestartSparkContext) { + // 重启SparkContext对象 + this.ssc = new StreamingContext(tmpConf, Seconds(Math.abs(this.batchDuration))) + this.sc = this.ssc.sparkContext + } else { + this.ssc = new StreamingContext(this.sc, Seconds(Math.abs(this.batchDuration))) + } + this.ssc.checkpoint(checkPointDir) + SparkSingletonFactory.setStreamingContext(this.ssc) + this.process + this.ssc + } + } + this._conf = tmpConf + } + + /** + * 构建内部使用的SparkConf对象 + */ + override def buildConf(conf: SparkConf = null): SparkConf = { + val tmpConf = super.buildConf(conf) + + // 若重启SparkContext对象,则设置restful传递过来的新的配置信息 + if (this.externalConf != null && this.externalConf.isRestartSparkContext) { + if (this.externalConf.getSparkConf != null && this.externalConf.getSparkConf.size() > 0) { + tmpConf.setAll(this.externalConf.getSparkConf) + } + } + + tmpConf + } + + /** + * 在加载任务配置文件前将被加载 + */ + override private[fire] def loadConf: Unit = { + PropUtils.load(FireFrameworkConf.SPARK_STREAMING_CONF_FILE) + } + + /** + * Streaming的处理过程强烈建议放到process中,保持风格统一 + * 注:此方法会被自动调用,在以下两种情况下,必须将逻辑写在process中 + * 1. 开启checkpoint + * 2. 
支持streaming热重启(可在不关闭streaming任务的前提下修改batch时间) + */ + override def process: Unit = { + require(this.checkPointDir == null, "当开启checkPoint机制时,必须将对接kafka的代码写在process方法内") + require(this.externalConf == null, "当需要使用热重启功能时,必须将对接kafka的代码写在process方法内") + } + + /** + * kafka配置信息 + * + * @param groupId + * 消费组 + * @param offset + * offset位点,smallest、largest,默认为largest + * @return + * kafka相关配置 + */ + @Deprecated + def kafkaParams(groupId: String = this.appName, kafkaBrokers: String = null, offset: String = FireKafkaConf.offsetLargest, autoCommit: Boolean = false, keyNum: Int = 1): Map[String, Object] = { + KafkaUtils.kafkaParams(null, groupId, kafkaBrokers, offset, autoCommit, keyNum) + } + + /** + * 用于重置StreamingContext(仅支持batch时间的修改) + * + * @return + * 响应结果 + */ + @Rest("/system/streaming/hotRestart") + def hotRestart(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + this.externalConf = JSONUtils.parseObject[RestartParams](json) + new Thread(new Runnable { + override def run(): Unit = { + ssc.stop(externalConf.isRestartSparkContext, externalConf.isStopGracefully) + init(externalConf.getBatchDuration, externalConf.isCheckPoint) + } + }).start() + + this.logger.info(s"[hotRestart] 执行热重启成功:duration=${this.externalConf.getBatchDuration} json=$json", "rest") + msg.buildSuccess(s"执行热重启成功:duration=${this.externalConf.getBatchDuration}", ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[hotRestart] 执行热重启失败:json=$json", "rest") + msg.buildError("执行热重启失败", ErrorCode.ERROR) + } + } + } + +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseStreamingQueryListener.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseStreamingQueryListener.scala new file mode 100644 index 0000000..0317bd7 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseStreamingQueryListener.scala @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark + +import org.apache.spark.sql.streaming.StreamingQueryListener + +/** + * structured streaming事件监听器 + * + * @author ChengLong 2019年12月24日 16:26:33 + * @since 0.4.1 + */ +class BaseStreamingQueryListener extends StreamingQueryListener { + @volatile protected var latestBatchId = -1L + + override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = { + // onQueryStarted + } + + override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = { + this.latestBatchId = event.progress.batchId + } + + override def onQueryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = { + // onQueryTerminated + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseStructuredStreaming.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseStructuredStreaming.scala new file mode 100644 index 0000000..b2347b7 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/BaseStructuredStreaming.scala @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark + +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.enu.JobType +import com.zto.fire.common.util.PropUtils + +/** + * Structured Streaming通用父类 + * Created by ChengLong on 2019-03-11. + */ +class BaseStructuredStreaming extends BaseSpark { + override val jobType = JobType.SPARK_STRUCTURED_STREAMING + + /** + * 程序初始化方法,用于初始化必要的值 + * + * @param conf + * Spark配置信息 + * @param args main方法参数 + */ + override def init(conf: Any = null, args: Array[String] = null): Unit = { + super.init(conf, args) + // 添加时间监听器 + this._spark.streams.addListener(new BaseStreamingQueryListener) + this.restfulRegister.startRestServer + this.process + } + + /** + * Spark处理逻辑 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + // 子类复写该方法实现业务处理逻辑 + } + + + /** + * 在加载任务配置文件前将被加载 + */ + override private[fire] def loadConf: Unit = { + PropUtils.load(FireFrameworkConf.SPARK_STRUCTURED_STREAMING_CONF_FILE) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/AccumulatorManager.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/AccumulatorManager.scala new file mode 100644 index 0000000..e26f4c1 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/AccumulatorManager.scala @@ -0,0 +1,358 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.acc + +import com.google.common.collect.HashBasedTable +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.util._ +import com.zto.fire.predef._ +import com.zto.fire.spark.task.SparkSchedulerManager +import com.zto.fire.spark.util.SparkUtils +import org.apache.commons.lang3.StringUtils +import org.apache.spark.broadcast.Broadcast +import org.apache.spark.util.LongAccumulator +import org.apache.spark.{SparkConf, SparkContext, SparkEnv} +import org.slf4j.LoggerFactory + +import java.nio.ByteBuffer +import java.util.concurrent.atomic.AtomicInteger +import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue} +import scala.collection.mutable + +/** + * fire内置Spark累加器工具类 + * + * @author ChengLong 2019-7-25 19:11:16 + */ +private[fire] object AccumulatorManager { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + private lazy val executorId = SparkUtils.getExecutorId + // 累加器名称,含有fire的名字将会显示在webui中 + private[this] val counterLabel = "fire-counter" + private[fire] val counter = new LongAccumulator + + // 日志累加器 + private[this] val logAccumulatorLabel = "logAccumulator" + private[fire] val logAccumulator = new LogAccumulator + + // 多值累加器 + private[this] val multiCounterLabel = "fire-multiCounter" + private[fire] val multiCounter = new MultiCounterAccumulator + + // timer累加器 + private[this] val multiTimerLabel = "multiTimer" + private[fire] val multiTimer = new MultiTimerAccumulator + + // env累加器 + private[this] val envAccumulatorLabel = "envAccumulator" + private[fire] val envAccumulator = new EnvironmentAccumulator + + // 累加器注册列表 + private[this] val accMap = Map(this.logAccumulatorLabel -> this.logAccumulator, this.counterLabel -> this.counter, this.multiCounterLabel -> this.multiCounter, this.multiTimerLabel -> this.multiTimer, this.envAccumulatorLabel -> this.envAccumulator) + private[this] val initExecutors: AtomicInteger = new AtomicInteger(0) + + // 获取当前任务的全类名 + private[this] lazy val jobClassName = SparkEnv.get.conf.get(FireFrameworkConf.DRIVER_CLASS_NAME, "") + // 用于注册定时任务的列表 + private[this] val taskRegisterSet = mutable.HashSet[Object]() + // 用于广播spark配置信息 + private[fire] var broadcastConf: Broadcast[SparkConf] = _ + + /** + * 注册定时任务实例 + */ + def registerTasks(tasks: Object*): Unit = { + if (tasks != null) { + tasks.foreach(taskInstances => taskRegisterSet.add(taskInstances)) + } + } + + /** + * 将数据累加到count累加器中 + * + * @param value + * 累加值 + */ + def addCounter(value: Long): Unit = { + if (FireUtils.isSparkEngine) { + if (SparkEnv.get != null && !"driver".equalsIgnoreCase(SparkEnv.get.executorId)) { + val countAccumulator = SparkEnv.get.conf.get(this.counterLabel, "") + if (StringUtils.isNotBlank(countAccumulator)) { + val counter: LongAccumulator = SparkEnv.get.closureSerializer.newInstance.deserialize(ByteBuffer.wrap(StringsUtils.toByteArray(countAccumulator))) + counter.add(value) + } + } else { + this.counter.add(value) + 
} + } + } + + /** + * 获取counter累加器的值 + * + * @return + * 累加结果 + */ + def getCounter: Long = this.counter.value + + /** + * 将timeCost累加到日志累加器中 + * + * @param log + * TimeCost实例对象 + */ + def addLog(log: String): Unit = { + if (isEmpty(log)) return + if (FireUtils.isSparkEngine) { + val env = SparkEnv.get + if (env != null && !"driver".equalsIgnoreCase(SparkEnv.get.executorId)) { + val logAccumulator = SparkEnv.get.conf.get(this.logAccumulatorLabel, "") + if (StringUtils.isNotBlank(logAccumulator)) { + val logAcc: LogAccumulator = SparkEnv.get.closureSerializer.newInstance.deserialize(ByteBuffer.wrap(StringsUtils.toByteArray(logAccumulator))) + logAcc.add(log) + } + } else { + this.logAccumulator.add(log) + } + } + } + + /** + * 添加异常堆栈日志到累加器中 + * + * @param exceptionList + * 堆栈列表 + */ + def addExceptionLog(exceptionList: List[(String, Throwable)], count: Long): Unit = { + exceptionList.foreach(t => this.addLog(exceptionStack(t))) + + /** + * 转换throwable为堆栈信息 + */ + def exceptionStack(exceptionTuple: (String, Throwable)): String = { + s""" + |异常信息<< ip:${OSUtils.getIp} executorId:${executorId} 异常时间:${exceptionTuple._1} 累计:${count}次. >> + |异常堆栈:${ExceptionBus.stackTrace(exceptionTuple._2)} + |""".stripMargin + } + } + + /** + * 获取日志累加器中的值 + * + * @return + * 日志累加值 + */ + def getLog: ConcurrentLinkedQueue[String] = this.logAccumulator.value + + /** + * 将运行时信息累加到env累加器中 + * + * @param envInfo + * 运行时信息 + */ + def addEnv(envInfo: String): Unit = { + if (FireUtils.isSparkEngine) { + val env = SparkEnv.get + if (env != null && !"driver".equalsIgnoreCase(SparkEnv.get.executorId)) { + val envAccumulator = SparkEnv.get.conf.get(this.envAccumulatorLabel, "") + if (StringUtils.isNotBlank(envAccumulator)) { + val envAcc: EnvironmentAccumulator = SparkEnv.get.closureSerializer.newInstance.deserialize(ByteBuffer.wrap(StringsUtils.toByteArray(envAccumulator))) + envAcc.add(envInfo) + } + } else { + this.envAccumulator.add(envInfo) + } + } + } + + /** + * 获取env累加器中的运行时信息 + * + * @return + * 运行时信息 + */ + def getEnv: ConcurrentLinkedQueue[String] = this.envAccumulator.value + + /** + * 将数据累加到multiCount累加器中 + * + * @param value + * 累加值 + */ + def addMultiCounter(key: String, value: Long): Unit = { + if (FireUtils.isSparkEngine) { + if (SparkEnv.get != null && !"driver".equalsIgnoreCase(SparkEnv.get.executorId)) { + val countAccumulator = SparkEnv.get.conf.get(this.multiCounterLabel, "") + if (StringUtils.isNotBlank(countAccumulator)) { + val multiCounter: MultiCounterAccumulator = SparkEnv.get.closureSerializer.newInstance.deserialize(ByteBuffer.wrap(StringsUtils.toByteArray(countAccumulator))) + multiCounter.add(key, value) + } + } else { + this.multiCounter.add(key, value) + } + } + } + + /** + * 获取multiCounter累加器的值 + * + * @return + * 累加结果 + */ + def getMultiCounter: ConcurrentHashMap[String, Long] = this.multiCounter.value + + /** + * 将数据累加到timer累加器中 + * + * @param value + * 累加值的key、value和时间的schema,默认为yyyy-MM-dd HH:mm:00 + */ + def addMultiTimer(key: String, value: Long, schema: String = DateFormatUtils.TRUNCATE_MIN): Unit = { + if (FireUtils.isSparkEngine) { + if (SparkEnv.get != null && !"driver".equalsIgnoreCase(SparkEnv.get.executorId)) { + val timerAccumulator = SparkEnv.get.conf.get(this.multiTimerLabel, "") + if (StringUtils.isNotBlank(timerAccumulator)) { + val multiTimer: MultiTimerAccumulator = SparkEnv.get.closureSerializer.newInstance.deserialize(ByteBuffer.wrap(StringsUtils.toByteArray(timerAccumulator))) + multiTimer.add(key, value, schema) + } + } else { + this.multiTimer.add(key, value, 
schema) + } + } + } + + /** + * 用于构建复杂类型(json)的多时间维度累加器的key + * 并将key作为多时间维度累加器的key + * + * @param value + * 累加的值 + * @param cluster + * 连接的集群名 + * @param module + * 所在的模块 + * @param method + * 所在的方法名 + * @param action + * 执行的动作 + * @param sink + * 作用的目标 + * @param level + * 日志级别:INFO、ERROR + * @return + * 累加器的key(json格式) + */ + def addMultiTimer(module: String, method: String, action: String, sink: String, level: String, cluster: String, value: Long): Unit = { + if (FireUtils.isSparkEngine) { + val multiKey = s"""{"cluster":"$cluster","module":"$module","method":"$method","action":"$action","sink":"$sink","level":"$level","jobClass":"$jobClassName"}""" + this.addMultiTimer(multiKey, value) + } + } + + /** + * 获取timer累加器的值 + * + * @return + * 累加结果 + */ + def getMultiTimer: HashBasedTable[String, String, Long] = this.multiTimer.value + + /** + * 获取动态配置信息 + */ + def getConf: SparkConf = { + if (this.broadcastConf != null) { + this.broadcastConf.value + } else { + new SparkConf().setAll(PropUtils.settings) + } + } + + /** + * 广播新的配置 + */ + private[fire] def broadcastNewConf(sc: SparkContext, conf: SparkConf): Unit = { + if (sc != null && conf != null && FireFrameworkConf.dynamicConf) { + val executorNum = this.getInitExecutors(sc) + val broadcastConf = sc.broadcast(conf) + this.broadcastConf = broadcastConf + val rdd = sc.parallelize(1 to executorNum * 10, executorNum * 3) + rdd.foreachPartitionAsync(_ => { + this.broadcastConf = broadcastConf + this.broadcastConf.value.getAll.foreach(kv => { + PropUtils.setProperty(kv._1, kv._2) + }) + this.logger.info("The Executor side configuration has been reloaded.") + }) + this.logger.info("The Driver side configuration has been reloaded.") + } + } + + /** + * 获取当前任务的executor数 + */ + private[this] def getInitExecutors(sc: SparkContext): Int = { + if (this.initExecutors.get() == 0) this.initExecutors.set(sc.getConf.get("spark.executor.instances", if (OSUtils.isLinux) "1000" else "10").toInt) + this.initExecutors.get() + } + + /** + * 注册多个自定义累加器到每个executor + * + * @param sc + * SparkContext + * [key, accumulator] + */ + private[fire] def registerAccumulators(sc: SparkContext): Unit = this.synchronized { + if (sc != null && accMap != null && accMap.nonEmpty) { + val executorNum = this.getInitExecutors(sc) + // 将定时任务所在类的实例广播到每个executor端 + val taskSet = sc.broadcast(taskRegisterSet) + val broadcastConf = sc.broadcast(SparkEnv.get.conf) + this.broadcastConf = broadcastConf + // 序列化内置的累加器 + val accumulatorMap = accMap.map(accInfo => { + // 注册每个累加器,必须是合法的名称并且未被注册过 + if (accInfo._2 != null && !accInfo._2.isRegistered) { + if (StringUtils.isNotBlank(accInfo._1) && accInfo._1.contains("fire")) { + sc.register(accInfo._2, accInfo._1) + } else { + sc.register(accInfo._2) + } + } + (accInfo._1, SparkEnv.get.closureSerializer.newInstance().serialize(accInfo._2).array()) + }) + + // 获取申请的executor数,设置累加器到conf中 + val rdd = sc.parallelize(1 to executorNum * 10, executorNum * 3) + rdd.foreachPartition(_ => { + this.broadcastConf = broadcastConf + // 将序列化后的累加器放置到conf中 + accumulatorMap.foreach(accSer => SparkEnv.get.conf.set(accSer._1, StringsUtils.toHexString(accSer._2))) + if (FireFrameworkConf.scheduleEnable) { + // 从广播中获取到定时任务的实例,并在executor端完成注册 + val tasks = taskSet.value + if (tasks != null && tasks.nonEmpty && !SparkSchedulerManager.getInstance().schedulerIsStarted()) { + SparkSchedulerManager.getInstance().registerTasks(tasks.toArray: _*) + } + } + }) + } + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/EnvironmentAccumulator.scala 
b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/EnvironmentAccumulator.scala new file mode 100644 index 0000000..3f538bb --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/EnvironmentAccumulator.scala @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.acc + +import java.util.concurrent.ConcurrentLinkedQueue + +import com.zto.fire.common.conf.FireFrameworkConf +import org.apache.commons.lang3.StringUtils +import org.apache.spark.util.AccumulatorV2 + +/** + * 运行时累加器,用于收集运行时的jvm、gc、thread、cpu、memory、disk等信息 + * + * @author ChengLong 2019年11月6日 16:56:38 + */ +private[fire] class EnvironmentAccumulator extends AccumulatorV2[String, ConcurrentLinkedQueue[String]] { + // 用于存放运行时信息的队列 + private val envInfoQueue = new ConcurrentLinkedQueue[String] + // 判断是否打开运行时信息累加器 + private lazy val isEnable = FireFrameworkConf.accEnable && FireFrameworkConf.accEnvEnable + + /** + * 判断累加器是否为空 + */ + override def isZero: Boolean = this.envInfoQueue.size() == 0 + + /** + * 用于复制累加器 + */ + override def copy(): AccumulatorV2[String, ConcurrentLinkedQueue[String]] = new EnvironmentAccumulator + + /** + * driver端执行有效,用于清空累加器 + */ + override def reset(): Unit = this.envInfoQueue.clear + + /** + * executor端执行,用于收集运行时信息 + * + * @param envInfo + * 运行时信息 + */ + override def add(envInfo: String): Unit = { + if (this.isEnable && StringUtils.isNotBlank(envInfo)) { + this.envInfoQueue.add(envInfo) + this.clear + } + } + + /** + * executor端向driver端merge累加数据 + * + * @param other + * executor端累加结果 + */ + override def merge(other: AccumulatorV2[String, ConcurrentLinkedQueue[String]]): Unit = { + if (other != null && other.value.size() > 0) { + this.envInfoQueue.addAll(other.value) + this.clear + } + } + + /** + * driver端获取累加器的值 + * + * @return + * 收集到的日志信息 + */ + override def value: ConcurrentLinkedQueue[String] = this.envInfoQueue + + /** + * 当日志累积量超过maxLogSize所设定的值时清理过期的日志数据 + * 直到达到minLogSize所设定的最小值,防止频繁的进行清理 + */ + def clear: Unit = { + if (this.envInfoQueue.size() > FireFrameworkConf.maxEnvSize) { + while (this.envInfoQueue.size() > FireFrameworkConf.minEnvSize) { + this.envInfoQueue.poll + } + } + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/LogAccumulator.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/LogAccumulator.scala new file mode 100644 index 0000000..10bef5f --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/LogAccumulator.scala @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.acc + +import com.zto.fire.common.conf.FireFrameworkConf +import org.apache.spark.util.AccumulatorV2 + +import java.util.concurrent.ConcurrentLinkedQueue + +/** + * fire框架日志累加器 + * + * @author ChengLong 2019-7-23 14:22:16 + */ +private[fire] class LogAccumulator extends AccumulatorV2[String, ConcurrentLinkedQueue[String]] { + // 用于存放日志的队列 + private val logQueue = new ConcurrentLinkedQueue[String] + // 判断是否打开日志累加器 + private lazy val isEnable = FireFrameworkConf.accEnable && FireFrameworkConf.accLogEnable + + /** + * 判断累加器是否为空 + */ + override def isZero: Boolean = this.logQueue.size() == 0 + + /** + * 用于复制累加器 + */ + override def copy(): AccumulatorV2[String, ConcurrentLinkedQueue[String]] = new LogAccumulator + + /** + * driver端执行有效,用于清空累加器 + */ + override def reset(): Unit = this.logQueue.clear + + /** + * executor端执行,用于收集日志信息 + * + * @param log + * 日志信息 + */ + override def add(log: String): Unit = { + if (this.isEnable) { + this.logQueue.add(log) + this.clear + } + } + + /** + * executor端向driver端merge累加数据 + * + * @param other + * executor端累加结果 + */ + override def merge(other: AccumulatorV2[String, ConcurrentLinkedQueue[String]]): Unit = { + if (other != null && other.value.size() > 0) { + this.logQueue.addAll(other.value) + this.clear + } + } + + /** + * driver端获取累加器的值 + * + * @return + * 收集到的日志信息 + */ + override def value: ConcurrentLinkedQueue[String] = this.logQueue + + /** + * 当日志累积量超过maxLogSize所设定的值时清理过期的日志数据 + * 直到达到minLogSize所设定的最小值,防止频繁的进行清理 + */ + def clear: Unit = { + if (this.logQueue.size() > FireFrameworkConf.maxLogSize) { + while (this.logQueue.size() > FireFrameworkConf.minLogSize) { + this.logQueue.poll + } + } + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/MultiCounterAccumulator.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/MultiCounterAccumulator.scala new file mode 100644 index 0000000..448e5fd --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/MultiCounterAccumulator.scala @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.acc + +import java.util.concurrent.ConcurrentHashMap + +import com.zto.fire._ +import com.zto.fire.common.conf.FireFrameworkConf +import org.apache.commons.lang3.StringUtils +import org.apache.spark.util.AccumulatorV2 + + +/** + * 多值累加器 + * + * @author ChengLong 2019-8-16 16:56:06 + */ +private[fire] class MultiCounterAccumulator extends AccumulatorV2[(String, Long), ConcurrentHashMap[String, Long]] { + private[fire] val multiCounter = new ConcurrentHashMap[String, Long]() + // 判断是否打开多值累加器 + private lazy val isEnable = FireFrameworkConf.accEnable && FireFrameworkConf.accMultiCounterEnable + + /** + * 用于判断当前累加器是否为空 + * + * @return + * true: 空 false:不为空 + */ + override def isZero: Boolean = this.multiCounter.size() == 0 + + /** + * 用于复制一个新的累加器实例 + * + * @return + * 新的累加器实例对象 + */ + override def copy(): AccumulatorV2[(String, Long), ConcurrentHashMap[String, Long]] = { + val tmpAcc = new MultiCounterAccumulator + tmpAcc.multiCounter.putAll(this.multiCounter) + tmpAcc + } + + /** + * 用于重置累加器 + */ + override def reset(): Unit = this.multiCounter.clear + + /** + * 用于添加新的数据到累加器中 + * + * @param kv + * 累加值的key和value + */ + override def add(kv: (String, Long)): Unit = this.mergeMap(kv) + + /** + * 用于合并数据到累加器的map中 + * 存在的累加,不存在的直接添加 + * + * @param kv + * 累加值的key和value + */ + private[this] def mergeMap(kv: (String, Long)): Unit = { + if (this.isEnable && kv != null && StringUtils.isNotBlank(kv._1)) { + this.multiCounter.put(kv._1, this.multiCounter.getOrDefault(kv._1, 0) + kv._2) + } + } + + /** + * 用于合并executor端的map到driver端 + * + * @param other + * executor端的map + */ + override def merge(other: AccumulatorV2[(String, Long), ConcurrentHashMap[String, Long]]): Unit = { + val otherMap = other.value + if (otherMap != null && otherMap.nonEmpty) { + otherMap.foreach(kv => { + this.mergeMap(kv) + }) + } + } + + /** + * 用于driver端获取累加器(map)中的值 + * + * @return + * 累加器中的值 + */ + override def value: ConcurrentHashMap[String, Long] = this.multiCounter +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/MultiTimerAccumulator.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/MultiTimerAccumulator.scala new file mode 100644 index 0000000..dfb7ac2 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/acc/MultiTimerAccumulator.scala @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.acc + +import com.google.common.collect.HashBasedTable +import com.zto.fire._ +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.util.DateFormatUtils +import org.apache.commons.lang3.StringUtils +import org.apache.spark.util.AccumulatorV2 + +import java.util.Date +import scala.collection.mutable + +/** + * timer累加器,对相同的key进行分钟级维度累加 + * + * @author ChengLong 2019-8-21 14:22:12 + */ +private[fire] class MultiTimerAccumulator extends AccumulatorV2[(String, Long, String), HashBasedTable[String, String, Long]] { + private[fire] lazy val timerCountTable = HashBasedTable.create[String, String, Long] + // 用于记录上次清理过期累加数据的时间 + private var lastClearTime = new Date + // 判断是否打开多时间维度累加器 + private lazy val isEnable = FireFrameworkConf.accEnable && FireFrameworkConf.accMultiCounterEnable + + /** + * 用于判断当前累加器是否为空 + * + * @return + * true: 空 false:不为空 + */ + override def isZero: Boolean = this.timerCountTable.size() == 0 + + /** + * 用于复制一个新的累加器实例 + * + * @return + * 新的累加器实例对象 + */ + override def copy(): AccumulatorV2[(String, Long, String), HashBasedTable[String, String, Long]] = { + val tmpAcc = new MultiTimerAccumulator + tmpAcc.timerCountTable.putAll(this.timerCountTable) + tmpAcc + } + + /** + * 用于重置累加器 + */ + override def reset(): Unit = this.timerCountTable.clear + + /** + * 用于添加新的数据到累加器中 + * + * @param kv + * 累加值的key、value和时间的schema,默认为yyyy-MM-dd HH:mm:00 + */ + override def add(kv: (String, Long, String)): Unit = { + if (!isEnable || kv == null) return + val schema = if (StringUtils.isBlank(kv._3)) { + DateFormatUtils.TRUNCATE_MIN + } else kv._3 + if (StringUtils.isNotBlank(kv._1)) { + this.mergeTable(kv._1, DateFormatUtils.formatCurrentBySchema(schema), kv._2) + } + } + + /** + * 用于合并数据到累加器的map中 + * 存在的累加,不存在的直接添加 + * + * @param kv + * 累加值的key和value + */ + private[this] def mergeTable(kv: (String, String, Long)): Unit = { + if (kv != null && StringUtils.isNotBlank(kv._1) && kv._2 != null) { + val value = if (this.timerCountTable.contains(kv._1, kv._2)) this.timerCountTable.get(kv._1, kv._2) else 0L + this.timerCountTable.put(kv._1, kv._2, kv._3 + value) + this.clear + } + } + + /** + * 用于合并executor端的map到driver端 + * + * @param other + * executor端的map + */ + override def merge(other: AccumulatorV2[(String, Long, String), HashBasedTable[String, String, Long]]): Unit = { + val otherTable = other.value + if (otherTable != null && !otherTable.isEmpty) { + otherTable.cellSet().foreach(timer => { + this.mergeTable(timer.getRowKey, timer.getColumnKey, timer.getValue) + }) + } + } + + /** + * 用于driver端获取累加器(map)中的值 + * + * @return + * 累加器中的值 + */ + override def value: HashBasedTable[String, String, Long] = this.timerCountTable + + /** + * 当累积量超过maxTimerSize所设定的值时清理过期的数据 + */ + private[this] def clear: Unit = { + val currentDate = new Date + if (this.timerCountTable.size() >= FireFrameworkConf.maxTimerSize && DateFormatUtils.betweenHours(currentDate, lastClearTime) >= FireFrameworkConf.maxTimerHour) { + val criticalTime = DateFormatUtils.addHours(currentDate, -Math.abs(FireFrameworkConf.maxTimerHour)) + + val timeOutSet = new mutable.HashSet[String]() + this.timerCountTable.rowMap().foreach(kmap => { + kmap._2.filter(_ != null).foreach(kv => { + if (kv._1.compareTo(criticalTime) <= 0 && StringUtils.isNotBlank(kmap._1) && StringUtils.isNotBlank(kv._1)) { + timeOutSet += kmap._1 + "#" + kv._1 + } + }) + }) + + timeOutSet.filter(StringUtils.isNotBlank).map(t => (t.split("#"))).foreach(kv => { + this.timerCountTable.remove(kv(0), kv(1)) + }) + 
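+ // 编者注:记录本次清理时间,使两次过期数据清理之间至少间隔maxTimerHour小时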
this.lastClearTime = currentDate + } + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/conf/FireSparkConf.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/conf/FireSparkConf.scala new file mode 100644 index 0000000..03ac6f4 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/conf/FireSparkConf.scala @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.conf + +import com.zto.fire.common.util.PropUtils + +/** + * Spark引擎相关配置 + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-07-13 14:57 + */ +private[fire] object FireSparkConf { + lazy val SPARK_APP_NAME = "spark.appName" + lazy val SPARK_LOCAL_CORES = "spark.local.cores" + lazy val SPARK_LOG_LEVEL = "spark.log.level" + lazy val SPARK_SAVE_MODE = "spark.saveMode" + lazy val SPARK_PARALLELISM = "spark.parallelism" + lazy val SPARK_CHK_POINT_DIR = "spark.chkpoint.dir" + + // spark datasource v2 api中的options配置key前缀 + lazy val SPARK_DATASOURCE_OPTIONS_PREFIX = "spark.datasource.options." 
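+ // 编者注(假设性示意):以该前缀开头的配置,例如 spark.datasource.options.url = jdbc:...,预期会被框架透传为datasource读写时的同名option,具体以实现为准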
+ lazy val SPARK_DATASOURCE_FORMAT = "spark.datasource.format" + lazy val SPARK_DATSOURCE_SAVE_MODE = "spark.datasource.saveMode" + // 用于dataFrame.write.format.save()参数 + lazy val SPARK_DATASOURCE_SAVE_PARAM = "spark.datasource.saveParam" + lazy val SPARK_DATASOURCE_IS_SAVE_TABLE = "spark.datasource.isSaveTable" + // 用于spark.read.format.load()参数 + lazy val SPARK_DATASOURCE_LOAD_PARAM = "spark.datasource.loadParam" + + // spark 默认的checkpoint地址 + lazy val sparkChkPointDir = "hdfs://nameservice1/user/spark/ckpoint/" + // spark streaming批次时间 + lazy val SPARK_STREAMING_BATCH_DURATION = "spark.streaming.batch.duration" + // spark streaming的remember时间,-1表示不生效(ms) + lazy val SPARK_STREAMING_REMEMBER = "spark.streaming.remember" + + // spark streaming的remember时间,-1表示不生效(ms) + def streamingRemember: Long = PropUtils.getLong(this.SPARK_STREAMING_REMEMBER, -1) + lazy val appName = PropUtils.getString(this.SPARK_APP_NAME, "") + lazy val localCores = PropUtils.getString(this.SPARK_LOCAL_CORES, "*") + lazy val logLevel = PropUtils.getString(this.SPARK_LOG_LEVEL, "info").toUpperCase + lazy val saveMode = PropUtils.getString(this.SPARK_SAVE_MODE, "Append") + lazy val parallelism = PropUtils.getInt(this.SPARK_PARALLELISM, 200) + lazy val chkPointDirPrefix = PropUtils.getString(this.SPARK_CHK_POINT_DIR, this.sparkChkPointDir) + lazy val confBathDuration = PropUtils.getInt(this.SPARK_STREAMING_BATCH_DURATION, -1) + + /** + * spark datasource api中的format参数 + */ + def datasourceFormat(keyNum: Int = 1): String = PropUtils.getString(this.SPARK_DATASOURCE_FORMAT, "", keyNum) + + /** + * spark datasource api中的saveMode参数 + */ + def datasourceSaveMode(keyNum: Int = 1): String = PropUtils.getString(this.SPARK_DATSOURCE_SAVE_MODE, "Append", keyNum) + + /** + * spark datasource api中的save方法参数 + */ + def datasourceSaveParam(keyNum: Int = 1): String = PropUtils.getString(this.SPARK_DATASOURCE_SAVE_PARAM, "", keyNum) + + /** + * spark datasource api中的isSaveTable方法 + */ + def datasourceIsSaveTable(keyNum: Int = 1): String = PropUtils.getString(this.SPARK_DATASOURCE_IS_SAVE_TABLE, "", keyNum) + + /** + * spark datasource api中的load方法参数 + */ + def datasourceLoadParam(keyNum: Int = 1): String = PropUtils.getString(this.SPARK_DATASOURCE_LOAD_PARAM, "", keyNum) +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/conf/SparkEngineConf.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/conf/SparkEngineConf.scala new file mode 100644 index 0000000..329c534 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/conf/SparkEngineConf.scala @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.conf + +import com.zto.fire.core.conf.EngineConf +import com.zto.fire.spark.util.SparkUtils +import org.apache.spark.SparkEnv + +/** + * 获取Spark引擎的所有配置信息 + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-03-02 10:57 + */ +private[fire] class SparkEngineConf extends EngineConf { + + /** + * 获取引擎的所有配置信息 + */ + override def getEngineConf: Map[String, String] = { + if (SparkUtils.isExecutor) { + SparkEnv.get.conf.getAll.toMap + } else { + Map.empty[String, String] + } + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseBulkConnector.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseBulkConnector.scala new file mode 100644 index 0000000..15eac6d --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseBulkConnector.scala @@ -0,0 +1,516 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.connector + +import com.zto.fire.common.anno.Internal +import com.zto.fire.core.connector.{Connector, ConnectorFactory} +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.bean.{HBaseBaseBean, MultiVersionsBean} +import com.zto.fire.hbase.conf.FireHBaseConf +import com.zto.fire.predef._ +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.hbase.TableName +import org.apache.hadoop.hbase.client._ +import org.apache.hadoop.hbase.io.ImmutableBytesWritable +import org.apache.hadoop.hbase.mapreduce.TableOutputFormat +import org.apache.hadoop.hbase.spark.HBaseContext +import org.apache.hadoop.hbase.util.Bytes +import org.apache.hadoop.mapreduce.Job +import org.apache.spark.SparkContext +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, Encoders, Row} +import org.apache.spark.storage.StorageLevel +import org.apache.spark.streaming.dstream.DStream + +import scala.collection.mutable.ListBuffer +import scala.reflect.ClassTag + +/** + * HBase直连工具类,基于HBase-Spark API开发 + * 具有更强大的性能和更低的资源开销,适用于 + * 与Spark相结合的大数据量操作,优点体现在并行 + * 和大数据量。如果数据量不大,仍推荐使用 + * HBaseConnector进行相关的操作 + * + * @param sc + * SparkContext实例 + * @param config + * HBase相关配置参数 + * @author ChengLong 2018年4月10日 10:39:28 + */ +class HBaseBulkConnector(@scala.transient sc: SparkContext, @scala.transient config: Configuration, batchSize: Int = 10000, keyNum: Int = 1) + extends HBaseContext(sc, config) with Connector { + private[fire] lazy val finalBatchSize = if (FireHBaseConf.hbaseBatchSize(this.keyNum) != -1) FireHBaseConf.hbaseBatchSize(this.keyNum) else this.batchSize + private[this] 
lazy val sparkSession = SparkSingletonFactory.getSparkSession + @transient + private[this] lazy val tableConfMap = new JConcurrentHashMap[String, Configuration]() + + /** + * 根据RDD[String]批量删除,rdd是rowkey的集合 + * 类型为String + * + * @param rdd + * 类型为String的RDD数据集 + * @param tableName + * HBase表名 + */ + def bulkDeleteRDD(tableName: String, rdd: RDD[String]): Unit = { + requireNonEmpty(tableName, rdd) + tryWithLog { + val rowKeyRDD = rdd.filter(rowkey => StringUtils.isNotBlank(rowkey)).map(rowKey => Bytes.toBytes(rowKey)) + this.bulkDelete[Array[Byte]](rowKeyRDD, TableName.valueOf(tableName), rec => new Delete(rec), this.finalBatchSize) + }(this.logger, s"execute bulkDeleteRDD(tableName: ${tableName}, batchSize: ${batchSize}) success. keyNum: ${keyNum}") + } + + /** + * 根据Dataset[String]批量删除,Dataset是rowkey的集合 + * 类型为String + * + * @param dataset + * 类型为String的Dataset集合 + * @param tableName + * HBase表名 + */ + def bulkDeleteDS(tableName: String, dataset: Dataset[String]): Unit = { + requireNonEmpty(tableName, dataset) + tryWithLog { + this.bulkDeleteRDD(tableName, dataset.rdd) + }(this.logger, s"execute bulkDeleteDS(tableName: ${tableName}, batchSize: ${finalBatchSize}) success. keyNum: ${keyNum}") + } + + /** + * 指定rowkey集合,进行批量删除操作内部会将这个集合转为RDD + * 推荐在较大量数据时使用,小数据量的删除操作仍推荐使用HBaseConnector + * + * @param tableName + * HBase表名 + * @param seq + * 待删除的rowKey集合 + */ + def bulkDeleteList(tableName: String, seq: Seq[String]): Unit = { + requireNonEmpty(tableName, seq) + tryWithLog { + val rdd = sc.parallelize(seq, math.max(1, math.min(seq.length / 2, FireSparkConf.parallelism))) + this.bulkDeleteRDD(tableName, rdd) + }(this.logger, s"execute bulkDeleteList(tableName: ${tableName}) success. keyNum: ${keyNum}") + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param rdd + * rowKey集合,类型为RDD[String] + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetRDD[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[String], clazz: Class[E]): RDD[E] = { + requireNonEmpty(tableName, rdd, clazz) + + tryWithReturn { + val rowKeyRDD = rdd.filter(StringUtils.isNotBlank(_)).map(rowKey => Bytes.toBytes(rowKey)) + val getRDD = this.bulkGet[Array[Byte], E](TableName.valueOf(tableName), batchSize, rowKeyRDD, rowKey => new Get(rowKey), (result: Result) => { + HBaseConnector(keyNum = this.keyNum).hbaseRow2Bean(result, clazz) + }).filter(bean => bean != null).persist(StorageLevel.fromString(FireHBaseConf.hbaseStorageLevel)) + getRDD + }(this.logger, s"execute bulkGetRDD(tableName: ${tableName}, batchSize: ${finalBatchSize}) success. keyNum: ${keyNum}") + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param rdd + * rowKey集合,类型为RDD[String] + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[String], clazz: Class[E]): DataFrame = { + requireNonEmpty(tableName, rdd, clazz) + tryWithReturn { + val resultRdd = this.bulkGetRDD[E](tableName, rdd, clazz) + this.sparkSession.createDataFrame(resultRdd, clazz) + }(this.logger, s"execute bulkGetDF(tableName: ${tableName}, batchSize: ${finalBatchSize}) success. 
keyNum: ${keyNum}") + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param rdd + * rowKey集合,类型为RDD[String] + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[String], clazz: Class[E]): Dataset[E] = { + requireNonEmpty(tableName, rdd, clazz) + tryWithReturn { + val resultRdd = this.bulkGetRDD[E](tableName, rdd, clazz) + this.sparkSession.createDataset(resultRdd)(Encoders.bean(clazz)) + }(this.logger, s"execute bulkGetDS(tableName: ${tableName}, batchSize: ${finalBatchSize}) success. keyNum: ${keyNum}") + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * 内部实现是将rowkey集合转为RDD[String],推荐在数据量较大 + * 时使用。数据量较小请优先使用HBaseConnector + * + * @param tableName + * HBase表名 + * @param clazz + * 具体类型 + * @param seq + * rowKey集合 + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetSeq[E <: HBaseBaseBean[E] : ClassTag](tableName: String, seq: Seq[String], clazz: Class[E]): RDD[E] = { + requireNonEmpty(tableName, seq, clazz) + + tryWithReturn { + val rdd = sc.parallelize(seq, math.max(1, math.min(seq.length / 2, FireSparkConf.parallelism))) + this.bulkGetRDD(tableName, rdd, clazz) + }(this.logger, s"execute bulkGetSeq(tableName: ${tableName}, batchSize: ${finalBatchSize}) success. keyNum: ${keyNum}") + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param rdd + * 数据集合,数类型需继承自HBaseBaseBean + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def bulkPutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T]): Unit = { + requireNonEmpty(tableName, rdd) + + tryWithLog { + this.bulkPut[T](rdd, + TableName.valueOf(tableName), + (putRecord: T) => { + HBaseConnector(keyNum = this.keyNum).convert2Put[T](if (HBaseConnector(keyNum = this.keyNum).getMultiVersion[T]) new MultiVersionsBean(putRecord).asInstanceOf[T] else putRecord, HBaseConnector(keyNum = this.keyNum).getNullable[T]) + }) + }(this.logger, s"execute bulkPutRDD(tableName: ${tableName}) success. keyNum: ${keyNum}") + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入。如果数据量 + * 较大,推荐使用。数据量过小则推荐使用HBaseConnector + * + * @param tableName + * HBase表名 + * @param seq + * 数据集,类型为HBaseBaseBean的子类 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + */ + def bulkPutSeq[T <: HBaseBaseBean[T] : ClassTag](tableName: String, seq: Seq[T]): Unit = { + requireNonEmpty(tableName, seq) + + tryWithLog { + val rdd = this.sc.parallelize(seq, math.max(1, math.min(seq.length / 2, FireSparkConf.parallelism))) + this.bulkPutRDD(tableName, rdd) + }(this.logger, s"execute bulkPutRDD(tableName: ${tableName}) success. 
keyNum: ${keyNum}") + } + + /** + * 定制化scan设置后从指定的表中scan数据 + * 并将scan到的结果集映射为自定义JavaBean对象 + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 自定义JavaBean的Class对象 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + * @return + * scan获取到的结果集,类型为RDD[T] + */ + def bulkScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan)(implicit canOverload: Boolean = true): RDD[T] = { + requireNonEmpty(tableName, scan, clazz) + + tryWithReturn { + if (scan.getCaching == -1) { + scan.setCaching(this.finalBatchSize) + } + this.hbaseRDD(TableName.valueOf(tableName), scan).mapPartitions(it => HBaseConnector(keyNum = this.keyNum).hbaseRow2BeanList(it, clazz)).persist(StorageLevel.fromString(FireHBaseConf.hbaseStorageLevel)) + }(this.logger, s"execute bulkScanRDD(tableName: ${tableName}) success. keyNum: ${keyNum}") + } + + /** + * 指定startRow和stopRow后自动创建scan对象完成数据扫描 + * 并将scan到的结果集映射为自定义JavaBean对象 + * + * @param tableName + * HBase表名 + * @param startRow + * rowkey的起始 + * @param stopRow + * rowkey的结束 + * @param clazz + * 自定义JavaBean的Class对象 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + * @return + * scan获取到的结果集,类型为RDD[T] + */ + def bulkScanRDD2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): RDD[T] = { + requireNonEmpty(tableName, clazz, startRow, stopRow) + this.bulkScanRDD(tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param dataFrame + * dataFrame实例,数类型需继承自HBaseBaseBean + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def bulkPutDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[T]): Unit = { + requireNonEmpty(tableName, dataFrame, clazz) + + val rdd = dataFrame.rdd.mapPartitions(it => SparkUtils.sparkRowToBean(it, clazz)) + this.bulkPutRDD[T](tableName, rdd) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param dataset + * dataFrame实例,数类型需继承自HBaseBaseBean + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def bulkPutDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataset: Dataset[T]): Unit = { + requireNonEmpty(tableName, dataset) + + this.bulkPutRDD[T](tableName, dataset.rdd) + } + + /** + * 用于已经映射为指定类型的DStream实时 + * 批量写入至HBase表中 + * + * @param tableName + * HBase表名 + * @param dstream + * 类型为自定义JavaBean的DStream流 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + */ + def bulkPutStream[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dstream: DStream[T]): Unit = { + requireNonEmpty(tableName, dstream) + + tryWithLog { + this.streamBulkPut[T](dstream, TableName.valueOf(tableName), (putRecord: T) => { + HBaseConnector(keyNum = this.keyNum).convert2Put[T](if (HBaseConnector(keyNum = this.keyNum).getMultiVersion[T]) new MultiVersionsBean(putRecord).asInstanceOf[T] else putRecord, HBaseConnector(keyNum = this.keyNum).getNullable[T]) + }) + }(this.logger, s"execute bulkPutStream(tableName: ${tableName}) success. 
keyNum: ${keyNum}") + } + + /** + * 以spark 方式批量将rdd数据写入到hbase中 + * + * @param rdd + * 类型为HBaseBaseBean子类的rdd + * @param tableName + * hbase表名 + * @tparam T + * 数据类型 + */ + def hadoopPut[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T]): Unit = { + requireNonEmpty(tableName, rdd) + + tryWithLog { + rdd.mapPartitions(it => { + val putList = ListBuffer[(ImmutableBytesWritable, Put)]() + it.foreach(t => { + putList += Tuple2(new ImmutableBytesWritable(), HBaseConnector(keyNum = this.keyNum).convert2Put[T](t, HBaseConnector(keyNum = this.keyNum).getNullable[T])) + }) + putList.iterator + }).saveAsNewAPIHadoopDataset(this.getConfiguration(tableName)) + }(this.logger, s"execute hadoopPut(tableName: ${tableName}) success. keyNum: ${keyNum}") + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hadoopPutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[E]): Unit = { + requireNonEmpty(tableName, dataFrame, clazz) + + val rdd = dataFrame.rdd.mapPartitions(it => SparkUtils.sparkRowToBean(it, clazz)) + this.hadoopPut[E](tableName, rdd) + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param dataset + * JavaBean类型,待插入到hbase的数据集 + */ + def hadoopPutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataset: Dataset[E]): Unit = { + requireNonEmpty(tableName, dataset)("参数不合法:dataset不能为空") + this.hadoopPut[E](tableName, dataset.rdd) + } + + /** + * 以spark 方式批量将DataFrame数据写入到hbase中 + * 注:此方法与hbaseHadoopPutDF不同之处在于,它不强制要求该DataFrame一定要与HBaseBaseBean的子类对应 + * 但需要指定rowKey的构建规则,相对与hbaseHadoopPutDF来说,少了中间的两次转换,性能会更高 + * + * @param df + * spark的DataFrame + * @param tableName + * hbase表名 + * @tparam T + * JavaBean类型 + */ + def hadoopPutDFRow[T <: HBaseBaseBean[T] : ClassTag](tableName: String, df: DataFrame, buildRowKey: (Row) => String): Unit = { + requireNonEmpty(tableName, df) + val insertEmpty = HBaseConnector(keyNum = this.keyNum).getNullable[T] + tryWithLog { + val fields = df.schema.fields + df.rdd.mapPartitions(it => { + var count = 0 + val putList = ListBuffer[(ImmutableBytesWritable, Put)]() + it.foreach(row => { + val put = new Put(Bytes.toBytes(buildRowKey(row))) + fields.foreach(field => { + val fieldName = field.name + val fieldIndex = row.fieldIndex(fieldName) + val dataType = field.dataType.getClass.getSimpleName + var fieldValue: Any = null + if (!row.isNullAt(fieldIndex)) { + fieldValue = row.get(fieldIndex) + if (dataType.contains("StringType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.lang.String])) + } else if (dataType.contains("IntegerType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.lang.Integer])) + } else if (dataType.contains("DoubleType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.lang.Double])) + } else if (dataType.contains("LongType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.lang.Long])) + } else if (dataType.contains("DecimalType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.math.BigDecimal])) + } else if (dataType.contains("FloatType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), 
Bytes.toBytes(fieldValue.asInstanceOf[java.lang.Float])) + } else if (dataType.contains("BooleanType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.lang.Boolean])) + } else if (dataType.contains("ShortType")) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), Bytes.toBytes(fieldValue.asInstanceOf[java.lang.Short])) + } else if (dataType.contains("NullType") && insertEmpty) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), null) + } + } else if (insertEmpty) { + put.addColumn(Bytes.toBytes("info"), Bytes.toBytes(fieldName), null) + } + }) + putList += Tuple2(new ImmutableBytesWritable, put) + count += putList.size + }) + putList.iterator + }).saveAsNewAPIHadoopDataset(this.getConfiguration(tableName)) + }(this.logger, s"execute hadoopPut(tableName: ${tableName}) success. keyNum: ${keyNum}") + } + + /** + * 根据表名构建hadoop configuration + * + * @param tableName + * HBase表名 + * @return + * hadoop configuration + */ + @Internal + private[this] def getConfiguration(tableName: String): Configuration = { + requireNonEmpty(tableName) + + if (!this.tableConfMap.containsKey(tableName)) { + val hadoopConfiguration = this.config + hadoopConfiguration.set(TableOutputFormat.OUTPUT_TABLE, tableName) + val job = Job.getInstance(hadoopConfiguration) + job.setOutputKeyClass(classOf[ImmutableBytesWritable]) + job.setOutputValueClass(classOf[Result]) + job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]]) + this.tableConfMap.put(tableName, job.getConfiguration) + } + + this.tableConfMap.get(tableName) + } +} + +/** + * 用于单例构建伴生类HBaseContextExt的实例对象 + * 每个HBaseContextExt实例使用keyNum作为标识,并且与每个HBase集群一一对应 + */ +private[fire] object HBaseBulkConnector extends ConnectorFactory[HBaseBulkConnector] with HBaseBulkFunctions { + + /** + * 创建指定集群标识的HBaseContextExt对象实例 + */ + override protected def create(conf: Any = null, keyNum: Int = 1): HBaseBulkConnector = { + val hadoopConf = if (conf != null) conf.asInstanceOf[Configuration] else HBaseConnector.getConfiguration(keyNum) + val connector = new HBaseBulkConnector(SparkSingletonFactory.getSparkSession.sparkContext, hadoopConf, keyNum) + connector + } + +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseBulkFunctions.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseBulkFunctions.scala new file mode 100644 index 0000000..69bfd7f --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseBulkFunctions.scala @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.connector + +import com.zto.fire.hbase.bean.HBaseBaseBean +import org.apache.hadoop.hbase.client.Scan +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, Row} +import org.apache.spark.streaming.dstream.DStream + +import scala.reflect.ClassTag + +/** + * HBase Bulk api库 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 15:46 + */ +trait HBaseBulkFunctions { + /** + * 根据RDD[String]批量删除,rdd是rowkey的集合 + * 类型为String + * + * @param rdd + * 类型为String的RDD数据集 + * @param tableName + * HBase表名 + */ + def bulkDeleteRDD(tableName: String, rdd: RDD[String], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkDeleteRDD(tableName, rdd) + } + + /** + * 根据Dataset[String]批量删除,Dataset是rowkey的集合 + * 类型为String + * + * @param dataset + * 类型为String的Dataset集合 + * @param tableName + * HBase表名 + */ + def bulkDeleteDS(tableName: String, dataset: Dataset[String], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkDeleteDS(tableName, dataset) + } + + /** + * 指定rowkey集合,进行批量删除操作内部会将这个集合转为RDD + * 推荐在较大量数据时使用,小数据量的删除操作仍推荐使用HBaseConnector + * + * @param tableName + * HBase表名 + * @param seq + * 待删除的rowKey集合 + */ + def bulkDeleteList(tableName: String, seq: Seq[String], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkDeleteList(tableName, seq) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param rdd + * rowKey集合,类型为RDD[String] + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetRDD[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[String], clazz: Class[E], keyNum: Int = 1): RDD[E] = { + HBaseBulkConnector(keyNum = keyNum).bulkGetRDD[E](tableName, rdd, clazz) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param rdd + * rowKey集合,类型为RDD[String] + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[String], clazz: Class[E], keyNum: Int = 1): DataFrame = { + HBaseBulkConnector(keyNum = keyNum).bulkGetDF[E](tableName, rdd, clazz) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param rdd + * rowKey集合,类型为RDD[String] + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[String], clazz: Class[E], keyNum: Int = 1): Dataset[E] = { + HBaseBulkConnector(keyNum = keyNum).bulkGetDS[E](tableName, rdd, clazz) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * 内部实现是将rowkey集合转为RDD[String],推荐在数据量较大 + * 时使用。数据量较小请优先使用HBaseConnector + * + * @param tableName + * HBase表名 + * @param clazz + * 具体类型 + * @param seq + * rowKey集合 + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def bulkGetSeq[E <: HBaseBaseBean[E] : ClassTag](tableName: String, seq: Seq[String], clazz: Class[E], keyNum: Int = 1): RDD[E] = { + HBaseBulkConnector(keyNum = keyNum).bulkGetSeq[E](tableName, seq, clazz) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param rdd + * 数据集合,数类型需继承自HBaseBaseBean + * @tparam T + * 
数据类型为HBaseBaseBean的子类 + */ + def bulkPutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkPutRDD[T](tableName, rdd) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入。如果数据量 + * 较大,推荐使用。数据量过小则推荐使用HBaseConnector + * + * @param tableName + * HBase表名 + * @param seq + * 数据集,类型为HBaseBaseBean的子类 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + */ + def bulkPutSeq[T <: HBaseBaseBean[T] : ClassTag](tableName: String, seq: Seq[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkPutSeq[T](tableName, seq) + } + + /** + * 定制化scan设置后从指定的表中scan数据 + * 并将scan到的结果集映射为自定义JavaBean对象 + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 自定义JavaBean的Class对象 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + * @return + * scan获取到的结果集,类型为RDD[T] + */ + def bulkScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): RDD[T] = { + HBaseBulkConnector(keyNum = keyNum).bulkScanRDD[T](tableName, clazz, scan) + } + + /** + * 指定startRow和stopRow后自动创建scan对象完成数据扫描 + * 并将scan到的结果集映射为自定义JavaBean对象 + * + * @param tableName + * HBase表名 + * @param startRow + * rowkey的起始 + * @param stopRow + * rowkey的结束 + * @param clazz + * 自定义JavaBean的Class对象 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + * @return + * scan获取到的结果集,类型为RDD[T] + */ + def bulkScanRDD2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): RDD[T] = { + HBaseBulkConnector(keyNum = keyNum).bulkScanRDD2[T](tableName, clazz, startRow, stopRow) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param dataFrame + * dataFrame实例,数类型需继承自HBaseBaseBean + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def bulkPutDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkPutDF[T](tableName, dataFrame, clazz) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param dataset + * dataFrame实例,数类型需继承自HBaseBaseBean + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def bulkPutDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataset: Dataset[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkPutDS[T](tableName, dataset) + } + + /** + * 用于已经映射为指定类型的DStream实时 + * 批量写入至HBase表中 + * + * @param tableName + * HBase表名 + * @param dstream + * 类型为自定义JavaBean的DStream流 + * @tparam T + * 对象类型必须是HBaseBaseBean的子类 + */ + def bulkPutStream[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dstream: DStream[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).bulkPutStream[T](tableName, dstream) + } + + /** + * 以spark 方式批量将rdd数据写入到hbase中 + * + * @param rdd + * 类型为HBaseBaseBean子类的rdd + * @param tableName + * hbase表名 + * @tparam T + * 数据类型 + */ + def hadoopPut[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).hadoopPut[T](tableName, rdd) + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hadoopPutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[E], keyNum: Int = 
1): Unit = { + HBaseBulkConnector(keyNum = keyNum).hadoopPutDF[E](tableName, dataFrame, clazz) + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param dataset + * JavaBean类型,待插入到hbase的数据集 + */ + def hadoopPutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataset: Dataset[E], keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).hadoopPutDS[E](tableName, dataset) + } + + /** + * 以spark 方式批量将DataFrame数据写入到hbase中 + * 注:此方法与hbaseHadoopPutDF不同之处在于,它不强制要求该DataFrame一定要与HBaseBaseBean的子类对应 + * 但需要指定rowKey的构建规则,相对与hbaseHadoopPutDF来说,少了中间的两次转换,性能会更高 + * + * @param df + * spark的DataFrame + * @param tableName + * hbase表名 + * @tparam T + * JavaBean类型 + */ + def hadoopPutDFRow[T <: HBaseBaseBean[T] : ClassTag](tableName: String, df: DataFrame, buildRowKey: (Row) => String, keyNum: Int = 1): Unit = { + HBaseBulkConnector(keyNum = keyNum).hadoopPutDFRow[T](tableName, df, buildRowKey) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseSparkBridge.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseSparkBridge.scala new file mode 100644 index 0000000..2113ee7 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/connector/HBaseSparkBridge.scala @@ -0,0 +1,563 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.connector + +import java.nio.charset.StandardCharsets + +import com.zto.fire.core.connector.{ConnectorFactory, FireConnector} +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.hbase.conf.FireHBaseConf +import com.zto.fire.hbase.utils.HBaseUtils +import com.zto.fire.predef._ +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.hadoop.hbase.client.{Get, Result, Scan} +import org.apache.hadoop.hbase.io.ImmutableBytesWritable +import org.apache.hadoop.hbase.mapreduce.TableInputFormat +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.storage.StorageLevel + +import scala.collection.mutable.ListBuffer +import scala.reflect.ClassTag + +/** + * HBase-Spark桥,为Spark提供了使用Java API操作HBase的方式 + * + * @author ChengLong 2019-5-10 14:39:39 + */ +class HBaseSparkBridge(keyNum: Int = 1) extends FireConnector(keyNum = keyNum) { + private[this] lazy val spark = SparkSingletonFactory.getSparkSession + def batchSize: Int = FireHBaseConf.hbaseBatchSize() + + /** + * 使用Java API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param df + * DataFrame + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbasePutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], df: DataFrame): Unit = { + df.mapPartitions(row => SparkUtils.sparkRowToBean(row, clazz))(Encoders.bean(clazz)).foreachPartition((it: Iterator[E]) => { + this.multiBatchInsert(tableName, it) + }) + } + + /** + * 使用Java API的方式将Dataset中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param ds + * DataSet[E]的具体类型必须为HBaseBaseBean的子类 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbasePutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], ds: Dataset[E]): Unit = { + ds.foreachPartition((it: Iterator[E]) => { + this.multiBatchInsert(tableName, it) + }) + } + + /** + * 使用Java API的方式将RDD中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + */ + def hbasePutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T]): Unit = { + rdd.foreachPartition(it => { + this.multiBatchInsert(tableName, it) + }) + } + + /** + * Scan指定HBase表的数据,并映射为DataFrame + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): DataFrame = { + val beanRDD = this.hbaseScanRDD(tableName, clazz, scan) + // 将rdd转为DataFrame + this.spark.createDataFrame(beanRDD, clazz) + } + + /** + * Scan指定HBase表的数据,并映射为Dataset + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): Dataset[T] = { + val beanRDD = this.hbaseScanRDD(tableName, clazz, scan) + // 将rdd转为DataFrame + spark.createDataset(beanRDD)(Encoders.bean(clazz)) + } + + /** + * Scan指定HBase表的数据,并映射为Dataset + * + * @param tableName + * HBase表名 + * @param startRow + * 开始主键 + * @param stopRow 结束主键 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDS2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): Dataset[T] = { + this.hbaseScanDS[T](tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + 
* Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanRS(tableName: String, scan: Scan): RDD[(ImmutableBytesWritable, Result)] = { + val hbaseConf = HBaseConnector(keyNum = this.keyNum).getConfiguration + hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName) + hbaseConf.set(TableInputFormat.SCAN, HBaseUtils.convertScanToString(scan)) + // 将指定范围内的hbase数据转为rdd + val resultRDD = this.spark.sparkContext.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result]).repartition(FireHBaseConf.hbaseHadoopScanPartitions).persist(StorageLevel.fromString(FireHBaseConf.hbaseStorageLevel)) + resultRDD + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanRS2(tableName: String, startRow: String, stopRow: String): RDD[(ImmutableBytesWritable, Result)] = { + this.hbaseHadoopScanRS(tableName, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(T] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): RDD[T] = { + val rdd = this.hbaseHadoopScanRS(tableName, scan) + rdd.mapPartitions(it => HBaseConnector(keyNum = keyNum).hbaseRow2BeanList(it, clazz)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[T] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanRDD2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): RDD[T] = { + this.hbaseHadoopScanRDD[T](tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(T] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): DataFrame = { + val rdd = this.hbaseHadoopScanRDD[T](tableName, clazz, scan) + this.spark.createDataFrame(rdd, clazz) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanDF2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): DataFrame = { + this.hbaseHadoopScanDF[T](tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(T] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): Dataset[T] = { + val rdd = this.hbaseHadoopScanRDD[T](tableName, clazz, scan) + this.spark.createDataset(rdd)(Encoders.bean(clazz)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanDS2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): Dataset[T] = { + this.hbaseHadoopScanDS[T](tableName, clazz, 
HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseScanDF2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): DataFrame = { + this.hbaseScanDF(tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param scan + * HBase scan对象 + * @return + */ + def hbaseScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): RDD[T] = { + val hbaseRDD = this.hbaseHadoopScanRS(tableName, scan) + val scanRDD = hbaseRDD.mapPartitions(it => { + if (HBaseConnector(keyNum = this.keyNum).getMultiVersion[T]) { + HBaseConnector(keyNum = keyNum).hbaseMultiVersionRow2BeanList[T](it, clazz) + } else { + HBaseConnector(keyNum = keyNum).hbaseRow2BeanList(it, clazz) + } + }).persist(StorageLevel.fromString(FireHBaseConf.hbaseStorageLevel)) + scanRDD + } + + /** + * Scan指定HBase表的数据,并映射为List + * + * @param tableName + * HBase表名 + * @param scan + * hbase scan对象 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanList[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan): Seq[T] = { + HBaseConnector(keyNum = this.keyNum).scan(tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为List + * + * @param tableName + * HBase表名 + * @param startRow + * 开始主键 + * @param stopRow 结束主键 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanList2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String): Seq[T] = { + this.hbaseScanList[T](tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param rowKeyRDD + * rdd中存放了待查询的rowKey集合 + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rowKeyRDD: RDD[String]): RDD[T] = { + val getRDD = rowKeyRDD.mapPartitions(it => { + val beanList = ListBuffer[T]() + val getList = ListBuffer[Get]() + it.foreach(rowKey => { + if (StringUtils.isNotBlank(rowKey)) { + val get = new Get(rowKey.getBytes(StandardCharsets.UTF_8)) + getList += get + if (getList.size >= this.batchSize) { + beanList ++= HBaseConnector(keyNum = this.keyNum).get(tableName, clazz, getList: _*) + getList.clear() + } + } + }) + + if (getList.nonEmpty) { + beanList ++= HBaseConnector(keyNum = this.keyNum).get(tableName, clazz, getList: _*) + getList.clear() + } + beanList.iterator + }).persist(StorageLevel.fromString(FireHBaseConf.hbaseStorageLevel)) + getRDD + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param rowKeyRDD + * rdd中存放了待查询的rowKey集合 + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rowKeyRDD: RDD[String]): DataFrame = { + this.spark.createDataFrame(hbaseGetRDD(tableName, clazz, rowKeyRDD), clazz) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param rowKeyRDD + * rdd中存放了待查询的rowKey集合 + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetDS[T <: HBaseBaseBean[T] : 
ClassTag](tableName: String, clazz: Class[T], rowKeyRDD: RDD[String]): Dataset[T] = { + this.spark.createDataset(hbaseGetRDD(tableName, clazz, rowKeyRDD))(Encoders.bean(clazz)) + } + + /** + * 使用hbase java api方式插入一个集合的数据到hbase表中 + * + * @param tableName + * hbase表名 + * @param seq + * HBaseBaseBean的子类集合 + */ + def hbasePutList[T <: HBaseBaseBean[T] : ClassTag](tableName: String, seq: Seq[T]): Unit = { + HBaseConnector(keyNum = this.keyNum).insert[T](tableName, seq: _*) + } + + /** + * 根据rowKey查询数据,并转为List[T] + * + * @param tableName + * hbase表名 + * @param seq + * rowKey集合 + * @param clazz + * 目标类型 + * get的版本数 + * @return + * List[T] + */ + def hbaseGetList[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], seq: Seq[Get]): Seq[T] = { + HBaseConnector(keyNum = this.keyNum).get[T](tableName, clazz, seq: _*) + } + + /** + * 根据rowKey查询数据,并转为List[T] + * + * @param tableName + * hbase表名 + * @param seq + * rowKey集合 + * @param clazz + * 目标类型 + * @return + * List[T] + */ + def hbaseGetList2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], seq: Seq[String]): Seq[T] = { + val getList = ListBuffer[Get]() + seq.filter(StringUtils.isNotBlank).foreach(rowKey => { + getList += new Get(rowKey.getBytes(StandardCharsets.UTF_8)) + }) + + this.hbaseGetList[T](tableName, clazz, getList) + } + + /** + * 根据rowKey集合批量删除记录 + * + * @param tableName + * hbase表名 + * @param rowKeys + * rowKey集合 + */ + def hbaseDeleteList(tableName: String, rowKeys: Seq[String]): Unit = { + HBaseConnector(keyNum = this.keyNum).deleteRows(tableName, rowKeys: _*) + } + + /** + * 根据RDD[RowKey]批量删除记录 + * + * @param tableName + * hbase表名 + * @param rowKeyRDD + * rowKey集合 + */ + def hbaseDeleteRDD(tableName: String, rowKeyRDD: RDD[String]): Unit = { + rowKeyRDD.foreachPartition(it => { + val rowKeyList = ListBuffer[String]() + var count = 0 + it.foreach(rowKey => { + if (StringUtils.isNotBlank(rowKey)) { + rowKeyList += rowKey + count += rowKeyList.size + } + if (rowKeyList.size >= batchSize) { + HBaseConnector(keyNum = this.keyNum).deleteRows(tableName, rowKeyList: _*) + rowKeyList.clear() + } + }) + if (rowKeyList.nonEmpty) { + HBaseConnector(keyNum = this.keyNum).deleteRows(tableName, rowKeyList: _*) + rowKeyList.clear() + } + }) + } + + /** + * 根据Dataset[RowKey]批量删除记录 + * + * @param tableName + * hbase表名 + * @param dataSet + * rowKey集合 + */ + def hbaseDeleteDS(tableName: String, dataSet: Dataset[String]): Unit = { + this.hbaseDeleteRDD(tableName, dataSet.rdd) + } + + /** + * 按照指定的批次大小分多个批次插入数据到hbase中 + * + * @param tableName + * hbase表名 + * @param iterator + * 数据集迭代器 + */ + private def multiBatchInsert[E <: HBaseBaseBean[E] : ClassTag](tableName: String, iterator: Iterator[E]): Unit = { + var count = 0 + val list = ListBuffer[E]() + iterator.foreach(bean => { + list += bean + if (list.size >= batchSize) { + HBaseConnector(keyNum = this.keyNum).insert[E](tableName, list: _*) + count += list.size + list.clear() + } + }) + if (list.nonEmpty) HBaseConnector(keyNum = this.keyNum).insert[E](tableName, list: _*) + count += list.size + list.clear() + } +} + +/** + * 用于单例构建伴生类HBaseSparkBridge的实例对象 + * 每个HBaseSparkBridge实例使用keyNum作为标识,并且与每个HBase集群一一对应 + */ +object HBaseSparkBridge extends ConnectorFactory[HBaseSparkBridge] { + + /** + * 约定创建connector子类实例的方法 + */ + override protected def create(conf: Any = null, keyNum: Int = 1): HBaseSparkBridge = { + requireNonEmpty(keyNum) + val connector = new HBaseSparkBridge(keyNum) + logger.debug(s"创建HBaseSparkBridge实例成功. 
keyNum=$keyNum") + connector + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DStreamExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DStreamExt.scala new file mode 100644 index 0000000..13675f6 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DStreamExt.scala @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.spark.connector.HBaseBulkConnector +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.rocketmq.common.message.MessageExt +import org.apache.spark.storage.StorageLevel +import org.apache.spark.streaming.dstream.DStream +import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges} + +import scala.reflect._ + +/** + * DStream扩展 + * + * @param stream + * stream对象 + * @author ChengLong 2019-5-18 11:06:56 + */ +class DStreamExt[T: ClassTag](stream: DStream[T]) { + + /** + * DStrea数据实时写入 + * + * @param tableName + * HBase表名 + */ + def hbaseBulkPutStream[T <: HBaseBaseBean[T] : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.bulkPutStream(tableName, stream.asInstanceOf[DStream[T]], keyNum) + } + + /** + * 清空RDD的缓存 + */ + def uncache: Unit = { + stream.persist(StorageLevel.NONE) + } + + /** + * 维护kafka的offset + */ + def kafkaCommitOffsets[T <: ConsumerRecord[String, String]]: Unit = { + stream.asInstanceOf[DStream[T]].foreachRDD { rdd => + try { + val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges + stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges) + } catch { + case e: Exception => e.printStackTrace() + } + } + } + + /** + * 维护RocketMQ的offset + */ + def rocketCommitOffsets[T <: MessageExt]: Unit = { + stream.asInstanceOf[DStream[T]].foreachRDD { rdd => + if (!rdd.isEmpty()) { + try { + val offsetRanges = rdd.asInstanceOf[org.apache.rocketmq.spark.HasOffsetRanges].offsetRanges + stream.asInstanceOf[org.apache.rocketmq.spark.CanCommitOffsets].commitAsync(offsetRanges) + } catch { + case e: Exception => e.printStackTrace() + } + } + } + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DataFrameExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DataFrameExt.scala new file mode 100644 index 0000000..c185c1c --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DataFrameExt.scala @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import java.util.Properties + +import com.zto.fire.common.util.ValueUtils +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.jdbc.JdbcConnector +import com.zto.fire.jdbc.conf.FireJdbcConf +import com.zto.fire.jdbc.util.DBUtils +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.connector.{HBaseBulkConnector, HBaseSparkBridge} +import com.zto.fire.spark.util.SparkUtils +import org.apache.commons.lang3.StringUtils +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.storage.StorageLevel +import org.slf4j.LoggerFactory + +import scala.collection.mutable.ListBuffer +import scala.reflect._ + +/** + * DataFrame扩展 + * + * @param dataFrame + * dataFrame实例 + */ +class DataFrameExt(dataFrame: DataFrame) { + // 获取单例的HBaseContext对象 + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 注册为临时表的同时缓存表 + * + * @param tmpTableName + * 临时表名 + * @param storageLevel + * 指定存储级别 + * @return + * 生成的DataFrame + */ + def createOrReplaceTempViewCache(tmpTableName: String, storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER): DataFrame = { + if (StringUtils.isNotBlank(tmpTableName)) { + dataFrame.createOrReplaceTempView(tmpTableName) + this.dataFrame.sparkSession.catalog.cacheTable(tmpTableName, storageLevel) + } + dataFrame + } + + /** + * 保存Hive表 + * + * @param saveMode + * 保存模式,默认为Overwrite + * @param partitionName + * 分区字段 + * @param tableName + * 表名 + * @return + * 生成的DataFrame + */ + def saveAsHiveTable(tableName: String, partitionName: String, saveMode: SaveMode = SaveMode.valueOf(FireSparkConf.saveMode)): DataFrame = { + if (StringUtils.isNotBlank(tableName)) { + if (StringUtils.isNotBlank(partitionName)) { + dataFrame.write.mode(saveMode).partitionBy(partitionName).saveAsTable(tableName) + } else { + dataFrame.write.mode(saveMode).saveAsTable(tableName) + } + } + dataFrame + } + + /** + * 将DataFrame数据保存到关系型数据库中 + * + * @param tableName + * 关系型数据库表名 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + */ + def jdbcTableSave(tableName: String, saveMode: SaveMode = SaveMode.Append, jdbcProps: Properties = null, keyNum: Int = 1): Unit = { + dataFrame.write.mode(saveMode).jdbc(FireJdbcConf.jdbcUrl(keyNum), tableName, DBUtils.getJdbcProps(jdbcProps, keyNum)) + } + + /** + * 将DataFrame中指定的列写入到jdbc中 + * 调用者需自己保证DataFrame中的列类型与关系型数据库对应字段类型一致 + * + * @param sql + * 关系型数据库待执行的增删改sql + * @param fields + * 指定部分DataFrame列名作为参数,顺序要对应sql中问号占位符的顺序 + * 若不指定字段,则默认传入当前DataFrame所有列,且列的顺序与sql中问号占位符顺序一致 + * @param batch + * 每个批次执行多少条 + * @param keyNum + * 对应配置文件中指定的数据源编号 + */ + def jdbcBatchUpdate(sql: String, fields: Seq[String] = null, batch: Int = FireJdbcConf.batchSize(), keyNum: Int = 1): Unit 
= { + if (ValueUtils.isEmpty(sql)) { + logger.error("执行jdbcBatchUpdate失败,sql语句不能为空") + return + } + + if (dataFrame.isStreaming) { + // 如果是streaming流 + dataFrame.writeStream.format("fire-jdbc") + .option("checkpointLocation", FireSparkConf.chkPointDirPrefix) + .option("sql", sql) + .option("batch", batch) + .option("keyNum", keyNum) + .option("fields", if (fields != null) fields.mkString(",") else "") + .start() + } else { + // 非structured streaming调用 + dataFrame.foreachPartition((it: Iterator[Row]) => { + var count: Int = 0 + val list = ListBuffer[ListBuffer[Any]]() + var params: ListBuffer[Any] = null + + it.foreach(row => { + count += 1 + params = ListBuffer[Any]() + if (ValueUtils.noEmpty(fields)) { + // 若调用者指定了某些列,则取这些列的数据 + fields.foreach(field => { + val index = row.fieldIndex(field) + params += row.get(index) + }) + } else { + // 否则取当前DataFrame全部的列,顺序要与sql问号占位符保持一致 + (0 until row.size).foreach(index => { + params += row.get(index) + }) + } + list += params + + // 分批次执行 + if (count == batch) { + JdbcConnector.executeBatch(sql, list, keyNum = keyNum) + count = 0 + list.clear() + } + }) + + // 将剩余的数据一次执行掉 + if (list.nonEmpty) { + JdbcConnector.executeBatch(sql, list, keyNum = keyNum) + list.clear() + } + }) + } + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def hbaseBulkPutDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], keyNum: Int = 1): Unit = { + HBaseBulkConnector.bulkPutDF[T](tableName, dataFrame, clazz, keyNum) + } + + /** + * 以spark 方式批量将DataFrame数据写入到hbase中 + * 注:此方法与hbaseHadoopPutDF不同之处在于,它不强制要求该DataFrame一定要与HBaseBaseBean的子类对应 + * 但需要指定rowKey的构建规则,相对与hbaseHadoopPutDF来说,少了中间的两次转换,性能会更高 + * + * @param tableName + * hbase表名 + * @tparam T + * JavaBean类型 + */ + def hbaseHadoopPutDFRow[T <: HBaseBaseBean[T] : ClassTag](tableName: String, buildRowKey: (Row) => String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.hadoopPutDFRow[T](tableName, dataFrame, buildRowKey, keyNum) + } + + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbaseHadoopPutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], keyNum: Int = 1): Unit = { + HBaseBulkConnector.hadoopPutDF[E](tableName, dataFrame, clazz, keyNum) + } + + /** + * 使用Java API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbasePutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbasePutDF(tableName, clazz, this.dataFrame) + } + + /** + * 将DataFrame注册为临时表,并缓存表 + * + * @param tableName + * 临时表名 + */ + def dataFrameRegisterAndCache(tableName: String): Unit = { + if (StringUtils.isBlank(tableName)) throw new IllegalArgumentException("临时表名不能为空") + dataFrame.createOrReplaceTempView(tableName) + dataFrame.sqlContext.cacheTable(tableName) + } + + /** + * 将DataFrame映射为指定JavaBean类型的RDD + * + * @param clazz + * @return + */ + def toRDD[E <: Object : ClassTag](clazz: Class[E], toUppercase: Boolean = false): RDD[E] = { + this.dataFrame.rdd.mapPartitions(it => SparkUtils.sparkRowToBean(it, clazz, toUppercase)) + } + + /** + * 将DataFrame的schema转为小写 + * + * @return + */ + def toLowerDF: DataFrame = { + this.dataFrame.selectExpr(SparkUtils.schemaToLowerCase(this.dataFrame.schema): _*) + } + + /** + * 
清空RDD的缓存 + */ + def uncache: Unit = { + dataFrame.unpersist() + } + + /** + * 将实时流转为静态DataFrame + * + * @return + * 静态DataFrame + */ + def toExternalRow: DataFrame = { + if (this.dataFrame.isStreaming) SparkUtils.toExternalRow(dataFrame) else this.dataFrame + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DatasetExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DatasetExt.scala new file mode 100644 index 0000000..13bd654 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/DatasetExt.scala @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import com.zto.fire._ +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.connector.{HBaseBulkConnector, HBaseSparkBridge} +import com.zto.fire.spark.util.SparkUtils +import org.apache.spark.sql._ +import org.apache.spark.sql.streaming.Trigger +import org.slf4j.LoggerFactory + +import scala.collection.mutable.ListBuffer +import scala.reflect._ + +/** + * Dataset扩展 + * + * @param dataset + * dataset对象 + * @author ChengLong 2019-5-18 11:02:56 + */ +class DatasetExt[T: ClassTag](dataset: Dataset[T]) { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 用于检查当前Dataset是否为空 + * + * @return + * true: 为空 false:不为空 + */ + def isEmpty: Boolean = dataset.rdd.isEmpty() + + /** + * 用于检查当前Dataset是否不为空 + * + * @return + * true: 不为空 false:为空 + */ + def isNotEmpty: Boolean = !this.isEmpty + + /** + * 打印Dataset的值 + * + * @param lines + * 打印的行数 + * @return + */ + def showString(lines: Int = 1000): String = { + val showLines = if (lines <= 1000) lines else 1000 + val showStringMethod = dataset.getClass.getDeclaredMethod("showString", classOf[Int], classOf[Int], classOf[Boolean]) + showStringMethod.invoke(dataset, Integer.valueOf(showLines), Integer.valueOf(Int.MaxValue), java.lang.Boolean.valueOf(false)).toString + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def hbaseBulkPutDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.bulkPutDS[T](tableName, dataset.asInstanceOf[Dataset[T]], keyNum) + } + + /** + * 根据Dataset[String]批量删除,Dataset是rowkey的集合 + * 类型为String + * + * @param tableName + * HBase表名 + */ + def hbaseBulkDeleteDS(tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.bulkDeleteDS(tableName, dataset.asInstanceOf[Dataset[String]], keyNum) + } + + /** + * 根据Dataset[RowKey]批量删除记录 + * + * @param tableName + * rowKey集合 + */ + def 
hbaseDeleteDS(tableName: String, keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbaseDeleteDS(tableName, dataset.asInstanceOf[Dataset[String]]) + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + */ + def hbaseHadoopPutDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.hadoopPutDS[T](tableName, dataset.asInstanceOf[Dataset[T]], keyNum) + } + + /** + * 使用Java API的方式将Dataset中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbasePutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbasePutDS[E](tableName, clazz, dataset.asInstanceOf[Dataset[E]]) + } + + /** + * 清空RDD的缓存 + */ + def uncache: Unit = { + dataset.unpersist + } + + /** + * 将当前Dataset记录打印到控制台 + */ + def print(outputMode: String = "append", trigger: Trigger = null, numRows: Int = 20, truncate: Boolean = true): Dataset[T] = { + if (dataset.isStreaming) { + val tmpStream = dataset.writeStream.outputMode(outputMode).option("numRows", numRows).option("truncate", truncate).format("console") + if (trigger != null) tmpStream.trigger(trigger) + tmpStream.start + } else { + dataset.show(numRows, truncate) + } + dataset + } + + /** + * 分配次执行指定的业务逻辑 + * + * @param batch + * 多大批次执行一次sinkFun中定义的操作 + * @param mapFun + * 将Row类型映射为E类型的逻辑,并将处理后的数据放到listBuffer中 + * @param sinkFun + * 具体处理逻辑,将数据sink到目标源 + */ + def foreachPartitionBatch[E](mapFun: T => E, sinkFun: ListBuffer[E] => Unit, batch: Int = 1000): Unit = { + SparkUtils.datasetForeachPartitionBatch(this.dataset, mapFun, sinkFun, batch) + } + + /** + * spark datasource write api增强,提供配置文件进行覆盖配置 + * + * @param format + * DataSource中的format + * @param saveMode + * DataSource中的saveMode + * @param saveParam + * save方法的参数,可以是路径或表名:save(path)、saveAsTable(tableName) + * @param isSaveTable + * true:调用saveAsTable(saveParam)方法 false:调用save(saveParam)方法 + * @param options + * DataSource中的options,支持参数传入和配置文件读取,相同的选项配置文件优先级更高 + * @param keyNum + * 用于标识不同DataSource api所对应的配置文件中key的后缀 + */ + def writeEnhance(format: String = "", + saveMode: SaveMode = SaveMode.Append, + saveParam: String = "", + isSaveTable: Boolean = false, + options: Map[String, String] = Map.empty, + keyNum: Int = 1): Unit = { + val finalFormat = if (noEmpty(FireSparkConf.datasourceFormat(keyNum))) FireSparkConf.datasourceFormat(keyNum) else format + val finalSaveMode = if (noEmpty(FireSparkConf.datasourceSaveMode(keyNum))) SaveMode.valueOf(FireSparkConf.datasourceSaveMode(keyNum)) else saveMode + val finalSaveParam = if (noEmpty(FireSparkConf.datasourceSaveParam(keyNum))) FireSparkConf.datasourceSaveParam(keyNum) else saveParam + val finalIsSaveTable = if (noEmpty(FireSparkConf.datasourceIsSaveTable(keyNum))) FireSparkConf.datasourceIsSaveTable(keyNum).toBoolean else isSaveTable + requireNonEmpty(dataset, finalFormat, finalSaveMode, finalSaveParam, finalIsSaveTable) + + this.logger.info(s"--> Spark DataSource write api参数信息(keyNum=$keyNum)<--") + this.logger.info(s"format=${finalFormat} saveMode=${finalSaveMode} save参数=${finalSaveParam} saveToTable=${finalIsSaveTable}") + + val writer = dataset.write.format(finalFormat).options(SparkUtils.optionsEnhance(options, keyNum)).mode(finalSaveMode) + if (!isSaveTable) { + if (com.zto.fire.isEmpty(finalSaveMode)) writer.save() else writer.save(finalSaveParam) + } else writer.saveAsTable(finalSaveParam) + } + +} \ No newline at end of file 
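The DatasetExt wrapper defined above is presumably exposed through the framework's implicit conversions elsewhere, but it can also be constructed directly. The sketch below is illustrative only and not part of the patch: it assumes a hypothetical Order case class, a locally built SparkSession, and placeholder values for the format, output path and batch size, and it exercises isNotEmpty, foreachPartitionBatch and writeEnhance exactly as they are declared in this file (writeEnhance may still be overridden by the spark.datasource.* keys in the configuration file).

// Usage sketch (illustrative, not part of the diff). Order, the local master,
// the "parquet" format, /tmp/orders and batch = 500 are all placeholder assumptions.
import org.apache.spark.sql.{Dataset, SaveMode, SparkSession}
import com.zto.fire.spark.ext.core.DatasetExt

import scala.collection.mutable.ListBuffer

object DatasetExtUsageSketch {
  // hypothetical record type used only for this example
  case class Order(id: Long, code: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataset-ext-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val orders: Dataset[Order] = Seq(Order(1L, "A001", 10.5), Order(2L, "A002", 3.2)).toDS()
    // construct the extension wrapper explicitly instead of relying on implicits
    val ext = new DatasetExt(orders)

    // isNotEmpty is backed by rdd.isEmpty(), so no full count is triggered
    if (ext.isNotEmpty) {
      // map each record to a lighter tuple, then hand every 500 buffered records to the sink function
      ext.foreachPartitionBatch[(Long, Double)](
        mapFun = o => (o.id, o.amount),
        sinkFun = (batch: ListBuffer[(Long, Double)]) => batch.foreach(println),
        batch = 500)
    }

    // DataSource write with config-file overrides; keyNum (default 1) selects the config key suffix
    ext.writeEnhance(format = "parquet", saveMode = SaveMode.Append, saveParam = "/tmp/orders")

    spark.stop()
  }
}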
diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/RDDExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/RDDExt.scala new file mode 100644 index 0000000..7ac54e9 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/RDDExt.scala @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import com.zto.fire._ +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.spark.connector.{HBaseBulkConnector, HBaseSparkBridge} +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.rocketmq.common.message.MessageExt +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.sql.functions.from_json +import org.apache.spark.streaming.dstream.{DStream, InputDStream} +import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges} + +import scala.collection.mutable.ListBuffer +import scala.reflect.{ClassTag, classTag} + +/** + * RDD相关扩展 + * + * @author ChengLong 2019-5-18 10:28:31 + */ +class RDDExt[T: ClassTag](rdd: RDD[T]) { + private lazy val spark = SparkSingletonFactory.getSparkSession + + import spark.implicits._ + + /** + * 用于判断rdd是否为空 + * + * @return + * true: 不为空 false:为空 + */ + def isNotEmpty: Boolean = !rdd.isEmpty() + + /** + * 遍历每个partition并打印元素到控制台 + */ + def printEachPartition: Unit = { + rdd.foreachPartition(it => { + it.foreach(item => println(item + " ")) + }) + } + + /** + * 集群模式下打印数据 + */ + def printEachClusterPartition: Unit = { + rdd.collect().foreach(println) + } + + /** + * 将rdd转为DataFrame + */ + def toDF(): DataFrame = { + this.spark.createDataFrame(rdd, classTag[T].runtimeClass) + } + + /** + * 将rdd转为DataFrame并注册成临时表 + * + * @param tableName + * 表名 + * @return + * DataFrame + */ + def createOrReplaceTempView(tableName: String, cache: Boolean = false): DataFrame = { + val dataFrame = this.toDF() + dataFrame.createOrReplaceTempView(tableName) + if (cache) this.spark.cacheTables(tableName) + dataFrame + } + + /** + * 根据RDD[String]批量删除 + * + * @param tableName + * HBase表名 + */ + def hbaseBulkDeleteRDD[T <: String : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.bulkDeleteRDD(tableName, rdd.asInstanceOf[RDD[String]], keyNum) + } + + /** + * 根据RDD[RowKey]批量删除记录 + * + * @param tableName + * rowKey集合 + */ + def hbaseDeleteRDD(tableName: String, keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbaseDeleteRDD(tableName, rdd.asInstanceOf[RDD[String]]) + } + + /** + * 根据rowKey集合批量获取数据 + * + * @param tableName + * HBase表名 + * @param clazz + * 获取后的记录转换为目标类型 + * @return + * 结果集 + */ + def hbaseBulkGetRDD[E <: 
HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], keyNum: Int = 1): RDD[E] = { + HBaseBulkConnector.bulkGetRDD(tableName, rdd.asInstanceOf[RDD[String]], clazz, keyNum) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def hbaseBulkGetDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], keyNum: Int = 1): DataFrame = { + HBaseBulkConnector.bulkGetDF[E](tableName, rdd.asInstanceOf[RDD[String]], clazz, keyNum) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def hbaseBulkGetDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, clazz: Class[E], keyNum: Int = 1): Dataset[E] = { + HBaseBulkConnector.bulkGetDS[E](tableName, rdd.asInstanceOf[RDD[String]], clazz, keyNum) + } + + /** + * 批量插入数据 + * + * @param tableName + * HBase表名 + * 数据集合,继承自HBaseBaseBean + */ + def hbaseBulkPutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.bulkPutRDD(tableName, rdd.asInstanceOf[RDD[T]], keyNum) + } + + /** + * 使用Spark API的方式将RDD中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + */ + def hbaseHadoopPutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseBulkConnector.hadoopPut(tableName, rdd.asInstanceOf[RDD[T]], keyNum) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], keyNum: Int = 1): RDD[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseGetRDD(tableName, clazz, rdd.asInstanceOf[RDD[String]]) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], keyNum: Int = 1): Dataset[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseGetDS[T](tableName, clazz, rdd.asInstanceOf[RDD[String]]) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], keyNum: Int = 1): DataFrame = { + HBaseSparkBridge(keyNum = keyNum).hbaseGetDF(tableName, clazz, rdd.asInstanceOf[RDD[String]]) + } + + /** + * 使用Java API的方式将RDD中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + */ + def hbasePutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbasePutRDD[T](tableName, rdd.asInstanceOf[RDD[T]]) + } + + /** + * 解析DStream中每个rdd的json数据,并转为DataFrame类型 + * + * @param schema + * 目标DataFrame类型的schema + * @param isMySQL + * 是否为mysql解析的消息 + * @param fieldNameUpper + * 字段名称是否为大写 + * @param parseAll + * 是否需要解析所有字段信息 + * @return + */ + def kafkaJson2DFV(schema: Class[_], parseAll: Boolean = false, isMySQL: Boolean = true, fieldNameUpper: Boolean = false): DataFrame = { + val ds = this.spark.createDataset(rdd.asInstanceOf[RDD[String]])(Encoders.STRING) + val df = ds.select(from_json(new ColumnName("value"), SparkUtils.buildSchema2Kafka(schema, 
parseAll, isMySQL, fieldNameUpper)).as("data")) + if (parseAll) + df.select("data.*") + else + df.select("data.after.*") + } + + /** + * 解析DStream中每个rdd的json数据,并转为DataFrame类型 + * + * @param schema + * 目标DataFrame类型的schema + * @param isMySQL + * 是否为mysql解析的消息 + * @param fieldNameUpper + * 字段名称是否为大写 + * @param parseAll + * 是否解析所有字段信息 + * @return + */ + def kafkaJson2DF(schema: Class[_], parseAll: Boolean = false, isMySQL: Boolean = true, fieldNameUpper: Boolean = false): DataFrame = { + val ds = this.spark.createDataset(rdd.asInstanceOf[RDD[ConsumerRecord[String, String]]].map(t => t.value()))(Encoders.STRING) + val structType = SparkUtils.buildSchema2Kafka(schema, parseAll, isMySQL, fieldNameUpper) + val df = ds.select(from_json(new ColumnName("value"), structType).as("data")) + val tmpDF = if (parseAll) + df.select("data.*") + else + df.select("data.after.*") + if (fieldNameUpper) tmpDF.toLowerDF else tmpDF + } + + /** + * 解析json数据,并注册为临时表 + * + * @param tableName + * 临时表名 + */ + def kafkaJson2Table(tableName: String, cacheTable: Boolean = false): Unit = { + val msgDS = rdd.asInstanceOf[RDD[ConsumerRecord[String, String]]].map(t => t.value()).toDS() + this.spark.read.json(msgDS).toLowerDF.createOrReplaceTempView(tableName) + if (cacheTable) this.spark.cacheTables(tableName) + } + + /** + * 清空RDD的缓存 + */ + def uncache: Unit = { + rdd.unpersist() + } + + /** + * 维护RocketMQ的offset + */ + def kafkaCommitOffsets(stream: DStream[ConsumerRecord[String, String]]): Unit = { + val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges + stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges) + } + + /** + * 维护RocketMQ的offset + */ + def rocketCommitOffsets(stream: InputDStream[MessageExt]): Unit = { + val offsetRanges = rdd.asInstanceOf[org.apache.rocketmq.spark.HasOffsetRanges].offsetRanges + stream.asInstanceOf[org.apache.rocketmq.spark.CanCommitOffsets].commitAsync(offsetRanges) + } + + /** + * 分配次执行指定的业务逻辑 + * + * @param batch + * 多大批次执行一次sinkFun中定义的操作 + * @param mapFun + * 将Row类型映射为E类型的逻辑,并将处理后的数据放到listBuffer中 + * @param sinkFun + * 具体处理逻辑,将数据sink到目标源 + */ + def foreachPartitionBatch[E](mapFun: T => E, sinkFun: ListBuffer[E] => Unit, batch: Int = 1000): Unit = { + SparkUtils.rddForeachPartitionBatch(this.rdd, mapFun, sinkFun, batch) + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SQLContextExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SQLContextExt.scala new file mode 100644 index 0000000..bc37740 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SQLContextExt.scala @@ -0,0 +1,476 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.core + +import java.util.Properties + +import com.zto.fire._ +import com.zto.fire.common.conf.{FireHiveConf, FireKuduConf} +import com.zto.fire.jdbc.conf.FireJdbcConf +import com.zto.fire.jdbc.util.DBUtils +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.ext.module.KuduContextExt +import com.zto.fire.spark.util.{KuduUtils, SparkSingletonFactory} +import org.apache.commons.lang3.StringUtils +import org.apache.kudu.spark.kudu._ +import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode} + +/** + * SQLContext与HiveContext扩展 + * + * @param sqlContext + * sqlContext对象 + * @author ChengLong 2019-5-18 10:52:00 + */ +class SQLContextExt(sqlContext: SQLContext) { + + /** + * 获取KuduContext实例 + * + * @return + * kuduContext的扩展实例 + */ + def createKuduContext: KuduContextExt = { + SparkSingletonFactory.getKuduContextInstance(sqlContext.sparkContext) + } + + /** + * 判断给定的表是否存在 + * + * @param tableName + * 表名 + * @return + * 存在、不存在 + */ + def tmpTableExists(tableName: String): Boolean = { + val count = sqlContext.tables().where("tableName='zto_sign_new_kudu' and isTemporary=true").count() + if (count == 1) true else false + } + + /** + * 加载kudu表转为DataFrame + * + * @param map + * map集合 + * @return + * DataFrame + */ + def loadKuduTable(map: Map[String, String]): DataFrame = { + sqlContext.read.options(map).kudu + } + + /** + * 加载kudu表转为DataFrame + * + * @param tableName + * 表名 + * @return + * DataFrame + */ + def loadKuduTable(tableName: String): DataFrame = { + sqlContext.read.options(Map("kudu.master" -> FireKuduConf.kuduMaster, "kudu.table" -> KuduUtils.packageKuduTableName(tableName))).kudu + } + + /** + * 链式设置 + * + * @return + * SQLContext对象 + */ + def set(key: String, value: String): SQLContext = { + sqlContext.setConf(key, value) + sqlContext + } + + /** + * 执行一段Hive QL语句,注册为临时表,持久化到hive中 + * + * @param sqlStr + * SQL语句 + * @param tmpTableName + * 临时表名 + * @param saveMode + * 持久化的模式,默认为Overwrite + * @param cache + * 默认缓存表 + * @return + * 生成的DataFrame + */ + def sqlForPersistent(sqlStr: String, tmpTableName: String, partitionName: String, saveMode: SaveMode = SaveMode.valueOf(FireSparkConf.saveMode), cache: Boolean = true): DataFrame = { + val dataFrame = sqlContext.sql(sqlStr) + val dataFrameWriter = dataFrame.write.mode(saveMode) + if (StringUtils.isNotBlank(partitionName)) { + dataFrameWriter.partitionBy(partitionName).saveAsTable(tmpTableName) + } else { + dataFrameWriter.saveAsTable(tmpTableName) + } + dataFrame + } + + /** + * 执行一段Hive QL语句,注册为临时表,并cache + * + * @param sqlStr + * SQL语句 + * @param tmpTableName + * 临时表名 + * @return + * 生成的DataFrame + */ + def sqlForCache(sqlStr: String, tmpTableName: String): DataFrame = { + val dataFrame = sqlContext.sql(sqlStr) + dataFrame.createOrReplaceTempView(tmpTableName) + sqlContext.cacheTable(tmpTableName) + dataFrame + } + + /** + * 执行一段Hive QL语句,注册为临时表 + * + * @param sqlStr + * SQL语句 + * @param tmpTableName + * 临时表名 + * @return + * 生成的DataFrame + */ + def sqlNoCache(sqlStr: String, tmpTableName: String): DataFrame = { + val dataFrame = sqlContext.sql(sqlStr) + dataFrame.createOrReplaceTempView(tmpTableName) + dataFrame + } + + /** + * 批量清空多张缓存表 + * + * @param tables + * 多个表名 + */ + def uncacheTables(tables: String*): Unit = { + if (noEmpty(tables)) { + tables.filter(StringUtils.isNotBlank).foreach(tableName => { + if (sqlContext.isCached(tableName)) { + sqlContext.uncacheTable(tableName) + } + }) + } + } + + /** + * 批量缓存多张表 + * + * @param tables + * 多个表名 + */ + def cacheTables(tables: 
String*): Unit = { + tables.foreach(tableName => { + sqlContext.cacheTable(tableName) + }) + } + + /** + * 删除指定的hive表 + * + * @param tableNames + * 多个表名 + */ + def dropHiveTable(tableNames: String*): Unit = { + if (noEmpty(tableNames)) { + tableNames.filter(StringUtils.isNotBlank).foreach(tableName => { + sqlContext.sql(s"DROP TABLE IF EXISTS $tableName") + }) + } + } + + /** + * 为指定表添加分区 + * + * @param tableName + * 表名 + * @param partitions + * 分区 + */ + def addPartitions(tableName: String, partitions: String*): Unit = { + if (noEmpty(tableName, partitions)) { + partitions.foreach(ds => { + this.addPartition(tableName, ds, FireHiveConf.partitionName) + }) + } + } + + /** + * 为指定表添加分区 + * + * @param tableName + * 表名 + * @param partition + * 分区 + * @param partitionName + * 分区字段名称,默认ds + */ + def addPartition(tableName: String, partition: String, partitionName: String = FireHiveConf.partitionName): Unit = { + if (noEmpty(tableName, partition, partitionName)) { + sqlContext.sql(s"ALTER TABLE $tableName ADD IF NOT EXISTS partition($partitionName='$partition')") + } + } + + /** + * 为指定表删除分区 + * + * @param tableName + * 表名 + * @param partition + * 分区 + */ + def dropPartition(tableName: String, partition: String, partitionName: String = FireHiveConf.partitionName): Unit = { + if (noEmpty(tableName, partition, partitionName)) { + sqlContext.sql(s"ALTER TABLE $tableName DROP IF EXISTS partition($partitionName='$partition')") + } + } + + /** + * 为指定表删除多个分区 + * + * @param tableName + * 表名 + * @param partitions + * 分区 + */ + def dropPartitions(tableName: String, partitions: String*): Unit = { + if (StringUtils.isNotBlank(tableName) && partitions != null) { + partitions.foreach(ds => { + this.dropPartition(tableName, ds, FireHiveConf.partitionName) + }) + } + } + + /** + * 根据给定的表创建新表 + * + * @param srcTableName + * 源表名 + * @param destTableName + * 目标表名 + */ + def createTableAsSelect(srcTableName: String, destTableName: String): Unit = { + if (StringUtils.isNotBlank(srcTableName) && StringUtils.isNotBlank(destTableName)) { + sqlContext.sql( + s""" + |CREATE TABLE IF NOT EXISTS $destTableName AS + |SELECT * FROM $srcTableName + """.stripMargin) + } + } + + /** + * 根据一张表创建另一张表 + * + * @param tableName + * 表名 + * @param destTableName + * 目标表名 + */ + def createTableLike(tableName: String, destTableName: String): Unit = { + if (StringUtils.isNotBlank(tableName) && StringUtils.isNotBlank(destTableName)) { + sqlContext.sql( + s""" + |create table $tableName like $destTableName + """.stripMargin) + } + } + + /** + * 根据给定的表创建新表 + * + * @param srcTableName + * 来源表 + * @param destTableName + * 目标表 + * @param cols + * 多个列,逗号分隔 + */ + def createTableAsSelectFields(srcTableName: String, destTableName: String, cols: String): Unit = { + if (StringUtils.isNotBlank(srcTableName) && StringUtils.isNotBlank(destTableName) && StringUtils.isNotBlank(cols)) { + sqlContext.sql( + s""" + |CREATE TABLE IF NOT EXISTS $destTableName AS + |SELECT $cols FROM $srcTableName + """.stripMargin) + } + } + + /** + * 将数据插入到指定表的分区中 + * + * @param srcTableName + * 来源表 + * @param destTableName + * 目标表 + * @param ds + * 分区名 + * @param cols + * 多个列,逗号分隔 + */ + def insertIntoPartition(srcTableName: String, destTableName: String, ds: String, cols: String, partitionName: String = FireHiveConf.partitionName): Unit = { + sqlContext.sql( + s""" + |INSERT INTO TABLE $destTableName partition($partitionName='$ds') + | SELECT $cols + | FROM $srcTableName + """.stripMargin) + } + + /** + * 将sql执行结果插入到目标表指定分区中 + * + * @param destTableName + * 目标表名 + * 
@param ds + * 分区名 + * @param querySQL + * 查询语句 + */ + def insertIntoPartitionAsSelect(destTableName: String, ds: String, querySQL: String, partitionName: String = FireHiveConf.partitionName, overwrite: Boolean = false): Unit = { + val overwriteVal = if (overwrite) "OVERWRITE" else "INTO" + sqlContext.sql( + s""" + |INSERT $overwriteVal TABLE $destTableName partition($partitionName='$ds') + | $querySQL + """.stripMargin) + } + + /** + * 将sql执行结果插入到目标表指定分区中 + * + * @param destTableName + * 目标表名 + * @param querySQL + * 查询语句 + */ + def insertIntoDymPartitionAsSelect(destTableName: String, querySQL: String, partitionName: String = FireHiveConf.partitionName): Unit = { + sqlContext.sql( + s""" + |INSERT INTO TABLE $destTableName partition($partitionName) + | $querySQL + """.stripMargin) + } + + /** + * 修改表名 + * + * @param oldTableName + * 表名称 + * @param newTableName + * 新的表名 + */ + def rename(oldTableName: String, newTableName: String): Unit = { + if (StringUtils.isBlank(oldTableName) || StringUtils.isBlank(newTableName)) { + return + } + val sql = s"ALTER TABLE $oldTableName RENAME TO $newTableName" + sqlContext.sql(sql) + } + + /** + * 将表从一个db移动到另一个db中 + * + * @param tableName + * 表名 + * @param oldDB + * 老库名称 + * @param newDB + * 新库名称 + */ + def moveDB(tableName: String, oldDB: String, newDB: String): Unit = { + if (StringUtils.isBlank(tableName) || StringUtils.isBlank(newDB)) { + return + } + val allName = if (StringUtils.isNotBlank(oldDB) && tableName.indexOf(".") == -1) { + s"$oldDB.$tableName" + } else { + tableName + } + this.dropHiveTable(s"$newDB.$tableName") + val sql = s"ALTER TABLE $allName RENAME TO $newDB.$tableName" + println(sql) + sqlContext.sql(sql) + } + + // ----------------------------------- 关系型数据库API ----------------------------------- // + + /** + * 单线程加载一张关系型数据库表 + * 注:仅限用于小的表,不支持条件查询 + * + * @param tableName + * 关系型数据库表名 + * @param jdbcProps + * 调用者指定的数据库连接信息,如果为空,则默认读取配置文件 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * DataFrame + */ + def jdbcTableLoadAll(tableName: String, jdbcProps: Properties = null, keyNum: Int = 1): DataFrame = { + sqlContext.read.jdbc(FireJdbcConf.jdbcUrl(keyNum), tableName, DBUtils.getJdbcProps(jdbcProps, keyNum)) + } + + /** + * 指定load的条件,从关系型数据库中并行的load数据,并转为DataFrame + * + * @param tableName 数据库表名 + * @param predicates + * 并行load数据时,每一个分区load数据的where条件 + * 比如:gmt_create >= '2019-06-20' AND gmt_create <= '2019-06-21' 和 gmt_create >= '2019-06-22' AND gmt_create <= '2019-06-23' + * 那么将两个线程同步load,线程数与predicates中指定的参数个数保持一致 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 查询结果集 + */ + def jdbcTableLoad(tableName: String, predicates: Array[String], jdbcProps: Properties = null, keyNum: Int = 1): DataFrame = { + sqlContext.read.jdbc(FireJdbcConf.jdbcUrl(keyNum), tableName, predicates, DBUtils.getJdbcProps(jdbcProps, keyNum)) + } + + /** + * 根据指定分区字段的范围load关系型数据库中的数据 + * + * @param tableName + * 表名 + * @param columnName + * 表的分区字段 + * @param lowerBound + * 分区的下边界 + * @param upperBound + * 分区的上边界 + * @param numPartitions + * 加载数据的并行度,默认为10,设置过大可能会导致数据库挂掉 + * @param jdbcProps + * jdbc连接信息,默认读取配置文件 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + */ + def jdbcTableLoadBound(tableName: String, columnName: String, lowerBound: Long, upperBound: 
Long, numPartitions: Int = 10, jdbcProps: Properties = null, keyNum: Int = 1): DataFrame = { + sqlContext.read.jdbc(FireJdbcConf.jdbcUrl(keyNum), tableName, columnName, lowerBound, upperBound, numPartitions, DBUtils.getJdbcProps(jdbcProps, keyNum)) + } + +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkConfExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkConfExt.scala new file mode 100644 index 0000000..942da2c --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkConfExt.scala @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import org.apache.spark.SparkConf + +/** + * SparkConf扩展 + * + * @param sparkConf + * sparkConf对象 + * @author ChengLong 2019-5-18 10:50:35 + */ +class SparkConfExt(sparkConf: SparkConf) { + +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkContextExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkContextExt.scala new file mode 100644 index 0000000..b6fbe72 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkContextExt.scala @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.core + +import org.apache.spark.SparkContext + +/** + * SparkContext扩展 + * + * @param sc + * SparkContext对象 + * @author ChengLong 2019-5-18 10:53:56 + */ +class SparkContextExt(sc: SparkContext) { + +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkSessionExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkSessionExt.scala new file mode 100644 index 0000000..08907f4 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/SparkSessionExt.scala @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import java.io.InputStream +import com.zto.fire._ +import com.zto.fire.core.Api +import com.zto.fire.jdbc.JdbcConnectorBridge +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.ext.provider._ +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.rocketmq.common.message.MessageExt +import org.apache.rocketmq.spark.{ConsumerStrategy, LocationStrategy} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.storage.StorageLevel +import org.apache.spark.streaming.dstream.{DStream, InputDStream, ReceiverInputDStream} + +import scala.reflect.ClassTag + +/** + * SparkContext扩展 + * + * @param spark + * sparkSession对象 + * @author ChengLong 2019-5-18 10:51:19 + */ +class SparkSessionExt(spark: SparkSession) extends Api with JdbcConnectorBridge with JdbcSparkProvider + with HBaseBulkProvider with SqlProvider with HBaseConnectorProvider with HBaseHadoopProvider with KafkaSparkProvider { + private[fire] lazy val ssc = SparkSingletonFactory.getStreamingContext + private[this] lazy val appName = ssc.sparkContext.appName + + /** + * 根据给定的集合,创建rdd + * + * @param seq + * seq + * @param numSlices + * 分区数 + * @return + * RDD + */ + def parallelize[T: ClassTag](seq: Seq[T], numSlices: Int = sc.defaultParallelism): RDD[T] = { + this.sc.parallelize(seq, numSlices) + } + + /** + * 根据给定的集合,创建rdd + * + * @param seq + * seq + * @param numSlices + * 分区数 + * @return + * RDD + */ + def createRDD[T: ClassTag](seq: Seq[T], numSlices: Int = sc.defaultParallelism): RDD[T] = { + this.parallelize[T](seq, numSlices) + } + + /** + * 创建socket流 + */ + def createSocketStream[T: ClassTag]( + hostname: String, + port: Int, + converter: (InputStream) => Iterator[T], + storageLevel: StorageLevel + ): ReceiverInputDStream[T] = { + this.ssc.socketStream[T](hostname, port, converter, storageLevel) + } + + /** + * 创建socket文本流 + */ + def createSocketTextStream( + hostname: String, + port: Int, + storageLevel: StorageLevel = 
StorageLevel.MEMORY_AND_DISK_SER_2 + ): ReceiverInputDStream[String] = { + this.ssc.socketTextStream(hostname, port, storageLevel) + } + + + /** + * 构建Kafka DStream流 + * + * @param kafkaParams + * kafka参数 + * @param topics + * topic列表 + * @return + * DStream + */ + def createKafkaDirectStream(kafkaParams: Map[String, Object] = null, topics: Set[String] = null, groupId: String = null, keyNum: Int = 1): DStream[ConsumerRecord[String, String]] = { + this.ssc.createDirectStream(kafkaParams, topics, groupId, keyNum) + } + + /** + * 构建RocketMQ拉取消息的DStream流 + * + * @param rocketParam + * rocketMQ相关消费参数 + * @param groupId + * groupId + * @param topics + * topic列表 + * @param consumerStrategy + * 从何处开始消费 + * @return + * rocketMQ DStream + */ + def createRocketMqPullStream(rocketParam: JMap[String, String] = null, + groupId: String = this.appName, + topics: String = null, + tag: String = null, + consumerStrategy: ConsumerStrategy = ConsumerStrategy.lastest, + locationStrategy: LocationStrategy = LocationStrategy.PreferConsistent, + instance: String = "", + keyNum: Int = 1): InputDStream[MessageExt] = { + this.ssc.createRocketPullStream(rocketParam, groupId, topics, tag, consumerStrategy, locationStrategy, instance, keyNum) + } + + /** + * 启动StreamingContext + */ + override def start(): Unit = { + if (this.ssc != null) { + this.ssc.startAwaitTermination() + } + } + + /** + * spark datasource read api增强,提供配置文件进行覆盖配置 + * + * @param format + * DataSource中的format + * @param loadParams + * load方法的参数,多个路径以逗号分隔 + * @param options + * DataSource中的options,支持参数传入和配置文件读取,相同的选项配置文件优先级更高 + * @param keyNum + * 用于标识不同DataSource api所对应的配置文件中key的后缀 + */ + def readEnhance(format: String = "", + loadParams: Seq[String] = null, + options: Map[String, String] = Map.empty, + keyNum: Int = 1): Unit = { + val finalFormat = if (noEmpty(FireSparkConf.datasourceFormat(keyNum))) FireSparkConf.datasourceFormat(keyNum) else format + val finalLoadParam = if (noEmpty(FireSparkConf.datasourceLoadParam(keyNum))) FireSparkConf.datasourceLoadParam(keyNum).split(",").toSeq else loadParams + this.logger.info(s"--> Spark DataSource read api参数信息(keyNum=$keyNum)<--") + this.logger.info(s"format=${finalFormat} loadParams=${finalLoadParam}") + + requireNonEmpty(finalFormat, finalLoadParam) + SparkSingletonFactory.getSparkSession.read.format(format).options(SparkUtils.optionsEnhance(options, keyNum)).load(finalLoadParam: _*) + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/StreamingContextExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/StreamingContextExt.scala new file mode 100644 index 0000000..eb514f1 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/core/StreamingContextExt.scala @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.core + +import com.zto.fire.common.conf.{FireKafkaConf, FireRocketMQConf} +import com.zto.fire.spark.util.{RocketMQUtils, SparkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.rocketmq.common.message.MessageExt +import org.apache.rocketmq.spark.{ConsumerStrategy, LocationStrategy, RocketMQConfig, RocketMqUtils} +import org.apache.spark.streaming.StreamingContext +import org.apache.spark.streaming.dstream.{DStream, InputDStream} +import org.apache.spark.streaming.kafka010.KafkaUtils +import org.slf4j.LoggerFactory + +import com.zto.fire._ + +/** + * StreamingContext扩展 + * + * @param ssc + * StreamingContext对象 + * @author ChengLong 2019-5-18 11:03:59 + */ +class StreamingContextExt(ssc: StreamingContext) { + + import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe + import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent + + private lazy val logger = LoggerFactory.getLogger(this.getClass) + private[this] lazy val appName = ssc.sparkContext.appName + + /** + * 创建DStream流 + * + * @param kafkaParams + * kafka参数 + * @param topics + * topic列表 + * @return + * DStream + */ + def createDirectStream(kafkaParams: Map[String, Object] = null, topics: Set[String] = null, groupId: String = null, keyNum: Int = 1): DStream[ConsumerRecord[String, String]] = { + // kafka topic优先级:配置文件 > topics参数 + val confTopic = FireKafkaConf.kafkaTopics(keyNum) + val finalKafkaTopic = if (StringUtils.isNotBlank(confTopic)) SparkUtils.topicSplit(confTopic) else topics + require(finalKafkaTopic != null && finalKafkaTopic.nonEmpty, s"kafka topic不能为空,请在配置文件中指定:spark.kafka.topics$keyNum") + this.logger.info(s"kafka topic is $finalKafkaTopic") + + val confKafkaParams = com.zto.fire.common.util.KafkaUtils.kafkaParams(kafkaParams, groupId, keyNum = keyNum) + require(confKafkaParams.nonEmpty, "kafka相关配置不能为空!") + require(confKafkaParams.contains("bootstrap.servers"), s"kafka bootstrap.servers不能为空,请在配置文件中指定:spark.kafka.brokers.name$keyNum") + require(confKafkaParams.contains("group.id"), s"kafka group.id不能为空,请在配置文件中指定:spark.kafka.group.id$keyNum") + + KafkaUtils.createDirectStream[String, String]( + ssc, PreferConsistent, Subscribe[String, String](finalKafkaTopic, confKafkaParams)) + } + + /** + * 构建RocketMQ拉取消息的DStream流 + * + * @param rocketParam + * rocketMQ相关消费参数 + * @param groupId + * groupId + * @param topics + * topic列表 + * @param consumerStrategy + * 从何处开始消费 + * @return + * rocketMQ DStream + */ + def createRocketPullStream(rocketParam: JMap[String, String] = null, + groupId: String = this.appName, + topics: String = null, + tag: String = null, + consumerStrategy: ConsumerStrategy = ConsumerStrategy.lastest, + locationStrategy: LocationStrategy = LocationStrategy.PreferConsistent, + instance: String = "", + keyNum: Int = 1): InputDStream[MessageExt] = { + + // 获取topic信息,配置文件优先级高于代码中指定的 + val confTopics = FireRocketMQConf.rocketTopics(keyNum) + val finalTopics = if (StringUtils.isNotBlank(confTopics)) confTopics else topics + require(StringUtils.isNotBlank(finalTopics), s"RocketMQ的Topics不能为空,请在配置文件中指定:spark.rocket.topics$keyNum") + + // 起始消费位点 + val confOffset = FireRocketMQConf.rocketStartingOffset(keyNum) + val finalConsumerStrategy = if (StringUtils.isNotBlank(confOffset)) RocketMQUtils.valueOfStrategy(confOffset) else consumerStrategy + + // 是否自动提交offset + val 
finalAutoCommit = FireRocketMQConf.rocketEnableAutoCommit(keyNum) + + // groupId信息 + val confGroupId = FireRocketMQConf.rocketGroupId(keyNum) + val finalGroupId = if (StringUtils.isNotBlank(confGroupId)) confGroupId else groupId + require(StringUtils.isNotBlank(finalGroupId), s"RocketMQ的groupId不能为空,请在配置文件中指定:spark.rocket.group.id$keyNum") + + // 详细的RocketMQ配置信息 + val finalRocketParam = RocketMQUtils.rocketParams(rocketParam, finalGroupId, rocketNameServer = null, tag = tag, keyNum) + require(!finalRocketParam.isEmpty, "RocketMQ相关配置不能为空!") + require(finalRocketParam.containsKey(RocketMQConfig.NAME_SERVER_ADDR), s"RocketMQ nameserver.addr不能为空,请在配置文件中指定:spark.rocket.brokers.name$keyNum") + require(finalRocketParam.containsKey(RocketMQConfig.CONSUMER_TAG), s"RocketMQ tag不能为空,请在配置文件中指定:spark.rocket.consumer.tag$keyNum") + // 消费者标识 + val instanceId = FireRocketMQConf.rocketInstanceId(keyNum) + val finalInstanceId = if (StringUtils.isNotBlank(instanceId)) instanceId else instance + if (StringUtils.isNotBlank(finalInstanceId)) finalRocketParam.put("consumer.instance", finalInstanceId) + + RocketMqUtils.createMQPullStream(this.ssc, + finalGroupId, + finalTopics.split(",").toList, + finalConsumerStrategy, + finalAutoCommit, + forceSpecial = FireRocketMQConf.rocketForceSpecial(keyNum), + failOnDataLoss = FireRocketMQConf.rocketFailOnDataLoss(keyNum), + locationStrategy, finalRocketParam) + } + + /** + * 开启streaming + */ + def startAwaitTermination(): Unit = { + ssc.start() + ssc.awaitTermination() + Thread.currentThread().join() + } + + /** + * 提交Spark Streaming Graph并执行 + */ + def start: Unit = this.startAwaitTermination() +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/module/KuduContextExt.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/module/KuduContextExt.scala new file mode 100644 index 0000000..9f2e23b --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/module/KuduContextExt.scala @@ -0,0 +1,990 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.module + +import java.lang.reflect.Field +import java.sql._ +import java.util +import java.util.concurrent.atomic.AtomicBoolean + +import com.zto.fire._ +import com.zto.fire.common.anno.FieldName +import com.zto.fire.common.conf.FireKuduConf +import com.zto.fire.common.util.{ReflectionUtils, ValueUtils} +import com.zto.fire.jdbc.util.DBUtils +import com.zto.fire.spark.bean.KuduBaseBean +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.util.{KuduUtils, SparkUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.kudu.client.{CreateTableOptions, KuduTable} +import org.apache.kudu.spark.kudu.{KuduContext, _} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.types._ +import org.apache.spark.sql.{DataFrame, Row, SQLContext} + +import scala.collection.mutable +import scala.collection.mutable.ListBuffer +import scala.reflect._ + +/** + * Kudu相关扩展API + * Created by ChengLong on 2017-09-21. + */ +class KuduContextExt(val sqlContext: SQLContext, val kuduContext: KuduContext) extends Serializable { + private[this] lazy val paramErrorMsg = "参数不能为空" + + /** + * 用于维护kudu表与临时表之间的关系 + */ + private val tableMap = mutable.Map[String, String]() + + /** + * 将表名包装为以impala::开头的表 + * + * @param tableName + * 库名.表名 + * @return + * 包装后的表名 + */ + def packageKuduTableName(tableName: String): String = { + require(StringUtils.isNotBlank(tableName), "表名不能为空") + if (tableName.startsWith("impala::")) { + tableName + } else { + s"impala::$tableName" + } + } + + /** + * 插入多个实体bean到指定kudu表中 + * + * @param tableName + * 表名 + * @param beans + * 多个与表结构对应的实体bean + * @tparam T + * 具体类型 + */ + def insertIgnoreBeans[T: ClassTag](tableName: String, beans: T*): Unit = { + this.insertIgnoreList(tableName, beans) + } + + /** + * 插入多个实体bean到指定kudu表中 + * + * @param tableName + * 表名 + * @param beanSeq + * 多个与表结构对应的实体bean + * @tparam T + * 具体类型 + */ + def insertIgnoreList[T: ClassTag](tableName: String, beanSeq: Seq[T]): Unit = { + val beanRDD = sqlContext.sparkContext.parallelize(beanSeq, FireSparkConf.parallelism) + this.insertIgnoreRDD(tableName, beanRDD) + } + + /** + * 插入RDD到指定kudu表中 + * + * @param tableName + * 表名 + * @param rdd + * 多个与表结构对应的实体bean + * @tparam T + * 具体类型 + */ + def insertIgnoreRDD[T: ClassTag](tableName: String, rdd: RDD[T]): Unit = { + require(rdd != null, paramErrorMsg) + val df = sqlContext.createDataFrame(rdd, classTag[T].runtimeClass) + this.insertIgnoreRows(tableName, df) + } + + /** + * 向指定表插入DataFrame + * + * @param tableName + * 表名 + * @param dataFrame + * Spark DataFrame. 要与表结构对应 + */ + def insertIgnoreRows(tableName: String, dataFrame: DataFrame): Unit = { + this.kuduContext.insertIgnoreRows(dataFrame, this.packageKuduTableName(tableName)) + } + + /** + * 向指定表插入DataFrame + * + * @param tableName + * 表名 + * @param dataFrame + * Spark DataFrame. 
要与表结构对应 + */ + def insertIgnoreDataFrame(tableName: String, dataFrame: DataFrame): Unit = { + this.insertIgnoreRows(tableName, dataFrame) + } + + /** + * 插入多个实体bean到指定kudu表中 + * + * @param tableName + * 表名 + * @param beans + * 多个与表结构对应的实体bean + * @tparam T + * 具体类型 + */ + def insertBeans[T: ClassTag](tableName: String, beans: T*): Unit = { + this.insertList(tableName, beans) + } + + /** + * 插入多个实体bean到指定kudu表中 + * + * @param tableName + * 表名 + * @param beanSeq + * 多个与表结构对应的实体bean + * @tparam T + * 具体类型 + */ + def insertList[T: ClassTag](tableName: String, beanSeq: Seq[T]): Unit = { + val beanRDD = sqlContext.sparkContext.parallelize(beanSeq, FireSparkConf.parallelism) + this.insertRDD(tableName, beanRDD) + } + + /** + * 插入RDD到指定kudu表中 + * + * @param tableName + * 表名 + * @param rdd + * 多个与表结构对应的实体bean + * @tparam T + * 具体类型 + */ + def insertRDD[T: ClassTag](tableName: String, rdd: RDD[T]): Unit = { + require(rdd != null, paramErrorMsg) + val df = sqlContext.createDataFrame(rdd, classTag[T].runtimeClass) + this.insertRows(tableName, df) + } + + /** + * 向指定表插入DataFrame + * + * @param tableName + * 表名 + * @param dataFrame + * Spark DataFrame. 要与表结构对应 + */ + def insertRows(tableName: String, dataFrame: DataFrame): Unit = { + this.kuduContext.insertRows(dataFrame, this.packageKuduTableName(tableName)) + } + + /** + * 向指定表插入DataFrame + * + * @param tableName + * 表名 + * @param dataFrame + * Spark DataFrame. 要与表结构对应 + */ + def insertDataFrame(tableName: String, dataFrame: DataFrame): Unit = { + this.insertRows(tableName, dataFrame) + } + + /** + * 使用DataFrame更新kudu表中的数据 + * + * @param tableName + * 表名 + * @param dataFrame + * 与表结构对应的DataFrame + */ + def updateRows(tableName: String, dataFrame: DataFrame): Unit = { + this.kuduContext.updateRows(dataFrame, this.packageKuduTableName(tableName)) + } + + /** + * 更新插入数据到kudu表中 + * + * @param tableName + * kudu表名 + * @param dataFrame + * 数据集 + */ + def upsertRows(tableName: String, dataFrame: DataFrame): Unit = { + this.kuduContext.upsertRows(dataFrame, this.packageKuduTableName(tableName)) + } + + /** + * 使用DataFrame更新插入kudu表中的数据 + * + * @param tableName + * 表名 + * @param dataFrame + * 与表结构对应的DataFrame + */ + def upsertDataFrame(tableName: String, dataFrame: DataFrame): Unit = { + this.upsertRows(tableName, dataFrame) + } + + /** + * 使用RDD更新插入kudu表中的数据 + * + * @param tableName + * 表名 + * @param rdd + * 与表结构对应的RDD + */ + def upsertRDD[T: ClassTag](tableName: String, rdd: RDD[T]): Unit = { + require(rdd != null, paramErrorMsg) + val df = sqlContext.createDataFrame(rdd, classTag[T].runtimeClass) + this.upsertRows(tableName, df) + } + + /** + * 使用多个实体bean更新插入kudu表中的数据 + * + * @param tableName + * 表名 + * @param beans + * 与表结构对应的多个实体bean + */ + def upsertBeans[T: ClassTag](tableName: String, beans: T*): Unit = { + val beanRDD = sqlContext.sparkContext.parallelize(beans, 1) + this.upsertRDD(tableName, beanRDD) + } + + /** + * 使用多个实体bean集合更新插入kudu表中的数据 + * + * @param tableName + * 表名 + * @param beanSeq + * 与表结构对应的多个实体bean + */ + def upsertList[T: ClassTag](tableName: String, beanSeq: Seq[T]): Unit = { + val beanRDD = sqlContext.sparkContext.parallelize(beanSeq, FireSparkConf.parallelism) + val df = sqlContext.createDataFrame(beanRDD, classTag[T].runtimeClass) + this.upsertRows(tableName, df) + } + + /** + * 使用DataFrame更新kudu表中的数据 + * + * @param tableName + * 表名 + * @param dataFrame + * 与表结构对应的DataFrame + */ + def updateDataFrame(tableName: String, dataFrame: DataFrame): Unit = { + this.updateRows(tableName, dataFrame) + } + + /** + * 使用RDD更新kudu表中的数据 + * + * 
@param tableName + * 表名 + * @param rdd + * 与表结构对应的RDD + */ + def updateRDD[T: ClassTag](tableName: String, rdd: RDD[T]): Unit = { + require(rdd != null, paramErrorMsg) + val df = sqlContext.createDataFrame(rdd, classTag[T].runtimeClass) + this.updateRows(tableName, df) + } + + /** + * 使用多个实体bean更新kudu表中的数据 + * + * @param tableName + * 表名 + * @param beans + * 与表结构对应的多个实体bean + */ + def updateBeans[T: ClassTag](tableName: String, beans: T*): Unit = { + val beanRDD = sqlContext.sparkContext.parallelize(beans, 1) + this.updateRDD(tableName, beanRDD) + } + + /** + * 使用多个实体bean集合更新kudu表中的数据 + * + * @param tableName + * 表名 + * @param beanSeq + * 与表结构对应的多个实体bean + */ + def updateList[T: ClassTag](tableName: String, beanSeq: Seq[T]): Unit = { + val beanRDD = sqlContext.sparkContext.parallelize(beanSeq, FireSparkConf.parallelism) + val df = sqlContext.createDataFrame(beanRDD, classTag[T].runtimeClass) + this.updateRows(tableName, df) + } + + /** + * 使用多个实体bean集合删除kudu表中的多条数据 + * + * @param tableName + * 表名 + * @param dataFrame + * 与表结构对应的DataFrame + */ + def deleteRows(tableName: String, dataFrame: DataFrame): Unit = { + require(dataFrame != null && StringUtils.isNotBlank(tableName), paramErrorMsg) + this.kuduContext.deleteRows(dataFrame, this.packageKuduTableName(tableName)) + } + + /** + * 使用多个实体bean集合删除kudu表中的多条数据 + * + * @param tableName + * 表名 + * @param dataFrame + * 与表结构对应的DataFrame + */ + def deleteDataFrame(tableName: String, dataFrame: DataFrame): Unit = { + this.deleteRows(tableName, dataFrame) + } + + /** + * 使用rdd集合删除kudu表中的多条数据 + * + * @param tableName + * 表名 + * 与表结构对应的rdd + */ + def deleteKVRDD[T: ClassTag](tableName: String, kv: (String, RDD[_])*): Unit = { + if (!this.tableExists(tableName)) throw new IllegalArgumentException(s"表名${tableName}不存在") + val keys = mutable.LinkedHashMap[String, Class[_]]() + val head = kv.head + var rdd: RDD[_] = head._2 + keys += (head._1 -> rdd.first().getClass) + kv.toList.tail.foreach(t => { + keys += (t._1 -> t._2.first().getClass) + rdd = rdd.zip(t._2).map(t => { + if (t._1.isInstanceOf[(Any, Any)]) { + val tmp = t._1.asInstanceOf[(Any, Any)] + (tmp._1, tmp._2, t._2) + } else if (t._1.isInstanceOf[(Any, Any, Any)]) { + val tmp = t._1.asInstanceOf[(Any, Any, Any)] + (tmp._1, tmp._2, tmp._3, t._2) + } else if (t._1.isInstanceOf[(Any, Any, Any, Any)]) { + val tmp = t._1.asInstanceOf[(Any, Any, Any, Any)] + (tmp._1, tmp._2, tmp._3, tmp._4, t._2) + } else if (t._1.isInstanceOf[(Any, Any, Any, Any, Any)]) { + val tmp = t._1.asInstanceOf[(Any, Any, Any, Any, Any)] + (tmp._1, tmp._2, tmp._3, tmp._4, tmp._5, t._2) + } else { + t + } + }) + }) + + val tmpRDD = rdd.map(t => { + if (t.isInstanceOf[(Any, Any)]) { + val tmp = t.asInstanceOf[(Any, Any)] + Row(tmp._1, tmp._2) + } else if (t.isInstanceOf[(Any, Any, Any)]) { + val tmp = t.asInstanceOf[(Any, Any, Any)] + Row(tmp._1, tmp._2, tmp._3) + } else if (t.isInstanceOf[(Any, Any, Any, Any)]) { + val tmp = t.asInstanceOf[(Any, Any, Any, Any)] + Row(tmp._1, tmp._2, tmp._3, tmp._4) + } else if (t.isInstanceOf[(Any, Any, Any, Any, Any)]) { + val tmp = t.asInstanceOf[(Any, Any, Any, Any, Any)] + Row(tmp._1, tmp._2, tmp._3, tmp._4, tmp._5) + } else { + Row(t) + } + }) + + val structType = ListBuffer[StructField]() + keys.foreach(key => { + val dataType: DataType = if (key._2 == classOf[java.lang.Integer]) { + IntegerType + } else if (key._2 == classOf[java.lang.Long]) { + LongType + } else { + StringType + } + structType += StructField(key._1, dataType, nullable = false) + }) + val df = 
sqlContext.createDataFrame(tmpRDD, StructType(structType.toArray)) + this.deleteRows(tableName, df) + } + + /** + * 使用实体bean集合删除kudu表中的多条数据 + * + * @param tableName + * 表名 + * 与表结构对应的实体bean集合 + */ + def deleteKVList[T: ClassTag](tableName: String, kv: (String, Seq[_])*): Unit = { + val params = new ListBuffer[(String, RDD[_])]() + kv.foreach(t => { + val fieldName = t._1 + val rdd = sqlContext.sparkContext.parallelize(t._2, 1) + params += (fieldName -> rdd) + }) + this.deleteKVRDD(tableName, params: _*) + } + + /** + * 删除多条记录 + * + * @param tableName + * 表名 + * @param beans + * 多个JavaBean实例 + * @tparam T + * 类型 + */ + def deleteBeans[T: ClassTag](tableName: String, beans: T*): Unit = { + this.deleteList(tableName, beans) + } + + /** + * 删除多条记录 + * + * @param tableName + * 表名 + * @param beans + * 多个JavaBean实例 + * @tparam T + * 类型 + */ + def deleteList[T: ClassTag](tableName: String, beans: Seq[T]): Unit = { + val beanClazz = classTag[T].runtimeClass + val rdd = this.sqlContext.sparkContext.parallelize(beans, 1) + val rowRDD = rdd.map(bean => { + KuduUtils.kuduBean2Row(bean) + }) + val list = KuduUtils.buildSchemaFromKuduBean(beanClazz) + val df = this.sqlContext.createDataFrame(rowRDD, StructType(list)) + df.show() + this.deleteDataFrame(tableName, df) + } + + /** + * 判断指定表是否存在 + * + * @param tableName + * 表名 + * @return + * 存在、不存在 + */ + def tableExists(tableName: String): Boolean = { + if (StringUtils.isBlank(tableName)) { + false + } else { + this.kuduContext.tableExists(this.packageKuduTableName(tableName)) + } + } + + /** + * 删除指定表 + * + * @param tableName + * 表名 + */ + def deleteTable(tableName: String): Unit = { + if (this.tableExists(tableName)) { + this.kuduContext.deleteTable(this.packageKuduTableName(tableName)) + } + } + + /** + * 删除指定表 + * + * @param tableName + * 表名 + */ + def dropTable(tableName: String): Unit = { + this.deleteTable(tableName) + } + + /** + * 根据给定的主键值集合查询kudu数据,返回DataFrame + * + * 表名(与kudu表对应的Spark SQL临时表) + * + * @param kv + * 联合主键键值对,如:("bill_code", "1", "2"), ("ds", "20171010","20171011") + * @tparam T + * 主键的类型(Int、Long、String) + * @return + * 查询结果,以DataFrame形式返回 + */ + def selectKVList[T: ClassTag](tableName: String, kv: (String, Seq[T])*): DataFrame = { + require(StringUtils.isNotBlank(tableName) && kv != null && kv.nonEmpty, paramErrorMsg) + val len = kv.head._2.length + if (len < 1 || kv.filter(t => t._2.length != len).length != 0) throw new IllegalArgumentException("联合主键值的个数必须一致") + + val tmpTableName = getTmpTableName(tableName) + val sqlStr = new mutable.StringBuilder("") + for (i <- 0 until len) { + val fragment = new mutable.StringBuilder(s"SELECT * FROM $tmpTableName WHERE ") + kv.foreach(t => { + val primaryKey = t._1 + if (StringUtils.isBlank(primaryKey)) throw new IllegalArgumentException("主键字段名称不能为空") + val value = t._2(i) + if (value.isInstanceOf[Int] || value.isInstanceOf[Long]) { + fragment.append(s" $primaryKey=$value AND") + } else { + fragment.append(s" $primaryKey='$value' AND") + } + }) + sqlStr.append(s" ${fragment.substring(0, fragment.length - 3)}\n UNION ALL\n") + } + this.sqlContext.sql( + s""" + |${sqlStr.substring(0, sqlStr.length - 10)} + """.stripMargin) + } + + /** + * 通过多个JavaBean查询 + * + * @param tableName + * 表名 + * @param beans + * 多个JavaBean + * @tparam T + * JavaBean实体类型 + * @return + * 数据集 + */ + def selectBeans[T: ClassTag](tableName: String, beans: KuduBaseBean*): DataFrame = { + this.selectKVList(tableName, this.getParams(beans: _*): _*) + } + + /** + * 通过多个JavaBean查询 + * + * @param tableName + * 表名 + * @param 
beans + * 多个JavaBean + * @tparam T + * JavaBean实体类型 + * @return + * 数据集 + */ + def selectList[T: ClassTag](tableName: String, beans: Seq[KuduBaseBean]): DataFrame = { + this.selectKVList(tableName, this.getParams(beans: _*): _*) + } + + /** + * 通过多个JavaBean查询 + * + * @param tableName + * 表名 + * @param rdd + * JavaBean集合 + * @tparam T + * JavaBean实体类型 + * @return + * 数据集 + */ + def selectRDD[T: ClassTag](tableName: String, rdd: RDD[T]): DataFrame = { + this.selectList(tableName, rdd.map(bean => bean.asInstanceOf[KuduBaseBean]).collect()) + } + + /** + * 加载kudu表转为DataFrame + * + * @param tableName + * @return + */ + def loadKuduTable(tableName: String): DataFrame = { + sqlContext.read.options(Map("kudu.master" -> FireKuduConf.kuduMaster, "kudu.table" -> KuduUtils.packageKuduTableName(tableName))).kudu + } + + /** + * 通过多个JavaBean查询 + * + * @param tableName + * 表名 + * @tparam T + * JavaBean实体类型 + * @return + * 数据集 + */ + def selectDataFrame[T: ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[_]): DataFrame = { + val rdd = dataFrame.rdd.map(row => KuduUtils.kuduRowToBean(row, clazz)) + this.selectRDD(tableName, rdd) + } + + /** + * 根据自定义Javabean获取联合主键和值 + * + * @param beans + * 自定义JavaBean的多个实体 + * @tparam T + * 类型 + * @return + * kv参数 + */ + def getParams[T: ClassTag](beans: KuduBaseBean*): ListBuffer[(String, Seq[T])] = { + val params = new ListBuffer[(String, Seq[T])]() + val idField = new ListBuffer[Field]() + val clazz = beans.head.getClass + clazz.getDeclaredFields.foreach(field => { + val anno = field.getAnnotation(classOf[FieldName]) + if (anno != null && anno.id) { + idField += field + } + }) + val map = mutable.Map[String, ListBuffer[T]]() + beans.foreach(bean => { + idField.foreach(field => { + ReflectionUtils.setAccessible(field) + val fieldName = field.getName + val seq = map.getOrElse(fieldName, null) + if (seq == null) { + val list = ListBuffer[T]() + list += field.get(bean).asInstanceOf[T] + map.put(fieldName, list) + } else { + seq += field.get(bean).asInstanceOf[T] + map.put(fieldName, seq) + } + }) + }) + map.foreach(t => { + params += (t._1 -> t._2) + }) + params + } + + /** + * 执行sql语句 + * 注:sql中引用的表必须存在 + * + * @param sql + * Spark 支持的sql语句 + * @return + * 结果集 + */ + def sql(sql: String): DataFrame = { + sqlContext.sql(sql) + } + + /** + * 判断记录是否存在 + * + * @param tableName + * 临时表名 + * @param kv + * 联合主键键值对,如:("bill_code", "1", "2"), ("ds", "20171010","20171011") + * @tparam T + * 主键的类型(Int、Long、String) + * @return + * 查询结果,以DataFrame形式返回 + */ + def rowExists[T: ClassTag](tableName: String, kv: (String, T)*): Boolean = { + val idClazz = classTag[T].runtimeClass + if (idClazz != classOf[Int] && idClazz != classOf[Long] && idClazz != classOf[String]) { + throw new IllegalArgumentException("主键字段类型必须为Int、Long、String") + } + val params = new ListBuffer[(String, Seq[T])]() + kv.foreach(t => { + val fieldName = t._1 + params += (fieldName -> Seq(t._2)) + }) + val df = this.selectKVList[T](tableName, params: _*) + !df.rdd.isEmpty() + } + + /** + * 判断记录是否存在 + * + * @param tableName + * 表名 + * @param bean + * JavaBean实体 + * @return + * 存在、不存在 + */ + def rowExists[T: ClassTag](tableName: String, bean: KuduBaseBean): Boolean = { + !this.selectBeans(tableName, bean).rdd.isEmpty() + } + + /** + * 创建kudu表原始API + * + * @param tableName + * 表名 + * @param schema + * 表模式 + * @param keys + * 主键集合 + * @param options + * 选项 + * @return + * kudu表对象 + */ + def createTable(tableName: String, schema: StructType, keys: Seq[String], options: CreateTableOptions): KuduTable = { + 
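+    // A minimal sketch of how this raw API is typically driven, assuming the kudu-client
+    // CreateTableOptions builder and an illustrative "t_user" table keyed by an "id" column
+    // (schema here stands for a StructType describing the table's columns):
+    //   import scala.collection.JavaConverters._
+    //   val options = new CreateTableOptions()
+    //     .addHashPartitions(Seq("id").asJava, 8)
+    //     .setNumReplicas(3)
+    //   this.createTable("t_user", schema, Seq("id"), options)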
this.kuduContext.createTable(this.packageKuduTableName(tableName), schema, keys, options) + } + + /** + * 使用实体bean自动生成schema的方式创建kudu表 + * + * @param tableName + * 表名 + * @param beanClazz + * 实体bean的类型 + * @param keys + * 主键集合 + * @param options + * 选项 + * @return + * kudu表对象 + */ + def createTable(tableName: String, beanClazz: Class[_], keys: Seq[String], options: CreateTableOptions): KuduTable = { + val typeField = SparkUtils.buildSchemaFromBean(beanClazz) + this.createTable(tableName, StructType(typeField), keys, options) + } + + /** + * 根据kudu表名获取临时表 + * + * @param tableName + * kudu表名 + * @return + * Spark临时表名 + */ + private def getTmpTableName(tableName: String): String = { + val pkTableName = this.packageKuduTableName(tableName) + var tmpTableName = this.tableMap.getOrElse(pkTableName, "") + if (StringUtils.isBlank(tmpTableName)) { + tmpTableName = pkTableName.substring(pkTableName.indexOf(".") + 1, pkTableName.length) + this.tableMap += (pkTableName -> tmpTableName) + sqlContext.loadKuduTable(pkTableName).registerTempTable(tmpTableName) + } + tmpTableName + } +} + +object KuduContextExt { + private lazy val impalaDaemons = FireKuduConf.impalaDaemons.split(",").toSet[String] + private lazy val dataSource: util.LinkedList[Connection] = new util.LinkedList[Connection]() + private lazy val isInit = new AtomicBoolean(false) + + /** + * 初始化impala连接池 + */ + private[this] def initPool: Unit = { + if (isInit.compareAndSet(false, true)) { + if (ValueUtils.noEmpty(FireKuduConf.impalaJdbcDriverName)) Class.forName(FireKuduConf.impalaJdbcDriverName) + + impalaDaemons.filter(ValueUtils.noEmpty(_)).map(_.trim).foreach(ip => { + val conn: Connection = DriverManager.getConnection(s"jdbc:hive2://$ip:21050/;auth=noSasl") + println(s"已成功创建impala连接:$ip") + this.dataSource.push(conn) + }) + } + } + + /** + * 从数据库连接池中获取一个连接 + * + * @return + * impala连接 + */ + def getConnection: Connection = { + this.synchronized { + this.initPool + // 如果当前连接池中没有连接,则一直等待,直到获取到连接 + while (this.dataSource.isEmpty) + try { + Thread.sleep(100) + } + catch { + case e: InterruptedException => { + e.printStackTrace() + } + } + this.dataSource.poll + } + } + + /** + * 回收Connection + * + * @param conn + * 数据库连接 + */ + def closeConnection(conn: Connection): Unit = { + if (conn != null) + this.dataSource.push(conn) + } + + /** + * 执行kudu 的 sql语句 + * + * @param sqls + * 多条sql + */ + def execute(sqls: String*): Unit = { + if (ValueUtils.isEmpty(sqls)) return + + var con: Connection = null + var stmt: Statement = null + try { + con = this.getConnection + stmt = con.createStatement + sqls.filter(ValueUtils.noEmpty(_)).foreach(sql => { + stmt.execute(sql) + }) + } catch { + case e: Exception => e.printStackTrace + } finally { + if (stmt != null) { + stmt.close(); + } + this.closeConnection(con) + } + } + + /** + * 为指定的kudu表添加分区 + * + * @param tables + * 多个kudu表名 + * @param start + * 分区的起始时间(闭) + * @param end + * 分区的结束时间(开) + */ + def addPartition(tables: Seq[String], start: String, end: String): Unit = { + if (ValueUtils.isEmpty(tables, start, end)) return + + val sqls = tables.filter(ValueUtils.noEmpty(_)).map(table => s"""ALTER TABLE $table ADD IF NOT EXISTS RANGE PARTITION '$start' <= VALUES < '$end'""") + if (ValueUtils.noEmpty(sqls)) this.execute(sqls: _*) + } + + /** + * 根据主键批量删除 + * + * @param tableName + * 表名 + * @param ids + * 主键集合 + * @tparam T + * 类型 + * @return + * 影响的记录数 + */ + def deleteByIds[T: ClassTag](tableName: String, ids: Seq[T]): Long = { + if (StringUtils.isBlank(tableName) || ids == null || ids.length < 1) { + 
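+      // Guard clause: a blank table name or an empty id collection means there is nothing
+      // to delete, so the method returns 0 below. For illustration (the table name is made
+      // up), deleteByIds("t_order", Seq("a", "b")) builds the statement
+      //   delete from t_order where id in('a','b')
+      // and runs it through the pooled impala connection; Int/Long ids are left unquoted.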
return 0L + } + val sqlStr = new StringBuilder(s"delete from $tableName where id in(") + val clazz = classTag[T].runtimeClass + ids.foreach(id => { + if (classOf[Int] == clazz || classOf[Long] == clazz) { + sqlStr.append(s"$id,") + } else { + sqlStr.append(s"'$id',") + } + }) + val sql = sqlStr.substring(0, sqlStr.length - 1) + ")" + var con: Connection = null + var stmt: Statement = null + try { + con = this.getConnection + stmt = con.createStatement() + stmt.executeUpdate(sql).toLong + } catch { + case e: Exception => e.printStackTrace() + } finally { + if (stmt != null) { + stmt.close() + } + this.closeConnection(con) + } + 0L + } + + /** + * 执行删除操作 + * + * @param sql + * SQL语句 + * @return + * 影响的记录数 + */ + def deleteBySQL[T: ClassTag](sql: String): Long = { + if (StringUtils.isBlank(sql)) { + return 0L + } + var con: Connection = null + var stmt: Statement = null + try { + con = this.getConnection + stmt = con.createStatement() + val rs = stmt.executeUpdate(sql) + return rs.toLong + } catch { + case e: Exception => e.printStackTrace() + } finally { + if (stmt != null) { + stmt.close() + } + this.closeConnection(con) + } + 0L + } + + def findBySQL[T](sql: String, clazz: Class[T]): ListBuffer[T] = { + if (StringUtils.isBlank(sql) || clazz == null) { + throw new IllegalArgumentException("参数不合法") + } + val list = ListBuffer[T]() + var con: Connection = null + var stmt: Statement = null + var rs: ResultSet = null + try { + con = this.getConnection + println(con.getCatalog) + stmt = con.createStatement() + rs = stmt.executeQuery(sql) + list ++= DBUtils.dbResultSet2Bean(rs, clazz) + } catch { + case e: Exception => e.printStackTrace() + } finally { + try { + if (rs != null) { + rs.close + } + if (stmt != null) { + stmt.close + } + this.closeConnection(con) + } catch { + case e1: Exception => e1.printStackTrace() + } + } + list + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseBulkProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseBulkProvider.scala new file mode 100644 index 0000000..fbe057b --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseBulkProvider.scala @@ -0,0 +1,283 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.provider + +import com.zto.fire._ +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.spark.connector.HBaseBulkConnector +import org.apache.hadoop.hbase.client.Scan +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, Encoders} +import org.apache.spark.streaming.dstream.DStream + +import scala.reflect.ClassTag + +/** + * 为扩展层提供HBase bulk api + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:31 + */ +trait HBaseBulkProvider extends SparkProvider { + + /** + * scan数据,并转为RDD + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 对应的返回值类型 + * @return + * clazz类型的rdd + */ + def hbaseBulkScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): RDD[T] = { + HBaseBulkConnector.bulkScanRDD(tableName, clazz, scan, keyNum) + } + + /** + * scan数据,并转为RDD + * + * @param tableName + * HBase表名 + * @param startRow + * 开始 + * @param stopRow + * 结束 + * @param clazz + * 对应的返回值类型 + * @return + * clazz类型的rdd + */ + def hbaseBulkScanRDD2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): RDD[T] = { + HBaseBulkConnector.bulkScanRDD2(tableName, clazz, startRow, stopRow, keyNum) + } + + /** + * 使用bulk方式scan数据,并转为DataFrame + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 对应的返回值类型 + * @return + * clazz类型的rdd + */ + def hbaseBulkScanDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): DataFrame = { + val rdd = HBaseBulkConnector.bulkScanRDD(tableName, clazz, scan, keyNum) + this.spark.createDataFrame(rdd, clazz) + } + + /** + * 使用bulk方式scan数据,并转为DataFrame + * + * @param tableName + * HBase表名 + * @param startRow + * 开始 + * @param stopRow + * 结束 + * @param clazz + * 对应的返回值类型 + * @return + * clazz类型的rdd + */ + def hbaseBulkScanDF2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): DataFrame = { + this.hbaseBulkScanDF[T](tableName, clazz, HBaseConnector.buildScan(startRow, stopRow), keyNum) + } + + /** + * 使用bulk方式scan数据,并转为Dataset + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 对应的返回值类型 + * @return + * clazz类型的rdd + */ + def hbaseBulkScanDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): Dataset[T] = { + val rdd = HBaseBulkConnector.bulkScanRDD(tableName, clazz, scan, keyNum) + this.spark.createDataset(rdd)(Encoders.bean(clazz)) + } + + /** + * 使用bulk方式scan数据,并转为DataFrame + * + * @param tableName + * HBase表名 + * @param startRow + * 开始 + * @param stopRow + * 结束 + * @param clazz + * 对应的返回值类型 + * @return + * clazz类型的rdd + */ + def hbaseBulkScanDS2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): Dataset[T] = { + this.hbaseBulkScanDS[T](tableName, clazz, HBaseConnector.buildScan(startRow, stopRow), keyNum) + } + + /** + * 使用bulk方式批量插入数据 + * + * @param tableName + * HBase表名 + * 数据集合,继承自HBaseBaseBean + */ + def hbaseBulkPutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T], keyNum: Int = 1): Unit = { + rdd.hbaseBulkPutRDD(tableName, keyNum) + } + + /** hbaseInsertList + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * 
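+   *        A hedged usage sketch ("t_student" and Student are illustrative; Student stands
+   *        for any HBaseBaseBean subclass matching the target table):
+   *        {{{
+   *          this.hbaseBulkPutDF[Student]("t_student", studentDF, classOf[Student])
+   *        }}}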
@tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def hbaseBulkPutDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[T], keyNum: Int = 1): Unit = { + dataFrame.hbaseBulkPutDF[T](tableName, clazz, keyNum) + } + + /** + * 批量写入,将自定义的JavaBean数据集批量并行写入 + * 到HBase的指定表中。内部会将自定义JavaBean的相应 + * 字段一一映射为Put对象,并完成一次写入 + * + * @param tableName + * HBase表名 + * @param dataset + * dataFrame实例,数类型需继承自HBaseBaseBean + * @tparam T + * 数据类型为HBaseBaseBean的子类 + */ + def hbaseBulkPutDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataset: Dataset[T], keyNum: Int = 1): Unit = { + dataset.hbaseBulkPutDS[T](tableName, keyNum) + } + + /** + * DStrea数据实时写入 + * + * @param tableName + * HBase表名 + */ + def hbaseBulkPutStream[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dstream: DStream[T], keyNum: Int = 1): Unit = { + dstream.hbaseBulkPutStream[T](tableName, keyNum) + } + + /** + * 根据RDD[String]批量删除 + * + * @param tableName + * HBase表名 + * @param rowKeyRDD + * 装有rowKey的rdd集合 + */ + def hbaseBulkDeleteRDD(tableName: String, rowKeyRDD: RDD[String], keyNum: Int = 1): Unit = { + rowKeyRDD.hbaseBulkDeleteRDD(tableName, keyNum) + } + + /** + * 根据Dataset[String]批量删除,Dataset是rowkey的集合 + * 类型为String + * + * @param tableName + * HBase表名 + */ + def hbaseBulkDeleteDS(tableName: String, dataSet: Dataset[String], keyNum: Int = 1): Unit = { + dataSet.hbaseBulkDeleteDS(tableName, keyNum) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * 内部实现是将rowkey集合转为RDD[String],推荐在数据量较大 + * 时使用。数据量较小请优先使用HBaseOper + * + * @param tableName + * HBase表名 + * @param clazz + * 具体类型 + * @param seq + * rowKey集合 + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def hbaseBulkGetSeq[E <: HBaseBaseBean[E] : ClassTag](tableName: String, seq: Seq[String], clazz: Class[E], keyNum: Int = 1): RDD[E] = { + HBaseBulkConnector.bulkGetSeq[E](tableName, seq, clazz, keyNum) + } + + /** + * 根据rowKey集合批量获取数据 + * + * @param tableName + * HBase表名 + * @param clazz + * 获取后的记录转换为目标类型 + * @return + * 结果集 + */ + def hbaseBulkGetRDD[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rowKeyRDD: RDD[String], clazz: Class[E], keyNum: Int = 1): RDD[E] = { + rowKeyRDD.hbaseBulkGetRDD[E](tableName, clazz, keyNum) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def hbaseBulkGetDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rowKeyRDD: RDD[String], clazz: Class[E], keyNum: Int = 1): DataFrame = { + rowKeyRDD.hbaseBulkGetDF[E](tableName, clazz, keyNum) + } + + /** + * 根据rowKey集合批量获取数据,并映射为自定义的JavaBean类型 + * + * @param tableName + * HBase表名 + * @param clazz + * 获取后的记录转换为目标类型(自定义的JavaBean类型) + * @tparam E + * 自定义JavaBean类型,必须继承自HBaseBaseBean + * @return + * 自定义JavaBean的对象结果集 + */ + def hbaseBulkGetDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rowKeyRDD: RDD[String], clazz: Class[E], keyNum: Int = 1): Dataset[E] = { + rowKeyRDD.hbaseBulkGetDS[E](tableName, clazz, keyNum) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseConnectorProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseConnectorProvider.scala new file mode 100644 index 0000000..22c6c06 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseConnectorProvider.scala @@ -0,0 +1,331 @@ +/* + * Licensed to the Apache 
Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.provider + +import com.zto.fire._ +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.spark.connector.HBaseSparkBridge +import org.apache.hadoop.hbase.client.{Get, Scan} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset} + +import scala.reflect.ClassTag + +/** + * 为扩展层提供HBaseConnector相关API + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-12-23 17:39 + */ +trait HBaseConnectorProvider extends SparkProvider { + + /** + * Scan指定HBase表的数据,并映射为DataFrame + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): DataFrame = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanDF(tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为DataFrame + * + * @param tableName + * HBase表名 + * @param startRow + * 开始主键 + * @param stopRow 结束主键 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDF2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): DataFrame = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanDF2(tableName, clazz, startRow, stopRow) + } + + /** + * Scan指定HBase表的数据,并映射为Dataset + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): Dataset[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanDS[T](tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为Dataset + * + * @param tableName + * HBase表名 + * @param startRow + * 开始主键 + * @param stopRow 结束主键 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanDS2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): Dataset[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanDS2[T](tableName, clazz, startRow, stopRow) + } + + /** + * 使用hbase java api方式插入一个集合的数据到hbase表中 + * + * @param tableName + * hbase表名 + * @param seq + * HBaseBaseBean的子类集合 + */ + def hbasePutList[T <: HBaseBaseBean[T] : ClassTag](tableName: String, seq: Seq[T], keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbasePutList[T](tableName, seq) + } + + /** + * 使用Java API的方式将RDD中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + */ + def hbasePutRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, rdd: RDD[T], keyNum: Int = 1): Unit = { + rdd.hbasePutRDD[T](tableName, keyNum) + } + + /** + * 使用Java 
API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param df + * DataFrame + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbasePutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, df: DataFrame, clazz: Class[E], keyNum: Int = 1): Unit = { + df.hbasePutDF(tableName, clazz, keyNum) + } + + /** + * 使用Java API的方式将Dataset中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbasePutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataset: Dataset[E], clazz: Class[E], keyNum: Int = 1): Unit = { + dataset.hbasePutDS[E](tableName, clazz, keyNum) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param scan + * HBase scan对象 + * @return + */ + def hbaseScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): RDD[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanRDD(tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseScanRDD2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): RDD[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanRDD(tableName, clazz, HBaseConnector.buildScan(startRow, stopRow)) + } + + /** + * Scan指定HBase表的数据,并映射为List + * + * @param tableName + * HBase表名 + * @param scan + * hbase scan对象 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanList[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): Seq[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanList[T](tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为List + * + * @param tableName + * HBase表名 + * @param startRow + * 开始主键 + * @param stopRow 结束主键 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseScanList2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): Seq[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseScanList2[T](tableName, clazz, startRow, stopRow) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rdd: RDD[String], keyNum: Int = 1): RDD[T] = { + rdd.hbaseGetRDD(tableName, clazz, keyNum) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rdd: RDD[String], keyNum: Int = 1): DataFrame = { + rdd.hbaseGetDF(tableName, clazz, keyNum) + } + + /** + * 通过RDD[String]批量获取对应的数据(可获取历史版本的记录) + * + * @param tableName + * HBase表名 + * @param clazz + * 目标类型 + * @tparam T + * 目标类型 + * @return + */ + def hbaseGetDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], rdd: RDD[String], keyNum: Int = 1): Dataset[T] = { + rdd.hbaseGetDS[T](tableName, clazz, keyNum) + } + + /** + * 根据rowKey查询数据,并转为List[T] + * + * @param tableName + * hbase表名 + * @param seq + * rowKey集合 + * @param clazz + * 目标类型 + * @return + * List[T] + */ + def hbaseGetList[T <: HBaseBaseBean[T] : ClassTag](tableName: 
String, clazz: Class[T], seq: Seq[Get], keyNum: Int = 1): Seq[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseGetList[T](tableName, clazz, seq) + } + + /** + * 根据rowKey查询数据,并转为List[T] + * + * @param tableName + * hbase表名 + * @param seq + * rowKey集合 + * @param clazz + * 目标类型 + * @return + * List[T] + */ + def hbaseGetList2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], seq: Seq[String], keyNum: Int = 1): Seq[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseGetList2[T](tableName, clazz, seq) + } + + /** + * 根据rowKey集合批量删除记录 + * + * @param tableName + * hbase表名 + * @param rowKeys + * rowKey集合 + */ + def hbaseDeleteList(tableName: String, rowKeys: Seq[String], keyNum: Int = 1): Unit = { + HBaseSparkBridge(keyNum = keyNum).hbaseDeleteList(tableName, rowKeys) + } + + /** + * 根据RDD[RowKey]批量删除记录 + * + * @param tableName + * rowKey集合 + * @param rowKeyRDD + * rowKey的rdd集合 + */ + def hbaseDeleteRDD(tableName: String, rowKeyRDD: RDD[String], keyNum: Int = 1): Unit = { + rowKeyRDD.hbaseDeleteRDD(tableName, keyNum) + } + + /** + * 根据Dataset[RowKey]批量删除记录 + * + * @param tableName + * rowKey集合 + */ + def hbaseDeleteDS(tableName: String, dataset: Dataset[String], keyNum: Int = 1): Unit = { + dataset.hbaseDeleteDS(tableName, keyNum) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseHadoopProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseHadoopProvider.scala new file mode 100644 index 0000000..0777804 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/HBaseHadoopProvider.scala @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.provider + +import com.zto.fire._ +import com.zto.fire.hbase.bean.HBaseBaseBean +import com.zto.fire.spark.connector.HBaseSparkBridge +import org.apache.hadoop.hbase.client.{Result, Scan} +import org.apache.hadoop.hbase.io.ImmutableBytesWritable +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, Row} + +import scala.reflect.ClassTag + +/** + * 为扩展层提供spark方式的HBase操作API + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:41 + */ +trait HBaseHadoopProvider extends SparkProvider { + + /** + * 使用Spark API的方式将RDD中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + */ + def hbaseHadoopPutRDD[E <: HBaseBaseBean[E] : ClassTag](tableName: String, rdd: RDD[E], keyNum: Int = 1): Unit = { + rdd.hbaseHadoopPutRDD(tableName, keyNum) + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param clazz + * JavaBean类型,为HBaseBaseBean的子类 + */ + def hbaseHadoopPutDF[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataFrame: DataFrame, clazz: Class[E], keyNum: Int = 1): Unit = { + dataFrame.hbaseHadoopPutDF(tableName, clazz, keyNum) + } + + /** + * 使用spark API的方式将DataFrame中的数据分多个批次插入到HBase中 + * + * @param tableName + * HBase表名 + * @param dataset + * JavaBean类型,待插入到hbase的数据集 + */ + def hbaseHadoopPutDS[E <: HBaseBaseBean[E] : ClassTag](tableName: String, dataset: Dataset[E], keyNum: Int = 1): Unit = { + dataset.hbaseHadoopPutDS[E](tableName, keyNum) + } + + /** + * 以spark 方式批量将DataFrame数据写入到hbase中 + * + * @param tableName + * hbase表名 + * @tparam T + * JavaBean类型 + */ + def hbaseHadoopPutDFRow[T <: HBaseBaseBean[T] : ClassTag](tableName: String, dataFrame: DataFrame, buildRowKey: (Row) => String, keyNum: Int = 1): Unit = { + dataFrame.hbaseHadoopPutDFRow[T](tableName, buildRowKey, keyNum) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanRS(tableName: String, scan: Scan, keyNum: Int = 1): RDD[(ImmutableBytesWritable, Result)] = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanRS(tableName, scan) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanRS2(tableName: String, startRow: String, stopRow: String, keyNum: Int = 1): RDD[(ImmutableBytesWritable, Result)] = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanRS2(tableName, startRow, stopRow) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(T] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanRDD[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): RDD[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanRDD[T](tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[T] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanRDD2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): RDD[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanRDD2[T](tableName, clazz, startRow, stopRow) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(T] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def 
hbaseHadoopScanDF[T <: HBaseBaseBean[T] : ClassTag](tableName: String, scan: Scan, clazz: Class[T], keyNum: Int = 1): DataFrame = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanDF[T](tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanDF2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): DataFrame = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanDF2[T](tableName, clazz, startRow, stopRow) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(T] + * + * @param tableName + * HBase表名 + * @param scan + * scan对象 + * 目标类型 + * @return + */ + def hbaseHadoopScanDS[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], scan: Scan, keyNum: Int = 1): Dataset[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanDS[T](tableName, clazz, scan) + } + + /** + * Scan指定HBase表的数据,并映射为RDD[(ImmutableBytesWritable, Result)] + * + * @param tableName + * HBase表名 + * @param startRow + * rowKey开始位置 + * @param stopRow + * rowKey结束位置 + * 目标类型 + * @return + */ + def hbaseHadoopScanDS2[T <: HBaseBaseBean[T] : ClassTag](tableName: String, clazz: Class[T], startRow: String, stopRow: String, keyNum: Int = 1): Dataset[T] = { + HBaseSparkBridge(keyNum = keyNum).hbaseHadoopScanDS2[T](tableName, clazz, startRow, stopRow) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/JdbcSparkProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/JdbcSparkProvider.scala new file mode 100644 index 0000000..0a8fcab --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/JdbcSparkProvider.scala @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.provider + +import java.sql.Connection +import java.util.Properties + +import com.zto.fire._ +import com.zto.fire.jdbc.JdbcConnector +import com.zto.fire.jdbc.conf.FireJdbcConf +import org.apache.commons.lang3.StringUtils +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SaveMode} +import org.apache.spark.storage.StorageLevel + +import scala.reflect.ClassTag + +/** + * 为扩展层提供jdbc相关的api + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:48 + */ +trait JdbcSparkProvider extends SparkProvider { + + /** + * 执行查询操作,以RDD方式返回结果集 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param clazz + * JavaBean类型 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return 查询结果集 + */ + def jdbcQueryRDD[T <: Object : ClassTag](sql: String, params: Seq[Any] = null, clazz: Class[T], connection: Connection = null, keyNum: Int = 1): RDD[T] = { + val rsList = JdbcConnector.executeQuery[T](sql, params, clazz, connection, keyNum) + this.sc.parallelize(rsList, FireJdbcConf.jdbcQueryPartition).persist(StorageLevel.fromString(FireJdbcConf.jdbcStorageLevel)) + } + + /** + * 执行查询操作,以DataFrame方式返回结果集 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param clazz + * JavaBean类型 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return 查询结果集 + */ + def jdbcQueryDF[T <: Object : ClassTag](sql: String, params: Seq[Any] = null, clazz: Class[T], connection: Connection = null, keyNum: Int = 1): DataFrame = { + this.spark.createDataFrame(this.jdbcQueryRDD(sql, params, clazz, connection, keyNum), clazz) + } + + /** + * 执行查询操作,以Dataset方式返回结果集 + * + * @param sql + * 查询语句 + * @param params + * sql执行参数 + * @param clazz + * JavaBean类型 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 查询结果集 + */ + def jdbcQueryDS[T <: Object : ClassTag](sql: String, params: Seq[Any] = null, clazz: Class[T], connection: Connection = null, keyNum: Int = 1): Dataset[T] = { + this.spark.createDataset[T](this.jdbcQueryRDD(sql, params, clazz, connection, keyNum))(Encoders.bean(clazz)) + } + + /** + * 将DataFrame数据保存到关系型数据库中 + * + * @param dataFrame + * DataFrame数据集 + * @param tableName + * 关系型数据库表名 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + */ + def jdbcTableSave(dataFrame: DataFrame, tableName: String, saveMode: SaveMode = SaveMode.Append, jdbcProps: Properties = null, keyNum: Int = 1): Unit = { + dataFrame.jdbcTableSave(tableName, saveMode, jdbcProps, keyNum) + } + + /** + * 单线程加载一张关系型数据库表 + * 注:仅限用于小的表,不支持条件查询 + * + * @param tableName + * 关系型数据库表名 + * @param jdbcProps + * 调用者指定的数据库连接信息,如果为空,则默认读取配置文件 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * DataFrame + */ + def jdbcTableLoadAll(tableName: String, jdbcProps: Properties = null, keyNum: Int = 1): DataFrame = { + this.spark.sqlContext.jdbcTableLoadAll(tableName, jdbcProps, keyNum) + } + + /** + * 指定load的条件,从关系型数据库中并行的load数据,并转为DataFrame + * + * @param tableName 数据库表名 + * @param predicates + * 并行load数据时,每一个分区load数据的where条件 + * 比如:gmt_create >= '2019-06-20' AND gmt_create <= '2019-06-21' 和 gmt_create 
>= '2019-06-22' AND gmt_create <= '2019-06-23' + * 那么将两个线程同步load,线程数与predicates中指定的参数个数保持一致 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + * 查询结果集 + */ + def jdbcTableLoad(tableName: String, predicates: Array[String], jdbcProps: Properties = null, keyNum: Int = 1): DataFrame = { + this.spark.sqlContext.jdbcTableLoad(tableName, predicates, jdbcProps, keyNum) + } + + /** + * 根据指定字段的范围load关系型数据库中的数据 + * + * @param tableName + * 表名 + * @param columnName + * 表的分区字段 + * @param lowerBound + * 分区的下边界 + * @param upperBound + * 分区的上边界 + * @param jdbcProps + * jdbc连接信息,默认读取配置文件 + * @param keyNum + * 配置文件中数据源配置的数字后缀,用于应对多数据源的情况,如果仅一个数据源,可不填 + * 比如需要操作另一个数据库,那么配置文件中key需携带相应的数字后缀:spark.db.jdbc.url2,那么此处方法调用传参为3,以此类推 + * @return + */ + def jdbcTableLoadBound(tableName: String, columnName: String, lowerBound: Long, upperBound: Long, numPartitions: Int = 10, jdbcProps: Properties = null, keyNum: Int = 1): DataFrame = { + this.spark.sqlContext.jdbcTableLoadBound(tableName, columnName, lowerBound, upperBound, keyNum, jdbcProps, keyNum) + } + + /** + * 将DataFrame中指定的列写入到jdbc中 + * 调用者需自己保证DataFrame中的列类型与关系型数据库对应字段类型一致 + * + * @param dataFrame + * 将要插入到关系型数据库中原始的数据集 + * @param sql + * 关系型数据库待执行的增删改sql + * @param fields + * 指定部分DataFrame列名作为参数,顺序要对应sql中问号占位符的顺序 + * 若不指定字段,则默认传入当前DataFrame所有列,且列的顺序与sql中问号占位符顺序一致 + * @param batch + * 每个批次执行多少条 + * @param keyNum + * 对应配置文件中指定的数据源编号 + */ + def jdbcBatchUpdateDF(dataFrame: DataFrame, sql: String, fields: Seq[String] = null, batch: Int = FireJdbcConf.batchSize(), keyNum: Int = 1): Unit = { + require(dataFrame != null && StringUtils.isNotBlank(sql), "执行jdbcBatchUpdateDF失败,dataFrame或sql为空") + dataFrame.jdbcBatchUpdate(sql, fields, batch, keyNum) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/KafkaSparkProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/KafkaSparkProvider.scala new file mode 100644 index 0000000..4e892b8 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/KafkaSparkProvider.scala @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.provider + +import com.zto.fire.common.conf.FireKafkaConf +import com.zto.fire.common.util.{KafkaUtils, LogUtils} +import com.zto.fire.spark.util.SparkUtils +import com.zto.fire.{requireNonEmpty, retry, _} +import org.apache.commons.lang3.StringUtils +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.functions.from_json +import org.apache.spark.sql.{DataFrame, Dataset, Encoders} + +/** + * 为扩展层提供Kafka相关的API + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:43 + */ +trait KafkaSparkProvider extends SparkProvider { + import spark.implicits._ + + /** + * 消费kafka中的json数据,并解析成json字符串 + * + * @param extraOptions + * 消费kafka额外的参数,如果有key同时出现在配置文件中和extraOptions中,将被extraOptions覆盖 + * @param keyNum + * 配置文件中key的数字后缀 + * @return + * 转换成json字符串后的Dataset + */ + def loadKafka(extraOptions: Map[String, String] = null, keyNum: Int = 1): Dataset[(String, String)] = { + val extraOptionsMap = new scala.collection.mutable.HashMap[String, String] + if (extraOptions != null && extraOptions.nonEmpty) extraOptionsMap ++= extraOptions + + val confGroupId = FireKafkaConf.kafkaGroupId(keyNum) + val groupId = if (StringUtils.isNotBlank(confGroupId)) confGroupId else spark.sparkContext.appName + extraOptionsMap += ("group.id" -> groupId) + + val finalBrokers = FireKafkaConf.kafkaBrokers(keyNum) + if (StringUtils.isNotBlank(finalBrokers)) extraOptionsMap += ("kafka.bootstrap.servers" -> finalBrokers) + require(extraOptionsMap.contains("kafka.bootstrap.servers"), s"kafka bootstrap.servers不能为空,请在配置文件中指定:spark.kafka.brokers.name$keyNum") + + val topics = FireKafkaConf.kafkaTopics() + if (StringUtils.isNotBlank(topics)) extraOptionsMap += ("subscribe" -> topics) + require(extraOptionsMap.contains("subscribe"), s"kafka topic不能为空,请在配置文件中指定:spark.kafka.topics$keyNum") + + // 以spark.kafka.conf.开头的配置优先级最高 + val configMap = FireKafkaConf.kafkaConfMap(keyNum) + extraOptionsMap ++= configMap + LogUtils.logMap(this.logger, extraOptionsMap.toMap, s"Kafka client configuration. 
keyNum=$keyNum.") + + val kafkaReader = spark.readStream + .format("kafka") + .options(extraOptionsMap) + .load() + .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING) as value") + .as[(String, String)] + kafkaReader + } + + /** + * 消费kafka中的json数据,并按照指定的schema解析成目标类型 + * + * @param schemaClass + * json对应的javabean类型 + * @param extraOptions + * 消费kafka额外的参数 + * @param parseAll + * 是否解析所有字段信息 + * @param isMySQL + * 是否为mysql解析的消息 + * @param fieldNameUpper + * 字段名称是否为大写 + * @return + * 转换成json字符串后的Dataset + */ + def loadKafkaParse(schemaClass: Class[_], + extraOptions: Map[String, String] = null, + parseAll: Boolean = false, + isMySQL: Boolean = true, + fieldNameUpper: Boolean = false, keyNum: Int = 1): DataFrame = { + val kafkaDataset = this.loadKafka(extraOptions, keyNum) + val schemaDataset = kafkaDataset.select(from_json($"value", SparkUtils.buildSchema2Kafka(schemaClass, parseAll, isMySQL, fieldNameUpper)).as("data")) + if (parseAll) + schemaDataset.select("data.*") + else + schemaDataset.select("data.after.*") + } + + /** + * 消费kafka中的json数据,并自动解析json数据,将解析后的数据注册到tableName所指定的临时表中 + * + * @param tableName + * 解析后的数据存放的临时表名,默认名为kafka + * @param extraOptions + * 消费kafka额外的参数 + * @return + * 转换成json字符串后的Dataset + */ + def loadKafkaParseJson(tableName: String = "kafka", + extraOptions: Map[String, String] = null, + keyNum: Int = 1): DataFrame = { + val msg = retry(5, 1000) { + KafkaUtils.getMsg(FireKafkaConf.kafkaBrokers(keyNum), FireKafkaConf.kafkaTopics(keyNum), null) + } + requireNonEmpty(msg, s"获取样例消息失败!请重启任务尝试重新获取,并保证topic[${FireKafkaConf.kafkaTopics(keyNum)}]持续的有新消息。") + val jsonDS = this.spark.createDataset(Seq(msg))(Encoders.STRING) + val jsonDF = this.spark.read.json(jsonDS) + + val kafkaDataset = this.loadKafka(extraOptions, keyNum) + val schemaDataset = kafkaDataset.select(from_json($"value", jsonDF.schema).as(tableName)).select(s"${tableName}.*") + schemaDataset.createOrReplaceTempView(tableName) + schemaDataset + } + + /** + * 解析DStream中每个rdd的json数据,并转为DataFrame类型 + * + * @param schema + * 目标DataFrame类型的schema + * @param isMySQL + * 是否为mysql解析的消息 + * @param fieldNameUpper + * 字段名称是否为大写 + * @param parseAll + * 是否需要解析所有字段信息 + * @return + */ + def kafkaJson2DFV(rdd: RDD[String], schema: Class[_], parseAll: Boolean = false, isMySQL: Boolean = true, fieldNameUpper: Boolean = false): DataFrame = { + rdd.kafkaJson2DFV(schema, parseAll, isMySQL, fieldNameUpper) + } + + /** + * 解析DStream中每个rdd的json数据,并转为DataFrame类型 + * + * @param schema + * 目标DataFrame类型的schema + * @param isMySQL + * 是否为mysql解析的消息 + * @param fieldNameUpper + * 字段名称是否为大写 + * @param parseAll + * 是否解析所有字段信息 + * @return + */ + def kafkaJson2DF(rdd: RDD[ConsumerRecord[String, String]], schema: Class[_], parseAll: Boolean = false, isMySQL: Boolean = true, fieldNameUpper: Boolean = false): DataFrame = { + rdd.kafkaJson2DF(schema, parseAll, isMySQL, fieldNameUpper) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/SparkProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/SparkProvider.scala new file mode 100644 index 0000000..4fcb97c --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/SparkProvider.scala @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.ext.provider + +import com.zto.fire.core.ext.Provider +import com.zto.fire.spark.util.SparkSingletonFactory + +/** + * spark provider父接口 + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:49 + */ +trait SparkProvider extends Provider { + protected lazy val spark = SparkSingletonFactory.getSparkSession + protected lazy val sc = spark.sparkContext +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/SqlProvider.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/SqlProvider.scala new file mode 100644 index 0000000..3fb170d --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/ext/provider/SqlProvider.scala @@ -0,0 +1,380 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.ext.provider + +import com.zto.fire._ +import com.zto.fire.common.conf.FireHiveConf +import com.zto.fire.spark.conf.FireSparkConf +import com.zto.fire.spark.udf.UDFs +import com.zto.fire.spark.util.SparkSingletonFactory +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{DataFrame, Dataset, SaveMode, SparkSession} +import org.apache.spark.streaming.dstream.DStream + +/** + * 为扩展层提供Spark SQL api + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-23 17:35 + */ +trait SqlProvider extends SparkProvider { + + /** + * 清理 RDD、DataFrame、Dataset、DStream、TableName 缓存 + * 等同于unpersist + * + * @param any + * RDD、DataFrame、Dataset、DStream、TableName + */ + def uncache(any: Any*): Unit = { + if (any != null && any.nonEmpty) { + any.foreach(elem => { + if (elem != null) { + if (elem.isInstanceOf[String]) { + val tableName = elem.asInstanceOf[String] + if (this.tableExists(tableName) && this.isCached(tableName)) { + SparkSingletonFactory.getSparkSession.sqlContext.uncacheTables(tableName) + } + } else if (elem.isInstanceOf[Dataset[_]]) { + elem.asInstanceOf[Dataset[_]].uncache + } else if (elem.isInstanceOf[DataFrame]) { + elem.asInstanceOf[DataFrame].uncache + } else if (elem.isInstanceOf[RDD[_]]) { + elem.asInstanceOf[RDD[_]].uncache + } else if (elem.isInstanceOf[DStream[_]]) { + elem.asInstanceOf[DStream[_]].uncache + } + } + }) + } + } + + /** + * 清理 RDD、DataFrame、Dataset、DStream、TableName 缓存 + * 等同于uncache + * + * @param any + * RDD、DataFrame、Dataset、DStream、TableName + */ + def unpersist(any: Any*): Unit = { + this.uncache(any: _*) + } + + /** + * 批量注册udf函数,包含系统内置的与用户自定义的 + */ + def registerUDF(): SparkSession = { + UDFs.registerSysUDF(SparkSingletonFactory.getSparkSession) + SparkSingletonFactory.getSparkSession + } + + /** + * 用于判断当前SparkSession下临时表或Hive表是否存在 + * + * @param tableName + * 表名 + * @return + * true:存在 false:不存在 + */ + def tableExists(tableName: String): Boolean = { + SparkSingletonFactory.getSparkSession.catalog.tableExists(tableName) + } + + /** + * 用于判断当前SparkSession下临时表或Hive表是否存在 + * + * @param tableName + * 表名 + * @return + * true:存在 false:不存在 + */ + def tableExists(dbName: String, tableName: String): Boolean = { + SparkSingletonFactory.getSparkSession.catalog.tableExists(dbName, tableName) + } + + /** + * 执行一段Hive QL语句,注册为临时表,持久化到hive中 + * + * @param sqlStr + * sql语句 + * @param tmpTableName + * 临时表名 + * @param saveMode + * 持久化的模式,默认为Overwrite + * @param cache + * 默认缓存表 + * @return + * 生成的DataFrame + */ + def sqlForPersistent(sqlStr: String, tmpTableName: String, partitionName: String, saveMode: SaveMode = SaveMode.valueOf(FireSparkConf.saveMode), cache: Boolean = true): DataFrame = { + SparkSingletonFactory.getSparkSession.sqlContext.sqlForPersistent(sqlStr, tmpTableName, partitionName, saveMode, cache) + } + + /** + * 执行一段Hive QL语句,注册为临时表,并cache + * + * @param sqlStr + * SQL语句 + * @param tmpTableName + * 临时表名 + * @return + * 生成的DataFrame + */ + def sqlForCache(sqlStr: String, tmpTableName: String): DataFrame = { + SparkSingletonFactory.getSparkSession.sqlContext.sqlForCache(sqlStr, tmpTableName) + } + + /** + * 执行一段Hive QL语句,注册为临时表 + * + * @param sqlStr + * SQL语句 + * @param tmpTableName + * 临时表名 + * @return + * 生成的DataFrame + */ + def sqlNoCache(sqlStr: String, tmpTableName: String): DataFrame = { + SparkSingletonFactory.getSparkSession.sqlContext.sqlNoCache(sqlStr, tmpTableName) + } + + /** + * 批量缓存多张表 + * + * @param tables + * 多个表名 + */ + def cacheTables(tables: String*): Unit = { + 
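+    // A hedged usage sketch (table names are illustrative): cache a couple of frequently
+    // reused tables up front, then release them with uncache/unpersist once the queries
+    // that depend on them have finished:
+    //   this.cacheTables("dim_city", "ods_order")
+    //   ... run the SQL that reuses both tables ...
+    //   this.uncache("dim_city", "ods_order")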
SparkSingletonFactory.getSparkSession.sqlContext.cacheTables(tables: _*) + } + + /** + * 判断表是否被缓存 + * + * @param tableName + * 表名 + * @return + */ + def isCached(tableName: String): Boolean = { + SparkSingletonFactory.getSparkSession.sqlContext.isCached(tableName) + } + + /** + * 判断表是否未被缓存 + * + * @param tableName + * 表名 + * @return + */ + def isNotCached(tableName: String): Boolean = !this.isCached(tableName) + + /** + * refresh给定的表 + * + * @param tables + * 表名 + */ + def refreshTables(tables: String*): Unit = { + if (tables != null) { + tables.filter(noEmpty(_)).foreach(table => SparkSingletonFactory.getSparkSession.catalog.refreshTable(table)) + } + } + + /** + * 缓存或刷新给定的表 + * 1. 当表未被cache时会首先进行cache + * 2. 当表已被cache,再次调用会进行refresh操作 + * + * @param tables + * 待cache或refresh的表名集合 + */ + def cacheOrRefreshTables(tables: String*): Unit = { + if (tables != null) { + tables.filter(noEmpty(_)).foreach(table => { + if (this.isNotCached(table)) this.cacheTables(table) else this.refreshTables(table) + }) + } + } + + /** + * 删除指定的hive表 + * + * @param tableNames + * 多个表名 + */ + def dropHiveTable(tableNames: String*): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.dropHiveTable(tableNames: _*) + } + + /** + * 为指定表添加分区 + * + * @param tableName + * 表名 + * @param partitions + * 分区 + */ + def addPartitions(tableName: String, partitions: String*): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.addPartitions(tableName, partitions: _*) + } + + /** + * 为指定表添加分区 + * + * @param tableName + * 表名 + * @param partition + * 分区 + * @param partitionName + * 分区字段名称,默认ds + */ + def addPartition(tableName: String, partition: String, partitionName: String = FireHiveConf.partitionName): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.addPartition(tableName, partition, partitionName) + } + + /** + * 为指定表删除分区 + * + * @param tableName + * 表名 + * @param partition + * 分区 + */ + def dropPartition(tableName: String, partition: String, partitionName: String = FireHiveConf.partitionName): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.dropPartition(tableName, partition, partitionName) + } + + /** + * 为指定表删除多个分区 + * + * @param tableName + * 表名 + * @param partitions + * 分区 + */ + def dropPartitions(tableName: String, partitions: String*): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.dropPartitions(tableName, partitions: _*) + } + + /** + * 根据给定的表创建新表 + * + * @param srcTableName + * 源表 + * @param destTableName + * 目标表 + */ + def createTableAsSelect(srcTableName: String, destTableName: String): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.createTableAsSelect(srcTableName, destTableName) + } + + /** + * 根据一张表创建另一张表 + * + * @param tableName + * 表名 + * @param destTableName + * 目标表名 + */ + def createTableLike(tableName: String, destTableName: String): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.createTableLike(tableName, destTableName) + } + + /** + * 根据给定的表创建新表 + * + * @param srcTableName + * 来源表 + * @param destTableName + * 目标表 + * @param cols + * 多个列,逗号分隔 + */ + def createTableAsSelectFields(srcTableName: String, destTableName: String, cols: String): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.createTableAsSelectFields(srcTableName, destTableName, cols) + } + + /** + * 将数据插入到指定表的分区中 + * + * @param srcTableName + * 来源表 + * @param destTableName + * 目标表 + * @param ds + * 分区名 + * @param cols + * 多个列,逗号分隔 + */ + def insertIntoPartition(srcTableName: String, destTableName: String, ds: String, cols: String, partitionName: String 
= FireHiveConf.partitionName): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.insertIntoPartition(srcTableName, destTableName, ds, cols, partitionName) + } + + /** + * 将sql执行结果插入到目标表指定分区中 + * + * @param destTableName + * 目标表名 + * @param ds + * 分区名 + * @param querySQL + * 查询语句 + */ + def insertIntoPartitionAsSelect(destTableName: String, ds: String, querySQL: String, partitionName: String = FireHiveConf.partitionName, overwrite: Boolean = false): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.insertIntoPartitionAsSelect(destTableName, ds, querySQL, partitionName, overwrite) + } + + /** + * 将sql执行结果插入到目标表指定分区中 + * + * @param destTableName + * 目标表名 + * @param querySQL + * 查询sql语句 + */ + def insertIntoDymPartitionAsSelect(destTableName: String, querySQL: String, partitionName: String = FireHiveConf.partitionName): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.insertIntoDymPartitionAsSelect(destTableName, querySQL, partitionName) + } + + /** + * 修改表名 + * + * @param oldTableName + * 表名称 + * @param newTableName + * 新的表名 + */ + def rename(oldTableName: String, newTableName: String): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.rename(oldTableName, newTableName) + } + + /** + * 将表从一个db移动到另一个db中 + * + * @param tableName + * 表名 + * @param oldDB + * 老库名称 + * @param newDB + * 新库名称 + */ + def moveDB(tableName: String, oldDB: String, newDB: String): Unit = { + SparkSingletonFactory.getSparkSession.sqlContext.moveDB(tableName, oldDB, newDB) + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/rest/SparkSystemRestful.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/rest/SparkSystemRestful.scala new file mode 100644 index 0000000..d683917 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/rest/SparkSystemRestful.scala @@ -0,0 +1,515 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.spark.rest + +import com.google.common.collect.Table +import com.zto.fire.common.anno.Rest +import com.zto.fire.common.bean.rest.ResultMsg +import com.zto.fire.common.bean.rest.spark.{ColumnMeta, FunctionMeta, SparkInfo, TableMeta} +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.enu.{ErrorCode, RequestMethod} +import com.zto.fire.common.util.{ExceptionBus, _} +import com.zto.fire.core.rest.{RestCase, SystemRestful} +import com.zto.fire.spark.BaseSpark +import org.apache.commons.lang3.StringUtils +import spark._ +import com.zto.fire._ + +import java.util + + +/** + * 系统预定义的restful服务,为Spark计算引擎提供接口服务 + * + * @author ChengLong 2019-3-16 10:16:38 + */ +private[fire] class SparkSystemRestful(val baseSpark: BaseSpark) extends SystemRestful(baseSpark) { + private var sparkInfoBean: SparkInfo = _ + + /** + * 注册Spark引擎接口 + */ + override def register: Unit = { + this.baseSpark.restfulRegister + .addRest(RestCase(RequestMethod.DELETE.toString, s"/system/kill", kill)) + .addRest(RestCase(RequestMethod.DELETE.toString, s"/system/cancelJob", cancelJob)) + .addRest(RestCase(RequestMethod.DELETE.toString, s"/system/cancelStage", cancelStage)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/sql", sql)) + .addRest(RestCase(RequestMethod.GET.toString, s"/system/sparkInfo", sparkInfo)) + .addRest(RestCase(RequestMethod.GET.toString, s"/system/counter", counter)) + .addRest(RestCase(RequestMethod.GET.toString, s"/system/multiCounter", multiCounter)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/multiTimer", multiTimer)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/log", log)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/env", env)) + .addRest(RestCase(RequestMethod.GET.toString, s"/system/listDatabases", listDatabases)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/listTables", listTables)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/listColumns", listColumns)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/listFunctions", listFunctions)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/setConf", setConf)) + .addRest(RestCase(RequestMethod.GET.toString, s"/system/datasource", datasource)) + .addRest(RestCase(RequestMethod.POST.toString, s"/system/collectDatasource", collectDatasource)) + } + + /** + * 用于更新配置信息 + */ + @Rest("/system/setConf") + def setConf(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + this.logger.info(s"请求fire更新配置信息:$json") + val confMap = JSONUtils.parseObject[java.util.HashMap[String, String]](json) + if (ValueUtils.noEmpty(confMap)) { + PropUtils.setProperties(confMap) + this.baseSpark._conf.setAll(PropUtils.settings) + this.baseSpark.acc.broadcastNewConf(this.baseSpark.sc, this.baseSpark._conf) + } + msg.buildSuccess("配置信息已更新", ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[setConf] 设置配置信息失败:json=$json", e) + msg.buildError("设置配置信息失败", ErrorCode.ERROR) + } + } + } + + /** + * 根据函数信息 + */ + @Rest("/system/listFunctions") + def listFunctions(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数合法性检查 + val dbName = JSONUtils.getValue(json, "dbName", "") + + // 获取已注册的函数 + val funList = new util.LinkedList[FunctionMeta]() + if (StringUtils.isNotBlank(dbName)) { + this.baseSpark.catalog.listFunctions(dbName).collect().foreach(fun => { + funList.add(new 
FunctionMeta(fun.description, fun.database, fun.name, fun.className, fun.isTemporary)) + }) + } else { + this.baseSpark.catalog.listFunctions().collect().foreach(fun => { + funList.add(new FunctionMeta(fun.description, fun.database, fun.name, fun.className, fun.isTemporary)) + }) + } + this.logger.info(s"[listFunctions] 获取[$dbName]函数信息成功:json=$json") + msg.buildSuccess(funList, s"获取[$dbName]函数信息成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取函数信息失败:json=$json", e) + msg.buildError("获取函数信息失败", ErrorCode.ERROR) + } + } + } + + /** + * 根据表名获取字段信息 + */ + @Rest("/system/listColumns") + def listColumns(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数合法性检查 + val dbName = JSONUtils.getValue(json, "dbName", "memory") + val tableName = JSONUtils.getValue(json, "tableName", "") + if (StringUtils.isBlank(dbName) || StringUtils.isBlank(tableName)) { + return msg.buildError("获取表元字段信息失败,库名和表名不能为空", ErrorCode.PARAM_ILLEGAL) + } + + // 区分内存临时表和物理表 + val columns = if ("memory".equals(dbName)) { + this.baseSpark.catalog.listColumns(tableName) + } else { + this.baseSpark.catalog.listColumns(dbName, tableName) + } + + // 将字段元数据信息封装 + val columnList = new util.LinkedList[ColumnMeta] + columns.collect().foreach(column => { + val meta = new ColumnMeta.Builder().setColumnName(column.name) + .setBucket(column.isBucket) + .setDatabase(dbName) + .setDataType(column.dataType) + .setTableName(tableName) + .setDescription(column.description) + .setNullable(column.nullable) + .setPartition(column.isPartition).build() + columnList.add(meta) + }) + + this.logger.info(s"[listColumns] 获取[$dbName.$tableName]字段信息成功:json=$json") + msg.buildSuccess(columnList, s"获取[$dbName.$tableName]字段信息成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取表字段信息失败:json=$json", e) + msg.buildError("获取表字段信息失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取指定数据库下所有的表信息 + */ + @Rest("/system/listTables") + def listTables(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数合法性检查 + val dbName = JSONUtils.getValue(json, "dbName", "memory") + if (StringUtils.isBlank(dbName)) { + return msg.buildError("获取表元数据信息失败,库名不能为空", ErrorCode.PARAM_ILLEGAL) + } + + val tableList = new util.LinkedList[TableMeta] + if ("memory".equals(dbName)) { + // 内存临时表元数据信息 + this.baseSpark.catalog.listTables().collect().foreach(table => { + if (StringUtils.isBlank(table.database)) { + tableList.add(new TableMeta(table.description, "memory", table.name, table.tableType, table.isTemporary)) + } + }) + } else { + // 获取hive表元数据信息 + this.baseSpark.catalog.listTables(dbName).collect().foreach(table => { + if (StringUtils.isNotBlank(table.database)) { + tableList.add(new TableMeta(table.description, table.database, table.name, table.tableType, table.isTemporary)) + } + }) + } + this.logger.info(s"[listTables] 获取[$dbName]表元数据信息成功:json=$json") + msg.buildSuccess(tableList, s"获取[$dbName]表元数据信息成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取表元数据信息失败:json=$json", e) + msg.buildError("获取表元数据信息失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取数据库列表 + */ + @Rest("/system/listDatabases") + def listDatabases(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + try { + // 获取所有的数据库名称 + val dbList = new util.LinkedList[String]() + this.baseSpark.catalog.listDatabases().collect().foreach(db => dbList.add(db.name)) + // 由于spark临时表没有库名,此处约定memory统一作为临时表所在的库 + dbList.add("memory") + + 
this.logger.info(s"[listDatabases] 获取数据库列表成功") + msg.buildSuccess(dbList, "获取数据库列表成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取数据库列表失败", e) + msg.buildError("获取数据库列表失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取counter累加器中的值 + */ + @Rest("/system/counter") + def counter(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + val counter = this.baseSpark.acc.getCounter + this.logger.info(s"[counter] 获取单值累加器成功:counter=$counter") + msg.buildSuccess(counter, "获取单值累加器成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取单值累加器失败:json=$json", e) + msg.buildError("获取多值累加器失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取多值累加器中的值 + */ + @Rest("/system/multiCounter") + def multiCounter(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + this.logger.info(s"[multiCounter] 获取多值累加器成功") + msg.buildSuccess(this.baseSpark.acc.getMultiCounter, "获取多值累加器成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取多值累加器失败:json=$json", e) + msg.buildError("获取多值累加器失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取timer累加器中的值 + */ + @Rest("/system/multiTimer") + def multiTimer(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + val cells = new util.HashSet[Table.Cell[String, String, Long]]() + cells.addAll(this.baseSpark.acc.getMultiTimer.cellSet()) + val clear = JSONUtils.getValue(json, "clear", false) + + if (clear) this.baseSpark.acc.multiTimer.reset + this.logger.info(s"[multiTimer] 获取timer累加器成功") + + msg.buildSuccess(cells, "获取timer累加器成功") + } catch { + case e: Exception => { + this.logger.error(s"[log] 获取timer累加器失败:json=$json", e) + msg.buildError("获取timer累加器失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取运行时日志 + */ + @Rest("/system/log") + def log(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + val logs = new StringBuilder("[") + this.baseSpark.acc.getLog.iterator().foreach(log => { + logs.append(log + ",") + }) + + // 参数校验与参数获取 + val clear = JSONUtils.getValue(json, "clear", false) + if (clear) this.baseSpark.acc.logAccumulator.reset + + if (logs.length > 0 && logs.endsWith(",")) { + this.logger.info(s"[log] 日志获取成功:json=$json") + msg.buildSuccess(logs.substring(0, logs.length - 1) + "]", "日志获取成功") + } else { + this.logger.info(s"[log] 日志记录数为空:json=$json") + msg.buildError("日志记录数为空", ErrorCode.NOT_FOUND) + } + } catch { + case e: Exception => { + this.logger.error(s"[log] 日志获取失败:json=$json", e) + msg.buildError("日志获取失败", ErrorCode.ERROR) + } + } + } + + /** + * 获取运行时状态信息,包括GC、jvm、thread、memory、cpu等 + */ + @Rest("/system/env") + def env(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + val envInfo = new StringBuilder("[") + this.baseSpark.acc.getEnv.iterator().foreach(env => { + envInfo.append(env + ",") + }) + + // 参数校验与参数获取 + val clear = JSONUtils.getValue(json, "clear", false) + if (clear) this.baseSpark.acc.logAccumulator.reset + + if (envInfo.length > 0 && envInfo.endsWith(",")) { + this.logger.info(s"[env] 运行时信息获取成功:json=$json") + msg.buildSuccess(envInfo.substring(0, envInfo.length - 1) + "]", "运行时信息获取成功") + } else { + this.logger.info(s"[env] 运行时信息记录数为空:json=$json") + msg.buildError("运行时信息记录数为空", ErrorCode.NOT_FOUND) + } + } catch { + case e: Exception => { + this.logger.error(s"[env] 运行时信息获取失败:json=$json", e) + msg.buildError("运行时信息获取失败", 
ErrorCode.ERROR) + } + } + } + + /** + * kill 当前 Spark 任务 + */ + @Rest("/system/kill") + def kill(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数校验与参数获取 + val stopGracefully = JSONUtils.getValue(json, "stopGracefully", true) + this.baseSpark.after(this.baseSpark.args) + this.baseSpark.shutdown(stopGracefully) + ProcessUtil.executeCmds(s"yarn application -kill ${this.baseSpark.applicationId}", s"kill -9 ${OSUtils.getPid}") + this.logger.info(s"[kill] kill任务成功:json=$json") + System.exit(0) + msg.buildSuccess("任务停止成功", ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[kill] 执行kill任务失败:json=$json", e) + msg.buildError("执行kill任务失败", ErrorCode.ERROR) + } + } + } + + /** + * 取消job的执行 + */ + @Rest("/system/cancelJob") + def cancelJob(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数校验与参数获取 + val jobId = JSONUtils.getValue(json, "id", -1) + if (jobId <= 0) { + this.logger.warn(s"[cancelJob] 参数不合法:json=$json") + return msg.buildError(s"参数不合法:json=$json", ErrorCode.ERROR) + } + + this.baseSpark.sc.cancelJob(jobId, s"被管控平台kill:${DateFormatUtils.formatCurrentDateTime()}") + this.logger.info(s"[cancelJob] kill job成功:json=$json") + msg.buildSuccess("kill job 成功", ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[cancelJob] kill job失败:json=$json", e) + msg.buildError("kill job失败", ErrorCode.ERROR) + } + } + } + + /** + * 取消stage的执行 + */ + @Rest("/system/cancelStage") + def cancelStage(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数校验与参数获取 + val stageId = JSONUtils.getValue(json, "id", -1) + if (stageId <= 0) { + this.logger.warn(s"[cancelStage] 参数不合法:json=$json") + return msg.buildError(s"参数不合法:json=$json", ErrorCode.ERROR) + } + + this.baseSpark.sc.cancelStage(stageId, s"被管控平台kill:${DateFormatUtils.formatCurrentDateTime()}") + this.logger.info(s"[cancelStage] kill stage[${stageId}] 成功:json=$json") + msg.buildSuccess("kill stage 成功", ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[cancelStage] kill stage失败:json=$json", e) + msg.buildError("kill stage失败", ErrorCode.ERROR) + } + } + } + + + /** + * 用于执行sql语句 + */ + @Rest(value = "/system/sql", method = "post") + def sql(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try { + // 参数校验与参数获取 + val sql = JSONUtils.getValue(json, "sql", "") + + // sql合法性检查 + if (StringUtils.isBlank(sql) || !sql.toLowerCase.trim.startsWith("select ")) { + this.logger.warn(s"[sql] sql不合法,在线调试功能只支持查询操作:json=$json") + return msg.buildError(s"sql不合法,在线调试功能只支持查询操作", ErrorCode.ERROR) + } + + if (this.baseSpark == null || this.baseSpark._spark == null) { + this.logger.warn(s"[sql] 系统正在初始化,请稍后再试:json=$json") + return "系统正在初始化,请稍后再试" + } + + val sqlResult = this.baseSpark._spark.sql(sql.replace("memory.", "")).limit(1000).showString() + this.logger.info(s"成功执行以下查询:${sql}\n执行结果如下:\n" + sqlResult) + msg.buildSuccess(sqlResult, ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[sql] 执行用户sql失败:json=$json", e) + msg.buildError("执行用户sql失败,异常堆栈:" + ExceptionBus.stackTrace(e), ErrorCode.ERROR) + } + } + } + + /** + * 获取当前的spark运行时信息 + */ + @Rest("/system/sparkInfo") + def sparkInfo(request: Request, response: Response): AnyRef = { + val msg = new ResultMsg + val json = request.body + try 
{ + if (this.sparkInfoBean == null) { + this.sparkInfoBean = new SparkInfo + this.sparkInfoBean.setAppName(this.baseSpark.appName) + this.sparkInfoBean.setClassName(this.baseSpark.className) + this.sparkInfoBean.setFireVersion(FireFrameworkConf.fireVersion) + this.sparkInfoBean.setConf(this.baseSpark._spark.conf.getAll) + this.sparkInfoBean.setVersion(this.baseSpark.sc.version) + this.sparkInfoBean.setMaster(this.baseSpark.sc.master) + this.sparkInfoBean.setApplicationId(this.baseSpark.sc.applicationId) + this.sparkInfoBean.setApplicationAttemptId(this.baseSpark.sc.applicationAttemptId.getOrElse("")) + this.sparkInfoBean.setUi(this.baseSpark.webUI) + this.sparkInfoBean.setPid(OSUtils.getPid) + this.sparkInfoBean.setStartTime(DateFormatUtils.formatUnixDateTime(this.baseSpark.startTime * 1000)) + this.sparkInfoBean.setExecutorMemory(this.baseSpark.sc.getConf.get("spark.executor.memory", "1")) + this.sparkInfoBean.setExecutorInstances(this.baseSpark.sc.getConf.get("spark.executor.instances", "1")) + this.sparkInfoBean.setExecutorCores(this.baseSpark.sc.getConf.get("spark.executor.cores", "1")) + this.sparkInfoBean.setDriverCores(this.baseSpark.sc.getConf.get("spark.driver.cores", "1")) + this.sparkInfoBean.setDriverMemory(this.baseSpark.sc.getConf.get("spark.driver.memory", "1")) + this.sparkInfoBean.setDriverMemoryOverhead(this.baseSpark.sc.getConf.get("spark.yarn.driver.memoryOverhead", "0")) + this.sparkInfoBean.setDriverHost(this.baseSpark.sc.getConf.get("spark.driver.host", "0")) + this.sparkInfoBean.setDriverPort(this.baseSpark.sc.getConf.get("spark.driver.port", "0")) + this.sparkInfoBean.setRestPort(this.baseSpark.restfulRegister.restPort.toString) + this.sparkInfoBean.setExecutorMemoryOverhead(this.baseSpark.sc.getConf.get("spark.yarn.executor.memoryOverhead", "0")) + this.sparkInfoBean.setProperties(PropUtils.cover) + this.sparkInfoBean.computeCpuMemory() + } + this.sparkInfoBean.setUptime(DateFormatUtils.runTime(this.baseSpark.startTime)) + this.sparkInfoBean.setBatchDuration(this.baseSpark.batchDuration + "") + this.sparkInfoBean.setTimestamp(DateFormatUtils.formatCurrentDateTime()) + this.logger.info(s"[sparkInfo] 获取spark信息成功:json=$json") + msg.buildSuccess(this.sparkInfoBean, ErrorCode.SUCCESS.toString) + } catch { + case e: Exception => { + this.logger.error(s"[sparkInfo] 获取spark信息失败:json=$json", e) + msg.buildError("获取spark信息失败", ErrorCode.ERROR) + } + } + } + +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/sink/FireSink.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/sink/FireSink.scala new file mode 100644 index 0000000..a55a3dc --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/sink/FireSink.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
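
Editor's note — the handlers above all follow the same shape: read the JSON body, validate the parameters, and wrap the outcome in a ResultMsg. A hypothetical additional endpoint written in that style (not part of this patch; it would also need a matching addRest(RestCase(...)) entry in register):

    @Rest("/system/echo")
    def echo(request: Request, response: Response): AnyRef = {
      val msg = new ResultMsg
      val json = request.body
      try {
        // read and validate a single parameter from the request body
        val text = JSONUtils.getValue(json, "text", "")
        if (StringUtils.isBlank(text)) {
          return msg.buildError("text must not be empty", ErrorCode.PARAM_ILLEGAL)
        }
        msg.buildSuccess(text, "echo ok")
      } catch {
        case e: Exception =>
          this.logger.error(s"[echo] failed: json=$json", e)
          msg.buildError("echo failed", ErrorCode.ERROR)
      }
    }
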
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.sink + +import com.zto.fire.spark.util.{SparkSingletonFactory, SparkUtils} +import org.apache.spark.internal.Logging +import org.apache.spark.sql.DataFrame +import org.apache.spark.sql.execution.streaming.Sink + +/** + * Fire框架组件sink父类 + * + * @author ChengLong 2019年12月23日 10:09:55 + * @since 0.4.1 + */ +private[fire] abstract class FireSink extends Sink with Logging { + @volatile protected var latestBatchId = -1L + protected lazy val spark = SparkSingletonFactory.getSparkSession + + /** + * 将内部row类型的DataFrame转为Row类型的DataFrame + * + * @param df + * InternalRow类型的DataFrame + * @return + * Row类型的DataFrame + */ + protected def toExternalRow(df: DataFrame): DataFrame = { + SparkUtils.toExternalRow(df) + } +} \ No newline at end of file diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/sink/JdbcStreamSink.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/sink/JdbcStreamSink.scala new file mode 100644 index 0000000..061ac01 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/sink/JdbcStreamSink.scala @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package com.zto.fire.spark.sink
+
+import com.zto.fire._
+import com.zto.fire.jdbc.conf.FireJdbcConf
+import org.apache.commons.lang3.StringUtils
+import org.apache.spark.sql.DataFrame
+
+/**
+ * jdbc sink组件,支持jdbc操作
+ *
+ * @param options
+ * jdbc相关参数
+ * @author ChengLong 2019年12月23日 13:06:30
+ * @since 0.4.1
+ */
+class JdbcStreamSink(options: Map[String, String]) extends FireSink {
+
+  override def addBatch(batchId: Long, data: DataFrame): Unit = {
+    logDebug(s"latestBatchId=$latestBatchId")
+    if (batchId <= latestBatchId) {
+      logInfo(s"Skipping already committed batch $batchId")
+    } else {
+      val sql = options.getOrElse("sql", "")
+      require(StringUtils.isNotBlank(sql), "sql语句不能为空.")
+      val fields = options.getOrElse("fields", "")
+      val batch = options.getOrElse("batch", FireJdbcConf.batchSize() + "").toInt
+      val keyNum = options.getOrElse("keyNum", "1").toInt
+
+      this.toExternalRow(data).jdbcBatchUpdate(sql, if (StringUtils.isNotBlank(fields)) fields.split(",") else null, batch, keyNum)
+      latestBatchId = batchId
+    }
+  }
+}
\ No newline at end of file
diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/task/SparkInternalTask.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/task/SparkInternalTask.scala
new file mode 100644
index 0000000..f8f5e42
--- /dev/null
+++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/task/SparkInternalTask.scala
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
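
Editor's note — how a structured streaming job reaches JdbcStreamSink is not shown in this hunk (it requires a StreamSinkProvider registration). Assuming such a provider exists under some short format name, usage would look roughly like this; "fire-jdbc" is a placeholder name and the SQL/fields are examples, while the option keys (sql, fields, batch, keyNum) come from addBatch above:

    // df is any streaming DataFrame whose columns match the bind parameters
    val query = df.writeStream
      .format("fire-jdbc")                                      // hypothetical provider name
      .option("sql", "insert into t_order(id, name) values(?, ?)")
      .option("fields", "id,name")
      .option("batch", "500")
      .option("keyNum", "1")
      .start()
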
+ */ + +package com.zto.fire.spark.task + +import com.zto.fire._ +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.common.conf.FireFrameworkConf +import com.zto.fire.common.util.{DatasourceManager, JSONUtils} +import com.zto.fire.core.task.FireInternalTask +import com.zto.fire.spark.BaseSpark + +/** + * 定时任务调度器,用于定时执行Spark框架内部指定的任务 + * + * @author ChengLong 2019年11月5日 10:11:31 + */ +private[fire] class SparkInternalTask(baseSpark: BaseSpark) extends FireInternalTask(baseSpark) { + + /** + * 定时采集运行时的jvm、gc、thread、cpu、memory、disk等信息 + * 并将采集到的数据存放到EnvironmentAccumulator中 + */ + @Scheduled(fixedInterval = 60000, scope = "all", initialDelay = 60000L, concurrent = false) + override def jvmMonitor: Unit = super.jvmMonitor + + + /*@Scheduled(fixedInterval = 10000, scope = "driver", initialDelay = 30000L, concurrent = false) + def showException: Unit = { + val queue = this.baseSpark.acc.getLog + queue.foreach(log => println(log)) + }*/ + + /** + * 数据源收集任务,收集driver与executor用到的数据源信息3600000L + */ + @Scheduled(fixedInterval = 10000L, scope = "all", initialDelay = 60000L, concurrent = false, repeatCount = 100) + def datasource: Unit = { + if (FireFrameworkConf.restEnable) { + val datasourceMap = DatasourceManager.get + if (datasourceMap.nonEmpty) { + val json = JSONUtils.toJSONString(datasourceMap) + this.restInvoke("/system/collectDatasource", json) + } + } + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/udf/UDFs.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/udf/UDFs.scala new file mode 100644 index 0000000..e723141 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/udf/UDFs.scala @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.udf + +import java.util.Date + +import com.zto.fire.common.util.{DateFormatUtils, NumberFormatUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.spark.sql.SparkSession + +/** + * 通用的自定义UDF工具函数集合 + * Created by ChengLong on 2017-01-06. 
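
Editor's note — the @Scheduled annotation above is the whole contract for these internal tasks: interval, scope (values such as "all" or "driver" appear in this file), initial delay and concurrency are declared directly on the method. A hypothetical extra task in the same style, not part of this patch; the logger field is an assumption added for the sketch:

    private lazy val taskLogger = org.slf4j.LoggerFactory.getLogger(this.getClass)

    // runs on the driver every 5 minutes after a 2 minute warm-up
    @Scheduled(fixedInterval = 300000L, scope = "driver", initialDelay = 120000L, concurrent = false)
    def heartbeat: Unit = taskLogger.info("fire internal task heartbeat")
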
+ */ +object UDFs extends Serializable { + + /** + * 批量注册系统内置的udf函数 + */ + def registerSysUDF(spark: SparkSession): Unit = { + // ==================== 日期相关 ==================== + spark.udf.register("addTimer", Timer.addTimer _) + spark.udf.register("addYears", Timer.addYears _) + spark.udf.register("addMons", Timer.addMons _) + spark.udf.register("addDays", Timer.addDays _) + spark.udf.register("addHours", Timer.addHours _) + spark.udf.register("addMins", Timer.addMins _) + spark.udf.register("addSecs", Timer.addSecs _) + spark.udf.register("dateSchemaFormat", Timer.dateSchemaFormat _) + spark.udf.register("dateStrSchemaFormat", Timer.dateStrSchemaFormat _) + spark.udf.register("isSameDay", Timer.isSameDay _) + spark.udf.register("isBig", Timer.isBig _) + spark.udf.register("isSmall", Timer.isSmall _) + spark.udf.register("isBetween", Timer.isBetween _) + spark.udf.register("date", Timer.date _) + spark.udf.register("interval", Timer.interval _) + spark.udf.register("runTime", Timer.runTime _) + spark.udf.register("truncateMinute", Timer.truncateMinute _) + spark.udf.register("truncateHour", Timer.truncateHour _) + + // ==================== 字符串相关 ==================== + spark.udf.register("isNull", Str.isNull _) + spark.udf.register("isNotNull", Str.isNotNull _) + spark.udf.register("len", Str.len _) + spark.udf.register("reverse", Str.reverse _) + spark.udf.register("contains", Str.contains _) + + // ==================== 数字相关 ==================== + spark.udf.register("floor", Num.floor _) + spark.udf.register("long2Int", Num.long2Int _) + spark.udf.register("bigDecimal2Long", Num.bigDecimal2Long _) + spark.udf.register("ifnull", Num.ifnull _) + spark.udf.register("truncate", Num.truncate _) + spark.udf.register("truncate_decimal", Num.truncateDecimal _) + } + + /** + * 时间相关的udf函数 + * 时间戳格式为:yyyy-MM-dd hh:mm:ss + */ + object Timer { + + /** + * 指定时间字段,对日期进行加减 + * + * @param field + * 'year'、'month'、'day'、'hour'、'minute'、'second' + * @param dateTimeStr + * 格式:yyyy-MM-dd hh:mm:ss + * @param count + * 正负数 + * @return + * 计算后的日期 + */ + def addTimer(field: String, dateTimeStr: String, count: Int): String = { + DateFormatUtils.addTimer(field, dateTimeStr, count) + } + + /** + * 对指定的时间字段进行年度加减 + */ + def addYears(dateTimeStr: String, years: Int): String = { + DateFormatUtils.addYears(dateTimeStr, years) + } + + /** + * 对指定的时间字段进行月份加减 + */ + def addMons(dateTimeStr: String, mons: Int): String = { + DateFormatUtils.addMons(dateTimeStr, mons) + } + + /** + * 对指定的时间字段进行天加减 + */ + def addDays(dateTimeStr: String, days: Int): String = { + DateFormatUtils.addDays(dateTimeStr, days) + } + + /** + * 对指定的时间字段进行天加减 + */ + def addWeeks(dateTimeStr: String, weeks: Int): String = { + DateFormatUtils.addWeeks(dateTimeStr, weeks) + } + + /** + * 对指定的时间字段进行小时加减 + */ + def addHours(dateTimeStr: String, hours: Int): String = { + DateFormatUtils.addHours(dateTimeStr, hours) + } + + /** + * 对指定的时间字段进行分钟加减 + */ + def addMins(dateTimeStr: String, minutes: Int): String = { + DateFormatUtils.addMins(dateTimeStr, minutes) + } + + /** + * 对指定的时间字段进行秒钟加减 + */ + def addSecs(dateTimeStr: String, seconds: Int): String = { + DateFormatUtils.addSecs(dateTimeStr, seconds) + } + + /** + * 对字段进行格式转换 + */ + def dateStrSchemaFormat(dateTimeStr: String, srcSchema: String, destSchema: String): String = { + if (StringUtils.isBlank(dateTimeStr)) "" else DateFormatUtils.dateSchemaFormat(dateTimeStr, srcSchema, destSchema) + } + + /** + * 获取两个时间间隔的毫秒数 + * + * @param before + * 开始时间(小) + * @param after + * 结束时间(大) + * @return + */ + def 
interval(before: String, after: String): Long = { + DateFormatUtils.interval(before, after) + } + + /** + * 计算运行时长 + * + * @param time + * 形如:3日11时21分15秒 + */ + def runTime(time: Long): String = { + DateFormatUtils.runTime(time) + } + + /** + * 判断两个字段是否为同一天 + */ + def isSameDay(day1: String, day2: String): Boolean = { + DateFormatUtils.isSameDay(day1, day2) + } + + /** + * day1是否大于day2 + */ + def isBig(day1: String, day2: String): Boolean = { + DateFormatUtils.isBig(day1, day2) + } + + /** + * day1是否小于day2 + */ + def isSmall(day1: String, day2: String): Boolean = { + DateFormatUtils.isSmall(day1, day2) + } + + /** + * 指定字段日期是否介于day1与day2之间 + */ + def isBetween(day: String, day1: String, day2: String) = { + DateFormatUtils.isBetween(day, day1, day2) + } + + /** + * 截取到年月日 + */ + def date(dateTime: String): String = { + if (StringUtils.isNotBlank(dateTime) && dateTime.length > 10) dateTime.substring(0, 10) else dateTime + } + + /** + * 对字段进行格式转换 + */ + def dateSchemaFormat(dateTime: Date, srcSchema: String, destSchema: String): String = { + this.dateStrSchemaFormat(DateFormatUtils.formatDateTime(dateTime), srcSchema, destSchema) + } + + /** + * 将yyyy-MM-dd hh:mm:ss类型日期truncate为分钟 + */ + def truncateMinute(dateTime: String): String = { + DateFormatUtils.truncateMinute(dateTime) + } + + /** + * 获取整点小时 + */ + def truncateHour(dateStr: String): String = { + DateFormatUtils.truncateHour(dateStr) + } + } + + /** + * 对字段进行字符串相关操作 + */ + object Str { + + /** + * 如果字段为空,则返回true,否则返回false + */ + def isNull(field: String): Boolean = { + if (StringUtils.isBlank(field) || field.trim.length() == 0 || "null".equalsIgnoreCase(field.trim) || """\N""".equalsIgnoreCase(field.trim)) { + true + } else { + false + } + } + + /** + * 如果字段为空,则返回false,否则返回true + */ + def isNotNull(field: String): Boolean = { + !isNull(field) + } + + /** + * 计算长度 + */ + def len(field: String): Int = { + if (this.isNull(field)) 0 else field.length + } + + /** + * 字符串反转 + */ + def reverse(str: String): String = { + StringUtils.reverse(str) + } + + /** + * 是否包含 + * + * @param field + * 字段名称 + * @param str + * 包含的字符串 + * @return + */ + def contains(field: String, str: String): Boolean = { + if (StringUtils.isBlank(field) || StringUtils.isBlank(str)) { + false + } else { + field.contains(str) + } + } + } + + /** + * 数值相关 + */ + object Num { + + /** + * floor操作 + */ + def floor(field: Double): Int = { + NumberFormatUtils.floor(field) + } + + /** + * 将Long转为Integer + */ + def long2Int(field: java.lang.Long): java.lang.Integer = { + NumberFormatUtils.long2Int(field) + } + + /** + * 将BigDecimal转为Long类型 + */ + def bigDecimal2Long(field: java.math.BigDecimal): java.lang.Long = { + NumberFormatUtils.bigDecimal2Long(field) + } + + /** + * 判断是否为空 + */ + def ifnull(decimal: java.math.BigDecimal, defaultVal: java.math.BigDecimal): java.math.BigDecimal = { + NumberFormatUtils.ifnull(decimal, defaultVal) + } + + /** + * 类似于round,但不会四舍五入 + * + * @param value + * 目标值 + * @param scale + * 精度 + * @return + */ + def truncate(value: Double, scale: Int): Double = { + NumberFormatUtils.truncate(value, scale) + } + + /** + * 截取精度 + * + * @param scale + * 精度 + * @return + */ + def truncateDecimal(bigDecimal: java.math.BigDecimal, scale: Int): java.math.BigDecimal = { + NumberFormatUtils.truncateDecimal(bigDecimal, scale) + } + } + +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/KuduUtils.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/KuduUtils.scala new file mode 100644 index 0000000..d8a2e2e --- /dev/null 
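
Editor's note — a short sketch of how the UDF registry above is used: register once on the session, then call the functions from SQL. The session settings and the SQL literal are examples only; addDays and isNull are among the functions registered in registerSysUDF.

    import org.apache.spark.sql.SparkSession
    import com.zto.fire.spark.udf.UDFs

    val spark = SparkSession.builder().appName("udf-demo").master("local[*]").getOrCreate()
    UDFs.registerSysUDF(spark)
    spark.sql("select addDays('2021-01-01 00:00:00', 3) as d3, isNull('') as empty").show()
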
+++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/KuduUtils.scala @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.util + +import java.lang.reflect.Field + +import com.zto.fire._ +import com.zto.fire.common.anno.FieldName +import com.zto.fire.common.util.ReflectionUtils +import com.zto.fire.spark.ext.module.KuduContextExt +import org.apache.commons.lang3.StringUtils +import org.apache.spark.sql.Row +import org.apache.spark.sql.hive.HiveContext +import org.apache.spark.sql.types._ + +import scala.collection.mutable.ListBuffer +import scala.reflect.{ClassTag, classTag} + +/** + * kudu工具类 + * + * @author ChengLong 2019-6-23 13:32:15 + */ +object KuduUtils { + + /** + * 将kudu的JavaBean转为Row + * 实体Class类型 + * + * @return + * Spark SQL Row对象 + */ + def kuduBean2Row[T: ClassTag](bean: T): Row = { + val beanClazz = classTag[T].runtimeClass + val values = ListBuffer[AnyRef]() + beanClazz.getDeclaredFields.foreach(field => { + ReflectionUtils.setAccessible(field) + val anno = field.getAnnotation(classOf[FieldName]) + if (anno != null && anno.id()) { + values += field.get(bean) + } + }) + Row(values: _*) + } + + /** + * 将kudu的JavaBean转为Row + * + * @param beanClazz + * 实体Class类型 + * @return + * Spark SQL Row对象 + */ + def bean2Row(beanClazz: Class[_]): Row = { + val fieldList = ListBuffer[Field]() + beanClazz.getDeclaredFields.foreach(field => { + ReflectionUtils.setAccessible(field) + val anno = field.getAnnotation(classOf[FieldName]) + val begin = if (anno == null) true else !anno.disuse() + if (begin) { + fieldList += field + } + }) + Row(fieldList) + } + + /** + * 将Row转为自定义bean,以Row中的Field为基准 + * bean中的field名称要与DataFrame中的field名称保持一致 + */ + def kuduRowToBean[T](row: Row, clazz: Class[T]): T = { + val obj = clazz.newInstance() + if (row != null && clazz != null) { + try { + row.schema.fieldNames.foreach(fieldName => { + clazz.getDeclaredFields.foreach(field => { + ReflectionUtils.setAccessible(field) + if (field.getName.equalsIgnoreCase(fieldName)) { + val index = row.fieldIndex(fieldName) + val fieldType = field.getType + if (fieldType eq classOf[String]) field.set(obj, row.getString(index)) + else if (fieldType eq classOf[java.lang.Integer]) field.set(obj, row.getAs[IntegerType](index)) + else if (fieldType eq classOf[java.lang.Double]) field.set(obj, row.getAs[DoubleType](index)) + else if (fieldType eq classOf[java.lang.Long]) field.set(obj, row.getAs[LongType](index)) + else if (fieldType eq classOf[java.math.BigDecimal]) field.set(obj, row.getAs[DecimalType](index)) + else if (fieldType eq classOf[java.lang.Float]) field.set(obj, row.getAs[FloatType](index)) + else if (fieldType eq classOf[java.lang.Boolean]) field.set(obj, row.getAs[BooleanType](index)) + else if (fieldType eq 
classOf[java.lang.Short]) field.set(obj, row.getAs[ShortType](index)) + else if (fieldType eq classOf[java.util.Date]) field.set(obj, row.getAs[DateType](index)) + } + }) + }) + } catch { + case e: Exception => e.printStackTrace() + } + } + obj + } + + /** + * 根据实体bean构建kudu表schema(只构建主键字段) + * + * @return StructField集合 + */ + def buildSchemaFromKuduBean(beanClazz: Class[_]): List[StructField] = { + val fieldMap = ReflectionUtils.getAllFields(beanClazz) + val strutFields = new ListBuffer[StructField]() + for (map <- fieldMap.entrySet) { + val field: Field = map.getValue + val fieldType: Class[_] = field.getType + val anno: FieldName = field.getAnnotation(classOf[FieldName]) + var fieldName: String = map.getKey + var nullable: Boolean = true + val begin = if (anno == null) { + false + } else { + if (StringUtils.isNotBlank(anno.value)) { + fieldName = anno.value + } + nullable = anno.nullable() + !anno.disuse + } + if (begin && anno.id) { + if (fieldType eq classOf[String]) strutFields += DataTypes.createStructField(fieldName, DataTypes.StringType, nullable) + else if (fieldType eq classOf[java.lang.Integer]) strutFields += DataTypes.createStructField(fieldName, DataTypes.IntegerType, nullable) + else if (fieldType eq classOf[java.lang.Double]) strutFields += DataTypes.createStructField(fieldName, DataTypes.DoubleType, nullable) + else if (fieldType eq classOf[java.lang.Long]) strutFields += DataTypes.createStructField(fieldName, DataTypes.LongType, nullable) + else if (fieldType eq classOf[java.math.BigDecimal]) strutFields += DataTypes.createStructField(fieldName, DataTypes.DoubleType, nullable) + else if (fieldType eq classOf[java.lang.Float]) strutFields += DataTypes.createStructField(fieldName, DataTypes.FloatType, nullable) + else if (fieldType eq classOf[java.lang.Boolean]) strutFields += DataTypes.createStructField(fieldName, DataTypes.BooleanType, nullable) + else if (fieldType eq classOf[java.lang.Short]) strutFields += DataTypes.createStructField(fieldName, DataTypes.ShortType, nullable) + else if (fieldType eq classOf[java.util.Date]) strutFields += DataTypes.createStructField(fieldName, DataTypes.DateType, nullable) + } + } + strutFields.toList + } + + /** + * 将表名包装为以impala::开头的表 + * + * @param tableName + * 库名.表名 + * @return + * 包装后的表名 + */ + def packageKuduTableName(tableName: String): String = { + if (StringUtils.isBlank(tableName)) throw new IllegalArgumentException("表名不能为空") + if (tableName.startsWith("impala::")) { + tableName + } else { + s"impala::$tableName" + } + } + + /** + * 以Map的方式获取Hive表的字段名称和类型 + * + * @param tableName + * db.hiveTable + * @return + * Map[FieldName, FieldType] + */ + def getTableSchemaAsMap(hiveContext: HiveContext, kuduContext: KuduContextExt, tableName: String): Map[String, String] = { + val dataFrame = if (tableName.startsWith("impala")) { + kuduContext.loadKuduTable(tableName) + } else { + hiveContext.table(tableName) + } + + dataFrame.schema.map(s => { + (s.name, s.dataType.simpleString) + }).toMap + } +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/RocketMQUtils.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/RocketMQUtils.scala new file mode 100644 index 0000000..afdaa45 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/RocketMQUtils.scala @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
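
Editor's note — the impala:: wrapping above is easy to show in isolation; the table name below is a placeholder:

    KuduUtils.packageKuduTableName("kudu_db.t_order")           // => "impala::kudu_db.t_order"
    KuduUtils.packageKuduTableName("impala::kudu_db.t_order")   // already wrapped, returned unchanged
    // a blank name throws IllegalArgumentException, as implemented above
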
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.util + +import com.zto.fire.common.conf.FireRocketMQConf +import com.zto.fire.common.util.{LogUtils, StringsUtils} +import org.apache.commons.lang3.StringUtils +import org.apache.rocketmq.spark.{ConsumerStrategy, RocketMQConfig} +import org.slf4j.LoggerFactory + +import com.zto.fire._ + +/** + * RocketMQ相关工具类 + * + * @author ChengLong + * @since 1.0.0 + * @create 2020-06-29 10:50 + */ +object RocketMQUtils { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * rocketMQ配置信息 + * + * @param groupId + * 消费组 + * @return + * rocketMQ相关配置 + */ + def rocketParams(rocketParam: JMap[String, String] = null, + groupId: String = null, + rocketNameServer: String = null, + tag: String = null, + keyNum: Int = 1): JMap[String, String] = { + + val optionParams = if (rocketParam != null) rocketParam else new JHashMap[String, String]() + if (StringUtils.isNotBlank(groupId)) optionParams.put(RocketMQConfig.CONSUMER_GROUP, groupId) + + // rocket name server 配置 + val confNameServer = FireRocketMQConf.rocketNameServer(keyNum) + val finalNameServer = if (StringUtils.isNotBlank(confNameServer)) confNameServer else rocketNameServer + if (StringUtils.isNotBlank(finalNameServer)) optionParams.put(RocketMQConfig.NAME_SERVER_ADDR, finalNameServer) + + // tag配置 + val confTag = FireRocketMQConf.rocketConsumerTag(keyNum) + val finalTag = if (StringUtils.isNotBlank(confTag)) confTag else tag + if (StringUtils.isNotBlank(finalTag)) optionParams.put(RocketMQConfig.CONSUMER_TAG, finalTag) + + // 每个分区拉取的消息数 + val maxSpeed = FireRocketMQConf.rocketPullMaxSpeedPerPartition(keyNum) + if (StringUtils.isNotBlank(maxSpeed) && StringsUtils.isInt(maxSpeed)) optionParams.put(RocketMQConfig.MAX_PULL_SPEED_PER_PARTITION, maxSpeed) + + // 以spark.rocket.conf.开头的配置优先级最高 + val confMap = FireRocketMQConf.rocketConfMap(keyNum) + if (confMap.nonEmpty) optionParams.putAll(confMap) + // 日志记录RocketMQ的配置信息 + LogUtils.logMap(this.logger, optionParams.toMap, s"RocketMQ configuration. keyNum=$keyNum.") + + optionParams + } + + /** + * 根据消费位点字符串获取ConsumerStrategy实例 + * @param offset + * latest/earliest + */ + def valueOfStrategy(offset: String): ConsumerStrategy = { + if ("latest".equalsIgnoreCase(offset)) { + ConsumerStrategy.lastest + } else { + ConsumerStrategy.earliest + } + } + +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/SparkSingletonFactory.scala b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/SparkSingletonFactory.scala new file mode 100644 index 0000000..a7c8bba --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/SparkSingletonFactory.scala @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
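
Editor's note — a sketch of building consumer parameters with rocketParams above. The group id, name server and tag are placeholders; as implemented above, values read from the configuration file (and any spark.rocket.conf.* entries) still take precedence over the arguments:

    import com.zto.fire.spark.util.RocketMQUtils

    val params = RocketMQUtils.rocketParams(
      groupId = "fire_demo_group",
      rocketNameServer = "127.0.0.1:9876",
      tag = "*",
      keyNum = 1)
    val strategy = RocketMQUtils.valueOfStrategy("earliest")   // "latest" maps to ConsumerStrategy.lastest
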
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.util + +import com.zto.fire.common.conf.{FireFrameworkConf, FireKuduConf} +import com.zto.fire.common.enu.JobType +import com.zto.fire.core.util.SingletonFactory +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.hbase.conf.FireHBaseConf +import com.zto.fire.spark.connector.HBaseBulkConnector +import com.zto.fire.spark.ext.module.KuduContextExt +import org.apache.commons.lang3.StringUtils +import org.apache.kudu.spark.kudu.KuduContext +import org.apache.spark.sql.SparkSession +import org.apache.spark.streaming.StreamingContext +import org.apache.spark.{SparkContext, SparkEnv} + +/** + * 单例工厂,用于创建单例的对象 + * Created by ChengLong on 2018-04-25. + */ +object SparkSingletonFactory extends SingletonFactory { + private[this] var sparkSession: SparkSession = _ + private[this] var streamingContext: StreamingContext = _ + @transient private[this] var hbaseContext: HBaseBulkConnector = _ + @transient private[this] var kuduContext: KuduContextExt = _ + + /** + * 获取SparkSession实例 + * + * @return + * SparkSession实例 + */ + def getSparkSession: SparkSession = this.synchronized { + this.sparkSession + } + + /** + * SparkSession赋值 + */ + private[fire] def setSparkSession(sparkSession: SparkSession): Unit = this.synchronized { + require(sparkSession != null, "SparkSession实例不能为空") + this.sparkSession = sparkSession + } + + /** + * 设置StreamingContext + * 允许重复赋值,兼容热重启导致的StreamingContext重新被创建 + */ + private[fire] def setStreamingContext(ssc: StreamingContext): Unit = this.synchronized { + require(ssc != null, "StreamingContext实例不能为空") + this.streamingContext = ssc + } + + /** + * 获取StreamingContext实例 + */ + def getStreamingContext: StreamingContext = this.synchronized { + assert(this.streamingContext != null, "StreamingContext还没初始化,请稍后再试") + this.streamingContext + } + + + /** + * 获取单例的HBaseContext对象 + * + * @param sparkContext + * SparkContext实例 + * @return + */ + def getHBaseContextInstance(sparkContext: SparkContext, keyNum: Int = 1): HBaseBulkConnector = this.synchronized { + if (this.hbaseContext == null && StringUtils.isNotBlank(FireHBaseConf.hbaseCluster())) { + this.hbaseContext = new HBaseBulkConnector(sparkContext, HBaseConnector.getConfiguration(keyNum)) + } + this.hbaseContext + } + + /** + * 获取单例的KuduContext对象 + * + * @param sparkContext + * SparkContext实例 + * @return + */ + def getKuduContextInstance(sparkContext: SparkContext): KuduContextExt = this.synchronized { + if (this.kuduContext == null && StringUtils.isNotBlank(FireKuduConf.kuduMaster)) { + val kuduContextTmp = new KuduContext(FireKuduConf.kuduMaster, sparkContext) + this.kuduContext = new KuduContextExt(this.sparkSession.sqlContext, kuduContextTmp) + } + this.kuduContext + } + +} diff --git a/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/SparkUtils.scala 
b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/SparkUtils.scala new file mode 100644 index 0000000..24ef443 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/com/zto/fire/spark/util/SparkUtils.scala @@ -0,0 +1,569 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.spark.util + +import com.zto.fire._ +import com.zto.fire.common.anno.FieldName +import com.zto.fire.common.conf.{FireFrameworkConf, FireHDFSConf, FireHiveConf} +import com.zto.fire.common.util._ +import com.zto.fire.spark.conf.FireSparkConf +import org.apache.commons.lang3.StringUtils +import org.apache.hadoop.conf.Configuration +import org.apache.spark.SparkEnv +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.CatalystTypeConverters +import org.apache.spark.sql.types._ +import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession} +import org.slf4j.LoggerFactory + +import java.lang.reflect.Field +import scala.collection.mutable.{ArrayBuffer, ListBuffer} +import scala.util.Try + + +/** + * Spark 相关的工具类 + * Created by ChengLong on 2016-11-24. 
+ */ +object SparkUtils { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * 将Row转为自定义bean,以JavaBean中的Field为基准 + * bean中的field名称要与DataFrame中的field名称保持一致 + */ + def sparkRowToBean[T](row: Row, clazz: Class[T]): T = { + val obj = clazz.newInstance() + if (row != null && clazz != null) { + try { + clazz.getDeclaredFields.foreach(field => { + ReflectionUtils.setAccessible(field) + val anno = field.getAnnotation(classOf[FieldName]) + // 如果没有加注解,或者加了注解但没有打disuse=true + if (anno == null || (anno != null && !anno.disuse())) { + val fieldName = if (anno != null && StringUtils.isNotBlank(anno.value())) anno.value() else field.getName + if (this.containsColumn(row, fieldName.trim)) { + val index = row.fieldIndex(fieldName.trim) + val fieldType = field.getType + if (fieldType eq classOf[String]) field.set(obj, row.getString(index)) + else if (fieldType eq classOf[java.lang.Integer]) field.set(obj, row.getAs[IntegerType](index)) + else if (fieldType eq classOf[java.lang.Double]) field.set(obj, row.getAs[DoubleType](index)) + else if (fieldType eq classOf[java.lang.Long]) field.set(obj, row.getAs[LongType](index)) + else if (fieldType eq classOf[java.math.BigDecimal]) field.set(obj, row.getAs[DecimalType](index)) + else if (fieldType eq classOf[java.lang.Float]) field.set(obj, row.getAs[FloatType](index)) + else if (fieldType eq classOf[java.lang.Boolean]) field.set(obj, row.getAs[BooleanType](index)) + else if (fieldType eq classOf[java.lang.Short]) field.set(obj, row.getAs[ShortType](index)) + else if (fieldType eq classOf[java.util.Date]) field.set(obj, row.getAs[DateType](index)) + } + } + }) + } catch { + case e: Exception => e.printStackTrace() + } + } + obj + } + + /** + * 将SparkRow迭代映射为对象的迭代 + * + * @param it + * Row迭代器 + * @param clazz + * 待映射的自定义JavaBean + * @tparam T + * 泛型 + * @return + * 映射为对象的集合 + */ + def sparkRowToBean[T](it: Iterator[Row], clazz: Class[T], toUppercase: Boolean = false): Iterator[T] = { + /** + * 用于索引给定的字段名称在Row中的index + * 同时兼容标注了@FieldName的字段可以被正常索引到 + */ + def fieldIndex(row: Row, fieldName: String, annoFieldName: String): Int = { + try { + row.fieldIndex(annoFieldName) + } catch { + case _: Exception => { + try { + row.fieldIndex(fieldName) + } catch { + case e: Exception => { + this.logger.error(s"将Spark Row转JavaBean失败,未能匹配${fieldName}或${annoFieldName}", e) + -1 + } + } + } + } + } + + val list = ListBuffer[T]() + if (it != null && clazz != null) { + val fields = clazz.getDeclaredFields + it.foreach(row => { + val obj = clazz.newInstance() + fields.foreach(field => { + ReflectionUtils.setAccessible(field) + val anno = field.getAnnotation(classOf[FieldName]) + // 如果没有加注解,或者加了注解但没有打disuse=true + if (anno == null || (anno != null && !anno.disuse())) { + var fieldName = if (anno != null && StringUtils.isNotBlank(anno.value())) anno.value() else field.getName + fieldName = if (toUppercase) fieldName.toUpperCase else fieldName + // 兼容标注了@FieldName的字段 + if (this.containsColumn(row, fieldName) || this.containsColumn(row, field.getName)) { + val index = fieldIndex(row, field.getName, fieldName.trim) + if (index >= 0) { + val fieldType = field.getType + if (fieldType eq classOf[String]) field.set(obj, row.getString(index)) + else if (fieldType eq classOf[java.lang.Integer]) field.set(obj, row.getAs[IntegerType](index)) + else if (fieldType eq classOf[java.lang.Long]) field.set(obj, row.getAs[LongType](index)) + else if (fieldType eq classOf[java.math.BigDecimal]) field.set(obj, row.getAs[DecimalType](index)) + else if (fieldType eq 
classOf[java.lang.Boolean]) field.set(obj, row.getAs[BooleanType](index)) + else if (fieldType eq classOf[java.lang.Double]) field.set(obj, row.getAs[DoubleType](index)) + else if (fieldType eq classOf[java.lang.Float]) field.set(obj, row.getAs[FloatType](index)) + else if (fieldType eq classOf[java.lang.Short]) field.set(obj, row.getAs[ShortType](index)) + else if (fieldType eq classOf[java.util.Date]) field.set(obj, row.getAs[DateType](index)) + } + } + } + }) + list += obj + }) + } + list.iterator + } + + /** + * 判断指定的Row中是否包含指定的列名 + * + * @param row + * DataFrame中的行 + * @param columnName + * 列名 + * @return + * true: 存在 false:不存在 + */ + def containsColumn(row: Row, columnName: String): Boolean = { + Try { + try { + row.fieldIndex(columnName) + } + }.isSuccess + } + + /** + * 根据实体bean构建schema信息 + * + * @return StructField集合 + */ + def buildSchemaFromBean(beanClazz: Class[_], upper: Boolean = false): List[StructField] = { + val fieldMap = ReflectionUtils.getAllFields(beanClazz) + val strutFields = new ListBuffer[StructField]() + for (map <- fieldMap.entrySet) { + val field: Field = map.getValue + val fieldType: Class[_] = field.getType + val anno: FieldName = field.getAnnotation(classOf[FieldName]) + var fieldName: String = map.getKey + var nullable: Boolean = true + val disuse = if (anno == null) { + false + } else { + if (StringUtils.isNotBlank(anno.value)) { + fieldName = anno.value + } + nullable = anno.nullable() + anno.disuse() + } + if (!disuse) { + if (upper) fieldName = fieldName.toUpperCase + if (fieldType eq classOf[String]) strutFields += DataTypes.createStructField(fieldName, DataTypes.StringType, nullable) + else if (fieldType eq classOf[java.lang.Integer]) strutFields += DataTypes.createStructField(fieldName, DataTypes.IntegerType, nullable) + else if (fieldType eq classOf[java.lang.Double]) strutFields += DataTypes.createStructField(fieldName, DataTypes.DoubleType, nullable) + else if (fieldType eq classOf[java.lang.Long]) strutFields += DataTypes.createStructField(fieldName, DataTypes.LongType, nullable) + else if (fieldType eq classOf[java.math.BigDecimal]) strutFields += DataTypes.createStructField(fieldName, DataTypes.DoubleType, nullable) + else if (fieldType eq classOf[java.lang.Float]) strutFields += DataTypes.createStructField(fieldName, DataTypes.FloatType, nullable) + else if (fieldType eq classOf[java.lang.Boolean]) strutFields += DataTypes.createStructField(fieldName, DataTypes.BooleanType, nullable) + else if (fieldType eq classOf[java.lang.Short]) strutFields += DataTypes.createStructField(fieldName, DataTypes.ShortType, nullable) + else if (fieldType eq classOf[java.util.Date]) strutFields += DataTypes.createStructField(fieldName, DataTypes.DateType, nullable) + } + } + strutFields.toList + } + + /** + * 获取kafka中json数据的before和after信息 + * + * @param beanClazz + * json数据对应的java bean类型 + * @param isMySQL + * 是否为mysql解析的消息 + * @param fieldNameUpper + * 字段名称是否为大写 + * @param parseAll + * 是否解析所有字段信息 + * @return + */ + def buildSchema2Kafka(beanClazz: Class[_], parseAll: Boolean = false, isMySQL: Boolean = true, fieldNameUpper: Boolean = false): StructType = { + if (parseAll) { + val structTypes = new StructType() + .add("table", StringType) + .add("op_type", StringType) + .add("op_ts", StringType) + .add("current_ts", StringType) + .add("gtid", StringType) + .add("logFile", StringType) + .add("offset", StringType) + .add("schema", StringType) + .add("when", StringType) + .add("after", StructType(SparkUtils.buildSchemaFromBean(beanClazz, fieldNameUpper))) + 
.add("before", StructType(SparkUtils.buildSchemaFromBean(beanClazz, fieldNameUpper))) + if (isMySQL) structTypes.add("pos", LongType) else structTypes.add("pos", StringType) + } else { + new StructType().add("table", StringType) + .add("after", StructType(SparkUtils.buildSchemaFromBean(beanClazz, fieldNameUpper))) + } + } + + + /** + * 获取表的全名 + * + * @param dbName + * 表所在的库名 + * @param tableName + * 表名 + * @return + * 库名.表名 + */ + def getFullTableName(dbName: String = FireHiveConf.defaultDB, tableName: String): String = { + val dbNameStr = if (StringUtils.isBlank(dbName)) FireHiveConf.defaultDB else dbName + s"$dbNameStr.$tableName" + } + + /** + * 分割topic列表,返回set集合 + * + * @param topics + * 多个topic以指定分隔符分割 + * @return + */ + def topicSplit(topics: String, splitStr: String = ","): Set[String] = { + requireNonEmpty(topics)("topic不能为空,请在配置文件中[ spark.kafka.topics ]配置") + topics.split(splitStr).filter(topic => StringUtils.isNotBlank(topic)).map(topic => topic.trim).toSet + } + + /** + * 获取webui地址 + */ + def getWebUI(spark: SparkSession): String = { + val optConf = spark.conf.getOption("spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES") + + if (optConf.isDefined) { + optConf.get + } else { + "" + } + } + + /** + * 获取applicationId + * + * @param spark + * @return + */ + def getApplicationId(spark: SparkSession): String = { + spark.sparkContext.applicationId + } + + /** + * 使用配置文件中的spark.streaming.batch.duration覆盖传参的batchDuration + * + * @param batchDuration + * 代码中指定的批次时间 + * @param hotRestart 是否热重启,热重启优先级最高 + * @return + * 被配置文件覆盖后的批次时间 + */ + def overrideBatchDuration(batchDuration: Long, hotRestart: Boolean): Long = { + if (hotRestart) return batchDuration + val confBathDuration = FireSparkConf.confBathDuration + if (confBathDuration == -1) { + batchDuration + } else { + Math.abs(confBathDuration) + } + } + + /** + * 获取spark任务的webUI地址信息 + * + * @return + */ + def getUI(webUI: String): String = { + val line = new StringBuilder() + webUI.split(",").foreach(url => { + line.append(StringsUtils.hrefTag(url) + StringsUtils.brTag("")) + }) + + line.toString() + } + + /** + * 用于判断当前是否为executor + * + * @return true: executor false: driver + */ + def isExecutor: Boolean = { + val executorId = this.getExecutorId + if (StringUtils.isNotBlank(executorId) && !"driver".equalsIgnoreCase(executorId)) true else false + } + + /** + * 获取当前executor id + * + * @return + * executor id或driver + */ + def getExecutorId: String = { + if (SparkEnv.get != null) SparkEnv.get.executorId else "" + } + + /** + * 获取入口类名 + */ + def getMainClass: String = { + if (SparkEnv.get != null) SparkEnv.get.conf.get(FireFrameworkConf.DRIVER_CLASS_NAME, "") else "" + } + + /** + * 用于判断当前是否为driver + * + * @return true: driver false: executor + */ + def isDriver: Boolean = { + val label = this.getExecutorId + if (StringUtils.isBlank(label) || "driver".equalsIgnoreCase(label)) true else false + } + + /** + * 是否是集群模式 + * + * @return + * true: 集群模式 false:本地模式 + */ + def isCluster: Boolean = { + OSUtils.isLinux + } + + /** + * 是否是本地模式 + * + * @return + * true: 本地模式 false:集群模式 + */ + def isLocal: Boolean = { + !isCluster + } + + /** + * 判断是否为yarn-client模式 + * + * @return + * true: yarn-client模式 + */ + def isYarnClientMode: Boolean = { + "client".equalsIgnoreCase(this.deployMode) + } + + /** + * 判断是否为yarn-cluster模式 + * + * @return + * true: yarn-cluster模式 + */ + def isYarnClusterMode: Boolean = { + "cluster".equalsIgnoreCase(this.deployMode) + } + + /** + * 获取spark任务运行模式 + */ + def deployMode: String = { + 
SparkSingletonFactory.getSparkSession.conf.get("spark.submit.deployMode") + } + + /** + * 优先从配置文件中获取配置信息,若获取不到,则从SparkEnv中获取 + * + * @param key + * 配置的key + * @param default + * 配置为空则返回default + * @return + * 配置的value + */ + def getConf(key: String, default: String = ""): String = { + var value = PropUtils.getString(key, default) + if (StringUtils.isBlank(value) && SparkEnv.get != null) { + value = SparkEnv.get.conf.get(key, default) + } + value + } + + /** + * 将指定的schema转为小写 + * + * @param schema + * 转为小写的列 + * @return + * 转为小写的field数组 + */ + def schemaToLowerCase(schema: StructType): ArrayBuffer[String] = { + val cols = ArrayBuffer[String]() + schema.foreach(field => { + val fieldName = field.name + cols += (s"$fieldName as ${fieldName.toLowerCase}") + }) + cols + } + + /** + * 将内部row类型的DataFrame转为Row类型的DataFrame + * + * @param df + * InternalRow类型的DataFrame + * @return + * Row类型的DataFrame + */ + def toExternalRow(df: DataFrame): DataFrame = { + val schema = df.schema + val mapedRowRDD = df.queryExecution.toRdd.mapPartitions { rows => + val converter = CatalystTypeConverters.createToScalaConverter(schema) + rows.map(converter(_).asInstanceOf[Row]) + } + SparkSingletonFactory.getSparkSession.createDataFrame(mapedRowRDD, schema) + } + + /** + * 从配置文件中读取并执行hive set的sql + */ + def executeHiveConfSQL(spark: SparkSession): Unit = { + if (spark != null) { + val confMap = FireHiveConf.hiveConfMap + confMap.foreach(kv => spark.sql(s"set ${kv._1}=${kv._2}")) + LogUtils.logMap(this.logger, confMap, "Execute hive sql conf.") + } + } + + /** + * 分配次执行指定的业务逻辑 + * + * @param rdd + * rdd.foreachPartition + * @param batch + * 多大批次执行一次sinkFun中定义的操作 + * @param mapFun + * 将Row类型映射为E类型的逻辑,并将处理后的数据放到listBuffer中 + * @param sinkFun + * 具体处理逻辑,将数据sink到目标源 + */ + def rddForeachPartitionBatch[T, E](rdd: RDD[T], mapFun: T => E, sinkFun: ListBuffer[E] => Unit, batch: Int = 1000): Unit = { + rdd.foreachPartition(it => { + var count: Int = 0 + val list = ListBuffer[E]() + + it.foreach(row => { + count += 1 + val result = mapFun(row) + if (result != null) list += result + + // 分批次执行 + if (count == Math.abs(batch)) { + sinkFun(list) + count = 0 + list.clear() + } + }) + + // 将剩余的数据一次执行掉 + if (list.nonEmpty) { + sinkFun(list) + list.clear() + } + }) + } + + /** + * 分配次执行指定的业务逻辑 + * + * @param df + * df.foreachPartition + * @param batch + * 多大批次执行一次sinkFun中定义的操作 + * @param mapFun + * 将Row类型映射为E类型的逻辑,并将处理后的数据放到listBuffer中 + * @param sinkFun + * 具体处理逻辑,将数据sink到目标源 + */ + def datasetForeachPartitionBatch[T, E](df: Dataset[T], mapFun: T => E, sinkFun: ListBuffer[E] => Unit, batch: Int = 1000): Unit = { + df.foreachPartition((it: Iterator[T]) => { + var count: Int = 0 + val list = ListBuffer[E]() + + it.foreach(row => { + count += 1 + val result = mapFun(row) + if (result != null) list += result + + // 分批次执行 + if (count == Math.abs(batch)) { + sinkFun(list) + count = 0 + list.clear() + } + }) + + // 将剩余的数据一次执行掉 + if (list.nonEmpty) { + sinkFun(list) + list.clear() + } + }) + } + + /** + * 配置化spark DataSource api中的options选项,可通过配置文件方式读取并覆盖代码中指定相同的配置项 + * + * @param options + * 可为空,如果为空,则必须在配置文件中指定 + * @param keyNum + * 用于区分多个数据源 + */ + def optionsEnhance(options: Map[String, String] = Map.empty, keyNum: Int = 1): Map[String, String] = { + val map = collection.mutable.Map[String, String]() + map ++= options + map ++= PropUtils.sliceKeysByNum(FireSparkConf.SPARK_DATASOURCE_OPTIONS_PREFIX, keyNum) + if (map.isEmpty) { + throw new IllegalArgumentException(s"spark datasource 
options不能为空,请通过配置文件指定,以${FireSparkConf.SPARK_DATASOURCE_OPTIONS_PREFIX}为前缀,以${keyNum}为后缀.") + } + this.logger.info(s"--> Spark DataSource options信息(keyNum=$keyNum)<--") + map.foreach(option => this.logger.info(s"${option._1} = ${option._2}")) + map.toMap + } +} diff --git a/fire-engines/fire-spark/src/main/scala/org/apache/rocketmq/spark/RocketMqUtils.scala b/fire-engines/fire-spark/src/main/scala/org/apache/rocketmq/spark/RocketMqUtils.scala new file mode 100644 index 0000000..18079e5 --- /dev/null +++ b/fire-engines/fire-spark/src/main/scala/org/apache/rocketmq/spark/RocketMqUtils.scala @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.rocketmq.spark + +import java.util.Properties +import java.{lang => jl, util => ju} + +import org.apache.commons.lang.StringUtils +import org.apache.rocketmq.client.consumer.DefaultMQPullConsumer +import org.apache.rocketmq.common.message.{Message, MessageExt, MessageQueue} +import org.apache.rocketmq.spark.streaming.{ReliableRocketMQReceiver, RocketMQReceiver} +import org.apache.spark.SparkContext +import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel +import org.apache.spark.streaming.api.java.{JavaInputDStream, JavaStreamingContext} +import org.apache.spark.streaming.dstream.InputDStream +import org.apache.spark.streaming.{MQPullInputDStream, RocketMqRDD, StreamingContext} +import org.slf4j.LoggerFactory + +object RocketMqUtils { + private lazy val logger = LoggerFactory.getLogger(this.getClass) + + /** + * Scala constructor for a batch-oriented interface for consuming from rocketmq. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param sc SparkContext + * @param groupId it is for rocketMq for identifying the consumer + * @param offsetRanges offset ranges that define the RocketMq data belonging to this RDD + * @param optionParams optional configs, see [[RocketMQConfig]] for more details. + * @param locationStrategy map from TopicQueueId to preferred host for processing that partition. 
+ * In most cases, use [[LocationStrategy.PreferConsistent]] + * @return RDD[MessageExt] + */ + def createRDD( + sc: SparkContext, + groupId: String, + offsetRanges: ju.Map[TopicQueueId, Array[OffsetRange]], + optionParams: ju.Map[String, String] = new ju.HashMap, + locationStrategy: LocationStrategy = PreferConsistent + ): RDD[MessageExt] = { + + val preferredHosts = locationStrategy match { + case PreferConsistent => ju.Collections.emptyMap[TopicQueueId, String]() + case PreferFixed(hostMap) => hostMap + } + new RocketMqRDD(sc, groupId, optionParams, offsetRanges, preferredHosts, false) + } + + /** + * Java constructor for a batch-oriented interface for consuming from rocketmq. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param jsc SparkContext + * @param groupId it is for rocketMq for identifying the consumer + * @param offsetRanges offset ranges that define the RocketMq data belonging to this RDD + * @param optionParams optional configs, see [[RocketMQConfig]] for more details. + * @param locationStrategy map from TopicQueueId to preferred host for processing that partition. + * In most cases, use [[LocationStrategy.PreferConsistent]] + * @return JavaRDD[MessageExt] + */ + def createJavaRDD( + jsc: JavaSparkContext, + groupId: String, + offsetRanges: ju.Map[TopicQueueId, Array[OffsetRange]], + optionParams: ju.Map[String, String] = new ju.HashMap, + locationStrategy: LocationStrategy = PreferConsistent + ): JavaRDD[MessageExt] = { + new JavaRDD(createRDD(jsc.sc, groupId, offsetRanges, optionParams, locationStrategy)) + } + + /** + * Scala constructor for a RocketMq DStream + * @param groupId it is for rocketMq for identifying the consumer + * @param topics the topics for the rocketmq + * @param consumerStrategy consumerStrategy In most cases, pass in [[ConsumerStrategy.lastest]], + * see [[ConsumerStrategy]] for more details + * @param autoCommit whether commit the offset to the rocketmq server automatically or not. If the user + * implement the [[OffsetCommitCallback]], the autoCommit must be set false + * @param forceSpecial Generally if the rocketmq server has checkpoint for the [[MessageQueue]], then the consumer + * will consume from the checkpoint no matter we specify the offset or not. But if forceSpecial is true, + * the rocketmq will start consuming from the specific available offset in any case. + * @param failOnDataLoss Zero data lost is not guaranteed when topics are deleted. If zero data lost is critical, + * the user must make sure all messages in a topic have been processed when deleting a topic. + * @param locationStrategy map from TopicQueueId to preferred host for processing that partition. + * In most cases, use [[LocationStrategy.PreferConsistent]] + * @param optionParams optional configs, see [[RocketMQConfig]] for more details. 
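+   *
+   * A minimal usage sketch (assuming an existing StreamingContext `ssc`; the group id, topic and
+   * name server address below are placeholder values, not defined by this module):
+   * {{{
+   * val options = new ju.HashMap[String, String]()
+   * options.put(RocketMQConfig.NAME_SERVER_ADDR, "localhost:9876")
+   * val stream = RocketMqUtils.createMQPullStream(ssc, "exampleGroup", ju.Arrays.asList("exampleTopic"),
+   *   ConsumerStrategy.lastest, autoCommit = true, forceSpecial = false, failOnDataLoss = false,
+   *   optionParams = options)
+   * stream.foreachRDD(rdd => rdd.map(msg => new String(msg.getBody)).foreach(println))
+   * }}}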
+ * @return InputDStream[MessageExt] + */ + def createMQPullStream( + ssc: StreamingContext, + groupId: String, + topics: ju.Collection[jl.String], + consumerStrategy: ConsumerStrategy, + autoCommit: Boolean, + forceSpecial: Boolean, + failOnDataLoss: Boolean, + locationStrategy: LocationStrategy = PreferConsistent, + optionParams: ju.Map[String, String] = new ju.HashMap + ): InputDStream[MessageExt] = { + + new MQPullInputDStream(ssc, groupId, topics, optionParams, locationStrategy, consumerStrategy, autoCommit, forceSpecial, + failOnDataLoss) + } + + def createMQPullStream( + ssc: StreamingContext, + groupId: String, + topic: String, + consumerStrategy: ConsumerStrategy, + autoCommit: Boolean, + forceSpecial: Boolean, + failOnDataLoss: Boolean, + optionParams: ju.Map[String, String] + ): InputDStream[MessageExt] = { + val topics = new ju.ArrayList[String]() + topics.add(topic) + new MQPullInputDStream(ssc, groupId, topics, optionParams, PreferConsistent, consumerStrategy, autoCommit, forceSpecial, + failOnDataLoss) + } + + /** + * Java constructor for a RocketMq DStream + * @param groupId it is for rocketMq for identifying the consumer + * @param topics the topics for the rocketmq + * @param consumerStrategy consumerStrategy In most cases, pass in [[ConsumerStrategy.lastest]], + * see [[ConsumerStrategy]] for more details + * @param autoCommit whether commit the offset to the rocketmq server automatically or not. If the user + * implement the [[OffsetCommitCallback]], the autoCommit must be set false + * @param forceSpecial Generally if the rocketmq server has checkpoint for the [[MessageQueue]], then the consumer + * will consume from the checkpoint no matter we specify the offset or not. But if forceSpecial is true, + * the rocketmq will start consuming from the specific available offset in any case. + * @param failOnDataLoss Zero data lost is not guaranteed when topics are deleted. If zero data lost is critical, + * the user must make sure all messages in a topic have been processed when deleting a topic. + * @param locationStrategy map from TopicQueueId to preferred host for processing that partition. + * In most cases, use [[LocationStrategy.PreferConsistent]] + * @param optionParams optional configs, see [[RocketMQConfig]] for more details. 
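+   *
+   * Note: this Java-facing variant simply wraps the Scala createMQPullStream above in a
+   * [[JavaInputDStream]], so all parameters keep the same semantics.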
+ * @return JavaInputDStream[MessageExt] + */ + def createJavaMQPullStream( + ssc: JavaStreamingContext, + groupId: String, + topics: ju.Collection[jl.String], + consumerStrategy: ConsumerStrategy, + autoCommit: Boolean, + forceSpecial: Boolean, + failOnDataLoss: Boolean, + locationStrategy: LocationStrategy = PreferConsistent, + optionParams: ju.Map[String, String] = new ju.HashMap + ): JavaInputDStream[MessageExt] = { + val inputDStream = createMQPullStream(ssc.ssc, groupId, topics, consumerStrategy, + autoCommit, forceSpecial, failOnDataLoss, locationStrategy, optionParams) + new JavaInputDStream(inputDStream) + } + + def createJavaMQPullStream( + ssc: JavaStreamingContext, + groupId: String, + topics: ju.Collection[jl.String], + consumerStrategy: ConsumerStrategy, + autoCommit: Boolean, + forceSpecial: Boolean, + failOnDataLoss: Boolean): JavaInputDStream[MessageExt] = { + val inputDStream = createMQPullStream(ssc.ssc, groupId, topics, consumerStrategy, + autoCommit, forceSpecial, failOnDataLoss) + new JavaInputDStream(inputDStream) + } + + def mkPullConsumerInstance(groupId: String, optionParams: ju.Map[String, String], instance: String): DefaultMQPullConsumer = { + val consumer = new DefaultMQPullConsumer(groupId) + if (optionParams.containsKey(RocketMQConfig.PULL_TIMEOUT_MS)) + consumer.setConsumerTimeoutMillisWhenSuspend(optionParams.get(RocketMQConfig.PULL_TIMEOUT_MS).toLong) + val finalInstance = optionParams.getOrDefault("consumer.instance", instance) + + if (StringUtils.isNotBlank(finalInstance)) { + consumer.setInstanceName(finalInstance) + logger.warn(s"consumer.instance标识为:${finalInstance}") + } + + if (optionParams.containsKey(RocketMQConfig.NAME_SERVER_ADDR)) + consumer.setNamesrvAddr(optionParams.get(RocketMQConfig.NAME_SERVER_ADDR)) + + consumer.start() + consumer.setOffsetStore(consumer.getDefaultMQPullConsumerImpl.getOffsetStore) + consumer + } + + /** + * For creating Java push mode unreliable DStream + * @param jssc + * @param properties + * @param level + * @return + */ + def createJavaMQPushStream( + jssc: JavaStreamingContext, + properties: Properties, + level: StorageLevel + ): JavaInputDStream[Message] = createJavaMQPushStream(jssc, properties, level, false) + + /** + * For creating Java push mode reliable DStream + * @param jssc + * @param properties + * @param level + * @return + */ + def createJavaReliableMQPushStream( + jssc: JavaStreamingContext, + properties: Properties, + level: StorageLevel + ): JavaInputDStream[Message] = createJavaMQPushStream(jssc, properties, level, true) + + /** + * For creating Java push mode DStream + * @param jssc + * @param properties + * @param level + * @param reliable + * @return + */ + def createJavaMQPushStream( + jssc: JavaStreamingContext, + properties: Properties, + level: StorageLevel, + reliable: Boolean + ): JavaInputDStream[Message] = { + if (jssc == null || properties == null || level == null) return null + val receiver = if (reliable) new ReliableRocketMQReceiver(properties, level) else new RocketMQReceiver(properties, level) + val ds = jssc.receiverStream(receiver) + ds + } + +} diff --git a/fire-engines/pom.xml b/fire-engines/pom.xml new file mode 100644 index 0000000..9e72c31 --- /dev/null +++ b/fire-engines/pom.xml @@ -0,0 +1,83 @@ + + + + + 4.0.0 + fire-engines_2.12 + pom + fire-engines + + + com.zto.fire + fire-parent_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + fire-spark + fire-flink + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${project.version} + + + com.zto.fire + 
fire-core_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-jdbc_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-hbase_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-metrics_${scala.binary.version} + ${project.version} + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-examples/flink-examples/pom.xml b/fire-examples/flink-examples/pom.xml new file mode 100644 index 0000000..11d430e --- /dev/null +++ b/fire-examples/flink-examples/pom.xml @@ -0,0 +1,288 @@ + + + + + 4.0.0 + flink-examples_${flink.reference} + jar + flink-examples + + + com.zto.fire + fire-examples_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + com.zto.fire + fire-flink_${flink.reference} + ${project.version} + + + com.sparkjava + spark-core + ${sparkjava.version} + + + + org.apache.flink + flink-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-scala_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-streaming-scala_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-clients_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-runtime-web_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-queryable-state-runtime_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-queryable-state-client-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-statebackend-rocksdb_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-kafka_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.kafka + kafka_${scala.binary.version} + ${kafka.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-java-bridge_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-java + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-api-scala-bridge_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-planner_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-planner-blink_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-table-common + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-hive_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-jdbc_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-json + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-connector-elasticsearch-base_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + org.apache.flink + flink-hadoop-compatibility_${scala.binary.version} + ${flink.version} + ${maven.scope} + + + + + org.apache.rocketmq + rocketmq-client + ${rocketmq.version} + + + org.apache.rocketmq + rocketmq-acl + ${rocketmq.version} + + + + org.apache.flink + flink-shaded-hadoop-2-uber + 2.6.5-8.0 + ${maven.scope} + + + javax.servlet + servlet-api + + + + + + + org.apache.hive + hive-exec + ${hive.apache.version} + ${maven.scope} + + + calcite-core + org.apache.calcite + + + + + + + org.apache.hbase + hbase-common + ${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + 
${hbase.version} + ${maven.scope} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + ${maven.scope} + + + calcite-core + org.apache.calcite + + + + + + + org.apache.hudi + hudi-flink-bundle_${scala.binary.version} + ${hudi.version} + ${maven.scope} + + + + org.apache.rocketmq + rocketmq-flink_${flink.major.version}_${scala.binary.version} + ${rocketmq.external.version} + + + + com.oracle + ojdbc6 + 11.2.0.3 + ${maven.scope} + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-examples/flink-examples/src/main/java/com/zto/fire/examples/bean/People.java b/fire-examples/flink-examples/src/main/java/com/zto/fire/examples/bean/People.java new file mode 100644 index 0000000..ba069e3 --- /dev/null +++ b/fire-examples/flink-examples/src/main/java/com/zto/fire/examples/bean/People.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.bean; + +import java.math.BigDecimal; +import java.util.LinkedList; +import java.util.List; + +public class People { + private Long id; + private String name; + private Integer age; + private Double length; + private BigDecimal data; + + public People() { + } + + public People(Long id, String name, Integer age, Double length, BigDecimal data) { + this.id = id; + this.name = name; + this.age = age; + this.length = length; + this.data = data; + } + + public static List createList() { + List list = new LinkedList<>(); + for (int i=0; i<10; i++) { + list.add(new People((long) i, "admin_" + i, i, i * 0.1, new BigDecimal(i * 10.1012))); + } + return list; + } +} diff --git a/fire-examples/flink-examples/src/main/java/com/zto/fire/examples/bean/Student.java b/fire-examples/flink-examples/src/main/java/com/zto/fire/examples/bean/Student.java new file mode 100644 index 0000000..a8959d0 --- /dev/null +++ b/fire-examples/flink-examples/src/main/java/com/zto/fire/examples/bean/Student.java @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.bean; + +import com.zto.fire.common.anno.FieldName; +import com.zto.fire.common.util.JSONUtils; +import com.zto.fire.hbase.bean.HBaseBaseBean; +import com.zto.fire.common.util.DateFormatUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.math.BigDecimal; +import java.util.Arrays; +import java.util.LinkedList; +import java.util.List; +import java.util.Objects; + +/** + * 对应HBase表的JavaBean + * + * @author ChengLong 2019-6-20 16:06:16 + */ +public class Student extends HBaseBaseBean { + @FieldName(value = "Student", disuse = true) + protected static final transient Logger logger = LoggerFactory.getLogger(Student.class); + private Long id; + private String name; + private Integer age; + // 多列族情况下需使用family单独指定 + private String createTime; + // 若JavaBean的字段名称与HBase中的字段名称不一致,需使用value单独指定 + // 此时hbase中的列名为length1,而不是length + // @FieldName(family = "info", value = "length1") + private BigDecimal length; + private Boolean sex; + + /** + * rowkey的构建 + * + * @return + */ + @Override + public Student buildRowKey() { + this.rowKey = this.id.toString(); + return this; + } + + public Student(Long id, String name) { + this.id = id; + this.name = name; + } + + public Student(Long id, String name, Integer age) { + this.id = id; + this.name = name; + this.age = age; + } + + public Student(Long id, String name, Integer age, BigDecimal length, Boolean sex, String createTime) { + this.id = id; + this.name = name; + this.age = age; + this.length = length; + this.sex = sex; + this.createTime = createTime; + } + + public Student(Long id, String name, Integer age, BigDecimal length) { + this.id = id; + this.name = name; + this.age = age; + this.length = length; + } + + public Student() { + } + + public Student(Long id) { + this.id = id; + } + + public String getCreateTime() { + return createTime; + } + + public void setCreateTime(String createTime) { + this.createTime = createTime; + } + + public BigDecimal getLength() { + return length; + } + + public void setLength(BigDecimal length) { + this.length = length; + } + + public Boolean getSex() { + return sex; + } + + public void setSex(Boolean sex) { + this.sex = sex; + } + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public Integer getAge() { + return age; + } + + public void setAge(Integer age) { + this.age = age; + } + + @Override + public String toString() { + return JSONUtils.toJSONString(this); + } + + public static List newStudentList() { + String dateTime = DateFormatUtils.formatCurrentDateTime(); + return Arrays.asList( + new Student(1L, "admin", 12, BigDecimal.valueOf(12.1), true, dateTime), + new Student(2L, "root", 22, BigDecimal.valueOf(22), true, dateTime), + new Student(3L, "scala", 11, BigDecimal.valueOf(11), true, dateTime), + new Student(4L, "spark", 15, BigDecimal.valueOf(15), true, dateTime), + new Student(5L, "java", 16, BigDecimal.valueOf(16.1), true, dateTime), + new Student(6L, "hive", 17, BigDecimal.valueOf(17.1), true, dateTime), + new Student(7L, "presto", 18, BigDecimal.valueOf(18.1), true, dateTime), + new Student(8L, "flink", 19, BigDecimal.valueOf(19.1), true, dateTime), + new Student(9L, "streaming", 10, BigDecimal.valueOf(10.1), true, dateTime), + new Student(10L, "sql", 12, BigDecimal.valueOf(12.1), 
true, dateTime) + ); + } + + /** + * 构建student集合 + * + * @return + */ + public static List buildStudentList() { + List studentList = new LinkedList<>(); + try { + for (int i = 1; i <= 1; i++) { + Thread.sleep(500); + Student stu = new Student(1L, "root", i + 1, BigDecimal.valueOf((long) 1 + i), true, DateFormatUtils.formatCurrentDateTime()); + studentList.add(stu); + } + + for (int i = 1; i <= 2; i++) { + Thread.sleep(500); + Student stu = new Student(2L, "admin", i + 2, BigDecimal.valueOf(2019.05180919 + i), false, DateFormatUtils.formatCurrentDateTime()); + studentList.add(stu); + } + + for (int i = 1; i <= 3; i++) { + Thread.sleep(500); + Student stu = new Student(3L, "spark", i + 3, BigDecimal.valueOf(33.1415926 + i)); + studentList.add(stu); + } + + for (int i = 1; i <= 3; i++) { + Thread.sleep(500); + Student stu = new Student(4L, "flink", i + 4, BigDecimal.valueOf(4.2 + i), true, DateFormatUtils.formatCurrentDateTime()); + studentList.add(stu); + } + + for (int i = 1; i <= 3; i++) { + Thread.sleep(500); + Student stu = new Student(5L, "hadoop", i + 5, BigDecimal.valueOf(5.5 + i), false, DateFormatUtils.formatCurrentDateTime()); + studentList.add(stu); + } + for (int i = 1; i <= 3; i++) { + Thread.sleep(500); + Student stu = new Student(6L, "hbase", i + 6, BigDecimal.valueOf(66.66 + i), true, DateFormatUtils.formatCurrentDateTime()); + studentList.add(stu); + } + } catch (Exception e) { + logger.error("Sleep线程异常", e); + } + + return studentList; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (!(o instanceof Student)) return false; + Student student = (Student) o; + return Objects.equals(id, student.id) && + Objects.equals(name, student.name) && + Objects.equals(age, student.age) && + Objects.equals(createTime, student.createTime) && + Objects.equals(length, student.length) && + Objects.equals(sex, student.sex); + } + + @Override + public int hashCode() { + return Objects.hash(id, name, age, createTime, length, sex); + } +} diff --git a/fire-examples/flink-examples/src/main/java/com/zto/fire/sql/SqlCommandParser.java b/fire-examples/flink-examples/src/main/java/com/zto/fire/sql/SqlCommandParser.java new file mode 100644 index 0000000..118f020 --- /dev/null +++ b/fire-examples/flink-examples/src/main/java/com/zto/fire/sql/SqlCommandParser.java @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.sql; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.IOUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.*; +import java.util.function.Function; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public final class SqlCommandParser { + + protected static final transient Logger logger = LoggerFactory.getLogger(SqlCommandParser.class); + + private SqlCommandParser() {} + + public static List parse(List lines) { + List calls = new ArrayList<>(); + StringBuilder stmt = new StringBuilder(); + for (String line : lines) { + if (line.trim().isEmpty() || line.startsWith("--")) { + // skip empty line and comment line + continue; + } + stmt.append("\n").append(line); + if (line.trim().endsWith(";")) { + Optional optionalCall = parse(stmt.toString()); + if (optionalCall.isPresent()) { + calls.add(optionalCall.get()); + } else { + throw new RuntimeException("Unsupported command '" + stmt.toString() + "'"); + } + // clear string builder + stmt.setLength(0); + } + } + return calls; + } + + public static Optional parse(String stmt) { + // normalize + stmt = stmt.trim(); + // remove ';' at the end + if (stmt.endsWith(";")) { + stmt = stmt.substring(0, stmt.length() - 1).trim(); + } + + // parse + for (SqlCommand cmd : SqlCommand.values()) { + final Matcher matcher = cmd.pattern.matcher(stmt); + if (matcher.matches()) { + final String[] groups = new String[matcher.groupCount()]; + for (int i = 0; i < groups.length; i++) { + groups[i] = matcher.group(i + 1); + } + return cmd.operandConverter.apply(groups) + .map((operands) -> new SqlCommandCall(cmd, operands)); + } + } + return Optional.empty(); + } + + private static final Function> NO_OPERANDS = + (operands) -> Optional.of(new String[0]); + + private static final Function> SINGLE_OPERAND = + (operands) -> Optional.of(new String[]{operands[0]}); + + private static final int DEFAULT_PATTERN_FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL; + + /** + * Supported SQL commands. + */ + public enum SqlCommand { + INSERT_INTO( + "(INSERT\\s+INTO.*)", + SINGLE_OPERAND), + + CREATE_TABLE( + "(CREATE\\s+TABLE.*)", + SINGLE_OPERAND), + CREATE_VIEW( + "(CREATE\\s+VIEW.*)", + SINGLE_OPERAND), + + SET( + "SET(\\s+(\\S+)\\s*=(.*))?", // whitespace is only ignored on the left side of '=' + (operands) -> { + if (operands.length < 3) { + return Optional.empty(); + } else if (operands[0] == null) { + return Optional.of(new String[0]); + } + return Optional.of(new String[]{operands[1], operands[2]}); + }); + + public final Pattern pattern; + public final Function> operandConverter; + + SqlCommand(String matchingRegex, Function> operandConverter) { + this.pattern = Pattern.compile(matchingRegex, DEFAULT_PATTERN_FLAGS); + this.operandConverter = operandConverter; + } + + @Override + public String toString() { + return super.toString().replace('_', ' '); + } + + public boolean hasOperands() { + return operandConverter != NO_OPERANDS; + } + } + + /** + * Call of SQL command with operands and command type. 
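+     * For {@code SET} the operands are the parsed key and value; for the other commands the
+     * single operand is the full statement text captured by the command's pattern.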
+ */ + public static class SqlCommandCall { + public final SqlCommand command; + public final String[] operands; + + public SqlCommandCall(SqlCommand command, String[] operands) { + this.command = command; + this.operands = operands; + } + + public SqlCommandCall(SqlCommand command) { + this(command, new String[0]); + } + + @Override + public boolean equals(Object o) { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + SqlCommandCall that = (SqlCommandCall) o; + return command == that.command && Arrays.equals(operands, that.operands); + } + + @Override + public int hashCode() { + int result = Objects.hash(command); + result = 31 * result + Arrays.hashCode(operands); + return result; + } + + @Override + public String toString() { + return command + "(" + Arrays.toString(operands) + ")"; + } + } + + private static FileSystem getFiledSystem() throws IOException { + Configuration configuration = new Configuration(); + FileSystem fileSystem = FileSystem.get(configuration); + return fileSystem; + } + + public static void copyHdfsFileToLocal(String filePath, String disFile){ + + logger.info("copy hdfs to local :" + filePath + ", hdfs:" + disFile); + + FSDataInputStream fsDataInputStream = null; + try { + File file = new File(disFile); + if(file.exists()){ + file.delete(); + file = new File(disFile); + } + Path path = new Path(filePath); + fsDataInputStream = getFiledSystem().open(path); + IOUtils.copyBytes(fsDataInputStream, new FileOutputStream(file), 4096, false); + } catch (IOException e) { + e.printStackTrace(); + } finally { + if(fsDataInputStream != null){ + IOUtils.closeStream(fsDataInputStream); + } + } + } + + private static void writeHDFS(String localPath, String hdfsPath){ + + logger.info("copy file to hdfs :" + localPath + ", hdfs:" + hdfsPath); + + FSDataOutputStream outputStream = null; + FileInputStream fileInputStream = null; + + try { + Path path = new Path(hdfsPath); + + FileSystem fileSystem = getFiledSystem(); + if(fileSystem.exists(path)){ + fileSystem.delete(path); + } + + outputStream = fileSystem.create(path); + fileInputStream = new FileInputStream(new File(localPath)); + + IOUtils.copyBytes(fileInputStream, outputStream,4096, false); + + } catch (IOException e) { + e.printStackTrace(); + }finally { + if(fileInputStream != null){ + IOUtils.closeStream(fileInputStream); + } + if(outputStream != null){ + IOUtils.closeStream(outputStream); + } + } + } + + /** + * 接受sql文件,上传到hdfs,供run-application模式下载文件到本地,文件必须为绝对路径 + * @param args + */ + public static void main(String[] args) { + + String sqlFile = args[0]; + + logger.info("sqlFile:" + sqlFile); + + String path = "/tmp/" + sqlFile.substring(sqlFile.lastIndexOf("/") + 1); + + logger.info("path:" + path); + + writeHDFS(sqlFile, path); + + } + +} diff --git a/fire-examples/flink-examples/src/main/resources/FlinkSqlCommit.properties b/fire-examples/flink-examples/src/main/resources/FlinkSqlCommit.properties new file mode 100644 index 0000000..9331e22 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/FlinkSqlCommit.properties @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +hive.cluster=batch \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory b/fire-examples/flink-examples/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory new file mode 100644 index 0000000..a49aff8 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory @@ -0,0 +1 @@ +com.zto.fire.flink.sql.connector.rocketmq.RocketMQDynamicTableFactory \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/acc/FlinkAccTest.properties b/fire-examples/flink-examples/src/main/resources/acc/FlinkAccTest.properties new file mode 100644 index 0000000..d484d6e --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/acc/FlinkAccTest.properties @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = flink +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +flink.max.parallelism = 8 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/batch/FireMapFunctionTest.properties b/fire-examples/flink-examples/src/main/resources/batch/FireMapFunctionTest.properties new file mode 100644 index 0000000..7b6b2b7 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/batch/FireMapFunctionTest.properties @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +flink.fire.config_center.enable=false \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/batch/FlinkBrocastTest.properties b/fire-examples/flink-examples/src/main/resources/batch/FlinkBrocastTest.properties new file mode 100644 index 0000000..aa02971 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/batch/FlinkBrocastTest.properties @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.hive.cluster = test \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/common.properties b/fire-examples/flink-examples/src/main/resources/common.properties new file mode 100644 index 0000000..0964929 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/common.properties @@ -0,0 +1,61 @@ +# 定义url的别名与url对应关系,后续可通过别名进行配置 +flink.db.jdbc.url.map.test = jdbc:mysql://192.168.0.1:3306/fire +# 支持别名或直接指定url +flink.db.jdbc.url = test +flink.db.jdbc.driver = com.mysql.jdbc.Driver +flink.db.jdbc.user = root +flink.db.jdbc.password = root +flink.db.jdbc.batch.size = 10 + +flink.db.jdbc.url2 = jdbc:mysql://192.168.0.2:3306/fire2 +flink.db.jdbc.driver2 = com.mysql.jdbc.Driver +flink.db.jdbc.user2 = root +flink.db.jdbc.password2 = root +# 每个批次提交的数据大小,默认1000条 +flink.db.jdbc.batch.size2 = 2 + +flink.db.jdbc.url3 = jdbc:mysql://192.168.0.3:3306/fire3 +flink.db.jdbc.driver3 = com.mysql.jdbc.Driver +flink.db.jdbc.user3 = root +flink.db.jdbc.password3 = root +flink.db.jdbc.isolation.level3 = none +# 每个批次插入、更新、删除的数据量,默认为1000 +flink.db.jdbc.batch.size3 = 2000 + +flink.db.jdbc.url5 = jdbc:mysql://192.168.0.4:3306/fire5 +flink.db.jdbc.driver5 = com.mysql.jdbc.Driver +flink.db.jdbc.user5 = root +flink.db.jdbc.password5 = root +flink.db.jdbc.isolation.level5 = none +# 每个批次插入、更新、删除的数据量,默认为1000 +flink.db.jdbc.batch.size5 = 2000 + +flink.db.jdbc.url6 = jdbc:mysql://192.168.0.6:3306/fire6 +flink.db.jdbc.driver6 = com.mysql.jdbc.Driver +flink.db.jdbc.user6 = root +flink.db.jdbc.password6 = root + +# 支持别名或直接指定url +flink.db.jdbc.url7 = jdbc:mysql://192.168.0.7:3306/fire7 +flink.db.jdbc.driver7 = com.mysql.jdbc.Driver +flink.db.jdbc.user7 = root +flink.db.jdbc.password7 = root + +# 支持别名或直接指定url +flink.db.jdbc.url8 = jdbc:mysql://192.168.0.8:3306/fire8 +flink.db.jdbc.driver8 = com.mysql.jdbc.Driver +flink.db.jdbc.user8 = root +flink.db.jdbc.password8 = root + +# 关系型数据库连接信息 +flink.db.jdbc.url9 = jdbc:clickhouse://192.168.0.9:8123/fire9 +flink.db.jdbc.driver9 = ru.yandex.clickhouse.ClickHouseDriver +flink.db.jdbc.user9 = default +flink.db.jdbc.password9 = default + +flink.db.jdbc.url10 = jdbc:mysql://192.168.0.10:3306/fire10 +flink.db.jdbc.driver10 = com.mysql.jdbc.Driver +flink.db.jdbc.user10 = root +flink.db.jdbc.password10 = root +flink.db.jdbc.batch.size10 = 3 
+flink.fire.sink.jdbc.default.flushInterval10 = 30000 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/connector/ConnectorTest.properties b/fire-examples/flink-examples/src/main/resources/connector/ConnectorTest.properties new file mode 100644 index 0000000..660b761 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/connector/ConnectorTest.properties @@ -0,0 +1,47 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +#flink.hive.cluster = test +flink.sql.udf.fireUdf.enable = false +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.fire.rest.filter.enable = false +flink.fire.config_center.enable = true +flink.fire.rest.url.show.enable = true + +flink.db.jdbc.batch.size3 = 3 +flink.stream.checkpoint.interval = 1000 + +# flink所支持的参数 +state.checkpoints.num-retained = 3 +state.backend.incremental = true +state.backend.rocksdb.files.open = 5000 +flink.sql.log.enable = true +flink.sql_with.replaceMode.enable = true + +# sql中with表达,配置方法是以flink.sql.with开头,跟上connector的key,以数字结尾,用于区分不同的connector +flink.sql.with.connector=jdbc +flink.sql.with.url=jdbc:mysql://localhost:3306/mydatabase +flink.sql.with.table-name=users +flink.sql.with.password=123456 + +flink.sql.with.connector2=jdbc2 +flink.sql.with.url2=jdbc2:mysql://localhost:3306/mydatabase +flink.sql.with.table-name2=users2 +flink.sql.with.password2=root \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/connector/FlinkHudiTest.properties b/fire-examples/flink-examples/src/main/resources/connector/FlinkHudiTest.properties new file mode 100644 index 0000000..723770e --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/connector/FlinkHudiTest.properties @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +flink.stream.checkpoint.interval = 60000 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/connector/FlinkSqlCommit.properties b/fire-examples/flink-examples/src/main/resources/connector/FlinkSqlCommit.properties new file mode 100644 index 0000000..9d48b75 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/connector/FlinkSqlCommit.properties @@ -0,0 +1,20 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.stream.checkpoint.interval = 60000 +flink.sql.submit.hive.metastore.url = thrift://SHTL009046107:9083 +flink.sql.submit.hive.version = 1.1.1 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/connector/RocketMQConnectorTest.properties b/fire-examples/flink-examples/src/main/resources/connector/RocketMQConnectorTest.properties new file mode 100644 index 0000000..aac86c4 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/connector/RocketMQConnectorTest.properties @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +flink.log.level=INFO +rocket.cluster.map.ZmsClusterX=localhost:9876 +flink.streaming.batch.duration=10 +# 非必须配置项:默认是大数据的rocket地址 ZmsClusterX +flink.rocket.brokers.name=192.168.1.174:9876;192.168.1.179:9876 +flink.rocket.topics=SCANRECORD +flink.rocket.consumer.instance=FireFramework +#flink.hbase.cluster=streaming +flink.rocket.group.id=sjzn_spark_scanrecord_test +flink.rocket.pull.max.speed.per.partition=15000 +flink.rocket.consumer.tag=1||2||3||4||5||8||44||45 +flink.streaming.backpressure.enabled=false +flink.streaming.backpressure.initialRate=100 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/connector/RocketTest.properties b/fire-examples/flink-examples/src/main/resources/connector/RocketTest.properties new file mode 100644 index 0000000..2b8d557 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/connector/RocketTest.properties @@ -0,0 +1,55 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.sql.udf.fireUdf.enable = false +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.fire.rest.filter.enable = false +flink.fire.config_center.enable = true +flink.fire.rest.url.show.enable = true + +flink.db.jdbc.batch.size3 = 3 +#flink.stream.checkpoint.interval = 1000 + +# flink所支持的参数 +state.checkpoints.num-retained = 3 +state.backend.incremental = true +state.backend.rocksdb.files.open = 5000 +flink.sql.log.enable = true +flink.sql_with.replaceMode.enable = true + +# sql中with表达,配置方法是以flink.sql.with开头,跟上connector的key,以数字结尾,用于区分不同的connector +flink.sql.with.connector=jdbc +flink.sql.with.url=jdbc:mysql://localhost:3306/mydatabase +flink.sql.with.table-name=users +flink.sql.with.password=123456 + +flink.sql.with.connector2=jdbc2 +flink.sql.with.url2=jdbc2:mysql://localhost:3306/mydatabase +flink.sql.with.table-name2=users2 +flink.sql.with.password2=root + +flink.rocket.topics=fire +flink.rocket.group.id=fire +flink.rocket.brokers.name=localhost:9876 + +# 另一个rocketmq实例 +flink.rocket.topics2=fire +flink.rocket.group.id2=fire2 +flink.rocket.brokers.name2=localhost:9876 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/connector/kafka/KafkaConsumer.properties b/fire-examples/flink-examples/src/main/resources/connector/kafka/KafkaConsumer.properties new file mode 100644 index 0000000..9d44c01 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/connector/kafka/KafkaConsumer.properties @@ -0,0 +1,19 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.sql.conf.table.exec.state.ttl = 1 ms +flink.kafka.brokers.name = test \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/log4j.properties b/fire-examples/flink-examples/src/main/resources/log4j.properties new file mode 100644 index 0000000..14451bd --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/log4j.properties @@ -0,0 +1,32 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +log4j.rootLogger = WARN, stdout, D + +### 输出到控制台 ### +log4j.appender.stdout = org.apache.log4j.ConsoleAppender +log4j.appender.stdout.Target = System.out +log4j.appender.stdout.layout = org.apache.log4j.PatternLayout +log4j.appender.stdout.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss.SSS} [%thread]-[%p]-[%c] %m%n + +### 输出到日志文件 ### +log4j.appender.D = org.apache.log4j.DailyRollingFileAppender +log4j.appender.D.File = ./fire.log +log4j.appender.D.Append = true +log4j.appender.D.Threshold = INFO +log4j.appender.D.layout = org.apache.log4j.PatternLayout +log4j.appender.D.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss.SSS} [%thread]-[%p]-[%c]-[%l] %m%n \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/FlinkHiveTest.properties b/fire-examples/flink-examples/src/main/resources/stream/FlinkHiveTest.properties new file mode 100644 index 0000000..4b03ace --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/FlinkHiveTest.properties @@ -0,0 +1,25 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +spark.fire.rest.filter.enable = false +flink.hive.cluster = batch +flink.stream.checkpoint.interval = 10000 +flink.hive.support.enable = true \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/FlinkPartitioner.properties b/fire-examples/flink-examples/src/main/resources/stream/FlinkPartitioner.properties new file mode 100644 index 0000000..02205ff --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/FlinkPartitioner.properties @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = flink +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +spark.fire.rest.filter.enable = false +flink.hive.cluster = test \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/FlinkRetractStreamTest.properties b/fire-examples/flink-examples/src/main/resources/stream/FlinkRetractStreamTest.properties new file mode 100644 index 0000000..6ffa24d --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/FlinkRetractStreamTest.properties @@ -0,0 +1,28 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +flink.hive.cluster = test +flink.kafka.brokers.name = test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = flink +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +flink.fire.rest.filter.enable = false +flink.default.parallelism = 8 +flink.max.parallelism = 8 + +# 关系型数据库连接信息(jdbc信息统一配置在common.properties中) \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/FlinkSinkTest.properties b/fire-examples/flink-examples/src/main/resources/stream/FlinkSinkTest.properties new file mode 100644 index 0000000..e3130e5 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/FlinkSinkTest.properties @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +spark.fire.rest.filter.enable = false \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/FlinkSourceTest.properties b/fire-examples/flink-examples/src/main/resources/stream/FlinkSourceTest.properties new file mode 100644 index 0000000..aa02971 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/FlinkSourceTest.properties @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.hive.cluster = test \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/FlinkTest.properties b/fire-examples/flink-examples/src/main/resources/stream/FlinkTest.properties new file mode 100644 index 0000000..dfae900 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/FlinkTest.properties @@ -0,0 +1,27 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = zmsNew +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = sjzn_spark_arrival_date_forecast_topic +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +flink.hbase.cluster = test +flink.log.level.fire_conf.com.zto.fire.common.db = warn +flink.log.level.fire_conf.org.apache.kafka = error +flink.hello = fire +spark.fire.rest.filter.enable = false \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/HBaseTest.properties b/fire-examples/flink-examples/src/main/resources/stream/HBaseTest.properties new file mode 100644 index 0000000..70a9017 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/HBaseTest.properties @@ -0,0 +1,29 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +flink.fire.rest.filter.enable = false +flink.stream.checkpoint.interval = 30 + +# 关系型数据库连接信息 +flink.hbase.cluster = test +flink.hbase.cluster2 = test +flink.hbase.cluster3 = test \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/JdbcTest.properties b/fire-examples/flink-examples/src/main/resources/stream/JdbcTest.properties new file mode 100644 index 0000000..9abbe04 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/JdbcTest.properties @@ -0,0 +1,39 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# + +######################################################################################### +# JDBC数据源配置信息详见:common.properties,公共数据源配置可放到common.properties中,便于维护 # +######################################################################################### + + +#flink.hive.cluster = test +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.fire.rest.filter.enable = false +flink.fire.config_center.enable = true +flink.fire.rest.url.show.enable = true + +# flink所支持的参数 +state.checkpoints.num-retained = 3 +state.backend.incremental = true +state.backend.rocksdb.files.open = 5000 + +hello.world = 2020 +hello.world.flag = false +hello.world.flag2 = false \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/ListStateTest.properties b/fire-examples/flink-examples/src/main/resources/stream/ListStateTest.properties new file mode 100644 index 0000000..68255b5 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/ListStateTest.properties @@ -0,0 +1,26 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = flink +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +spark.fire.rest.filter.enable = false +flink.stream.time.characteristic = EventTime +flink.default.parallelism = 1 +flink.fire.config_center.enable=false \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/Test.properties b/fire-examples/flink-examples/src/main/resources/stream/Test.properties new file mode 100644 index 0000000..e725b1f --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/Test.properties @@ -0,0 +1,51 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
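The `state.*` keys in JdbcTest.properties above are standard Flink options rather than fire-specific ones; in a cluster deployment they would normally live in flink-conf.yaml. A small sketch of expressing the same three options as a Flink `Configuration` (useful when experimenting locally):

```scala
import org.apache.flink.configuration.Configuration

object StateOptionsSketch {
  def main(args: Array[String]): Unit = {
    // Same options the properties file declares, built programmatically.
    val conf = new Configuration()
    conf.setString("state.checkpoints.num-retained", "3")
    conf.setString("state.backend.incremental", "true")
    conf.setString("state.backend.rocksdb.files.open", "5000")
    println(conf)
  }
}
```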
+# + +flink.hive.cluster = test +flink.sql.udf.fireUdf.enable = false +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.fire.rest.filter.enable = false +flink.fire.config_center.enable = true +flink.fire.rest.url.show.enable = true + +flink.db.jdbc.batch.size3 = 3 +#flink.stream.checkpoint.interval = 1000 + +# flink所支持的参数 +state.checkpoints.num-retained = 3 +state.backend.incremental = true +state.backend.rocksdb.files.open = 5000 +flink.sql.log.enable = true +flink.sql_with.replaceMode.enable = true + +# sql中with表达,配置方法是以flink.sql.with开头,跟上connector的key,以数字结尾,用于区分不同的connector +flink.sql.with.connector=jdbc +flink.sql.with.url=jdbc:mysql://localhost:3306/mydatabase +flink.sql.with.table-name=users +flink.sql.with.password=123456 + +flink.sql.with.connector2=jdbc2 +flink.sql.with.url2=jdbc2:mysql://localhost:3306/mydatabase +flink.sql.with.table-name2=users2 +flink.sql.with.password2=root + +flink.rocket.topics=fire +flink.rocket.group.id=fire +flink.rocket.brokers.name=localhost:9876 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/UDFTest.properties b/fire-examples/flink-examples/src/main/resources/stream/UDFTest.properties new file mode 100644 index 0000000..55dd5fd --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/UDFTest.properties @@ -0,0 +1,19 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +#flink.sql.conf.pipeline.jars=file://J://udf//udf.jar +flink.sql.conf.pipeline.jars=file:///home/baseuser/project/udf.jar \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/WatermarkTest.properties b/fire-examples/flink-examples/src/main/resources/stream/WatermarkTest.properties new file mode 100644 index 0000000..c0ba2cf --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/WatermarkTest.properties @@ -0,0 +1,26 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
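The comment in Test.properties above describes fire's `flink.sql.with.*` convention: keys under that prefix become the connector's WITH options, and a numeric suffix (e.g. `url2`) selects a second connector. Assuming the values are substituted verbatim, the first group would expand to roughly the DDL below; the column list is invented for illustration.

```scala
object SqlWithSketch {
  // Illustrative only: the WITH options come from the flink.sql.with.* keys above,
  // the column names/types are made up.
  val jdbcDdl: String =
    """
      |CREATE TABLE users (
      |  id BIGINT,
      |  name STRING
      |) WITH (
      |  'connector' = 'jdbc',
      |  'url' = 'jdbc:mysql://localhost:3306/mydatabase',
      |  'table-name' = 'users',
      |  'password' = '123456'
      |)
      |""".stripMargin
}
```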
+# + +flink.kafka.brokers.name = test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = flink +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +spark.fire.rest.filter.enable = false +flink.stream.time.characteristic = EventTime +flink.default.parallelism = 2 +flink.hive.cluster = test \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/resources/stream/WindowTest.properties b/fire-examples/flink-examples/src/main/resources/stream/WindowTest.properties new file mode 100644 index 0000000..a14ef84 --- /dev/null +++ b/fire-examples/flink-examples/src/main/resources/stream/WindowTest.properties @@ -0,0 +1,25 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +flink.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +flink.kafka.topics = fire +flink.kafka.group.id = fire +flink.kafka.enable.auto.commit = true +flink.fire.rest.filter.enable = false +flink.default.parallelism = 8 +flink.max.parallelism = 8 \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/Test.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/Test.scala new file mode 100644 index 0000000..fd43771 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/Test.scala @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink + +import com.zto.fire._ +import com.zto.fire.common.conf.FireHiveConf +import com.zto.fire.common.util.{JSONUtils, PropUtils, StringsUtils} +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.scala._ +import org.apache.flink.table.api.SqlDialect +import org.apache.flink.table.catalog.ObjectPath + +/** + * Flink流式计算任务模板 + * + * @author ChengLong + * @since 1.0.0 + * @create 2021-01-18 17:24 + */ +object Test extends BaseFlinkStreaming { + + override def process: Unit = { + this.tableEnv.useCatalog(FireHiveConf.hiveCatalogName) + this.tableEnv.getConfig.setSqlDialect(SqlDialect.HIVE) + this.fire.sql( + """ + |insert into hive.tmp.fire select * from tmp.account + |""".stripMargin) + this.fire.sql( + """ + |select * from tmp.fire + |""".stripMargin).print() + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/acc/FlinkAccTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/acc/FlinkAccTest.scala new file mode 100644 index 0000000..456a026 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/acc/FlinkAccTest.scala @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.acc + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming +import com.zto.fire.flink.ext.function.FireMapFunction +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.scala.DataStream + +/** + * Usage of the fire-flink counters and custom accumulators + * + * @author ChengLong 2020-01-11 14:08:56 + * @since 0.4.1 + */ +object FlinkAccTest extends BaseFlinkStreaming { + + /** + * Lifecycle method: the user's business logic goes here + * Note: this method is called automatically and does not need to be invoked manually from main + */ + override def process: Unit = { + val dstream = this.fire.createCollectionStream(1 to 100) + // use the built-in counters + this.testFlinkCounter(dstream) + } + + /** + * Usage of the counters built into Fire + */ + def testFlinkCounter(dstream: DataStream[Int]): Unit = { + // FireMapFunction is more powerful than RichMapFunction and friends, and is the recommended choice + // create an anonymous FireMapFunction subclass; it supports Map, MapPartition, FlatMap and other operations + // values accumulated from different map functions are aggregated globally + dstream.map(new FireMapFunction[Int, Int]() { + override def map(value: Int): Int = { + // the multi-value counter picks the target counter by the value type, e.g. a Double argument is accumulated into the DoubleCounter + this.addCounter("LongCount", value.longValue()) + this.addCounter("IntCount", value) + this.addCounter("IntCount2", value * 2) + this.addCounter("DoubleCount", value.doubleValue()) + Thread.sleep(5000) + value + } + }) + + val result = this.fire.start + + // read the counter values + val longCount = result.getAccumulatorResult[Long]("LongCount") + println("Accumulated Long value: " + longCount) + val doubleCount = result.getAccumulatorResult[Double]("DoubleCount") + println("Accumulated Double value: " + doubleCount) + val intCount = result.getAccumulatorResult[Integer]("IntCount") + println("Accumulated IntCount value: " + intCount) + val intCount2 = result.getAccumulatorResult[Integer]("IntCount2") + println("Accumulated IntCount2 value: " + intCount2) + Thread.currentThread().join() + + this.stop + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FireMapFunctionTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FireMapFunctionTest.scala new file mode 100644 index 0000000..b94ffd0 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FireMapFunctionTest.scala @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
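For comparison with the FireMapFunction counters in FlinkAccTest above, a sketch of the plain Flink accumulator API that fire's `addCounter` presumably wraps (class and counter names here are illustrative):

```scala
import org.apache.flink.api.common.accumulators.LongCounter
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Plain-Flink equivalent of a single named counter: register it in open(), add to it in map().
class CountingMap extends RichMapFunction[Int, Int] {
  private val longCount = new LongCounter()

  override def open(parameters: Configuration): Unit =
    getRuntimeContext.addAccumulator("LongCount", longCount)

  override def map(value: Int): Int = {
    longCount.add(value.toLong)
    value
  }
}
```

The aggregated value is then read from the JobExecutionResult, e.g. `result.getAccumulatorResult[Long]("LongCount")`, exactly as FlinkAccTest does.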
+ */ + +package com.zto.fire.examples.flink.batch + +import java.lang +import java.util.UUID + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkBatch +import com.zto.fire.flink.ext.function.FireMapFunction +import org.apache.flink.api.common.state.StateTtlConfig +import org.apache.flink.api.common.time.Time +import org.apache.flink.configuration.Configuration +import org.apache.flink.util.Collector +import org.apache.flink.api.scala._ + +/** + * 用于演示FireMapFunction的使用,FireMapFunction比RichMapFunction功能更强大 + * 提供了多值计数器、常用API函数的便捷使用等,甚至同时支持:map、flatMap、mapPartition等操作 + * 内部对状态的api进行了封装,使用起来更简洁 + * + * @author ChengLong 2020-4-9 15:59:19 + */ +object FireMapFunctionTest extends BaseFlinkBatch { + lazy val dataset = this.fire.createCollectionDataSet(1 to 10) + lazy val dataset2 = this.fire.createCollectionDataSet(1 to 3) + + override def process: Unit = { + this.testMap + this.testMapPartition + this.testFlatMap + } + + /** + * 使用FireMapFunction进行Map算子操作 + */ + private def testMap: Unit = { + dataset.map(new FireMapFunction[Int, String]() { + lazy val ttlConfig = StateTtlConfig.newBuilder(Time.days(1)).build() + // 获取广播变量 + lazy val brocastValue = this.getBroadcastVariable[Int]("values") + + override def map(value: Int): String = { + // 累加器使用详见:FlinkAccTest.scala + this.addCounter("IntCount", 2) + this.addCounter("LongCount", 3L) + + // 广播变量 + this.brocastValue.foreach(println) + // 状态使用,具有懒加载的能力,根据name从缓存中获取valueState,不需要声明为成员变量或在open方法中初始化 + val valueState = this.getState[Int]("fire", ttlConfig) + valueState.update(valueState.value()) + + val listState = this.getListState[Int]("fire_list") + listState.add(value) + + val mapState = this.getMapState[Int, Int]("fire_map", ttlConfig) + mapState.put(value, value) + value.toString + } + }).withBroadcastSet(dataset2, "values").print() + } + + /** + * 使用FireMapFunction进行Map算子操作 + */ + private def testMapPartition: Unit = { + dataset.mapPartition(new FireMapFunction[Int, String]() { + override def open(parameters: Configuration): Unit = { + // 执行初始化操作,如创建数据库连接池,调用次数与并行度一致 + } + + override def mapPartition(values: lang.Iterable[Int], out: Collector[String]): Unit = { + values.iterator().foreach(i => out.collect(i.toString)) + } + + override def close(): Unit = { + // 执行清理操作,如释放数据库连接,关闭文件句柄,调用次数与并行度一致 + } + + }).print() + } + + /** + * 使用FireMapFunction进行FlatMap算子操作 + */ + private def testFlatMap: Unit = { + dataset.flatMap(new FireMapFunction[Int, String] { + override def flatMap(value: Int, out: Collector[String]): Unit = { + out.collect(value + " - " + UUID.randomUUID().toString) + } + }).print() + } + + def main(args: Array[String]): Unit = { + this.init() + this.stop + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FlinkBatchTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FlinkBatchTest.scala new file mode 100644 index 0000000..5d2c7c5 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FlinkBatchTest.scala @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
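The lazy `getState`/`getListState`/`getMapState` helpers used in FireMapFunctionTest above hide the descriptor plumbing that plain Flink requires. A sketch of that plumbing for a `ValueState` with the same one-day TTL, as it would appear inside a keyed operator (assuming FireMapFunction builds something equivalent internally):

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{StateTtlConfig, ValueState, ValueStateDescriptor}
import org.apache.flink.api.common.time.Time
import org.apache.flink.configuration.Configuration

// Descriptor-based state setup as plain Flink expects it.
class StatefulMap extends RichMapFunction[Int, Int] {
  @transient private var state: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val ttl = StateTtlConfig.newBuilder(Time.days(1)).build()
    val descriptor = new ValueStateDescriptor[Int]("fire", classOf[Int])
    descriptor.enableTimeToLive(ttl)
    state = getRuntimeContext.getState(descriptor)
  }

  override def map(value: Int): Int = {
    state.update(value) // only valid when the operator runs on a keyed stream
    value
  }
}
```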
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.flink.batch + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkBatch +import org.apache.flink.api.common.accumulators.IntCounter +import org.apache.flink.api.common.functions.RichMapFunction +import org.apache.flink.api.scala._ +import org.apache.flink.configuration.Configuration +import org.apache.flink.core.fs.FileSystem + +object FlinkBatchTest extends BaseFlinkBatch { + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + this.testAccumulator + } + + def testAccumulator: Unit = { + val result = this.fire.createCollectionDataSet(1 to 10).map(new RichMapFunction[Int, Int] { + val counter = new IntCounter() + + override def open(parameters: Configuration): Unit = { + this.getRuntimeContext.addAccumulator("myCounter", this.counter) + } + + override def map(value: Int): Int = { + this.counter.add(value) + value + } + }) + result.writeAsText("J:\\test\\flink.result", FileSystem.WriteMode.OVERWRITE) + + val result2 = this.fire.createCollectionDataSet(1 to 10).map(new RichMapFunction[Int, Int] { + override def map(value: Int): Int = { + this.getRuntimeContext.getIntCounter("myCounter").add(value) + value + } + }) + result2.writeAsText("J:\\test\\flink.result", FileSystem.WriteMode.OVERWRITE) + val count = this.fire.execute("counter").getAccumulatorResult[Int]("myCounter") + println("累加器结果:" + count) + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FlinkBrocastTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FlinkBrocastTest.scala new file mode 100644 index 0000000..7a2cc64 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/batch/FlinkBrocastTest.scala @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.batch + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkBatch +import com.zto.fire.flink.ext.function.FireMapFunction +import org.apache.flink.api.scala._ + +/** + * flink广播变量的使用 + * + * @author ChengLong 2020年2月18日 13:53:06 + */ +object FlinkBrocastTest extends BaseFlinkBatch { + + override def process: Unit = { + val ds = this.fire.createCollectionDataSet(Seq(1, 2, 3, 4, 5)) + // flink中可以广播的数据必须是Dataset + val brocastDS = this.fire.createCollectionDataSet(Seq("a", "b", "c", "d", "e")) + + ds.map(new FireMapFunction[Int, String] { + // 获取广播变量中的值给当前成员变量(若不想在open方法中获取值,请使用lazy关键字) + lazy val broadcastSet: Seq[String] = this.getBroadcastVariable[String]("brocastDS") + + override def map(value: Int): String = { + this.broadcastSet(value - 1) + } + + // 每次使用必须通过withBroadcastSet进行广播 + }).withBroadcastSet(brocastDS, "brocastDS").print() + } + + def main(args: Array[String]): Unit = { + this.init() + this.stop + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/FlinkHudiTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/FlinkHudiTest.scala new file mode 100644 index 0000000..fd9b8d4 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/FlinkHudiTest.scala @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
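The broadcast-variable pattern in FlinkBrocastTest above goes through fire's FireMapFunction; the vanilla DataSet API version accesses the broadcast set via `getRuntimeContext` directly. A minimal sketch (class name illustrative, broadcast name taken from the example):

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import scala.collection.JavaConverters._

// Vanilla broadcast-variable access: fetch the java.util.List in open() and convert it.
class BroadcastMap extends RichMapFunction[Int, String] {
  private var letters: Seq[String] = Seq.empty

  override def open(parameters: Configuration): Unit =
    letters = getRuntimeContext.getBroadcastVariable[String]("brocastDS").asScala

  override def map(value: Int): String = letters(value - 1)
}
```

As in the example, the data set still has to be attached with `.withBroadcastSet(brocastDS, "brocastDS")` on each operator that reads it.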
+ */ + +package com.zto.fire.examples.flink.connector + +import com.zto.fire.common.conf.FireKafkaConf +import com.zto.fire.flink.BaseFlinkStreaming + +object FlinkHudiTest extends BaseFlinkStreaming { + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + + var sql = + """ + |CREATE TABLE hudi_table_test( + | uuid VARCHAR(20), + | action VARCHAR(10), + | age INT, + | ts BIGINT, + | ds VARCHAR(20) + |) + |PARTITIONED BY (ds) + |WITH ( + | 'connector' = 'hudi', + | 'path' = 'hdfs:///user/flink/huditest/hudi_table_test', + | 'table.type' = 'MERGE_ON_READ', + | 'compaction.delta_commits' = '3', + | 'compaction.delta_seconds' = '300', + | 'hoodie.datasource.write.hive_style_partitioning' = 'true' + |) + |""".stripMargin + + this.tableEnv.executeSql(sql) + + sql = + s""" + |CREATE TABLE kafka_source_table ( + | uuid VARCHAR(20), + | action VARCHAR(10), + | age INT, + | ts BIGINT, + | ds VARCHAR(20) + |) WITH ( + | 'connector' = 'kafka', + | 'topic' = 'kafka_hudi_test', + | 'properties.bootstrap.servers' = '${FireKafkaConf.kafkaBrokers()}', + | 'properties.group.id' = 'testGroup', + | 'scan.startup.mode' = 'earliest-offset', + | 'format' = 'json' + |) + |""".stripMargin + + this.tableEnv.executeSql(sql) + + sql = + """ + |INSERT INTO hudi_table_test SELECT uuid,action,age,ts,ds FROM kafka_source_table + |""".stripMargin + + this.tableEnv.executeSql(sql) + + } + + def main(args: Array[String]): Unit = { + this.init() + } +} \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanConnectorTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanConnectorTest.scala new file mode 100644 index 0000000..167fbe5 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanConnectorTest.scala @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
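In FlinkHudiTest above the INSERT is submitted through `executeSql`, which launches the job immediately. If more INSERT statements were added, Flink's StatementSet can group them into a single job; a brief sketch, assuming the same TableEnvironment is passed in:

```scala
import org.apache.flink.table.api.TableEnvironment

object StatementSetSketch {
  // Sketch: group INSERTs so they are submitted as one job.
  def submitTogether(tableEnv: TableEnvironment): Unit = {
    val set = tableEnv.createStatementSet()
    set.addInsertSql("INSERT INTO hudi_table_test SELECT uuid, action, age, ts, ds FROM kafka_source_table")
    // further addInsertSql(...) calls can be appended before the single execute()
    set.execute()
  }
}
```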
+ */ + +package com.zto.fire.examples.flink.connector.bean + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming + +/** + * Flink流式计算任务模板 + * + * @author ChengLong + * @since 1.0.0 + * @create 2021-01-18 17:24 + */ +object BeanConnectorTest extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + this.fire.sql( + """ + |CREATE table source ( + | id bigint, + | name string, + | age int, + | length double, + | data DECIMAL(10, 5) + |) + |WITH + | ( + | 'connector' = 'bean', + | 'table-name' = 'source', + | 'duration' = '5000', + | 'repeat-times' = '5' + | ) + |""".stripMargin) + + this.fire.sql( + """ + |CREATE table sink ( + | id bigint, + | name string, + | age int, + | length double, + | data DECIMAL(10, 5) + |) + |WITH + | ( + | 'connector' = 'bean', + | 'table-name' = 'sink' + | ) + |""".stripMargin) + this.fire.sql( + """ + |insert into sink select * from source + |""".stripMargin) + dstream.print() + this.fire.start + } + + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableFactory.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableFactory.scala new file mode 100644 index 0000000..5365df2 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableFactory.scala @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.connector.bean + +import com.zto.fire._ +import org.apache.flink.configuration.ConfigOption +import org.apache.flink.table.connector.sink.DynamicTableSink +import org.apache.flink.table.connector.source.DynamicTableSource +import org.apache.flink.table.factories.{DynamicTableFactory, DynamicTableSinkFactory, DynamicTableSourceFactory, FactoryUtil} +import org.apache.flink.table.utils.TableSchemaUtils + +/** + * sql connector的source与sink创建工厂 + * + * @author ChengLong 2021-5-7 15:48:03 + */ +class BeanDynamicTableFactory extends DynamicTableSourceFactory with DynamicTableSinkFactory { + val IDENTIFIER = "bean" + + /** + * 告诉工厂,如何创建Table Source实例 + */ + override def createDynamicTableSource(context: DynamicTableFactory.Context): DynamicTableSource = { + val helper = FactoryUtil.createTableFactoryHelper(this, context) + val config = helper.getOptions + helper.validate() + + val physicalSchema = TableSchemaUtils.getPhysicalSchema(context.getCatalogTable.getSchema) + new BeanDynamicTableSource(physicalSchema, + config, + physicalSchema.toRowDataType) + } + + override def factoryIdentifier(): String = this.IDENTIFIER + + /** + * 必填参数列表 + */ + override def requiredOptions(): JSet[ConfigOption[_]] = { + val set = new JHashSet[ConfigOption[_]] + set.add(BeanOptions.TABLE_NAME) + set + } + + /** + * 可选的参数列表 + */ + override def optionalOptions(): JSet[ConfigOption[_]] = { + val optionalOptions = new JHashSet[ConfigOption[_]] + optionalOptions.add(BeanOptions.DURATION) + optionalOptions.add(BeanOptions.repeatTimes) + optionalOptions + } + + /** + * 创建table sink实例,在BeanDynamicTableSink中定义接收到的RowData如何sink + */ + override def createDynamicTableSink(context: DynamicTableFactory.Context): DynamicTableSink = { + val helper = FactoryUtil.createTableFactoryHelper(this, context) + val physicalSchema = TableSchemaUtils.getPhysicalSchema(context.getCatalogTable.getSchema) + val config = helper.getOptions + helper.validate() + val dataType = context.getCatalogTable.getSchema.toPhysicalRowDataType + new BeanDynamicTableSink(physicalSchema, config, dataType) + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableSink.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableSink.scala new file mode 100644 index 0000000..a309542 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableSink.scala @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
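For the `'connector' = 'bean'` option used in BeanConnectorTest to resolve to BeanDynamicTableFactory above, Flink's factory discovery relies on the standard Java SPI mechanism: the factory class must be listed in a provider-configuration file on the classpath. The path and mechanism are stock Flink; only the file's presence in this module is assumed here.

```
# src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory
com.zto.fire.examples.flink.connector.bean.BeanDynamicTableFactory
```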
+ */ + +package com.zto.fire.examples.flink.connector.bean + +import com.zto.fire.predef._ +import org.apache.flink.configuration.ReadableConfig +import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction} +import org.apache.flink.table.api.TableSchema +import org.apache.flink.table.connector.ChangelogMode +import org.apache.flink.table.connector.sink.{DynamicTableSink, SinkFunctionProvider} +import org.apache.flink.table.data.RowData +import org.apache.flink.table.types.DataType + +/** + * sql connector的sink + * @author ChengLong 2021-5-7 15:48:03 + */ +class BeanDynamicTableSink(tableSchema: TableSchema, options: ReadableConfig, dataType: DataType) extends DynamicTableSink { + override def getChangelogMode(requestedMode: ChangelogMode): ChangelogMode = ChangelogMode.insertOnly() + + override def copy(): DynamicTableSink = new BeanDynamicTableSink(tableSchema, options, dataType) + + override def asSummaryString(): JString = "bean-sink" + + /** + * 核心逻辑,定义如何将数据sink + */ + override def getSinkRuntimeProvider(context: DynamicTableSink.Context): DynamicTableSink.SinkRuntimeProvider = { + SinkFunctionProvider.of(new RichSinkFunction[RowData] { + override def invoke(value: RowData, context: SinkFunction.Context): Unit = { + println("sink---> " + value.toString) + } + }) + } +} \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableSource.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableSource.scala new file mode 100644 index 0000000..2a1617d --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanDynamicTableSource.scala @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.connector.bean + +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.examples.bean.People +import com.zto.fire.flink.util.FlinkUtils +import com.zto.fire.predef._ +import org.apache.flink.configuration.ReadableConfig +import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction} +import org.apache.flink.table.api.TableSchema +import org.apache.flink.table.connector.ChangelogMode +import org.apache.flink.table.connector.source.{DynamicTableSource, ScanTableSource, SourceFunctionProvider} +import org.apache.flink.table.data.RowData +import org.apache.flink.table.types.DataType +import org.apache.flink.table.types.logical.RowType + +/** + * 定义source table + * + * @author ChengLong 2021-5-7 15:48:03 + */ +class BeanDynamicTableSource(tableSchema: TableSchema, options: ReadableConfig, producedDataType: DataType) extends ScanTableSource { + + override def getChangelogMode: ChangelogMode = ChangelogMode.insertOnly() + + override def copy(): DynamicTableSource = new BeanDynamicTableSource(tableSchema, options, producedDataType) + + override def asSummaryString(): String = "bean" + + /** + * 核心逻辑,定义如何产生source表的数据 + */ + override def getScanRuntimeProvider(scanContext: ScanTableSource.ScanContext): ScanTableSource.ScanRuntimeProvider = { + // source table的schema + val rowType = this.tableSchema.toRowDataType.getLogicalType.asInstanceOf[RowType] + // 将自定义的source function传入 + SourceFunctionProvider.of(new BeanSourceFunction(rowType, options), false) + } + +} + +/** + * 自定义的sink function,用于通知flink sql,如何将RowData数据收集起来 + */ +class BeanSourceFunction(rowType: RowType, options: ReadableConfig) extends RichSourceFunction[RowData] { + + override def run(ctx: SourceFunction.SourceContext[RowData]): Unit = { + // 指定每次sink多久以后进行下一次的sink + val duration = options.get(BeanOptions.DURATION) + // 获取配置的重复次数,指定重发几次 + val times = options.get(BeanOptions.repeatTimes) + for (i <- 1 to times) { + People.createList().foreach(people => { + // 通过ctx收集sink的数据 + ctx.collect(FlinkUtils.bean2RowData(people, rowType)) + }) + println(s"================${DateFormatUtils.formatCurrentDateTime()}==================") + Thread.sleep(duration) + } + } + + override def cancel(): Unit = {} + +} \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanOptions.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanOptions.scala new file mode 100644 index 0000000..a7ee693 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/bean/BeanOptions.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.connector.bean + +import com.zto.fire.{JInt, JLong} +import org.apache.flink.configuration.{ConfigOption, ConfigOptions} + +/** + * 自定义sql connector支持的选项 + * + * @author ChengLong 2021-5-7 15:48:03 + */ +object BeanOptions { + val TABLE_NAME: ConfigOption[String] = ConfigOptions + .key("table-name") + .stringType + .noDefaultValue + .withDescription("The name of impala table to connect.") + + val DURATION: ConfigOption[JLong] = ConfigOptions + .key("duration") + .longType() + .defaultValue(3000L) + .withDescription("The duration of data send.") + + val repeatTimes: ConfigOption[JInt] = ConfigOptions + .key("repeat-times") + .intType() + .defaultValue(5) + .withDescription("The repeat times.") +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/kafka/KafkaConsumer.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/kafka/KafkaConsumer.scala new file mode 100644 index 0000000..df071d1 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/kafka/KafkaConsumer.scala @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.connector.kafka + +import com.zto.fire._ +import com.zto.fire.common.conf.FireKafkaConf +import com.zto.fire.flink.BaseFlinkStreaming + +object KafkaConsumer extends BaseFlinkStreaming { + + override def process: Unit = { + // this.insertPrint + this.streamJoin + } + + def streamJoin: Unit = { + val table = this.flink.sql( + s""" + |CREATE TABLE kafka ( + | id int, + | name string, + | age int, + | length string, + | before row, + | code as before.bill_code, + | bage as before.bage, + | sex boolean + |) WITH ( + | 'connector' = 'kafka', + | 'topic' = 'fire', + | 'properties.bootstrap.servers' = '${FireKafkaConf.kafkaBrokers()}', + | 'properties.group.id' = 'fire', + | 'scan.startup.mode' = 'latest-offset', + | 'value.format' = 'json' + |) + |""".stripMargin) + + this.flink.sql( + s""" + |CREATE TABLE kafka2 ( + | id int, + | name string, + | age int, + | length string, + | before row, + | code as before.bill_code, + | bage as before.bage, + | sex boolean + |) WITH ( + | 'connector' = 'kafka', + | 'topic' = 'fire2', + | 'properties.bootstrap.servers' = '${FireKafkaConf.kafkaBrokers()}', + | 'properties.group.id' = 'fire2', + | 'scan.startup.mode' = 'latest-offset', + | 'value.format' = 'json' + |) + |""".stripMargin) + + this.fire.sql( + """ + |create view kafka_join + |as + |select + | k1.id, + | k2.name, + | k2.before.bill_code as bill_code, + | k1.bage, + | k2.bage + |from kafka k1 left join kafka2 k2 + | on k1.before.bill_code=k2.code + |where k1.bage > 10 + |""".stripMargin) + + this.fire.sql( + """ + |select * from kafka_join + |""".stripMargin).print() + } + + def insertPrint: Unit = { + this.flink.sql( + s""" + |CREATE TABLE kafka ( + | id int, + | name string, + | age int, + | length string, + | before row, + | -- code as before.bill_code, + | -- bage as before.bage, + | sex boolean + |) WITH ( + | 'connector' = 'kafka', + | 'topic' = 'fire', + | 'properties.bootstrap.servers' = '${FireKafkaConf.kafkaBrokers()}', + | 'properties.group.id' = 'fire', + | 'scan.startup.mode' = 'latest-offset', + | 'value.format' = 'json' + |) + |""".stripMargin) + + this.fire.sql( + """ + |create table `print` with('connector' = 'print') like kafka (EXCLUDING ALL) + |""".stripMargin) + + this.fire.sql( + """ + |insert into print select * from kafka + |""".stripMargin) + } + + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketMQConnectorTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketMQConnectorTest.scala new file mode 100644 index 0000000..7a5af20 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketMQConnectorTest.scala @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
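The CREATE TABLE statements in KafkaConsumer above declare a column as just `before row`, which reads as if the nested type parameters were lost when this patch was rendered. Given that `before.bill_code` and `before.bage` are referenced by the computed columns and the join, the intended declaration was presumably something like the sketch below; the field types are guesses.

```scala
object RowTypeSketch {
  // Hypothetical reconstruction of the truncated column declarations; STRING/INT are assumed types.
  val beforeColumns: String =
    """
      |before ROW<bill_code STRING, bage INT>,
      |code as before.bill_code,
      |bage as before.bage,
      |""".stripMargin
}
```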
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.flink.connector.rocketmq + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming + +/** + * Flink流式计算任务模板 + * + * @author ChengLong + * @since 1.0.0 + * @create 2021-01-18 17:24 + */ +object RocketMQConnectorTest extends BaseFlinkStreaming { + + override def process: Unit = { + this.fire.sql(""" + |CREATE table source ( + | id bigint, + | name string, + | age int, + | length double, + | data DECIMAL(10, 5) + |) WITH + | ( + | 'connector' = 'fire-rocketmq', + | 'format' = 'json', + | 'rocket.brokers.name' = 'ZmsClusterX', + | 'rocket.topics' = 'fire', + | 'rocket.group.id' = 'fire', + | 'rocket.consumer.tag' = '*' + | ) + |""".stripMargin) + + this.fire.sql( + """ + |select * from source + |""".stripMargin).print() + } + + + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketTest.scala new file mode 100644 index 0000000..89d3994 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/connector/rocketmq/RocketTest.scala @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.connector.rocketmq + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming + +/** + * Flink流式计算任务消费rocketmq + * + * @author ChengLong + * @since 2.0.0 + * @create 2021-5-13 14:26:24 + */ +object RocketTest extends BaseFlinkStreaming { + + override def process: Unit = { + this.fire.createRocketMqPullStreamWithTag().print() + // this.fire.createRocketMqPullStreamWithKey() + // this.fire.createRocketMqPullStream() + + // 从另一个rocketmq中消费数据 + this.fire.createRocketMqPullStream(keyNum = 2).print() + this.fire.start + } + + + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkHiveTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkHiveTest.scala new file mode 100644 index 0000000..1247445 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkHiveTest.scala @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
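RocketTest above consumes from a second RocketMQ cluster via `createRocketMqPullStream(keyNum = 2)`. The numbered-suffix convention visible elsewhere in these properties files (e.g. `flink.sql.with.url2`, `flink.hbase.cluster2`) suggests the second consumer is configured through suffixed keys; a hedged sketch of what the corresponding entries in RocketTest.properties might look like (key names assumed from that convention, values invented):

```
flink.rocket.brokers.name = localhost:9876
flink.rocket.topics = fire
flink.rocket.group.id = fire

# assumed convention: keyNum = 2 reads the keys suffixed with 2
flink.rocket.brokers.name2 = localhost:9877
flink.rocket.topics2 = fire2
flink.rocket.group.id2 = fire2
```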
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.JSONUtils +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.scala._ + + +/** + * flink 整合hive的例子,在流中join hive数据 + * + * @author ChengLong 2020年4月3日 09:05:53 + */ +object FlinkHiveTest extends BaseFlinkStreaming { + + override def process: Unit = { + // 第三个参数需指定hive-site.xml具体的目录路径 + val dstream = this.fire.createKafkaDirectStream().map(t => JSONUtils.parseObject[Student](t)) + // 调用startNewChain与setParallelism一样,都有会导致使用新的slotGroup,也都是作用于点之前的算子 + // startNewChain后,前面的那个算子会使用default的parallelism + dstream.filter(s => s != null).startNewChain().map(s => { + Thread.sleep(1000 * 60) + s + }).createOrReplaceTempView("kafka") + this.flink.sql("select * from kafka").print() + // 查询操作 + this.flink.sql("select * from tmp.zto_scan_send order by bill_code limit 10")//.createOrReplaceTempView("scan_send") + val joinedTable = this.flink.sql("select t1.bill_code, t2.name from scan_send t1 left join kafka t2 on t1.bill_code=t2.name") + + this.fire.start + } + + override def before(args: Array[String]): Unit = { + if (args != null) { + args.foreach(x => println("main方法参数:" + x)) + } + } + + def main(args: Array[String]): Unit = { + this.init(args = args) + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkPartitioner.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkPartitioner.scala new file mode 100644 index 0000000..b03fc8c --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkPartitioner.scala @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.common.functions.Partitioner +import org.apache.flink.api.scala._ + +/** + * flink重分区 + * + * @author ChengLong 2020-4-10 09:50:26 + */ +object FlinkPartitioner extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.createCollectionStream(1 to 10) + // 将当前所有输出值都输出到下游算子的第一个实例中,会导致严重的性能问题,谨慎使用 + dstream.global.print() + // 将当前输出中的每一条记录随机输出到下游的每一个实例中,可显著解决数据倾斜问题 + dstream.shuffle.print() + // 将当前输出以循环的方式输出到下游算子的每一个实例中,可显著解决数据倾斜问题,比shuffle方式分配的更均匀 + dstream.rebalance.print() + // 基于上下游Operator的并行度,将记录以循环的方式输出到下游Operator的每个实例。举例: 上游并行度是2,下游是4, + // 则上游一个并行度以循环的方式将记录输出到下游的两个并行度上;上游另一个并行度以循环的方式将记录输出到下游另两个并行度上。 + // 若上游并行度是4,下游并行度是2,则上游两个并行度将记录输出到下游一个并行度上;上游另两个并行度将记录输出到下游另一个并行度上 + // 相当于小范围的rebalance操作 + dstream.rescale.print() + // 将上游数据全部输出到下游每一个算子的实例中,适合于大数据集Join小数据集的场景 + dstream.broadcast.print() + // 将记录输出到下游本地的operator实例,ForwardPartitioner分区器要求上下游算子并行度一样,上下游Operator同属一个SubTasks + dstream.forward.print() + // 将记录按Key的Hash值输出到下游Operator实例 + // dstream.map(t => (t, t)).keyBy(KeySelector[Int, Int]()) + // 自定义分区,需继承Partitioner并实现自己的partition分区算法 + dstream.map(t => (t, t)).partitionCustom(new HashPartitioner, 0).print() + this.fire.start + + this.stop + } + + /** + * Flink自定义分区 + */ + class HashPartitioner extends Partitioner[Int] { + override def partition(key: Int, numPartitions: Int): Int = { + if (key % 2 == 0) { + 0 + } else { + 1 + } + } + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkRetractStreamTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkRetractStreamTest.scala new file mode 100644 index 0000000..76ca879 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkRetractStreamTest.scala @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
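The commented-out keyBy line in FlinkPartitioner above hints at hash partitioning by key; with the standard Scala API this is expressed with a key-selector function. A minimal sketch:

// keyBy hash-partitions records by the extracted key, so equal keys always land on the same subtask
dstream.map(t => (t, t)).keyBy(_._1).sum(1).print()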
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.JSONUtils +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.scala._ +import org.apache.flink.types.Row + +object FlinkRetractStreamTest extends BaseFlinkStreaming { + + val tableName = "spark_test" + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream().map(json => JSONUtils.parseObject[Student](json)).shuffle + dstream.createOrReplaceTempView("student") + val table = this.fire.sqlQuery("select name, age, createTime, length, sex from student group by name, age, createTime, length, sex") + + val fields = "name, age, createTime, length, sex" + val sql = s"INSERT INTO $tableName ($fields) VALUES (?, ?, ?, ?, ?)" + // 方式一、table中的列顺序和类型需与jdbc sql中的占位符顺序保持一致 + table.jdbcBatchUpdate(sql, keyNum = 10) + // 方式二、自定义row取数规则,该种方式较灵活,可定义取不同的列,顺序仍需与sql占位符保持一致 + table.jdbcBatchUpdate2(sql, batch = 10, flushInterval = 10000, keyNum = 10)(row => Seq(row.getField(0), row.getField(1), row.getField(2), row.getField(3), row.getField(4))) + + // toRetractStream支持状态更新、删除操作,比例sql中含有group by 等聚合操作,后进来的记录会导致已有的聚合结果不正确 + // 使用toRetractStream后会将之前的旧的聚合结果重新发送一次,并且tuple中的flag标记为false,然后再发送一条正确的结果 + // 类似于structured streaming中自动维护结果表,并进行update操作 + this.tableEnv.toRetractStream[Row](table).print() + + this.fire.start + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkSinkTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkSinkTest.scala new file mode 100644 index 0000000..eba8a8f --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkSinkTest.scala @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
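Since toRetractStream emits (flag, row) tuples in which flag = false marks a retracted (superseded) aggregate result, sinks that only care about the latest values usually drop the retractions first. A minimal sketch following the table defined above:

this.tableEnv.toRetractStream[Row](table)
  .filter(_._1) // keep insert/update messages, drop retraction messages (flag = false)
  .map(_._2)    // unwrap the Row
  .print()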
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.JSONUtils +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.scala._ +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction} + +/** + * 自定义sink的实现 + */ +object FlinkSinkTest extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.createDirectStream().map(json => JSONUtils.parseObject[Student](json)) + dstream.map(t => t.getName).addSink(new MySink).setParallelism(1) + + this.fire.start + } + + def main(args: Array[String]): Unit = { + this.init() + } +} + +class MySink extends RichSinkFunction[String] { + + /** + * open方法中可以创建数据库连接等初始化操作 + * 注:若setParallelism(10)则会执行10次open方法 + */ + override def open(parameters: Configuration): Unit = { + println("=========执行open方法========") + } + + /** + * close方法用于释放资源,如数据库连接等 + */ + override def close(): Unit = { + println("=========执行close方法========") + } + + override def invoke(value: String, context: SinkFunction.Context): Unit = { + println("---> " + value) + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkSourceTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkSourceTest.scala new file mode 100644 index 0000000..35bf0e1 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkSourceTest.scala @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
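open/close in MySink above are the natural place for connection and buffer management. The sketch below extends the same idea to a buffering sink that flushes on a size threshold and again in close(), so nothing is lost when the task shuts down (println stands in for a real external system):

class BufferedSink(batchSize: Int) extends RichSinkFunction[String] {
  private lazy val buffer = new scala.collection.mutable.ArrayBuffer[String]()

  override def invoke(value: String, context: SinkFunction.Context): Unit = {
    this.buffer += value
    if (this.buffer.size >= batchSize) this.flush()
  }

  // flush whatever is still buffered when the task is stopped or cancelled
  override def close(): Unit = this.flush()

  private def flush(): Unit = if (this.buffer.nonEmpty) {
    println(s"---> flushing ${this.buffer.size} records")
    this.buffer.clear()
  }
}

Usage mirrors MySink: dstream.map(t => t.getName).addSink(new BufferedSink(100)).setParallelism(1).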
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.scala._ +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.functions.source.{RichParallelSourceFunction, SourceFunction} +import org.apache.flink.streaming.api.windowing.time.Time + +/** + * 自定义source + * @author ChengLong 2020-4-7 14:30:08 + */ +object FlinkSourceTest extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.addSource(new MySource).setParallelism(2) + // 注意Time的包不要导错,来自org.apache.flink.streaming.api.windowing.time.Time + dstream.timeWindowAll(Time.seconds(2)).sum(0).setParallelism(1).print + + this.fire.start + } + + def main(args: Array[String]): Unit = { + this.init() + } +} + +/** + * 自定义source组件 + * 支持多并行度 + */ +class MySource extends RichParallelSourceFunction[Long] { + private var isRunning = false + private var index = 1 + + /** + * open方法中可以创建数据库连接等初始化操作 + * 注:若setParallelism(10)则会执行10次open方法 + */ + override def open(parameters: Configuration): Unit = { + this.isRunning = true + println("=========执行open方法========") + } + + /** + * 持续不断的将消息发送给flink + * @param ctx + */ + override def run(ctx: SourceFunction.SourceContext[Long]): Unit = { + while (this.isRunning) { + this.index += 1 + ctx.collect(this.index) + Thread.sleep(1000) + } + } + + /** + * 当任务被cancel时调用 + */ + override def cancel(): Unit = { + this.isRunning = false + println("=========执行cancel方法==========") + } + + /** + * close方法用于释放资源,如数据库连接等 + */ + override def close(): Unit = { + println("=========执行close方法==========") + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkStateTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkStateTest.scala new file mode 100644 index 0000000..e18d2b9 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkStateTest.scala @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.flink.BaseFlinkStreaming +import com.zto.fire.flink.ext.function.FireMapFunction +import org.apache.flink.api.common.functions.RichAggregateFunction +import org.apache.flink.api.common.state.StateTtlConfig +import org.apache.flink.api.common.time.Time +import org.apache.flink.api.scala._ + +/** + * 用于演示基于FireMapFunction的状态使用 + * 本例演示KeyedStream相关的状态使用,也就是说stream是通过keyBy分组过的 + * 1. 由于经过keyBy算子进行了分组,因此相同key的算子都会跑到同一个subtask中执行,并行度的改变也就不会影响状态数据的一致性 + * 2. 
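MySource above runs one copy per parallel subtask, each with its own isRunning and index fields. Tagging emitted values with the subtask index makes that visible; a minimal sketch using the same RichParallelSourceFunction API:

class IndexedSource extends RichParallelSourceFunction[String] {
  @volatile private var isRunning = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    // each parallel instance reports its own subtask index
    val subtask = getRuntimeContext.getIndexOfThisSubtask
    var i = 0
    while (this.isRunning) {
      i += 1
      ctx.collect(s"subtask-$subtask -> $i")
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = this.isRunning = false
}

Declaring isRunning as @volatile matters because cancel() is invoked from a different thread than run().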
状态在不同的task之间是隔离的,也就是说对同一个keyedStream进行多次map操作,每个map中的状态是不一样的,是隔离开来的 + * + * @author ChengLong 2021年1月5日09:13:50 + * @since 2.0.0 + */ +object FlinkStateTest extends BaseFlinkStreaming { + // 将dstream声明为成员变量时,一定要加lazy关键字,避免env还没初始化导致空指针异常 + lazy val dstream = this.fire.createCollectionStream(Seq((1, 1), (1, 2), (1, 3), (1, 6), (1, 9), (2, 1), (2, 2), (3, 1))).keyBy(0) + + /** + * 一、基于FireMapFunction演示ValueState、ListState、MapState的使用 + */ + private def testSimpleState: Unit = { + this.dstream.map(new FireMapFunction[(Int, Int), Int]() { + // 定义状态的ttl时间,如果不在open方法中定义 + lazy val ttlConfig = StateTtlConfig.newBuilder(Time.days(1)).build() + // 如果状态放到成员变量中声明,则需加lazy关键字 + lazy val listState = this.getListState[Int]("list_state") + + override def map(value: (Int, Int)): Int = { + // FireMapFunction中提供的API,通过名称获取对应的状态实例,该API具有缓存的特性 + // 因此不需要放到open或声明为成员变量,每次直接通过this.getXxxState即可获取同一实例 + // 第一个参数是状态实例名称,不可重复,ttlConfig参数如果不指定,则默认不启用ttl,生产环境强烈建议开启 + // 1. ValueState与KeyedStream中的每个key是一一对应的 + val valueState = this.getState[Int]("value_state", ttlConfig) + valueState.update(value._2 + valueState.value()) + logger.warn(s"key=${value._1} 状态结果:" + valueState.value()) + Thread.sleep(10000) + + // 2. 获取ListState,该状态的特点是KeyedStream中的每个key都单独对应一个List集合 + listState.add(value._2) + listState.add(value._2 + 1) + + // 3. 获取mapState,该状态的特点是KeyedStream中的每个key都单独对应一个Map集合 + val mapState = this.getMapState[Int, Int]("map_state", ttlConfig) + mapState.put(value._1, value._2) + mapState.put(value._1 + 1, value._2) + + value._2 + } + }).uname(uid = "simpleState", name = "状态累加") // 通过uname进行uid与name的指定 + } + + /** + * 二、基于FireMapFunction演示AggregatingState、getReducingState的使用 + */ + private def testFunctionState: Unit = { + this.dstream.map(new FireMapFunction[(Int, Int), Int]() { + // 1. ReducingState状态演示,将Int类型数据保存到状态中 + // 该ReduceFunction中定义的逻辑是将当前状态中的值与传入的新值进行累加,然后重新update到状态中 + // 方法的第二个参数是reduce的具体逻辑,本示例演示的是累加 + lazy val reduceState = this.getReducingState[Int]("reduce_state", (a: Int, b: Int) => a + b) + + // 2. AggregatingState状态使用,将Int类型数据保存到状态中 + // 需要创建AggregateFunction,泛型意义依次为:输入数据类型、累加器类型,聚合结果类型 + lazy val aggrState = this.getAggregatingState[Int]("aggr_state", this.newRichAggregateFunction) + + override def map(value: (Int, Int)): Int = { + // 1. reduceState状态使用 + this.reduceState.add(value._2) + this.logger.warn(s"reduceState当前结果:key=${value._1} state=${this.reduceState.get()}") + + // 2. 
AggregatingState状态使用 + this.aggrState.add(value) + this.aggrState.get() + this.logger.warn(s"aggrState当前结果:key=${value._1} state=${this.aggrState.get()}") + + value._2 + } + + /** + * 创建一个RichAggregateFunction的子类 + * 在该子类中构建AggregateFunction对象,并定义好聚合的逻辑 + * 定义将输入数据与状态中的数据进行累加 + */ + def newRichAggregateFunction: RichAggregateFunction[(Int, Int), Int, Int] = { + new RichAggregateFunction[(Int, Int), Int, Int]() { + /** 迭代状态的初始值 */ + override def createAccumulator(): Int = 0 + + /** 每一条输入数据,和迭代数据如何迭代 */ + override def add(value: (Int, Int), accumulator: Int): Int = value._2 + accumulator + + /** 返回数据,对最终的迭代数据如何处理,并返回结果 */ + override def getResult(accumulator: Int): Int = accumulator + + /** 多个分区的迭代数据如何合并 */ + override def merge(a: Int, b: Int): Int = a + b + } + } + }).uname("testFunctionState") + } + + /** + * 三、演示mapWithState的使用 + */ + def testWithState: Unit = { + // [String, Int]分表表示map后类型与状态的类型 + // 每个case中返回值中Some(xxx)中的xxx就是下一次同样key的数据进来以后状态获取到的数据 + // 也就是自动将Some(xxx)中的xxx数据update到ValueState中 + // 本例是将上一次的状态与当前进入的值进行累加,更新到状态中 + this.dstream.mapWithState[String, Int]({ + // 当第一次进入,状态中没有值时,给当前value + case (value: (Int, Int), None) => { + logger.warn(s"状态为空:当前key=${value._1} value=${value._2}") + (value._1.toString, Some(value._2)) + } + // 后续进入,状态中有值时,则累加当前进入的数据到状态中 + case (value: (Int, Int), state: Some[Int]) => { + // 从state中get到的数据是上一次同一个key的sum值,因此通过state.get的值总是滞后于sum的 + val sum = value._2 + state.get + logger.warn(s"当前key=${value._1} value=${value._2} state=${state.get} sum=$sum") + (value._1.toString, Some(sum)) + } + }).uid("flatMapWithState").name("计算状态") + } + + /** + * 业务逻辑处理,该方法会被fire自动调用,可避免main方法中代码过于臃肿 + */ + override def process: Unit = { + this.fire.setParallelism(3) + // 演示ValueState、ListState、MapState的使用 + this.testSimpleState + // 演示AggregatingState、getReducingState的使用 + // this.testFunctionState + // 演示mapWithState的使用 + // this.testWithState + + this.fire.start("Flink State Test") + } + + def main(args: Array[String]): Unit = { + this.init() + this.stop + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkTest.scala new file mode 100644 index 0000000..748fe64 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/FlinkTest.scala @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
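For comparison with the FireMapFunction helpers used above, the same per-key running sum written against Flink's plain keyed-state API looks roughly like this (a minimal sketch; it assumes the input is a KeyedStream, as this.dstream is):

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration

class SumMapFunction extends RichMapFunction[(Int, Int), Int] {
  @transient private var sumState: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val descriptor = new ValueStateDescriptor[Int]("sum_state", createTypeInformation[Int])
    // a StateTtlConfig can be attached here via descriptor.enableTimeToLive(...)
    this.sumState = getRuntimeContext.getState(descriptor)
  }

  override def map(value: (Int, Int)): Int = {
    val sum = this.sumState.value() + value._2
    this.sumState.update(sum)
    sum
  }
}

Used as this.dstream.map(new SumMapFunction).print(), it keeps one ValueState entry per key, just like the value_state example above.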
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.{JSONUtils, PropUtils} +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import com.zto.fire.flink.util.FlinkUtils +import org.apache.flink.api.scala._ +import org.apache.flink.types.Row + +object FlinkTest extends BaseFlinkStreaming { + + /** + * 生命周期方法:具体的用户开发的业务逻辑代码 + * 注:此方法会被自动调用,不需要在main中手动调用 + */ + override def process: Unit = { + /*val dstream = this.fire.createKafkaDirectStream().filter(str => JsonUtils.isJson(str)).map(json => { + JsonUtils.parseObject[Student](json) + }).setParallelism(2) + + dstream.createOrReplaceTempView("student") + val table = this.fire.sqlQuery("select * from student") + println("fire.rest.url========>" + PropUtils.getString("fire.rest.url", "not_found")) + // toRetractStream支持状态更新、删除操作,比例sql中含有group by 等聚合操作,后进来的记录会导致已有的聚合结果不正确 + // 使用toRetractStream后会将之前的旧的聚合结果重新发送一次,并且tuple中的flag标记为false,然后再发送一条正确的结果 + // 类似于structured streaming中自动维护结果表,并进行update操作 + this.tableEnv.toRetractStream[Row](table).map(t => t._2).addSink(row => { + println("fire.rest.url========>" + PropUtils.getString("fire.rest.url", "not_found")) + println("是否为TaskManager========>" + FlinkUtils.isJobManager) + println("运行模式========>" + FlinkUtils.runMode) + })*/ + this.fire.createKafkaDirectStream().map(t => { + this.logger.info(t) + }).print() + + // 不指定job name,则默认当前类名 + // this.fire.start + this.fire.start("Fire Test") + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/HBaseTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/HBaseTest.scala new file mode 100644 index 0000000..71050a2 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/HBaseTest.scala @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.JSONUtils +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import com.zto.fire.flink.util.FlinkUtils +import com.zto.fire.hbase.HBaseConnector +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.scala.DataStream + +import scala.collection.mutable.ListBuffer + +/** + * flink hbase sink + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-5-25 16:32:50 + */ +object HBaseTest extends BaseFlinkStreaming { + lazy val tableName = "fire_test_1" + lazy val tableName2 = "fire_test_2" + lazy val tableName3 = "fire_test_3" + lazy val tableName5 = "fire_test_5" + lazy val tableName6 = "fire_test_6" + lazy val tableName7 = "fire_test_7" + lazy val tableName8 = "fire_test_8" + lazy val tableName9 = "fire_test_9" + lazy val tableName10 = "fire_test_10" + lazy val tableName11 = "fire_test_11" + lazy val tableName12 = "fire_test_12" + + /** + * table的hbase sink + */ + def testTableHBaseSink(stream: DataStream[Student]): Unit = { + stream.createOrReplaceTempView("student") + val table = this.flink.sqlQuery("select id, name, age from student group by id, name, age") + // 方式一、自动将row转为对应的JavaBean + // 注意:table对象上调用hbase api,需要指定泛型 + table.hbasePutTable[Student](this.tableName).setParallelism(1) + this.fire.hbasePutTable[Student](table, this.tableName2, keyNum = 2) + + // 方式二、用户自定义取数规则,从row中创建HBaseBaseBean的子类 + table.hbasePutTable2[Student](this.tableName3)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) + // 或者 + this.fire.hbasePutTable2[Student](table, this.tableName5, keyNum = 2)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) + } + + /** + * table的hbase sink + */ + def testTableHBaseSink2(stream: DataStream[Student]): Unit = { + val table = this.fire.sqlQuery("select id, name, age from student group by id, name, age") + + // 方式二、用户自定义取数规则,从row中创建HBaseBaseBean的子类 + table.hbasePutTable2(this.tableName6)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) + // 或者 + this.flink.hbasePutTable2(table, this.tableName7, keyNum = 2)(row => new Student(1L, row.getField(1).toString, row.getField(2).toString.toInt)) + } + + /** + * stream hbase sink + */ + def testStreamHBaseSink(stream: DataStream[Student]): Unit = { + // 方式一、DataStream中的数据类型为HBaseBaseBean的子类 + // stream.hbasePutDS(this.tableName) + this.fire.hbasePutDS[Student](stream, this.tableName8) + + // 方式二、将value组装为HBaseBaseBean的子类,逻辑用户自定义 + stream.hbasePutDS2(this.tableName9, keyNum = 2)(value => value) + // 或者 + this.fire.hbasePutDS2(stream, this.tableName10)(value => value) + } + + /** + * stream hbase sink + */ + def testStreamHBaseSink2(stream: DataStream[Student]): Unit = { + // 方式二、将value组装为HBaseBaseBean的子类,逻辑用户自定义 + stream.hbasePutDS2(this.tableName11)(value => value) + // 或者 + this.fire.hbasePutDS2(stream, this.tableName12, keyNum = 2)(value => value) + } + + /** + * hbase的基本操作 + */ + def testHBase: Unit = { + // get操作 + val getList = ListBuffer(HBaseConnector.buildGet("1")) + val student = HBaseConnector.get(this.tableName, classOf[Student], getList, 1) + if (student != null) println(JSONUtils.toJSONString(student)) + // scan操作 + val studentList = HBaseConnector.scan(this.tableName, classOf[Student], HBaseConnector.buildScan("0", "9"), 1) + if (studentList != null) println(JSONUtils.toJSONString(studentList)) + // delete操作 + HBaseConnector.deleteRows(this.tableName, Seq("1")) + 
} + + + override def process: Unit = { + val stream = this.fire.createKafkaDirectStream().filter(t => JSONUtils.isLegal(t)).map(json => JSONUtils.parseObject[Student](json)).setParallelism(1) + HBaseConnector.truncateTable(this.tableName) + HBaseConnector.truncateTable(this.tableName2) + HBaseConnector.truncateTable(this.tableName3) + HBaseConnector.truncateTable(this.tableName5) + this.testTableHBaseSink(stream) + this.testStreamHBaseSink(stream) + this.testStreamHBaseSink2(stream) + this.testTableHBaseSink2(stream) + this.testHBase + + this.fire.start + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/JdbcTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/JdbcTest.scala new file mode 100644 index 0000000..911670e --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/JdbcTest.scala @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.{DateFormatUtils, JSONUtils, PropUtils} +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import com.zto.fire.flink.util.FlinkUtils +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.scala.DataStream + +/** + * flink jdbc sink + * + * @author ChengLong + * @since 1.1.0 + * @create 2020-05-22 11:10 + */ +object JdbcTest extends BaseFlinkStreaming { + lazy val tableName = "spark_test" + lazy val tableName2 = "spark_test2" + + val fields = "name, age, createTime, length, sex".split(",") + + def sql(tableName: String): String = s"INSERT INTO $tableName (${fields.mkString(",")}) VALUES (?, ?, ?, ?, ?)" + + /** + * table的jdbc sink + */ + def testTableJdbcSink(stream: DataStream[Student]): Unit = { + stream.createOrReplaceTempView("student") + val table = this.fire.sqlQuery("select name, age, createTime, length, sex from student group by name, age, createTime, length, sex") + + // 方式一、table中的列顺序和类型需与jdbc sql中的占位符顺序保持一致 + table.jdbcBatchUpdate(sql(this.tableName)).setParallelism(1) + // 或者 + this.fire.jdbcBatchUpdateTable(table, sql(this.tableName2)).setParallelism(1) + + // 方式二、自定义row取数规则,适用于row中的列个数和顺序与sql占位符不一致的情况 + table.jdbcBatchUpdate2(sql(this.tableName), flushInterval = 10000, keyNum = 2)(row => { + Seq(row.getField(0), row.getField(1), row.getField(2), row.getField(3), row.getField(4)) + }) + // 或者 + this.flink.jdbcBatchUpdateTable2(table, sql(this.tableName2), keyNum = 2)(row => { + Seq(row.getField(0), row.getField(1), row.getField(2), row.getField(3), row.getField(4)) + }).setParallelism(1) + } + + /** + * stream jdbc sink + */ + def testStreamJdbcSink(stream: DataStream[Student]): Unit = { + // 方式一、指定字段列表,内部根据反射,自动获取DataStream中的数据并填充到sql中的占位符 + // 此处fields有两层含义:1. sql中的字段顺序(对应表) 2. 
DataStream中的JavaBean字段数据(对应JavaBean) + // 注:要保证DataStream中字段名称是JavaBean的名称,非表中字段名称 顺序要与占位符顺序一致,个数也要一致 + stream.jdbcBatchUpdate(sql(this.tableName), fields, keyNum = 6).setParallelism(3) + // 或者 + this.fire.jdbcBatchUpdateStream(stream, sql(this.tableName2), fields, keyNum = 6).setParallelism(1) + + // 方式二、通过用户指定的匿名函数方式进行数据的组装,适用于上面方法无法反射获取值的情况,适用面更广 + stream.jdbcBatchUpdate2(sql(this.tableName), 3, 30000, keyNum = 7) { + // 在此处指定取数逻辑,定义如何将dstream中每列数据映射到sql中的占位符 + value => Seq(value.getName, value.getAge, DateFormatUtils.formatCurrentDateTime(), value.getLength, value.getSex) + }.setParallelism(1) + + // 或者 + this.flink.jdbcBatchUpdateStream2(stream, sql(this.tableName2), keyNum = 7) { + value => Seq(value.getName, value.getAge, DateFormatUtils.formatCurrentDateTime(), value.getLength, value.getSex) + }.setParallelism(2) + } + + def testJdbc: Unit = { + // 执行查询操作 + val studentList = this.flink.jdbcQuery(s"select * from $tableName", clazz = classOf[Student]) + val dataStream = this.env.fromCollection(studentList) + dataStream.print() + + // 执行增删改操作 + this.flink.jdbcUpdate(s"delete from $tableName") + } + + /** + * 用于测试分布式配置 + */ + def logConf: Unit = { + println(s"isJobManager=${FlinkUtils.isJobManager} isTaskManager=${FlinkUtils.isTaskManager} hello.world=" + PropUtils.getString("hello.world", "not_found")) + println(s"isJobManager=${FlinkUtils.isJobManager} isTaskManager=${FlinkUtils.isTaskManager} hello.world.flag=" + PropUtils.getBoolean("hello.world.flag", false)) + println(s"isJobManager=${FlinkUtils.isJobManager} isTaskManager=${FlinkUtils.isTaskManager} hello.world.flag2=" + PropUtils.getBoolean("hello.world.flag", false, keyNum = 2)) + } + + override def process: Unit = { + this.logConf + val stream = this.fire.createKafkaDirectStream().filter(t => JSONUtils.isLegal(t)).map(json => { + this.logConf + JSONUtils.parseObject[Student](json) + }) + this.testTableJdbcSink(stream) + this.testStreamJdbcSink(stream) + this.testJdbc + + this.fire.start("JdbcTest") + } + + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/UDFTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/UDFTest.scala new file mode 100644 index 0000000..1843ca1 --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/UDFTest.scala @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
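All of the jdbcBatchUpdate* variants above come down to buffering rows and flushing them as one JDBC batch when the batch size or flushInterval is reached. A plain-JDBC sketch of that idea, for illustration only (assumed behaviour, not fire's actual implementation; jdbcUrl, user and password are placeholders):

import java.sql.DriverManager

val conn = DriverManager.getConnection(jdbcUrl, user, password) // hypothetical connection settings
val ps = conn.prepareStatement(sql(this.tableName))
Student.newStudentList().foreach { s =>
  ps.setString(1, s.getName)
  ps.setInt(2, s.getAge)
  ps.setString(3, DateFormatUtils.formatCurrentDateTime())
  ps.setBigDecimal(4, s.getLength)
  ps.setBoolean(5, s.getSex)
  ps.addBatch()
}
ps.executeBatch() // one round trip per batch instead of one per record
ps.close()
conn.close()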
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.scala._ +import org.apache.flink.table.functions.ScalarFunction + +/** + * 自定义udf测试 + * + * @author ChengLong 2020年1月13日 10:36:39 + * @since 0.4.1 + */ +object UDFTest extends BaseFlinkStreaming { + override def process: Unit = { + this.flink.setParallelism(10) + val dataset = this.flink.createCollectionStream(Student.newStudentList()).map(t => t).setParallelism(5) + this.tableEnv.registerDataStream("test", dataset) + // 注册udf + this.tableEnv.createTemporarySystemFunction("appendFire", classOf[Udf]) + // 在sql中使用自定义的udf + this.flink.sql("select fireUdf(name), fireUdf(age) from test").print() + dataset.print("dataset") + + this.flink.execute() + } + + def main(args: Array[String]): Unit = { + this.init() + } + +} + + +class Udf extends ScalarFunction { + /** + * 为指定字段的值追加fire字符串 + * + * @param field + * 字段名称 + * @return + * 追加fire字符串后的字符串 + */ + def eval(field: String): String = field + "->fire" + + /** + * 支持函数的重载,会自动判断输入字段的类型调用相应的函数 + */ + def eval(field: JInt): String = field + "-> Int fire" +} \ No newline at end of file diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/WatermarkTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/WatermarkTest.scala new file mode 100644 index 0000000..7194afb --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/WatermarkTest.scala @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.{DateFormatUtils, JSONUtils} +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import com.zto.fire.flink.ext.watermark.FirePeriodicWatermarks +import org.apache.commons.lang3.StringUtils +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.scala.OutputTag +import org.apache.flink.streaming.api.scala.function.WindowFunction +import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows +import org.apache.flink.streaming.api.windowing.time.Time +import org.apache.flink.streaming.api.windowing.windows.TimeWindow +import org.apache.flink.util.Collector + +import java.text.SimpleDateFormat + +/** + * 水位线的使用要求: + * 1. 开启EventTime:flink.stream.time.characteristic = EventTime + * 2. 不同的task中有多个水位线实例,本地测试为了尽快看到效果,要降低并行度 + * 3. 多个task中的水位线会取最早的 + * 4. 水位线触发条件:1)多个task中时间最早的水位线时间 >= window窗口end时间 2)窗口中有数据 + * 5. 水位线是为了解决乱序和延迟数据的问题 + * 6. 乱序数据超过水位线的三种处理方式:1. 丢弃(默认) 2. allowedLateness,相当于进一步宽容的时间 3. 
sideOutputLateData:将延迟数据收集起来,统一处理 + * + * @author ChengLong 2020-4-13 15:58:38 + */ +object WatermarkTest extends BaseFlinkStreaming { + + override def process: Unit = { + // source端接入消息并解析 + val dstream = this.fire.createKafkaDirectStream().filter(str => StringUtils.isNotBlank(str) && str.contains("}")).map(str => { + val student = JSONUtils.parseObject[Student](str) + (student, DateFormatUtils.formatDateTime(student.getCreateTime).getTime) + }) + + // 分配并计算水位线,默认允许最大的乱序时间为10s,若需指定,则通过构造方法传参new FirePeriodicWatermarks(100) + val watermarkDS = dstream.assignTimestampsAndWatermarks(new FirePeriodicWatermarks[(Student, Long)]() { + val format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS") + + /** + * 抽取eventtime字段 + */ + override def extractTimestamp(element: (Student, Long), previousElementTimestamp: Long): Long = { + println("---> 抽取eventtime:" + element._2 + " 最新水位线值:" + this.watermark.getTimestamp) + element._2 + } + + }).setParallelism(1) // 并行度调整为1的好处是能尽快观察到水位线的效果,否则要等多个task满足条件,不易观察结果 + + val windowDStream = watermarkDS + .keyBy(_._1) + .window(TumblingEventTimeWindows.of(Time.seconds(3))) + // 最大允许延迟的数据3s,算上水位线允许最大的乱序时间10s,一共允许最大的延迟时间为13s + .allowedLateness(Time.seconds(3)) + // 收集延期的数据 + .sideOutputLateData(this.outputTag.asInstanceOf[OutputTag[(Student, Long)]]) + .apply(new WindowFunctionTest) + + windowDStream.print().setParallelism(1) + // 获取由于延迟太久而被丢弃的数据 + windowDStream.getSideOutput[(Student, Long)](this.outputTag.asInstanceOf[OutputTag[(Student, Long)]]).map(t => ("丢弃", t)).print() + + this.fire.start + } + + /** + * 泛型说明: + * 1. IN: The type of the input value. + * 2. OUT: The type of the output value. + * 3. KEY: The type of the key. + */ + class WindowFunctionTest extends WindowFunction[(Student, Long), (Student, Long), Student, TimeWindow] { + override def apply(key: Student, window: TimeWindow, input: Iterable[(Student, Long)], out: Collector[(Student, Long)]): Unit = { + println("-->" + JSONUtils.toJSONString(key)) + val sortedList = input.toList.sortBy(_._2) + sortedList.foreach(t => { + println("---> " + JSONUtils.toJSONString(t._1)) + out.collect(t) + }) + } + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/WindowTest.scala b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/WindowTest.scala new file mode 100644 index 0000000..57bd9aa --- /dev/null +++ b/fire-examples/flink-examples/src/main/scala/com/zto/fire/examples/flink/stream/WindowTest.scala @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
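If the Flink version in use is 1.11 or later, the bounded-out-of-orderness watermark used in WatermarkTest above can also be declared through WatermarkStrategy instead of a periodic assigner. A minimal sketch, assuming the tuple's second field is the event time in milliseconds (as in the example):

import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}

val strategy = WatermarkStrategy
  .forBoundedOutOfOrderness[(Student, Long)](Duration.ofSeconds(10)) // allow up to 10s of out-of-orderness
  .withTimestampAssigner(new SerializableTimestampAssigner[(Student, Long)] {
    override def extractTimestamp(element: (Student, Long), recordTimestamp: Long): Long = element._2
  })
val watermarkDS = dstream.assignTimestampsAndWatermarks(strategy).setParallelism(1)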
+ */ + +package com.zto.fire.examples.flink.stream + +import com.zto.fire._ +import com.zto.fire.common.util.JSONUtils +import com.zto.fire.examples.bean.Student +import com.zto.fire.flink.BaseFlinkStreaming +import org.apache.flink.api.scala._ +import org.apache.flink.streaming.api.TimeCharacteristic +import org.apache.flink.streaming.api.scala.DataStream +import org.apache.flink.streaming.api.windowing.time.Time + +/** + * window相当于将源源不断的流按一定的规则切分成有界流,然后为每个有界流分别计算 + * 当程序挂掉重启后,window中的数据不会丢失,会接着之前的window继续计算 + * 注:不建议使用windowAll,该api会将数据发送到同一个分区,造成严重的性能问题 + * + * @author ChengLong 2020-4-18 14:34:58 + */ +object WindowTest extends BaseFlinkStreaming { + + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream().map(t => JSONUtils.parseObject[Student](t)).map(s => (s.getName, s.getAge)) + this.testTimeWindow(dstream) + + this.fire.start + } + + /** + * 如果是keyedStream,则窗口函数为countWindow + */ + private def testCountWindow(dstream: DataStream[(String, Integer)]): Unit = { + dstream.keyBy(_._1) + // 第一个参数表示窗口大小,窗口的容量是2条记录,达到2条会满,作为一个单独的window实例 + // 第二个参数如果不指定,则表示为滚动窗口(没有重叠),如果指定则为滑动窗口(有重叠) + // 以下表示每隔1条数据统计一次window数据,而这个window中包含2条记录 + .countWindow(2, 1) + .sum(1).print() + } + + /** + * 如果是普通的Stream,则窗口函数为countWindowAll + */ + def testCountWindowAll(dstream: DataStream[(String, Integer)]): Unit = { + // 表示每2条计算一次,每次将计算好的两条记录结果打印 + dstream.countWindowAll(2).sum(1).print() + } + + /** + * 时间窗口 + */ + def testTimeWindow(dstream: DataStream[(String, Integer)]): Unit = { + // 窗口的宽度为1s,每隔1s钟处理过去1s的数据,这1s的时间内窗口中的记录数可多可少 + dstream.timeWindowAll(Time.seconds(1)).sum(1).print() + // 创建一个基于process时间(支持event时间)的滑动窗口,窗口大小为10秒,每隔5秒创建一个 + dstream.keyBy(_._1).slidingTimeWindow(Time.seconds(10), Time.seconds(5), timeCharacteristic = TimeCharacteristic.ProcessingTime).sum(1).printToErr() + // 创建一个滚动窗口 + dstream.keyBy(_._1).tumblingTimeWindow(Time.seconds(10)).sum(1).print() + // 创建一个session会话窗口,当5秒内没有消息进入,则单独划分一个窗口 + dstream.keyBy(_._1).sessionTimeWindow(Time.seconds(5)).sum(1).printToErr() + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/pom.xml b/fire-examples/pom.xml new file mode 100644 index 0000000..6a75c21 --- /dev/null +++ b/fire-examples/pom.xml @@ -0,0 +1,79 @@ + + + + + 4.0.0 + fire-examples_2.12 + pom + fire-examples + + + spark-examples + flink-examples + + + + com.zto.fire + fire-parent_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-core_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-jdbc_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-hbase_${scala.binary.version} + ${project.version} + + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-examples/spark-examples/pom.xml b/fire-examples/spark-examples/pom.xml new file mode 100644 index 0000000..b89abf5 --- /dev/null +++ b/fire-examples/spark-examples/pom.xml @@ -0,0 +1,339 @@ + + + + + 4.0.0 + spark-examples_${spark.reference} + jar + spark-examples + + + com.zto.fire + fire-examples_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + + com.zto.fire + fire-common_${scala.binary.version} + ${project.version} + + + com.zto.fire + fire-spark_${spark.reference} + ${project.version} + + + + com.fasterxml.jackson.core + jackson-databind + 2.10.0 + ${maven.scope} + + + com.fasterxml.jackson.core + jackson-core + 2.10.0 + ${maven.scope} + + + + + org.apache.spark + 
spark-core_${scala.binary.version} + + + com.esotericsoftware.kryo + kryo + + + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-sql_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-sql-kafka-0-10_${scala.binary.version} + ${spark.version} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.spark + spark-streaming-kafka-0-10_${scala.binary.version} + ${spark.version} + + + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-hdfs + ${hadoop.version} + ${maven.scope} + + + org.apache.hadoop + hadoop-client + ${hadoop.version} + ${maven.scope} + + + + + org.apache.hbase + hbase-common + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-server + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + org.apache.hbase + hbase-client_${scala.binary.version} + ${hbase.version} + + + org.apache.hbase + hbase-spark${spark.major.version}_${scala.binary.version} + ${hbase.version} + + + org.apache.hbase + hbase-client + + + + + + + org.apache.kudu + kudu-spark${spark.major.version}_${scala.binary.version} + ${kudu.version} + ${maven.scope} + + + org.apache.kudu + kudu-client + ${kudu.version} + ${maven.scope} + + + + + org.apache.rocketmq + rocketmq-client + ${rocketmq.version} + + + org.apache.rocketmq + rocketmq-spark${spark.major.version}_${scala.binary.version} + ${rocketmq.external.version} + + + + + org.apache.spark + spark-avro_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hudi + hudi-spark-bundle_${scala.binary.version} + 0.7.0 + ${maven.scope} + + + ru.yandex.clickhouse + clickhouse-jdbc + 0.2.4 + ${maven.scope} + + + com.google.guava + guava + ${guava.version} + + + + + + + hadoop-2.7 + + org.spark-project.hive + 1.2.1.spark2 + + + true + + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hive + hive-common + + + org.apache.hive + hive-exec + + + org.apache.hive + hive-metastore + + + org.apache.hive + hive-serde + + + org.apache.hive + hive-shims + + + + + org.apache.spark + spark-hive-thriftserver_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + org.apache.hive + hive-cli + + + org.apache.hive + hive-jdbc + + + org.apache.hive + hive-beeline + + + + + ${hive.group} + hive-cli + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-jdbc + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-beeline + ${hive.version} + ${maven.scope} + + + + ${hive.group} + hive-common + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-metastore + ${hive.version} + ${maven.scope} + + + ${hive.group} + hive-exec + ${hive.version} + ${maven.scope} + + + org.apache.commons + commons-lang3 + + + org.apache.spark + spark-core_2.10 + + + + + + + hadoop-3.2 + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + ${maven.scope} + + + + + diff --git a/fire-examples/spark-examples/src/main/java/com/zto/fire/examples/bean/Hudi.java b/fire-examples/spark-examples/src/main/java/com/zto/fire/examples/bean/Hudi.java new file mode 100644 index 0000000..708b747 --- /dev/null +++ 
b/fire-examples/spark-examples/src/main/java/com/zto/fire/examples/bean/Hudi.java @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.bean; + +import com.zto.fire.common.util.DateFormatUtils; +import com.zto.fire.common.util.JSONUtils; + +import java.time.LocalDateTime; +import java.time.format.DateTimeFormatter; +import java.util.Arrays; +import java.util.Date; +import java.util.List; + +/** + * @author ChengLong + * @create 2021-02-07 16:45 + * @since 1.0.0 + */ +public class Hudi { + private Long id; + private String name; + private Integer age; + private Boolean sex; + private String createTime; + private String ds; + private static int num = 0; + + public Hudi(Long id, String name, Integer age, Boolean sex) { + this.id = id; + this.name = name; + this.age = age; + this.sex = sex; + this.createTime = DateFormatUtils.formatCurrentDateTime(); + if (num % 2 == 0) { + this.ds = DateFormatUtils.formatBySchema(new Date(), "yyyyMMdd"); + } else { + this.ds = "20200206"; + } + num += 1; + } + + public Hudi() { + + } + + public String getDs() { + return ds; + } + + public void setDs(String ds) { + this.ds = ds; + } + + public String getCreateTime() { + return createTime; + } + + public void setCreateTime(String createTime) { + this.createTime = createTime; + } + + public Hudi(Long id) { + this.id = id; + } + + public Boolean getSex() { + return sex; + } + + public void setSex(Boolean sex) { + this.sex = sex; + } + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public Integer getAge() { + return age; + } + + public void setAge(Integer age) { + this.age = age; + } + + @Override + public String toString() { + return JSONUtils.toJSONString(this); + } + + public static List newHudiList() { + return Arrays.asList( + new Hudi(1L, "admin", 12, true), + new Hudi(2L, "root", 22, true), + new Hudi(3L, "scala", 11, true), + new Hudi(4L, "spark", 15, true), + new Hudi(5L, "java", 16, true), + new Hudi(6L, "hive", 17, true), + new Hudi(7L, "presto", 18, true), + new Hudi(8L, "flink", 19, true), + new Hudi(9L, "streaming", 20, true), + new Hudi(10L, "sql", 12, true) + ); + } + + public static void main(String[] args) { + LocalDateTime dateTime = LocalDateTime.of(2020, 2, 8, 15, 50, 30); + dateTime.plusYears(1); + System.out.println(dateTime.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))); + System.out.println(LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)); + } +} diff --git a/fire-examples/spark-examples/src/main/java/com/zto/fire/examples/bean/Student.java 
b/fire-examples/spark-examples/src/main/java/com/zto/fire/examples/bean/Student.java new file mode 100644 index 0000000..a4f6c88 --- /dev/null +++ b/fire-examples/spark-examples/src/main/java/com/zto/fire/examples/bean/Student.java @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.bean; + +import com.zto.fire.common.anno.FieldName; +import com.zto.fire.common.util.DateFormatUtils; +import com.zto.fire.common.util.JSONUtils; +import com.zto.fire.hbase.bean.HBaseBaseBean; + +import java.math.BigDecimal; +import java.util.Arrays; +import java.util.List; +import java.util.Objects; + +/** + * 对应HBase表的JavaBean + * + * @author ChengLong 2019-6-20 16:06:16 + */ +// @HConfig(multiVersion = true) +public class Student extends HBaseBaseBean { + private Long id; + private String name; + private Integer age; + // 多列族情况下需使用family单独指定 + private String createTime; + // 若JavaBean的字段名称与HBase中的字段名称不一致,需使用value单独指定 + // 此时hbase中的列名为length1,而不是length + @FieldName(family = "data", value = "length1") + private BigDecimal length; + private Boolean sex; + + /** + * rowkey的构建 + * + * @return + */ + @Override + public Student buildRowKey() { + this.rowKey = this.id.toString(); + return this; + } + + public Student(Long id, String name) { + this.id = id; + this.name = name; + } + + public Student(Long id, String name, Integer age) { + this.id = id; + this.name = name; + this.age = age; + } + + public Student(Long id, String name, Integer age, BigDecimal length, Boolean sex, String createTime) { + this.id = id; + this.name = name; + this.age = age; + this.length = length; + this.sex = sex; + this.createTime = createTime; + } + + public Student(Long id, String name, Integer age, BigDecimal length) { + this.id = id; + this.name = name; + this.age = age; + this.length = length; + } + + public Student() { + + } + + public Student(Long id) { + this.id = id; + } + + public String getCreateTime() { + return createTime; + } + + public void setCreateTime(String createTime) { + this.createTime = createTime; + } + + public BigDecimal getLength() { + return length; + } + + public void setLength(BigDecimal length) { + this.length = length; + } + + public Boolean getSex() { + return sex; + } + + public void setSex(Boolean sex) { + this.sex = sex; + } + + public Long getId() { + return id; + } + + public void setId(Long id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public Integer getAge() { + return age; + } + + public void setAge(Integer age) { + this.age = age; + } + + @Override + public String toString() { + return JSONUtils.toJSONString(this); + } + + public static List newStudentList() { + String dateTime = 
DateFormatUtils.formatCurrentDateTime(); + return Arrays.asList( + new Student(1L, "admin", 12, BigDecimal.valueOf(12.1), true, dateTime), + new Student(2L, "root", 22, BigDecimal.valueOf(22), true, dateTime), + new Student(3L, "scala", 11, BigDecimal.valueOf(11), true, dateTime), + new Student(4L, "spark", 15, BigDecimal.valueOf(15), true, dateTime), + new Student(5L, "java", 16, BigDecimal.valueOf(16.1), true, dateTime), + new Student(6L, "hive", 17, BigDecimal.valueOf(17.1), true, dateTime), + new Student(7L, "presto", 18, BigDecimal.valueOf(18.1), true, dateTime), + new Student(8L, "flink", 19, BigDecimal.valueOf(19.1), true, dateTime), + new Student(9L, "streaming", 10, BigDecimal.valueOf(10.1), true, dateTime), + new Student(10L, "sql", 12, BigDecimal.valueOf(12.1), true, dateTime) + ); + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (!(o instanceof Student)) return false; + Student student = (Student) o; + return Objects.equals(id, student.id) && + Objects.equals(name, student.name) && + Objects.equals(age, student.age) && + Objects.equals(createTime, student.createTime) && + Objects.equals(length, student.length) && + Objects.equals(sex, student.sex); + } + + @Override + public int hashCode() { + return Objects.hash(id, name, age, createTime, length, sex); + } +} diff --git a/fire-examples/spark-examples/src/main/resources/HiveClusterReader.properties b/fire-examples/spark-examples/src/main/resources/HiveClusterReader.properties new file mode 100644 index 0000000..5450586 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/HiveClusterReader.properties @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# spark 任务的appName,不配置则取类名 +spark.appName = hiveReader +spark.log.level = ERROR +# ------------------- < hive 配置 > ------------------- # +# hive 集群名称(batch离线hive/streaming 180集群hive/test本地测试hive),用于spark跨集群读取hive元数据信息 +spark.hive.cluster = batch \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/HudiTest.properties b/fire-examples/spark-examples/src/main/resources/HudiTest.properties new file mode 100644 index 0000000..5857def --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/HudiTest.properties @@ -0,0 +1,22 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.serializer=org.apache.spark.serializer.KryoSerializer +spark.local.cores=2 +spark.default.parallelism=2 +spark.hive.cluster=test +spark.sql.hive.convertMetastoreParquet=false \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/JdbcSinkTest.properties b/fire-examples/spark-examples/src/main/resources/JdbcSinkTest.properties new file mode 100644 index 0000000..636fd56 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/JdbcSinkTest.properties @@ -0,0 +1,28 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = fire +# 非必须配置项:默认为appName +spark.kafka.group.id = fire +spark.fire.rest.filter.enable = false +spark.hive.cluster = test +spark.hbase.cluster = test +spark.chkpoint.dir = /tmp/spark-checkpoint/ +spark.sql.shuffle.partitions = 300 +spark.sql.streaming.metricsEnabled = true \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/MapTest.properties b/fire-examples/spark-examples/src/main/resources/MapTest.properties new file mode 100644 index 0000000..fe00db4 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/MapTest.properties @@ -0,0 +1,25 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +spark.hive.cluster=test +spark.hbase.cluster=test +# 非必须配置项:默认就是这个地址 +spark.kafka.brokers.name = test +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = fire +# 非必须配置项:默认为appName +spark.kafka.group.id = fire \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/RocketTest.properties b/fire-examples/spark-examples/src/main/resources/RocketTest.properties new file mode 100644 index 0000000..d1c3458 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/RocketTest.properties @@ -0,0 +1,29 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.log.level=INFO +spark.streaming.batch.duration=10 +# 非必须配置项:默认是大数据的rocket地址 ZmsClusterX +spark.rocket.brokers.name=192.168.1.174:9876;192.168.1.179:9876 +spark.rocket.topics=SCANRECORD +spark.rocket.consumer.instance=FireFramework +#spark.hbase.cluster=streaming +spark.rocket.group.id=sjzn_spark_scanrecord_test +spark.rocket.pull.max.speed.per.partition=15000 +spark.rocket.consumer.tag=1||2||3||4||5||8||44||45 +spark.streaming.backpressure.enabled=false +spark.streaming.backpressure.initialRate=100 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/ScheduleTest.properties b/fire-examples/spark-examples/src/main/resources/ScheduleTest.properties new file mode 100644 index 0000000..022e5ad --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/ScheduleTest.properties @@ -0,0 +1,20 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +spark.fire.task.schedule.enable = true +# 定时任务黑名单,配置方法名,多个以逗号分隔,配置的方法将不再被定时任务定时拉起 +spark.fire.scheduler.blacklist = jvmMonitor,setConf2,registerAcc \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/StructuredStreamingTest.properties b/fire-examples/spark-examples/src/main/resources/StructuredStreamingTest.properties new file mode 100644 index 0000000..33d4d64 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/StructuredStreamingTest.properties @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.kafka.brokers.name = zmsNew +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.group.id = fire +spark.kafka.topics = sjzn_spark_scan_send_topic +spark.hive.cluster = batch +spark.kafka.poll.timeout.ms = 10000 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/Test.properties b/fire-examples/spark-examples/src/main/resources/Test.properties new file mode 100644 index 0000000..43fbf22 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/Test.properties @@ -0,0 +1,32 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +hive.cluster=test +hbase.cluster=test +# 非必须配置项:默认是大数据的kafka地址 +kafka.brokers.name=bigdata_test +kafka.topics=fire +kafka.group.id=fire2 +fire.rest.filter.enable=true +fire.streaming.remember=10000 +fire.rest.enable=true +fire.config_center.local.enable=false +fire.task.schedule.enable=true +local.cores=6 +#fire.config_center.enable=false +fire.scheduler.blacklist=jvmMonitor +fire.hello=fire2020 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/TestHive.properties b/fire-examples/spark-examples/src/main/resources/TestHive.properties new file mode 100644 index 0000000..9628dd0 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/TestHive.properties @@ -0,0 +1,19 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
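ScheduleTest.properties above enables fire's built-in scheduler (spark.fire.task.schedule.enable = true) and blacklists the methods jvmMonitor, setConf2 and registerAcc. The scheduled methods themselves are declared with the @Scheduled annotation, as the FireAccTest example later in this patch shows; the object below is only an assumed sketch of how the two halves fit together.

package com.zto.fire.examples.spark

import com.zto.fire.common.anno.Scheduled
import com.zto.fire.common.util.DateFormatUtils
import com.zto.fire.spark.BaseSparkCore

// Hypothetical sketch, not part of this patch.
object ScheduleTest extends BaseSparkCore {

  // pulled up every minute once the scheduler is enabled (fixedInterval/scope as in FireAccTest)
  @Scheduled(fixedInterval = 60 * 1000, scope = "all")
  def printTime: Unit = println(s"${DateFormatUtils.formatCurrentDateTime()} scheduled tick")

  // listed in spark.fire.scheduler.blacklist above, so the scheduler will not pull it up
  @Scheduled(cron = "0 0 * * * ?")
  def registerAcc: Unit = println("blacklisted, never scheduled")

  override def process: Unit = this.fire.sql("show databases").show(10, false)

  def main(args: Array[String]): Unit = this.init()
}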
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +spark.hive.cluster=test +spark.fire.config_center.enable=false \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/ThreadTest.properties b/fire-examples/spark-examples/src/main/resources/ThreadTest.properties new file mode 100644 index 0000000..0e6672e --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/ThreadTest.properties @@ -0,0 +1,34 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# 非必须配置项:spark 任务的appName,不配置则取类名 +# spark.appName = test +spark.log.level = INFO +# 非必须配置项:默认就是这个地址 +spark.kafka.brokers.name = zms +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = aries_binlog_order +# 非必须配置项:默认为appName +spark.kafka.group.id = OrderDetailMainCommon + +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hbase.cluster = streaming + +# spark的参数可以直接写在下面,都会被加载,覆盖程序中默认的配置信息 +spark.speculation = false +spark.streaming.concurrentJobs = 1 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/acc/FireAccTest.properties b/fire-examples/spark-examples/src/main/resources/acc/FireAccTest.properties new file mode 100644 index 0000000..dee4bae --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/acc/FireAccTest.properties @@ -0,0 +1,26 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +spark.log.level = warn +# 非必须配置项:默认就是这个地址 +spark.kafka.brokers.name = zmsNew +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = sjzn_spark_order_unique_topic +# 非必须配置项:默认为appName +spark.kafka.group.id = fire +spark.fire.rest.filter.enable = false +spark.hive.cluster = batch \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/common.properties b/fire-examples/spark-examples/src/main/resources/common.properties new file mode 100644 index 0000000..7a2ac42 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/common.properties @@ -0,0 +1,39 @@ +# 定义url的别名与url对应关系,后续可通过别名进行配置 +spark.db.jdbc.url.map.test = jdbc:mysql://192.168.0.1:3306/fire +# 支持别名或直接指定url +spark.db.jdbc.url = test +spark.db.jdbc.driver = com.mysql.jdbc.Driver +spark.db.jdbc.user = root +spark.db.jdbc.password = root +spark.db.jdbc.batch.size = 10 + +# 配置另一个数据源,对应的操作需对应加数字后缀,如:this.spark.jdbcQueryDF2(sql, Seq(1, 2, 3), classOf[Student]) +spark.db.jdbc.url2 = jdbc:mysql://192.168.0.2:3306/fire2 +spark.db.jdbc.driver2 = com.mysql.jdbc.Driver +spark.db.jdbc.user2 = root +spark.db.jdbc.password2 = root +# 每个批次提交的数据大小,默认1000条 +spark.db.jdbc.batch.size2 = 2 + +spark.db.jdbc.url3 = jdbc:mysql://192.168.0.3:3306/fire3 +spark.db.jdbc.driver3 = com.mysql.jdbc.Driver +spark.db.jdbc.user3 = root +spark.db.jdbc.password3 = root +# 事务的隔离级别NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, SERIALIZABLE,默认为READ_UNCOMMITTED +spark.db.jdbc.isolation.level3 = none +# 每个批次插入、更新、删除的数据量,默认为1000 +spark.db.jdbc.batch.size3 = 2000 + +spark.db.jdbc.url5 = jdbc:mysql://192.168.0.5:3306/fire5 +spark.db.jdbc.driver5 = com.mysql.jdbc.Driver +spark.db.jdbc.user5 = root +spark.db.jdbc.password5 = root +# 事务的隔离级别NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, SERIALIZABLE,默认为READ_UNCOMMITTED +spark.db.jdbc.isolation.level5 = none +# 每个批次插入、更新、删除的数据量,默认为1000 +spark.db.jdbc.batch.size5 = 2000 + +spark.db.jdbc.url6 = jdbc:mysql://192.168.0.6:3306/fire6 +spark.db.jdbc.driver6 = com.mysql.jdbc.Driver +spark.db.jdbc.user6 = root +spark.db.jdbc.password6 = root \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/datasource/DataSourceTest.properties b/fire-examples/spark-examples/src/main/resources/datasource/DataSourceTest.properties new file mode 100644 index 0000000..fe81ec5 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/datasource/DataSourceTest.properties @@ -0,0 +1,53 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
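common.properties above defines several JDBC data sources that differ only by a numeric suffix on the keys (url, url2, url3, url5, url6). Per the comment in that file, the suffix is selected in code by calling the correspondingly numbered API, e.g. this.spark.jdbcQueryDF2(sql, Seq(1, 2, 3), classOf[Student]). The object below is an assumed sketch of that routing; the object name and SQL are placeholders, and spark_test is the table configured in JdbcTest.properties further down.

package com.zto.fire.examples.spark.jdbc

import com.zto.fire._
import com.zto.fire.examples.bean.Student
import com.zto.fire.spark.BaseSparkCore

// Hypothetical sketch, not part of this patch.
object JdbcMultiSourceSketch extends BaseSparkCore {

  override def process: Unit = {
    // the "2" suffix routes the query to spark.db.jdbc.url2, with its own driver, user,
    // password and batch size taken from common.properties
    val df2 = this.spark.jdbcQueryDF2("select * from spark_test where id in (?, ?, ?)", Seq(1, 2, 3), classOf[Student])
    df2.show(10, false)
  }

  def main(args: Array[String]): Unit = this.init()
}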
+# + +# 一、hudi datasource,全部基于配置文件进行配置 +spark.datasource.format=org.apache.hudi +spark.datasource.saveMode=Append +# 用于区分调用save(path)还是saveAsTable +spark.datasource.isSaveTable=false +# 传入到底层save或saveAsTable方法中 +spark.datasource.saveParam=/user/hive/warehouse/hudi.db/hudi_bill_event_test + +# 以spark.datasource.options.为前缀的配置用于配置hudi相关的参数,可覆盖代码中同名的配置 +spark.datasource.options.hoodie.datasource.write.recordkey.field=id +spark.datasource.options.hoodie.datasource.write.precombine.field=id +spark.datasource.options.hoodie.datasource.write.partitionpath.field=ds +spark.datasource.options.hoodie.table.name=hudi.hudi_bill_event_test +spark.datasource.options.hoodie.datasource.write.hive_style_partitioning=true +spark.datasource.options.hoodie.datasource.write.table.type=MERGE_ON_READ +spark.datasource.options.hoodie.insert.shuffle.parallelism=128 +spark.datasource.options.hoodie.upsert.shuffle.parallelism=128 +spark.datasource.options.hoodie.fail.on.timeline.archiving=false +spark.datasource.options.hoodie.clustering.inline=true +spark.datasource.options.hoodie.clustering.inline.max.commits=8 +spark.datasource.options.hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824 +spark.datasource.options.hoodie.clustering.plan.strategy.small.file.limit=629145600 +spark.datasource.options.hoodie.clustering.plan.strategy.daybased.lookback.partitions=2 + +# 二、配置第二个数据源,以数字后缀作为区分,部分使用配置文件进行配置 +spark.datasource.format2=org.apache.hudi2 +spark.datasource.saveMode2=Overwrite +# 用于区分调用save(path)还是saveAsTable +spark.datasource.isSaveTable2=false +# 传入到底层save或saveAsTable方法中 +spark.datasource.saveParam2=/user/hive/warehouse/hudi.db/hudi_bill_event_test2 + +# 三、配置第三个数据源,用于代码中进行read操作 +spark.datasource.format3=org.apache.hudi3 +spark.datasource.loadParam3=/user/hive/warehouse/hudi.db/hudi_bill_event_test3 +spark.datasource.options.hoodie.datasource.write.recordkey.field3=id3 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/hbase/HBaseBulkTest.properties b/fire-examples/spark-examples/src/main/resources/hbase/HBaseBulkTest.properties new file mode 100644 index 0000000..8511b24 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/hbase/HBaseBulkTest.properties @@ -0,0 +1,26 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
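The spark.datasource.* block above is consumed by the generic writeEnhance / readEnhance calls demonstrated in DataSourceTest.scala later in this patch, with the numeric suffix (format2, format3, ...) again selecting a configuration group. A condensed, assumed sketch of the call side, using only calls that appear in that example:

package com.zto.fire.examples.spark.datasource

import com.zto.fire._
import com.zto.fire.examples.bean.Student
import com.zto.fire.spark.BaseSparkCore

// Hypothetical sketch, not part of this patch.
object DataSourceConfigSketch extends BaseSparkCore {

  override def process: Unit = {
    val df = this.fire.createDataFrame(Student.newStudentList(), classOf[Student])
    // format, saveMode, isSaveTable, saveParam and every options.* key come from the
    // unnumbered spark.datasource.* properties above
    df.writeEnhance()
    // keyNum = 3 resolves spark.datasource.format3 / loadParam3 for the read path
    this.fire.readEnhance(keyNum = 3)
  }

  def main(args: Array[String]): Unit = this.init()
}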
+# + +# 非必须配置项:spark 任务的appName,不配置则取类名 +spark.appName = HBaseBulkTest +spark.log.level = info +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hbase.cluster = test +spark.hbase.cluster2 = test +spark.fire.hbase.scan.repartitions = 3 +spark.fire.hbase.storage.level = DISK_ONLY \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/hbase/HBaseConnectorTest.properties b/fire-examples/spark-examples/src/main/resources/hbase/HBaseConnectorTest.properties new file mode 100644 index 0000000..b005781 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/hbase/HBaseConnectorTest.properties @@ -0,0 +1,27 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# 非必须配置项:spark 任务的appName,不配置则取类名 +spark.appName = HBaseConnectorTest +spark.log.level = warn +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hbase.cluster = test +spark.hbase.cluster2 = test +spark.fire.hbase.scan.repartitions = 3 +spark.fire.hbase.storage.level = DISK_ONLY +fire.shutdown.auto.exit = true \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/hbase/HBaseHadoopTest.properties b/fire-examples/spark-examples/src/main/resources/hbase/HBaseHadoopTest.properties new file mode 100644 index 0000000..679fc11 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/hbase/HBaseHadoopTest.properties @@ -0,0 +1,27 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# 非必须配置项:spark 任务的appName,不配置则取类名 +spark.appName = HBaseHadoopTest +spark.log.level = ERROR +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hbase.cluster = test +spark.hbase.cluster2 = test +# 通过HBase scan后repartition的分区数,需根据scan后的数据量做配置 +spark.fire.hbase.scan.partitions = 3 +spark.fire.hbase.storage.level = DISK_ONLY \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/hbase/HBaseStreamingTest.properties b/fire-examples/spark-examples/src/main/resources/hbase/HBaseStreamingTest.properties new file mode 100644 index 0000000..27cedea --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/hbase/HBaseStreamingTest.properties @@ -0,0 +1,41 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# 非必须配置项:spark 任务的appName,不配置则取类名 +# spark.appName = test +spark.log.level = INFO +# 非必须配置项:默认是大数据的kafka地址,如果连zms,则将bigdata替换为zms +spark.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = fire +# 非必须配置项:默认为appName +spark.kafka.group.id = fire +spark.streaming.batch.duration = 30 +spark.hive.cluster = test + +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hbase.cluster = test +spark.hbase.cluster2 = test +spark.fire.rest.filter.enable = false +spark.fire.hbase.scan.repartitions = 30 +spark.fire.hbase.storage.level = DISK_ONLY +spark.fire.rest.url.hostname = true + +# spark的参数可以直接写在下面,都会被加载,覆盖程序中默认的配置信息 +spark.speculation = false +spark.streaming.concurrentJobs = 1 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/jdbc/JdbcStreamingTest.properties b/fire-examples/spark-examples/src/main/resources/jdbc/JdbcStreamingTest.properties new file mode 100644 index 0000000..296d9f7 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/jdbc/JdbcStreamingTest.properties @@ -0,0 +1,32 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# + +######################################################################################### +# JDBC数据源配置信息详见:common.properties,公共数据源配置可放到common.properties中,便于维护 # +######################################################################################### + + +# 非必须配置项:spark 任务的appName,不配置则取类名 +# spark.appName = test +spark.log.level = INFO +# 非必须配置项:默认是大数据的kafka地址,如果连zms,则将bigdata替换为zms +spark.kafka.brokers.name = bigdata_test +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = fire +# 非必须配置项:默认为appName +spark.kafka.group.id = fire +spark.fire.rest.filter.enable = false \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/jdbc/JdbcTest.properties b/fire-examples/spark-examples/src/main/resources/jdbc/JdbcTest.properties new file mode 100644 index 0000000..d93511a --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/jdbc/JdbcTest.properties @@ -0,0 +1,45 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +######################################################################################### +# JDBC数据源配置信息详见:common.properties,公共数据源配置可放到common.properties中,便于维护 # +######################################################################################### + + +# 非必须配置项:spark 任务的appName,不配置则取类名 +# spark.appName = test +tableName = spark_test +#tableName = t_hosts + +spark.log.level = INFO +spark.fire.jdbc.storage.level = DISK_ONLY +spark.fire.jdbc.query.partitions = 12 +spark.fire.acc.enable = true +spark.log.level.fire_conf.com.zto.fire= info +# fire框架埋点日志开关,关闭以后将不再打印埋点日志 +spark.fire.log.enable = true +# 用于限定fire框架中sql日志的字符串长度 +spark.fire.log.sql.length = 100 +#spark.fire.jdbc.storage.level = memory_and_disk_ser +# 通过JdbcConnector查询后将数据集放到多少个分区中,需根据实际的结果集做配置 +#spark.fire.jdbc.query.partitions = 10 +spark.fire.rest.filter.enable = false + +hello.world = 2020 +spark.fire.config_center.enable = true +hello.world.flag = false +hello.world.flag2 = false \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/streaming/KafkaTest.properties b/fire-examples/spark-examples/src/main/resources/streaming/KafkaTest.properties new file mode 100644 index 0000000..6665898 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/streaming/KafkaTest.properties @@ -0,0 +1,53 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# 非必须配置项:默认就是这个地址 +spark.kafka.brokers.name = zms +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = aries_binlog_preorder +# 非必须配置项:默认为appName +spark.kafka.group.id = fire +spark.fire.rest.filter.enable = false +spark.log.level.fire_conf.com.zto.fire= debug +spark.streaming.stopGracefullyOnShutdown= false +spark.kafka.brokers.name2 = zms +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics2 = cainiao_push_log,zto_scan_rec +# 非必须配置项:默认为appName +spark.kafka.group.id2 = fire + +spark.kafka.brokers.name3 = zmsNew +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics3 = sjzn_spark_order_unique_topic +# 非必须配置项:默认为appName +spark.kafka.group.id3 = fire + +# 自定义的kafka地址 +spark.kafka.brokers.name5 = zmsNew +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics5 = sjzn_spark_binlog_order_topic +# 非必须配置项:默认为appName +spark.kafka.group.id5 = fire + + +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hive.cluster = batch +# spark的参数可以直接写在下面,都会被加载,覆盖程序中默认的配置信息 +spark.speculation = false +spark.streaming.concurrentJobs = 1 +spark.streaming.batch.duration = 30 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/resources/streaming/LoadTest.properties b/fire-examples/spark-examples/src/main/resources/streaming/LoadTest.properties new file mode 100644 index 0000000..f20d7b0 --- /dev/null +++ b/fire-examples/spark-examples/src/main/resources/streaming/LoadTest.properties @@ -0,0 +1,32 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
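KafkaTest.properties above wires several Kafka sources into one job by numbering the keys (spark.kafka.topics, topics2, topics3, topics5). createKafkaDirectStream() with no arguments, as used throughout these examples, consumes the unnumbered group; the sketch below additionally assumes the stream builder accepts the same keyNum routing as the HBase and JDBC connectors in this patch, which is not shown here and should be verified against the fire API.

package com.zto.fire.examples.spark.streaming

import com.zto.fire._
import com.zto.fire.spark.BaseSparkStreaming

// Hypothetical sketch, not part of this patch.
object MultiKafkaSketch extends BaseSparkStreaming {

  override def process: Unit = {
    // unnumbered group: spark.kafka.brokers.name / spark.kafka.topics / spark.kafka.group.id
    val dstream = this.fire.createKafkaDirectStream()
    dstream.foreachRDD(rdd => println(s"primary source batch size: ${rdd.count()}"))

    // assumption: keyNum = 3 would select spark.kafka.brokers.name3 / topics3 / group.id3
    // val dstream3 = this.fire.createKafkaDirectStream(keyNum = 3)

    this.fire.start
  }

  def main(args: Array[String]): Unit = this.init(30, false)
}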
+# + +# 非必须配置项:默认就是这个地址 +spark.kafka.brokers.name = zmsNew +# 必须配置项:kafka的topic列表,以逗号分隔 +spark.kafka.topics = sjzn_spark_scan_car_topic +# 非必须配置项:默认为appName +spark.kafka.group.id = fire + +# ------------------- < hbase 配置 > ------------------- # +# 用于区分不同的hbase集群: batch/streaming/old +spark.hbase.cluster = streaming +spark.hive.cluster = batch + +# spark的参数可以直接写在下面,都会被加载,覆盖程序中默认的配置信息 +spark.speculation = false +spark.streaming.concurrentJobs = 1 \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/HudiTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/HudiTest.scala new file mode 100644 index 0000000..310841d --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/HudiTest.scala @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark + +import java.util.Date + +import com.zto.fire._ +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.examples.bean.Hudi +import com.zto.fire.spark.BaseSparkCore +import org.apache.hudi.DataSourceWriteOptions._ +import org.apache.hudi.DataSourceReadOptions._ +import org.apache.hudi.QuickstartUtils._ +import org.apache.hudi.config.HoodieIndexConfig +import org.apache.hudi.config.HoodieWriteConfig._ +import org.apache.hudi.index.HoodieIndex +import org.apache.spark.sql.SaveMode._ + +/** + * hudi测试 + * + * @author ChengLong + * @since 1.0.0 + * @create 2021-02-07 13:50 + */ +object HudiTest extends BaseSparkCore { + val tableName = "t_hudi" + val basePath = "J:\\hudi" + val dataGen = new DataGenerator + + /** + * 将DataFrame Overwrite到指定的路径下 + */ + def insert: Unit = { + val df = this.spark.createDataFrame(Hudi.newHudiList(), classOf[Hudi]) + df.write.format("org.apache.hudi") + .options(getQuickstartWriteConfigs) + .option(PRECOMBINE_FIELD_OPT_KEY, "id") + .option(RECORDKEY_FIELD_OPT_KEY, "name") + .option(PARTITIONPATH_FIELD_OPT_KEY, "ds") + .option(TABLE_NAME, tableName) + .mode(Overwrite) + .save(basePath) + } + + + /** + * 将DataFrame Overwrite到指定的路径下,并将表信息同步到hive元数据中 + */ + def insertHive: Unit = { + val df = this.spark.createDataFrame(Hudi.newHudiList(), classOf[Hudi]) + df.write.format("org.apache.hudi") + .options(getQuickstartWriteConfigs) + // 设置主键列名 + .option(PRECOMBINE_FIELD_OPT_KEY, "id") + // 设置数据更新时间的列名 + .option(RECORDKEY_FIELD_OPT_KEY, "name") + // 分区列设置 + .option(PARTITIONPATH_FIELD_OPT_KEY, "ds") + .option(TABLE_NAME, tableName) + // 设置要同步的hive库名 + .option(HIVE_DATABASE_OPT_KEY, "tmp") + // 设置要同步的hive表名 + .option(HIVE_TABLE_OPT_KEY, "t_hudi") + // 设置数据集注册并同步到hive + .option(HIVE_SYNC_ENABLED_OPT_KEY, "true") + .option(META_SYNC_ENABLED_OPT_KEY, "true") + // 设置当分区变更时,当前数据的分区目录是否变更 + 
.option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true") + // 设置要同步的分区列名 + .option(HIVE_PARTITION_FIELDS_OPT_KEY, "ds") + // 设置jdbc 连接同步 + .option(HIVE_URL_OPT_KEY, this.conf.getString("hive.jdbc.url")) + .option(HIVE_USER_OPT_KEY, "admin") + .option(HIVE_PASS_OPT_KEY, this.conf.getString("hive.jdbc.password")) + // hudi表名称设置 + // 用于将分区字段值提取到Hive分区列中的类,这里我选择使用当前分区的值同步 + .option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.MultiPartKeysValueExtractor") + // 设置索引类型目前有HBASE,INMEMORY,BLOOM,GLOBAL_BLOOM 四种索引 为了保证分区变更后能找到必须设置全局GLOBAL_BLOOM + .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name()) + // 并行度参数设置 + .option("hoodie.insert.shuffle.parallelism", "2") + .option("hoodie.upsert.shuffle.parallelism", "2") + .mode(Overwrite) + .save("hdfs://ns1/user/hive/warehouse/tmp.db/t_hudi") + } + + /** + * 根据给定的DataFrame进行更新操作 + */ + def update: Unit = { + val df = this.spark.createDataFrame(Seq(new Hudi(1L, "admin", 122, true), new Hudi(2L, "root", 222, false)), classOf[Hudi]) + df.write.format("org.apache.hudi") + .options(getQuickstartWriteConfigs) + .option(PRECOMBINE_FIELD_OPT_KEY, "id") + .option(RECORDKEY_FIELD_OPT_KEY, "name") + .option(PARTITIONPATH_FIELD_OPT_KEY, "ds") + .option(TABLE_NAME, tableName) + // 将mode从Overwrite改成Append就是更新操作 + .mode(Append) + .save(basePath) + } + + /** + * 根据给定的DataFrame进行删除操作 + */ + def delete: Unit = { + val df = this.spark.createDataFrame(Seq(new Hudi(1L, "admin", 122, true)), classOf[Hudi]) + df.write.format("org.apache.hudi") + .options(getQuickstartWriteConfigs) + // 执行delete操作 + .option(OPERATION_OPT_KEY,"delete") + .option(PRECOMBINE_FIELD_OPT_KEY, "id") + .option(RECORDKEY_FIELD_OPT_KEY, "name") + .option(PARTITIONPATH_FIELD_OPT_KEY, "ds") + .option("hoodie.bulkinsert.shuffle.parallelism", 2) + .option("insert_shuffle_parallelism", 2) + .option("upsert_shuffle_parallelism", 2) + .option(TABLE_NAME, tableName) + // 将mode从Overwrite改成Append就是更新操作 + .mode(Append) + .save(basePath) + } + + /** + * 从hudi中读取数据 + */ + def read: Unit = { + val roViewDF = this.spark + .read.format("org.apache.hudi") + // /*的个数与PARTITIONPATH_FIELD_OPT_KEY指定的目录级数有关,如果分区路径是:region/country/city,则是四个/* + // 如果分区路径是ds=20200208这种,则是两个/*。所以这个 /*数=PARTITIONPATH_FIELD_OPT_KEY+1 + .load(basePath + "/*/*") + //load(basePath) 如果使用 "/partitionKey=partitionValue" 文件夹命名格式,Spark将自动识别分区信息 + + roViewDF.createOrReplaceTempView(tableName) + spark.sql(s"select * from $tableName order by id").show(false) + } + + /** + * 从hudi中读取数据 + */ + def readHDFS: Unit = { + val roViewDF = this.spark + .read.format("org.apache.hudi") + // /*的个数与PARTITIONPATH_FIELD_OPT_KEY指定的目录级数有关,如果分区路径是:region/country/city,则是四个/* + // 如果分区路径是ds=20200208这种,则是两个/*。所以这个 /*数=PARTITIONPATH_FIELD_OPT_KEY+1 + .load("hdfs://ns1/tmp/hudi2/*") + //load(basePath) 如果使用 "/partitionKey=partitionValue" 文件夹命名格式,Spark将自动识别分区信息 + + roViewDF.createOrReplaceTempView(tableName) + spark.sql(s"select * from $tableName order by id").show(false) + } + + /** + * 增量查询 + */ + def readNew: Unit = { + val newDF = this.spark.read.format("org.apache.hudi") + .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) + // 指定beginTime,只查询从beginTime之后的最新数据 + .option(BEGIN_INSTANTTIME_OPT_KEY, DateFormatUtils.addSecs(new Date(), -20)) + .load(basePath + "/*/*") + newDF.createOrReplaceTempView("new_table") + this.spark.sql("select * from new_table").show(false) + } + + override def process: Unit = { + /*this.insert + println("step 1.读取数据") + this.read + this.update + println("step 2.读取更新后的数据") + this.readNew 
+ this.delete + println("step 2.读取删除后的数据") + this.read*/ + this.insertHive + // this.readHDFS + } + + def main(args: Array[String]): Unit = { + this.init() + this.stop + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/Test.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/Test.scala new file mode 100644 index 0000000..f29dad3 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/Test.scala @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark + +import com.zto.fire._ +import com.zto.fire.common.util.{DateFormatUtils, PropUtils} +import com.zto.fire.examples.bean.Student +import com.zto.fire.examples.spark.jdbc.JdbcTest.tableName +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.spark.{BaseSparkCore, BaseSparkStreaming} + + +/** + * 基于Fire进行Spark Streaming开发 + */ +object Test extends BaseSparkCore { + + override def process: Unit = { + val ds = this.fire.createDataFrame(Student.newStudentList(), classOf[Student]) + ds.createOrReplaceTempView("test") + this.fire.sql("select * from test").print() + this.fire.sql("select * from dim.baseorganize_addzero limit 10").show() + this.fire.stop + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/TestHive.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/TestHive.scala new file mode 100644 index 0000000..c11d4d0 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/TestHive.scala @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.spark + +import com.zto.fire.spark.BaseSparkCore + +object TestHive extends BaseSparkCore { + + override def process: Unit = { + this.fire.sql("use dim") + this.fire.sql("show tables").show(100, false) + } + + + def main(args: Array[String]): Unit = { + this.init(args = args) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/acc/FireAccTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/acc/FireAccTest.scala new file mode 100644 index 0000000..61fd351 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/acc/FireAccTest.scala @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.acc + +import java.util.concurrent.TimeUnit + +import com.zto.fire._ +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.common.util.{DateFormatUtils, PropUtils} +import com.zto.fire.spark.BaseSparkStreaming + + +/** + * 用于演示与测试Fire框架内置的累加器 + * + * @author ChengLong 2019年9月10日 09:50:16 + */ +object FireAccTest extends BaseSparkStreaming { + val key = "fire.partitions" + + override def process: Unit = { + if (this.args != null) { + this.args.foreach(arg => println(arg + " ")) + } + val dstream = this.fire.createKafkaDirectStream() + dstream.foreachRDD(rdd => { + rdd.coalesce(this.conf.getInt(key, 10)).foreachPartition(t => { + println("conf=" + this.conf.getInt(key, 10) + " PropUtils=" + PropUtils.getString(key)) + // 单值累加器 + this.acc.addCounter(1) + // 多值累加器,根据key的不同分别进行数据的累加 + this.acc.addMultiCounter("multiCounter", 1) + this.acc.addMultiCounter("partitions", 1) + // 多时间维度累加器,比多值累加器多了一个时间维度,如:hbaseWriter 2019-09-10 11:00:00 10 + this.acc.addMultiTimer("multiTimer", 1) + }) + }) + + // 定时打印fire内置累加器中的值 + this.runAsSchedule(this.printAcc, 0, 10, true, TimeUnit.MINUTES) + + this.fire.start + } + + /** + * 打印累加器中的值 + */ + def printAcc: Unit = { + println(s"===============${DateFormatUtils.formatCurrentDateTime()}=============") + this.acc.getMultiTimer.cellSet().foreach(t => println(s"key:" + t.getRowKey + " 时间:" + t.getColumnKey + " " + t.getValue + "条")) + + println("单值:" + this.acc.getCounter) + this.acc.getMultiCounter.foreach(t => { + println("多值:key=" + t._1 + " value=" + t._2) + }) + val size = this.acc.getMultiTimer.cellSet().size() + + println(s"======multiTimer.size=${size}==log.size=${this.acc.getLog.size()}======") + } + + @Scheduled(fixedInterval = 60 * 1000, scope = "all") + def loadTable: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}=================== 每分钟执行loadTable ===================") + } + + @Scheduled(cron = "0 0 * * * ?") + def loadTable2: Unit = { + 
println(s"${DateFormatUtils.formatCurrentDateTime()}=================== 每小时执行loadTable2 ===================") + } + + @Scheduled(cron = "0 0 9 * * ?") + def loadTable3: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}=================== 每天9点执行loadTable3 ===================") + } + + + def main(args: Array[String]): Unit = { + this.init(1, false, args) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/datasource/DataSourceTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/datasource/DataSourceTest.scala new file mode 100644 index 0000000..305cb74 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/datasource/DataSourceTest.scala @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.datasource + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.spark.BaseSparkCore +import org.apache.spark.sql.SaveMode + + +/** + * Spark DataSource API示例 + */ +object DataSourceTest extends BaseSparkCore { + + override def process: Unit = { + val ds = this.fire.createDataFrame(Student.newStudentList(), classOf[Student]) + ds.createOrReplaceTempView("test") + + val dataFrame = this.fire.sql("select * from test") + + // 一、 dataFrame.write.format.mode.save中的所有参数均可通过配置文件指定 + // dataFrame.writeEnhance() + + // 二、 dataFrame.write.mode.save中部分参数通过配置文件指定,或全部通过方法硬编码指定 + val savePath = "/user/hive/warehouse/hudi.db/hudi_bill_event_test" + + // 如果代码中与配置文件中均指定了options,则相同的options配置文件优先级更高,不同的option均生效 + val options = Map( + "hoodie.datasource.write.recordkey.field" -> "id", + "hoodie.datasource.write.precombine.field" -> "id" + ) + + // 使用keyNum标识读取配置文件中不同配置后缀的options信息 + // dataFrame.writeEnhance("org.apache.hudi", SaveMode.Append, savePath, options = options, keyNum = 2) + + // read.format.mode.load(path) + this.fire.readEnhance(keyNum = 3) + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseConnectorTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseConnectorTest.scala new file mode 100644 index 0000000..6c58a6d --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseConnectorTest.scala @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.hbase + +import java.nio.charset.StandardCharsets +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.spark.BaseSparkCore +import org.apache.hadoop.hbase.client.Get +import org.apache.spark.sql.Encoders + +import scala.collection.mutable.ListBuffer + +/** + * 在spark中使用java 同步 api (HBaseConnector) 的方式读写hbase表 + * 注:适用于少量数据的实时读写,更轻量 + * + * @author ChengLong 2019-5-9 09:37:25 + */ +object HBaseConnectorTest extends BaseSparkCore { + private val tableName1 = "fire_test_1" + private val tableName2 = "fire_test_2" + + /** + * 使用HBaseConnector插入一个集合,可以是list、set等集合 + * 但集合的类型必须为HBaseBaseBean的子类 + */ + def testHbasePutList: Unit = { + val studentList = Student.newStudentList() + this.fire.hbasePutList(this.tableName1, studentList) + } + + /** + * 使用HBaseConnector插入一个rdd的数据 + * rdd的类型必须为HBaseBaseBean的子类 + */ + def testHbasePutRDD: Unit = { + val studentList = Student.newStudentList() + val studentRDD = this.fire.createRDD(studentList, 2) + // 为空的字段不插入 + studentRDD.hbasePutRDD(this.tableName1) + } + + /** + * 使用HBaseConnector插入一个DataFrame的数据 + */ + def testHBasePutDF: Unit = { + val studentList = Student.newStudentList() + val studentDF = this.fire.createDataFrame(studentList, classOf[Student]) + // 每个批次插100条 + studentDF.hbasePutDF(this.tableName1, classOf[Student]) + } + + /** + * 使用HBaseConnector插入一个Dataset的数据 + * dataset的类型必须为HBaseBaseBean的子类 + */ + def testHBasePutDS: Unit = { + val studentList = Student.newStudentList() + val studentDS = this.fire.createDataset(studentList)(Encoders.bean(classOf[Student])) + // 以多版本形式插入 + studentDS.hbasePutDS(this.tableName2, classOf[Student]) + } + + /** + * 使用HBaseConnector get数据,并将结果以list方式返回 + */ + def testHbaseGetList: Unit = { + println("===========testHbaseGetList===========") + val rowKeys = Seq("1", "2", "3", "5", "6") + val studentList = this.fire.hbaseGetList2(this.tableName1, classOf[Student], rowKeys) + studentList.foreach(println) + + val getList = ListBuffer[Get]() + rowKeys.map(rowkey => (getList += new Get(rowkey.getBytes(StandardCharsets.UTF_8)))) + // 获取多版本形式存放的记录,并获取最新的两个版本就 + val studentList2 = this.fire.hbaseGetList(this.tableName1, classOf[Student], getList) + studentList2.foreach(println) + } + + /** + * 使用HBaseConnector get数据,并将结果以RDD方式返回 + */ + def testHbaseGetRDD: Unit = { + println("===========testHBaseConnectorGetRDD===========") + val getList = Seq("1", "2", "3", "5", "6") + val getRDD = this.fire.createRDD(getList, 2) + // 以多版本方式get,并将结果集封装到rdd中返回 + val studentRDD = this.fire.hbaseGetRDD(this.tableName1, classOf[Student], getRDD) + studentRDD.printEachPartition + } + + /** + * 使用HBaseConnector get数据,并将结果以DataFrame方式返回 + */ + def testHbaseGetDF: Unit = { + println("===========testHBaseConnectorGetDF===========") + val getList = Seq("1", "2", "3", "4", "5", "6") + val getRDD = this.fire.createRDD(getList, 3) + // get到的结果以dataframe形式返回 + 
val studentDF = this.fire.hbaseGetDF(this.tableName1, classOf[Student], getRDD) + studentDF.show(100, false) + } + + /** + * 使用HBaseConnector get数据,并将结果以Dataset方式返回 + */ + def testHBaseGetDS: Unit = { + println("===========testHBaseGetDS===========") + val getList = Seq("1", "2", "3", "4", "5", "6") + val getRDD = this.fire.createRDD(getList, 2) + // 指定在多版本获取时只取最新的两个版本 + val studentDS = this.fire.hbaseGetDS(this.tableName1, classOf[Student], getRDD) + studentDS.show(100, false) + } + + /** + * 使用HBaseConnector scan数据,并以list方式返回 + */ + def testHbaseScanList: Unit = { + println("===========testHbaseScanList===========") + val list = this.fire.hbaseScanList2(this.tableName1, classOf[Student], "1", "6") + list.foreach(println) + } + + /** + * 使用HBaseConnector scan数据,并以RDD方式返回 + */ + def testHbaseScanRDD: Unit = { + println("===========testHbaseScanRDD===========") + val rdd = this.fire.hbaseScanRDD2(this.tableName1, classOf[Student], "1", "6") + rdd.repartition(3).printEachPartition + } + + /** + * 使用HBaseConnector scan数据,并以DataFrame方式返回 + */ + def testHbaseScanDF: Unit = { + println("===========testHbaseScanDF===========") + val dataFrame = this.fire.hbaseScanDF2(this.tableName1, classOf[Student], "1", "6") + dataFrame.repartition(3).show(100, false) + } + + /** + * 使用HBaseConnector scan数据,并以DataFrame方式返回 + */ + def testHbaseScanDS: Unit = { + println("===========testHbaseScanDF===========") + val dataSet = this.fire.hbaseScanDS2(this.tableName1, classOf[Student], "1", "6") + dataSet.show(100, false) + } + + /** + * 根据指定的rowKey list,批量删除指定的记录 + */ + def testHbaseDeleteList: Unit = { + val rowKeyList = Seq(1.toString, 2.toString, 5.toString, 8.toString) + this.fire.hbaseDeleteList(this.tableName1, rowKeyList) + } + + /** + * 根据指定的rowKey rdd,批量删除指定的记录 + */ + def testHBaseDeleteRDD: Unit = { + val rowKeyList = Seq(1.toString, 2.toString, 3.toString, 4.toString, 5.toString, 6.toString, 7.toString, 8.toString, 9.toString, 10.toString) + val rowKeyRDD = this.fire.createRDD(rowKeyList, 2) + rowKeyRDD.hbaseDeleteRDD(this.tableName1) + } + + /** + * 根据指定的rowKey dataset,批量删除指定的记录 + */ + def testHbaseDeleteDS: Unit = { + val rowKeyList = Seq(1.toString, 2.toString, 5.toString, 8.toString) + val rowKeyDS = this.fire.createDataset(rowKeyList)(Encoders.STRING) + rowKeyDS.hbaseDeleteDS(this.tableName1) + } + + /** + * Spark处理过程 + * 注:此方法会被自动调用 + */ + override def process: Unit = { + // 指定是否以多版本的形式读写 + // this.testHBaseDeleteRDD + + this.testHbaseDeleteDS + HBaseConnector.truncateTable(this.tableName1) + HBaseConnector.truncateTable(this.tableName2, keyNum = 2) + + // this.testHbasePutRDD + // this.testHbasePutList + this.testHBasePutDF + this.testHBasePutDS + + println("=========get========") + this.testHbaseGetList + this.testHbaseGetRDD + this.testHbaseGetDF + this.testHBaseGetDS + + println("=========scan========") + this.testHbaseScanList + this.testHbaseScanRDD + this.testHbaseScanDF + } + + def main(args: Array[String]): Unit = { + this.init() + this.stop + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseHadoopTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseHadoopTest.scala new file mode 100644 index 0000000..b6068be --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseHadoopTest.scala @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
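One pattern worth calling out across these HBase examples: the keyNum argument routes a call to the numbered cluster configuration (spark.hbase.cluster for the default, spark.hbase.cluster2 for keyNum = 2) defined in the properties files earlier in this patch. A condensed, assumed sketch using only calls that already appear in these examples; the table names are placeholders.

package com.zto.fire.examples.spark.hbase

import com.zto.fire._
import com.zto.fire.examples.bean.Student
import com.zto.fire.hbase.HBaseConnector
import com.zto.fire.spark.BaseSparkCore

// Hypothetical sketch, not part of this patch.
object HBaseKeyNumSketch extends BaseSparkCore {

  override def process: Unit = {
    // no keyNum: binds to the cluster configured as spark.hbase.cluster
    HBaseConnector.truncateTable("fire_test_1")
    this.fire.hbasePutList("fire_test_1", Student.newStudentList())

    // keyNum = 2: the same API, routed to the cluster configured as spark.hbase.cluster2
    HBaseConnector.truncateTable("fire_test_2", keyNum = 2)
    val students = HBaseConnector.get("fire_test_2", classOf[Student], Seq("1", "2"), keyNum = 2)
    students.foreach(println)
  }

  def main(args: Array[String]): Unit = this.init()
}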
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.hbase + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.spark.BaseSparkCore +import org.apache.hadoop.hbase.client.Scan +import org.apache.hadoop.hbase.filter.{CompareFilter, RegexStringComparator, RowFilter} +import org.apache.spark.sql.{Encoders, Row} + +/** + * 本示例演示Spark提供的hbase api封装后的使用 + * 注:使用Spark写hbase的方式适用于海量数据离线写 + * + * @author ChengLong 2019-5-9 09:37:25 + */ +object HBaseHadoopTest extends BaseSparkCore { + private val tableName6 = "fire_test_6" + private val tableName7 = "fire_test_7" + + /** + * 基于saveAsNewAPIHadoopDataset封装,将rdd数据保存到hbase中 + */ + def testHbaseHadoopPutRDD: Unit = { + val studentRDD = this.fire.createRDD(Student.newStudentList(), 2) + this.fire.hbaseHadoopPutRDD(this.tableName6, studentRDD, keyNum = 2) + // 方式二:直接基于rdd进行方法调用 + // studentRDD.hbaseHadoopPutRDD(this.tableName1) + } + + /** + * 基于saveAsNewAPIHadoopDataset封装,将DataFrame数据保存到hbase中 + */ + def testHbaseHadoopPutDF: Unit = { + val studentRDD = this.fire.createRDD(Student.newStudentList(), 2) + val studentDF = this.fire.createDataFrame(studentRDD, classOf[Student]) + // 由于DataFrame相较于Dataset和RDD是弱类型的数据集合,所以需要传递具体的类型classOf[Type] + this.fire.hbaseHadoopPutDF(this.tableName7, studentDF, classOf[Student]) + // 方式二:基于DataFrame进行方法调用 + // studentDF.hbaseHadoopPutDF(this.tableName3, classOf[Student]) + } + + /** + * 基于saveAsNewAPIHadoopDataset封装,将Dataset数据保存到hbase中 + */ + def testHbaseHadoopPutDS: Unit = { + val studentDS = this.fire.createDataset(Student.newStudentList())(Encoders.bean(classOf[Student])) + this.fire.hbaseHadoopPutDS(this.tableName7, studentDS) + // 方式二:基于DataFrame进行方法调用 + // studentDS.hbaseHadoopPutDS(this.tableName3) + } + + /** + * 基于saveAsNewAPIHadoopDataset封装,将不是HBaseBaseBean结构对应的DataFrame保存到hbase中 + * 注:此方法与hbaseHadoopPutDF不同之处在于,它不强制要求该DataFrame一定要与HBaseBaseBean的子类对应 + * 但需要指定rowKey的构建规则,相对与hbaseHadoopPutDF来说,少了中间的两次转换,性能会更高 + */ + def testHbaseHadoopPutDFRow: Unit = { + /** + * 构建main_order rowkey + */ + val buildRowKey = (row: Row) => { + // 将id字段作为rowKey + row.getAs("id").toString + } + + val studentRDD = this.fire.createRDD(Student.newStudentList(), 2) + this.fire.createDataFrame(studentRDD, classOf[Student]).createOrReplaceTempView("student") + // 指定rowKey构建的函数 + this.fire.sql("select age,createTime,id,length,name,sex from student").hbaseHadoopPutDFRow(this.tableName7, buildRowKey) + } + + /** + * 使用Spark的方式scan海量数据,并将结果集映射为RDD + */ + def testHBaseHadoopScanRDD: Unit = { + println("===========testHBaseHadoopScanRDD===========") + val studentRDD = this.fire.hbaseHadoopScanRDD2(this.tableName6, classOf[Student], "1", "6", keyNum = 2) + studentRDD.printEachPartition + } + + /** + * 使用Spark的方式scan海量数据,并将结果集映射为DataFrame + */ + def testHBaseHadoopScanDF: Unit = { + 
println("===========testHBaseHadoopScanDF===========") + val studentDF = this.fire.hbaseHadoopScanDF2(this.tableName7, classOf[Student], "1", "6") + studentDF.show(100, false) + } + + /** + * 使用Spark的方式scan海量数据,并将结果集映射为Dataset + */ + def testHBaseHadoopScanDS: Unit = { + println("===========testHBaseHadoopScanDS===========") + val studentDS = this.fire.hbaseHadoopScanDS2(this.tableName7, classOf[Student], "1", "6") + studentDS.show(100, false) + } + + /** + * Spark处理过程 + * 注:此方法会被自动调用 + */ + override def process: Unit = { + HBaseConnector.truncateTable(this.tableName6, keyNum = 2) + HBaseConnector.truncateTable(this.tableName7) + this.testHbaseHadoopPutRDD + // this.testHbaseHadoopPutDF + // this.testHbaseHadoopPutDS + this.testHbaseHadoopPutDFRow + + this.testHBaseHadoopScanRDD + this.testHBaseHadoopScanDF + this.testHBaseHadoopScanDS + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseStreamingTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseStreamingTest.scala new file mode 100644 index 0000000..173a170 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HBaseStreamingTest.scala @@ -0,0 +1,39 @@ +package com.zto.fire.examples.spark.hbase + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.spark.BaseSparkStreaming + +/** + * 通过hbase相关api,将数据实时写入到hbase中 + * @author ChengLong 2019-5-26 13:21:59 + */ +object HBaseStreamingTest extends BaseSparkStreaming { + private val tableName8 = "fire_test_8" + private val tableName9 = "fire_test_9" + + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + HBaseConnector.truncateTable(this.tableName8) + HBaseConnector.truncateTable(this.tableName9, keyNum = 2) + + dstream.repartition(3).foreachRDD(rdd => { + rdd.foreachPartition(it => { + HBaseConnector.insert(this.tableName8, Student.newStudentList()) + val student = HBaseConnector.get(this.tableName9, classOf[Student], Seq("1", "2")) + student.foreach(t => logger.error("HBase1 Get结果:" + t)) + + HBaseConnector.insert(this.tableName9, Student.newStudentList()) + val student2 = HBaseConnector.get(this.tableName8, classOf[Student], Seq("2", "3"), keyNum = 2) + student2.foreach(t => logger.error("HBase2 Get结果:" + t)) + }) + }) + + this.fire.start() + } + + def main(args: Array[String]): Unit = { + this.init(30, false) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HbaseBulkTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HbaseBulkTest.scala new file mode 100644 index 0000000..0404623 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HbaseBulkTest.scala @@ -0,0 +1,230 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.hbase + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.hbase.HBaseConnector +import com.zto.fire.spark.BaseSparkCore +import org.apache.spark.sql.{Encoders, Row} + + +/** + * 本示例用于演示spark中使用bulk api完成HBase的读写 + * 注:bulk api相较于java api,在速度上会更快,但目前暂不支持多版本读写 + * + * @author ChengLong 2019-5-18 09:20:52 + */ +object HBaseBulkTest extends BaseSparkCore { + private val tableName3 = "fire_test_3" + private val tableName5 = "fire_test_5" + + /** + * 使用id作为rowKey + */ + val buildStudentRowKey = (row: Row) => { + row.getAs("id").toString + } + + /** + * 使用bulk的方式将rdd写入到hbase + */ + def testHbaseBulkPutRDD: Unit = { + // 方式一:将rdd的数据写入到hbase中,rdd类型必须为HBaseBaseBean的子类 + val rdd = this.fire.createRDD(Student.newStudentList(), 2) + // rdd.hbaseBulkPutRDD(this.tableName2) + // 方式二:使用this.fire.hbaseBulkPut将rdd中的数据写入到hbase + this.fire.hbaseBulkPutRDD(this.tableName5, rdd) + + // 第二个参数指定false表示不插入为null的字段到hbase中 + // rdd.hbaseBulkPutRDD(this.tableName2, insertEmpty = false) + // 第三个参数为true表示以多版本json格式写入 + // rdd.hbaseBulkPutRDD(this.tableName3, false, true) + } + + /** + * 使用bulk的方式将DataFrame写入到hbase + */ + def testHbaseBulkPutDF: Unit = { + // 方式一:将DataFrame的数据写入到hbase中 + val rdd = this.fire.createRDD(Student.newStudentList(), 2) + val studentDF = this.fire.createDataFrame(rdd, classOf[Student]) + // insertEmpty=false表示为空的字段不插入 + studentDF.hbaseBulkPutDF(this.tableName3, classOf[Student], keyNum = 2) + // 方式二: + // this.fire.hbaseBulkPutDF(this.tableName2, studentDF, classOf[Student]) + } + + /** + * 使用bulk的方式将Dataset写入到hbase + */ + def testHbaseBulkPutDS: Unit = { + // 方式一:将DataFrame的数据写入到hbase中 + val rdd = this.fire.createRDD(Student.newStudentList(), 2) + val studentDataset = this.fire.createDataset(rdd)(Encoders.bean(classOf[Student])) + // multiVersion=true表示以多版本形式插入 + studentDataset.hbaseBulkPutDS(this.tableName5) + // 方式二: + // this.fire.hbaseBulkPutDS(this.tableName3, studentDataset) + } + + /** + * 使用bulk方式根据rowKey集合获取数据,并将结果集以RDD形式返回 + */ + def testHBaseBulkGetSeq: Unit = { + println("===========testHBaseBulkGetSeq===========") + // 方式一:使用rowKey集合读取hbase中的数据 + val seq = Seq(1.toString, 2.toString, 3.toString, 5.toString, 6.toString) + val studentRDD = this.fire.hbaseBulkGetSeq(this.tableName5, seq, classOf[Student]) + studentRDD.foreach(println) + // 方式二:使用this.fire.hbaseBulkGetRDD + /*val studentRDD2 = this.fire.hbaseBulkGetSeq(this.tableName2, seq, classOf[Student]) + studentRDD2.foreach(println)*/ + } + + /** + * 使用bulk方式根据rowKey获取数据,并将结果集以RDD形式返回 + */ + def testHBaseBulkGetRDD: Unit = { + println("===========testHBaseBulkGetRDD===========") + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 3.toString, 5.toString, 6.toString), 2) + val studentRDD = rowKeyRdd.hbaseBulkGetRDD(this.tableName3, classOf[Student], keyNum = 2) + studentRDD.foreach(println) + // 方式二:使用this.fire.hbaseBulkGetRDD + // val studentRDD2 = this.fire.hbaseBulkGetRDD(this.tableName2, rowKeyRdd, classOf[Student]) + // studentRDD2.foreach(println) + } + + /** + * 
使用bulk方式根据rowKey获取数据,并将结果集以DataFrame形式返回 + */ + def testHBaseBulkGetDF: Unit = { + println("===========testHBaseBulkGetDF===========") + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 3.toString, 5.toString, 6.toString), 2) + val studentDF = rowKeyRdd.hbaseBulkGetDF(this.tableName5, classOf[Student]) + studentDF.show(100, false) + // 方式二:使用this.fire.hbaseBulkGetDF + val studentDF2 = this.fire.hbaseBulkGetDF(this.tableName5, rowKeyRdd, classOf[Student]) + studentDF2.show(100, false) + } + + /** + * 使用bulk方式根据rowKey获取数据,并将结果集以Dataset形式返回 + */ + def testHBaseBulkGetDS: Unit = { + println("===========testHBaseBulkGetDS===========") + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 3.toString, 5.toString, 6.toString), 2) + val studentDS = rowKeyRdd.hbaseBulkGetDS(this.tableName5, classOf[Student]) + studentDS.show(100, false) + // 方式二:使用this.fire.hbaseBulkGetDF + // val studentDS2 = this.fire.hbaseBulkGetDS(this.tableName2, rowKeyRdd, classOf[Student]) + // studentDS2.show(100, false) + } + + /** + * 使用bulk方式进行scan,并将结果集映射为RDD + */ + def testHbaseBulkScanRDD: Unit = { + println("===========testHbaseBulkScanRDD===========") + // scan操作,指定rowKey的起止或直接传入自己构建的scan对象实例,返回类型为RDD[Student] + val scanRDD = this.fire.hbaseBulkScanRDD2(this.tableName5, classOf[Student], "1", "6") + scanRDD.foreach(println) + } + + /** + * 使用bulk方式进行scan,并将结果集映射为DataFrame + */ + def testHbaseBulkScanDF: Unit = { + println("===========testHbaseBulkScanDF===========") + // scan操作,指定rowKey的起止或直接传入自己构建的scan对象实例,返回类型为DataFrame + val scanDF = this.fire.hbaseBulkScanDF2(this.tableName5, classOf[Student], "1", "6") + scanDF.show(100, false) + } + + /** + * 使用bulk方式进行scan,并将结果集映射为Dataset + */ + def testHbaseBulkScanDS: Unit = { + println("===========testHbaseBulkScanDS===========") + // scan操作,指定rowKey的起止或直接传入自己构建的scan对象实例,返回类型为Dataset[Student] + val scanDS = this.fire.hbaseBulkScanDS(this.tableName5, classOf[Student], HBaseConnector.buildScan("1", "6")) + scanDS.show(100, false) + } + + /** + * 使用bulk方式批量删除指定的rowKey对应的数据 + */ + def testHBaseBulkDeleteRDD: Unit = { + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 5.toString, 6.toString), 2) + // 根据rowKey删除 + rowKeyRdd.hbaseBulkDeleteRDD(this.tableName5) + + // 方式二:使用this.fire.hbaseBulkDeleteRDD + // this.fire.hbaseBulkDeleteRDD(this.tableName1, rowKeyRdd) + } + + /** + * 使用bulk方式批量删除指定的rowKey对应的数据 + */ + def testHBaseBulkDeleteDS: Unit = { + // 方式一:使用rowKey读取hbase中的数据,rowKeyRdd类型为String + val rowKeyRdd = this.fire.createRDD(Seq(1.toString, 2.toString, 5.toString, 6.toString), 2) + // 根据rowKey删除 + this.fire.createDataset(rowKeyRdd)(Encoders.STRING).hbaseBulkDeleteDS(this.tableName5) + + // 方式二:使用this.fire.hbaseBulkDeleteDS + // this.fire.hbaseBulkDeleteDS(this.tableName1, rowKeyRdd) + } + + + /** + * Spark处理过程 + * 注:此方法会被自动调用 + */ + override def process: Unit = { + this.testHBaseBulkDeleteRDD + HBaseConnector.truncateTable(this.tableName3, keyNum = 2) + HBaseConnector.truncateTable(this.tableName5) + // this.testHBaseBulkDeleteDS + + // this.testHbaseBulkPutRDD + this.testHbaseBulkPutDF + this.testHbaseBulkPutDS + + println("=========get========") + this.testHBaseBulkGetRDD + this.testHBaseBulkGetDF + this.testHBaseBulkGetDS + this.testHBaseBulkGetSeq + + println("=========scan========") + this.testHbaseBulkScanRDD + this.testHbaseBulkScanDF + this.testHbaseBulkScanDS + } + + def main(args: 
Array[String]): Unit = { + this.init() + this.stop + } + +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HiveQL.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HiveQL.scala new file mode 100644 index 0000000..be7c0ff --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hbase/HiveQL.scala @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.hbase + +/** + * Hive sql + * @author ChengLong 2019-1-16 09:53:45 + */ +object HiveQL { + + /** + * 执行order main sql + * @param tableName + * @return + */ + def saveMainOrder(tableName: String): String = { + s""" + |select + |gtid, + |logFile, + |offset, + |op_type, + |pos, + |schema, + |table, + |msg_when, + |after.*, + |before.bill_code before_bill_code, + |before.order_code before_order_code + |from ${tableName} + |where op_type<>'D' + |and after.bill_code<>'' + |and substr(table,0,6)='order_' + |and substr(table,0,7)<>'order_r' + """.stripMargin + } + + /** + * 执行delete order main sql + * @param tableName + * @return + */ + def deleteMainOrder(tableName: String): String = { + s""" + |select + |gtid, + |logFile, + |offset, + |op_type, + |pos, + |schema, + |table, + |msg_when, + |before.* + |from ${tableName} + |where op_type='D' + |and before.bill_code<>'' + |and before.order_create_date>'2018-06-01' + |and substr(table,0,6)='order_' + |and substr(table,0,7)<>'order_r' + """.stripMargin + } + + /** + * 执行save replica order sql + * @param tableName + * @return + */ + def saveReplicaOrder(tableName: String): String = { + s""" + |select + |gtid, + |logFile, + |offset, + |op_type, + |pos, + |schema, + |table, + |msg_when, + |after.*, + |before.bill_code before_bill_code, + |before.order_code before_order_code + |from ${tableName} + |where op_type<>'D' + |and after.bill_code<>'' + |and substr(table,0,7)='order_r' + """.stripMargin + } + + /** + * 执行delete replica order sql + * @param tableName + * @return + */ + def deleteReplicaOrder(tableName: String): String = { + s""" + |select + |gtid, + |logFile, + |offset, + |op_type, + |pos, + |schema, + |table, + |msg_when, + |before.* + |from ${tableName} + |where op_type='D' + |and before.order_create_date>'2018-06-01' + |and before.bill_code<>'' + |and substr(table,0,7)='order_r' + """.stripMargin + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/CrossHiveClusterReader.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/CrossHiveClusterReader.scala new file mode 100644 index 0000000..6df08f1 --- /dev/null +++ 
b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/CrossHiveClusterReader.scala @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.hive + +import com.zto.fire._ +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.spark.BaseSparkCore + +object CrossHiveClusterReader extends BaseSparkCore { + val dfsUrl = "hdfs://192.168.25.37:8020/user/hive/warehouse/ba.db/one_two_disp_dm" + + def main(args: Array[String]): Unit = { + this.init() + var startTime = DateFormatUtils.currentTime + val sendaDF = this.hiveContext.read.option("header", "true").option("inferSchema", "true") + .format("orc") + .load(this.dfsUrl) + sendaDF.createOrReplaceTempView("tmp1") + this.fire.sql("select count(1) from tmp1 where ds>=20190315").show() + println(DateFormatUtils.runTime(startTime)) + + startTime = DateFormatUtils.currentTime + val sendaDF2 = this.hiveContext.read.option("header", "true") + .option("inferSchema", "true") + .format("orc").load( + s"${this.dfsUrl}/ds=20190315", + s"${this.dfsUrl}/ds=20190316", + s"${this.dfsUrl}/ds=20190317", + s"${this.dfsUrl}/ds=20190318", + s"${this.dfsUrl}/ds=20190319", + s"${this.dfsUrl}/ds=20190320", + s"${this.dfsUrl}/ds=20190321", + s"${this.dfsUrl}/ds=20190322" + ) + sendaDF2.createOrReplaceTempView("tmp2") + this.fire.sql("select count(1) from tmp2").show() + println(DateFormatUtils.runTime(startTime)) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/HiveClusterReader.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/HiveClusterReader.scala new file mode 100644 index 0000000..d60a661 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/hive/HiveClusterReader.scala @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
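Editor's aside on the CrossHiveClusterReader example just above: the eight ds=YYYYMMDD partition paths are spelled out by hand. A hedged sketch of the same load with a generated date range follows; it reuses the example's own dfsUrl field and hiveContext, while the date range and view name are only illustrative:

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// build hdfs://.../ds=20190315 through ds=20190322 instead of listing each path manually
val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
val start = LocalDate.of(2019, 3, 15)
val paths = (0 to 7).map(i => s"${this.dfsUrl}/ds=${start.plusDays(i).format(fmt)}")

val partDF = this.hiveContext.read.format("orc").load(paths: _*)
partDF.createOrReplaceTempView("tmp2")
this.fire.sql("select count(1) from tmp2").show()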
+ */ + +package com.zto.fire.examples.spark.hive + +import com.zto.fire.spark.BaseSparkCore + +/** + * 本示例用于演示spark读取不同hive集群,配置文件请见 HiveClusterReader.properties,继承自BaseSparkCore表示是一个离线的spark程序 + * 如果需要使用不同的hive集群,只需在该类同名的配置文件中加一下配置即可:hive.cluster=streaming,表示读取180实时集群的hive元数据 + * + * @author ChengLong 2019-5-17 10:39:19 + */ +object HiveClusterReader extends BaseSparkCore { + + def main(args: Array[String]): Unit = { + // 必须调用init()方法完成sparkSession的初始化 + this.init() + + // spark为sparkSession的实例,已经在init()中完成初始化,可以直接通过this.fire或this.spark方式调用 + this.fire.sql("use tmp") + this.fire.sql("show tables").show(100, false) + + this.fire.stop() + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcStreamingTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcStreamingTest.scala new file mode 100644 index 0000000..d0784ee --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcStreamingTest.scala @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.jdbc + +import com.zto.fire._ +import com.zto.fire.jdbc.JdbcConnector +import com.zto.fire.spark.BaseSparkStreaming + +object JdbcStreamingTest extends BaseSparkStreaming { + val tableName = "spark_test" + + /** + * Streaming的处理过程强烈建议放到process中,保持风格统一 + * 注:此方法会被自动调用,在以下两种情况下,必须将逻辑写在process中 + * 1. 开启checkpoint + * 2. 支持streaming热重启(可在不关闭streaming任务的前提下修改batch时间) + */ + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + + dstream.repartition(5).foreachRDD(rdd => { + rdd.foreachPartition(it => { + val sql = s"select id from $tableName limit 1" + JdbcConnector.executeQueryCall(sql, callback = _ => 1) + }) + }) + + this.fire.start + } + + def main(args: Array[String]): Unit = { + this.init(10, false) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcTest.scala new file mode 100644 index 0000000..8325900 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/jdbc/JdbcTest.scala @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.jdbc + +import com.zto.fire._ +import com.zto.fire.common.util.{DateFormatUtils, JSONUtils} +import com.zto.fire.examples.bean.Student +import com.zto.fire.jdbc.JdbcConnector +import com.zto.fire.spark.BaseSparkCore +import com.zto.fire.spark.util.SparkUtils +import org.apache.spark.sql.SaveMode + +/** + * Spark jdbc操作 + * + * @author ChengLong 2019-6-17 15:17:38 + */ +object JdbcTest extends BaseSparkCore { + lazy val tableName = "spark_test" + lazy val tableName2 = "t_cluster_info" + lazy val tableName3 = "t_cluster_status" + + /** + * 使用jdbc方式对关系型数据库进行增删改操作 + */ + def testJdbcUpdate: Unit = { + val timestamp = DateFormatUtils.formatCurrentDateTime() + // 执行insert操作 + val insertSql = s"INSERT INTO $tableName (name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" + this.fire.jdbcUpdate(insertSql, Seq("admin", 12, timestamp, 10.0, 1)) + // 更新配置文件中指定的第二个关系型数据库 + this.fire.jdbcUpdate(insertSql, Seq("admin", 12, timestamp, 10.0, 1), keyNum = 2) + + // 执行更新操作 + val updateSql = s"UPDATE $tableName SET name=? WHERE id=?" + this.fire.jdbcUpdate(updateSql, Seq("root", 1)) + + // 执行批量操作 + val batchSql = s"INSERT INTO $tableName (name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" + + this.fire.jdbcBatchUpdate(batchSql, Seq(Seq("spark1", 21, timestamp, 100.123, 1), + Seq("flink2", 22, timestamp, 12.236, 0), + Seq("flink3", 22, timestamp, 12.236, 0), + Seq("flink4", 22, timestamp, 12.236, 0), + Seq("flink5", 27, timestamp, 17.236, 0))) + + // 执行批量更新 + this.fire.jdbcBatchUpdate(s"update $tableName set sex=? where id=?", Seq(Seq(1, 1), Seq(2, 2), Seq(3, 3), Seq(4, 4), Seq(5, 5), Seq(6, 6))) + + // 方式一:通过this.fire方式执行delete操作 + val sql = s"DELETE FROM $tableName WHERE id=?" 
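// Editor's illustration (not in the original): the jdbcBatchUpdate API demonstrated above can
// batch deletes in the same way, each inner Seq binding the placeholders of one statement:
// this.fire.jdbcBatchUpdate(s"DELETE FROM $tableName WHERE id=?", Seq(Seq(7), Seq(8), Seq(9)))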
+ this.fire.jdbcUpdate(sql, Seq(2)) + // 方式二:通过JdbcConnector.executeUpdate + + // 同一个事务 + /*val connection = this.jdbc.getConnection() + this.fire.jdbcBatchUpdate("insert", connection = connection, commit = false, closeConnection = false) + this.fire.jdbcBatchUpdate("delete", connection = connection, commit = false, closeConnection = false) + this.fire.jdbcBatchUpdate("update", connection = connection, commit = true, closeConnection = true)*/ + } + + + /** + * 使用jdbc方式对关系型数据库进行查询操作 + */ + def testJdbcQuery: Unit = { + val sql = s"select * from $tableName where id in (?, ?, ?)" + + // 执行sql查询,并对查询结果集进行处理 + this.fire.jdbcQueryCall(sql, Seq(1, 2, 3), callback = rs => { + while (rs.next()) { + // 对每条记录进行处理 + println("driver=> id=" + rs.getLong(1)) + } + 1 + }) + + // 将查询结果集以List[JavaBean]方式返回 + val list = this.fire.jdbcQuery(sql, Seq(1, 2, 3), classOf[Student]) + // 方式二:使用JdbcConnector + list.foreach(x => println(JSONUtils.toJSONString(x))) + + // 将结果集封装到RDD中 + val rdd = this.fire.jdbcQueryRDD(sql, Seq(1, 2, 3), classOf[Student]) + rdd.printEachPartition + + // 将结果集封装到DataFrame中 + val df = this.fire.jdbcQueryDF(sql, Seq(1, 2, 3), classOf[Student]) + df.show(10, false) + + // 将jdbc查询结果集封装到Dataset中 + val ds = this.fire.jdbcQueryDS(sql, Seq(1, 2, 3), classOf[Student]) + ds.show(10, false) + } + + /** + * 使用spark方式对表进行数据加载操作 + */ + def testTableLoad: Unit = { + // 一次加载整张的jdbc小表,注:大表严重不建议使用该方法 + this.fire.jdbcTableLoadAll(this.tableName).show(100, false) + // 根据指定分区字段的上下边界分布式加载数据 + this.fire.jdbcTableLoadBound(this.tableName, "id", 1, 10, 2).show(100, false) + val where = Array[String]("id >=1 and id <=3", "id >=6 and id <=9", "name='root'") + // 根据指定的条件进行数据加载,条件的个数决定了load数据的并发度 + this.fire.jdbcTableLoad(tableName, where).show(100, false) + } + + /** + * 使用spark方式批量写入DataFrame数据到关系型数据库 + */ + def testTableSave: Unit = { + // 批量将DataFrame数据写入到对应结构的关系型表中 + val df = this.fire.createDataFrame(Student.newStudentList(), classOf[Student]) + // 第二个参数默认为SaveMode.Append,可以指定SaveMode.Overwrite + df.jdbcTableSave(this.tableName, SaveMode.Overwrite) + // 利用sparkSession方式将DataFrame数据保存到配置的第二个数据源中 + this.fire.jdbcTableSave(df, this.tableName, SaveMode.Overwrite) + } + + /** + * 将DataFrame数据写入到关系型数据库中 + */ + def testDataFrameSave: Unit = { + val df = this.fire.createDataFrame(Student.newStudentList(), classOf[Student]) + + val insertSql = s"INSERT INTO spark_test(name, age, createTime, length, sex) VALUES (?, ?, ?, ?, ?)" + // 指定部分DataFrame列名作为参数,顺序要对应sql中问号占位符的顺序,batch用于指定批次大小,默认取spark.db.jdbc.batch.size配置的值 + df.jdbcBatchUpdate(insertSql, Seq("name", "age", "createTime", "length", "sex"), batch = 100) + + df.createOrReplaceTempViewCache("student") + val sqlDF = this.fire.sql("select name, age, createTime from student where id>=1").repartition(1) + // 若不指定字段,则默认传入当前DataFrame所有列,且列的顺序与sql中问号占位符顺序一致 + sqlDF.jdbcBatchUpdate("insert into spark_test(name, age, createTime) values(?, ?, ?)") + // 等同以上方式 + // this.fire.jdbcBatchUpdateDF(sqlDF, "insert into spark_test(name, age, createTime) values(?, ?, ?)") + } + + /** + * 在executor中执行jdbc操作 + */ + def testExecutor: Unit = { + JdbcConnector.executeQueryCall(s"select id from $tableName limit 1", null, callback = _ => { + // this.mark() + Thread.sleep(1000) + // this.log(s"=============driver123 $tableName2=============") + 1 + }) + JdbcConnector.executeQueryCall(s"select id from $tableName limit 1", null, callback = _ => { + // this.log(s"=============driver $tableName2=============") + 1 + }, keyNum = 2) + this.logger.info("driver sql执行成功") + val rdd = 
this.fire.createRDD(1 to 3, 3) + rdd.foreachPartition(it => { + it.foreach(i => { + JdbcConnector.executeQueryCall(s"select id from $tableName limit 1", null, callback = _ => { + // this.log("------------------------- executorId: " + SparkUtils.getExecutorId + " date:" + DateFormatUtils.formatCurrentDate()) + 1 + }) + }) + this.logger.info("sql执行成功") + }) + + this.logConf + val rdd2 = this.fire.createRDD(1 to 3, 3) + rdd2.foreachPartition(it => { + it.foreach(i => { + JdbcConnector.executeQueryCall(s"select id from $tableName limit 1", null, callback = _ => { + this.logConf + 1 + }, keyNum = 2) + this.logger.info("sql执行成功") + }) + }) + } + + /** + * 用于测试分布式配置 + */ + def logConf: Unit = { + println(s"executorId=${SparkUtils.getExecutorId} hello.world=" + this.conf.getString("hello.world", "not_found")) + println(s"executorId=${SparkUtils.getExecutorId} hello.world.flag=" + this.conf.getBoolean("hello.world.flag", false)) + println(s"executorId=${SparkUtils.getExecutorId} hello.world.flag2=" + this.conf.getBoolean("hello.world.flag", false, keyNum = 2)) + } + + override def process: Unit = { + // 测试环境测试 + this.testJdbcUpdate + this.testJdbcQuery + this.testTableLoad + this.testTableSave + this.testDataFrameSave + // 测试配置分发 + this.testExecutor + } + + def main(args: Array[String]): Unit = { + this.init(args = args) + + Thread.currentThread().join() + } +} \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/ScheduleTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/ScheduleTest.scala new file mode 100644 index 0000000..02c5312 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/ScheduleTest.scala @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
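A further hedged sketch building on the JdbcTest example above: the DataFrame returned by jdbcQueryDF can be registered as a temporary view and queried together with other Spark tables. The view name jdbc_student is made up for illustration; the API calls reuse the shapes already shown in that example:

val studentDF = this.fire.jdbcQueryDF(
  s"select * from $tableName where id in (?, ?, ?)", Seq(1, 2, 3), classOf[Student])
studentDF.createOrReplaceTempView("jdbc_student")
this.fire.sql("select name, count(1) as cnt from jdbc_student group by name").show(10, false)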
+ */ + +package com.zto.fire.examples.spark.schedule + +import com.zto.fire._ +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.spark.BaseSparkStreaming +import com.zto.fire.spark.util.SparkUtils + +/** + * 用于测试定时任务 + * + * @author ChengLong 2019年11月5日 17:27:20 + * @since 0.3.5 + */ +object ScheduleTest extends BaseSparkStreaming { + + /** + * 声明了@Scheduled注解的方法是定时任务方法,会周期性执行 + * + * @cron cron表达式 + * @scope 默认同时在driver端和executor端执行,如果指定了driver,则只在driver端定时执行 + * @concurrent 上一个周期定时任务未执行完成时是否允许下一个周期任务开始执行 + * @startAt 用于指定第一次开始执行的时间 + * @initialDelay 延迟多长时间开始执行第一次定时任务 + */ + @Scheduled(cron = "0/5 * * * * ?", scope = "driver", concurrent = false, startAt = "2021-01-21 11:30:00", initialDelay = 60000) + def loadTable: Unit = { + this.logger.info("更新维表动作") + } + + /** + * 只在driver端执行,不允许同一时刻同时执行该方法 + * startAt用于指定首次执行时间 + */ + @Scheduled(cron = "0/5 * * * * ?", scope = "all", concurrent = false) + def test2: Unit = { + this.logger.info("executorId=" + SparkUtils.getExecutorId + "====方法 test2() 每5秒执行====" + DateFormatUtils.formatCurrentDateTime()) + } + + + // 每天凌晨4点01将锁标志设置为false,这样下一个批次就可以先更新维表再执行sql + @Scheduled(cron = "0 1 4 * * ?") + def updateTableJob: Unit = this.lock.compareAndSet(true, false) + + // 用于缓存变更过的维表,只有当定时任务将标记设置为可更新时才会真正拉取最新的表 + def cacheTable: Unit = { + // 加载完成维表以后上锁 + if (this.lock.compareAndSet(false, true)) { + this.fire.uncache("test") + this.fire.cacheTables("test") + } + } + + override def process: Unit = { + // 用于注册其他类下带有@Scheduler标记的方法 + this.registerSchedule(new Tasks) + // 重复注册的任务会自动去重 + this.registerSchedule(new Tasks) + + // 更新并缓存维表动作,具体要根据锁的标记判断是否执行 + this.cacheTable + } + + def main(args: Array[String]): Unit = { + this.init() + } + +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/Tasks.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/Tasks.scala new file mode 100644 index 0000000..b6fb854 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/schedule/Tasks.scala @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.schedule + +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.spark.util.SparkUtils + +/** + * 定时任务注册类 + * 1. 可序列化 + * 2. 
方法不带任何参数 + * + * @author ChengLong 2019年11月5日 17:29:35 + * @since 0.3.5 + */ +class Tasks extends Serializable { + + /** + * 只在driver端执行,不允许同一时刻同时执行该方法 + * startAt用于指定首次执行时间 + */ + @Scheduled(cron = "0/15 * * * * ?", scope = "all", concurrent = false) + def test5: Unit = { + println("executorId=" + SparkUtils.getExecutorId + "====方法 test5() 每15秒执行====" + DateFormatUtils.formatCurrentDateTime()) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/sql/LoadTestSQL.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/sql/LoadTestSQL.scala new file mode 100644 index 0000000..48abaaf --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/sql/LoadTestSQL.scala @@ -0,0 +1,301 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.sql + +/** + * 用于集群压测程序的SQL + * @author ChengLong 2019年10月25日 13:32:19 + */ +object LoadTestSQL { + + def jsonParseSQL: String = { + """ + |select + | after.CAR_SIGN_CODE, + | nvl(after.CAR_SIGN_CODE_OLD,'') CAR_SIGN_CODE_OLD, + | after.CAR_DATE, + | after.SCAN_SITE_ID, + | after.PRE_OR_NEX_STA_ID, + | after.PRE_OR_NEXT_STATION, + | from_unixtime( + | unix_timestamp(after.CAR_DATE,'yyyy-MM-dd HH:mm:ss'), + | 'yyyy-MM-dd HH:mm:ss' + | ) ARRIVE_DATE + |from test t1 + |left join dim_c2c_cost t2 + |on after.SCAN_SITE_ID=t2.start_site_id + |and after.PRE_OR_NEX_STA_ID=t2.end_site_id + |and substr(after.CAR_DATE,12,2)=t2.hh24 + |""".stripMargin + } + + def loadSQL: String = { + """ + |select + | f.site_name as site_name, + | f.site_id as site_id, + | f.collect_date as collect_date, + | sum(f.rec_count) as rec_count, + | sum(f.rec_weight) as rec_weight, + | sum(f.send_count) as send_count, + | sum(f.send_weight) as send_weight, + | sum(f.send_bag_count) as send_bag_count, + | sum(f.send_bag_bill_count) as send_bag_bill_count, + | sum(f.send_bag_weight) as send_bag_weight, + | sum(f.come_count) as come_count, + | sum(f.come_weight) as come_weight, + | sum(f.come_bag_count) as come_bag_count, + | sum(f.come_bag_weight) as come_bag_weight, + | sum(f.come_bag_bill_count) as come_bag_bill_count, + | sum(f.disp_count) as disp_count, + | sum(f.disp_weight) as disp_weight, + | sum(f.sign_count) as sign_count, + | f.ds as ds + | from ( + | select f.scan_site as site_name, + | f.scan_site_id as site_id, + | f.scan_date as collect_date, + | count(f.bill_code) as rec_count, + | sum(nvl(f.weight,0)) as rec_weight, + | 0 as send_count, + | 0 as send_weight, + | 0 as send_bag_count, + | 0 as send_bag_bill_count, + | 0 as send_bag_weight, + | 0 as come_count, + | 0 as come_weight, + | 0 as come_bag_count, + | 0 as come_bag_bill_count, + | 0 as come_bag_weight, + | 0 as disp_count, + | 0 as disp_weight, + | 0 
as sign_count, + | f.ds as ds + | from (select a.scan_site, + | a.scan_site_id, + | to_date(a.scan_date) as scan_date, + | last_value(a.bill_code) over (partition by a.scan_site_id,a.bill_code order by a.scan_date desc) as bill_code, + | max(a.weight) over (partition by a.scan_site_id,a.bill_code) as weight, + | a.ds as ds + | from dw.dw_zt_zto_scan_rec a + | where a.ds>='20191020' + | and a.ds<'20191021' + | and a.scan_site_id>0 + | ) f group by f.scan_site,f.scan_site_id,f.scan_date,f.ds + | + | union all + | select f.scan_site as site_name,f.scan_site_id as site_id,f.scan_date as collect_date, + | 0 as rec_count, + | 0 as rec_weight, + | count(f.bill_code) as send_count, + | sum(f.weight) as send_weight, + | 0 as send_bag_count, + | 0 as send_bag_bill_count, + | 0 as send_bag_weight, + | 0 as come_count, + | 0 as come_weight, + | 0 as come_bag_count, + | 0 as come_bag_bill_count, + | 0 as come_bag_weight, + | 0 as disp_count, + | 0 as disp_weight, + | 0 as sign_count, + | f.ds as ds + | from (select a.scan_site, + | a.scan_site_id, + | to_date(a.scan_date) as scan_date, + | last_value(a.bill_code) over (partition by a.scan_site_id,a.bill_code order by a.scan_date desc) as bill_code, + | max(a.weight) over (partition by a.scan_site_id,a.bill_code) as weight, + | a.ds as ds + | from dw.dw_zt_zto_scan_send a + | where a.ds>='20191020' + | and a.ds<'20191021' + | and a.scan_site_id>0 + | ) f group by f.scan_site,f.scan_site_id,f.scan_date,f.ds + | union all + | select f.scan_site as site_name,f.scan_site_id as site_id,f.scan_date as collect_date, + | 0 as rec_count, + | 0 as rec_weight, + | 0 as send_count, + | 0 as send_weight, + | count(f.bill_code) as send_bag_count, + | sum(d.bagbillsum) as send_bag_bill_count, + | sum(nvl(f.weight,0)) as send_bag_weight, + | 0 as come_count, + | 0 as come_weight, + | 0 as come_bag_count, + | 0 as come_bag_bill_count, + | 0 as come_bag_weight, + | 0 as disp_count, + | 0 as disp_weight, + | 0 as sign_count, + | f.ds as ds + | from (select a.scan_site, + | a.scan_site_id, + | to_date(a.scan_date) as scan_date , + | LAST_VALUE(a.bill_code) over (partition by a.scan_site_id,a.bill_code order by a.scan_date desc) as bill_code, + | max(a.weight) over (partition by a.scan_site_id,a.bill_code) as weight, + | a.ds as ds + | from dw.dw_zt_zto_scan_send_bag a + | where a.ds>='20191020' + | and a.ds<'20191021' + | and a.scan_site_id>0 + | ) f + | left join (select sum(bagbillsum) as bagbillsum, bill_code as owner_bag_no from dw.zto_bagbillsum_weight where ds>='20190920' + | and ds<='20191020' + | group by bill_code + | ) d on d.owner_bag_no=f.bill_code + | group by f.scan_site,f.scan_site_id,f.scan_date,f.ds + | union all + | select f.scan_site as site_name,f.scan_site_id as site_id,f.scan_date as collect_date, + | 0 as rec_count, + | 0 as rec_weight, + | 0 as send_count, + | 0 as send_weight, + | 0 as send_bag_count, + | 0 as send_bag_bill_count, + | 0 as send_bag_weight, + | count(f.bill_code) as come_count, + | sum(f.weight) as come_weight, + | 0 as come_bag_count, + | 0 as come_bag_bill_count, + | 0 as come_bag_weight, + | 0 as disp_count, + | 0 as disp_weight, + | 0 as sign_count, + | f.ds as ds + | from (select a.scan_site, + | a.scan_site_id, + | to_date(a.scan_date) as scan_date, + | last_value(a.bill_code) over (partition by a.scan_site_id,a.bill_code order by a.scan_date desc) as bill_code, + | max(a.weight) over (partition by a.scan_site_id,a.bill_code) as weight, + | a.ds as ds + | from dw.dw_zt_zto_scan_come a + | where a.ds>='20191020' + | and 
a.ds<'20191021' + | and a.scan_site_id>0 + | ) f group by f.scan_site,f.scan_site_id,f.scan_date,f.ds + | union all + | select f.scan_site as site_name,f.scan_site_id as site_id,f.scan_date as collect_date, + | 0 as rec_count, + | 0 as rec_weight, + | 0 as send_count, + | 0 as send_weight, + | 0 as send_bag_count, + | 0 as send_bag_bill_count, + | 0 as send_bag_weight, + | 0 as come_count, + | 0 as come_weight, + | count(f.bill_code) as come_bag_count, + | sum(bagbillsum) as come_bag_bill_count, + | sum(nvl(f.weight,0)) as come_bag_weight, + | 0 as disp_count, + | 0 as disp_weight, + | 0 as sign_count, + | f.ds as ds + | from (select a.scan_site, + | a.scan_site_id, + | to_date(a.scan_date) as scan_date, + | last_value(a.bill_code) over (partition by a.scan_site_id,a.bill_code order by a.scan_date desc) as bill_code, + | max(a.weight) over (partition by a.scan_site_id,a.bill_code) as weight, + | a.ds as ds + | from dw.dw_zt_zto_scan_come_bag a + | where a.ds>='20191020' + | and a.ds<'20191021' + | and a.scan_site_id>0 + | ) f + | left join ( + | select sum(bagbillsum) as bagbillsum,bill_code as owner_bag_no from dw.zto_bagbillsum_weight where ds>='20190920' + | and ds<='20191020' + | group by bill_code + |) d on d.owner_bag_no=f.bill_code + | group by f.scan_site,f.scan_site_id,f.scan_date,f.ds + | union all + | select f.scan_site as site_name,f.scan_site_id as site_id,f.scan_date as collect_date, + | 0 as rec_count, + | 0 as rec_weight, + | 0 as send_count, + | 0 as send_weight, + | 0 as send_bag_count, + | 0 as send_bag_bill_count, + | 0 as send_bag_weight, + | 0 as come_count, + | 0 as come_weight, + | 0 as come_bag_count, + | 0 as come_bag_bill_count, + | 0 as come_bag_weight, + | count(f.bill_code) as disp_count, + | sum(nvl(f.weight,0)) as disp_weight, + | 0 as sign_count, + | f.ds as ds + | from (select a.scan_site, + | a.scan_site_id, + | to_date(a.scan_date) as scan_date, + | last_value(a.bill_code) over (partition by a.scan_site_id,a.bill_code order by a.scan_date desc) as bill_code, + | max(a.weight) over (partition by a.scan_site_id,a.bill_code) as weight, + | a.ds as ds + | from dw.dw_zt_zto_scan_disp a + | where a.ds>='20191020' + | and a.ds<'20191021' + | and a.scan_site_id>0 + | ) f group by f.scan_site,f.scan_site_id,f.scan_date,f.ds + | union all + | select + | r.record_site as site_name, + | r.record_site_id as site_id, + | to_date(record_date) as collect_date, + | 0 as rec_count, + | 0 as rec_weight, + | 0 as send_count, + | 0 as send_weight, + | 0 as send_bag_count, + | 0 as send_bag_bill_count, + | 0 as send_bag_weight, + | 0 as come_count, + | 0 as come_weight, + | 0 as come_bag_count, + | 0 as come_bag_bill_count, + | 0 as come_bag_weight, + | 0 as disp_count, + | 0 as disp_weight, + | count(r.bill_code) as sign_count, + | r.ds as ds + | from dw.dw_zt_zto_sign r + | where r.ds>='20191020' + | and r.ds<='20191021' + | and r.record_site_id>0 + | group by r.record_site,r.record_site_id,to_date(record_date),r.ds + | ) f + | group by f.site_name , + | f.site_id , + | f.collect_date, + | f.ds + |DISTRIBUTE BY rand() + |""".stripMargin + } + + def cacheDim: String = { + """ + |select cast(start_site_id as string), + | cast(end_site_id as string), + | substr(concat('0',cast(actual_start_date_hour as string)),-2,2) as hh24, + | cast(c2c_hour_percent50_onway_hour as string) as cost_time + |from ba.zy_tmp_center_onway_hour_configure + """.stripMargin + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/KafkaTest.scala 
b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/KafkaTest.scala new file mode 100644 index 0000000..58f3ceb --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/KafkaTest.scala @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.streaming + +import com.zto.fire._ +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.spark.BaseSparkStreaming + +/** + * kafka json解析 + * + * @author ChengLong 2019-6-26 16:52:58 + */ +object KafkaTest extends BaseSparkStreaming { + + // 每天凌晨4点01将锁标志设置为false,这样下一个批次就可以先更新维表再执行sql + @Scheduled(cron = "0 1 4 * * ?") + def updateTableJob: Unit = this.lock.compareAndSet(true, false) + + // 用于缓存变更过的维表,只有当定时任务将标记设置为可更新时才会真正拉取最新的表 + def cacheTable: Unit = { + // 加载完成维表以后上锁 + if (this.lock.compareAndSet(false, true)) { + // cache维表逻辑 + } + } + + override def process: Unit = { + val dstream = this.fire.createKafkaDirectStream() + dstream.foreachRDD(rdd => { + // 更新并缓存维表动作,具体要根据锁的标记判断是否执行 + this.cacheTable + + // 一、将json解析并注册为临时表,默认不cache临时表 + rdd.kafkaJson2Table("test", cacheTable = true) + // toLowerDF表示将大写的字段转为小写 + this.fire.sql("select * from test").toLowerDF.show(1, false) + /*this.fire.sql("select after.* from test").toLowerDF.show(1, false) + this.fire.sql("select after.* from test where after.order_type=1").toLowerDF.show(1, false)*/ + + // 二、直接将json按指定的schema解析(只解析after),fieldNameUpper=true表示按大写方式解析,并自动转为小写 + // rdd.kafkaJson2DF(classOf[OrderCommon], fieldNameUpper = true).show(1, false) + // 递归解析所有指定的字段,包括before、table、offset等字段 + // rdd.kafkaJson2DF(classOf[OrderCommon], parseAll = true, fieldNameUpper = true, isMySQL = false).show(1, false) + + this.fire.uncache("test") + }) + + val dstream2 = this.fire.createKafkaDirectStream(keyNum = 2) + dstream2.print(1) + val dstream3 = this.fire.createKafkaDirectStream(keyNum = 3) + dstream3.count().foreachRDD(rdd => { + println("count=" + rdd.count()) + }) + dstream3.print(1) + val dstream5 = this.fire.createKafkaDirectStream(keyNum = 5) + dstream5.print(1) + + this.fire.start + } + + @Scheduled(fixedInterval = 60 * 1000, scope = "all") + def loadTable: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}=================== 每分钟执行loadTable ===================") + this.conf.settingsMap.foreach(conf => println(conf._1 + " -> " + conf._2)) + } + + @Scheduled(cron = "0 0 * * * ?") + def loadTable2: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}=================== 每小时执行loadTable2 ===================") + } + + @Scheduled(cron = "0 0 9 * * ?") + def loadTable3: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}=================== 每天9点执行loadTable3 
===================") + } + + + def main(args: Array[String]): Unit = { + this.init(10, false) + this.stop + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/LoadTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/LoadTest.scala new file mode 100644 index 0000000..3b54875 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/LoadTest.scala @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.streaming + +import com.zto.fire._ +import com.zto.fire.common.anno.Scheduled +import com.zto.fire.examples.spark.sql.LoadTestSQL +import com.zto.fire.spark.BaseSparkStreaming + +/** + * kafka json解析 + * @author ChengLong 2019-6-26 16:52:58 + */ +object LoadTest extends BaseSparkStreaming { + + /** + * 缓存维表 + */ + def loadNewConfigTable: Unit ={ + spark.sql(LoadTestSQL.cacheDim).cache().createOrReplaceTempView("dim_c2c_cost") + } + + /** + * 重复压测 + */ + @Scheduled(fixedInterval = 1000 * 60 * 1, concurrent = false, initialDelay = 1000 * 30) + def reload(): Unit = { + this.fire.sql(LoadTestSQL.loadSQL).show(10, false) + } + + /** + * Streaming的处理过程强烈建议放到process中,保持风格统一 + * 注:此方法会被自动调用,在以下两种情况下,必须将逻辑写在process中 + * 1. 开启checkpoint + * 2. 支持streaming热重启(可在不关闭streaming任务的前提下修改batch时间) + */ + override def process: Unit = { + this.loadNewConfigTable + + val dstream = this.fire.createKafkaDirectStream() + dstream.foreachRDD(rdd => { + if (rdd.isNotEmpty) { + // 一、将json解析并注册为临时表,默认不cache临时表 + rdd.kafkaJson2Table("test", cacheTable = true) + // toLowerDF表示将大写的字段转为小写 + this.fire.sql("select after.* from test").toLowerDF.show(1, false) + this.fire.sql(LoadTestSQL.jsonParseSQL) + } + }) + + this.fire.start + } + + def main(args: Array[String]): Unit = { + this.init(10, false) + this.stop + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/RocketTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/RocketTest.scala new file mode 100644 index 0000000..3b6f8ac --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/streaming/RocketTest.scala @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
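The KafkaTest and LoadTest examples above describe, in comments, a pattern where a daily @Scheduled job flips a flag so that the next micro-batch refreshes a cached dimension table. A minimal standalone sketch of that flag handshake follows; the AtomicBoolean stands in for the lock field those comments refer to, and the refresh body is left abstract:

import java.util.concurrent.atomic.AtomicBoolean

// starts "stale" so the very first batch performs a load
val lock = new AtomicBoolean(false)

// scheduled job (for example at 04:01 every day): mark the cached dimension table as stale
def markStale(): Unit = lock.compareAndSet(true, false)

// called at the start of each batch: refresh at most once until the flag is flipped again
def refreshIfStale(refresh: () => Unit): Unit = if (lock.compareAndSet(false, true)) refresh()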
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.streaming + +import com.zto.fire._ +import com.zto.fire.spark.BaseSparkStreaming + +/** + * 消费rocketmq中的数据 + */ +object RocketTest extends BaseSparkStreaming { + override def process: Unit = { + //读取RocketMQ消息流 + val dStream = this.fire.createRocketMqPullStream() + dStream.foreachRDD(rdd => { + if (!rdd.isEmpty()) { + val source = rdd.map(msgExt => new String(msgExt.getBody).replace("messageBody", "")) + import fire.implicits._ + this.fire.read.json(source.toDS()).createOrReplaceTempView("tmp_scanrecord") + this.fire.sql( + """ + |select * + |from tmp_scanrecord + |""".stripMargin).show(10,false) + } + }) + + dStream.rocketCommitOffsets + this.fire.start() + } + + def main(args: Array[String]): Unit = { + this.init(10, false) + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/JdbcSinkTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/JdbcSinkTest.scala new file mode 100644 index 0000000..b073387 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/JdbcSinkTest.scala @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
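A hedged variation on the RocketTest example above: if the stream elements expose the message tag the way they expose getBody (MessageExt carries both in the RocketMQ client), messages can be filtered before JSON parsing. The tag value scan_record is invented purely for illustration:

dStream.foreachRDD(rdd => {
  if (!rdd.isEmpty()) {
    val source = rdd
      .filter(msgExt => "scan_record".equals(msgExt.getTags)) // hypothetical tag filter
      .map(msgExt => new String(msgExt.getBody).replace("messageBody", ""))
    import fire.implicits._
    this.fire.read.json(source.toDS()).createOrReplaceTempView("tmp_scanrecord")
  }
})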
+ */ + +package com.zto.fire.examples.spark.structured + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.spark.BaseStructuredStreaming + +/** + * 结构化流测试 + */ +object JdbcSinkTest extends BaseStructuredStreaming { + + override def process: Unit = { + // 接入kafka并解析json,支持大小写,默认表名为kafka + val kafkaDataset = this.fire.loadKafkaParseJson() + // 直接使用或sql + /*kafkaDataset.print() + this.fire.sql("select * from kafka").print()*/ + + // jdbc的sql语句 + val insertSql = "insert into spark_test(name, age, createTime, length, sex, rowKey) values(?,?,?,?,?,?)" + + // 将流数据持续写入到关系型数据库中(插入部分列) + kafkaDataset.select("data.name", "data.age", "data.createTime", "data.length", "data.sex", "data.rowKey").jdbcBatchUpdate(insertSql, keyNum = 6) + // 插入所有列并在Seq中列举DataFrame指定顺序,该顺序必须与insertSql中的问号占位符存在绑定关系 + kafkaDataset.select("data.*").jdbcBatchUpdate(insertSql, Seq("name", "age", "createTime", "length", "sex", "rowKey"), keyNum = 6) + + this.fire.createDataFrame(Student.newStudentList(), classOf[Student]).createOrReplaceTempViewCache("student") + this.fire.sql( + """ + |select + | t.name, + | s.length + |from kafka t left join student s + | on t.name=s.name + |""".stripMargin).print() + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/MapTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/MapTest.scala new file mode 100644 index 0000000..78d9cdf --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/MapTest.scala @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.examples.spark.structured + +import com.zto.fire._ +import com.zto.fire.examples.bean.Student +import com.zto.fire.spark.BaseStructuredStreaming +import org.apache.spark.sql.Encoders +import com.zto.fire.spark.util.SparkUtils + +/** + * Performs map and mapPartitions operations on a structured stream + * @author ChengLong 2020-01-03 18:00:59 + */ +object MapTest extends BaseStructuredStreaming { + + override def process: Unit = { + this.fire.loadKafkaParseJson() + + // Cast the fields to the types expected by the corresponding JavaBean + val sqlDF = this.fire.sql("select cast(age as int), createTime, cast(length as decimal), name, rowKey, cast(sex as boolean) from kafka") + + // Run a map operation + sqlDF.map(row => { + // Arbitrary logic can be executed here + println("=========hello===========") + // Convert the Row into a JavaBean + SparkUtils.sparkRowToBean(row, classOf[Student]) + // An Encoder must be supplied, and its type is the result type of the map; the target must be a type with a schema (such as a JavaBean). Mapping to plain primitive values is not supported + })(Encoders.bean(classOf[Student])).print() + + // mapPartitions operation + sqlDF.mapPartitions(it => SparkUtils.sparkRowToBean(it, classOf[Student]))(Encoders.bean(classOf[Student])).print() + } + + def main(args: Array[String]): Unit = { + this.init() + } +} diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/StructuredStreamingTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/StructuredStreamingTest.scala new file mode 100644 index 0000000..a4904f5 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/structured/StructuredStreamingTest.scala @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
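A hedged alternative to the map-based conversion in MapTest above: when no per-row side effects are needed, the Row-to-bean conversion can also be expressed with an explicit bean encoder via Dataset.as. This is only a sketch and assumes the selected columns line up with the Student bean exactly as they do in the example:

val studentDS = sqlDF.as(Encoders.bean(classOf[Student]))
studentDS.print()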
+ */ + +package com.zto.fire.examples.spark.structured + +import com.zto.fire._ +import com.zto.fire.spark.BaseStructuredStreaming + +/** + * 使用fire进行structured streaming开发的demo + * + * @author ChengLong 2019年12月23日 22:16:59 + */ +object StructuredStreamingTest extends BaseStructuredStreaming { + + /** + * structured streaming处理逻辑 + */ + override def process: Unit = { + // 接入kafka消息,并将消息解析为DataFrame,同时注册临时表,表名默认为kafka,也可传参手动指定表名 + val kafkaDataset = this.fire.loadKafkaParseJson() + // 进行sql查询,支持嵌套的json,并且支持大小写的json + this.fire.sql("select table, after.bill_code, after.scan_site from kafka").print() + // 使用api的方式进行查询操作 + kafkaDataset.select("after.PDA_CODE", "after.bill_code").print(numRows = 1, truncate = false) + } + + def main(args: Array[String]): Unit = { + this.init() + } +} \ No newline at end of file diff --git a/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/thread/ThreadTest.scala b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/thread/ThreadTest.scala new file mode 100644 index 0000000..fde1277 --- /dev/null +++ b/fire-examples/spark-examples/src/main/scala/com/zto/fire/examples/spark/thread/ThreadTest.scala @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.zto.fire.examples.spark.thread + +import com.zto.fire._ +import com.zto.fire.common.util.DateFormatUtils +import com.zto.fire.spark.BaseSparkStreaming + +/** + * 在driver中启用线程池的示例 + * 1. 开启子线程执行一个任务 + * 2. 开启子线程执行周期性任务 + */ +object ThreadTest extends BaseSparkStreaming { + + def main(args: Array[String]): Unit = { + // 第二个参数为true表示开启checkPoint机制 + this.init(10L, false) + } + + /** + * Streaming的处理过程强烈建议放到process中,保持风格统一 + * 注:此方法会被自动调用,在以下两种情况下,必须将逻辑写在process中 + * 1. 开启checkpoint + * 2. 
支持streaming热重启(可在不关闭streaming任务的前提下修改batch时间) + */ + override def process: Unit = { + // 第一次执行时延迟两分钟,每隔1分钟执行一次showSchema函数 + this.runAsSchedule(this.showSchema, 1, 1) + // 以子线程方式执行print方法中的逻辑 + this.runAsThread(this.print) + + val dstream = this.fire.createKafkaDirectStream() + dstream.foreachRDD(rdd => { + println("count--> " + rdd.count()) + }) + + this.fire.start + } + + /** + * 以子线程方式执行一次 + */ + def print: Unit = { + println("==========子线程执行===========") + } + + /** + * 查看表结构信息 + */ + def showSchema: Unit = { + println(s"${DateFormatUtils.formatCurrentDateTime()}--------------> atFixRate <----------------") + this.fire.sql("use tmp") + this.fire.sql("show tables").show(false) + } +} diff --git a/fire-metrics/pom.xml b/fire-metrics/pom.xml new file mode 100644 index 0000000..34db232 --- /dev/null +++ b/fire-metrics/pom.xml @@ -0,0 +1,88 @@ + + + + + 4.0.0 + fire-metrics_${scala.binary.version} + fire-metrics + + + com.zto.fire + fire-parent_2.12 + 2.0.0-SNAPSHOT + ../pom.xml + + + + 3.1.5 + 4.7.1 + + + + + io.dropwizard.metrics + metrics-core + ${codahale.metrics.version} + + + io.dropwizard.metrics + metrics-jvm + ${codahale.metrics.version} + + + io.dropwizard.metrics + metrics-json + ${codahale.metrics.version} + + + io.dropwizard.metrics + metrics-ganglia + ${codahale.metrics.version} + + + io.dropwizard.metrics + metrics-graphite + ${codahale.metrics.version} + + + org.antlr + antlr4-runtime + ${antlr.version} + + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + + src/main/resources + true + + + + diff --git a/fire-metrics/src/main/java/com/zto/fire/metrics/MetricsTest.scala b/fire-metrics/src/main/java/com/zto/fire/metrics/MetricsTest.scala new file mode 100644 index 0000000..90a07e6 --- /dev/null +++ b/fire-metrics/src/main/java/com/zto/fire/metrics/MetricsTest.scala @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.metrics + +import java.util.Random +import java.util.concurrent.TimeUnit + +import com.codahale.metrics.jvm.{FileDescriptorRatioGauge, GarbageCollectorMetricSet, MemoryUsageGaugeSet, ThreadStatesGaugeSet} +import com.codahale.metrics.{ConsoleReporter, MetricRegistry, Slf4jReporter} + +/** + * Metrics module test + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-17 10:11 + */ +class MetricsTest { + val metrics = new MetricRegistry() + + // @Test + def testMeter: Unit = { + val reporter = ConsoleReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).build + reporter.start(1, TimeUnit.SECONDS) + + val requests = metrics.meter("requests") + (1 to 100).foreach(i => { + requests.mark() + Thread.sleep(10) + }) + Thread.sleep(1000) + } + + // @Test + def testHistogram: Unit = { + val reporter = ConsoleReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).build + reporter.start(1, TimeUnit.SECONDS) + + val reporter2 = Slf4jReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).withLoggingLevel(Slf4jReporter.LoggingLevel.ERROR).build + reporter2.start(1, TimeUnit.SECONDS) + + val resultCounts = metrics.histogram(MetricRegistry.name(classOf[MetricsTest], "result-counts")) + val random = new Random() + (1 to 1000).foreach(i => { + resultCounts.update(random.nextInt(100)) + Thread.sleep(10) + }) + Thread.sleep(1000) + } + + // @Test + def testJvm: Unit = { + val reporter2 = ConsoleReporter.forRegistry(metrics) + .convertRatesTo(TimeUnit.SECONDS) + .convertDurationsTo(TimeUnit.MILLISECONDS) + .build + reporter2.start(3, TimeUnit.SECONDS) + val reporter = Slf4jReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).withLoggingLevel(Slf4jReporter.LoggingLevel.ERROR).build + reporter.start(5, TimeUnit.SECONDS) + + metrics.register("jvm.gc", new GarbageCollectorMetricSet()) + metrics.register("jvm.memory", new MemoryUsageGaugeSet()) + metrics.register("jvm.thread-states", new ThreadStatesGaugeSet()) + metrics.register("jvm.fd.usage", new FileDescriptorRatioGauge()) + + Thread.sleep(100000) + } +} diff --git a/fire-metrics/src/test/scala/com.zto.fire.metrics/MetricsTest.scala b/fire-metrics/src/test/scala/com.zto.fire.metrics/MetricsTest.scala new file mode 100644 index 0000000..90bb191 --- /dev/null +++ b/fire-metrics/src/test/scala/com.zto.fire.metrics/MetricsTest.scala @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.zto.fire.metrics + +import java.util.Random +import java.util.concurrent.TimeUnit + +import com.codahale.metrics.jvm.{FileDescriptorRatioGauge, GarbageCollectorMetricSet, MemoryUsageGaugeSet, ThreadStatesGaugeSet} +import com.codahale.metrics.{ConsoleReporter, MetricRegistry, Slf4jReporter} +import org.antlr.v4.runtime.tree.ParseTreeWalker +import org.antlr.v4.runtime.{CharStreams, CommonTokenStream} +import org.junit.Test + +/** + * Metrics module test + * + * @author ChengLong + * @since 2.0.0 + * @create 2020-12-17 10:11 + */ +class MetricsTest { + val metrics = new MetricRegistry() + + // @Test + def testMeter: Unit = { + val reporter = ConsoleReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).build + reporter.start(1, TimeUnit.SECONDS) + + val requests = metrics.meter("requests") + (1 to 100).foreach(i => { + requests.mark() + Thread.sleep(10) + }) + Thread.sleep(1000) + } + + // @Test + def testHistogram: Unit = { + val reporter = ConsoleReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).build + reporter.start(1, TimeUnit.SECONDS) + + val reporter2 = Slf4jReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).withLoggingLevel(Slf4jReporter.LoggingLevel.ERROR).build + reporter2.start(1, TimeUnit.SECONDS) + + val resultCounts = metrics.histogram(MetricRegistry.name(classOf[MetricsTest], "result-counts")) + val random = new Random() + (1 to 1000).foreach(i => { + resultCounts.update(random.nextInt(100)) + Thread.sleep(10) + }) + Thread.sleep(1000) + } + + // @Test + def testJvm: Unit = { + val reporter2 = ConsoleReporter.forRegistry(metrics) + .convertRatesTo(TimeUnit.SECONDS) + .convertDurationsTo(TimeUnit.MILLISECONDS) + .build + reporter2.start(3, TimeUnit.SECONDS) + val reporter = Slf4jReporter.forRegistry(metrics).convertRatesTo(TimeUnit.SECONDS).convertDurationsTo(TimeUnit.MILLISECONDS).withLoggingLevel(Slf4jReporter.LoggingLevel.ERROR).build + reporter.start(5, TimeUnit.SECONDS) + + metrics.register("jvm.gc", new GarbageCollectorMetricSet()) + metrics.register("jvm.memory", new MemoryUsageGaugeSet()) + metrics.register("jvm.thread-states", new ThreadStatesGaugeSet()) + metrics.register("jvm.fd.usage", new FileDescriptorRatioGauge()) + + Thread.sleep(100000) + } + + /*@Test + def testAntlr: Unit = { + val input = CharStreams.fromString( + """ + |a=(1+2+3)*10/5 + |a + |""".stripMargin) + val lexer = new HelloLexer(input) + val tokens = new CommonTokenStream(lexer) + val parser = new HelloParser(tokens) + val tree = parser.prog() + val visitor = new HelloMyVisitor() + visitor.visit(tree) + } + + @Test + def testArrayInit: Unit = { + val input = CharStreams.fromString("{1,2,{3}}") + val lexer = new ArrayInitLexer(input) + val tokens = new CommonTokenStream(lexer) + val parser = new ArrayInitParser(tokens) + val tree = parser.init() + val walker = new ParseTreeWalker + walker.walk(new MyArrayInitListener(), tree) + println() + }*/ + +} diff --git a/pom.xml b/pom.xml new file mode 100644 index 0000000..59b1490 --- /dev/null +++ b/pom.xml @@ -0,0 +1,342 @@ + + 4.0.0 + com.zto.fire + fire-parent_2.12 + pom + 2.0.0-SNAPSHOT + fire-parent + + + ${project.version} + 3.0.2 + 3 + 1.12.2 + 1.12 + 0.9.0-SNAPSHOT + 2.12 + 12 + provided + 1.4.0 + 0.11.0.2 + 2.8.0 + 2.6.0-cdh5.12.1 + 1.1.0 + 1.1.0-cdh5.12.1 + org.apache.hive + 1.2.0-cdh5.12.1 + 2.5.30 + 2.10.5 + 4.8.0 + 0.0.2 + 5.1.30 + 18.0 + UTF-8 + 
${scala.binary.version}.${scala.minor.version} + ${spark.version}_${scala.binary.version} + ${flink.version}_${scala.binary.version} + + + + fire-common + fire-core + fire-metrics + fire-examples + fire-connectors + fire-engines + + + + + + org.scala-lang + scala-library + ${scala.version} + + + com.google.guava + guava + ${guava.version} + ${maven.scope} + + + junit + junit + 4.12 + test + + + com.fasterxml.jackson.core + jackson-databind + ${jackson.version} + ${maven.scope} + + + commons-io + commons-io + 2.4 + ${maven.scope} + + + org.apache.commons + commons-lang3 + 3.5 + ${maven.scope} + + + log4j + log4j + 1.2.17 + ${maven.scope} + + + com.esotericsoftware + kryo + 4.0.0 + ${maven.scope} + + + com.sparkjava + spark-core + ${sparkjava.version} + + + org.quartz-scheduler + quartz + 2.3.1 + + + com.github.oshi + oshi-core + 3.12.2 + ${maven.scope} + + + + + + zto + http://maven.dev.ztosys.com/nexus/content/groups/public/ + + true + + + true + + + + aliyun + https://maven.aliyun.com/repository/central + + true + + + true + + + + central + https://mirrors.huaweicloud.com/repository/maven/ + + true + + + true + + + + + + + zto + http://maven.dev.ztosys.com/nexus/content/groups/public/ + + true + + + true + + + + aliyun + https://maven.aliyun.com/repository/central + + true + + + true + + + + central + https://mirrors.huaweicloud.com/repository/maven/ + + true + + + true + + + + + + + zto-releases + http://maven.dev.ztosys.com/nexus/content/repositories/releases + + + zto-snapshots + http://maven.dev.ztosys.com/nexus/content/repositories/snapshots + + + + + + + + true + org.apache.maven.plugins + maven-compiler-plugin + + 1.8 + 1.8 + + + + + org.scala-tools + maven-scala-plugin + 2.15.2 + + + + scala-compile-first + process-resources + + compile + + + + + + scala-test-compile + process-test-resources + + testCompile + + + + + + + org.codehaus.mojo + build-helper-maven-plugin + + + + add-source + generate-sources + + add-source + + + + src/main/scala + + + + + + + add-test-source + generate-test-sources + + add-test-source + + + + src/test/scala + + + + + + + + + org.apache.maven.plugins + maven-eclipse-plugin + 2.10 + + true + true + + org.scala-ide.sdt.core.scalanature + org.eclipse.jdt.core.javanature + + + org.scala-ide.sdt.core.scalabuilder + + + org.scala-ide.sdt.launching.SCALA_CONTAINER + org.eclipse.jdt.launching.JRE_CONTAINER + + + + org.scala-lang:scala-library + org.scala-lang:scala-compiler + + + **/*.scala + **/*.java + + + + + + + org.apache.maven.plugins + maven-surefire-plugin + 2.19.1 + + + **/*.java + **/*.scala + + + + + + org.apache.maven.plugins + maven-shade-plugin + 2.4.2 + + + package + + shade + + + + + + + *:* + + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + + + zto-${project.artifactId}-${project.version} + + + + +
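
Editor's note: the MetricsTest classes in this patch exercise the Dropwizard (Codahale) Metrics API that the new fire-metrics module pulls in through metrics-core. The standalone sketch below is not part of the patch; it only illustrates the same registry/reporter pattern with a counter and a timer, assuming metrics-core is on the classpath. The package, object, and metric names are illustrative, not part of the fire codebase.

// Minimal Dropwizard Metrics sketch (hypothetical names, not from the patch):
// one registry, one console reporter, a counter and a timer.
package com.zto.fire.metrics.sketch

import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

object MetricsSketch {
  def main(args: Array[String]): Unit = {
    val registry = new MetricRegistry()
    // Console reporter: rates per second, durations in milliseconds
    val reporter = ConsoleReporter.forRegistry(registry)
      .convertRatesTo(TimeUnit.SECONDS)
      .convertDurationsTo(TimeUnit.MILLISECONDS)
      .build()

    val requests = registry.counter("requests")
    val latency = registry.timer(MetricRegistry.name(getClass, "latency"))

    (1 to 50).foreach { _ =>
      val ctx = latency.time() // start timing one simulated request
      try {
        requests.inc()
        Thread.sleep(5)        // stand-in for real work
      } finally ctx.stop()     // record the elapsed time
    }

    reporter.report()          // dump a one-off snapshot to stdout
    reporter.stop()
  }
}

The same registry could instead be reported through the Slf4jReporter or the Ganglia/Graphite reporters that the fire-metrics pom declares; only the builder and start() call change.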