DorisWriter supports writing large batches of data into Doris.
DorisWriter imports data through Doris's native Stream Load mechanism: it buffers the rows read by the reader in memory, assembles them into a text payload (CSV by default, or JSON), and then imports them into Doris in batches.
Below is a sample job configuration that reads data from MySQL and writes it into Doris.
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["emp_no", "birth_date", "first_name","last_name","gender","hire_date"],
"connection": [
{
"jdbcUrl": ["jdbc:mysql://localhost:3306/demo"],
"table": ["employees_1"]
}
],
"username": "root",
"password": "xxxxx",
"where": ""
}
},
"writer": {
"name": "doriswriter",
"parameter": {
"loadUrl": ["172.16.0.13:8030"],
"column": ["emp_no", "birth_date", "first_name","last_name","gender","hire_date"],
"username": "root",
"password": "xxxxxx",
"postSql": ["select count(1) from all_employees_info"],
"preSql": [],
"flushInterval":30000,
"connection": [
{
"jdbcUrl": "jdbc:mysql://172.16.0.13:9030/demo",
"selectedDatabase": "demo",
"table": ["all_employees_info"]
}
],
"loadProps": {
"format": "json",
"strip_outer_array": true
}
}
}
}
],
"setting": {
"speed": {
"channel": "1"
}
}
}
}
Writer parameters:

- jdbcUrl: JDBC connection information of the Doris database (the FE MySQL-protocol port), used to execute preSql and postSql.
- loadUrl: Stream Load address of the Doris FE, in the form fe_ip:fe_http_port. Multiple addresses can be listed, separated by ";"; doriswriter accesses them in a round-robin fashion.
- username: username for the Doris database.
- password: password for the Doris database.
- connection.selectedDatabase: name of the Doris database to write into.
- connection.table: name of the Doris table to write into.
- column: the columns of the destination table that the data will be written to.
- preSql: SQL statements to execute before the data is written.
- postSql: SQL statements to execute after the write finishes.
- maxBatchRows: maximum number of rows per Stream Load batch.
- batchSize: maximum data volume per Stream Load batch.
- maxRetries: number of retries after a failed batch import.
- labelPrefix: prefix of the Stream Load label of each batch. The final label is labelPrefix + UUID, which forms a globally unique label and ensures that data is not imported twice. Default: datax_doris_writer_
- loadProps: request properties passed to Stream Load, such as the import format (format). CSV is used by default; JSON is also supported. See the type conversion section below, or the official Stream Load documentation. Required: no. Default: none.

An illustrative writer block combining several of these optional parameters is shown below.
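This sketch reuses the connection, table, and column settings from the example job at the top of this page; the values chosen for maxBatchRows, batchSize, maxRetries, labelPrefix, and the separators are assumed for illustration only and are not recommended defaults:

"writer": {
    "name": "doriswriter",
    "parameter": {
        "loadUrl": ["172.16.0.13:8030"],
        "username": "root",
        "password": "xxxxxx",
        "column": ["emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date"],
        "preSql": [],
        "postSql": ["select count(1) from all_employees_info"],
        "maxBatchRows": 100000,
        "batchSize": 104857600,
        "maxRetries": 3,
        "labelPrefix": "datax_doris_writer_",
        "flushInterval": 30000,
        "connection": [
            {
                "jdbcUrl": "jdbc:mysql://172.16.0.13:9030/demo",
                "selectedDatabase": "demo",
                "table": ["all_employees_info"]
            }
        ],
        "loadProps": {
            "format": "csv",
            "column_separator": "\\x01",
            "line_delimiter": "\\x02"
        }
    }
}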
By default, all incoming data is converted to strings, with \t as the column separator and \n as the row separator, forming a CSV payload for the Stream Load import.
The default import format is CSV. To change the column and row separators, configure loadProps accordingly:
"loadProps": {
"column_separator": "\\x01",
"line_delimiter": "\\x02"
}
To change the import format to JSON, configure loadProps accordingly:
"loadProps": {
"format": "json",
"strip_outer_array": true
}
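With "format": "json" and "strip_outer_array": true, Doris treats the imported payload as a JSON array in which each element is one row. A hypothetical two-row batch for the example table above would therefore look roughly like this (values are made up for illustration):

[
    {"emp_no": 1, "birth_date": "1990-01-01", "first_name": "Jane", "last_name": "Doe", "gender": "F", "hire_date": "2015-07-01"},
    {"emp_no": 2, "birth_date": "1988-05-12", "first_name": "John", "last_name": "Smith", "gender": "M", "hire_date": "2016-03-15"}
]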
For more information, refer to the official Doris documentation: Stream load - Apache Doris.