The GDBWriter plugin writes data into a GDB instance. GDBWriter connects to the remote GDB instance through a Gremlin Client, takes the data produced by a Reader, generates write DSL statements, and writes the data into GDB.

GDBWriter obtains the protocol data generated by the Reader through the DataX framework and writes it to the GDB instance with statements of the form `g.addV/E(GDB___label).property(id, GDB___id).property(GDB___PK1, GDB___PV1)...`.

The Gremlin Client can be configured to run in session mode, in which the client controls the transaction and multiple records are written in one transaction as a batch.
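For intuition, the sketch below (not the plugin's actual code) uses the Apache TinkerPop Gremlin Java driver to submit one parameterized `addV` statement over a session-bound client, mirroring the statement pattern above. The endpoint, credentials, session handling, and sample property values are placeholder assumptions.

```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class GdbSessionWriteSketch {
    public static void main(String[] args) {
        // Placeholder endpoint and credentials, matching the sample configs below.
        Cluster cluster = Cluster.build()
                .addContactPoint("gdb-endpoint")
                .port(8182)
                .credentials("root", "***")
                .create();

        // Session mode: scripts submitted through this client share one session,
        // so the client side can batch several records into one transaction.
        Client client = cluster.connect(UUID.randomUUID().toString());
        try {
            // Parameterized DSL of the same shape GDBWriter generates.
            String dsl = "g.addV(GDB___label)"
                       + ".property(id, GDB___id)"
                       + ".property(GDB___PK1, GDB___PV1)";

            Map<String, Object> bindings = new HashMap<>();
            bindings.put("GDB___label", "person");   // assumed sample values
            bindings.put("GDB___id", "person-1");
            bindings.put("GDB___PK1", "name");
            bindings.put("GDB___PV1", "tom");

            // Block until the write completes; GDBWriter batches many such
            // statements per transaction when session mode is enabled.
            client.submit(dsl, bindings).all().join();
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```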
Because vertices and edges are configured differently in GDB, vertex imports and edge imports require separate configurations.

Below is a configuration that generates vertex data in memory and imports it into a GDB instance:
```json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column" : [
{
"random": "1,100",
"type": "double"
},
{
"random": "1000,1200",
"type": "long"
},
{
"random": "60,64",
"type": "string"
},
{
"random": "100,1000",
"type": "long"
},
{
"random": "32,48",
"type": "string"
}
],
"sliceRecordCount": 1000
}
},
"writer": {
"name": "gdbwriter",
"parameter": {
"host": "gdb-endpoint",
"port": 8182,
"username": "root",
"password": "***",
"writeMode": "INSERT",
"labelType": "VERTEX",
"label": "#{1}",
"idTransRule": "none",
"session": true,
"maxRecordsInBatch": 64,
"column": [
{
"name": "id",
"value": "#{0}",
"type": "string",
"columnType": "primaryKey"
},
{
"name": "vertex_propKey",
"value": "#{2}",
"type": "string",
"columnType": "vertexSetProperty"
},
{
"name": "vertex_propKey",
"value": "#{3}",
"type": "long",
"columnType": "vertexSetProperty"
},
{
"name": "vertex_propKey2",
"value": "#{4}",
"type": "string",
"columnType": "vertexProperty"
}
]
}
}
}
]
}
}
```
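For intuition only (not the plugin's exact output), one record produced by the streamreader above could expand to a statement along the following lines. The inlined values are hypothetical, and the `set` cardinality token for `vertexSetProperty` is an assumption; the real writer passes values as bindings, as in the DSL pattern above.

```java
public class VertexRecordDslSketch {
    // Hypothetical record: #{0}=12.3, #{1}=1001, #{2}="abc...", #{3}=500, #{4}="xyz..."
    //   label      <- #{1}
    //   primaryKey <- #{0}
    //   two vertexSetProperty values share the name "vertex_propKey"
    //   one vertexProperty is written as a plain property
    // NOTE: the 'set' cardinality token is an assumption about how SET properties
    // are expressed in the generated DSL.
    static final String DSL =
              "g.addV('1001')"
            + ".property(id, '12.3')"
            + ".property(set, 'vertex_propKey', 'abc...')"   // SET property, string value
            + ".property(set, 'vertex_propKey', 500)"        // SET property, long value
            + ".property('vertex_propKey2', 'xyz...')";      // plain property
}
```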
Below is a configuration that generates edge data in memory and imports it into a GDB instance.

Note: the edge import below requires that the vertices already exist in the GDB instance. There must be vertices with ids `person-{i}` and `book-{i}`, where i ranges from 0 to 100; a sketch for pre-creating them follows the configuration.
```json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "streamreader",
"parameter": {
"column" : [
{
"random": "100,200",
"type": "double"
},
{
"random": "1,100",
"type": "long"
},
{
"random": "1,100",
"type": "long"
},
{
"random": "2000,2200",
"type": "long"
},
{
"random": "60,64",
"type": "string"
}
],
"sliceRecordCount": 1000
}
},
"writer": {
"name": "gdbwriter",
"parameter": {
"host": "gdb-endpoint",
"port": 8182,
"username": "root",
"password": "***",
"writeMode": "INSERT",
"labelType": "EDGE",
"label": "#{3}",
"idTransRule": "none",
"srcIdTransRule": "labelPrefix",
"dstIdTransRule": "labelPrefix",
"srcLabel":"person-",
"dstLabel":"book-",
"session":false,
"column": [
{
"name": "id",
"value": "#{0}",
"type": "string",
"columnType": "primaryKey"
},
{
"name": "id",
"value": "#{1}",
"type": "string",
"columnType": "srcPrimaryKey"
},
{
"name": "id",
"value": "#{2}",
"type": "string",
"columnType": "dstPrimaryKey"
},
{
"name": "edge_propKey",
"value": "#{4}",
"type": "string",
"columnType": "edgeProperty"
}
]
}
}
}
]
}
}
```
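As noted above, the source and destination vertices must exist before this edge job runs. One rough way to pre-create them is sketched below using the TinkerPop Gremlin Java driver; the vertex labels `person` and `book`, the endpoint, and the credentials are assumptions.

```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

import java.util.HashMap;
import java.util.Map;

public class PrepareEdgeEndpointsSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.build()
                .addContactPoint("gdb-endpoint")   // placeholder endpoint
                .port(8182)
                .credentials("root", "***")
                .create();
        Client client = cluster.connect();
        try {
            String dsl = "g.addV(vLabel).property(id, vId)";
            for (int i = 0; i <= 100; i++) {
                for (String label : new String[]{"person", "book"}) {  // assumed labels
                    Map<String, Object> bindings = new HashMap<>();
                    bindings.put("vLabel", label);
                    bindings.put("vId", label + "-" + i);  // ids person-{i} / book-{i}
                    client.submit(dsl, bindings).all().join();
                }
            }
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```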
The main writer parameters are:

- host: connection address of the GDB instance.
- port: port of the GDB instance (8182 in the examples above).
- username: account used to connect to the GDB instance.
- password: password of that account.
- label: label of the vertices or edges to write; a `#{N}` placeholder takes the value from the N-th Reader column, as in the examples.
- labelType: whether the records are vertices or edges (`VERTEX` or `EDGE`).
- srcLabel: label of an edge's source vertex (used when labelType is `EDGE`).
- dstLabel: label of an edge's destination vertex (used when labelType is `EDGE`).
- writeMode: write mode (`INSERT` in the examples above).
- idTransRule: transformation rule applied to the record id (`none` in the examples).
- srcIdTransRule: transformation rule applied to an edge's source-vertex id; with `labelPrefix`, srcLabel is prepended to the id value, as in the edge example.
- dstIdTransRule: transformation rule applied to an edge's destination-vertex id; with `labelPrefix`, dstLabel is prepended to the id value.
- session: whether to write data in the Gremlin Client session mode.
- maxRecordsInBatch: when the Gremlin Client session mode is used, the number of records handled in one transaction.
- column: describes how Reader columns are mapped to ids and properties.
- column -> name: property name; primary-key columns use `id` in the examples.
- column -> value: column value; `#{N}` references the N-th Reader column.
- column -> type: data type of the column (string, long, double, ...).
- column -> columnType: role of the column; the examples use `primaryKey`, `srcPrimaryKey`, `dstPrimaryKey`, `vertexProperty`, `vertexSetProperty`, and `edgeProperty`.
A column value in json format can carry multiple properties at once:

```json
{"properties":[
  {"k":"name","t":"string","v":"tom"},
  {"k":"age","t":"int","v":"20"},
  {"k":"sex","t":"string","v":"male"}
]}
```

The json format also supports adding SET properties to a vertex:

```json
{"properties":[
  {"k":"name","t":"string","v":"tom","c":"set"},
  {"k":"name","t":"string","v":"jack","c":"set"},
  {"k":"age","t":"int","v":"20"},
  {"k":"sex","t":"string","v":"male"}
]}
```
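Assuming a column's value carries the `properties` payload above, the sketch below shows one way such a value could be unpacked into chained `.property(...)` steps. This is not the plugin's actual implementation; fastjson is used only because DataX already depends on it, and the `set` cardinality token is an assumption.

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class JsonPropertiesSketch {
    /** Turn the {"properties":[...]} payload into chained ".property(...)" steps. */
    static String toPropertySteps(String jsonValue) {
        JSONArray props = JSON.parseObject(jsonValue).getJSONArray("properties");
        StringBuilder dsl = new StringBuilder();
        for (int i = 0; i < props.size(); i++) {
            JSONObject p = props.getJSONObject(i);
            // "c":"set" marks a SET property; plain properties omit the cardinality token.
            String cardinality = "set".equals(p.getString("c")) ? "set, " : "";
            // NOTE: a real writer would bind values and honor the "t" type field;
            // values are inlined here only to keep the sketch short.
            dsl.append(".property(").append(cardinality)
               .append("'").append(p.getString("k")).append("', '")
               .append(p.getString("v")).append("')");
        }
        return dsl.toString();
    }

    public static void main(String[] args) {
        String value = "{\"properties\":["
                + "{\"k\":\"name\",\"t\":\"string\",\"v\":\"tom\",\"c\":\"set\"},"
                + "{\"k\":\"age\",\"t\":\"int\",\"v\":\"20\"}]}";
        System.out.println("g.addV('person').property(id, 'person-1')" + toPropertySteps(value));
    }
}
```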
Performance test environment:

- GDB instance specification
- DataX benchmark machine

The test data is generated randomly with the following shape:

```
{
  id: random double(1~10000)
  from: random long(1~40000000)
  to: random long(1~40000000)
  label: random long(20000000~20005000)
  propertyKey: random string(len: 120~128)
  propertyName: random string(len: 120~128)
}
```
Vertex and edge imports are benchmarked with separate configurations, similar to the examples above. The key differences are:

- Increase the number of concurrent channels: `"channel": 32`
- Use session mode: `"session": true`
- Increase the number of records per transaction batch: `"maxRecordsInBatch": 128`
Vertex import performance:

Edge import performance:
GDB instance version 1.0.20 or above is required.