# Extract Transform Load
Data sources and business requirements for ETL vary widely, but ETL pipelines are straightforward to compose with LogBus. The simple example below demonstrates how TFKS vacuums/curates/coalesces daily OpenSearch indices into a monthly index, which helps maintain a healthy operational posture.
```yaml
templates:
  tfks:
    path: ../templates.yml

pipeline:

  extract:
    module: read-opensearch
    config:
      index: !!js/function >-
        function() {
          return 'logbus.journal-' + this.moment.utc().subtract(1, 'month').format('YYYY.MM.*')
        }
      scroll: 1m
      search:
        size: 333
      endpoint: !!js/function >-
        () => 'http://localhost:9200'

  transform:
    module: js
    inputs: [extract]
    config:
      function: !!js/function >-
        function(doc) {
          // Route the event to the monthly index by trimming the ".DD" day
          // suffix from the source index name (eg "logbus.journal-2024.05.14").
          const event = doc._source
          event._id = doc._id
          event._index = doc._index.slice(0, -3)
          return event
        }

  load:
    module: write-opensearch
    inputs: [transform]
    outputs: []
    config:
      bufferSize: 1000
      endpoint: !!js/function >-
        () => 'http://localhost:9200'

  errors:
    template: tfks.errors

  stats:
    template: tfks.stats

  log:
    inputs: [load, errors, stats]
```
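A note on the transform stage: copying `_id` from the source document presumably lets write-opensearch overwrite existing documents on re-runs rather than create duplicates, making the monthly roll-up safe to repeat. The load stage buffers up to 1000 events per bulk write (`bufferSize`).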
Being bound to a single core does not have to be a limiting factor for large data sets: multiple LogBus processes can each operate on their own slice of the data, improving ingestion throughput. How to slice depends on the system being queried, but a general tactic is to shard the data set in the query itself (eg `WHERE id % NUM_CONSUMERS = CONSUMER` in SQL), as in the sketch below.
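For OpenSearch, the scroll API's built-in slicing maps naturally onto this tactic. Here is a minimal sketch of the extract stage for one of two consumers; it assumes (an assumption, not documented LogBus behavior) that the `search` map is forwarded verbatim as the OpenSearch request body, as the `size` setting above suggests:

```yaml
  extract:
    module: read-opensearch
    config:
      index: !!js/function >-
        function() {
          return 'logbus.journal-' + this.moment.utc().subtract(1, 'month').format('YYYY.MM.*')
        }
      scroll: 1m
      search:
        size: 333
        # Sliced scroll: each process scrolls a disjoint subset of documents.
        # This config handles slice 0 of 2; the second process would set id: 1.
        slice:
          id: 0
          max: 2
      endpoint: !!js/function >-
        () => 'http://localhost:9200'
```

The transform and load stages need no changes; each process bulk-writes its slice into the same monthly index independently.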