# Extract Transform Load

ETL requirements vary widely across data sources and business needs, but the pipelines themselves compose easily with LogBus. The simple example below shows how TFKS vacuums, curates, and coalesces daily OpenSearch indices into a single monthly index, which helps maintain a healthy operational posture by keeping index and shard counts down.

```yaml
templates:
  tfks:
    path: ../templates.yml

pipeline:

  extract:
    module: read-opensearch
    config:
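      # Target all of last month's daily indices (logbus.journal-YYYY.MM.*),
      # with the pattern computed at run time.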
      index: !!js/function >-
        function() {
          return 'logbus.journal-' + this.moment.utc().subtract(1, 'month').format('YYYY.MM.*')
        }
      scroll: 1m
      search:
        size: 333
      endpoint: !!js/function >-
        () => 'http://localhost:9200'

  transform:
    module: js
    inputs: [extract]
    config:
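      # Flatten each hit: carry _id along and retarget the monthly index by
      # stripping the day suffix (".DD") from the daily index name.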
      function: !!js/function >-
        function(doc) {
          const event = doc._source
          event._id = doc._id
          event._index = doc._index.slice(0, -3)
          return event
        }

  load:
    module: write-opensearch
    inputs: [transform]
    outputs: []
    config:
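      # Buffer events and write them in batches of 1000.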
      bufferSize: 1000
      endpoint: !!js/function >-
        () => 'http://localhost:9200'

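  # Error and stats handling shared via the tfks templates.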
  errors:
    template: tfks.errors

  stats:
    template: tfks.stats

  log:
    inputs: [load, errors, stats]
```

Being bound to a single core need not limit throughput on large data sets: multiple LogBus processes can each ingest their own slice of the data in parallel. How to slice depends on the system being queried; a general tactic is to shard the data set in the query itself (e.g. `WHERE id % NUM_CONSUMERS = CONSUMER`).
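
As a sketch of that tactic applied to the extract stage above (assuming the `search` block is passed through as the OpenSearch request body, which the `size` setting suggests), each process could filter its slice server-side with a script query. The numeric `id` field and the consumer counts are illustrative:

```yaml
  extract:
    module: read-opensearch
    config:
      # index, scroll, and endpoint as in the pipeline above
      search:
        size: 333
        query:
          bool:
            filter:
              script:
                script:
                  # Painless: keep only documents in this consumer's slice.
                  source: "doc['id'].value % params.consumers == params.consumer"
                  params:
                    consumers: 4   # NUM_CONSUMERS: total LogBus processes
                    consumer: 0    # CONSUMER: this process's slice (0..3)
```

Launching four copies of the pipeline, identical except for `consumer: 0` through `consumer: 3`, covers the data set exactly once: the modulo buckets are disjoint and every document falls into one of them.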