Meltano takes a modular approach to data engineering in general and EL(T) in particular, where your project and pipelines are composed of plugins of different types, most notably extractors (Singer taps), loaders (Singer targets), transformers (dbt and dbt models), and orchestrators (currently Airflow, with Dagster in development).
Meltano provides the glue to make these components work together smoothly and enables consistent configuration and deployment.
To learn how to manage your project’s plugins, refer to the Plugin Management guide.
In order to use a given package as a plugin in a project, assuming it meets the requirements of the plugin type in question, Meltano needs to know:
1. where the package can be found, so that it can be installed into the project,
2. metadata describing the package's behavior, capabilities, and supported settings, and
3. the configuration this specific plugin instance should use in your project.
Together, a package’s location (1) and the metadata (2) describing it in terms Meltano can understand make up the base plugin description. In your project, plugins extend this description with a specific configuration (3) and a unique name.
This means that different configurations of the same package (base plugin) would be represented in your project as separate plugins with their own unique names, which can be thought of as differently initialized instances of the same class. For example: extractors tap-postgres--billing and tap-postgres--events derived from base extractor tap-postgres, or tap-google-analytics--client-foo and tap-google-analytics--client-bar derived from base extractor tap-google-analytics.
Each plugin in a project can either:
1. inherit its base plugin description from a discoverable plugin on Meltano Hub,
2. define its base plugin description explicitly as a custom plugin, or
3. inherit both its base plugin description and configuration from another plugin in the project.
To learn how to add a plugin to your project, refer to the Plugin Management guide.
Base plugin descriptions for many popular extractors (Singer taps), loaders (Singer targets), and other plugins have already been collected by users and contributed to Meltano Hub, making them supported out of the box.
To find discoverable plugins, run meltano discover or refer to the lists of Extractors, Loaders, etc.
To learn how to add a discoverable plugin to your project using a shadowing plugin definition or inheriting plugin definition, refer to the Plugin Management guide.
In the case of various popular data sources and destinations, multiple alternative implementations of Singer taps (extractors) and targets (loaders) exist, some of which are forks of an original (canonical) version that evolved in their own direction, while others were developed independently from the start.
These different implementations and their repositories typically use the same name (tap-<source> or target-<destination>) and may on the surface appear interchangeable, but often vary significantly in terms of exact behavior, quality, and supported settings.
In its index of discoverable plugins, Meltano considers these different implementations different variants of the same plugin, which share a plugin name and other source/destination-specific details (like a logo and description) but have their own implementation-specific variant name and metadata (like capabilities and settings).
Every discoverable plugin has a default variant that is known to work well and recommended for new users, which will be added to your project unless you explicitly select a different one. Users who already have experience with a different variant (or have specific reasons to prefer it) can explicitly choose to add it to their project instead of the default, so that they get the same behavior and can use the same settings as before. If the variant in question is not discoverable yet, it can be added as a custom plugin.
When multiple variants of a discoverable plugin are available, meltano discover will list their names alongside the plugin name.
To learn how to add a non-default variant of a discoverable plugin to your project, refer to the Plugin Management guide.
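For example, on the command line this looks like the following (the variant name here is illustrative):
# Add the default variant of a discoverable plugin:
meltano add extractor tap-postgres
# Or explicitly add a specific, non-default variant:
meltano add extractor tap-postgres --variant meltanolabs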
If you’d like to use a package in your project whose base plugin description isn’t discoverable yet, you’ll need to collect and provide this metadata yourself.
To learn how to add a custom plugin to your project using a custom plugin definition, refer to the Plugin Management guide.
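For reference, a custom plugin definition in your meltano.yml project file might look something like this sketch (the plugin name, repository URL, capabilities, and settings are hypothetical):
extractors:
- name: tap-my-source            # hypothetical custom extractor
  namespace: tap_my_source
  pip_url: git+https://github.com/example/tap-my-source.git
  executable: tap-my-source
  capabilities:
  - catalog
  - discover
  - state
  settings:
  - name: api_key
    kind: password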
Once you've got the plugin working in your project, please consider contributing its description to Meltano Hub to make it discoverable and supported out of the box for new users!
If you’d like to use the same package (base plugin) in your project multiple times with different configurations, you can add a new plugin that inherits from an existing one.
The new plugin will inherit its parent’s base plugin description and configuration as if they were defaults, which can then be overridden as appropriate.
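A minimal sketch of what this looks like in meltano.yml, using the inherit_from key (plugin and setting names are illustrative):
extractors:
- name: tap-postgres
- name: tap-postgres--billing    # inherits description and config from tap-postgres
  inherit_from: tap-postgres
  config:
    dbname: billing
- name: tap-postgres--events
  inherit_from: tap-postgres
  config:
    dbname: events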
For performance reasons, inherited plugins with a pip_url identical to their parent's share the parent's underlying Python virtualenv. If you would prefer to create a separate virtualenv for an inherited plugin, change its pip_url to be different from its parent's.
To learn how to add an inheriting plugin to your project using an inheriting plugin definition, refer to the Plugin Management guide.
When you add a plugin to your project using meltano add, the discoverable plugin definition of the plugin will be downloaded and added to your project under plugins/<plugin_type>/<plugin_name>--<variant_name>.lock. This ensures that the plugin's definition remains stable and version-controlled. Later invocations of the plugin will use this file to determine the settings, installation source, etc.
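For example (the variant name in the lock file path is illustrative):
meltano add extractor tap-gitlab
# The plugin definition is now pinned under:
# plugins/extractors/tap-gitlab--meltanolabs.lock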
Meltano supports the following types of plugins:
Extractors are pip packages used by meltano elt as part of data integration.
They are responsible for pulling data out of arbitrary data sources: databases, SaaS APIs, or file formats.
Meltano supports Singer taps: executables that implement the Singer specification.
To learn which extractors are discoverable and supported out of the box, refer to the Extractors page or run meltano discover extractors.
Extractors support the following extras:
catalog extra
Setting: _catalog
Environment variable: <EXTRACTOR>__CATALOG, e.g. TAP_GITLAB__CATALOG
meltano elt CLI option: --catalog

An extractor's catalog extra holds a path to a catalog file (relative to the project directory) to be provided to the extractor when it is run in sync mode using meltano elt or meltano invoke.
If a catalog path is not set, the catalog will be generated on the fly by running the extractor in discovery mode and applying the schema, selection, and metadata rules to the discovered catalog.
Selection filter rules are always applied to manually provided catalogs as well as discovered ones.
While this extra can be managed using meltano config or environment variables like any other setting, a catalog file is typically provided using meltano elt's --catalog option.
If the catalog does not seem to take effect, you may need to validate the capabilities of the tap.
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-gitlab
  catalog: extract/tap-gitlab.catalog.json
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _catalog <path>
export <EXTRACTOR>__CATALOG=<path>
meltano elt <extractor> <loader> --catalog <path>
# For example:
meltano config tap-gitlab set _catalog extract/tap-gitlab.catalog.json
export TAP_GITLAB__CATALOG=extract/tap-gitlab.catalog.json
meltano elt tap-gitlab target-jsonl --catalog extract/tap-gitlab.catalog.json
load_schema extra
Setting: _load_schema
Environment variable: <EXTRACTOR>__LOAD_SCHEMA, e.g. TAP_GITLAB__LOAD_SCHEMA
Default: $MELTANO_EXTRACTOR_NAMESPACE, which will expand to the extractor's namespace, e.g. tap_gitlab for tap-gitlab

An extractor's load_schema extra holds the name of the database schema extracted data should be loaded into, when this extractor is used in a pipeline with a loader for a database that supports schemas, like PostgreSQL or Snowflake.
The value of this extra can be referenced from a loader's configuration using the MELTANO_EXTRACT__LOAD_SCHEMA pipeline environment variable. It is used as the default value for the target-postgres and target-snowflake schema settings.
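For example, this wiring is what target-postgres's schema setting defaults to, shown explicitly here as a sketch for illustration:
loaders:
- name: target-postgres
  config:
    # Picks up the extractor's load_schema extra for the current pipeline
    schema: $MELTANO_EXTRACT__LOAD_SCHEMA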
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-gitlab
  load_schema: gitlab_data
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _load_schema <schema>
export <EXTRACTOR>__LOAD_SCHEMA=<schema>
# For example:
meltano config tap-gitlab set _load_schema gitlab_data
export TAP_GITLAB__LOAD_SCHEMA=gitlab_data
metadata extra
Setting: _metadata, alias: metadata
Environment variable: <EXTRACTOR>__METADATA, e.g. TAP_GITLAB__METADATA
Default: {} (an empty object)

An extractor's metadata extra holds an object describing Singer stream and property metadata rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano elt or meltano invoke.
These rules are not applied when a catalog is provided manually.
Stream (entity) metadata <key>: <value> pairs (e.g. {"replication-method": "INCREMENTAL"}) are nested under top-level entity identifiers that correspond to Singer stream tap_stream_id values. These nested properties can also be thought of and interacted with as settings named _metadata.<entity>.<key>.
Property (attribute) metadata <key>: <value> pairs (e.g. {"is-replication-key": true}) are nested under top-level entity identifiers and second-level attribute identifiers that correspond to Singer stream property names. These nested properties can also be thought of and interacted with as settings named _metadata.<entity>.<attribute>.<key>.
Unix shell-style wildcards can be used in entity and attribute identifiers to match multiple entities and/or attributes at once.
Entity and attribute names can be discovered using meltano select --list --all <plugin>.
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-postgres
  metadata:
    some_stream_id:
      replication-method: INCREMENTAL
      replication-key: created_at
      created_at:
        is-replication-key: true
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _metadata <entity> <key> <value>
meltano config <extractor> set _metadata <entity> <attribute> <key> <value>
export <EXTRACTOR>__METADATA='{"<entity>": {"<key>": "<value>", "<attribute>": {"<key>": "<value>"}}}'
# Once metadata has been set in `meltano.yml`, environment variables can be used
# to override specific nested properties:
export <EXTRACTOR>__METADATA_<ENTITY>_<ATTRIBUTE>_<KEY>=<value>
# For example:
meltano config tap-postgres set _metadata some_stream_id replication-method INCREMENTAL
meltano config tap-postgres set _metadata some_stream_id replication-key created_at
meltano config tap-postgres set _metadata some_stream_id created_at is-replication-key true
export TAP_POSTGRES__METADATA_SOME_STREAM_ID_REPLICATION_METHOD=FULL_TABLE
schema extra
Setting: _schema
Environment variable: <EXTRACTOR>__SCHEMA, e.g. TAP_GITLAB__SCHEMA
Default: {} (an empty object)

An extractor's schema extra holds an object describing Singer stream schema override rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano elt or meltano invoke.
These rules are not applied when a catalog is provided manually.
JSON Schema descriptions for specific properties (attributes) (e.g. {"type": ["string", "null"], "format": "date-time"}) are nested under top-level entity identifiers that correspond to Singer stream tap_stream_id values, and second-level attribute identifiers that correspond to Singer stream property names. These nested properties can also be thought of and interacted with as settings named _schema.<entity>.<attribute> and _schema.<entity>.<attribute>.<key>.
Unix shell-style wildcards can be used in entity and attribute identifiers to match multiple entities and/or attributes at once.
Entity and attribute names can be discovered using meltano select --list --all <plugin>.
If a schema is specified for a property that does not yet exist in the discovered stream’s schema, the property (and its schema) will be added to the catalog.
This allows you to define a full schema for taps such as tap-dynamodb that do not themselves have the ability to discover the schema of their streams.
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-postgres
  schema:
    some_stream_id:
      created_at:
        type: ["string", "null"]
        format: date-time
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _schema <entity> <attribute> <schema description>
meltano config <extractor> set _schema <entity> <attribute> <key> <value>
export <EXTRACTOR>__SCHEMA='{"<entity>": {"<attribute>": {"<key>": "<value>"}}}'
# Once schema descriptions have been set in `meltano.yml`, environment variables can be used
# to override specific nested properties:
export <EXTRACTOR>__SCHEMA_<ENTITY>_<ATTRIBUTE>_<KEY>=<value>
# For example:
meltano config tap-postgres set _schema some_stream_id created_at type '["string", "null"]'
meltano config tap-postgres set _schema some_stream_id created_at format date-time
export TAP_POSTGRES__SCHEMA_SOME_STREAM_ID_CREATED_AT_FORMAT=date
select extra
Setting: _select
Environment variable: <EXTRACTOR>__SELECT, e.g. TAP_GITLAB__SELECT
Default: ["*.*"]

An extractor's select extra holds an array of entity selection rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano elt or meltano invoke.
These rules are not applied when a catalog is provided manually.
A selection rule consists of an entity identifier that corresponds to a Singer stream's tap_stream_id value, and an attribute identifier that corresponds to a Singer stream property name, separated by a period (.). Rules indicating that an entity or attribute should be excluded are prefixed with an exclamation mark (!). Unix shell-style wildcards can be used in entity and attribute identifiers to match multiple entities and/or attributes at once.
Entity and attribute names can be discovered using meltano select --list --all <plugin>.
While this extra can be managed using meltano config or environment variables like any other setting, selection rules are typically specified using meltano select.
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-gitlab
  select:
  - project_members.*
  - commits.*
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _select '["<entity>.<attribute>", ...]'
export <EXTRACTOR>__SELECT='["<entity>.<attribute>", ...]'
meltano select <extractor> <entity> <attribute>
# For example:
meltano config tap-gitlab set _select '["project_members.*", "commits.*"]'
export TAP_GITLAB__SELECT='["project_members.*", "commits.*"]'
meltano select tap-gitlab project_members "*"
meltano select tap-gitlab commits "*"
select_filter extra
Setting: _select_filter
Environment variable: <EXTRACTOR>__SELECT_FILTER, e.g. TAP_GITLAB__SELECT_FILTER
meltano elt CLI options: --select and --exclude
Default: []

An extractor's select_filter extra holds an array of entity selection filter rules that are applied to the extractor's discovered or provided catalog file when the extractor is run using meltano elt or meltano invoke, after schema, selection, and metadata rules are applied.
It can be used to only extract records for specific matching entities, or to extract records for all entities except for those specified, by letting you apply filters on top of configured entity selection rules.
Selection filter rules use entity identifiers that correspond to Singer stream tap_stream_id values. Rules indicating that an entity should be excluded are prefixed with an exclamation mark (!). Unix shell-style wildcards can be used in entity identifiers to match multiple entities at once.
Entity names can be discovered using meltano select --list --all <plugin>.
While this extra can be managed using meltano config or environment variables like any other setting, selection filters are typically specified using meltano elt's --select and --exclude options.
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-gitlab
  select:
  - project_members.*
  - commits.*
  select_filter:
  - commits
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _select_filter '["<entity>", ...]'
meltano config <extractor> set _select_filter '["!<entity>", ...]'
export <EXTRACTOR>__SELECT_FILTER='["<entity>", ...]'
export <EXTRACTOR>__SELECT_FILTER='["!<entity>", ...]'
meltano elt <extractor> <loader> --select <entity>
meltano elt <extractor> <loader> --exclude <entity>
# For example:
meltano config tap-gitlab set _select_filter '["commits"]'
meltano config tap-gitlab set _select_filter '["!project_members"]'
export TAP_GITLAB__SELECT_FILTER='["commits"]'
export TAP_GITLAB__SELECT_FILTER='["!project_members"]'
meltano elt tap-gitlab target-jsonl --select commits
meltano elt tap-gitlab target-jsonl --exclude project_members
state extra
Setting: _state
Environment variable: <EXTRACTOR>__STATE, e.g. TAP_GITLAB__STATE
meltano elt CLI option: --state

An extractor's state extra holds a path to a state file (relative to the project directory) to be provided to the extractor when it is run as part of a pipeline using meltano elt.
If a state path is not set, the state will be looked up automatically based on the ELT run’s State ID.
While this extra can be managed using meltano config or environment variables like any other setting, a state file is typically provided using meltano elt's --state option.
Manage this extra directly in your meltano.yml project file:
extractors:
- name: tap-gitlab
  state: extract/tap-gitlab.state.json
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <extractor> set _state <path>
export <EXTRACTOR>__STATE=<path>
meltano elt <extractor> <loader> --state <path>
# For example:
meltano config tap-gitlab set _state extract/tap-gitlab.state.json
export TAP_GITLAB__STATE=extract/tap-gitlab.state.json
meltano elt tap-gitlab target-jsonl --state extract/tap-gitlab.state.json
Loaders are pip packages used by meltano elt as part of data integration.
They are responsible for loading extracted data into arbitrary data destinations: databases, SaaS APIs, or file formats.
Meltano supports Singer targets: executables that implement the Singer specification.
To learn which loaders are discoverable and supported out of the box, refer to the Loaders page or run meltano discover loaders.
Loaders support the following extras:
dialect extra
Setting: _dialect
Environment variable: <LOADER>__DIALECT, e.g. TARGET_POSTGRES__DIALECT
Default: $MELTANO_LOADER_NAMESPACE, which will expand to the loader's namespace. Note that this default has been overridden on discoverable loaders, e.g. postgres for target-postgres and snowflake for target-snowflake.

A loader's dialect extra holds the name of the dialect of the target database, so that transformers in the same pipeline can determine the type of database to connect to.
The value of this extra can be referenced from a transformer's configuration using the MELTANO_LOAD__DIALECT pipeline environment variable. It is used as the default value for dbt's target setting, and should therefore correspond to a target name in transform/profile/profiles.yml.
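Since this extra provides the default for dbt's target setting, the implicit wiring is equivalent to the following explicit configuration (a sketch):
transformers:
- name: dbt
  config:
    # Resolves to the pipeline loader's dialect, e.g. postgres or snowflake
    target: $MELTANO_LOAD__DIALECT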
Manage this extra directly in your meltano.yml project file:
loaders:
- name: target-example-db
  dialect: example-db
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <loader> set _dialect <dialect>
export <LOADER>__DIALECT=<dialect>
# For example:
meltano config target-example-db set _dialect example-db
export TARGET_EXAMPLE_DB__DIALECT=example-db
target_schema extra
Setting: _target_schema
Environment variable: <LOADER>__TARGET_SCHEMA, e.g. TARGET_POSTGRES__TARGET_SCHEMA
Default: $MELTANO_LOAD_SCHEMA, which will expand to the value of the loader's schema setting

A loader's target_schema extra holds the name of the database schema the loader has been configured to load data into (assuming the destination supports schemas), so that transformers in the same pipeline can determine the database schema to load data from.
The value of this extra is usually not set explicitly, since it should correspond to the value of the loader's own "target schema" setting. If the name of this setting is not schema, its value can be referenced from the extra's value using $MELTANO_LOAD_<TARGET_SCHEMA_SETTING>, e.g. $MELTANO_LOAD_DESTINATION_SCHEMA for setting destination_schema.
The value of this extra can be referenced from a transformer's configuration using the MELTANO_LOAD__TARGET_SCHEMA pipeline environment variable. It is used as the default value for dbt's source_schema setting.
Manage this extra directly in your meltano.yml project file:
loaders:
- name: target-example-db
  settings:
  - name: destination_schema
  target_schema: $MELTANO_LOAD_DESTINATION_SCHEMA # Value of `destination_schema` setting
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <loader> set _target_schema <schema>
export <LOADER>__TARGET_SCHEMA=<schema>
# For example:
meltano config target-example-db set _target_schema '$MELTANO_LOAD_DESTINATION_SCHEMA'
# If the target schema cannot be determined dynamically using a setting reference:
meltano config target-example-db set _target_schema explicit_target_schema
export TARGET_EXAMPLE_DB__TARGET_SCHEMA=explicit_target_schema
Transforms are dbt packages containing dbt models that are used by meltano elt as part of data transformation.
Together with the dbt transformer, they are responsible for transforming data that has been loaded into a database (data warehouse) into a different format, usually one more appropriate for analysis.
When a transform is added to your project using meltano add, the dbt package Git repository referenced by its pip_url will be added to your project's transform/packages.yml and the package will be enabled in transform/dbt_project.yml.
Transforms support the following extras:
package_name extra
Setting: _package_name
Environment variable: <TRANSFORM>__PACKAGE_NAME, e.g. TAP_GITLAB__PACKAGE_NAME
Default: $MELTANO_TRANSFORM_NAMESPACE, which will expand to the transform's namespace, e.g. tap_gitlab for tap-gitlab

A transform's package_name extra holds the name of the dbt package's internal dbt project: the value of name in dbt_project.yml.
When a transform is added to your project using meltano add, this name will be added to the models dictionary in transform/dbt_project.yml.
The value of this extra can be referenced from a transformer's configuration using the MELTANO_TRANSFORM__PACKAGE_NAME pipeline environment variable. It is included in the default value for dbt's models setting: $MELTANO_TRANSFORM__PACKAGE_NAME $MELTANO_EXTRACTOR_NAMESPACE my_meltano_model.
Manage this extra directly in your meltano.yml project file:
transforms:
- name: dbt-facebook-ads
  namespace: tap_facebook
  package_name: facebook_ads
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <transform> set _package_name <name>
export <TRANSFORM>__PACKAGE_NAME=<name>
# For example:
meltano config dbt-facebook-ads set _package_name facebook_ads
export DBT_FACEBOOK_ADS__PACKAGE_NAME=facebook_ads
vars extra
Setting: _vars
Environment variable: <TRANSFORM>__VARS, e.g. TAP_GITLAB__VARS
Default: {} (an empty object)

A transform's vars extra holds an object representing dbt model variables that can be referenced from a model using the var function.
When a transform is added to your project using meltano add, this object will be used as the dbt model's vars object in transform/dbt_project.yml.
Because these variables are handled by dbt rather than Meltano, environment variables can be referenced using the env_var function instead of $VAR or ${VAR}.
Manage this extra directly in your meltano.yml project file:
transforms:
- name: tap-gitlab
  vars:
    schema: '{{ env_var(''DBT_SOURCE_SCHEMA'') }}'
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <transform> set _vars <key> <value>
export <TRANSFORM>__VARS='{"<key>": "<value>"}'
# For example
meltano config --plugin-type=transform tap-gitlab set _vars schema "{{ env_var('DBT_SOURCE_SCHEMA') }}"
export TAP_GITLAB__VARS='{"schema": "{{ env_var(''DBT_SOURCE_SCHEMA'') }}"}'
Orchestrators are pip packages responsible for orchestrating a project’s scheduled pipelines.
Meltano supports Apache Airflow out of the box, but can be used with any tool capable of reading the output of meltano schedule list --format=json and executing each pipeline's meltano elt command on a schedule.
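As a sketch, a minimal external scheduler could be a plain cron entry that runs one pipeline's meltano elt command (the project path, plugin names, and state ID are illustrative):
# Run the gitlab-to-jsonl pipeline every day at 2am:
0 2 * * * cd /srv/meltano-project && meltano elt tap-gitlab target-jsonl --state-id=gitlab-to-jsonl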
When the airflow orchestrator is added to your project using meltano add, its related file bundle will automatically be added as well.
Transformers are pip packages used by meltano elt as part of data transformation.
They are responsible for running transforms.
Meltano supports dbt and its dbt models out of the box.
When the dbt transformer is added to your project using meltano add, its related file bundle will automatically be added as well.
File bundles are pip packages bundling files you may want in your project.
When a file bundle is added to your project using meltano add, the bundled files will automatically be added as well.
The file bundle itself will not be added to your meltano.yml project file unless it contains files that are managed by the file bundle and updated automatically when meltano upgrade is run.
update extra
Setting: _update
Environment variable: <BUNDLE>__UPDATE, e.g. DBT__UPDATE
Default: {} (an empty object)

A file bundle's update extra holds an object mapping file paths (of files inside the bundle, relative to the project root) to booleans. Glob patterns are supported to allow matching of multiple files with a single path.
When a file path's value is true, the matching files are considered to be managed by the file bundle and updated automatically when meltano upgrade is run.
Manage this extra directly in your meltano.yml project file:
files:
- name: dbt
  update:
    transform/dbt_project.yml: false
    profiles/*.yml: true
If a file path starts with a *, it must be wrapped in quotes to be considered valid YAML. For example, using '*.yml' to match all .yml files:
files:
- name: dbt
  update:
    '*.yml': true
Alternatively, manage this extra using meltano config or an environment variable:
meltano config <bundle> set _update <path> <true/false>
export <BUNDLE>__UPDATE='{"<path>": <true/false>}'
# For example:
meltano config --plugin-type=files dbt set _update transform/dbt_project.yml false
meltano config --plugin-type=files dbt set _update profiles/*.yml true
export DBT__UPDATE='{"transform/dbt_project.yml": false, "profiles/*.yml": true}'
If none of the other plugin types address your needs, any pip package that exposes an executable can be added to your project as a utility. Meltano has a selection of available utilities listed on MeltanoHub, or you can easily add your own custom utility.
Any pip package that exposes an executable can be added to your project as a custom utility.
meltano add --custom utility <plugin>
# For example:
meltano add --custom utility yoyo
(namespace): yoyo
(pip_url): yoyo-migrations
(executable): yoyo
You can then invoke the executable using meltano invoke:
meltano invoke <plugin> [<executable arguments>...]
# For example:
meltano invoke yoyo new ./migrations -m "Add column to foo"
The benefit of doing this as opposed to adding the package to requirements.txt or running pip install <package> directly is that any packages installed this way benefit from Meltano's virtual environment isolation.
This avoids dependency conflicts between packages.
Mappers allow you to transform or manipulate data after extraction and before loading. Common applications include:
- aliasing streams or properties to customize naming downstream,
- filtering out records or properties based on rules you define, and
- transforming or masking sensitive data (e.g. PII) before it reaches the destination.
Note that mappers are currently only available when using meltano run.
You can install mappers like any other plugin using meltano add:
$ meltano discover mappers
Mappers
transform-field
meltano-map-transformer
$ meltano add mapper transform-field
Installing mapper 'transform-field'...
Installed mapper 'transform-field'
To learn more about mapper 'transform-field', visit https://github.com/transferwise/pipelinewise-transform-field
Mappers are unique in that after install you don't invoke them directly. Instead you define mappings by name and add a config object for each mapping.
This config object is passed to the mapper when the mapping name is called as part of a meltano run invocation.
Note that this differs from other plugins: you're not invoking the plugin name, but referencing a mapping name instead.
Additionally, the requirements for the config object itself will vary by plugin.
So given a mapper with mappings configured like so:
mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:
  - name: hide-gitlab-secrets
    config:
      transformations:
      - field_id: "author_email"
        tap_stream_name: "commits"
        type: "MASK-HIDDEN"
      - field_id: "committer_email"
        tap_stream_name: "commits"
        type: "MASK-HIDDEN"
  - name: null-created-at
    config:
      transformations:
      - field_id: "created_at"
        tap_stream_name: "accounts"
        type: "SET-NULL"
You can then invoke the mappings by name:
# hide-gitlab-secrets will resolve to mapping with the same name. In this case, that mapping will perform two actions.
# Transform the "author_email" field in the "commits" stream and hide the email address.
# Transform the "committer_email" field in the "commits" stream and hide the email address.
$ meltano run tap-gitlab hide-gitlab-secrets target-jsonl
# null-created-at will resolve to mapping with the same name. In this case, that mapping will perform one action.
# Transform the "created_at" field in the "accounts" stream and set it to null.
$ meltano run tap-someapi null-created-at target-jsonl
You can also invoke multiple mappings at once in series:
$ meltano run tap-someapi fix-null-id fix-country-code target-jsonl
Each mapping will execute in a unique process instance of the mapper plugin. That means you can also invoke multiple mappings that leverage the same plugin at different points within the same run invocation:
# Fix any null country codes using transform-field mapper.
# Set the customer's region based on their country code using your own mapper.
# Mask the id if the customer is in the EU region using transform-field mapper.
$ meltano run tap-someapi fix-null-country set-region-from-country mask-id-if-eu target-jsonl