Mara Pipelines documentation

Welcome to the Mara Pipelines documentation. Mara Pipelines is one of the core modules of the Mara Framework. It contains a lightweight data transformation framework with a focus on transparency and complexity reduction, and it has a number of baked-in assumptions and principles:

Data integration pipelines as code: pipelines, tasks and commands are created using declarative Python code.
PostgreSQL as a data processing engine.
Extensive web UI: the web browser is the main tool for inspecting, running and debugging pipelines.
GNU make semantics. Nodes depend on the completion of upstream nodes. No data dependencies or data flows.
No in-app data processing: command line tools as the main tool for interacting with databases and data.
Single machine pipeline execution based on Python’s multiprocessing. No need for distributed task queues. Easy debugging and output logging.
Cost based priority queues: nodes with higher cost (based on recorded run times) are run first.
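
The interplay of the "GNU make semantics" and "cost based priority queues" principles above can be illustrated with a minimal sketch. This is not Mara's actual scheduler or API: the function `schedule`, the node names and the cost values are hypothetical, and the sketch runs nodes one at a time, whereas Mara executes them in parallel processes. It only shows the idea that a node becomes runnable once all of its upstream nodes have completed, and that among runnable nodes the one with the highest recorded cost is picked first.

```python
import heapq


def schedule(upstreams, costs):
    """Return a sequential run order for the given nodes.

    upstreams: dict mapping each node to the nodes it depends on.
    costs: dict mapping nodes to a cost (e.g. a recorded run time);
           nodes without a recorded cost are treated as cost 0.
    Hypothetical helper for illustration, not part of Mara Pipelines.
    """
    remaining = {node: set(ups) for node, ups in upstreams.items()}

    # Max-heap emulated with negated costs; the node name breaks ties.
    ready = []
    for node in list(remaining):
        if not remaining[node]:
            heapq.heappush(ready, (-costs.get(node, 0.0), node))
            del remaining[node]

    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        # Completing a node may unblock its downstream nodes.
        for other in sorted(remaining):
            remaining[other].discard(node)
            if not remaining[other]:
                heapq.heappush(ready, (-costs.get(other, 0.0), other))
                del remaining[other]

    if remaining:
        raise ValueError("cyclic or unknown upstream dependencies")
    return order


# Hypothetical pipeline: two loads, a transformation, and a report.
upstreams = {
    "load_orders": [],
    "load_customers": [],
    "transform": ["load_orders", "load_customers"],
    "report": ["transform"],
}
costs = {"load_orders": 120.0, "load_customers": 15.0,
         "transform": 60.0, "report": 5.0}

print(schedule(upstreams, costs))
# ['load_orders', 'load_customers', 'transform', 'report']
```

Starting expensive nodes first tends to shorten the total run time when several nodes can run at once, because the longest-running work is not left for the end.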

User’s Guide

This part of the documentation provides step-by-step instructions on how to use this module.

API Reference

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

Additional Notes

Legal information and changelog are here for the interested.
