Architecture

Orion Framework Architecture

Explore the architecture models that shape Orion: boundaries, data flow and connector responsibilities working together for resilient pipelines.

Layered Architecture Model

Orion enforces a layered model to keep pipeline logic independent of orchestration, logging and storage concerns.

Core Layer

Pure business rules for pipelines, kept free from external dependencies.

  • Entities for pipelines, nodes and datasets
  • Interfaces describing connectors, logging and catalog access
  • Use cases such as RunPipelineUseCase to drive execution
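
As a minimal sketch of what those core abstractions could look like (the class and method names below are assumptions, not Orion's published API), a connector contract and a node entity might be declared without any external dependencies:

# Illustrative sketch: assumed abstractions, not Orion's published API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, List


class Connector(ABC):
    """Contract that infrastructure adapters must fulfil to load and save datasets."""

    @abstractmethod
    def load(self) -> Any:
        ...

    @abstractmethod
    def save(self, data: Any) -> None:
        ...


@dataclass
class Node:
    """Pure entity: a callable plus the dataset names it consumes and produces."""
    name: str
    func: Callable[..., Any]
    inputs: List[str]
    outputs: List[str]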

Application Layer

Coordinates orchestration by composing core abstractions.

  • CLI, pipeline builders and runners
  • Wires the context, catalog and logging for each execution
  • Guarantees dependency inversion — only depends on core interfaces
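
A minimal composition sketch, assuming an orion import path and constructor signatures that may differ from the real ones:

# Hypothetical composition root: class names appear on this page, import path and signatures are assumed.
import logging

from orion import DataCatalog, OrionContext, PipelineBuilder, RunPipelineUseCase  # assumed import path


def run(module_path: str, catalog_path: str) -> dict:
    catalog = DataCatalog.from_yaml(catalog_path)                      # infrastructure implementation behind a core interface
    context = OrionContext(catalog=catalog, logger=logging.getLogger("orion"))
    pipeline = PipelineBuilder().build(module_path)                    # imports the pipeline module
    return RunPipelineUseCase(context).execute(pipeline)               # core use case drives execution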

Infrastructure Layer

Implements integrations while honoring contracts defined by core.

  • Connectors for filesystems, warehouses and APIs
  • Concrete loggers and configuration providers
  • Persists datasets via DataCatalog and context helpers
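
For example, a local CSV adapter could satisfy the core connector contract roughly as follows (a sketch assuming pandas-backed datasets; the real connector API may differ):

# Illustrative connector implementation; the real filesystem connector API may differ.
import pandas as pd


class LocalCSVConnector:
    """Loads and saves a CSV dataset on the local filesystem."""

    def __init__(self, filepath: str) -> None:
        self.filepath = filepath

    def load(self) -> pd.DataFrame:
        return pd.read_csv(self.filepath)

    def save(self, data: pd.DataFrame) -> None:
        data.to_csv(self.filepath, index=False)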

Context Boundaries

External actors reach Orion exclusively through the CLI and configuration artifacts, while the layered core mediates every call to underlying services.

graph LR
    subgraph "External Actors"
        Operator["Operator / CLI User"]
        Scheduler["Scheduler / Orchestrator"]
        Repo["Pipeline Module"]
        CatalogFile["Catalog YAML"]
        SourceSystems["Source Systems"]
        TargetStores["Target Stores"]
    end

    subgraph "Orion Framework"
        CLI["orion CLI"]
        Application["Application Layer"]
        Core["Core Layer"]
        Infrastructure["Infrastructure Layer"]
    end

    Operator --> CLI
    Scheduler --> CLI
    Repo --> Application
    CatalogFile --> Application
    CLI --> Application
    Application --> Core
    Core --> Infrastructure
    SourceSystems <---> Infrastructure
    Infrastructure --> TargetStores

Data Flow Through a Pipeline

Each run resolves configuration, builds the execution context and iterates over the nodes, caching intermediate data in memory and persisting outputs whenever the catalog defines a matching entry.

flowchart TD
    CLI["CLI Command: orion run"] --> LoadCatalog["Load catalog.yml"]
    LoadCatalog --> BuildContext["Build OrionContext (catalog, logger, hooks)"]
    BuildContext --> LoadPipeline["PipelineBuilder imports module"]
    LoadPipeline --> UseCase["RunPipelineUseCase"]
    UseCase --> Iterate{"Next node?"}
    Iterate -->|Resolve inputs| ResolveInputs["Read from in-memory cache"]
    ResolveInputs --> MemoryBranch{"Inputs missing?"}
    MemoryBranch -->|No| ExecuteNode["Execute node function"]
    MemoryBranch -->|Yes| FetchCatalog["Catalog.load(dataset)"]
    FetchCatalog --> Connectors["Connectors fetch data"]
    Connectors --> ExecuteNode
    ExecuteNode --> Persist["Store outputs in cache"]
    Persist --> CatalogCheck{"Catalog entry configured?"}
    CatalogCheck -->|Yes| SaveCatalog["Catalog.save(outputs)"]
    CatalogCheck -->|No| Iterate
    SaveCatalog --> Iterate
    Iterate -->|All nodes processed| Finish["Return collected data"]
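
A simplified Python rendering of the loop in the diagram above, shown as a sketch rather than Orion's actual RunPipelineUseCase:

# Sketch of the execution loop shown in the flowchart; data structures and method names are assumed.
def execute(pipeline, context):
    cache = {}                                         # in-memory store for intermediate datasets
    for node in pipeline.nodes:
        kwargs = {}
        for name in node.inputs:
            if name in cache:                          # reuse outputs produced by earlier nodes
                kwargs[name] = cache[name]
            else:                                      # otherwise delegate to the catalog and its connectors
                kwargs[name] = context.catalog.load(name)
        outputs = node.func(context, **kwargs)         # node logic receives the shared context
        if not isinstance(outputs, dict):
            outputs = {node.outputs[0]: outputs}       # normalise a single return value
        for name, value in outputs.items():
            cache[name] = value                        # always keep outputs in memory
            if context.catalog.has_entry(name):        # persist only when an entry is configured (membership check assumed)
                context.catalog.save(name, value)
    return cache                                       # collected data returned to the caller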

Layer Communication Sequence

The sequence diagram highlights how the CLI request travels across layers, keeping application logic thin and delegating work to core and infrastructure components.

sequenceDiagram
    participant Operator as Operator
    participant CLI as CLI
    participant Builder as PipelineBuilder
    participant UseCase as RunPipelineUseCase
    participant Catalog as DataCatalog
    participant Connector as Connector Adapter
    participant Node as Pipeline Node

    Operator->>CLI: orion run --module pipeline --catalog catalog.yml
    CLI->>Builder: load_pipeline(module)
    Builder-->>CLI: pipeline definition
    CLI->>UseCase: execute(pipeline, context)
    UseCase->>Catalog: load(inputs)
    Catalog->>Connector: fetch(dataset)
    Connector-->>Catalog: dataset payload
    Catalog-->>UseCase: provide inputs
    UseCase->>Node: call(**kwargs)
    Node-->>UseCase: outputs
    UseCase->>Catalog: save(outputs)
    Catalog->>Connector: persist(dataset)
    Connector-->>Catalog: confirmation

Connector Interaction Map

Infrastructure adapters plug into the catalog to abstract persistence mechanics while providing observability hooks through the shared context.

graph TD
    subgraph "Infrastructure Layer"
        Context["OrionContext"]
        Catalog["DataCatalog"]
        Loader["Dataset Loader"]
        Saver["Dataset Saver"]
        Logger["Logger Adapter"]
    end

    subgraph "Connector Adapters"
        FileConn["FilesystemConnector"]
        DBConn["DatabricksConnector"]
        APIConn["APIConnector"]
    end

    subgraph "External Services"
        DataLake["Data Lake / Files"]
        Warehouse["Warehouse"]
        Services["HTTP Services"]
    end

    Context --> Catalog
    Catalog --> Loader
    Catalog --> Saver
    Context --> Logger
    Loader --> FileConn
    Loader --> DBConn
    Loader --> APIConn
    Saver --> FileConn
    Saver --> DBConn
    Saver --> APIConn
    FileConn --> DataLake
    DBConn --> Warehouse
    APIConn --> Services
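
A hypothetical wiring of those pieces; dataset names, constructor parameters and the import path are illustrative:

# Hypothetical registration of connector adapters behind the catalog; parameters are illustrative.
import logging

from orion import APIConnector, DatabricksConnector, DataCatalog, FilesystemConnector, OrionContext  # assumed import path

connectors = {
    "orders_raw": FilesystemConnector(filepath="data/raw/orders.csv"),
    "orders_curated": DatabricksConnector(table="analytics.orders_curated"),
    "fx_rates": APIConnector(url="https://example.com/fx-rates"),
}

catalog = DataCatalog(connectors)                                        # assumed constructor
context = OrionContext(catalog=catalog, logger=logging.getLogger("orion"))
context.logger.info("Catalog ready with %d datasets", len(connectors))   # observability through the shared context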

Node Execution Lifecycle

A node resolves its inputs, executes custom logic with the Orion context and persists outputs whenever the catalog defines a matching dataset.

sequenceDiagram
    participant P as Pipeline
    participant N as Node
    participant C as Context
    participant Cat as Catalog
    participant Conn as Connector
    participant Mem as Memory(data)

    P->>N: Prepare inputs
    loop For each input
        N->>Mem: Is value cached?
        alt In memory
            Mem-->>N: Return cached value
        else Not cached
            N->>Cat: catalog.load(name)
            Cat->>Conn: pick connector
            Conn->>Conn: load()
            Conn-->>Cat: data payload
            Cat-->>N: provide value
        end
    end

    N->>C: call(context, *inputs)
    C->>C: process data
    C->>C: logger.info(...)
    C-->>N: outputs

    N->>Mem: store outputs

    alt Catalog entry exists
        N->>Cat: catalog.save(output)
        Cat->>Conn: pick connector
        Conn->>Conn: save()
    end

    N-->>P: Node finished
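
As a sketch of the custom logic itself (the function, its inputs and the column names are hypothetical), a node receives the shared context plus the resolved inputs and returns the outputs to be cached or saved:

# Hypothetical node function; Orion injects the shared context plus the resolved inputs.
import pandas as pd


def enrich_orders(context, orders: pd.DataFrame, fx_rates: pd.DataFrame) -> pd.DataFrame:
    context.logger.info("Enriching %d orders", len(orders))    # logging through the shared context
    enriched = orders.merge(fx_rates, on="currency", how="left")
    enriched["amount_usd"] = enriched["amount"] * enriched["rate"]
    return enriched                                             # cached in memory; saved if a catalog entry exists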

Catalog Structure

The catalog bridges YAML definitions and connector implementations, keeping metadata centralized while adapters handle the underlying storage.

graph LR
    subgraph "catalog.yml"
        YAML["YAML File<br/>dataset: type: ..."]
    end

    subgraph "DataCatalog"
        Entries["entries dict"]
        Registry["connectors registry"]
    end

    subgraph "Connector Implementations"
        Local["LocalCSVConnector"]
        DB["DatabricksConnector"]
    end

    YAML -->|from_yaml| Entries
    Entries -->|lookup| Local
    Entries -->|lookup| DB
    Entries -->|load/save| Local
    Entries -->|load/save| DB
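
A sketch of how such a definition could resolve to connectors; the entry schema and the DataCatalog calls are assumptions based on the diagram above:

# Illustrative sketch; the real catalog schema and DataCatalog API may differ.
#
# catalog.yml (hypothetical contents):
#   orders_raw:
#     type: LocalCSVConnector
#     filepath: data/raw/orders.csv
#   orders_curated:
#     type: DatabricksConnector
#     table: analytics.orders_curated

from orion import DataCatalog  # assumed import path

catalog = DataCatalog.from_yaml("catalog.yml")   # builds the entries dict and resolves each type via the connector registry
orders = catalog.load("orders_raw")              # delegated to LocalCSVConnector.load()
catalog.save("orders_curated", orders)           # delegated to DatabricksConnector.save()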

Pipeline Lifecycle

Pipelines move from definition to execution and completion through well-defined stages managed by the builder, context and runner utilities.

stateDiagram-v2
    [*] --> Defined: PipelineBuilder.build()
    Defined --> ContextReady: Create OrionContext
    ContextReady --> Prepared: Load catalog.yml
    Prepared --> Running: PipelineRunner.run()
    Running --> ProcessingNode: For each node
    ProcessingNode --> LoadingInputs: Resolve inputs
    LoadingInputs --> Executing: Execute node
    Executing --> Persisting: Save outputs (if catalog)
    Persisting --> ProcessingNode
    ProcessingNode --> Completed: All nodes processed
    Completed --> [*]
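
As a sketch of the Defined stage, the module imported by PipelineBuilder might expose its nodes like this (Pipeline, Node and create_pipeline are assumed names):

# Hypothetical pipeline module that PipelineBuilder would import; entity names and signatures are assumed.
from orion import Node, Pipeline  # assumed import path

from .nodes import clean_orders, enrich_orders  # node functions defined next to the pipeline


def create_pipeline() -> Pipeline:
    return Pipeline(
        nodes=[
            Node(name="clean_orders", func=clean_orders,
                 inputs=["orders_raw"], outputs=["orders_clean"]),
            Node(name="enrich_orders", func=enrich_orders,
                 inputs=["orders_clean", "fx_rates"], outputs=["orders_curated"]),
        ]
    )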

Analyst to Engineering Collaboration

Orion promotes an SQL-first dialogue between analysts and engineers. Use the shared template to capture the business question, SQL sketch and validation before a new node is built.

Visit the dedicated handoff page to see analyst and engineering walkthroughs, plus a ready-to-download template with catalog and node examples.

  • Collect context with the handoff template (business need, datasets, grain, constraints).
  • Draft the “Requirements → SQL” block highlighting joins, filters and open questions.
  • Fill the impact matrix to map each output column back to a catalog entry or new source.
  • Describe the validation scenario so engineers know how to certify the result.

The document also shows how the SQL draft becomes an Orion node and how to wire it into the pipeline builder.
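
A hedged example of that translation, assuming pandas-backed datasets and hypothetical table and column names:

# Hypothetical translation of an analyst's SQL draft into an Orion node.
#
# SQL sketch from the handoff template:
#   SELECT customer_id, SUM(amount) AS total_amount
#   FROM orders
#   WHERE status = 'completed'
#   GROUP BY customer_id
import pandas as pd


def completed_orders_by_customer(context, orders: pd.DataFrame) -> pd.DataFrame:
    context.logger.info("Aggregating %d orders", len(orders))
    completed = orders[orders["status"] == "completed"]
    return (
        completed.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )

From there the node would be registered in the pipeline builder with the orders dataset as input and a catalog entry for the aggregated output, mirroring the wiring shown earlier.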
