Context Boundaries
External actors reach Orion exclusively through the CLI and configuration artifacts, while the layered core mediates every call to underlying services.
```mermaid
graph LR
subgraph "External Actors"
Operator["Operator / CLI User"]
Scheduler["Scheduler / Orchestrator"]
Repo["Pipeline Module"]
CatalogFile["Catalog YAML"]
SourceSystems["Source Systems"]
TargetStores["Target Stores"]
end
subgraph "Orion Framework"
CLI["orion CLI"]
Application["Application Layer"]
Core["Core Layer"]
Infrastructure["Infrastructure Layer"]
end
Operator --> CLI
Scheduler --> CLI
Repo --> Application
CatalogFile --> Application
CLI --> Application
Application --> Core
Core --> Infrastructure
SourceSystems <---> Infrastructure
Infrastructure --> TargetStores
```
Data Flow Through a Pipeline
Each run resolves the configuration, builds the execution context, and iterates over the nodes, caching intermediate results in memory and persisting outputs whenever the catalog defines a matching entry.
```mermaid
flowchart TD
CLI["CLI Command: orion run"] --> LoadCatalog["Load catalog.yml"]
LoadCatalog --> BuildContext["Build OrionContext (catalog, logger, hooks)"]
BuildContext --> LoadPipeline["PipelineBuilder imports module"]
LoadPipeline --> UseCase["RunPipelineUseCase"]
UseCase --> Iterate{"Next node?"}
Iterate -->|Resolve inputs| ResolveInputs["Read from in-memory cache"]
ResolveInputs --> MemoryBranch{"Inputs missing?"}
MemoryBranch -->|No| ExecuteNode["Execute node function"]
MemoryBranch -->|Yes| FetchCatalog["Catalog.load(dataset)"]
FetchCatalog --> Connectors["Connectors fetch data"]
Connectors --> ExecuteNode
ExecuteNode --> Persist["Store outputs in cache"]
Persist --> CatalogCheck{"Catalog entry configured?"}
CatalogCheck -->|Yes| SaveCatalog["Catalog.save(outputs)"]
CatalogCheck -->|No| Iterate
SaveCatalog --> Iterate
Iterate -->|All nodes processed| Finish["Return collected data"]
```
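Read together with the flowchart, the run loop can be expressed as a short sketch. The node attributes (`inputs`, `outputs`, `func`) and the `catalog.exists` helper are illustrative assumptions; only `load`, `save`, and the in-memory cache behaviour come from the diagram.

```python
# Minimal sketch of the run loop above, assuming hypothetical node attributes
# (node.inputs, node.outputs, node.func) and a catalog.exists(name) helper.
def run_pipeline(pipeline, catalog, context):
    cache = {}  # in-memory store for intermediate datasets

    for node in pipeline.nodes:
        # Resolve inputs: prefer the in-memory cache, otherwise ask the catalog.
        kwargs = {
            name: cache[name] if name in cache else catalog.load(name)
            for name in node.inputs
        }

        # Execute the node function with the shared Orion context.
        results = node.func(context, **kwargs)
        if not isinstance(results, tuple):
            results = (results,)

        # Cache every output and persist the ones with a catalog entry.
        for name, value in zip(node.outputs, results):
            cache[name] = value
            if catalog.exists(name):
                catalog.save(name, value)

    return cache  # "Return collected data"
```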
Layer Communication Sequence
The sequence diagram highlights how the CLI request travels across layers, keeping application logic thin and delegating work to core and infrastructure components.
```mermaid
sequenceDiagram
participant Operator as Operator
participant CLI as CLI
participant Builder as PipelineBuilder
participant UseCase as RunPipelineUseCase
participant Catalog as DataCatalog
participant Connector as Connector Adapter
participant Node as Pipeline Node
Operator->>CLI: orion run --module pipeline --catalog catalog.yml
CLI->>Builder: load_pipeline(module)
Builder-->>CLI: pipeline definition
CLI->>UseCase: execute(pipeline, context)
UseCase->>Catalog: load(inputs)
Catalog->>Connector: fetch(dataset)
Connector-->>Catalog: dataset payload
Catalog-->>UseCase: provide inputs
UseCase->>Node: call(**kwargs)
Node-->>UseCase: outputs
UseCase->>Catalog: save(outputs)
Catalog->>Connector: persist(dataset)
Connector-->>Catalog: confirmation
```
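A rough sketch of the wiring behind `orion run`, using only the participants named in the sequence; constructor signatures and import paths are assumptions and would need to match the real package layout.

```python
# Hypothetical glue code for `orion run`; class names come from the sequence
# diagram, everything else (signatures, logger setup) is assumed.
import logging

# from orion import PipelineBuilder, RunPipelineUseCase, DataCatalog, OrionContext
#   (actual import paths are not documented in this section)


def run_command(module: str, catalog_path: str) -> None:
    pipeline = PipelineBuilder().load_pipeline(module)   # Builder returns the pipeline definition
    catalog = DataCatalog.from_yaml(catalog_path)        # Infrastructure: catalog + connectors
    context = OrionContext(
        catalog=catalog,
        logger=logging.getLogger("orion"),
        hooks=[],
    )
    RunPipelineUseCase().execute(pipeline, context)      # Application layer stays thin
```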
Connector Interaction Map
Infrastructure adapters plug into the catalog to abstract persistence mechanics while providing observability hooks through the shared context.
```mermaid
graph TD
subgraph "Infrastructure Layer"
Context["OrionContext"]
Catalog["DataCatalog"]
Loader["Dataset Loader"]
Saver["Dataset Saver"]
Logger["Logger Adapter"]
end
subgraph "Connector Adapters"
FileConn["FilesystemConnector"]
DBConn["DatabricksConnector"]
APIConn["APIConnector"]
end
subgraph "External Services"
DataLake["Data Lake / Files"]
Warehouse["Warehouse"]
Services["HTTP Services"]
end
Context --> Catalog
Catalog --> Loader
Catalog --> Saver
Context --> Logger
Loader --> FileConn
Loader --> DBConn
Loader --> APIConn
Saver --> FileConn
Saver --> DBConn
Saver --> APIConn
FileConn --> DataLake
DBConn --> Warehouse
APIConn --> Services
```
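The adapters share a small load/save contract so the dataset loader and saver can treat them interchangeably. The Protocol below is a sketch of that contract; the real interface may carry extra parameters (credentials, formats) not shown in the map.

```python
# Sketch of the connector contract implied by the map above; method signatures
# are assumptions, only the load/save split comes from the diagram.
from typing import Any, Protocol


class Connector(Protocol):
    def load(self) -> Any: ...
    def save(self, data: Any) -> None: ...


class FilesystemConnector:
    """Example adapter targeting a data lake path or local file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> str:
        # Read the raw payload; a real adapter would parse CSV/Parquet/etc.
        with open(self.path, "r", encoding="utf-8") as handle:
            return handle.read()

    def save(self, data: Any) -> None:
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write(str(data))
```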
Node Execution Lifecycle
A node resolves its inputs, executes custom logic with the Orion context and persists outputs whenever the catalog defines a matching dataset.
```mermaid
sequenceDiagram
participant P as Pipeline
participant N as Node
participant C as Context
participant Cat as Catalog
participant Conn as Connector
participant Mem as Memory(data)
P->>N: Prepare inputs
loop For each input
N->>Mem: Is value cached?
alt In memory
Mem-->>N: Return cached value
else Not cached
N->>Cat: catalog.load(name)
Cat->>Conn: pick connector
Conn->>Conn: load()
Conn-->>Cat: data payload
Cat-->>N: provide value
end
end
N->>C: call(context, *inputs)
C->>C: process data
C->>C: logger.info(...)
C-->>N: outputs
N->>Mem: store outputs
alt Catalog entry exists
N->>Cat: catalog.save(output)
Cat->>Conn: pick connector
Conn->>Conn: save()
end
N-->>P: Node finished
```
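From the node author's point of view the lifecycle reduces to a plain function that receives the context plus its resolved inputs. The example below assumes tabular data handled with pandas; the function, dataset, and column names are placeholders.

```python
# Illustrative node body: the (context, **inputs) calling convention follows the
# diagrams above, while pandas and the column names are assumptions.
import pandas as pd


def enrich_orders(context, orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Join orders with customer attributes and log the resulting row count."""
    context.logger.info("Enriching %d orders", len(orders))
    enriched = orders.merge(customers, on="customer_id", how="left")
    context.logger.info("Produced %d enriched rows", len(enriched))
    return enriched
```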
Catalog Structure
The catalog bridges YAML definitions and connector implementations, keeping metadata centralized while adapters handle the underlying storage.
```mermaid
graph LR
subgraph "catalog.yml"
YAML["YAML File
dataset: type: ..."]
end
subgraph "DataCatalog"
Entries["entries dict"]
Registry["connectors registry"]
end
subgraph "Connector Implementations"
Local["LocalCSVConnector"]
DB["DatabricksConnector"]
end
YAML -->|from_yaml| Entries
Entries -->|lookup| Local
Entries -->|lookup| DB
Entries -->|load/save| Local
Entries -->|load/save| DB
```
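A compact sketch of the bridge the diagram describes: YAML entries become connector instances keyed by dataset name, with a registry resolving the `type` field. The catalog.yml schema and the LocalCSVConnector internals are assumptions; only the from_yaml / lookup / load-save flow comes from the diagram. Requires PyYAML.

```python
# Sketch of the YAML -> entries -> connectors bridge; schema and connector
# internals are assumed for illustration.
import csv

import yaml


class LocalCSVConnector:
    def __init__(self, filepath: str) -> None:
        self.filepath = filepath

    def load(self) -> list[dict]:
        with open(self.filepath, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))

    def save(self, rows: list[dict]) -> None:
        with open(self.filepath, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)


class DataCatalog:
    # Registry mapping the `type` field in catalog.yml to a connector class.
    connectors = {"local_csv": LocalCSVConnector}

    def __init__(self, entries: dict) -> None:
        self.entries = entries

    @classmethod
    def from_yaml(cls, path: str) -> "DataCatalog":
        with open(path, encoding="utf-8") as handle:
            raw = yaml.safe_load(handle) or {}
        entries = {
            name: cls.connectors[spec.pop("type")](**spec)
            for name, spec in raw.items()
        }
        return cls(entries)

    def load(self, name: str):
        return self.entries[name].load()

    def save(self, name: str, data) -> None:
        self.entries[name].save(data)
```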
Pipeline Lifecycle
Pipelines move from definition to execution and completion through well-defined stages managed by the builder, context and runner utilities.
```mermaid
stateDiagram-v2
[*] --> Defined: PipelineBuilder.build()
Defined --> ContextReady: Create OrionContext
ContextReady --> Prepared: Load catalog.yml
Prepared --> Running: PipelineRunner.run()
Running --> ProcessingNode: For each node
ProcessingNode --> LoadingInputs: Resolve inputs
LoadingInputs --> Executing: Execute node
Executing --> Persisting: Save outputs (if catalog)
Persisting --> ProcessingNode
ProcessingNode --> Completed: All nodes processed
Completed --> [*]
```
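Read as straight-line code, the states map onto a handful of calls; the class names follow the diagram, while the argument lists are assumed.

```python
# The lifecycle states expressed as calls; only the class and method names come
# from the state diagram, the arguments are illustrative.
import logging

pipeline = PipelineBuilder().build()                        # [*] -> Defined
context = OrionContext(logger=logging.getLogger("orion"))   # Defined -> ContextReady
catalog = DataCatalog.from_yaml("catalog.yml")              # ContextReady -> Prepared
PipelineRunner().run(pipeline, catalog, context)            # Prepared -> Running ... -> Completed
```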
Analyst to Engineering Collaboration
Orion promotes an SQL-first dialogue between analysts and engineers. Use the shared template to capture the business question, the SQL sketch, and the validation plan before a new node is built.
Visit the dedicated handoff page to see analyst and engineering walkthroughs, plus a ready-to-download template with catalog and node examples.
- Collect context with the handoff template (business need, datasets, grain, constraints).
- Draft the “Requirements → SQL” block highlighting joins, filters and open questions.
- Fill the impact matrix to map each output column back to a catalog entry or new source.
- Describe the validation scenario so engineers know how to certify the result.
The handoff document also shows how the SQL draft becomes an Orion node and how to wire it into the pipeline builder; a minimal sketch of that outcome follows below.
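As a hedged illustration of the handoff outcome, the snippet below turns a toy SQL draft into a pandas-based node and notes one hypothetical way to register it with the builder; the table, column, and method names are placeholders from the template, not real project assets.

```python
# Hypothetical end state of the handoff: the analyst's SQL draft re-expressed as
# an Orion node. Table/column names and the builder registration call are
# placeholders, not confirmed framework API.
import pandas as pd

SQL_DRAFT = """
SELECT o.customer_id, SUM(o.amount) AS total_amount
FROM orders AS o
WHERE o.status = 'paid'
GROUP BY o.customer_id
"""


def customer_totals(context, orders: pd.DataFrame) -> pd.DataFrame:
    """Pandas translation of SQL_DRAFT so it can run as a pipeline node."""
    paid = orders[orders["status"] == "paid"]
    return (
        paid.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )


# Hypothetical registration in the pipeline builder:
# builder.add_node(customer_totals, inputs=["orders"], outputs=["customer_totals"])
```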