1. Configure credentials via CLI
orion databricks-config-orion
When prompted, provide:
- server-hostname: Workspace address.
- http-path: SQL warehouse path.
- access-token: Personal token.
- catalog (optional) and schema (optional).
Configuration is saved in ~/.orion/databricks.yml.
Alternative: Environment variables
export DATABRICKS_SERVER_HOSTNAME="workspace.cloud.databricks.com"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/abc123def456"
export DATABRICKS_ACCESS_TOKEN="dapi..."
export DATABRICKS_CATALOG="hive_metastore"
export DATABRICKS_SCHEMA="default"
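The variables above can be checked before a pipeline runs. The following is a minimal sketch, not Orion's own loader: the helper name `read_databricks_env` is hypothetical, and the split into required and optional variables is an assumption based on the CLI prompts listed earlier.

```python
import os

# Variable names come from the section above; the helper itself is hypothetical.
REQUIRED = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_ACCESS_TOKEN",
)
OPTIONAL = ("DATABRICKS_CATALOG", "DATABRICKS_SCHEMA")


def read_databricks_env(environ=None):
    """Return the DATABRICKS_* settings as a dict, raising if a required one is missing."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # Optional values are included only when they are set and non-empty.
    settings.update({name: env[name] for name in OPTIONAL if env.get(name)})
    return settings
```

Calling `read_databricks_env()` with no argument reads from the current process environment, so a missing variable fails early with a message naming it.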
2. Declare sources in catalog.yml
# Load from a table
clientes_raw:
  type: databricks
  table: clientes

# Load with query
clientes_ativos:
  type: databricks
  query: "SELECT * FROM clientes WHERE status = 'ativo'"

# Save to Databricks
clientes_processados:
  type: databricks
  table: clientes_processados
  mode: overwrite
Advanced options
clientes_analytics:
  type: databricks
  table: clientes
  catalog: analytics_catalog
  schema: curated
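Databricks addresses tables with a dotted name of the form catalog.schema.table. The sketch below shows one way the `catalog`, `schema`, and `table` keys of an entry could be combined into such a name. It is an illustration only: `qualified_table_name` is a hypothetical helper, and how Orion itself resolves these keys (including any fallback to configured defaults) is not stated in this document.

```python
def qualified_table_name(table, catalog=None, schema=None):
    """Join the parts that are set into a dotted, backtick-quoted table name."""
    if catalog and not schema:
        # catalog.table is not a valid reference; a schema is needed in between.
        raise ValueError("a catalog requires a schema to form a table name")
    parts = [part for part in (catalog, schema, table) if part]
    # Quote each identifier, doubling any embedded backtick to escape it.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


name = qualified_table_name("clientes", catalog="analytics_catalog", schema="curated")
```

For the `clientes_analytics` entry this yields a three-part name; with only `table` set, the result is the bare quoted table name.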
3. Use in nodes
def extract(context: OrionContext):
    df = context.catalog.load("clientes_raw")
    context.logger.info(f"Loaded {len(df)} records")
    return df

def load(context: OrionContext, df):
    context.catalog.save("clientes_processados", df)
    context.logger.info(f"Saved {len(df)} records")
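Because the nodes only call `context.catalog.load`, `context.catalog.save`, and `context.logger`, they can be exercised without a Databricks connection. The sketch below assumes exactly that and nothing more about `OrionContext`. `InMemoryCatalog` and `make_test_context` are hypothetical test doubles, not part of Orion, and a plain list stands in for the DataFrame.

```python
import logging
from types import SimpleNamespace


class InMemoryCatalog:
    """Hypothetical test double exposing the load/save calls used by the nodes."""

    def __init__(self, datasets=None):
        self.datasets = dict(datasets or {})

    def load(self, name):
        return self.datasets[name]

    def save(self, name, data):
        self.datasets[name] = data


def make_test_context(datasets=None):
    # Provides only the two attributes the nodes touch: catalog and logger.
    return SimpleNamespace(
        catalog=InMemoryCatalog(datasets),
        logger=logging.getLogger("orion.test"),
    )


def extract(context):
    df = context.catalog.load("clientes_raw")
    context.logger.info(f"Loaded {len(df)} records")
    return df


def load(context, df):
    context.catalog.save("clientes_processados", df)
    context.logger.info(f"Saved {len(df)} records")


# A list of dicts stands in for the DataFrame; len() works on both.
context = make_test_context({"clientes_raw": [{"id": 1}, {"id": 2}]})
load(context, extract(context))
```

After the last line runs, the in-memory catalog holds the saved records under `clientes_processados`, which a unit test can assert on.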
Security
- Never put the file ~/.orion/databricks.yml under version control.
- Ensure restricted permissions: chmod 600 ~/.orion/databricks.yml.
- Add ~/.orion/ to .gitignore.
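The permission rule can also be verified from code, for example in a startup check. This is a minimal sketch using only the Python standard library on a POSIX system; `is_owner_only` is a hypothetical helper name, and nothing here suggests that Orion performs this check itself.

```python
import os
import stat


def is_owner_only(path):
    """Return True when neither group nor others have any permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file set with `chmod 600` passes this check. It could be applied as `is_owner_only(os.path.expanduser("~/.orion/databricks.yml"))`.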