Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions

Article
08/08/2024

This article describes how to run a CI/CD (continuous integration/continuous deployment) workflow in GitHub with GitHub Actions and a Databricks Asset Bundle. See What are Databricks Asset Bundles?

You can use GitHub Actions together with the Databricks CLI bundle commands to automate, customize, and run your CI/CD workflows from within your GitHub repositories.

You can add GitHub Actions YAML files such as the following to your repo's .github/workflows directory. The following example GitHub Actions YAML file validates, deploys, and runs the specified job in the bundle within a pre-production target named "qa", as defined in a bundle configuration file. This GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is declared explicitly through the GitHub Actions YAML file's setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named qa. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it is run. See Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "qa".
name: 'QA deployment'

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  # Used by the "pipeline_update" job to deploy the bundle.
  # Bundle validation is automatically performed as part of this deployment.
  # If validation fails, this workflow fails.
  deploy:
    name: 'Deploy bundle'
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "qa" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa

  # Run the bundle's job after the "deploy" job has validated and deployed it.
  pipeline_update:
    name: 'Run pipeline update'
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI again, because each job starts on a fresh runner.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: qa
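
The QA workflow above relies on databricks bundle deploy to validate the bundle implicitly. The following is a minimal sketch of a variation, not part of the original workflow, that validates the bundle in its own step so that configuration errors surface before anything is deployed. It assumes the same SP_TOKEN secret and "qa" target, selects the target with the CLI's -t (--target) option instead of the DATABRICKS_BUNDLE_ENV variable, and uses a named concurrency group; the workflow name and group expression are illustrative choices only.

```yaml
# Hypothetical variation of the QA workflow: validate explicitly, then deploy.
name: 'QA validation and deployment'

# A named concurrency group, so that runs of this workflow for the same
# ref do not overlap.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: false

on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

jobs:
  deploy:
    name: 'Validate and deploy bundle'
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      - uses: databricks/setup-cli@main

      # Fail fast on bundle configuration errors before deploying.
      - run: databricks bundle validate -t qa
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}

      # Deploy to the same target, selected here with -t rather than
      # the DATABRICKS_BUNDLE_ENV environment variable.
      - run: databricks bundle deploy -t qa
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
```

Keeping validation as a separate step makes it easier to see in the run log whether a failure came from the bundle configuration or from the deployment itself.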

The following GitHub Actions YAML file can exist in the same repo as the preceding QA deployment workflow file. This file validates, deploys, and runs the specified bundle within a production target named "prod", as defined in a bundle configuration file. This GitHub Actions YAML file relies on the following:

A bundle configuration file at the root of the repository, which is declared explicitly through the GitHub Actions YAML file's setting working-directory: . (This setting can be omitted if the bundle configuration file is already at the root of the repository.) This bundle configuration file defines an Azure Databricks workflow named my-job and a target named prod. See Databricks Asset Bundle configuration.
A GitHub secret named SP_TOKEN, representing the Azure Databricks access token for an Azure Databricks service principal that is associated with the Azure Databricks workspace to which this bundle is deployed and in which it is run. See Encrypted secrets.

# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: 'Production deployment'

# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1

# Trigger this workflow whenever commits are pushed to the repo's
# main branch, for example when a pull request is merged.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: 'Deploy bundle'
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod

  # Run the bundle's job after the "deploy" job has validated and deployed it.
  pipeline_update:
    name: 'Run pipeline update'
    runs-on: ubuntu-latest

    # Run the "deploy" job first.
    needs:
      - deploy

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v3

      # Download the Databricks CLI again, because each job starts on a fresh runner.
      - uses: databricks/setup-cli@main

      # Run the Databricks workflow named "my-job" as defined in the
      # bundle that was just deployed.
      - run: databricks bundle run my-job --refresh-all
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
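
Both workflows assume a bundle configuration file that defines a job named my-job and the targets qa and prod. The article does not show that file, so the following is only a rough sketch of what it might look like. The bundle name, task key, notebook path, and workspace URLs are placeholders, and the compute settings for the task are left out.

```yaml
# databricks.yml: a hypothetical bundle configuration sketch.
# Only the job key "my-job" and the targets "qa" and "prod" come from the
# workflows above; every other name, path, and URL is a placeholder.
bundle:
  name: my-bundle

resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: main
          notebook_task:
            # Placeholder path, relative to this file.
            notebook_path: ./src/my_notebook.py
          # Compute settings are omitted here; add whatever cluster
          # configuration your workspace requires.

targets:
  qa:
    workspace:
      # Placeholder URL for the pre-production workspace.
      host: https://<qa-workspace>.azuredatabricks.net
  prod:
    workspace:
      # Placeholder URL for the production workspace.
      host: https://<prod-workspace>.azuredatabricks.net
```

With a file along these lines, the DATABRICKS_BUNDLE_ENV value set in each workflow (qa or prod) decides which target's workspace the bundle is deployed to and run in.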
