Work with external tables
Unity Catalog governs data access permissions for external tables for all queries that go through Unity Catalog, but it does not manage the data lifecycle, optimization, storage location, or layout.
External tables store data in a directory in cloud object storage in your cloud tenant. You must specify a storage location when you define an external table. When you drop an external table, the data files are not deleted.
When you create an external table, you can either register an existing directory of data files as a table or provide a path to create a new directory that stores the data files that back the table.
Databricks recommends using external tables for the following use cases:
- You need to register a table backed by existing data that is not compatible with Unity Catalog managed tables.
- You require direct access to the data from non-Databricks clients that do not support other external access patterns. See Access Databricks data using external systems.
Unity Catalog privileges are not enforced when users access data files from external systems.
This article focuses on Unity Catalog external tables. External tables in the legacy Hive metastore have different behaviors. See Database objects in the legacy Hive metastore.
Important
If you update external table metadata using a non-Databricks client or using path-based access from within Databricks, that metadata does not automatically sync with Unity Catalog. Databricks recommends against such metadata updates, but if you do perform one, you must run MSCK REPAIR TABLE <table-name> SYNC METADATA to bring the schema in Unity Catalog up to date. See REPAIR TABLE.
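For example, assuming a hypothetical table named main.default.sales_external, the repair command is:

MSCK REPAIR TABLE main.default.sales_external SYNC METADATA;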
File formats for external tables
External tables can use the following file formats:
- DELTA
- CSV
- JSON
- AVRO
- PARQUET
- ORC
- TEXT
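You specify the file format with the USING clause when you create the table; if you omit USING, the table defaults to DELTA. The following is a minimal sketch for a CSV-backed external table, using hypothetical table and column names and a hypothetical storage path:

CREATE TABLE main.default.raw_events
(
  event_id INT,
  payload STRING
)
USING CSV
OPTIONS (header = 'true')
LOCATION 'abfss://container@account.dfs.core.windows.net/raw/events';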
Create an external table
To create an external table, you can use SQL commands or DataFrame write operations.
Before you begin
To create an external table, you must meet the following permission requirements:
- The CREATE EXTERNAL TABLE privilege on an external location that grants access to the LOCATION accessed by the external table.
- The USE SCHEMA privilege on the table's parent schema.
- The USE CATALOG privilege on the table's parent catalog.
- The CREATE TABLE privilege on the table's parent schema.
For more information about configuring external locations, see Create an external location to connect cloud storage to Azure Databricks.
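As a sketch, the following grants cover these requirements, assuming a hypothetical group data_engineers, external location my_external_location, catalog main, and schema default:

GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION my_external_location TO `data_engineers`;
GRANT USE CATALOG ON CATALOG main TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA main.default TO `data_engineers`;
GRANT CREATE TABLE ON SCHEMA main.default TO `data_engineers`;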
SQL command examples
Use one of the following command examples in a notebook or the SQL query editor to create an external table.
In the following examples, replace the placeholder values:
- <catalog>: The name of the catalog that will contain the table.
- <schema>: The name of the schema that will contain the table.
- <table-name>: A name for the table.
- <column-specification>: The name and data type for each column.
- <bucket-path>: The path to the cloud storage bucket where the table will be created.
- <table-directory>: A directory where the table will be created. Use a unique directory for each table.
CREATE TABLE <catalog>.<schema>.<table-name>
(
<column-specification>
)
LOCATION 'abfss://<bucket-path>/<table-directory>';
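Filled in with hypothetical values, the statement might look like the following:

CREATE TABLE main.default.sales_external
(
  id INT,
  region STRING,
  amount DECIMAL(10, 2)
)
LOCATION 'abfss://container@account.dfs.core.windows.net/tables/sales_external';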
For more information about table creation parameters, see CREATE TABLE.
DataFrame write operations
Many users create external tables from query results or DataFrame write operations. The following articles demonstrate some of the many patterns you can use to create an external table on Azure Databricks:
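For the query-results pattern, a CREATE TABLE ... AS SELECT statement with a LOCATION clause writes the results to external storage. A minimal sketch, using hypothetical table names and a hypothetical storage path:

CREATE TABLE main.default.high_value_sales
LOCATION 'abfss://container@account.dfs.core.windows.net/tables/high_value_sales'
AS SELECT id, region, amount
FROM main.default.sales_external
WHERE amount > 1000;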
Drop an external table
To drop a table, you must be its owner or have the MANAGE privilege on the table. To drop an external table, run the following SQL command:
DROP TABLE IF EXISTS catalog_name.schema_name.table_name;
Unity Catalog does not delete the underlying data in cloud storage when you drop an external table. You must directly delete the underlying data files if you need to remove data associated with the table.
Example notebook: Create external tables
You can use the following example notebook to create a catalog, schema, and external table, and to manage permissions on them.