While learning and practicing Databricks, we may keep our files in DBFS. In practice, though, the source data usually comes from on-premises or cloud storage.
If our data sits in Azure Blob Storage and needs to be processed in Databricks, we first have to mount it to DBFS.
DBFS stands for Databricks File System; it plays the same role for cloud storage that the Hadoop Distributed File System (HDFS) plays for local storage.
The steps below walk through mounting Azure Blob Storage data to DBFS.
1. Create a Key Vault and generate a secret to mount Azure Blob Storage in Databricks
In the storage account, go to Access keys and copy either key1 or key2.
![Mounting Azure Blob Storage to DBFS]()
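If you prefer not to copy the key through the portal, the same key can also be fetched with the Azure SDK for Python. This is only a sketch; the subscription, resource group, and account names below are placeholders, and it assumes the azure-identity and azure-mgmt-storage packages are installed.

```python
# Sketch: fetch a storage-account access key with the Azure SDK instead of the portal.
# All <...> names are placeholders for your own subscription, resource group, and account.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
client = StorageManagementClient(credential, "<subscription-id>")

# list_keys returns both key1 and key2; either one works for mounting.
keys = client.storage_accounts.list_keys("<resource-group>", "<storage-account-name>")
account_key = keys.keys[0].value
print(account_key[:4] + "...")  # avoid printing the full key in a shared notebook
```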
2. Go to Azure Key Vault -> Secrets -> Generate/Import
![Mounting Azure Blob Storage to DBFS]()
Paste the key copied from the storage account's Access keys and click Create
![Mounting Azure Blob Storage to DBFS]()
Give a name for your secret and paste the key that you copied
![Mounting Azure Blob Storage to DBFS]()
The secret has been created
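For completeness, the same secret can also be created from code instead of the portal. A minimal sketch, assuming the azure-identity and azure-keyvault-secrets packages are installed and that the vault and secret names below are replaced with your own:

```python
# Sketch: store the storage-account key as a Key Vault secret with the Python SDK.
# The vault, secret name, and key value are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://<key-vault-name>.vault.azure.net"
secret_client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# The secret name is what you will later reference as <key-name> in dbutils.secrets.get().
secret_client.set_secret("<secret-name>", "<copied-storage-account-key>")
```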
3. Create a secret scope in Databricks
Go to https://<databricks-instance>#secrets/createScope. This URL is case sensitive; scope in createScope must be uppercase.
![Mounting Azure Blob Storage to DBFS]()
![Mounting Azure Blob Storage to DBFS]()
The properties (DNS Name & Resource ID) are available in the Properties tab of the Key Vault in the Azure portal.
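Once the scope has been created, it is worth confirming from a notebook that Databricks can see it. A quick sanity check using dbutils (the scope name is a placeholder):

```python
# Run inside a Databricks notebook, where dbutils is available by default.
print(dbutils.secrets.listScopes())          # the new Key Vault-backed scope should be listed
print(dbutils.secrets.list("<scope-name>"))  # shows secret names in the scope (values stay redacted)
```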
After setting up the above, we can create a new Databricks notebook and mount our Blob Storage to DBFS using the code below.
`dbutils.fs.mount(source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net", mount_point="/mnt/<mount-name>", extra_configs={"<conf-key>": dbutils.secrets.get(scope="<scope-name>", key="<key-name>")})`
- `<conf-key>` can be either `fs.azure.account.key.<storage-account-name>.blob.core.windows.net` or `fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net`
- `dbutils.secrets.get(scope="<scope-name>", key="<key-name>")` gets the key that has been stored as a secret in the secret scope.
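After the mount succeeds, the container behaves like any other DBFS path. A small usage sketch, assuming the container holds a CSV file (the file name is a placeholder):

```python
# List the mounted container and read a sample file with Spark.
display(dbutils.fs.ls("/mnt/<mount-name>"))

df = spark.read.csv("/mnt/<mount-name>/<sample-file>.csv", header=True, inferSchema=True)
display(df)

# To detach the storage later, unmount the path:
# dbutils.fs.unmount("/mnt/<mount-name>")
```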