Introduction
Databricks Genie is a conversational AI assistant powered by Azure OpenAI, seamlessly integrated into Databricks to bridge the gap between data engineering, analytics, and business intelligence. By harnessing the power of Azure OpenAI models, Genie enables Data Analysts and Business Users to,
- Ask natural language questions and receive data-driven insights.
- Generate SQL queries and visualizations without coding.
- Leverage AI to automate data exploration and recommendations.
In this article, we will explore what Databricks Genie is, how it works, and a hands-on tutorial on setting up Genie using your own data.
What is Databricks Genie?
Databricks Genie empowers the technical and non-technical users can explore data, generate insights, and automate analysis in an intuitive, conversational way—without writing a single line of code. The main capabilities are,
- Conversational AI for BI: Users can query data using plain English, eliminating the need for deep SQL knowledge.
- Automated Data Exploration: Genie suggests relevant insights, trends, and correlations.
- Seamless Integration with Databricks: Works directly within Databricks Workspaces and connects with Lakehouse.
- Smart Visualizations: Generates charts, graphs, and dashboards dynamically based on user queries.
- SQL Generation & Execution: Automatically creates, runs, and refines SQL queries, allowing users to analyze data with minimal effort.
The user can talk to Genie by asking a question in natural language. The genie interprets questions, analyze them, converts them into SQL, executes them. This result is shown as a tabular result or a visualization.
Tutorial: How to use Genie in Azure Databricks
In this tutorial, we'll walk-through the two essential steps to start asking Genie questions. The first step is to prepare the data for the Genie and the second step is to setup the Genie for interaction.
Step 1. Prepare your dataset and add metadata.
- Download the source code attached to this article. In the source code, you will find three data files. They contain the sample customers and invoice data.
- sample_customers_data.csv
- sample_invoiceLines_data.csv
- sample_invoices_data.csv
- Upload each file using the "Upload the file" UI in Azure Databricks (link).
![Azure Databricks]()
- Select 'main' for Catalog and 'default' for Schema. Click 'Create table' to complete the process.
![Create table]()
- Now, the dataset is uploaded into Databricks and we can see it in the Catalog Explorer. The next step is to provide a description of the table and columns. Instead of manually creating the description, Databricks provides AI generated descriptions for us to use. For the table description, click 'AI Suggested Description' and for the column description click on 'AI generate' next to the column table.
![AI generate]()
- Accept the recommended Column comment by clicking on the 'checkmark' of each column.
![Checkmark]()
The Genie uses the dataset description, column name, and column comment to better understand the table relationships and how the data should interact. Repeat the steps above for the remaining two datasets before moving to the next step.
Step 2. Set up Genie using our data.
- On the main menu, click on Genie under SQL and the '+ New' button. This will bring up the New 'Genie spaces'. We are going to provide the following information.
- Title: sample_sales_data_genie
- Description: This genie has access to the customers, invoices, and invoice line data. The genie can provide specific information regarding the invoices, the revenue and customer related inquires.
- Default warehouse: Serverless Starter Warehouse
- Tables: Select the 3 sample tables in the 'main' catalog and 'default' schema.
![Default]()
- After clicking save, your sample_sales_data_genie is created. The next step, we will click on 'Start Warehouse' to start asking questions.
- Question 1: Tell me about what data you have access to.
![Start Warehouse]()
- Question 2: Which customer is responsible for generating the most revenue in Q1, 2024? I want the customer's name and total revenue.
![Total revenue]()
- We can click on the 'Show code' and we can get the SQL generated by the Genie.
![Show code]()
- Question 3: Can you create a pie chart showing each monthly spent for Customer 'Tailspin Toys (Rothsville, PA)' for Q1 2024?
![Pie chart]()
The three questions above shows that the genie understands our data and creates insights and visualizations. I highly recommend you try it with you own questions and test what the genie can do for you!
References
Summary
Databricks Genie, powered by Azure OpenAI, enables self-service analytics, and conversational data exploration for business and data professionals. To ensure optimal results, consider these key best practices.
- Store data in tables: Provide better querying performance and compatibility with Genie.
- Refine Natural Language Queries: Phrase your questions more clearly and explicitly. For example, instead of "Show me sales", try "Show me total monthly sales by product category in 2023."
- Validate AI-Generated SQL: Review and refine Genie’s SQL outputs for complex queries for performance/efficiency.
- Integrate with BI Tools: Use Genie to generate insights quickly and seamlessly export data to Power BI for advanced visualization.
- Ensure Data Quality: Maintain clean, well-labeled datasets so Genie can produce the most relevant and trustworthy results.
By following these best practices, you can maximize the impact of Azure Databricks Genie and empower business users with AI-driven analytics—making smarter, faster decisions a reality.
Happy Learning!