Creating DataFrames in PySpark Using Fabric Notebook

In this article, I will walk you through how to create a Spark DataFrame from basic Python data structures, such as a list or a list of tuples. Let's get started.

What is a Spark DataFrame?

A Spark DataFrame is an immutable, distributed collection of data organized into named columns, conceptually equivalent to a table in a relational database. It can be created from Python data structures such as lists of tuples or dictionaries, and it is optimized for parallel processing across a cluster so that large-scale data can be handled efficiently.

First, let's see how to:

  • Create a simple data structure (a list) and assign it to a variable in a Fabric notebook
  • Display the type of the data structure
  • Create a Spark DataFrame from the variable that holds the data
  • Create a Spark DataFrame directly from the data structure
  • Verify the DataFrame by using the type function

To do all of that, I executed the code shown in the screenshot below.

Codes

Next, we want to:

  • See the content of the DataFrame
  • Use an alternative way to show the content of the DataFrame
  • Create a list of tuples with multiple elements

To achieve that, I executed the code shown in the screenshot below.

Execute

Next, we proceed to:

  • Create a DataFrame using the data2 variable, which holds the list of tuples
  • Specify column names for the list of tuples

Tuples

After doing that, we want to:

  • Specify data types using a SQL-like Data Definition Language (DDL) string
  • Store the column names and their types in a variable
  • Create another Spark DataFrame using data2 and the schema

To achieve that, I executed the code shown in the screenshot below.

DDL

We then proceed to:

  • Use StructType to define the schema of a DataFrame and StructField to represent the individual fields in the schema
  • Create another DataFrame
  • Show the content of the new DataFrame

To do so, I executed the code shown in the screenshot below.

DataFrame

Finally, we proceed to:

  • Use the schema attribute to check the data types of the columns
  • Describe the DataFrame

Datatype

See you in the next article.
