Thursday, June 01, 2023

Python : Reading CSV Files from Azure File Share Using Python

Introduction:

In today's data-driven world, handling and analyzing data is a crucial task for businesses and developers alike. Azure File Share provides a scalable and reliable storage solution in the cloud. Python, being a popular programming language for data processing and analysis, can read CSV files from Azure File Share with only a few lines of code. In this article, we will walk step by step through reading CSV files stored in Azure File Share using Python. We had a requirement in a science-related project to process part of our data in a Python web app. Because all the scientists on the team were comfortable with Python, we needed to process the files in Python as well.

Prerequisites:

  1. An Azure account with an active subscription.
  2. An Azure File Share with CSV files uploaded.
  3. Python installed on your local machine or in a Docker container.

Step 1: Install Azure Storage File Share SDK

To interact with Azure File Share in Python, we need to install the Azure Storage File Share SDK. Open a terminal or command prompt and run the following command:

pip install azure-storage-file-share

Step 2: Import Required Libraries

Create a Python script and start by importing the necessary libraries:

from azure.storage.fileshare import ShareClient, ShareFileClient
import pandas as pd
import numpy as np

Step 3: Set Up Azure File Share Connection

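Retrieve the connection string for your Azure Storage account from the Azure Portal. Replace 'your_connection_string' with your actual connection string, and set file_share_name and file_path to the name of your file share and the path of the CSV file within that share: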

connection_string = 'your_connection_string'
file_share_name = 'your_file_share_name'
file_path = '00-TestFiles/mytestfile.csv'
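
Hard-coding a connection string in a script makes it easy to leak the account key through version control. Below is a minimal sketch of reading the same three settings from environment variables instead. The function load_share_settings and the variable names AZURE_STORAGE_CONNECTION_STRING, AZURE_FILE_SHARE_NAME and AZURE_FILE_PATH are assumptions chosen for illustration, not names the SDK requires:

```python
import os


def load_share_settings(env=None):
    """Return (connection_string, file_share_name, file_path) from environment variables."""
    env = os.environ if env is None else env
    # The variable names below are illustrative assumptions; use whatever
    # names your own deployment defines.
    connection_string = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if not connection_string:
        raise RuntimeError("AZURE_STORAGE_CONNECTION_STRING is not set")
    # Fall back to the placeholder values used in this article.
    file_share_name = env.get("AZURE_FILE_SHARE_NAME", "your_file_share_name")
    file_path = env.get("AZURE_FILE_PATH", "00-TestFiles/mytestfile.csv")
    return connection_string, file_share_name, file_path
```

Calling load_share_settings() with no arguments reads from the process environment. Passing a plain dictionary instead is handy for trying the function out without touching real settings.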

Step 4: Read CSV File from Azure File Share

Next, we will define a function to read the content of a CSV file from the Azure File Share.
The example below reads the file content into the variable file_content and returns it as a string. What you do with that string afterwards is up to your own code.

def read_csv_from_azure_file_share(file_path):
    # file_path is relative to the share root, e.g. '00-TestFiles/mytestfile.csv'
    share_client = ShareClient.from_connection_string(connection_string, file_share_name)
    with share_client.get_file_client(file_path) as file_client:
        # readall() returns the raw bytes; decode them into text
        file_content = file_client.download_file().readall().decode('utf-8')
    return file_content
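
The download returns raw bytes, and decoding them with 'utf-8' assumes the file has no byte order mark (BOM). CSV files exported from spreadsheet tools often start with one, and it would otherwise end up attached to the first column name. Here is a small sketch of a more forgiving decode step. The helper decode_csv_bytes is hypothetical and not part of the Azure SDK, and the byte strings are made-up sample data:

```python
def decode_csv_bytes(raw, encoding="utf-8-sig"):
    """Decode downloaded CSV bytes into text.

    The 'utf-8-sig' codec decodes plain UTF-8 and also strips a
    leading BOM when one is present.
    """
    return raw.decode(encoding)


# Made-up sample payloads: the same CSV with and without a BOM.
with_bom = b"\xef\xbb\xbfid,value\n1,10\n"
without_bom = b"id,value\n1,10\n"

# Both decode to identical text, so the first header is always 'id'.
print(decode_csv_bytes(with_bom) == decode_csv_bytes(without_bom))
```

If your files use a different encoding, pass it explicitly through the encoding parameter.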

Step 5: Process CSV Data

Now that we have the CSV file content in a string, we can use the Pandas library to process and analyze the data:

from io import StringIO

def process_csv_data(csv_content):
    # pd.read_csv expects a path or file-like object, so wrap the string
    df = pd.read_csv(StringIO(csv_content))
    # Your data processing and analysis code here
    return df
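
To see the parsing step in isolation, the sketch below feeds a small in-memory CSV string through io.StringIO into pd.read_csv. The sample data and column names are made up and stand in for the text downloaded from the file share:

```python
import io

import pandas as pd

# Made-up sample standing in for the text returned from the file share.
sample_csv = "sample_id,temperature\nA1,21.5\nA2,22.0\n"

# pd.read_csv expects a path or file-like object, so wrap the string.
df = pd.read_csv(io.StringIO(sample_csv))

# Column types are inferred: sample_id as text, temperature as float.
print(df.dtypes)
print(df["temperature"].mean())
```

Trying the parsing on a small string like this is a quick way to check delimiters and column types before pointing the code at real files.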

Sometimes you need a particular column imported into a NumPy array. You can use the code below to do so.

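def read_csv_column_from_azure_file_share(file_path):
    file_client = ShareFileClient.from_connection_string(connection_string, file_share_name, file_path)
    # The downloadedfiles directory must already exist
    DEST_FILE = "downloadedfiles/downloadedfile.csv"
    with open(DEST_FILE, "wb") as file_handle:
        stream = file_client.download_file()
        file_handle.write(stream.readall())
    # Parse only after the file is closed, so all bytes are flushed to disk
    # usecols=0 loads the first column; use skiprows=1 if the file has a header row
    array = np.loadtxt(DEST_FILE, delimiter=",", usecols=0, skiprows=0)
    return array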
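
If the file does not need to be kept on disk, np.loadtxt also accepts a file-like object, so a column can be parsed straight from the downloaded text. Below is a sketch under that assumption. It uses made-up sample data in place of the download, and skiprows=1 is set only because this sample has a header row:

```python
import io

import numpy as np

# Made-up sample standing in for decoded file content; first row is a header.
sample_csv = "height,weight\n1.5,10\n2.5,20\n3.5,30\n"

# usecols=0 selects the first column; skiprows=1 skips the header line.
first_column = np.loadtxt(
    io.StringIO(sample_csv), delimiter=",", usecols=0, skiprows=1
)

print(first_column)
```

This avoids the temporary file entirely, at the cost of holding the whole file content in memory.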

Step 6: Putting It All Together

Call the functions to read and process the CSV data:

if __name__ == '__main__':
    file_path = 'your_file_name.csv'
    csv_content = read_csv_from_azure_file_share(file_path)
    data_frame = process_csv_data(csv_content)
    print(data_frame.head())  # Display the first few rows of the DataFrame

Conclusion:

In this article, we learned how to read CSV files stored in Azure File Share using Python. By leveraging the Azure Storage File Share SDK and the Pandas library, developers can access and process data from Azure File Share with a small amount of code, which makes the two a practical combination for data analysis tasks. Note that the examples here read the whole file into memory with readall(), so they are best suited to files that fit comfortably in memory.