How to Write ETL Logic in Python: Sample Code to Practice

Here's an example Python script that uses the MySQL Connector/Python library (mysql.connector) to connect to a MySQL database, extract data from a table, transform it, and load it into a JSON file:

Python ETL Sample Code

import json

import mysql.connector

# Connect to the MySQL database (replace the placeholder credentials)
cnx = mysql.connector.connect(user='username', password='password',
                              host='localhost',
                              database='database_name')

try:
    # Define a cursor to execute SQL queries
    cursor = cnx.cursor()

    # Extract: define and execute the SQL query
    query = "SELECT column1, column2, column3 FROM table_name"
    cursor.execute(query)

    # Fetch all rows from the result set
    rows = cursor.fetchall()

    # Transform the rows into a list of dictionaries
    result = []
    for row in rows:
        result.append({'column1': row[0], 'column2': row[1], 'column3': row[2]})

    # Load: save the result as a JSON file.
    # default=str converts values such as Decimal or datetime.date,
    # which the json module cannot serialize on its own.
    with open('output.json', 'w') as outfile:
        json.dump(result, outfile, default=str)

    # Close the cursor
    cursor.close()
finally:
    # Always close the database connection, even if a step above fails
    cnx.close()

In this example, replace username, password, localhost, database_name, table_name, column1, column2, and column3 with the appropriate values for your MySQL server and table.
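Rather than editing credentials directly in the script, one option is to read them from environment variables. The sketch below is a minimal illustration under that assumption: the variable names MYSQL_USER, MYSQL_PASSWORD, MYSQL_HOST and MYSQL_DATABASE and the helper connection_settings are hypothetical choices, not part of mysql-connector. The returned dictionary can be unpacked into the connect call, as in mysql.connector.connect(**connection_settings()).

```python
import os
from collections.abc import Mapping


def connection_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for mysql.connector.connect() from environment variables.

    The variable names used here are hypothetical; rename them to match your setup.
    """
    required = ("MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DATABASE")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        # Fall back to a local server when no host is configured.
        "host": env.get("MYSQL_HOST", "localhost"),
        "database": env["MYSQL_DATABASE"],
    }


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real environment.
    demo_env = {"MYSQL_USER": "etl_user", "MYSQL_PASSWORD": "secret", "MYSQL_DATABASE": "sales"}
    print(connection_settings(demo_env))
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.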


The sample ETL script extracts the data from the specified table, transforms each row into a dictionary, and saves the resulting list as a JSON file named output.json.
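The sample hard-codes column1, column2 and column3 in the transform loop. One way to avoid that is to read the column names from cursor.description, which PEP 249 (the Python DB-API specification that mysql-connector follows) defines as a sequence whose entries start with the column name. The sketch below is database-agnostic; rows_to_dicts and to_json are hypothetical helper names, and the demo uses the standard-library sqlite3 module with a made-up orders table as a stand-in for MySQL so it runs without a server. It also passes default=str to json, because MySQL DECIMAL and DATE values come back as Decimal and datetime.date objects that the json module cannot encode by itself.

```python
import json
import sqlite3
from datetime import date
from decimal import Decimal
from typing import Any


def rows_to_dicts(cursor) -> list[dict[str, Any]]:
    """Transform: turn an executed DB-API cursor's result set into a list of dicts.

    Column names are taken from cursor.description, so they are not hard-coded.
    """
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def to_json(records: list[dict[str, Any]]) -> str:
    """Load helper: serialize records, converting Decimal/date values with str()."""
    return json.dumps(records, default=str, indent=2)


if __name__ == "__main__":
    # sqlite3 stands in for MySQL here so the sketch runs without a server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "pen"), (2, "book")])
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT id, item FROM orders")
        records = rows_to_dicts(cursor)
    finally:
        cursor.close()
        conn.close()
    print(to_json(records))

    # MySQL DECIMAL and DATE columns arrive as Decimal and datetime.date objects.
    print(to_json([{"amount": Decimal("9.99"), "ordered": date(2024, 1, 31)}]))
```

With mysql-connector, the same rows_to_dicts call would apply to the cursor from the sample script after cursor.execute(query), and the JSON file could be written with json.dump(records, outfile, default=str).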
