How to Create a UDF in Python, with an Example

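In Python, a user-defined function (UDF) lets you write a piece of logic once and reuse it, which avoids repeated work. Unlike functions in C, C++, or Java, a Python function is introduced with the def keyword and does not require you to declare parameter or return types. This post shows how to create a UDF in Python.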




Python Syntax for a User-Defined Function (UDF)

The general form of a Python UDF is shown below.
def function_name(list of parameters):
    "docstring"
    statement(s)
    return value

Explanation of each part

  1. The keyword def marks the start of the function header.
  2. A function name uniquely identifies the function. Function names follow the same rules as any other Python identifier.
  3. The list of parameters (their values at call time are called arguments) is how values are passed to the function. The list is optional, but the parentheses are not.
  4. A colon (:) marks the end of the function header.
  5. An optional documentation string (docstring) describes the purpose of the function. It plays a role similar to a comment, but unlike a comment it is stored with the function and can be read at runtime, for example with help().
  6. One or more Python statements perform the task the function was written for. Every statement in the function body must be indented consistently under the header.
  7. An optional return statement sends a value (the result) back to the caller. The statement can take an optional expression whose value is returned. If return has no expression, or the function has no return statement at all, the function returns the None object.
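The seven parts above can be seen together in a short sketch. The function names rectangle_area and log_message are hypothetical, chosen only for illustration.

```python
def rectangle_area(length, width=1.0):
    """Return the area of a rectangle with the given sides."""
    # Function body: indented one level under the header.
    area = length * width
    # Explicit return: the value goes back to the caller.
    return area


def log_message(text):
    """Print a message; this function has no return statement."""
    print(f"LOG: {text}")


print(rectangle_area(3, 4))    # 12
print(rectangle_area.__doc__)  # the docstring is available at runtime

result = log_message("done")   # prints "LOG: done"
print(result)                  # None, because nothing was returned
```

The second function shows the last point in the list: with no return statement, the call still evaluates to something, and that something is None.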

Python vs. Other Languages

Python is one of the most popular languages in data analytics, but it is not the only one that supports UDFs. Many other languages offer them, and most SQL databases also let you create user-defined functions.

Advantages of Python User-Defined Functions

  • User-defined functions break a large program into small segments, which makes the program easier to understand, maintain, and debug.
  • When the same code is needed in several places, it can be written once in a function and executed wherever needed by calling that function.
  • Programmers working on a large project can divide the workload by writing different functions.
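As a small sketch of the first two advantages, the cleanup logic below is written once and then reused in two places. The names clean_name and format_greeting, and the sample data, are assumptions made up for this example.

```python
def clean_name(raw):
    """Strip surrounding whitespace and apply title case."""
    return raw.strip().title()


def format_greeting(raw_name):
    """Build a greeting, reusing clean_name instead of repeating its logic."""
    return f"Hello, {clean_name(raw_name)}!"


names = ["  alice ", "BOB", " carol smith"]
cleaned = [clean_name(n) for n in names]

print(cleaned)                     # ['Alice', 'Bob', 'Carol Smith']
print(format_greeting("  dave "))  # Hello, Dave!
```

If the cleanup rule changes later, only clean_name needs to be edited, and every caller picks up the change.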

One practical tip

It is always a good idea to name user-defined functions according to the task they perform.
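To illustrate, here are two functions with identical bodies. Both names are hypothetical; only the second tells the reader what the function does without opening it.

```python
def f(x):
    # Unclear: neither the name nor the parameter says what is computed.
    return x * 9 / 5 + 32


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


print(f(100))                      # 212.0
print(celsius_to_fahrenheit(100))  # 212.0
```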
