
Helpful Tips to Remove Duplicates From a Python List (Append and Dictionary)

In this post, I share two ideas for removing duplicates from a list: one uses append and the other uses a dictionary.


Python: How to Remove Duplicates From a List

1. How to Remove Duplicates with Append

# Here is a list with duplicates

list_with_duplicates = [1, 2, 3, 12, 1, 2, 3, 4, 5, 6, 1, 2, 3, 7, 8, 9]

The first approach is brute force, and it is simple: build a new list and append each item only if it is not already there.

list_without_duplicates = []

# Keep the first occurrence of each item and skip any repeats
for item in list_with_duplicates:
    if item not in list_without_duplicates:
        list_without_duplicates.append(item)
print(list_without_duplicates)

Result:

[1, 2, 3, 12, 4, 5, 6, 7, 8, 9]

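This method has performance issues when the list is large: each "not in" check scans list_without_duplicates from the start, so the total work grows roughly with the square of the list length (O(n^2)).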


Idea 1: Remove Duplicates Using Append.
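The quadratic cost comes from checking membership against a list. Below is a minimal sketch of a common fix, which is a swapped-in technique rather than one of the two ideas in this post: keep the same append loop, but record the items already seen in a set, whose membership checks take constant time on average. It assumes the list items are hashable (numbers and strings are); the function name remove_duplicates_with_seen_set is hypothetical.

```python
def remove_duplicates_with_seen_set(items):
    """Return a new list without duplicates, keeping first-occurrence order.

    Assumes every item is hashable, because items are stored in a set.
    """
    seen = set()   # set membership checks are O(1) on average
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


list_with_duplicates = [1, 2, 3, 12, 1, 2, 3, 4, 5, 6, 1, 2, 3, 7, 8, 9]
print(remove_duplicates_with_seen_set(list_with_duplicates))
# [1, 2, 3, 12, 4, 5, 6, 7, 8, 9]
```

The set costs some extra memory, but the loop now makes a single pass over the input instead of rescanning the result list for every item.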



2. How to Remove Duplicates with Dictionary


# Here is how you can convert a list to a dictionary

dict_without_duplicates = dict(zip(list_with_duplicates, list_with_duplicates))
print(dict_without_duplicates)

Result:

{1: 1, 2: 2, 3: 3, 12: 12, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}



Idea 2: Remove Duplicates Using Dictionary.


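Once again, this works. Dictionary keys are unique, so every repeated value maps to a key that already exists, and the duplicates disappear in a single pass instead of through the repeated list scans of the first approach.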
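To see why the dictionary drops duplicates, here is a sketch of a loop that behaves roughly like dict(zip(...)): assigning to a key that already exists only overwrites its value, so each number ends up stored once. It assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
list_with_duplicates = [1, 2, 3, 12, 1, 2, 3, 4, 5, 6, 1, 2, 3, 7, 8, 9]

# Roughly what dict(zip(list_with_duplicates, list_with_duplicates)) does
dict_without_duplicates = {}
for item in list_with_duplicates:
    # Re-assigning an existing key overwrites its value instead of adding a
    # second entry; the key keeps the position of its first insertion.
    dict_without_duplicates[item] = item

print(dict_without_duplicates)
# {1: 1, 2: 2, 3: 3, 12: 12, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
```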


Note: we still need to convert the dictionary back to a list when we are done, which means extracting its keys. Since Python 3.7, dictionaries keep insertion order, so the keys come back in the order in which they first appeared.
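A short sketch of the conversion back to a list, assuming Python 3.7 or later for insertion order: iterating over a dictionary yields its keys, so list() extracts them in one call. It also shows dict.fromkeys(), a built-in that creates the keys directly (with None as every value) instead of zipping the list with itself.

```python
list_with_duplicates = [1, 2, 3, 12, 1, 2, 3, 4, 5, 6, 1, 2, 3, 7, 8, 9]

dict_without_duplicates = dict(zip(list_with_duplicates, list_with_duplicates))

# Iterating over a dict yields its keys, so list() returns them in insertion order
list_without_duplicates = list(dict_without_duplicates)
print(list_without_duplicates)
# [1, 2, 3, 12, 4, 5, 6, 7, 8, 9]

# dict.fromkeys() builds the same keys in one step (every value is None),
# so the whole idea fits on one line
print(list(dict.fromkeys(list_with_duplicates)))
# [1, 2, 3, 12, 4, 5, 6, 7, 8, 9]
```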
