How to Deal With Missing Data: Pandas Fillna() and Dropna()

This post shows how to handle missing data in Pandas with the `isnull().sum()`, `fillna()`, and `dropna()` methods. The process has two steps: counting the null values, then replacing or dropping them.


Check and Replace Column Nulls


Count Nulls

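```
import pandas as pd

# create a sample dataframe
data = {'Column1': [1, 2, 3, 4, None],
        'Column2': ['A', 'B', None, 'C', 'D'],
        'Column3': [None, None, None, None, None]}
df = pd.DataFrame(data)

# count null values column-wise
null_counts = df.isnull().sum()
print(null_counts)
```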


Output:

```
Column1    1
Column2    1
Column3    5
dtype: int64
```

In the above code, `df` is a sample Pandas DataFrame with some null values. The `isnull()` method returns a DataFrame of the same shape as `df`, where each element is a boolean indicating whether the corresponding value is null. The `sum()` method then counts the `True` values in each column, so the output shows the number of null values per column.

Code snippet to count null values column-wise:


```

df.isnull().sum()

```


Code snippet to count null values row-wise:


```

df.isnull().sum(axis=1)

```


In both snippets, `df` is the Pandas DataFrame whose null values you want to count. Because `isnull()` marks each null as `True` and `True` counts as 1, `sum()` returns the number of nulls per column, and `sum(axis=1)` returns the number per row.
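As a minimal sketch of each step, the snippet below builds the boolean mask and then the column-wise, row-wise, and overall counts on this post's sample data. The variable names `mask`, `col_nulls`, `row_nulls`, and `total_nulls` are illustrative only.

```python
import pandas as pd

# sample data used throughout this post
df = pd.DataFrame({'Column1': [1, 2, 3, 4, None],
                   'Column2': ['A', 'B', None, 'C', 'D'],
                   'Column3': [None, None, None, None, None]})

# boolean DataFrame: True where a value is null
mask = df.isnull()

# True counts as 1 when summed
col_nulls = mask.sum()                 # per column (axis=0 is the default)
row_nulls = mask.sum(axis=1)           # per row
total_nulls = int(mask.sum().sum())    # whole DataFrame

print(col_nulls)
print(row_nulls.tolist())   # [1, 1, 2, 1, 2]
print(total_nulls)          # 7
```

Keeping the mask in its own variable is optional. It helps when the same mask is reused, for example to select the rows that contain nulls.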

Fill null values with zeros in Pandas



Fill Nulls

To fill null values with 0 in a Pandas DataFrame, use the `fillna()` method. Here's an example code snippet:


```
import pandas as pd

# create a sample dataframe
data = {'Column1': [1, 2, 3, 4, None],
        'Column2': ['A', 'B', None, 'C', 'D'],
        'Column3': [None, None, None, None, None]}
df = pd.DataFrame(data)

# fill null values with 0
df.fillna(0, inplace=True)

print(df)
```


Output:


```
   Column1 Column2  Column3
0      1.0      A      0.0
1      2.0      B      0.0
2      3.0      0      0.0
3      4.0      C      0.0
4      0.0      D      0.0
```

In the above code, we first create a sample Pandas DataFrame `df` with some null values. Then we use the `fillna()` method to replace every null value in the DataFrame with 0. The `inplace=True` argument modifies `df` directly instead of returning a new DataFrame. Finally, we print the DataFrame with its null values filled.
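Filling every column with the same value is not always what you want: in the output above, the text column `Column2` ends up containing the number 0. As a hedged sketch, `fillna()` also accepts a dict that maps column names to fill values, and without `inplace=True` it returns a new DataFrame. The fill values chosen here (the column mean and the string 'Unknown') are arbitrary, and `Column3` is left out for brevity.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, 4, None],
                   'Column2': ['A', 'B', None, 'C', 'D']})

# choose a fill value per column (illustrative choices)
fill_values = {'Column1': df['Column1'].mean(),   # mean skips nulls: 2.5
               'Column2': 'Unknown'}

# returns a new DataFrame; df itself is not modified
filled = df.fillna(value=fill_values)

print(filled)
```

Assigning the result to a new name keeps the original data available for comparison, which can be useful while deciding on a fill strategy.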


Note that when counting nulls, the `axis` parameter of `sum()` defaults to 0, which counts null values column-wise. To count them row-wise, set `axis=1`.


Drop Nulls


`df.dropna()` drops every row that has a null value in any column. It returns a new DataFrame and leaves `df` unchanged.
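Because `Column3` in this post's sample data is entirely null, a plain `df.dropna()` would drop every row. The sketch below, which assumes that same sample data, shows a few `dropna()` parameters that narrow what gets dropped. The result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, 4, None],
                   'Column2': ['A', 'B', None, 'C', 'D'],
                   'Column3': [None, None, None, None, None]})

# default: drop rows with a null in any column (here, every row)
any_null_dropped = df.dropna()

# drop only the columns that are entirely null
no_empty_cols = df.dropna(axis=1, how='all')

# look for nulls only in the selected columns
subset_dropped = df.dropna(subset=['Column1', 'Column2'])

print(len(any_null_dropped))           # 0
print(list(no_empty_cols.columns))     # ['Column1', 'Column2']
print(subset_dropped.index.tolist())   # [0, 1, 3]
```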
