
Machine Learning Tutorial - Part:2

Machine learning is a branch of artificial intelligence. With it, you design computing systems that learn from data. For these systems to behave intelligently, you need to train them, and that training process is what we call machine learning. If you missed Part 1, read it first.

The life cycle of machine learning

  • Acquire - Collect the data
  • Prepare - Clean the data and check its quality
  • Process - Run the machine learning routines
  • Report - Present the results

Acquire Data

You can acquire data from many sources; it might be data that are held by your organization or open data from the Internet. There might be one data set, or there could be ten or more.
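Here is a minimal sketch of the Acquire phase, assuming each source can be read as CSV. The source names (`internal_sales`, `open_data`) and their contents are hypothetical; inline text stands in for real files, database exports, or open-data downloads so the example runs on its own. Each row is tagged with the source it came from, which helps later when you check quality per source.

```python
import csv
import io

# Hypothetical sources: in a real project these might be files, database
# exports, or downloads from an open-data portal. Inline CSV text keeps
# the sketch self-contained.
SOURCES = {
    "internal_sales": "id,region,amount\n1,north,120\n2,south,95\n",
    "open_data": "id,region,amount\n3,east,210\n4,west,\n",
}


def acquire(sources):
    """Read every CSV source and tag each row with the name of its source."""
    rows = []
    for name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source"] = name  # remember where the record came from
            rows.append(row)
    return rows


raw_rows = acquire(SOURCES)
print(len(raw_rows))
```

Note that the last record arrives with an empty `amount`; acquisition only collects data, and fixing problems like this is left to the Prepare phase.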

Cleaning of Data

Accept from the start that data will need to be cleaned and checked for quality before any processing can take place. This work happens during the Prepare phase.
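A minimal sketch of a Prepare step follows, assuming the raw rows look like the output of a CSV reader (every value is a string). The sample records and the `prepare` function are hypothetical. The sketch removes exact duplicates, discards rows with a missing `amount`, and converts the remaining values to proper types.

```python
# Hypothetical raw rows as an acquisition step might return them:
# one exact duplicate and one record with a missing amount.
raw_rows = [
    {"id": "1", "region": "north", "amount": "120"},
    {"id": "2", "region": "south", "amount": "95"},
    {"id": "2", "region": "south", "amount": "95"},  # exact duplicate
    {"id": "4", "region": "west", "amount": ""},     # missing value
]


def prepare(rows):
    """Remove duplicates, drop rows missing an amount, and convert types."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        if not row["amount"].strip():
            continue  # quality check: discard rows with no amount
        clean.append({
            "id": int(row["id"]),
            "region": row["region"],
            "amount": float(row["amount"]),
        })
    return clean


clean_rows = prepare(raw_rows)
print(clean_rows)
```

Dropping incomplete rows is only one choice; depending on the data, you might instead fill missing values, for example with the column average.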

Running Machine Learning Scripts

The Process phase is where the real work gets done: the machine learning routines you have created run against the prepared data.
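As an illustration of a routine you might run in the Process phase, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The choice of algorithm and the sample data (advertising spend against sales) are assumptions for the example; your own routines will depend on the problem. Training means computing the slope and intercept from the data, and the trained model can then make predictions for new inputs.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a trained (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical prepared data: advertising spend (x) against sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]  # follows y = 2x + 1 exactly

model = fit_linear(spend, sales)
print(model)              # (2.0, 1.0)
print(predict(model, 5))  # 11.0
```

Python 3.10 and later also ship `statistics.linear_regression`, which performs the same fit; the hand-written version just makes the training step visible.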

Reporting

Finally, the results are presented. Reporting can happen in a variety of ways, such as writing the results back into a data store or delivering them as a spreadsheet or report.
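The two reporting routes mentioned above can be sketched with Python's standard library alone. This is a minimal example under stated assumptions: the `results` values are hypothetical model outputs, the table name `predictions` is made up, and an in-memory SQLite database stands in for your real data store.

```python
import csv
import io
import sqlite3

# Hypothetical results from the Process phase: predicted sales per region.
results = [("north", 118.5), ("south", 97.2)]

# Route 1: write the results back into a data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (region TEXT, predicted_sales REAL)")
conn.executemany("INSERT INTO predictions VALUES (?, ?)", results)
conn.commit()
stored = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]

# Route 2: export the same results as CSV, which any spreadsheet tool opens.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["region", "predicted_sales"])  # header row
writer.writerows(results)
report_csv = buffer.getvalue()

print(stored)
print(report_csv)
```

In practice you would connect to your actual database and write the CSV to a file rather than to a string buffer.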

