Featured Post

14 Top Data Pipeline Key Terms Explained

Image
 Here are some key terms commonly used in data pipelines 1. Data Sources Definition: Points where data originates (e.g., databases, APIs, files, IoT devices). Examples: Relational databases (PostgreSQL, MySQL), APIs, cloud storage (S3), streaming data (Kafka), and on-premise systems. 2. Data Ingestion Definition: The process of importing or collecting raw data from various sources into a system for processing or storage. Methods: Batch ingestion, real-time/streaming ingestion. 3. Data Transformation Definition: Modifying, cleaning, or enriching data to make it usable for analysis or storage. Examples: Data cleaning (removing duplicates, fixing missing values). Data enrichment (joining with other data sources). ETL (Extract, Transform, Load). ELT (Extract, Load, Transform). 4. Data Storage Definition: Locations where data is stored after ingestion and transformation. Types: Data Lakes: Store raw, unstructured, or semi-structured data (e.g., S3, Azure Data Lake). Data Warehous...

What is "Learning Algorithm" in Machine Learning

#What is machine learning algorithm
#What is machine learning algorithm
Just a basics on Machine Learning

Alice has just begun taking a course on machine learning. She knows that at the end of the course, she will be expected to have “learned”all about this topic. A common way of gauging whether or not shehas learned is for her teacher, Bob, to give her a exam. She has done well at learning if she does well on the exam.

But what makes a reasonable exam? If Bob spends the entire semester talking about machine learning, and then gives Alice an exam on History of Pottery, then Alice’s performance on this exam will not be representative of her learning. On the other hand, if the (The general supervised approach to machine learning: a learning algorithm reads in training data and computes a learned function f . This function can then automatically label future text examples.)

exam only asks questions that Bob has answered exactly during lectures, then this is also a bad test of Alice’s learning, especially if it’s an “open notes” exam. What is desired is that Alice observes specific examples from the course, and then has to answer new, but related questions on the exam. This tests whether Alice has the ability to  generalize. Generalization is perhaps the most central concept in machine learning. This Machine learning with data science hands-on really useful.

Examples for Machine Learning

As a running concrete example, we will use that of a course recommendation system for undergraduate computer science students. We have a collection of students and a collection of courses. Each student has taken, and evaluated, a subset of the courses. The evaluation is simply a score from −2 (terrible) to +2 (awesome).

The job of the recommender system is to predict how much a particular student (say, Alice) will like a particular course (say, Algorithms).

Given historical data from course ratings (i.e., the past) we are trying to predict unseen ratings (i.e., the future). Now, we could be unfair to this system as well. We could ask it whether Alice is likely to enjoy the History of Pottery course. This is unfair because the system has no idea what History of Pottery even is, and has no prior experience with this course. On the other hand, we could ask it how much Alice will like Artificial Intelligence, which she took last year and rated as +2 (awesome).

We would expect the system to predict that she would really like it, but this isn’t demonstrating that the system has learned: it’s simply recalling its past experience. In the former case, we’re expecting the system to generalize beyond its experience, which is unfair. In the latter case, we’re not expecting it to generalize at all.

This general set up of predicting the future based on the past is at the core of most machine learning. The objects that our algorithm will make predictions about are examples. In the recommender system setting, an example would be some particular Student/Course pair (such as Alice/Algorithms). The desired prediction would be the rating that Alice would give to Algorithms.

Comments

Popular posts from this blog

How to Fix datetime Import Error in Python Quickly

SQL Query: 3 Methods for Calculating Cumulative SUM

Big Data: Top Cloud Computing Interview Questions (1 of 4)