
10 Kafka Interview Questions That Were Recently Asked



Here are ten questions that were asked in a recent Kafka interview. They are useful for refreshing your knowledge.


1. What is Kafka?

Kafka is a distributed publish/subscribe messaging platform. Producers publish messages to Kafka, and subscribers (consumers) read them. Kafka stores the producers' messages in topics, and each topic is split into partitions. Each partition is kept as an ordered, append-only log, so messages remain available after they are read.
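To make the topic, partition, and log vocabulary concrete, here is a minimal in-memory sketch. `ToyTopic`, `produce`, and `fetch` are hypothetical names used only for illustration; this is not the Kafka client API, and real Kafka persists partitions to disk on brokers.

```python
class ToyTopic:
    """Hypothetical in-memory stand-in for one topic: a list of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, partition: int, message: str) -> int:
        """Append a message to one partition and return its offset (its position in that log)."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def fetch(self, partition: int, offset: int) -> list:
        """Return every message from `offset` onward; reading does not remove anything."""
        return self.partitions[partition][offset:]


topic = ToyTopic("orders", num_partitions=2)
topic.produce(0, "order-1")  # offset 0 in partition 0
topic.produce(0, "order-2")  # offset 1 in partition 0
topic.produce(1, "order-3")  # offset 0 in partition 1

# Two subscribers can read the same partition independently, from any offset.
print(topic.fetch(0, 0))  # ['order-1', 'order-2']
print(topic.fetch(0, 1))  # ['order-2']
```

Because reads only move a position forward and never delete, many subscribers can consume the same messages, which is the core difference from a traditional queue.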


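2. What is a consumer group?

Each consumer is part of a consumer group. The consumers in a group usually subscribe to the same topic and share the work of reading it, so you can balance the load by adding more consumers to the group. Within a group, each partition is assigned to exactly one consumer. As a result, the number of partitions caps the group's parallelism: if a group has more consumers than the topic has partitions, the extra consumers sit idle, and if it has fewer, some consumers read from several partitions.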
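The rule "one consumer per partition within a group" can be sketched as a simple round-robin assignment for a single topic. This is a simplified illustration in the spirit of Kafka's round-robin assignor, not its actual implementation; `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment of one topic's partitions within one consumer group.

    Each partition goes to exactly one consumer; a consumer may get zero, one, or many.
    """
    members = sorted(consumers)
    if not members:
        return {}  # no consumers: nothing is read
    assignment = {c: [] for c in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


# 4 partitions, 2 consumers: each consumer reads two partitions.
print(assign_partitions(4, ["c1", "c2"]))        # {'c1': [0, 2], 'c2': [1, 3]}

# 2 partitions, 3 consumers: c3 gets nothing and sits idle.
print(assign_partitions(2, ["c1", "c2", "c3"]))  # {'c1': [0], 'c2': [1], 'c3': []}
```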


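3. What is fault tolerance?

Each partition is replicated on multiple brokers (servers); the topic's replication factor sets how many copies exist. If the broker holding the active copy of a partition fails, a replica on another broker takes over and keeps delivering messages. This ability to keep working through a server failure is called fault tolerance.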
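A toy model of failover for one replicated partition, assuming the leader is simply the first surviving broker in the replica list. Real Kafka chooses the new leader from the in-sync replica set through its controller; `ReplicatedPartition` and `broker_failed` are hypothetical names.

```python
class ReplicatedPartition:
    """Toy model of one partition copied to several brokers (replication factor = len(replicas))."""

    def __init__(self, replicas: list[int]):
        self.replicas = list(replicas)  # broker IDs holding a copy
        self.alive = set(replicas)      # brokers currently up
        self.leader = replicas[0]       # the replica that currently serves the partition

    def broker_failed(self, broker_id: int) -> None:
        self.alive.discard(broker_id)
        if broker_id == self.leader:
            # Simplified failover: promote the first surviving replica.
            survivors = [b for b in self.replicas if b in self.alive]
            if not survivors:
                raise RuntimeError("no replica left; partition is offline")
            self.leader = survivors[0]


partition = ReplicatedPartition([1, 2, 3])  # replication factor 3
partition.broker_failed(1)                  # the leader's broker goes down
print(partition.leader)                     # 2
```

With a replication factor of 3, the partition in this sketch survives two broker failures; a third one leaves no copy to serve from.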


4. Can we decrease the number of partitions after creating a topic?

No. Once a topic is created, you can't decrease its number of partitions. You can only increase it.


5. What is the architecture of Kafka?

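The architecture combines producers, brokers, consumers (subscribers), and ZooKeeper. A cluster can accept messages from many producers and can run multiple brokers (each also called a Kafka broker or Kafka server). ZooKeeper oversees the Kafka cluster and, in older Kafka versions, also kept track of the offsets that consumers had read. Recent Kafka versions can run without ZooKeeper by using the built-in KRaft mode.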


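6. How do you start a Kafka broker?

In Linux environments, you start each broker with the kafka-server-start.sh script, passing it that broker's properties file:

$ bin/kafka-server-start.sh config/server-1.properties

$ bin/kafka-server-start.sh config/server-2.properties

So, to run several brokers, you start one Kafka server per properties file (server-1, server-2, and so on). Each file must give its broker a unique broker.id; brokers on the same machine also need different listeners ports and log.dirs directories.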


7. What is Leader Balancing in Kafka?

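For each partition, one replica on one broker acts as the leader; the replicas on other brokers are followers that copy the leader's data. If the leader fails, one of the in-sync followers becomes the new leader and delivers messages to consumers. Leader balancing is about keeping that leadership spread evenly across brokers. When a failed broker comes back, it rejoins only as a follower, so the surviving brokers are left leading more partitions than before. Kafka corrects this by moving leadership back to each partition's preferred replica (the first broker in its replica list), either automatically when auto.leader.rebalance.enable is true (the default) or on demand with an admin tool.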
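A rough sketch of the rebalancing step, assuming the preferred replica is the first broker in each partition's replica list. The function names and dictionary layout are hypothetical; in Kafka this work is done inside the cluster controller, not through a client call, and it also checks that the preferred replica is in sync, which this sketch skips.

```python
def rebalance_to_preferred(replicas, leaders, alive):
    """Return new leaders, moving each partition back to its preferred replica when that broker is alive.

    replicas: partition -> ordered list of broker IDs; the first entry is the preferred leader.
    leaders:  partition -> broker ID currently leading it.
    alive:    set of broker IDs that are up.
    """
    new_leaders = dict(leaders)
    for partition, replica_list in replicas.items():
        preferred = replica_list[0]
        if preferred in alive:  # simplified: real Kafka also requires it to be in sync
            new_leaders[partition] = preferred
    return new_leaders


def leader_counts(leaders):
    """Number of partitions each broker leads, sorted by broker ID."""
    counts = {}
    for broker in leaders.values():
        counts[broker] = counts.get(broker, 0) + 1
    return dict(sorted(counts.items()))


replicas = {"p0": [1, 2], "p1": [2, 3], "p2": [3, 1]}
# Broker 1 failed earlier, so p0 failed over to broker 2; broker 1 has now restarted.
leaders = {"p0": 2, "p1": 2, "p2": 3}
print(leader_counts(leaders))   # {2: 2, 3: 1}  (broker 1 leads nothing)

balanced = rebalance_to_preferred(replicas, leaders, alive={1, 2, 3})
print(balanced)                 # {'p0': 1, 'p1': 2, 'p2': 3}
print(leader_counts(balanced))  # {1: 1, 2: 1, 3: 1}
```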


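8. What is the main role of a broker?

A broker is a Kafka server. Its main job is to store the messages of the topic partitions it hosts and to serve them: it accepts writes from producers and answers read (fetch) requests from consumers.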


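9. What are the two main functions of ZooKeeper?

  • It oversees the Kafka cluster (all the nodes), for example by tracking which brokers are alive.
  • It stores the offset each consumer commits after reading. If a consumer fails, then after it recovers it resumes reading from the committed offset instead of starting over. This applies to older Kafka versions; newer versions keep committed offsets in an internal Kafka topic (__consumer_offsets) instead of ZooKeeper.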
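The commit-and-resume behaviour can be sketched with a tiny offset store. `OffsetStore` and `consume` are hypothetical names; the store stands in for wherever committed offsets live (ZooKeeper in old versions, the `__consumer_offsets` topic in newer ones). As in Kafka, the committed value here is the offset of the next message to read.

```python
class OffsetStore:
    """Hypothetical stand-in for the place committed offsets are kept."""

    def __init__(self):
        self.committed = {}  # (group, partition) -> offset of the next message to read

    def commit(self, group, partition, next_offset):
        self.committed[(group, partition)] = next_offset

    def position(self, group, partition):
        # Toy default: start from the beginning if nothing was committed.
        # Real consumers follow their auto.offset.reset setting instead.
        return self.committed.get((group, partition), 0)


def consume(log, store, group, partition, max_messages):
    """Read up to max_messages from the committed position, committing after each message."""
    start = store.position(group, partition)
    processed = []
    for offset in range(start, min(start + max_messages, len(log))):
        processed.append(log[offset])
        store.commit(group, partition, offset + 1)
    return processed


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()
print(consume(log, store, "billing", 0, 3))   # ['m0', 'm1', 'm2']
# The consumer crashes here; a restarted consumer in the same group resumes at offset 3.
print(consume(log, store, "billing", 0, 10))  # ['m3', 'm4']
```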

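10. What is the retention period?

The retention period is how long Kafka keeps messages in a topic. Messages are not deleted when they are read; they stay until retention removes them. There are two types of retention: time-based (the retention.ms topic setting) and size-based (the retention.bytes topic setting). When a limit is exceeded, Kafka deletes the oldest log segments.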
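A simplified sketch of how the two retention types could be applied to one partition's log segments. `Segment` and `apply_retention` are hypothetical; the sketch assumes segments are ordered oldest first and that the newest (active) segment is never deleted. Kafka's exact size rule differs in detail (it is more conservative about dropping a segment), so treat this as an illustration of the idea only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    newest_timestamp_ms: int  # timestamp of the last message in the segment
    size_bytes: int


def apply_retention(segments, now_ms, retention_ms, retention_bytes):
    """Drop the oldest segments that break the time limit or the size limit (-1 = no limit)."""
    kept = list(segments)
    # Time-based: a segment expires once even its newest message is older than retention_ms.
    while len(kept) > 1 and retention_ms >= 0 and now_ms - kept[0].newest_timestamp_ms > retention_ms:
        kept.pop(0)
    # Size-based (simplified): drop the oldest segments while the partition exceeds retention_bytes.
    while len(kept) > 1 and retention_bytes >= 0 and sum(s.size_bytes for s in kept) > retention_bytes:
        kept.pop(0)
    return kept


segments = [Segment(1_000, 100), Segment(5_000, 100), Segment(9_000, 100)]

by_time = apply_retention(segments, now_ms=10_000, retention_ms=6_000, retention_bytes=-1)
print([s.newest_timestamp_ms for s in by_time])  # [5000, 9000]

by_size = apply_retention(segments, now_ms=10_000, retention_ms=-1, retention_bytes=150)
print([s.newest_timestamp_ms for s in by_size])  # [9000]
```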

