
SAN: Real Architecture Explained

A SAN sits behind the servers and provides block-level access to shared data storage. Block-level access addresses specific blocks of data on a storage device, as opposed to file-level access, which addresses whole files; a single file typically spans several blocks.
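The block/file distinction can be made concrete with a short sketch. The script below (plain Python on an ordinary file, standing in for a block device) shows that one file occupies several fixed-size blocks, and that block-level access means reading by block offset rather than by file semantics. The 4 KiB block size is an assumption; real devices use 512-byte or 4 KiB sectors.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed logical block size; real devices vary (512 B, 4 KiB)

# Create a ~10 KB file: at file level it is one object,
# at block level it occupies several fixed-size blocks.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10_000)
    path = f.name

size = os.path.getsize(path)
num_blocks = -(-size // BLOCK_SIZE)  # ceiling division
print(f"file size: {size} bytes -> {num_blocks} blocks of {BLOCK_SIZE} bytes")

# Block-level access: read block 1 directly by byte offset,
# bypassing file-level read-the-whole-file semantics.
fd = os.open(path, os.O_RDONLY)
block1 = os.pread(fd, BLOCK_SIZE, 1 * BLOCK_SIZE)  # (fd, count, offset)
os.close(fd)
print(f"block 1 holds {len(block1)} bytes")
os.remove(path)
```

A SAN initiator does essentially this against a remote LUN: it asks for block N of a given size, and the storage array neither knows nor cares which file the block belongs to.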
Figure: a simplified SAN architecture, showing how servers reach data held on shared storage.
Storage Area Networks (SANs)
  • SANs provide high availability and robust business continuity for critical data environments. SANs are typically switched fabric architectures using Fibre Channel (FC) for connectivity.
  • The term switched fabric refers to each storage unit being connected to each server via multiple SAN switches (also called SAN directors), which provide redundant paths to the storage units. The extra paths keep communication flowing and eliminate any single central switch as a single point of failure.
  • Ethernet has many advantages similar to Fibre Channel for supporting SANs. Some of these include high speed, support of a switched fabric topology, widespread interoperability, and a large set of management tools.
  • In a storage network application, the switch is the key element. With the significant number of Gigabit and 10 Gigabit Ethernet ports shipped, leveraging IP and Ethernet for storage is a natural progression for some environments. 
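The redundancy argument in the bullets above can be sketched in a few lines. This is a hypothetical model, not any vendor's multipath driver: two independent fabric paths serve the same LUN, and I/O only fails when every path is down.

```python
import random

# Hypothetical model of a switched fabric: each server reaches each
# storage unit through several independent switch paths, so losing any
# one switch/director does not cut off access to the storage unit.
paths = {
    "switch-A": True,  # True = path healthy
    "switch-B": True,
}

def read_block(lun: int, block: int) -> str:
    """Pick any healthy path; fail only if every fabric path is down."""
    healthy = [name for name, up in paths.items() if up]
    if not healthy:
        raise IOError("all fabric paths down")
    path = random.choice(healthy)
    return f"read LUN {lun} block {block} via {path}"

print(read_block(0, 42))    # served by switch-A or switch-B
paths["switch-A"] = False   # simulate a switch/director failure
print(read_block(0, 42))    # still succeeds, now via switch-B
```

Real multipath software (e.g. Linux `dm-multipath`) adds load balancing and health checking on top of this same basic idea.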
SAN vs. IP
  1. IP was developed as an open standard with complete interoperability of components. Two newer storage networking technologies from this ecosystem are Fibre Channel over Ethernet (FCoE) and Internet SCSI (iSCSI), which carries SCSI commands over TCP/IP. Carrying storage traffic across a standard IP network via Fibre Channel tunneling (storage tunneling) allows storage to sit at distances beyond the roughly 10 km limit of directly attached fiber.
  2. Internal to the data center, legacy Fibre Channel can also be run over coaxial cable or twisted pair cabling, but at significantly shorter distances.
  3. The incorporation of the IP standard into these storage systems offers performance benefits through speed, greater availability, fault tolerance, and scalability. Properly implemented, these solutions can approach 100% availability of data. The IP-based management protocols also give network managers a new set of tools, warnings, and triggers that were proprietary in previous generations of storage technology, and security and encryption options are greatly enhanced. With 10 Gigabit Ethernet gaining popularity and faster WAN links becoming available, these solutions can offer true storage on demand.
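The core idea behind iSCSI, as described above, is encapsulation: a SCSI command descriptor block (CDB) rides as the payload of an ordinary TCP/IP segment, so block storage traffic can cross any IP network. The sketch below builds a standard SCSI READ(10) CDB and wraps it in a toy length-prefixed frame; the framing is deliberately simplified and is not the real iSCSI PDU format (RFC 7143 defines that).

```python
import struct

READ_10 = 0x28  # standard SCSI READ(10) opcode

def build_cdb(lba: int, num_blocks: int) -> bytes:
    """Minimal 10-byte SCSI READ(10) CDB:
    opcode, flags, 4-byte LBA, group, 2-byte transfer length, control."""
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)

def encapsulate(cdb: bytes) -> bytes:
    """Toy 'iSCSI-like' framing: length-prefixed CDB as a TCP payload.
    (Illustrative only; real iSCSI uses a 48-byte basic header segment.)"""
    return struct.pack(">H", len(cdb)) + cdb

# Request 8 blocks starting at logical block address 2048.
pdu = encapsulate(build_cdb(lba=2048, num_blocks=8))
print(f"{len(pdu)} bytes ready to send over a plain TCP socket")
```

Because the result is just bytes on a TCP connection, the same request works across a LAN, a routed WAN, or an encrypted tunnel, which is exactly why IP-based storage escapes the distance limits of directly attached Fibre Channel.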
