SAN Storage: All about its 4 Real Usages

To understand storage area network (SAN) fundamentals, you first need to understand the applications a SAN serves. These may be horizontal applications (e.g., backup, archiving, data replication, disaster protection, and data warehousing) or vertical applications (e.g., online transaction processing (OLTP), enterprise resource planning (ERP) business applications, electronic commerce, broadcasting, prepress, medical, and geophysics).

A SAN is also well suited to making performance and high availability more scalable and more affordable in applications such as clustering and data sharing. This article discusses two major horizontal applications, backup and data sharing, and how they interact with a SAN. If you are a job seeker, the list below also works as a quick SAN refresher before interviews.


1. Realtime (or window-less) backup

The importance of window-less backup (also called hot backup) becomes obvious when you consider the large volume of data held in a SAN's centralized backup library. Realtime backup lets you back up a volume or file periodically and automatically without affecting normal system operations.


The technique commonly used is called a snapshot, where you make a copy of the volume needing backup, and then back up the copy while accessing and modifying the original volume in normal operations. Network Integrity leads in development, and EMC and HDS have implemented solutions in currently available products. Major providers of total backup solutions include ADIC, ATL, StorageTek, Hewlett-Packard (HP), Exabyte, and Overland.
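The snapshot idea can be illustrated with a minimal Python sketch. The `Volume` class and `back_up` function are hypothetical names invented for this example, and the volume is just an in-memory dictionary. The sketch copies every block for simplicity; it is not how any particular vendor's product works.

```python
from types import MappingProxyType


class Volume:
    """Toy in-memory volume: block number -> bytes (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def write(self, block_no, data):
        # Normal operations keep modifying the original volume.
        self.blocks[block_no] = data

    def snapshot(self):
        # Read-only point-in-time copy; later writes do not affect it.
        return MappingProxyType(dict(self.blocks))


def back_up(snapshot):
    """Copy every block of the snapshot to a backup target (a dict here)."""
    return {block_no: data for block_no, data in snapshot.items()}


volume = Volume({0: b"alpha", 1: b"beta"})
snap = volume.snapshot()       # freeze the state to be backed up
volume.write(1, b"changed")    # the original volume stays in use
backup = back_up(snap)         # the backup reflects the snapshot, not the new write
```

The backup holds the data as it was when the snapshot was taken, while the original volume already contains the later write.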


2. Resource sharing

A storage subsystem attached to multiple computer platforms is divided into partitions, each partition being accessible only to its owning platform or to a certain number of homogeneous platforms. The administrator can reassign storage capacity to different platforms as needs change.
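A small Python sketch of this partitioned model, under the assumption that each partition has exactly one owning platform. The class name, partition names, and sizes are all made up for illustration.

```python
class PartitionedArray:
    """Toy storage subsystem divided into partitions (hypothetical model)."""

    def __init__(self):
        # partition name -> {"size_gb": int, "owner": str}
        self.partitions = {}

    def add_partition(self, name, size_gb, owner):
        self.partitions[name] = {"size_gb": size_gb, "owner": owner}

    def can_access(self, platform, name):
        # Only the owning platform may access a partition.
        return self.partitions[name]["owner"] == platform

    def reassign(self, name, new_owner):
        # The administrator moves capacity to another platform as needs change.
        self.partitions[name]["owner"] = new_owner

    def capacity_of(self, platform):
        return sum(p["size_gb"] for p in self.partitions.values()
                   if p["owner"] == platform)


array = PartitionedArray()
array.add_partition("p1", 500, owner="unix-host")
array.add_partition("p2", 300, owner="windows-host")
array.reassign("p2", "unix-host")   # windows-host no longer needs p2
```

After the reassignment, `unix-host` owns both partitions and `windows-host` can no longer access `p2`.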


One of the benefits of SAN connectivity is its ability to share resources (e.g., a large tape library) among multiple backup servers. Such sharing enables administrators to consolidate backups (from many different servers to locally attached tape drives) into one tape library.


3. Dynamic resource sharing

All storage is available to any connected host; hosts are allocated storage as they need it. If one host needs storage, it can use any or all of the available space. If a host deletes a file, that space becomes available to any other host. This dynamic storage sharing operates automatically and transparently. Dynamic resource sharing means that the systems administrator doesn't have to partition the storage before storing the data.
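In contrast to fixed partitions, dynamic sharing behaves like one common pool. The following Python sketch assumes a single capacity figure in gigabytes; the `SharedPool` class and the host names are hypothetical.

```python
class SharedPool:
    """Toy shared storage pool: any host can take any free space."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used = {}   # host -> GB currently in use

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.used.values())

    def allocate(self, host, size_gb):
        # No pre-partitioning: the only limit is the free space in the pool.
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in the pool")
        self.used[host] = self.used.get(host, 0) + size_gb

    def release(self, host, size_gb):
        # Space freed by one host is immediately available to all others.
        current = self.used.get(host, 0)
        if size_gb > current:
            raise ValueError("host does not hold that much space")
        self.used[host] = current - size_gb


pool = SharedPool(capacity_gb=100)
pool.allocate("host-a", 70)
pool.allocate("host-b", 30)   # the pool is now full
pool.release("host-a", 50)    # host-a deletes files
pool.allocate("host-b", 50)   # host-b reuses the space host-a freed
```

No administrator step is needed between the release and the second allocation, which is the point of the dynamic model.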

Data copy sharing: This process involves replication of the data. Data is the same across copies at the time of copy creation, but the copies can change independently afterward. There is no assurance that they will remain identical. Data access is usually prevented during replication so the copy accurately reflects all the data at a particular time.


For large amounts of data, the time needed to copy it may be significant, and the amount of storage necessary to store the copy could be very large. SAN facilitates data-copy sharing by allowing high-bandwidth connections to transfer large volumes of data.
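To get a feel for why bandwidth matters, here is a rough back-of-the-envelope calculation in Python. The data size and link speeds are assumed values chosen only for illustration, and the formula ignores protocol overhead and contention.

```python
def copy_time_hours(size_gb, bandwidth_mb_per_s):
    """Estimate the copy time as size / bandwidth (decimal units, no overhead)."""
    size_mb = size_gb * 1000
    seconds = size_mb / bandwidth_mb_per_s
    return seconds / 3600


# Hypothetical 2,000 GB data set over two assumed sustained link speeds.
slow_link = copy_time_hours(2000, 100)   # 100 MB/s
fast_link = copy_time_hours(2000, 400)   # 400 MB/s

print(round(slow_link, 1))   # 5.6
print(round(fast_link, 1))   # 1.4
```

Quadrupling the sustained bandwidth cuts the estimated copy time to a quarter, which is why a high-bandwidth SAN connection helps with data-copy sharing.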

4. True data sharing

If you are sharing data without making a copy, multiple computer platforms can access the same physical instance of the recorded data on a storage subsystem. This type of sharing is called true data sharing. Different levels of performance and complexity exist in implementing true data sharing:

The first level is when heterogeneous platforms can access data, but only the original data owner can modify it.

The second level is when multiple heterogeneous platforms can update and rewrite a data item, but only one at a time. In this case, you must use a locking mechanism to momentarily prevent the other platforms from updating the data.

The third level is called concurrent data sharing and exists when all platforms can either read or update the data at the same time.
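The locking idea behind the second level can be sketched in Python. This is only an analogy: threads stand in for platforms, and `threading.Lock` stands in for whatever lock a real SAN would have to coordinate across hosts, which this example does not model. The `SharedRecord` class is a hypothetical name.

```python
import threading


class SharedRecord:
    """One physical instance of a data item shared by several 'platforms'."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def read(self):
        return self.value

    def update(self, fn):
        # Only one platform at a time may update; the others wait on the lock.
        with self._lock:
            self.value = fn(self.value)


record = SharedRecord(0)


def platform_worker():
    # Each simulated platform performs 1,000 read-modify-write updates.
    for _ in range(1000):
        record.update(lambda v: v + 1)


threads = [threading.Thread(target=platform_worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update happens while holding the lock, no increment is lost and the final value is 4,000.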

The advantages of true data sharing are numerous. With only one copy of the data, you never need to replicate it for use elsewhere, you simplify data maintenance, and you eliminate problems due to out-of-sync conditions. True data sharing among platforms running heterogeneous operating systems requires translating to one common operating system. Examples of vendors offering implementations of true data sharing in a SAN architecture are Sequent, Mercury Computer Systems, DataDirect, Transoft, Retrieve, and Network Disk.
