14 Top Data Pipeline Key Terms Explained
Here are some key terms commonly used in data pipelines.

1. Data Sources
Definition: Points where data originates (e.g., databases, APIs, files, IoT devices).
Examples: Relational databases (PostgreSQL, MySQL), APIs, cloud storage (S3), streaming data (Kafka), and on-premise systems.

2. Data Ingestion
Definition: The process of importing or collecting raw data from various sources into a system for processing or storage.
Methods: Batch ingestion, real-time/streaming ingestion.

3. Data Transformation
Definition: Modifying, cleaning, or enriching data to make it usable for analysis or storage.
Examples: Data cleaning (removing duplicates, fixing missing values), data enrichment (joining with other data sources), ETL (Extract, Transform, Load), and ELT (Extract, Load, Transform). A minimal ETL sketch follows this list.

4. Data Storage
Definition: Locations where data is stored after ingestion and transformation.
Types: Data Lakes: store raw, unstructured, or semi-structured data (e.g., S3, Azure Data Lake). Data Warehouses: ...
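To make the ETL idea concrete, here is a minimal Python sketch of a batch pipeline: extract from a file-based source, transform with pandas (deduplication and missing-value fixes), and load into a SQLite table standing in for a warehouse. The file path, the column names "amount" and "order_date", and the table name are hypothetical placeholders, not part of the article.

# Minimal, illustrative ETL sketch. Paths, column names, and table name are hypothetical.
import sqlite3
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    # Extract: read raw data from a file-based source (an API or S3 bucket works the same way).
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: basic cleaning, removing duplicates and fixing missing values.
    df = df.drop_duplicates()
    df["amount"] = df["amount"].fillna(0)                  # hypothetical numeric column
    df["order_date"] = pd.to_datetime(df["order_date"])    # hypothetical date column
    return df

def load(df: pd.DataFrame, db_path: str, table: str) -> None:
    # Load: write the cleaned data into a warehouse-like store (SQLite here for simplicity).
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)

if __name__ == "__main__":
    raw = extract("orders.csv")             # hypothetical source file
    clean = transform(raw)
    load(clean, "warehouse.db", "orders")   # hypothetical target database and table

In a real pipeline the shape stays the same; only the source (API, stream, cloud storage) and the load target (a data warehouse or data lake) change.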
Hello Srini
Just read your article on vault vs. vaultless. This question can only be answered depending on the vault itself - was it built to be scalable? Does it store every transaction? Quite simply, no it does not, but like I say, it all depends on how the vault was built. Is it more secure than vaultless? Definitely.
Vault-less is a reversible security method that replaces sensitive data with fake data that looks and feels just like the real thing. So vault-less is more advanced than vault.
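For readers following the vault vs. vaultless thread, here is a hedged, illustrative sketch of the difference: a vault tokenizer stores a random token-to-value mapping in a lookup table, while a vault-less tokenizer derives a reversible, format-preserving token from the value and a secret key alone. The class names, the key, and the toy Feistel construction are assumptions for illustration; production systems use standardized format-preserving encryption (e.g., NIST FF1) rather than this simplified scheme.

# Toy contrast between vault-based and vault-less tokenization. Illustration only.
import hmac
import hashlib
import secrets

def _prf(key: bytes, data: str, rnd: int, mod: int) -> int:
    # Keyed round function: HMAC-SHA256 reduced modulo the half-domain size.
    digest = hmac.new(key, f"{rnd}:{data}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % mod

def vaultless_tokenize(key: bytes, digits: str, rounds: int = 10) -> str:
    # Balanced Feistel over an even-length digit string; reversible with the key alone, no lookup table.
    half = len(digits) // 2
    mod = 10 ** half
    left, right = digits[:half], digits[half:]
    for rnd in range(rounds):
        new_right = (int(left) + _prf(key, right, rnd, mod)) % mod
        left, right = right, str(new_right).zfill(half)
    return left + right

def vaultless_detokenize(key: bytes, token: str, rounds: int = 10) -> str:
    # Run the Feistel rounds in reverse to recover the original digits.
    half = len(token) // 2
    mod = 10 ** half
    left, right = token[:half], token[half:]
    for rnd in reversed(range(rounds)):
        prev_left = (int(right) - _prf(key, left, rnd, mod)) % mod
        left, right = str(prev_left).zfill(half), left
    return left + right

class VaultTokenizer:
    # Vault-based approach: tokens are random and the real value lives in a lookup table,
    # so the vault must store, protect, and scale with every mapping it issues.
    def __init__(self) -> None:
        self._vault: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "".join(secrets.choice("0123456789") for _ in range(len(value)))
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

if __name__ == "__main__":
    key = b"demo-secret-key"        # hypothetical key
    card = "4111111111111111"       # even-length digit string for the toy scheme
    token = vaultless_tokenize(key, card)
    assert vaultless_detokenize(key, token) == card
    print(card, "->", token)

The contrast mirrors the comments above: the vault must hold and scale with every mapping, while the vault-less approach carries no table and reverses the token with the key alone.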