
Showing posts with the label features

Featured Post

14 Top Data Pipeline Key Terms Explained

Here are some key terms commonly used in data pipelines:

1. Data Sources
Definition: Points where data originates (e.g., databases, APIs, files, IoT devices).
Examples: Relational databases (PostgreSQL, MySQL), APIs, cloud storage (S3), streaming data (Kafka), and on-premises systems.

2. Data Ingestion
Definition: The process of importing or collecting raw data from various sources into a system for processing or storage.
Methods: Batch ingestion, real-time/streaming ingestion.

3. Data Transformation
Definition: Modifying, cleaning, or enriching data to make it usable for analysis or storage.
Examples:
· Data cleaning (removing duplicates, fixing missing values).
· Data enrichment (joining with other data sources).
· ETL (Extract, Transform, Load).
· ELT (Extract, Load, Transform).

4. Data Storage
Definition: Locations where data is stored after ingestion and transformation.
Types:
· Data Lakes: Store raw, unstructured, or semi-structured data (e.g., S3, Azure Data Lake).
· Data Warehous...
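The transformation step (term 3) can be sketched in plain Python: remove duplicates, fix missing values, then enrich by joining with a second source. This is a minimal illustration, assuming hypothetical in-memory order records and a hypothetical customer lookup table; a real pipeline would more likely do this in SQL or a processing framework.

```python
# Hypothetical raw records from an ingestion step; field names are illustrative only.
RAW_ORDERS = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},  # exact duplicate
    {"order_id": 2, "customer_id": "c2", "amount": None},  # missing value
    {"order_id": 3, "customer_id": "c9", "amount": 5.5},   # customer not in lookup
]

# Hypothetical second source used for enrichment.
CUSTOMERS = {"c1": "Alice", "c2": "Bob"}


def deduplicate(rows):
    """Data cleaning: keep only the first row seen for each order_id."""
    seen = set()
    out = []
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            out.append(row)
    return out


def fill_missing(rows, default_amount=0.0):
    """Data cleaning: replace a missing amount with a default (one possible policy)."""
    return [
        {**row, "amount": row["amount"] if row["amount"] is not None else default_amount}
        for row in rows
    ]


def enrich(rows, customers):
    """Data enrichment: join each row with the customer lookup (None if unknown)."""
    return [{**row, "customer_name": customers.get(row["customer_id"])} for row in rows]


def transform(rows):
    """The 'T' in ETL: clean first, then enrich."""
    return enrich(fill_missing(deduplicate(rows)), CUSTOMERS)
```

Calling `transform(RAW_ORDERS)` returns three rows: the duplicate order 1 is dropped, order 2's missing amount becomes 0.0, and order 3 gets `customer_name` set to `None` because `c9` has no match. Defaulting missing values to 0.0 is only one choice; dropping or flagging such rows are common alternatives.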

5 Killer Quora Answers on Amazon EBS

Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud. Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability. Amazon EBS volumes offer the consistent, low-latency performance needed to run your workloads. With Amazon EBS, you can scale your usage up or down within minutes, all while paying a low price for only what you provision.

Amazon EBS features:
You can choose between solid-state drive (SSD)-backed and hard disk drive (HDD)-backed volumes that deliver the performance you need for your most demanding applications.
Availability: Each Amazon EBS volume is designed for 99.999% availability and automatically replicates within its Availability Zone to protect your applications from component failure.
Encryption: Amazon EBS encryption provides seamless support for data-at-rest and data-in-transit...
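To make the 99.999% availability figure concrete, here is the arithmetic for the maximum downtime it implies. This is only a calculation sketch, assuming a 365-day year (or a 30-day month); it says nothing about how AWS actually measures or reports availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def max_downtime_minutes(availability_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum unavailable time allowed by an availability target over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return period_minutes * unavailable_fraction
```

For 99.999% over a 365-day year this works out to about 5.26 minutes of downtime; over a 30-day month it is about 0.43 minutes (roughly 26 seconds).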

Beginner's Tutorial on SAS Visual Analytics

SAS Visual Analytics is a completely new architecture from SAS. It can manage large amounts of data and bring them into memory to analyze, explore, and publish reports. Although the data volumes are massive (up to 1.1 billion rows), the SAS LASR Analytic Server, to use its full name, was designed to be intuitive for users without an advanced degree in computer science. A report from Simply Hired.

All about the SAS LASR Analytic Server: The reference configurations start with an eight-blade server with 96 processor cores, 768 gigabytes (GB) of memory, and 4.8 terabytes (TB) of disk storage. The upper end of the reference configurations is 96 blades with 1,152 cores, 9.2 TB of memory, and 57.6 TB of disk storage, enough disk space to store the entire Library of Congress six times.

Where to Learn SAS Visual Analytics

The real SAS Visual Analytics benefi...
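The two reference configurations quoted above scale linearly per blade. The short calculation below makes that explicit; it is a sketch that only reuses the figures in this post and assumes every blade in the larger configuration matches the entry-level blade.

```python
# Entry-level SAS LASR Analytic Server reference configuration, as quoted in the post.
ENTRY_BLADES = 8
ENTRY_CORES = 96
ENTRY_MEMORY_GB = 768
ENTRY_DISK_TB = 4.8

UPPER_BLADES = 96  # upper-end configuration blade count, as quoted


def per_blade(total: float, blades: int) -> float:
    """Share of a resource per blade."""
    return total / blades


def scale(blades: int):
    """Project (cores, memory in GB, disk in TB) for a blade count,
    assuming each blade equals the entry-level blade."""
    cores = per_blade(ENTRY_CORES, ENTRY_BLADES) * blades
    memory_gb = per_blade(ENTRY_MEMORY_GB, ENTRY_BLADES) * blades
    disk_tb = per_blade(ENTRY_DISK_TB, ENTRY_BLADES) * blades
    return cores, memory_gb, disk_tb
```

Each entry-level blade carries 12 cores, 96 GB of memory, and 0.6 TB of disk. Multiplied out to 96 blades, that gives 1,152 cores, 9,216 GB (about 9.2 TB) of memory, and about 57.6 TB of disk, matching the upper-end figures.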

5 Best Features and Development Model for Agile

In the Agile development model, each sprint has requirements, design, development, and testing phases. In the development phase, the development team concentrates on the new features to be developed and the unit tests around them, but it misses regression testing of existing working functionality. This leads to defect seepage from the development phase to the test phase. The consequence is late defect identification, reporting, fixing, and re-verification, and this defect cycle continues until the defect is fixed.

Drawback of Agile

The drawback of this approach is that project teams put extra effort into identifying and reporting defects. Late identification of defects also increases the risk of schedule slippage. This article proposes a mechanism to minimize defect leakage from the development phase to the testing phase in the agile software development life cycle: moving the automated regression scripts from the testing phase into the development phase.

The advantages of this approac...
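The proposed mechanism, running the automated regression scripts during development instead of waiting for the test phase, can be sketched with Python's built-in `unittest`. This is a minimal illustration, not the article's actual tooling: `apply_discount`, the test classes, and `development_gate` are hypothetical names standing in for real feature code and a real regression suite.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical feature code being changed in the current sprint."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class NewFeatureTests(unittest.TestCase):
    """Unit tests the developer writes for this sprint's new behaviour."""

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 150)


class RegressionTests(unittest.TestCase):
    """Automated regression scripts that previously ran only in the test phase."""

    def test_existing_discount_behaviour(self):
        self.assertEqual(apply_discount(100.0, 10), 90.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(59.99, 0), 59.99)


def development_gate() -> bool:
    """Run the new-feature tests and the regression suite together.

    Returns True only if every test passes, so a developer (or a build step)
    sees regressions before the code is handed over to testing.
    """
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(NewFeatureTests))
    suite.addTests(loader.loadTestsFromTestCase(RegressionTests))
    result = unittest.TextTestRunner(verbosity=1).run(suite)
    return result.wasSuccessful()
```

In a build job, a `False` result from `development_gate()` would typically be turned into a non-zero exit status so the change is blocked before it reaches the test phase.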

Top features of Apache Avro in Hadoop eco-System

Avro defines a data format designed to support data-intensive applications and provides support for this format in a variety of programming languages. The Hadoop ecosystem includes a new binary data serialization system: Avro. Avro provides:
· Rich data structures.
· A compact, fast, binary data format.
· A container file, to store persistent data.
· Remote procedure call (RPC).
· Simple integration with dynamic languages.
Code generation is not required to read or write data files or to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages. Its functionality is similar to other marshaling systems such as Thrift and Protocol Buffers. The main differentiators of Avro...
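Part of what makes Avro's binary format compact is how values are encoded: per the Avro specification, `int` and `long` use zig-zag encoding followed by a variable-length base-128 encoding, strings are a length followed by UTF-8 bytes, and a record is just its field values in schema order, with no field names or tags. The stdlib-only sketch below illustrates those rules; the `User` schema is a made-up example, and in practice you would use an Avro library (for example the official `avro` package or `fastavro`) rather than hand-encoding.

```python
# Hypothetical schema, shown for reference; the encoder below hard-codes its field order.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}


def encode_long(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then 7-bit groups, low-order group first."""
    if not -(1 << 63) <= n < (1 << 63):
        raise OverflowError("value does not fit in an Avro long")
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string encoding: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(record: dict) -> bytes:
    """Encode a User record: field values in schema order, no names or tags."""
    return encode_string(record["name"]) + encode_long(record["age"])
```

For example, `encode_user({"name": "Ann", "age": 30})` produces just 5 bytes: one length byte, the three bytes of "Ann", and one byte for 30. The reader relies on the schema (which Avro stores in the container file) to know how to decode them.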

Essential features of Hadoop Data joins (1 of 2)

Limitation of map-side joining: A record being processed by a mapper may need to be joined with a record that is not easily accessible to (or even locatable by) that mapper. This is the main limitation.

Who facilitates map-side joins: Hadoop's org.apache.hadoop.mapred.join package contains helper classes to facilitate map-side joins.

What is joining data in Hadoop: You will often need to analyze data from multiple sources; in that scenario, Hadoop joins the data. In the database world, combining two or more tables is called a join. In Hadoop, joining data can follow different approaches:
· Reduce-side join
· Replicated join using the distributed cache
· Semijoin: reduce-side join with map-side filtering

What is the functionality of a MapReduce job: The traditional MapReduce job reads a set of input data, performs some transformations in the map phase, sorts the results, performs another transformation in the reduce phase, and writes a set of output data. The...
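The first two approaches listed above can be simulated in a single Python process to show where the join actually happens. This is a sketch with made-up datasets, not Hadoop API code: a real job would use Mapper and Reducer classes, and for the replicated join the small dataset would be shipped to every node via the distributed cache.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical inputs: (customer_id, name) and (customer_id, order_total).
CUSTOMERS = [("c1", "Alice"), ("c2", "Bob")]
ORDERS = [("c1", 20.0), ("c2", 7.5), ("c1", 3.0), ("c3", 9.9)]


def reduce_side_join(customers, orders):
    """Reduce-side join: map tags records by source, shuffle groups by key, reduce joins."""
    # Map phase: emit (join_key, (source_tag, value)).
    mapped = chain(
        ((cid, ("C", name)) for cid, name in customers),
        ((cid, ("O", total)) for cid, total in orders),
    )
    # Shuffle/sort phase: group all tagged values by join key.
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)
    # Reduce phase: inner join of customer and order values within each key.
    joined = []
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "C"]
        totals = [v for tag, v in groups[key] if tag == "O"]
        joined.extend((key, n, t) for n in names for t in totals)
    return joined


def replicated_join(customers, orders):
    """Replicated (map-side) join: the small dataset sits in memory in each mapper,
    so every order is joined as it is read and no reduce phase is needed."""
    lookup = dict(customers)
    return [(cid, lookup[cid], total) for cid, total in orders if cid in lookup]
```

Both functions return the same three joined rows (order `c3` has no matching customer and is dropped). The replicated join avoids the shuffle entirely, which is why it is attractive when one side is small enough to fit in memory on every node.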

5 Top features of Columnar Databases (1 of 2)

The traditional RDBMS: Since the days of punch cards and magnetic tapes, files have been physically contiguous bytes that are accessed from start (open file) to finish (end-of-file flag = TRUE). Yes, the storage could be split up on a disk and the assorted data pages connected by pointer chains, but it is still the same model. The file is then broken into records (more physically contiguous bytes), and records are broken into fields (still more physically contiguous bytes). A file is processed record by record (read/fetch next) or navigated sequentially by physical storage location (go to end of file, go back/forward n records, follow a pointer chain, etc.). There is no parallelism in this model. There is also an assumption of a physical ordering of the records within the file and an ordering of fields within the records. A lot of time and resources have been spent sorting records to make this access practical; you did not do random access on a magnetic tape and you co...
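The contrast between this record-by-record model and a columnar layout can be sketched with in-memory Python lists. This is a toy model, assuming a made-up three-column table; "values touched" is only a stand-in for I/O and is not how any particular database measures cost.

```python
# Hypothetical table stored row by row, as in the traditional model.
COLUMNS = ("name", "dept", "salary")
ROWS = [
    ("alice", "sales", 5200),
    ("bob", "eng", 6100),
    ("carol", "eng", 7000),
]


def to_columns(rows, columns=COLUMNS):
    """Pivot the row store into one list per column (a columnar layout)."""
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}


def total_salary_row_store(rows, columns=COLUMNS):
    """Aggregate one field from a row store: every whole record is read."""
    idx = columns.index("salary")
    values_touched = 0
    total = 0
    for row in rows:
        values_touched += len(row)  # the full record is fetched
        total += row[idx]
    return total, values_touched


def total_salary_column_store(columns):
    """Aggregate the same field from a column store: only that column is read."""
    salaries = columns["salary"]
    return sum(salaries), len(salaries)
```

Both functions return a total of 18,300, but the row-store version touches 9 values while the column-store version touches only 3. The gap grows with the number of columns, which is the basic reason columnar databases suit analytic queries over a few columns of wide tables.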

5 Top features of MongoDB

The most important of the philosophies that underpin MongoDB is the notion that one size does not fit all. For many years, traditional SQL databases have been used for storing content of all types (MongoDB, by contrast, is a document-oriented database). It didn't matter whether the data was a good fit for the relational model (which is used in all RDBMSs, such as MySQL, PostgreSQL, SQLite, Oracle, MS SQL Server, and so on); the data was stuffed in there anyway.

Purpose

Part of the reason for this is that, generally speaking, it's much easier (and more secure) to read and write to a database than it is to write to a file system. If you pick up any book that teaches PHP (such as PHP for Absolute Beginners by Jason Lengstorf (Apress, 2009)), you'll probably find that almost right away the database is used to store information, not the file system. It's just so much easier to do things that way. And while using a database as a storage bin works, developers always...