
The awesome points to learn from DB2 NoSQL GraphStore

Before looking at DB2's graph store, it helps to understand RDF: if the graph data model is the model the Semantic Web uses to store data, RDF is the format in which that data is written.


Summary of DB2 Graph Store:
  • DB2-RDF support is officially called "NoSQL Graph Support".  
  • The API extends the Jena API (Graph layer).  Developers familiar with Jena TDB will have the Model layer capabilities they are accustomed to.
  • Although the DB2-RDF functionality is being released with DB2 LUW 10.1, it is also compatible with DB2 9.7.
  • Full support for SPARQL 1.0 and a subset of SPARQL 1.1. Full SPARQL 1.1 support (SPARQL 1.1 is still a W3C Working Draft) will follow.
  • Relational (RDBMS) implementations of RDF graphs have typically performed poorly, but that is not the case here. Considerable, innovative optimization work has gone into this implementation: out-of-the-box performance is comparable to native triple stores, and read/write performance with the optimized schema has been observed to exceed them.
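The SPARQL support mentioned above centers on basic graph patterns: triple patterns containing variables (written `?name`) that are matched against the stored triples and joined on shared variables. The following is a minimal, hypothetical in-memory sketch of that idea in Python; it is not how DB2 or Jena actually evaluate queries, and the `ex:` and `foaf:` prefixed names are placeholder strings rather than real IRIs.

```python
# Minimal sketch of SPARQL-style basic graph pattern (BGP) matching.
# Hypothetical illustration only: not DB2's or Jena's query engine.
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # In SPARQL syntax, variables start with '?'
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def evaluate_bgp(patterns: List[Triple], data: Iterable[Triple]) -> List[Binding]:
    """Join the triple patterns left to right with a naive nested-loop join."""
    triples = list(data)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
# Roughly: SELECT ?friend ?name
#          WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
rows = evaluate_bgp(
    [("ex:alice", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")],
    data,
)
print(rows)  # [{'?friend': 'ex:bob', '?name': '"Bob"'}]
```

Real query engines replace this naive nested-loop join with indexed lookups and join-order optimization.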
Related: Presentation on DB2 NoSQL Graph Store

What is the RDF data model? (ref: Wikipedia)

The RDF data model is similar to classical conceptual modeling approaches such as entity–relationship or class diagrams, as it is based upon the idea of making statements about resources (in particular web resources) in the form of subject–predicate–object expressions.  


These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue". RDF therefore uses the term subject where the classical entity–attribute–value model of object-oriented design would use entity: entity (sky), attribute (color), and value (blue). RDF is an abstract model with several serialization formats (i.e., file formats), so the particular way in which a resource or triple is encoded varies from format to format.
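To make the triple and the entity–attribute–value comparison concrete, here is a small Python sketch that maps the "sky" statement onto a subject–predicate–object triple and writes it out in one serialization format, N-Triples. The `http://example.org/` IRIs and the `hasColor` predicate are made-up placeholders, and the literal escaping covers only backslashes, double quotes and newlines.

```python
# Sketch: an entity-attribute-value (EAV) row mapped onto an RDF-style triple,
# then serialized as one N-Triples line. The example.org IRIs are placeholders.
from typing import NamedTuple


class EAVRow(NamedTuple):
    entity: str
    attribute: str
    value: str


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the trait or relationship
    obj: str        # the RDF object (here a plain literal)


def eav_to_triple(row: EAVRow) -> Triple:
    # entity -> subject, attribute -> predicate, value -> object
    return Triple(row.entity, row.attribute, row.value)


def escape_literal(text: str) -> str:
    # Minimal escaping only: backslash, double quote, newline.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(t: Triple) -> str:
    """IRIs go in angle brackets, the literal in double quotes, ending with ' .'"""
    return f'<{t.subject}> <{t.predicate}> "{escape_literal(t.obj)}" .'


row = EAVRow("http://example.org/sky", "http://example.org/hasColor", "blue")
print(to_ntriples(eav_to_triple(row)))
# <http://example.org/sky> <http://example.org/hasColor> "blue" .
```

Turtle or RDF/XML would encode the same triple with different syntax; the underlying subject, predicate and object stay the same.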


This mechanism for describing resources is a major component in the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and use machine-readable information distributed throughout the Web, in turn enabling users to deal with the information with greater efficiency and certainty. 

RDF's simple data model and its ability to model disparate, abstract concepts have also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.
A collection of RDF statements intrinsically represents a labeled, directed multigraph. As such, an RDF-based data model is more naturally suited to certain kinds of knowledge representation than the relational model and other ontological models. In practice, however, RDF data is often persisted either in a relational database or in native stores called triplestores, or quad stores if the context (i.e., the named graph) is also persisted for each RDF triple.[3]

ShEx, or Shape Expressions,[4] is a language for expressing constraints on RDF graphs. It includes the cardinality constraints from OSLC Resource Shapes and Dublin Core Description Set Profiles, as well as logical connectives for disjunction and polymorphism. As RDFS and OWL demonstrate, one can build additional ontology languages upon RDF.
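The remark that RDF data is often kept in relational tables, as triples or as quads that add a named-graph column, can be illustrated with Python's built-in `sqlite3` module. This is a generic, hypothetical schema (a single `quads` table with columns `s`, `p`, `o`, `g`), not the schema DB2's NoSQL Graph Store uses; the sketch also shows how the stored rows read back as a labeled, directed multigraph.

```python
# Hypothetical quad-store sketch on SQLite: one row per (subject, predicate,
# object, graph). Not DB2's actual RDF schema.
import sqlite3
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def create_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The primary key gives set semantics: a statement is stored once per graph.
    conn.execute(
        "CREATE TABLE quads ("
        " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, g TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o, g))"
    )
    return conn


def add_quad(conn: sqlite3.Connection, s: str, p: str, o: str, g: str = "default") -> None:
    conn.execute("INSERT OR IGNORE INTO quads (s, p, o, g) VALUES (?, ?, ?, ?)", (s, p, o, g))


def objects_of(conn: sqlite3.Connection, s: str, p: str, g: Optional[str] = None) -> List[str]:
    """Objects for (s, p), across all graphs or within one named graph."""
    if g is None:
        rows = conn.execute(
            "SELECT DISTINCT o FROM quads WHERE s = ? AND p = ? ORDER BY o", (s, p)
        )
    else:
        rows = conn.execute(
            "SELECT o FROM quads WHERE s = ? AND p = ? AND g = ? ORDER BY o", (s, p, g)
        )
    return [r[0] for r in rows]


def as_multigraph(conn: sqlite3.Connection) -> Dict[str, List[Tuple[str, str]]]:
    """Node -> list of (edge label, target); parallel edges with different labels are kept."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in conn.execute("SELECT s, p, o FROM quads ORDER BY rowid"):
        graph[s].append((p, o))
    return dict(graph)


conn = create_store()
add_quad(conn, "ex:alice", "ex:knows", "ex:bob", g="ex:graph1")
add_quad(conn, "ex:alice", "ex:worksWith", "ex:bob", g="ex:graph1")
add_quad(conn, "ex:alice", "ex:knows", "ex:carol", g="ex:graph2")
add_quad(conn, "ex:alice", "ex:knows", "ex:bob", g="ex:graph1")  # duplicate, ignored
print(objects_of(conn, "ex:alice", "ex:knows"))               # ['ex:bob', 'ex:carol']
print(objects_of(conn, "ex:alice", "ex:knows", "ex:graph2"))  # ['ex:carol']
print(as_multigraph(conn)["ex:alice"])
# [('ex:knows', 'ex:bob'), ('ex:worksWith', 'ex:bob'), ('ex:knows', 'ex:carol')]
```

Dropping the `g` column gives a plain triple table. Production stores typically add several indexes over different column orders so that lookups by subject, predicate or object all stay fast.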
