Posts

Showing posts with the label non-word

Featured Post

14 Top Data Pipeline Key Terms Explained

Here are some key terms commonly used in data pipelines:

1. Data Sources
Definition: Points where data originates (e.g., databases, APIs, files, IoT devices).
Examples: Relational databases (PostgreSQL, MySQL), APIs, cloud storage (S3), streaming data (Kafka), and on-premise systems.

2. Data Ingestion
Definition: The process of importing or collecting raw data from various sources into a system for processing or storage.
Methods: Batch ingestion, real-time/streaming ingestion.

3. Data Transformation
Definition: Modifying, cleaning, or enriching data to make it usable for analysis or storage.
Examples: Data cleaning (removing duplicates, fixing missing values). Data enrichment (joining with other data sources). ETL (Extract, Transform, Load). ELT (Extract, Load, Transform).

4. Data Storage
Definition: Locations where data is stored after ingestion and transformation.
Types:
Data Lakes: Store raw, unstructured, or semi-structured data (e.g., S3, Azure Data Lake).
Data Warehous...

How to Find Non-word Character: Python Regex Example

In Python, the regular expression pattern \W (upper case W) matches any non-word character, that is, any character that \w does not match. The valid word characters are [a-zA-Z0-9_]; for str patterns, Python 3 also counts Unicode letters and digits as word characters unless the re.ASCII flag is set.

Regex examples to find non-word characters

#1 Example

    import re

    text = "Hello, world! How are you today?"
    # Collect every character that is not a letter, digit, or underscore
    non_words = re.findall(r'\W', text)
    print(non_words)

In the above example, the re.findall() function finds all non-word characters in the text string using the regular expression pattern \W. The output is a list of the non-word characters, in the order they appear in the string:

Output

    [',', ' ', '!', ' ', ' ', ' ', ' ', '?']

This includes punctuation marks and every space, but excludes letters, digits, and underscores, which are considered word characters in regular expressions.

#2 Example

    import re

    text = "Hello, world! How are non-word-char:! you today?"
    # Match the literal prefix followed by exactly one non-word character
    non_words = re.findall(r'non-word-char:\W', text)
    print(non_words)

Output

    ['non-wo...
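As a further illustration of how \W behaves, here is a minimal sketch that is not part of the original post. It reuses the sample sentence from example #1; the string "caf\u00e9_1" and the variable names are assumptions chosen only for demonstration. It shows \W+ used as a separator with re.split(), and how the re.ASCII flag changes which characters count as non-word.

```python
import re

text = "Hello, world! How are you today?"

# Every single character that is not a letter, digit, or underscore.
non_words = re.findall(r"\W", text)
print(non_words)

# Splitting on runs of non-word characters leaves only the words;
# the filter drops the empty string produced by the trailing "?".
words = [w for w in re.split(r"\W+", text) if w]
print(words)

# Hypothetical sample containing a non-ASCII letter (e with acute accent).
sample = "caf\u00e9_1"

# Default (Unicode) matching: the accented letter is a word character.
unicode_hits = re.findall(r"\W", sample)
print(unicode_hits)

# With re.ASCII, only [a-zA-Z0-9_] are word characters,
# so the accented letter is reported as a non-word character.
ascii_hits = re.findall(r"\W", sample, re.ASCII)
print(ascii_hits)
```

If the input may contain accented or non-Latin text, the choice between the default behavior and re.ASCII decides whether those letters are treated as part of a word, so it is worth setting explicitly.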