Posts

Showing posts with the label Spark

Featured Post

15 Python Tips: How to Write Code Effectively

Here are some Python tips that will help you write clean, efficient, and bug-free code.

Python Tips for Effective Coding

1. Code Readability and PEP 8
Always aim for clean and readable code by following the PEP 8 guidelines. Use meaningful variable names, avoid excessively long lines (stick to 79 characters), and organize imports properly.

2. Use List Comprehensions
List comprehensions are concise and often faster than regular for-loops. Example: squares = [x**2 for x in range(10)] instead of creating an empty list and appending each square value.

3. Take Advantage of Python's Built-in Libraries
Libraries like itertools, collections, math, and datetime provide powerful functions and data structures that can simplify your code. For example, collections.Counter can quickly count elements in a list, and itertools.chain.from_iterable can flatten a list of lists by one level.

4. Use enumerate Instead of Range
When you need both the index and the value in a loop, enumerate is a more Pyth
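Tips 2 to 4 in the excerpt above can be sketched in a few lines. This is a minimal illustration, not code from the original post: the `words` and `nested` sample data are made up for the example, and `chain.from_iterable` is used as one common way to flatten a list of lists by a single level.

```python
from collections import Counter
from itertools import chain

# Tip 2: build the list with a comprehension instead of append() in a loop.
squares = [x**2 for x in range(10)]

# Tip 3: Counter tallies hashable items (hypothetical sample data).
words = ["spark", "hive", "spark", "hdfs", "spark", "hive"]
counts = Counter(words)

# Tip 3: chain.from_iterable flattens exactly one level of nesting.
nested = [[1, 2], [3, 4], [5]]
flat = list(chain.from_iterable(nested))

# Tip 4: enumerate yields (index, value) pairs, so no range(len(...)) is needed.
labeled = [f"{i}: {word}" for i, word in enumerate(words[:3])]

print(squares)
print(counts.most_common(1))
print(flat)
print(labeled)
```

Note that `chain.from_iterable` only removes one level of nesting; deeper structures would need a recursive helper.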

Spark SQL Query: How to Write It in Ten Steps

Spark SQL example

This post explains how to write a SQL query in Spark in ten steps. The example demonstrates how to use sqlContext.sql to create and load two tables and select rows from the tables into two DataFrames. The next steps use the DataFrame API to filter one of the tables for rows with salaries greater than 150,000 and show the resulting DataFrame. Then the two DataFrames are joined to create a third DataFrame. Finally, the new DataFrame is saved to a Hive table.

1. At the command line, copy the Hue sample_07 and sample_08 CSV files to HDFS:
   $ hdfs dfs -put HUE_HOME/apps/beeswax/data/sample_07.csv /user/hdfs
   $ hdfs dfs -put HUE_HOME/apps/beeswax/data/sample_08.csv /user/hdfs
   where HUE_HOME defaults to /opt/cloudera/parcels/CDH/lib/hue (parcel installation) or /usr/lib/hue (package installation).

2. Start spark-shell:
   $ spark-shell

3. Create Hive tables sample_07 and sample_08:
   scala> sqlContext.sql("CREATE TABLE sample_07 (code string

SPARK is Replacement for MapReduce in Bigdata Real Analytics!

Apache Spark is among the Hadoop ecosystem technologies acting as catalysts for broader adoption of big data infrastructure. Now Looker, a vendor of business intelligence software, has announced support for Spark and other Hadoop technologies. The goal? To speed up access to the data that fuels business decision making.

SPARK Jobs

Hadoop's arrival on the scene 10 years ago may have started the big data revolution, but only recently did adoption of this technology begin spreading to a wider audience. Apache Spark is one of the catalysts for the growing adoption rates. Spark can be used as a replacement for MapReduce, a component of Hadoop implementations, to speed up the processing and analytics of big data by up to 100x in memory, according to the Apache Software Foundation.

In today's business environment, in which real-time analytics is the goal and organizations don't want to wait for data warehouses and analysts to provide batch intelligence back to business u

Hot Skills: Spark Self Study Materials

Spark: With job postings up 120% year-over-year on Dice, demand for this open-source cluster-computing framework is broad-based. Government contractors and financial-services firms are just a few of the groups eager to find candidates with this skillset.
2015 Average Salary: $113,214
Related: SPARK Self Study Materials Spark

Big Data and Cloud: As companies expand their tech infrastructures, they need cloud and Big Data services such as Azure (#2), Hive (#8), and Cassandra (#9) for data storage, analysis, and security. Big Data and cloud-related skills dominated the Highest-Paid Skills list on Dice's salary survey for the second straight year.
2015 Average Salary: Big Data: $121,328; Azure: $110,207

Salesforce: This customer-service platform serves as the bedrock for many companies' customer service departments. Demand for Salesforce professionals seems unlikely to decline anytime soon. Employers are even willing to offer telecommuting options to lure Salesforce talent. 2