Featured Post

14 Top Data Pipeline Key Terms Explained

Image
 Here are some key terms commonly used in data pipelines 1. Data Sources Definition: Points where data originates (e.g., databases, APIs, files, IoT devices). Examples: Relational databases (PostgreSQL, MySQL), APIs, cloud storage (S3), streaming data (Kafka), and on-premise systems. 2. Data Ingestion Definition: The process of importing or collecting raw data from various sources into a system for processing or storage. Methods: Batch ingestion, real-time/streaming ingestion. 3. Data Transformation Definition: Modifying, cleaning, or enriching data to make it usable for analysis or storage. Examples: Data cleaning (removing duplicates, fixing missing values). Data enrichment (joining with other data sources). ETL (Extract, Transform, Load). ELT (Extract, Load, Transform). 4. Data Storage Definition: Locations where data is stored after ingestion and transformation. Types: Data Lakes: Store raw, unstructured, or semi-structured data (e.g., S3, Azure Data Lake). Data Warehous...

How to write R Script in simple way

A script is a good way to keep track of what you're doing. If you have a long analysis, and you want to be able to recreate it later, a good idea is to type it into a script. If you're working in the Windows R GUI (also in the Mac R GUI), there is even a built-in script editor.

#How-to-write-RScript:
Photo credit: Srini
To get to it, pull down the File menu and choose New Script (New Document on a Mac). A window will open in which you can type your script. R Script is a series of commands that you can execute at one time and you can save a lot of time. the script is just a plain text file with R commands in it.

How to create an R Script

  1. You can prepare a script in any text editor, such as vim, TextWrangler, or Notepad.
  2. You can also prepare a script in a word processor, like Word, Writer, TextEdit, or WordPad, PROVIDED you save the script in plain text (ASCII) format.
  3. This should (!) append a ".txt" file extension to the file.
  4. Drop the script into your working directory, and then read it into R using the source() function.
  5. Just put the .txt file into your working directory
  6. Now that you've got it in your working directory one way or another, do this in R.
> source(file = "sample_script.txt") # Don't forget those quotes!
A note: This may not have worked. And the reason for that is, your script may not have had the name "sample_script.txt".
if you make sure the file has the correct name, R will read it. If the file is in your working directory, type dir() at the command prompt, and R will show you the full file name.
Also, R does not like spaces in script names, so don't put spaces in your script names! (In newer versions of R, this is no longer an issue.)

What is all about the script you have written

Example:
# A comment: this is a sample script.
y=c(12,15,28,17,18)
x=c(22,39,50,25,18)
mean(y)
mean(x)
plot(x,y)
What happened to the mean of "y" and the mean of "x"?
The script has created the variables "x" and "y" in your workspace (and has erased any old objects you had by that name).
You can see them with the ls( ) function.

Executing a script does everything typing those commands in the Console would do, EXCEPT print things to the Console. Do this.

> x
[1] 22 39 50 25 18
> mean(x)
[1] 30.8
See? It's there. But if you want to be sure a script will print it to the Console, you should use the print() function.
> print(x)
[1] 22 39 50 25 18
> print(mean(x))
[1] 30.8
When you're working in the Console, the print() is understood (implicit) when you type a command or data object name. This is not necessarily so in a script.
  • Hit the Enter key after the last line. Now, in the editor window, pull down the Edit menu and choose Run All. (On a Mac, highlight all the lines of the script and choose Execute.) The script should execute in your R Console.
  • Pull down the File Menu and choose Save As... Give the file a nice name, like "script2.txt". R will NOT save it by default with a file extension, so be sure you give it one. (Note: On my Mac, the script editor in R will not let me save the script with a .txt extension. It insists that I use .R. Fine!) Close the editor window. Now, in the R Console, do this:
> source(file = "script2.txt") # or source(file = "script2.R") if that's how you saved it
The "aov.out" object was created in your workspace. However, nothing was echoed to your Console because you didn't tell it to print().
Go to File and choose New Script (New Document on a Mac). In the script editor, pull down File and choose Open Script... (Open Document... on a Mac). In the Open Script dialog that appears, change Files Of Type to all files (not necessary on a Mac). Then choose to open "script2.txt" (or "script2.R", whatever!). Edit it to look like this.
print(with(PlantGrowth, tapply(weight, group, mean)))
with(PlantGrowth, aov(weight ~ group)) -> aov.out
print(summary.aov(aov.out))
print(summary.lm(aov.out))
Pull down File and choose Save. Close the script editor window(s). And FINALLY...
> source(file = "script2.txt") # or source(file = "script2.R") if necessary
Finally, writing scripts is simple.

Comments

Popular posts from this blog

How to Fix datetime Import Error in Python Quickly

SQL Query: 3 Methods for Calculating Cumulative SUM

Big Data: Top Cloud Computing Interview Questions (1 of 4)