Posts

Showing posts with the label ETL questions

Featured Post

Top Questions People Ask About Pandas, NumPy, Matplotlib & Scikit-learn — Answered!

Whether you're a beginner or brushing up on your skills, these are the real-world questions Python learners ask most about key data science libraries. Let's dive in! 🐍

🐼 Pandas: Data Manipulation Made Easy

1. How do I handle missing data in a DataFrame?

    df.fillna(0)       # Replace NaNs with 0
    df.dropna()        # Remove rows with NaNs
    df.isna().sum()    # Count missing values per column

2. How can I merge or join two DataFrames?

    pd.merge(df1, df2, on='id', how='inner')  # how: inner, left, right, outer

3. What is the difference between loc[] and iloc[]?

loc[] selects by label (index labels and column names); iloc[] selects by integer position.

    df.loc[0, 'name']  # label-based
    df.iloc[0, 1]      # position-based

4. How do I group data and perform aggregation?

    df.groupby('category')['sales'].sum()

5. How can I convert a column to datetime format?

    df['date'] = pd.to_datetime(df['date'])

...
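The five snippets above can be run end to end on a small example. This is only a minimal sketch: the `orders` and `customers` frames and all of their values are hypothetical, invented here for illustration, and are not taken from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
orders = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": ["a", "b", "a", "b"],
    "sales": [10.0, None, 30.0, 40.0],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
})
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

# 1. Missing data: count NaNs per column, then fill the sales gap with 0.
missing_per_column = orders.isna().sum()
filled = orders.fillna({"sales": 0})

# 2. Inner join on 'id': order 4 has no matching customer, so it is dropped.
merged = pd.merge(filled, customers, on="id", how="inner")

# 3. loc[] takes labels, iloc[] takes integer positions.
first_name_by_label = merged.loc[0, "name"]
first_name_by_position = merged.iloc[0, merged.columns.get_loc("name")]

# 4. Group by category and total the sales.
sales_by_category = merged.groupby("category")["sales"].sum()

# 5. Convert the string column to a datetime column.
merged["date"] = pd.to_datetime(merged["date"])
```

Here `loc` and `iloc` return the same value because the merged frame has the default integer index, so row label 0 is also row position 0. On a frame with a different index the two calls can select different rows.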

19 Top Unix File Scenario Commands

An ETL developer's main task before testing starts is to browse various flat files. File browsing in UNIX is tricky, but if you know the right command you can save a lot of time. These 19 UNIX file commands are useful in your projects.

In UNIX, a file normally has a header, detail records, and a trailer. There are scenarios where you need only the detail records without the header and trailer, where you need only the most recent record, or where you need to skip some records from the input file. I have given useful UNIX commands for all of these file-based scenarios.

1). How to print/display the first line of a file?

There are many ways to do this. However, the easiest way to display the first line of a file is to use the [head] command.

$> head -1 file.txt

If you specify [head -2], it prints the first 2 records of the file.

Another way is to use the [sed] command. [sed] is a very powerful stream editor that can be used for various text manipulation purposes like this. ...