Getting Started with Pandas for Data Manipulation in Python

Python is widely used for data manipulation, that is, for analyzing, cleaning and reshaping large datasets into the desired format. It has a large collection of libraries, such as pandas, NumPy and Matplotlib, to process, clean and analyze large volumes of data.

Pandas is one of the most important of these: an open-source library for working with relational or labelled data sets. It plays a key role in data manipulation in Python because it makes processing such data faster and simpler.

In this guide, we take a closer look at the pandas library for data manipulation, explaining its main functions and key features with code examples.

Understanding Data Manipulation 

Data manipulation refers to the process of using programming languages and libraries to analyze, organize and transform large datasets so that the data becomes more readable, structured and understandable. Effective data manipulation includes extracting, cleaning, filtering and inserting data.

Sometimes a dataset contains the same or near-identical records more than once, which is unwanted for our desired output or goals. Data manipulation therefore comes in handy for properly analyzing, cleaning and processing the dataset so that the input to our model is more efficient and optimized.
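
As a minimal sketch of the duplicate-record case described above, the snippet below uses pandas' duplicated() and drop_duplicates() methods. The records themselves are hypothetical and only serve as an illustration:

```python
import pandas as pd

# Hypothetical records; the second "Joey" row is an exact copy of the first.
records = pd.DataFrame({
    "Name": ["Joey", "Joey", "Rock"],
    "Percentage": [40, 40, 85],
})

# duplicated() flags every row that repeats an earlier row.
duplicate_mask = records.duplicated()

# drop_duplicates() returns a new DataFrame without the repeated rows;
# reset_index(drop=True) renumbers the remaining rows from 0.
cleaned = records.drop_duplicates().reset_index(drop=True)

print(duplicate_mask.tolist())
print(cleaned)
```

By default, both methods compare whole rows, so only exact copies are treated as duplicates.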

Data is the core of every industry, so the use of data manipulation methods, techniques and libraries is vast, from technology to finance to real estate and marketing.

The Power of Pandas

Pandas is an open-source library for handling, manipulating and analyzing data. It provides easy-to-use data structures with many functions and methods to insert, filter and append data.

These data structures are designed to make working with “relational” and “labelled” data easy. The key pandas data structure is the “DataFrame,” which lets you store, manage and manipulate tabular data.

The following points list the kinds of data pandas is well suited for:

  • Tabular data such as SQL tables or Excel spreadsheets. 
  • Arbitrary matrix data with row and column labels. 
  • Ordered and unordered data. 
  • Observational or statistical data sets.

You need to install the pandas library separately because it does not ship with the standard Python installation. To do so, run the package installer (pip) from your console.

The following command installs the pandas library from the command line:

# Install pandas using pip
pip install pandas
(or)
pip3 install pandas

After finishing the installation, import pandas at the top of your program using the import statement. By convention, the library is imported under the alias pd:

import pandas as pd
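
A quick way to confirm that the installation and import worked is to print the installed version. This is a small sketch; the version string you see depends on your environment:

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string shows which release.
print(pd.__version__)
```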

After importing the library, let’s now understand DataFrames and how to create them from various data sources.

Creating and Understanding DataFrames

The DataFrame is the most important pandas data structure for storing tabular data. It is a two-dimensional labelled structure that can be built from various sources, such as Python dictionaries, SQL query results and CSV files.

Let’s now understand how to create DataFrames from the dictionary data.

Creating DataFrames from Python Dictionaries:

Dictionaries are a natural fit for creating DataFrames: the keys become column names, and the values become the data in those columns.

For example, suppose we have a Python dictionary named “studentData” that contains the key “Name”, whose value is a list of student names, and the key “Percentage”, whose value is a list of the percentage each student scored.

studentData = {'Name': ['Joey', 'Rock', 'Zyne'], 'Percentage': ['40%', '85%', '88%']}

The following code demonstrates how you can convert the “studentData” dictionary into a DataFrame using the pd.DataFrame() constructor:

import pandas as pd

# Creating a DataFrame from a dictionary
studentData = {'Name': ['Joey', 'Rock', 'Zyne'],
               'Percentage': ['40%', '85%', '88%']}
dataFrame_from_dict = pd.DataFrame(studentData)

# Printing the DataFrame
print(dataFrame_from_dict)

The following image demonstrates the data frame created from the dictionary:

As you can see, the output shows three columns: the Name and Percentage columns come from the studentData dictionary, and the unlabelled first column is the index that pandas generates automatically to identify each row.
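
To make the index and column labels more concrete, here is a short sketch that inspects the same studentData DataFrame. The numeric conversion at the end is an illustrative extra, not part of the original example:

```python
import pandas as pd

studentData = {'Name': ['Joey', 'Rock', 'Zyne'],
               'Percentage': ['40%', '85%', '88%']}
df = pd.DataFrame(studentData)

# The automatically generated row labels (0, 1, 2).
print(list(df.index))

# The column labels taken from the dictionary keys.
print(df.columns.tolist())

# (number of rows, number of columns)
print(df.shape)

# The Percentage values are strings; stripping the '%' sign and converting
# to float makes them usable in calculations.
percentage_numeric = df['Percentage'].str.rstrip('%').astype(float)
print(percentage_numeric.tolist())
```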

Loading and Inspecting CSV data files

CSV (Comma-Separated Values) files are a common data format that you will often use in Python. Pandas provides a function to load CSV files and automatically create a DataFrame.

For example, we have a users.csv file that contains some data of the users:

The pandas library provides a read_csv() function that loads a CSV file and returns its contents as a DataFrame.

The following code demonstrates loading the users.csv file and creating a DataFrame from its data:

import pandas as pd

# Reading a CSV file and Creating a DataFrame
dataFrame_from_csv = pd.read_csv('users.csv')

# Printing the DataFrame
print(dataFrame_from_csv)

Note: read_csv() expects the location of the CSV file as its first argument, typically the file path as a string.

The following image demonstrates the DataFrame created from the users.csv file:

As you can see, the pandas read_csv() function loads the data from the CSV file and creates a neatly labelled DataFrame from it.
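
The sketch below shows read_csv() in a self-contained form. Because the contents of users.csv are not shown here, it reads from an in-memory string instead (read_csv() also accepts file-like objects), and the column names id, name and city are hypothetical:

```python
import io

import pandas as pd

# Hypothetical stand-in for users.csv; the real file's columns may differ.
csv_text = """id,name,city
1,Asha,Delhi
2,Ben,London
3,Chen,Beijing
"""

# The first line of the CSV becomes the column labels.
users = pd.read_csv(io.StringIO(csv_text))
print(users)

# usecols keeps only the listed columns; index_col uses a column as row labels.
users_by_id = pd.read_csv(io.StringIO(csv_text),
                          usecols=["id", "name"],
                          index_col="id")
print(users_by_id)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.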

head() method

Suppose you have a DataFrame with 1000 or more rows, but you want to see only the first five rows to understand the data format.

To do so, pandas provides a head() method that returns the first five rows of the DataFrame by default. We can call head() on our DataFrame variable to get only those top five rows:

import pandas as pd

# Reading a CSV file and Creating a DataFrame
dataFrame_from_csv = pd.read_csv('users.csv')

# Printing the first five rows from DataFrame
print(dataFrame_from_csv.head())

The following image shows the output of the head() method to show five rows from the DataFrame:
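
head() also accepts the number of rows to return, and tail() does the same from the bottom of the DataFrame. The following sketch uses a hypothetical 1000-row DataFrame with a single user_id column rather than the users.csv file:

```python
import pandas as pd

# Hypothetical DataFrame with 1000 rows, used only for illustration.
df = pd.DataFrame({"user_id": range(1, 1001)})

first_five = df.head()     # default: first 5 rows
first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows

print(first_five)
print(first_three)
print(last_two)
```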

DataFrame Information

The info() method prints a concise summary of a DataFrame, including the index range (RangeIndex), the column names with their non-null counts and data types, and the memory usage.

The info() method is quite useful when you want a concise summary of a large dataset without printing the whole DataFrame to the output screen.

Let’s load our users.csv file again, create the DataFrame, and then call info() on the variable that holds the DataFrame to get this information:

import pandas as pd

# Reading a CSV file and Creating a DataFrame
dataFrame_from_csv = pd.read_csv('users.csv')

# Printing the summary information of the DataFrame
# (info() prints directly and returns None, so it is not wrapped in print())
dataFrame_from_csv.info()

The following image shows the output of the info() method, which summarizes the DataFrame:

The most helpful information you will get from info() on a DataFrame is the total number of columns and the memory usage.
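
Since info() writes its summary to the screen and returns None, the sketch below captures the text through its buf parameter and also reads the same facts from the shape and dtypes attributes. It reuses the studentData dictionary instead of users.csv so that it runs on its own:

```python
import io

import pandas as pd

studentData = {'Name': ['Joey', 'Rock', 'Zyne'],
               'Percentage': ['40%', '85%', '88%']}
df = pd.DataFrame(studentData)

# Capture the summary in a text buffer instead of printing it directly.
buffer = io.StringIO()
result = df.info(buf=buffer)   # info() itself returns None
summary = buffer.getvalue()
print(summary)

# The same facts are available individually as attributes.
print(df.shape)    # (rows, columns)
print(df.dtypes)   # data type of each column
```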

Adding a Row to DataFrame

Older pandas releases offered a DataFrame.append() method to add a new row to the bottom of a DataFrame, but it was deprecated in pandas 1.4 and removed in pandas 2.0. In current versions, define the row as a dictionary of key-value pairs, wrap it in its own one-row DataFrame, and join it to the original with the pd.concat() function.

When calling pd.concat(), also make sure to pass ignore_index=True so that the row indices are renumbered.

The following code demonstrates adding a new row to a DataFrame using pd.concat():

import pandas as pd

# Creating a DataFrame from a dictionary
studentData = {'Name': ['Joey', 'Rock', 'Zyne'],
               'Percentage': ['40%', '85%', '88%']}
dataFrame_from_dict = pd.DataFrame(studentData)

# Printing DataFrame
print('Original DataFrame\n------------------')
print(dataFrame_from_dict)

# Adding row of data to the DataFrame
# (DataFrame.append() was removed in pandas 2.0, so pd.concat() is used instead)
new_student = {'Name': 'Bob', 'Percentage': '65%'}
new_row = pd.DataFrame([new_student])
new_df = pd.concat([dataFrame_from_dict, new_row], ignore_index=True)

# Printing New DataFrame with the added row
print('\n\nNew row added to DataFrame\n--------------------------')
print(new_df)

The image below shows the DataFrame before and after adding the new row:
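
Another way to add a single row, sketched here as an alternative rather than as the method used above, is label-based assignment with .loc. Note that this changes the DataFrame in place instead of returning a new one:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Joey', 'Rock', 'Zyne'],
                   'Percentage': ['40%', '85%', '88%']})

# With the default 0..n-1 index, len(df) is the next unused row label,
# so assigning to it appends a row at the bottom.
df.loc[len(df)] = ['Bob', '65%']

print(df)
```

This shortcut assumes the default integer index; with custom row labels, len(df) may collide with an existing label and overwrite that row.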

Adding a Column to DataFrame

You can use DataFrame.assign() to add a column to the DataFrame. It takes keyword arguments (**kwargs), where each keyword becomes the name of a new column and its value supplies the data. This method doesn’t mutate the old DataFrame; instead, it returns a new DataFrame with the column added.

The following code demonstrates adding a new column to a DataFrame using assign():

import pandas as pd

# Creating a DataFrame from a dictionary
studentData = {'Name': ['Joey', 'Rock', 'Zyne'],
               'Percentage': ['40%', '85%', '88%']}
dataFrame_from_dict = pd.DataFrame(studentData)

# Printing DataFrame
print('Original DataFrame\n------------------')
print(dataFrame_from_dict)

# Adding column of data to the DataFrame
age = [24, 22, 18]
new_df = dataFrame_from_dict.assign(Age=age)

# Printing New DataFrame with the added column
print('\n\nNew column added to DataFrame\n--------------------------')
print(new_df)

The image below shows the DataFrame before and after adding the new column:
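
assign() can also compute a new column from existing ones when it is given a function. The sketch below derives a numeric Score column from Percentage and then a Passed flag; the column names Score and Passed and the pass mark of 50 are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Joey', 'Rock', 'Zyne'],
                   'Percentage': ['40%', '85%', '88%']})

# The lambda receives the DataFrame and returns the values for the new column.
with_score = df.assign(
    Score=lambda d: d['Percentage'].str.rstrip('%').astype(int)
)

# Hypothetical pass mark of 50, used only for this example.
with_flags = with_score.assign(Passed=lambda d: d['Score'] >= 50)

print(with_flags)

# The original DataFrame is left untouched.
print(df.columns.tolist())
```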

Deleting a Row & Column from DataFrame

You can use the drop() method to delete a row or a column from the DataFrame: pass the index label to remove a row, and pass the column label together with “axis=1” to remove a column. Like assign(), drop() returns a new DataFrame by default and leaves the original unchanged.

The following code demonstrates deleting a column and a row from the DataFrame using the drop() method:

import pandas as pd

# Creating a DataFrame from a dictionary
studentData = {'Name': ['Joey', 'Rock', 'Zyne'],
               'Percentage': ['40%', '85%', '88%']}
dataFrame_from_dict = pd.DataFrame(studentData)

# Printing DataFrame
print('Original DataFrame\n------------------')
print(dataFrame_from_dict)

# Deleting Column & Row from the DataFrame
new_df = dataFrame_from_dict.drop('Percentage', axis=1)
new_df = new_df.drop(2)

# Printing DataFrame after deleting Column & Row
print('\n\nDataFrame after deleting Column & Row \n--------------------------')
print(new_df)

The image below shows the DataFrame before and after deleting the column & row:
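
As a more explicit alternative to axis=1, drop() also accepts index= and columns= keyword arguments, which state directly whether row labels or column labels are meant. A minimal sketch on the same student data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Joey', 'Rock', 'Zyne'],
                   'Percentage': ['40%', '85%', '88%']})

# Remove a column by label.
without_percentage = df.drop(columns='Percentage')

# Remove a row by its index label.
without_last_row = df.drop(index=2)

# Remove several rows and a column in one call.
trimmed = df.drop(index=[0, 1], columns=['Percentage'])

print(without_percentage)
print(without_last_row)
print(trimmed)
```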

Conclusion

Python is a flexible and versatile language; hence, it is used in many fields such as Artificial Intelligence (AI), web development, IoT and many more. A good understanding of data manipulation in Python is therefore important for analyzing, organizing and optimizing large sets of data, whether for web development, data analysis or data processing.

As a development company, we at Delphin Technologies provide a host of services like designing and developing websites and mobile apps.
