Handling CSV Files With Pandas In Python

Summary:

This blog post gives you a jump start on using Pandas to handle CSV files in Python. This is useful when dealing with big CSV files, when preparing data for machine learning, or when you only have a command line interface to edit a CSV.

Requirements

Pandas is a Python library that depends on NumPy. If you have trouble installing it, we recommend that you check this link: http://therandomtechadventure.blogspot.com/2017/06/Installing-Python-With-Sklearn-On-Windows.html

Handling CSV

First, import the Pandas library:

import pandas as pd

Then read the CSV file:

data = pd.read_csv("sample.csv")

In some cases the CSV file is too big to read into memory in one go. In that case you can read only part of it:

data = pd.read_csv("sample.csv", skiprows=range(1, 101), nrows=100)  # keep the header, skip the first 100 data rows, then read the next 100 rows
data = pd.read_csv("sample.csv", nrows=100)  # read only the first 100 rows
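
If you need to go through the whole file rather than one slice of it, read_csv also accepts a chunksize argument and then returns the data piece by piece. The following is a minimal sketch, assuming a hypothetical column called "value" and using an in-memory string in place of a real large file:

```python
import io
import pandas as pd

# Stand-in for a large file on disk; "value" is a hypothetical column name.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
row_count = 0
# With chunksize set, read_csv yields DataFrames of at most 4 rows each
# instead of loading every row at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
    row_count += len(chunk)

print(total, row_count)
```

With a real file you would pass the file name instead of the StringIO object and pick a much larger chunk size.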

If you are working on statistical problems, you can use the describe function to get summary statistics (count, mean, standard deviation, min, max and quartiles) for the numeric columns of the CSV file:

print(data.describe())
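
As a small illustration of what describe returns, here is a sketch on a made-up DataFrame (the "price" and "label" columns are invented for this example). By default only numeric columns are summarized; passing include="all" adds the other columns as well:

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
frame = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "label": ["a", "b", "a", "c"],
})

summary = frame.describe()             # numeric columns only
full = frame.describe(include="all")   # also summarizes the text column

print(summary)
print(full)
```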

Pandas is not limited to that. When you read the CSV, it creates a DataFrame, and there are several ways of selecting and manipulating a DataFrame: https://pandas.pydata.org/pandas-docs/stable/indexing.html

data[0:100]  # the first 100 rows (positions 0 to 99)

print(data.tail())  # the last 5 rows

print(data.head())  # the first 5 rows

Let's say we want to filter the rows by the value of a certain column:

data = data[data.ColumnName == 1]
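
The same kind of filter can be written in a few other ways. The sketch below uses invented column names ("flag" and "city name") to show bracket access, which also works when a column name contains spaces, as well as combined conditions and isin:

```python
import pandas as pd

# Hypothetical data; "flag" and "city name" are illustrative column names.
frame = pd.DataFrame({
    "flag": [1, 0, 1, 0],
    "city name": ["Oslo", "Rome", "Lima", "Oslo"],
})

# Bracket access works for any column name, including ones with spaces.
ones = frame[frame["flag"] == 1]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
oslo_ones = frame[(frame["flag"] == 1) & (frame["city name"] == "Oslo")]

# isin keeps the rows whose value appears in a list of allowed values.
some_cities = frame[frame["city name"].isin(["Oslo", "Lima"])]

print(ones)
print(oslo_ones)
print(some_cities)
```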

The same approach is useful when you want to eliminate rows that have a missing value in a column. A missing value (NaN) is never equal to itself, so comparing the column with itself keeps only the rows where it is filled in:

data = data[data.ColumnName == data.ColumnName ]
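
Pandas also has dedicated functions for this, which may read more clearly than the self-comparison. The following sketch, on a made-up DataFrame with columns "a" and "empty", compares the approaches and also shows how to drop columns that are entirely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has one missing value, "empty" has no values at all.
frame = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# Three ways to keep only the rows where "a" is filled in.
self_compare = frame[frame["a"] == frame["a"]]  # NaN != NaN, so NaN rows drop out
with_notna = frame[frame["a"].notna()]
with_dropna = frame.dropna(subset=["a"])

# Drop columns in which every value is missing.
no_empty_cols = frame.dropna(axis="columns", how="all")

print(with_dropna)
print(no_empty_cols)
```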

Finally, after manipulating the data, you can easily save it back to a CSV file:

data.to_csv("new.csv", sep=",", index=False)  # index=False leaves out the row index column
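
To see what the index argument changes, here is a small sketch that writes a made-up DataFrame to an in-memory buffer instead of a file and reads it back. The column names "a" and "b" are invented for this example:

```python
import io
import pandas as pd

# Hypothetical data standing in for the manipulated CSV.
frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Called without a path, to_csv returns the CSV text as a string.
# By default the row index is written as an unnamed first column.
with_index = frame.to_csv()

# Write without the index to a buffer, then read it back.
buffer = io.StringIO()
frame.to_csv(buffer, sep=",", index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(with_index)
print(restored)
```

Without index=False, reading the file again would give you an extra column holding the old row numbers.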

Final Words

This short blog post covered just a few of the many useful functions provided by Pandas. However, we hope that it has helped you get started with Pandas and CSVs. Thank you for reading! :)
