Handling CSV Files With Pandas In Python
Summary:
This blog post aims to give you a jump start on handling CSV files with Pandas in Python. Pandas is useful when you are dealing with big CSV files, preparing data for machine learning, or when you only have a command line interface available to edit a CSV.
Requirements
Pandas is a Python library that depends on NumPy; SciPy is only needed for some optional features. If you are having trouble installing it, we recommend that you check this link: http://therandomtechadventure.blogspot.com/2017/06/Installing-Python-With-Sklearn-On-Windows.html
Handling CSV
First, import the Pandas library:
import pandas as pd
Then read the CSV file:
data = pd.read_csv("sample.csv")
In some cases the CSV file could be too big to read into memory in one go. In that case you can read only part of it:
data = pd.read_csv("sample.csv", skiprows=range(1, 101), nrows=100)  # keep the header, skip the first 100 data rows, then read 100 rows
data = pd.read_csv("sample.csv", nrows=100)  # read only the first 100 rows
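If you need to go through the whole file rather than a single slice, the chunksize parameter of read_csv is another option. The sketch below is only an illustration: the in-memory CSV text and the column names id and value are made up for the example, and with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large file; with a real file, pass its path instead.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

total = 0
rows_seen = 0
# With chunksize set, read_csv returns an iterator of DataFrames
# holding at most 4 rows each, so only one chunk is in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()
    rows_seen += len(chunk)

print(rows_seen, total)
```

Each chunk is a regular DataFrame, so anything you would do on the full data (filtering, summing, writing out) can be done chunk by chunk.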
If you are working on statistical problems, you can use the describe function below to get a summary (count, mean, standard deviation, min, max and quartiles) of the numeric columns in the CSV file:
print(data.describe())
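As a small illustration of what describe returns, here is a sketch on a tiny made-up table (the columns name and score are assumptions for the example). By default only numeric columns are summarised; passing include="all" asks Pandas to summarise the other columns too.

```python
import io
import pandas as pd

# Tiny made-up CSV so the example runs without a file on disk.
csv_text = "name,score\nann,1.0\nbob,2.0\ncid,3.0\n"
data = pd.read_csv(io.StringIO(csv_text))

summary = data.describe()             # numeric columns only by default
print(summary.loc["mean", "score"])   # mean of the score column

full = data.describe(include="all")   # also summarises non-numeric columns
print(full)
```

The result of describe is itself a DataFrame, so you can pick out single statistics with loc as shown.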
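Pandas is not limited to that. When you read the CSV, it creates a DataFrame, and there are several ways of selecting and manipulating a DataFrame: https://pandas.pydata.org/pandas-docs/stable/indexing.html
data[0:100]  # select the first 100 rows
print(data.tail())  # print the last 5 rows
print(data.head())  # print the first 5 rows
data = data[data.ColumnName == 1]  # keep only rows where ColumnName equals 1 (ColumnName is a placeholder for one of your columns)
data = data[data.ColumnName == data.ColumnName]  # drop rows where ColumnName is missing, since NaN is never equal to itself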
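The same selections can also be written with the more explicit iloc, loc and notna. This is just a sketch on made-up data; ColumnName and other are placeholder column names, not something from a real file.

```python
import io
import pandas as pd

# Made-up CSV with one missing value in ColumnName (third data row).
csv_text = "ColumnName,other\n1,a\n2,b\n,c\n1,d\n"
data = pd.read_csv(io.StringIO(csv_text))

first_two = data.iloc[0:2]                      # rows selected by position
ones = data.loc[data["ColumnName"] == 1]        # rows where the column equals 1
not_missing = data[data["ColumnName"].notna()]  # rows where the column is not NaN

print(len(first_two), len(ones), len(not_missing))
```

Using notna() selects the same rows as comparing a column with itself, but it states the intent more clearly.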
Finally, after manipulating your data, you can easily save it back to a CSV file:
data.to_csv("new.csv", sep=",")  # pass the separator by keyword; "," is already the default
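One thing to be aware of when saving: by default to_csv also writes the row index as an extra first column. If you do not want that, pass index=False. The sketch below uses an in-memory buffer and a made-up DataFrame instead of a real file, just to show the round trip.

```python
import io
import pandas as pd

# Made-up data for the example.
data = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

buffer = io.StringIO()            # stands in for a file such as "new.csv"
data.to_csv(buffer, index=False)  # do not write the row index column
header_line = buffer.getvalue().splitlines()[0]

buffer.seek(0)
restored = pd.read_csv(buffer)    # read it back to check the columns
print(header_line)
print(restored.shape)
```

Without index=False, reading the file back would give you an additional unnamed column holding the old index.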
Final Words
This short blog post covered just a few of the many useful functions provided by Pandas. However, we hope that it has helped you get started with Pandas and CSVs. Thank you for reading! :)