This is the second set of articles in the Python for data science series. Earlier, we discussed ‘Why Python for data science?’ and completed a set of tutorials on ‘NumPy for beginners’. If you do not know NumPy yet, I recommend learning it first, but it is not mandatory.
‘Pandas’ is the most popular Python library for data science. We can think of it as a well-equipped querying language for exploratory data analysis (EDA). Pandas is built on top of NumPy, so all the arithmetic and complex mathematical operations you can perform on NumPy arrays become accessible through a convenient, query-like interface.
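To make this concrete, here is a minimal sketch (the values and index labels are illustrative, not from the article) showing how a Pandas Series wraps a NumPy array while keeping NumPy-style vectorized operations available:

```python
import numpy as np
import pandas as pd

# A pandas Series wraps a NumPy array, so NumPy-style
# vectorized arithmetic works on it directly.
arr = np.array([10, 20, 30, 40])
s = pd.Series(arr, index=["a", "b", "c", "d"])

print(s * 2)        # element-wise multiplication, just like a NumPy array
print(s.values)     # the underlying NumPy array is still there
print(s[s > 15])    # boolean filtering, a very common EDA pattern
```

Boolean filtering like `s[s > 15]` is the kind of “querying” capability Pandas layers on top of NumPy’s array arithmetic.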
This set of tutorials/articles is divided into five parts.
1. Creating Pandas data structures
2. Inspecting dataframes
3. Indexing and Selecting data
4. Merge and Concat
5. Grouping and Summarizing
In this course, we will use Jupyter Notebook as our editor. I particularly recommend Jupyter Notebook from here on because, when you start data science projects, it gives easy access to the results you want to see while doing the analysis.
So let’s start!
I assume you already have Python installed on your machine. If not, check the previous article, where I shared a link to install Anaconda. If you have Anaconda, you can install Pandas from your terminal or command prompt using:
conda install pandas
If you do not have Anaconda on your computer, install Pandas from your terminal using pip:
pip install pandas
Once you have Pandas installed, launch your Jupyter notebook and get started.
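A quick sanity check in your first notebook cell confirms the installation worked (the column name below is just an example):

```python
import pandas as pd

# If this import succeeds, pandas is installed; the version tells you which one.
print(pd.__version__)

# Build a tiny DataFrame to make sure everything runs end to end.
df = pd.DataFrame({"numbers": [1, 2, 3]})
print(df)
```

If the import fails, revisit the `conda` or `pip` command above before continuing.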
Don’t worry; we will keep things easy and simple. The articles are written for conceptual understanding together with hands-on practice. Along the way, I will point out how these concepts are used during data pre-processing in data science projects.
Next up, the first tutorial: Creating Pandas data structures (Part 1)