Using Pandas pipe function to improve code readability

An intuitive tutorial for the best practice with Pandas pipe()

B. Chen
Jun 9 · 5 min read

In data processing, we often need to write a function that performs an operation (such as a statistical calculation, splitting, or substituting values) on a row or column to produce new data.

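Instead of writing nested function calls, where the first operation sits innermost and the code has to be read inside out, we can apply such functions one after another in a chain.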

Pandas introduced pipe() starting from version 0.16.2. pipe() enables user-defined methods in method chains.
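To make the contrast concrete, here is a minimal sketch comparing nested calls with the equivalent pipe() chain. The functions drop_missing and add_scaled and the tiny DataFrame are hypothetical, made up for illustration only; the one assumption about pandas is that pipe(func, *args, **kwargs) calls func with the DataFrame as its first argument.

```python
import pandas as pd

# Hypothetical user-defined steps: each takes a DataFrame and returns a DataFrame.
def drop_missing(df, column):
    """Remove rows where `column` is missing."""
    return df.dropna(subset=[column])

def add_scaled(df, column, factor=1.0):
    """Add a `<column>_scaled` column equal to column * factor."""
    return df.assign(**{f"{column}_scaled": df[column] * factor})

# Illustrative data, not taken from the article.
df = pd.DataFrame({"value": [1.0, None, 3.0]})

# Nested calls: the first step is innermost, so the code reads inside out.
nested = add_scaled(drop_missing(df, "value"), "value", factor=10)

# The same steps with pipe(): the order of the code matches the order of execution.
piped = (
    df
    .pipe(drop_missing, "value")
    .pipe(add_scaled, "value", factor=10)
)

print(piped.equals(nested))
```

Both versions produce the same DataFrame; the pipe() version only changes how the steps are written down.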

Method chaining is a programming style of invoking multiple method calls sequentially, with each call performing an action on the same object and returning it.

It eliminates the cognitive burden of naming variables at each intermediate step. Fluent interfaces, a way of designing object-oriented APIs, rely on method cascading (also known as method chaining). This is akin to piping in Unix systems.
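As a rough illustration of the naming burden, the sketch below runs the same three built-in pandas steps twice: once with a named variable per step and once as a single chain. The DataFrame and its column names are made up for the example. Note that in pandas most methods return a new DataFrame rather than modifying the original, which is what makes chaining them possible.

```python
import pandas as pd

# Illustrative data, not taken from the article.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [3, None, 1]})

# Step by step: every intermediate result needs its own name.
no_missing = df.dropna(subset=["score"])
sorted_df = no_missing.sort_values("score")
step_by_step = sorted_df.reset_index(drop=True)

# The same steps as one method chain: no intermediate names required.
chained = (
    df
    .dropna(subset=["score"])
    .sort_values("score")
    .reset_index(drop=True)
)

print(chained.equals(step_by_step))
```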

Method chaining substantially increases the readability of the code. Let’s dive into a tutorial to see how it improves our code readability.

Dataset preparation

For this tutorial, we will work with the Titanic dataset from Kaggle. It is a very famous dataset and is often a student’s first step in data science. Let’s import some libraries and load the data to get started.
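A minimal loading sketch might look like the following. The helper load_titanic is a hypothetical name, and the file path in the final comment is an assumption about where the Kaggle download was saved. So that the snippet runs without the download, it reads a few made-up rows that reuse a subset of the dataset's column names; they are a stand-in, not the real data.

```python
import io

import pandas as pd

def load_titanic(source):
    """Read a Titanic-style CSV from a file path or a file-like object."""
    return pd.read_csv(source)

# Stand-in for the downloaded file: made-up rows using a subset of the
# Kaggle column names, so the example is runnable on its own.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Fare\n"
    "1,0,3,male,22,7.25\n"
    "2,1,1,female,38,71.28\n"
    "3,1,3,female,,7.92\n"
)

df = load_titanic(sample_csv)
print(df.head())

# With the real download, the call would look like this (hypothetical path):
# df = load_titanic("data/train.csv")
```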
