[Pandas] How to select data

This article shows the most common ways to select rows and columns from a pandas DataFrame.

First, let’s create a DataFrame.

import pandas as pd
import numpy as np

# 5 rows x 4 columns of random floats in [0, 1); your values will differ on each run
df = pd.DataFrame(np.random.rand(5, 4), index=['a', 'b', 'c', 'd', 'e'], columns=['A', 'B', 'C', 'D'])
df
A B C D
a 0.122248 0.581777 0.972888 0.869366
b 0.476451 0.453979 0.004705 0.644530
c 0.954790 0.747131 0.652936 0.758767
d 0.077122 0.407514 0.019102 0.553546
e 0.921520 0.157199 0.371028 0.825792

Use [] with a list of column names to select columns

df[['A', 'B']]
A B
a 0.122248 0.581777
b 0.476451 0.453979
c 0.954790 0.747131
d 0.077122 0.407514
e 0.921520 0.157199

Select a range of rows

df[1:3] # or df['b':'c']
A B C D
b 0.476451 0.453979 0.004705 0.644530
c 0.954790 0.747131 0.652936 0.758767
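The two slices above return the same rows, but they follow different endpoint rules. Here is a minimal sketch of the difference. To keep it reproducible, it assumes a frame rebuilt from the A and B values printed earlier instead of random data.

```python
import pandas as pd

# Fixed values copied from the printed output above (columns A and B only),
# so the result is the same on every run.
df = pd.DataFrame(
    {
        "A": [0.122248, 0.476451, 0.954790, 0.077122, 0.921520],
        "B": [0.581777, 0.453979, 0.747131, 0.407514, 0.157199],
    },
    index=["a", "b", "c", "d", "e"],
)

# Integer slice: positional, the stop position (3) is excluded.
by_position = df[1:3]

# Label slice: the stop label ('c') is included.
by_label = df["b":"c"]

print(list(by_position.index))
print(list(by_label.index))
print(by_position.equals(by_label))
```

Both slices pick rows b and c here: position 3 (row d) is left out of the integer slice, while label 'c' is kept in the label slice.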

Use .loc to select rows and columns by label (label slices include both endpoints)

df.loc['a':'c', ['A','B']]
A B
a 0.122248 0.581777
b 0.476451 0.453979
c 0.954790 0.747131

Select all columns

df.loc['a':'c', :]
A B C D
a 0.122248 0.581777 0.972888 0.869366
b 0.476451 0.453979 0.004705 0.644530
c 0.954790 0.747131 0.652936 0.758767

Select with a boolean mask

df[df['A']>0.2]
A B C D
b 0.476451 0.453979 0.004705 0.644530
c 0.954790 0.747131 0.652936 0.758767
e 0.921520 0.157199 0.371028 0.825792
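Boolean masks can also be combined, and they work inside .loc as well. The sketch below is one way to do it. It assumes a frame rebuilt from the A and B values printed earlier, and the second condition (B below 0.5) is only an illustrative threshold, not something from the examples above.

```python
import pandas as pd

# Fixed values copied from the printed output above (columns A and B only).
df = pd.DataFrame(
    {
        "A": [0.122248, 0.476451, 0.954790, 0.077122, 0.921520],
        "B": [0.581777, 0.453979, 0.747131, 0.407514, 0.157199],
    },
    index=["a", "b", "c", "d", "e"],
)

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & and | bind tighter than the comparison operators.
mask = (df["A"] > 0.2) & (df["B"] < 0.5)

rows_only = df[mask]                 # filter rows, keep every column
rows_and_cols = df.loc[mask, ["A"]]  # filter rows and pick columns in one step

print(rows_only)
print(rows_and_cols)
```

Python's plain `and` / `or` do not work element-wise on a Series, which is why the bitwise operators are used here.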

Select with a callable (lambda)

df[lambda df: df['A']>0.2]
A B C D
b 0.476451 0.453979 0.004705 0.644530
c 0.954790 0.747131 0.652936 0.758767
e 0.921520 0.157199 0.371028 0.825792
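A callable is most useful in a method chain, where the intermediate frame has no variable name to refer to. The sketch below assumes a frame rebuilt from the A and B values printed earlier. The `total` column is a hypothetical name added only for illustration.

```python
import pandas as pd

# Fixed values copied from the printed output above (columns A and B only).
df = pd.DataFrame(
    {
        "A": [0.122248, 0.476451, 0.954790, 0.077122, 0.921520],
        "B": [0.581777, 0.453979, 0.747131, 0.407514, 0.157199],
    },
    index=["a", "b", "c", "d", "e"],
)

result = (
    df
    # 'total' is a hypothetical derived column, created inside the chain.
    .assign(total=lambda d: d["A"] + d["B"])
    # The lambda receives the frame produced by assign(), so it can see
    # 'total' even though that frame was never bound to a name.
    .loc[lambda d: d["total"] > 1.0, ["A", "total"]]
)

print(result)
```

Writing `df[df["total"] > 1.0]` would not work at this point, because `df` itself has no `total` column. The callable sidesteps that by operating on whatever frame it is handed.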