Description
This is a first version of the analysis of pandas usage in Kaggle notebooks.
We fetched Python notebooks from Kaggle and ran them using record_api to count the calls to the main objects of the pandas API. A total of 895 notebooks could be analyzed.
A separate column shows the page views of each item in the pandas documentation. The page views are normalized to a maximum of 1,000 (so the most-viewed page in the pandas documentation would have a value of 1,000 in the column).
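The normalization described above can be sketched as follows. This is a minimal illustration, not the code used to build the table: the function name `normalize_views`, the rounding to integers, and the sample page names and counts are all assumptions made for the example.

```python
def normalize_views(raw_views, scale=1000):
    """Scale raw page-view counts so the most-viewed page maps to `scale`.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    top = max(raw_views.values())
    return {page: round(views * scale / top) for page, views in raw_views.items()}


# Hypothetical raw counts, for illustration only (not real documentation data).
raw = {"page_a": 50_000, "page_b": 12_500, "page_c": 40}
normalized = normalize_views(raw)
```

With this scheme a rarely visited page can end up at 0 or 1, which would explain the very small values in the Docs views columns.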
For simplicity, only the attributes of `DataFrame`, `Series` and the `pandas` top-level module are covered, and attributes with the same name have been merged. So, `pandas.sum()`, `Series.sum()` and `DataFrame.sum()` would appear in the list as simply `sum`.
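As a rough sketch of this merging step, call counts keyed by a qualified name can be collapsed onto the bare attribute name. The `"Owner.attribute"` key format, the function name `merge_by_attribute`, and the sample counts are assumptions for illustration; the actual generation code may be organized differently.

```python
from collections import Counter

# Owners whose attributes are kept and merged, as described in the text.
MERGED_OWNERS = {"pandas", "Series", "DataFrame"}


def merge_by_attribute(calls):
    """Sum call counts per attribute name, ignoring which object owns it."""
    merged = Counter()
    for qualified_name, count in calls.items():
        owner, _, attribute = qualified_name.rpartition(".")
        if owner in MERGED_OWNERS:
            merged[attribute] += count
    return merged


# Hypothetical counts, for illustration only.
calls = {"DataFrame.sum": 5, "Series.sum": 3, "pandas.concat": 2, "Index.sum": 7}
merged = merge_by_attribute(calls)
```

Here `Index.sum` is dropped because `Index` is not one of the three merged owners, and the two remaining `sum` entries are added together.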
The sections are meant to make the document easier to read; they are not an "official" categorization of the API. Feedback is welcome if something feels misplaced.
The source code to generate the table is available at this repo.
Top 25 called methods
Notes:
- Operators (e.g. `__add__`) are merged with their equivalent method (e.g. `add`)
- `__getitem__` is used both to access a column (`df[col]`) and to filter (`df[condition]`)
- Accessing a column is also possible via `__getattr__` (e.g. `df.col_name`), but this has not been captured
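The three notes above correspond to the following pandas usage. This is a small illustrative sketch with a made-up DataFrame; it shows the call styles being counted, not how record_api records them.

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Operator and method forms are equivalent, so both are counted as `add`.
via_operator = df["a"] + df["b"]      # calls Series.__add__
via_method = df["a"].add(df["b"])     # calls Series.add

# `__getitem__` covers both column access and filtering.
column = df["a"]                      # column access: df[col]
filtered = df[df["a"] > 1]            # filtering with a boolean mask: df[condition]

# Attribute-style column access goes through `__getattr__` instead,
# which is the case the notes say has not been captured.
same_column = df.a
```

Since `df[col]` and `df[condition]` share one entry, the very large `__getitem__` count in the table cannot be split between selection and filtering.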
Object | Kaggle calls |
---|---|
__getitem__ | 143992 |
__setitem__ | 40059 |
eq | 3018 |
mul | 2799 |
add | 2768 |
groupby | 2267 |
loc | 1667 |
drop | 1618 |
fillna | 1609 |
columns | 1583 |
head | 1575 |
truediv | 1442 |
shape | 1267 |
sub | 1144 |
isnull | 1057 |
sort_values | 1015 |
and | 957 |
values | 953 |
sum | 898 |
astype | 728 |
value_counts | 706 |
index | 664 |
gt | 622 |
apply | 538 |
to_frame | 479 |
Main items by category
Data summary and info
Object | Kaggle calls | Docs views |
---|---|---|
info | 275 | 22 |
empty | 0 | 32 |
describe | 303 | 146 |
value_counts | 706 | 161 |
dtypes | 175 | 64 |
memory_usage | 83 | 2 |
ndim | 0 | 1 |
shape | 1267 | 17 |
size | 3 | 45 |
values | 953 | 113 |
attrs | 0 | 0 |
array | 0 | 0 |
unique | 193 | 106 |
dtype | 149 | 8 |
nbytes | 0 | 0 |
Indexing
Object | Kaggle calls | Docs views |
---|---|---|
__getitem__ | 143992 | 0 |
__setitem__ | 40059 | 0 |
axes | 0 | 4 |
columns | 1583 | 31 |
set_index | 72 | 278 |
swapaxes | 0 | 0 |
select_dtypes | 180 | 36 |
lookup | 0 | 11 |
xs | 5 | 16 |
loc | 1667 | 232 |
iloc | 427 | 122 |
index | 664 | 164 |
reindex | 11 | 136 |
reindex_like | 0 | 2 |
reset_index | 305 | 279 |
add_prefix | 16 | 6 |
add_suffix | 0 | 3 |
get | 0 | 16 |
iat | 1 | 17 |
keys | 13 | 16 |
at | 4 | 40 |
filter | 3 | 170 |
rename | 401 | 355 |
rename_axis | 0 | 13 |
idxmax | 7 | 49 |
idxmin | 0 | 10 |
droplevel | 0 | 0 |
truncate | 0 | 7 |
swaplevel | 0 | 7 |
take | 0 | 5 |
reorder_levels | 0 | 5 |
sort_index | 32 | 90 |
set_axis | 0 | 1 |
pop | 14 | 9 |
searchsorted | 0 | 3 |
name | 113 | 13 |
item | 0 | 3 |
argmax | 0 | 2 |
argmin | 0 | 1 |
argsort | 0 | 3 |
Filter, select, sort
Object | Kaggle calls | Docs views |
---|---|---|
nlargest | 25 | 17 |
nsmallest | 1 | 8 |
head | 1575 | 108 |
tail | 60 | 12 |
drop_duplicates | 20 | 194 |
sort_values | 1015 | 457 |
sample | 63 | 102 |
query | 12 | 69 |
Operators
Object | Kaggle calls | Docs views |
---|---|---|
add | 2768 | 104 |
div | 2 | 10 |
dot | 0 | 9 |
eq | 3018 | 1 |
equals | 0 | 35 |
floordiv | 3 | 0 |
ge | 68 | 1 |
gt | 622 | 1 |
le | 197 | 0 |
lt | 8 | 0 |
mod | 11 | 1 |
mul | 2799 | 4 |
ne | 163 | 1 |
pow | 29 | 2 |
product | 0 | 3 |
radd | 0 | 6 |
rdiv | 0 | 0 |
rfloordiv | 0 | 0 |
rmod | 0 | 0 |
rmul | 0 | 2 |
rpow | 0 | 0 |
rsub | 0 | 2 |
rtruediv | 0 | 2 |
sub | 1144 | 7 |
truediv | 1442 | 0 |
Missing values
Object | Kaggle calls | Docs views |
---|---|---|
isnull | 1057 | 90 |
notnull | 60 | 40 |
dropna | 193 | 346 |
fillna | 1609 | 248 |
interpolate | 3 | 39 |
isna | 108 | 27 |
notna | 5 | 11 |
hasnans | 0 | 0 |
Map
Object | Kaggle calls | Docs views |
---|---|---|
cut | 59 | 84 |
eval | 0 | 12 |
corrwith | 1 | 11 |
applymap | 2 | 49 |
astype | 728 | 234 |
rank | 2 | 34 |
clip | 4 | 13 |
where | 10 | 105 |
mask | 14 | 25 |
combine | 0 | 12 |
combine_first | 0 | 11 |
isin | 86 | 138 |
abs | 25 | 12 |
replace | 463 | 216 |
apply | 538 | 379 |
round | 14 | 68 |
transform | 10 | 39 |
factorize | 3 | 15 |
map | 420 | 91 |
between | 1 | 12 |
Reduce
Object | Kaggle calls | Docs views |
---|---|---|
cov | 0 | 9 |
quantile | 47 | 78 |
var | 4 | 11 |
skew | 88 | 5 |
std | 140 | 39 |
sum | 898 | 114 |
kurt | 60 | 1 |
kurtosis | 23 | 3 |
count | 109 | 107 |
max | 131 | 70 |
mean | 390 | 107 |
median | 228 | 21 |
min | 107 | 26 |
mode | 205 | 18 |
prod | 1 | 1 |
nunique | 15 | 27 |
all | 9 | 16 |
any | 87 | 22 |
mad | 3 | 2 |
sem | 0 | 2 |
corr | 239 | 105 |
is_monotonic | 0 | 0 |
is_monotonic_decreasing | 0 | 0 |
is_monotonic_increasing | 0 | 0 |
is_unique | 0 | 1 |
cov | 0 | 9 |
autocorr | 0 | 7 |
quantile | 47 | 78 |
Misc
Object | Kaggle calls | Docs views |
---|---|---|
iterrows | 39 | 102 |
style | 84 | 76 |
itertuples | 0 | 36 |
bool | 0 | 5 |
squeeze | 0 | 2 |
update | 8 | 56 |
pipe | 3 | 7 |
__iter__ | 0 | 1 |
items | 1 | 6 |
iteritems | 3 | 37 |
view | 0 | 0 |
Reshape / Join / Concat...
Object | Kaggle calls | Docs views |
---|---|---|
get_dummies | 258 | 152 |
crosstab | 58 | 40 |
concat | 432 | 315 |
merge_asof | 0 | 16 |
merge_ordered | 0 | 4 |
wide_to_long | 0 | 7 |
pivot | 29 | 95 |
pivot_table | 54 | 144 |
join | 159 | 225 |
melt | 18 | 75 |
stack | 0 | 36 |
transpose | 9 | 76 |
assign | 19 | 74 |
insert | 17 | 57 |
merge | 425 | 413 |
drop | 1618 | 625 |
explode | 0 | 0 |
align | 3 | 10 |
append | 439 | 515 |
T | 55 | 6 |
unstack | 17 | 58 |
repeat | 0 | 5 |
ravel | 0 | 5 |
Group
Object | Kaggle calls | Docs views |
---|---|---|
agg | 0 | 16 |
aggregate | 3 | 58 |
groupby | 2267 | 719 |
Window
Object | Kaggle calls | Docs views |
---|---|---|
cummax | 0 | 2 |
cummin | 0 | 0 |
cumprod | 0 | 5 |
cumsum | 8 | 29 |
pct_change | 0 | 34 |
rolling | 42 | 140 |
ewm | 0 | 33 |
expanding | 0 | 11 |
duplicated | 14 | 90 |
diff | 1 | 54 |