12-2. Advanced pandas / 13-1. Python Modeling Libraries

2021. 3. 1. 20:46


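12.1.3 Computations with Categoricals

The pandas.qcut function bins arbitrary numeric data by sample quantiles, so each bin holds roughly the same number of observations.

The pandas.cut function bins by value instead: you pass explicit bin edges, or a bin count to get equal-width bins.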

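import numpy as np
import pandas as pd

np.random.seed(12345)

# 1,000 draws from a standard normal distribution
draws = np.random.randn(1000)
print(draws[:5])

# Split into quartiles; each bin is labeled by its interval
bins = pd.qcut(draws, 4)
print(bins)

# Same quartiles, with readable labels
bins = pd.qcut(draws, 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
print(bins)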
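To make the difference between the two binning functions concrete, here is a small sketch that bins the same draws both ways and compares the bin counts. The variable names (`qcut_counts`, `cut_counts`) and the separate `RandomState` are only for illustration and are not part of the original notes.

```python
import numpy as np
import pandas as pd

# Separate random state so the example does not disturb the global seed
rng = np.random.RandomState(12345)
draws = rng.randn(1000)

# qcut: edges are sample quantiles, so every bin gets the same number of points
quartiles = pd.qcut(draws, 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
qcut_counts = pd.Series(quartiles).value_counts().sort_index()

# cut: 4 equal-width bins between min and max, so the counts differ per bin
equal_width = pd.cut(draws, 4)
cut_counts = pd.Series(equal_width).value_counts().sort_index()

print(qcut_counts)
print(cut_counts)
```

With normally distributed data, the equal-width bins from cut put most points in the two middle bins, while qcut keeps the bins balanced.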

Better performance with categoricals

bins = pd.Series(bins, name='quartile')

# Summary statistics per quartile
results = (pd.Series(draws)
           .groupby(bins)
           .agg(['count', 'min', 'max'])
           .reset_index())
print(results)

N = 10000000

draws = pd.Series(np.random.randn(N))
labels = pd.Series(['foo', 'bar', 'baz', 'qux'] * (N // 4))

# The category dtype stores small integer codes instead of Python strings
categories = labels.astype('category')
print(labels.memory_usage())
print(categories.memory_usage())

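12.1.4 Categorical Methods

cat.codes: the integer code assigned to each value

cat.set_categories(): replace the set of categories, for example to include values that do not occur in the data

value_counts(): count occurrences per category, including categories with zero observations

cat.remove_unused_categories(): drop categories that no longer appear in the data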
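The four methods above can be tried on a tiny Series. This is a minimal sketch; the sample data and variable names are made up for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'] * 2)
cat_s = s.astype('category')

# cat.codes: integer position of each value in cat_s.cat.categories
codes = cat_s.cat.codes.tolist()

# cat.set_categories: widen the category set with a value not present in the data
cat_s2 = cat_s.cat.set_categories(['a', 'b', 'c', 'd', 'e'])

# value_counts on a categorical also reports the unobserved category 'e'
counts = cat_s2.value_counts()

# After filtering, the unused categories 'c' and 'd' are still attached...
cat_s3 = cat_s[cat_s.isin(['a', 'b'])]
# ...until remove_unused_categories trims them
trimmed = cat_s3.cat.remove_unused_categories()

print(codes)
print(counts)
print(trimmed.cat.categories)
```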

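Creating dummy variables for modeling

When you use statistics or machine learning tools, you often need to convert categorical data into dummy variables (one-hot encoding).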

get_dummies
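As a rough illustration of what get_dummies produces, the sketch below one-hot encodes a small categorical Series. The sample data is invented, and `dtype=int` is passed only so the output is 0/1 regardless of pandas version.

```python
import pandas as pd

cat_s = pd.Series(['a', 'b', 'c', 'd'] * 2, dtype='category')

# One column per category; each row has a single 1 marking its category
dummies = pd.get_dummies(cat_s, dtype=int)
print(dummies)
```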

12.2 Advanced GroupBy Use

12.2.1 Group Transforms and "Unwrapped" GroupBys

apply

transform(lambda x: x.mean())

rank(ascending=False)
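The calls listed above can be sketched on a toy DataFrame. The data and names here are assumptions for illustration; the point is that transform returns a result aligned with the original rows, which also allows an "unwrapped" computation built from several built-in aggregations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.)})
g = df.groupby('key')['value']

# Aggregation: one row per group
group_means = g.mean()

# transform: the group mean broadcast back to every original row
broadcast = g.transform(lambda x: x.mean())
# Passing the name of a built-in aggregation gives the same result
same = g.transform('mean')

# Rank within each group, largest value first
ranks = g.rank(ascending=False)

# "Unwrapped" group operation: normalize using built-in transforms
# instead of an arbitrary function passed to apply
normalized = (df['value'] - g.transform('mean')) / g.transform('std')

print(broadcast.head())
print(ranks.head())
```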

12.2.2 Grouped Time Resampling

resample

count

pd.TimeGrouper object (deprecated and later removed from pandas; pd.Grouper(freq=...) is its replacement)
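A minimal sketch of grouped time resampling follows. It uses pd.Grouper, since pd.TimeGrouper is no longer available in current pandas; the sample data, column names, and the 5-minute frequency are assumptions for illustration.

```python
import numpy as np
import pandas as pd

N = 15
times = pd.date_range('2017-05-20 00:00', freq='1min', periods=N)
df = pd.DataFrame({'time': times, 'value': np.arange(N)})

# Plain resampling: index by time, then count rows per 5-minute bin
five_min = df.set_index('time').resample('5min').count()

# Same timestamps repeated for three keys
df2 = pd.DataFrame({'time': times.repeat(3),
                    'key': np.tile(['a', 'b', 'c'], N),
                    'value': np.arange(N * 3.)})

# Group by key AND by 5-minute bin; the time column must be the index
time_key = pd.Grouper(freq='5min')
resampled = df2.set_index('time').groupby(['key', time_key]).sum()

print(five_min)
print(resampled)
```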

12.3 Techniques for Method Chaining

DataFrame.assign method
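To show why assign helps with chaining, here is a small sketch comparing in-place column assignment with a chained version. The DataFrame and the `col1_demeaned` column name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b'],
                   'col1': [1.0, -2.0, 3.0, 5.0]})

# Without chaining: a temporary variable that is then modified in place
df2 = df[df['col1'] > 0].copy()
df2['col1_demeaned'] = df2['col1'] - df2['col1'].mean()

# With chaining: assign returns a new DataFrame, and the lambda receives
# the filtered frame, so no temporary variable is needed
result = (df[df['col1'] > 0]
          .assign(col1_demeaned=lambda x: x['col1'] - x['col1'].mean()))

print(result)
```

Because assign returns a new object instead of modifying its input, the original `df` is left unchanged.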

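12.3.1 The pipe Method

The pipe method makes method chaining easier to write by letting you call your own functions on a Series or DataFrame in the middle of a chain.

A useful pattern with pipe is to generalize a sequence of operations into a reusable function.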
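The reusable-function pattern might look like the sketch below. The helper `group_demean` and the sample data are hypothetical; `df.pipe(f, *args)` is simply another way of writing `f(df, *args)` that keeps the chain readable.

```python
import pandas as pd

def group_demean(df, by, cols):
    """Hypothetical helper: subtract the per-group mean from the given columns."""
    result = df.copy()
    g = df.groupby(by)
    for c in cols:
        result[c] = df[c] - g[c].transform('mean')
    return result

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b'],
                   'col1': [1.0, 3.0, 10.0, 20.0],
                   'col2': [1.0, 1.0, 1.0, 1.0]})

# Filter first, then hand the filtered frame to the helper inside the chain
result = (df[df['col2'] > 0]
          .pipe(group_demean, ['key1'], ['col1']))

print(result)
```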

13. Python Modeling Libraries

Statistical problems

statsmodels and scikit-learn


NAIAHD
