A Quick Guide to Pandas

2020. 3. 4. 12:56

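1. pandas is a Python library used mainly for working with grid-like (tabular) data

- Most of the work is done through the pandas DataFrame

2. The three components of a DataFrame: columns, data (rows), and the index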
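To make the three components concrete, here is a minimal sketch that builds a small DataFrame and reads each part back. The sample values and the labels r1, r2, A, B are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to show the three components.
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["r1", "r2"],   # index (row labels)
    columns=["A", "B"],   # columns
)

columns = list(df.columns)      # column labels
index = list(df.index)          # row labels
data = df.to_numpy().tolist()   # underlying data, row by row

print(columns)
print(index)
print(data)
```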

3. Creating a DataFrame

import numpy as np

import pandas as pd

- Using the DataFrame constructor

dftest = pd.DataFrame([('bird', 389.0), ('bird', 24.0), ('mammal', 80.5), ('mammal', np.nan)], index=['falcon', 'parrot', 'lion', 'monkey'], columns=('class', 'max_speed'))

print(dftest)

Output:

         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN

- From a NumPy array

my_2darray = np.array([[1, 2, 3], [4, 5, 6]])

print(pd.DataFrame(my_2darray))

- From a dictionary

my_dict = {"a": ['1', '3'], "b": ['1', '2'], "c": ['2', '4']}

print(pd.DataFrame(my_dict))

- From an existing DataFrame

my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4), columns=['A'])

print(pd.DataFrame(my_df))

- From a Series

my_series = pd.Series({"United Kingdom":"London", "India":"New Delhi", "United States":"Washington", "Belgium":"Brussels"})

print(pd.DataFrame(my_series))

- By loading a CSV file

data = pd.read_csv('dataset\\005930.KS_5y.csv')
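The path above points to a file on the author's machine. As a sketch that runs without that file, the example below feeds read_csv an in-memory string instead. The column names Date and Close and the values are hypothetical, not taken from the actual CSV.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "Date,Close\n2020-03-02,55000\n2020-03-03,55400\n"

# read_csv accepts a path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(data)
```

Passing parse_dates and index_col is optional. They are shown here because date-indexed price data is a common case.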

4. Inspecting the shape of a DataFrame

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

- shape returns the number of rows and columns as a (rows, columns) tuple

print(df.shape)

- Get the number of rows

print(len(df.index))

- Get the column labels as a list

print(list(df.columns))

Output: [0, 1, 2]

my_dict = {"a": ['1', '3'], "b": ['1', '2'], "c": ['2', '4']}  # renamed from dict to avoid shadowing the built-in

df_from_dict = pd.DataFrame(my_dict)

print(list(df_from_dict.columns))

Output: ['a', 'b', 'c']
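As a small sketch of how these size attributes relate to each other, the shape tuple can be unpacked directly. The variable names n_rows and n_cols are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(df))              # len(df) is equivalent to len(df.index)
print(df.size)              # total number of cells: rows * columns
```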

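5. Selecting specific columns or rows (by index)

- Getting a specific column or row:

c = df['column']

r = df.loc[index]  # the original used df.ix[index]; .ix was removed in pandas 1.0

Example: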

df = pd.DataFrame({"A":[1,4,7], "B":[2,5,8], "C":[3,6,9]})

a = df['A']

print(df)

Output:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

print(a)

Output:

0    1
1    4
2    7
Name: A, dtype: int64

b = df.loc[:, 'A']

print(b)

Output:

0    1
1    4
2    7
Name: A, dtype: int64

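c = df.loc[0, 'B']  # the original used df.ix[0]['B']; .ix was removed in pandas 1.0

print(c)

Output:

2

d = df.loc[1]  # the original used df.ix[1], which was removed in pandas 1.0

print(d)

Output:

A    4
B    5
C    6
Name: 1, dtype: int64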

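e = df.loc[1]  # selects by index label

print(e)

Output:

A    4
B    5
C    6
Name: 1, dtype: int64

f = df.iloc[1]  # selects by integer position

print(f)

Output:

A    4
B    5
C    6
Name: 1, dtype: int64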
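In the example above, loc and iloc return the same row only because the default index labels happen to equal the row positions. The sketch below uses a hypothetical index of 10, 20, 30 to show where the two differ.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]}, index=[10, 20, 30])

by_label = df.loc[20]               # row whose index label is 20
by_position = df.iloc[1]            # second row, regardless of its label
cell = df.loc[20, "B"]              # single value: row label 20, column "B"
subset = df.loc[[10, 30], ["A"]]    # several rows and columns at once

print(by_label)
print(cell)
print(subset)
```

Here df.loc[1] would raise a KeyError because no row is labeled 1, while df.iloc[1] still works.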

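6. Adding rows

With df.loc[2] = ..., pandas looks for the row whose index label is 2 and replaces its data. If no row has that label, a new row is added. (The original text used df.ix, which was removed in pandas 1.0; .loc is the label-based equivalent.)

Example: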

df = pd.DataFrame(data=np.array([[10,11],[20,21],[30,31],[40,41]]),index=[3,4,5,2])

print(df)

    0   1
3  10  11
4  20  21
5  30  31
2  40  41

df.loc[2] = [0, 0]  # the original used df.ix[2]; .ix was removed in pandas 1.0

print(df)

    0   1
3  10  11
4  20  21
5  30  31
2   0   0  ==> the row with index=2 is found and its data replaced

df.loc[1] = [1, 1]  # the original used df.ix[1]; .ix was removed in pandas 1.0

print(df)

    0   1
3  10  11
4  20  21
5  30  31
2   0   0
1   1   1  ==> no row with index=1 exists, so a new row is added at the end
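The replace-or-add behavior described above can be checked with a short sketch. The labels 4 and 9 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame([[10, 11], [20, 21]], index=[3, 4])

df.loc[4] = [0, 0]   # label 4 exists: the row is overwritten
df.loc[9] = [1, 1]   # label 9 does not exist: a new row is added at the end

print(df)
```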

7. Adding rows by appending another DataFrame

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

print(df)

   A  B
0  1  2
1  3  4

df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

print(df2)

   A  B
0  5  6
1  7  8

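result = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0; concat returns a new DataFrame

print(result)

   A  B
0  1  2
1  3  4
0  5  6  ==> keeps df2's index as-is
1  7  8  ==> keeps df2's index as-is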

result = pd.concat([df, df2], ignore_index=True)  # replaces df.append(df2, ignore_index=True)

print(result)

   A  B
0  1  2
1  3  4
2  5  6  ==> index automatically set to 2
3  7  8  ==> index automatically set to 3
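The same approach works for adding a single record. A minimal sketch, assuming the record arrives as a dict: wrap it in a one-row DataFrame and concatenate. The name new_row and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))

# Hypothetical new record, wrapped in a one-row DataFrame.
new_row = pd.DataFrame([{"A": 5, "B": 6}])

# concat returns a new DataFrame; df itself is left unchanged.
combined = pd.concat([df, new_row], ignore_index=True)

print(combined)
```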

8. Adding a column

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

df.loc[:, 'D'] = pd.Series(['5', '6', '7'], index=df.index)

print(df)

   A  B  C  D
0  1  2  3  5
1  4  5  6  6
2  7  8  9  7
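Besides assigning through loc, a few other standard ways to add a column are sketched below. The column names C, D, and id are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

df["C"] = df["A"] + df["B"]           # new column computed from existing ones
df2 = df.assign(D=df["A"] * 10)       # assign returns a new DataFrame; df is unchanged
df.insert(0, "id", ["x", "y", "z"])   # insert places the column at a given position

print(df)
print(df2)
```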

9. Deleting a column

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

df = df.drop('A', axis=1)

print(df)

   B  C
0  2  3
1  5  6
2  8  9

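- The same deletion done in place. The DataFrame is rebuilt first, because 'A' no longer exists after the drop above and dropping it again would raise a KeyError.

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

df.drop('A', axis=1, inplace=True)

print(df)

   B  C
0  2  3
1  5  6
2  8  9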
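As a sketch of two related options, drop also accepts a columns= keyword, and errors="ignore" makes a repeated drop harmless. The variable names without_a and safe are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})

# Equivalent to df.drop("A", axis=1); returns a new DataFrame.
without_a = df.drop(columns=["A"])

# errors="ignore" avoids a KeyError when "A" is already gone.
safe = without_a.drop(columns=["A"], errors="ignore")

print(without_a)
```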

10. Deleting a row

- Delete the row at a given position (counted from 0)

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'], index=[3,4,5])

df = df.drop(df.index[1])

print(df)

   A  B  C
3  1  2  3
5  7  8  9
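Note that drop itself works on index labels; df.index[1] simply looks up the label at position 1. The sketch below puts the label-based and position-based forms side by side, plus a boolean filter as another common way to remove rows. The condition A > 1 is an arbitrary example.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["A", "B", "C"],
    index=[3, 4, 5],
)

by_label = df.drop(4)                # drop the row whose index label is 4
by_position = df.drop(df.index[1])   # drop the row at position 1 (label 4 here)
filtered = df[df["A"] > 1]           # keep only the rows matching a condition

print(by_label)
print(filtered)
```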

And so on.
