Resolving SettingWithCopyWarning and FutureWarning in Python: Using .copy() and Specifying numeric_only on a Pandas DataFrame


def create_section_df(df):
    bins = list(range(0, 401, 10))
    bins_label = [str(x) + "이상 " + str(x + 10) + "미만" for x in bins]  # label reads "x or more, under x + 10"
    df["section"] = pd.cut(
        df["total_worktime"], bins=range(0, 401, 10), right=False, labels=bins_label[:-1]
    )
    df["section_count"] = 1
    section_df = df.groupby(["section"], as_index=False)[
        [
            'y.m',
            "total_worktime",
            "section_count",
        ]
    ].sum()
    total = section_df['section_count'].sum()
    section_df['rate'] = section_df['section_count']/total
    return section_df

While using the code above, I ran into the following warning and went looking for a fix:

<ipython-input-119-9d4178635b1e>:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

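This warning appears when you assign a value to part of a DataFrame. It flags an assignment whose target is ambiguous: the write may land on a temporary copy and be lost, or it may change the original DataFrame when you did not mean to.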

[Problem example]

import pandas as pd

data = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
}

df = pd.DataFrame(data)

# This chained assignment triggers SettingWithCopyWarning
df[df["A"] > 2]["B"] = 0

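Pandas raises SettingWithCopyWarning when code tries to modify part of a DataFrame and pandas cannot tell whether the object being modified is a copy of the original or the original itself. In the line above, df[df["A"] > 2] is evaluated first and returns a new object, and the assignment to ["B"] is then applied to that object. As a result, the change may never reach df, or in other cases the original may change when you did not expect it to.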
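To make the problem concrete, here is a small sketch of my own that reuses the toy DataFrame from the example, runs the chained assignment, and then checks the original. The warning is silenced inside the block only to keep the output clean. Because indexing with a boolean mask produces a new object, the write goes to that temporary object and df stays as it was.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo only
    # Chained indexing: df[mask] builds a new object, then ["B"] = 0 writes to it.
    df[df["A"] > 2]["B"] = 0

# The original is untouched because the write went to the temporary object.
b_after_chained = df["B"].tolist()
print(b_after_chained)  # [5, 6, 7, 8]
```

So the real risk in this case is not a corrupted original but an assignment that silently does nothing.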

[Solution]

1. Use .loc[] or .iloc[]: These let you select the part of the original DataFrame to modify in a single indexing step. Because the target is unambiguous, SettingWithCopyWarning no longer appears.

df.loc[df["A"] > 2, "B"] = 0

2. Work on a copy of the DataFrame: If you want to leave the original DataFrame alone, call .copy() to create an explicit copy. Changes made to the copy do not affect the original data.

df_copy = df[df["A"] > 2].copy()
df_copy["B"] = 0
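The two options do different things, so it helps to compare them side by side. The sketch below is my own illustration on the same toy data, not part of the original code: it applies the .copy() approach first and the .loc approach second, recording the original's "B" column after each step.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Option 2: filter, take an explicit copy, and edit only the copy.
df_copy = df[df["A"] > 2].copy()
df_copy["B"] = 0
original_after_copy = df["B"].tolist()  # [5, 6, 7, 8], original unchanged
copy_b = df_copy["B"].tolist()          # [0, 0]

# Option 1: a single .loc call writes into the original DataFrame.
df.loc[df["A"] > 2, "B"] = 0
original_after_loc = df["B"].tolist()   # [5, 6, 0, 0]
```

Pick .loc when the original should change, and .copy() when the result should be an independent DataFrame.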

I went with .copy() to fix the problem. I was about to wrap up the post there, but one more warning actually showed up.

<ipython-input-131-cbae4b3d9c44>:9: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
section_df = df.groupby(["section"], as_index=False)[

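Strictly speaking this is a warning, not an error. As the message says, the code still works for now, but the default is changing. Until now DataFrameGroupBy.sum ran regardless of the dtypes of the columns involved, because columns it could not sum were silently dropped. In a future version numeric_only will default to False, so you have to state the behavior yourself: either pass numeric_only=True or select exactly the columns you want to sum.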
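Both fixes named in the warning can be sketched on a tiny example. The DataFrame here (team, name, hours) is made up for illustration and has nothing to do with the real data; the point is only that a non-numeric column sits next to a numeric one.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "name": ["kim", "lee", "park"],  # non-numeric column
    "hours": [10, 20, 30],
})

# Fix 1: state explicitly that only numeric columns should be summed.
by_flag = df.groupby("team", as_index=False).sum(numeric_only=True)

# Fix 2: select the columns to aggregate before calling sum().
by_cols = df.groupby("team", as_index=False)[["hours"]].sum()

print(by_flag["hours"].tolist())  # [30, 30]
print(by_cols["hours"].tolist())  # [30, 30]
```

Either way the result no longer depends on the default value of numeric_only, so the FutureWarning goes away.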

def create_section_df(df):
    df1 = df.copy()  # Make a copy up front to avoid the copy warning.
    bins = list(range(0, 401, 10))
    bins_label = [str(x) + "이상 " + str(x + 10) + "미만" for x in bins]  # label reads "x or more, under x + 10"
    df1["section"] = pd.cut(
        df1["total_worktime"], bins=range(0, 401, 10), right=False, labels=bins_label[:-1]
    )
    df1["section_count"] = 1
    section_df = df1.groupby(["section"], as_index=False).sum(numeric_only=True)  # Pass numeric_only explicitly.
    total = section_df['section_count'].sum()
    section_df['rate'] = section_df['section_count']/total
    return section_df
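As a quick check, here is a usage sketch of the fixed function. The sample rows are hypothetical (only the column names y.m and total_worktime come from the code above), and the input is deliberately a filtered slice, the kind of object that can trigger SettingWithCopyWarning when a column is assigned to it directly. Depending on your pandas version, the groupby on a categorical column may print a separate notice about the observed parameter, which is outside the scope of this post.

```python
import pandas as pd


def create_section_df(df):
    df1 = df.copy()  # work on a copy so the caller's DataFrame is not modified
    bins = list(range(0, 401, 10))
    bins_label = [str(x) + "이상 " + str(x + 10) + "미만" for x in bins]
    df1["section"] = pd.cut(
        df1["total_worktime"], bins=bins, right=False, labels=bins_label[:-1]
    )
    df1["section_count"] = 1
    section_df = df1.groupby(["section"], as_index=False).sum(numeric_only=True)
    total = section_df["section_count"].sum()
    section_df["rate"] = section_df["section_count"] / total
    return section_df


# Hypothetical sample data, for illustration only.
raw = pd.DataFrame({
    "y.m": ["2023-01", "2023-01", "2023-02", "2023-02"],
    "total_worktime": [5, 12, 15, 399],
})
subset = raw[raw["total_worktime"] < 100]  # a filtered slice with three rows

section_df = create_section_df(subset)

# Show only the bins that actually contain rows.
print(section_df[section_df["section_count"] > 0])
```

Because the function copies its input, subset does not gain a section column, and the string column y.m is left out of the sums by numeric_only=True.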

다비의 작업실