240813 How to Do Logistic Regression (Categorical Data)

Steps for running a logistic regression

Installing the libraries

!pip install scikit-learn
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install seaborn

Import

import sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

How to check that a library is installed

import pandas as pd   # example
pd.__version__
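The same check can be run for every library at once. The following is a minimal sketch using the standard library's `importlib.metadata`; the `PACKAGES` list and the `installed_versions` helper are names made up for this illustration. Note that the name used on PyPI for the `sklearn` module is `scikit-learn`.

```python
from importlib import metadata

# Distribution names as published on PyPI (scikit-learn, not sklearn).
PACKAGES = ["scikit-learn", "numpy", "pandas", "matplotlib", "seaborn"]

def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name}: {version if version else 'not installed'}")
```

Unlike `pd.__version__`, this does not import the library, so a missing package shows up as "not installed" instead of raising an error.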

Checking the data

titanic_df=pd.read_csv('C:/Users/USER/Documents/ML/titanic/train.csv', encoding='utf-8')
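After loading, a few quick calls give a first look at the table. The sketch below uses a small made-up DataFrame (`sample_df`, with hypothetical rows that only borrow the Titanic column names) so that it runs without the CSV file; the same calls should work on `titanic_df`.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df: made-up rows, real column names.
sample_df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

print(sample_df.head())              # first rows of the table
print(sample_df.shape)               # (number of rows, number of columns)

missing = sample_df.isnull().sum()   # count of missing values per column
print(missing)
```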

Forming a hypothesis

1. In an emergency, women are likely to have been evacuated first, so the survival rate of women should be higher.
- 1-1. Check with a pivot table
- 1-2. Check with a graph

pd.pivot_table(titanic_df, index='Sex', columns='Survived',aggfunc='size')
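The count table shows how many people fall in each cell, but the hypothesis is about rates. Because Survived is coded 0/1, the mean of Survived within each group is that group's survival rate. A minimal sketch on made-up data (`toy_df` is hypothetical, not the Titanic file), using `pd.crosstab` for the counts:

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy_df = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 0, 0, 1, 0],
})

# Counts per (Sex, Survived) cell, comparable to the pivot table of sizes.
counts = pd.crosstab(toy_df["Sex"], toy_df["Survived"])
print(counts)

# Mean of a 0/1 column = share of 1s = survival rate per group.
survival_rate = toy_df.groupby("Sex")["Survived"].mean()
print(survival_rate)
```

On the real data, the same `groupby` call on `titanic_df` would give the two rates to compare.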

Checking the data types

Numeric
- Age, SibSp, Parch, Fare
Categorical
- Pclass, Sex, Cabin, Embarked
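A column's dtype is a useful first hint for this split, but it is not the whole story: Pclass is stored as an integer even though it is a category. The sketch below shows this on a made-up DataFrame (`toy_df` is hypothetical) using `select_dtypes` and `nunique`.

```python
import pandas as pd

# Hypothetical toy data with a few of the Titanic column names.
toy_df = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
    "Embarked": ["S", "C", "S"],
})

# Columns stored as numbers vs. everything else (text columns here).
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()
text_cols = toy_df.select_dtypes(exclude="number").columns.tolist()
print("stored as numbers:", numeric_cols)
print("stored as text:", text_cols)

# A numeric column with only a handful of distinct values is often a category.
print(toy_df[numeric_cols].nunique())
```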
For now, use Fare as the x variable and Survived as the y variable.

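x_1=titanic_df['Fare']      # single brackets: a 1-D Series

x_1=titanic_df[['Fare']]    # double brackets: a 2-D DataFrame, the shape scikit-learn expects for x
y_true=titanic_df['Survived']   # 1-D target; a one-column DataFrame also works but triggers a DataConversionWarning on fit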

sns.scatterplot(data=titanic_df, x='Fare', y='Survived')

sns.histplot(data=titanic_df, x='Fare')

Let's look at the descriptive statistics of the data!

titanic_df.describe()

Importing the model for logistic regression

from sklearn.linear_model import LogisticRegression

Training the logistic regression model

model_lor = LogisticRegression()
model_lor.fit(x_1,y_true)

Checking the values needed for the logistic regression equation

def get_att(x):                              # x: a fitted model
    print('Classes', x.classes_)
    print('Number of independent variables', x.n_features_in_)
    print('Names of the independent variables (x)', x.feature_names_in_)
    print('Weights', x.coef_)
    print('Bias', x.intercept_)

get_att(model_lor)
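The weight and bias printed by get_att are the w and b of the logistic regression equation p = 1 / (1 + exp(-(w*x + b))), where p is the probability of class 1 (survived). The sketch below evaluates that equation by hand. The `weight` and `bias` values are made up for illustration and are not the fitted results; `predict_probability` is a hypothetical helper name.

```python
import math

def predict_probability(fare, weight, bias):
    """P(Survived = 1) for one fare under a one-feature logistic regression."""
    z = weight * fare + bias             # linear part: w * x + b
    return 1.0 / (1.0 + math.exp(-z))    # sigmoid maps z into (0, 1)

# Hypothetical values for illustration only, NOT taken from the fitted model.
weight, bias = 0.015, -0.94

for fare in (7.25, 50.0, 200.0):
    p = predict_probability(fare, weight, bias)
    label = 1 if p >= 0.5 else 0         # 0.5 is the usual decision threshold
    print(f"Fare={fare}: P(survived)={p:.3f} -> predicted class {label}")
```

To try it with the real model, the values from `model_lor.coef_` and `model_lor.intercept_` would replace the made-up numbers.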

Let's check the model's accuracy and f1-score

from sklearn.metrics import accuracy_score, f1_score

def get_metrics(true, pred):
    print('Accuracy', accuracy_score(true,pred))
    print('f1-score',f1_score(true,pred))

y_pred_1=model_lor.predict(x_1)   # predictions from the Fare-only model
get_metrics(y_true, y_pred_1)
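To see what the two numbers mean, they can be computed by hand from the counts of true/false positives and negatives. This is a minimal sketch in plain Python on made-up labels (`toy_true`, `toy_pred` and the `accuracy_and_f1` helper are hypothetical names); it follows the standard binary definitions with class 1 as the positive class.

```python
def accuracy_and_f1(true, pred):
    """Accuracy and F1 for binary 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(true, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # survived, predicted survived
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # died, predicted died
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # died, predicted survived
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # survived, predicted died

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1

# Hypothetical labels for illustration only.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 0, 0]

accuracy, f1 = accuracy_and_f1(toy_true, toy_pred)
print("Accuracy", accuracy)
print("f1-score", f1)
```

Accuracy counts every correct prediction, while F1 looks only at how well the positive class is found, which is why the two can differ noticeably.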

Let's try several x variables instead of just one

This requires some basic data preprocessing first.

# y: Survived
# x (numeric): Fare, and Sex once it is encoded as 0/1
# x (categorical): Pclass (ticket class)


# Encode Sex as a number: 'female' -> 0, anything else ('male') -> 1
def get_sex(x):
    if x =='female':
        return 0
    else:
        return 1

titanic_df['Sex_en']=titanic_df['Sex'].apply(get_sex)
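Pclass is listed as categorical, yet passing it as the numbers 1, 2, 3 treats it as an ordered quantity. One alternative, shown here only as a sketch and not as what this post does, is to one-hot encode it with `pd.get_dummies`. The `toy_df` rows are made up, and `x_onehot` is a hypothetical name.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy_df = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 13.0, 8.05],
})

# Same 0/1 coding as get_sex, written with map.
# Unlike get_sex, values other than 'female'/'male' would become NaN here.
toy_df["Sex_en"] = toy_df["Sex"].map({"female": 0, "male": 1})

# One 0/1 column per ticket class; drop_first=True drops Pclass_1,
# which then acts as the baseline class.
pclass_dummies = pd.get_dummies(
    toy_df["Pclass"], prefix="Pclass", drop_first=True, dtype=int
)

x_onehot = pd.concat([toy_df[["Sex_en", "Fare"]], pclass_dummies], axis=1)
print(x_onehot)
```

With this layout the model learns a separate weight for 2nd and 3rd class relative to 1st, instead of a single weight for the class number.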

x_2=titanic_df[['Pclass','Sex_en','Fare']]
y_true=titanic_df['Survived']   # 1-D target

model_lor_2=LogisticRegression()
model_lor_2.fit(x_2,y_true)

get_att(model_lor_2)

Generating predictions

y_pred_2=model_lor_2.predict(x_2)
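`predict` returns only the 0/1 labels. The probabilities behind those labels are available from `predict_proba`. The sketch below fits a small model on made-up data (`X_toy`, `y_toy` and `toy_model` are hypothetical) and checks that the labels match the class with the larger probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, larger values are more often labeled 1.
X_toy = np.array([[5.0], [8.0], [10.0], [15.0], [40.0], [60.0], [80.0], [120.0]])
y_toy = np.array([0, 0, 0, 1, 0, 1, 1, 1])

toy_model = LogisticRegression()
toy_model.fit(X_toy, y_toy)

# One row per sample, one column per class, in the order of toy_model.classes_.
proba = toy_model.predict_proba(X_toy)
labels = toy_model.predict(X_toy)

# Picking the class with the larger probability reproduces predict.
labels_from_proba = toy_model.classes_[np.argmax(proba, axis=1)]

print(np.round(proba, 3))
print(labels)
```

The same call, `model_lor_2.predict_proba(x_2)`, should give the per-passenger probabilities for the model in this post.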

Let's check the model's accuracy and f1-score

get_metrics(y_true,y_pred_1)   # model 1: Fare only
get_metrics(y_true,y_pred_2)   # model 2: Pclass, Sex_en, Fare


초승달 데이터분석가의 길
