Hands-On Machine Learning 2022. 6. 14. 17:56

https://www.youtube.com/watch?v=wquIJHKX7T0&list=PLJN246lAkhQjX3LOdLVnfdFaCbGouEBeb&index=15

Logistic Regression

Logistic regression is widely used to estimate the probability that a sample belongs to a particular class.
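To make the idea of "estimating a probability" concrete, here is a minimal sketch of how a logistic regression model turns a linear score into a probability and then into a class. The parameter values in `theta` and the sample `x` are made up for illustration; they are not fitted to any data.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical parameters, for illustration only (not fitted to any data).
theta = np.array([-4.0, 2.5])   # [bias, weight]
x = np.array([1.0, 1.7])        # [1 for the bias term, feature value]

p_hat = sigmoid(theta @ x)      # estimated probability of the positive class
y_hat = int(p_hat >= 0.5)       # predict 1 when the probability is at least 50%

print(p_hat, y_hat)
```

The score `theta @ x` is positive here, so the estimated probability is above 0.5 and the sample is assigned to the positive class.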

Decision Boundaries

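import numpy as np
import matplotlib.pyplot as plt

# Plot the logistic (sigmoid) function on [-10, 10]
t = np.linspace(-10, 10, 100)
sig = 1 / (1 + np.exp(-t))
plt.figure(figsize=(9, 3))
plt.plot([-10, 10], [0, 0], "k-")        # horizontal axis
plt.plot([-10, 10], [0.5, 0.5], "k:")    # 50% threshold
plt.plot([-10, 10], [1, 1], "k:")        # upper asymptote
plt.plot([0, 0], [-1.1, 1.1], "k-")      # vertical axis
plt.plot(t, sig, "b-", linewidth=2, label=r"$\sigma(t) = \frac{1}{1 + e^{-t}}$")
plt.xlabel("t")
plt.legend(loc="upper left", fontsize=20)
plt.axis([-10, 10, -0.1, 1.1])
save_fig("logistic_function_plot")  # save_fig is a helper defined in the book's notebook
plt.show()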

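The partial derivatives of the logistic cost function (log loss) have the same form as the gradient of the MSE cost for linear regression: the prediction error multiplied by the feature value, averaged over all samples. The only differences are that the prediction is passed through the sigmoid and that the constant factor of 2 is absent.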
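The gradient formula can be checked numerically. The sketch below assumes the usual vectorized form, (1/m) Xᵀ(σ(Xθ) − y), and compares it against a central finite-difference estimate of the log loss. The data is random and exists only for the check; the function names are my own.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(theta, X, y):
    """Average log loss of a logistic regression model with parameters theta."""
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_loss_gradient(theta, X, y):
    """Analytic gradient: (1/m) * X^T (sigmoid(X theta) - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Random data, used only to compare the two gradients.
rng = np.random.default_rng(42)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]   # bias column + 2 features
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite differences, one parameter at a time.
eps = 1e-6
numeric = np.array([
    (log_loss(theta + eps * e, X, y) - log_loss(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
analytic = log_loss_gradient(theta, X, y)

print(np.allclose(numeric, analytic, atol=1e-6))
```

For comparison, the MSE gradient for linear regression is (2/m) Xᵀ(Xθ − y), which has the same "error times features" structure.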

from sklearn import datasets
iris = datasets.load_iris()
list(iris.keys())

# Output:
# ['data',
# 'target',
# 'frame',
# 'target_names',
# 'DESCR',
# 'feature_names',
# 'filename',
# 'data_module']

We use the iris dataset. It contains the petal and sepal width and length of 150 iris flowers belonging to three species.

np.unique(iris.target, return_counts=True)

# Output: (array([0, 1, 2]), array([50, 50, 50]))

There are three classes, each with 50 samples.

X = iris["data"][:, 3:]  # petal width
y = (iris["target"] == 2).astype(int)  # 1 if Iris virginica, else 0

Because we are training a binary classifier, y is set to 1 for Iris virginica and 0 otherwise, and X uses a single feature, the petal width.

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(solver="lbfgs", random_state=42)
log_reg.fit(X, y)

# Output: LogisticRegression(random_state=42)

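We import the logistic regression model, create an instance, and train it with the fit method.

The goal here is to see how the logistic model behaves, so we do not split the data into training and test sets.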

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_reg.predict_proba(X_new)

plt.plot(X_new, y_proba[:, 1], "g-", linewidth=2, label="Iris virginica")
plt.plot(X_new, y_proba[:, 0], "b--", linewidth=2, label="Not Iris virginica")

Figure from the book:

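decision_boundary = X_new[y_proba[:, 1] >= 0.5][0]  # first grid point where P(Iris virginica) >= 50%
decision_boundary
# Output: array([1.66066066])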

log_reg.predict([[1.7], [1.5]])
# Output: array([1, 0])
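The decision boundary can also be read directly from the fitted parameters rather than from a grid of predictions. As a sketch, assuming the same one-feature model as above: the estimated probability is exactly 50% where the linear score `w * x + b` is zero, so the boundary is `-b / w`. The names `w`, `b`, and `boundary` are my own.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 3:]                      # petal width
y = (iris["target"] == 2).astype(int)        # 1 if Iris virginica, else 0

log_reg = LogisticRegression(solver="lbfgs", random_state=42).fit(X, y)

w = log_reg.coef_[0][0]                      # weight of the petal-width feature
b = log_reg.intercept_[0]                    # bias term
boundary = -b / w                            # petal width where the score is zero

# The estimated probability at the boundary should be 50%.
proba_at_boundary = log_reg.predict_proba([[boundary]])[0, 1]
print(boundary, proba_at_boundary)
```

This lands close to the grid-based value above; the grid version is limited by the spacing of the 1,000 sample points.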

from sklearn.linear_model import LogisticRegression

X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(int)

log_reg = LogisticRegression(solver="lbfgs", C=10**10, random_state=42)
log_reg.fit(X, y)

x0, x1 = np.meshgrid(
        np.linspace(2.9, 7, 500).reshape(-1, 1),
        np.linspace(0.8, 2.7, 200).reshape(-1, 1),
    )
X_new = np.c_[x0.ravel(), x1.ravel()]

y_proba = log_reg.predict_proba(X_new)

plt.figure(figsize=(10, 4))
plt.plot(X[y==0, 0], X[y==0, 1], "bs")
plt.plot(X[y==1, 0], X[y==1, 1], "g^")

zz = y_proba[:, 1].reshape(x0.shape)
contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)


left_right = np.array([2.9, 7])
boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]

plt.clabel(contour, inline=1, fontsize=12)
plt.plot(left_right, boundary, "k--", linewidth=3)
plt.text(3.5, 1.5, "Not Iris virginica", fontsize=14, color="b", ha="center")
plt.text(6.5, 2.3, "Iris virginica", fontsize=14, color="g", ha="center")
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.axis([2.9, 7, 0.8, 2.7])
save_fig("logistic_regression_contour_plot")
plt.show()
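With two features the decision boundary is a straight line, which is what the `boundary` expression in the plot code computes. A minimal check, assuming the same two-feature model: any point on the line `w1 * length + w2 * width + b = 0` should receive an estimated probability of 50%. The sample petal lengths are arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                     # petal length, petal width
y = (iris["target"] == 2).astype(int)

log_reg = LogisticRegression(solver="lbfgs", C=10**10, random_state=42).fit(X, y)

w1, w2 = log_reg.coef_[0]
b = log_reg.intercept_[0]

# Solve w1 * length + w2 * width + b = 0 for width at a few arbitrary lengths.
lengths = np.array([2.9, 5.0, 7.0])
widths = -(w1 * lengths + b) / w2
points = np.c_[lengths, widths]

probas = log_reg.predict_proba(points)[:, 1]
print(probas)
```

The contour lines in the plot are parallel to this line because the probability depends only on the linear score.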

Softmax Regression

Softmax regression is used for multiclass classification and is also called multinomial logistic regression.
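As a rough sketch of what the softmax step does: the model computes one score per class, and the softmax function converts those scores into probabilities that sum to 1. The scores below are made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical class scores, for illustration only.
scores = np.array([-2.0, 1.0, 3.0])
probas = softmax(scores)
predicted_class = int(np.argmax(probas))   # class with the highest probability

print(probas, predicted_class)
```

Because softmax preserves the ordering of the scores, the predicted class is simply the one with the highest score.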

X = iris["data"][:, (2, 3)]  # petal length, petal width
y = iris["target"]

softmax_reg = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=10, random_state=42)
softmax_reg.fit(X, y)

# Output: LogisticRegression(C=10, multi_class='multinomial', random_state=42)

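When the target has more than two classes, LogisticRegression with the lbfgs solver automatically trains a multinomial (softmax) classifier, so multi_class="multinomial" does not have to be specified explicitly.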
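One way to check this behavior, sketched here under the assumption that SciPy is available alongside scikit-learn: train the model without the multi_class argument and compare predict_proba with the softmax of the raw class scores from decision_function. If the two match, the model is behaving as a softmax classifier.

```python
import numpy as np
from scipy.special import softmax
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                     # petal length, petal width
y = iris["target"]                           # three classes

# No multi_class argument: rely on the default behavior.
clf = LogisticRegression(solver="lbfgs", C=10, random_state=42).fit(X, y)

scores = clf.decision_function(X)            # one score per class, shape (150, 3)
matches_softmax = np.allclose(clf.predict_proba(X), softmax(scores, axis=1))

print(matches_softmax)
```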

x0, x1 = np.meshgrid(
        np.linspace(0, 8, 500).reshape(-1, 1),
        np.linspace(0, 3.5, 200).reshape(-1, 1),
    )
X_new = np.c_[x0.ravel(), x1.ravel()]

y_proba = softmax_reg.predict_proba(X_new)
y_predict = softmax_reg.predict(X_new)

zz1 = y_proba[:, 1].reshape(x0.shape)
zz = y_predict.reshape(x0.shape)

plt.figure(figsize=(10, 4))
plt.plot(X[y==2, 0], X[y==2, 1], "g^", label="Iris virginica")
plt.plot(X[y==1, 0], X[y==1, 1], "bs", label="Iris versicolor")
plt.plot(X[y==0, 0], X[y==0, 1], "yo", label="Iris setosa")

from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])

plt.contourf(x0, x1, zz, cmap=custom_cmap)
contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)
plt.clabel(contour, inline=1, fontsize=12)
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.legend(loc="center left", fontsize=14)
plt.axis([0, 7, 0, 3.5])
save_fig("softmax_regression_contour_plot")
plt.show()

softmax_reg.predict([[5, 2]])
# Output: array([2])

softmax_reg.predict_proba([[5, 2]])
# Output: array([[6.38014896e-07, 5.74929995e-02, 9.42506362e-01]])

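Questions

Logistic regression is a model used for classification. Because it passes its output through the ( ) function, which returns a value between 0 and 1, it is widely used in situations that require ( ).

Gradient descent can find the global minimum of the log loss, the cost function of logistic regression. (O/X)

A softmax regression classifier is a multioutput classifier, so it can be used to recognize several people's faces in a single picture. (O/X)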
