
Andrew Ng Machine Learning ex2: Logistic Regression (Python)
Logistic regression is a commonly used classification algorithm, widely applied in predictive modeling. Below is a step-by-step implementation, split into two parts: plain logistic regression and logistic regression with L2 regularization.
1. Plain Logistic Regression
1.1 Data Visualization
First, import the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Read the data:
path = 'data/ex2data1.txt'
data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
Plot the scatter plot:
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
plt.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
plt.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
plt.legend()
plt.xlabel('Exam 1 Score')
plt.ylabel('Exam 2 Score')
plt.show()
1.2 Model Implementation
Define the sigmoid function:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
Define the cost function:
def cost(theta, X, y):
    first = (-y) * np.log(sigmoid(X @ theta))
    second = (1 - y) * np.log(1 - sigmoid(X @ theta))
    return np.mean(first - second)
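This is the standard cross-entropy cost for logistic regression, averaged over the m training examples, with hypothesis h_θ(x) = sigmoid(θᵀx):

J(θ) = (1/m) · Σᵢ [ −y⁽ⁱ⁾ · log(h_θ(x⁽ⁱ⁾)) − (1 − y⁽ⁱ⁾) · log(1 − h_θ(x⁽ⁱ⁾)) ]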
Define the gradient function:
def gradient(theta, X, y):
    return (X.T @ (sigmoid(X @ theta) - y)) / len(X)
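The optimizer below expects X and y as plain NumPy arrays, with a column of ones prepended to X for the intercept term. A minimal preparation sketch, using the DataFrame loaded above:

# Prepend a column of ones so that theta[0] acts as the intercept
data.insert(0, 'Ones', 1)
X = data.iloc[:, :-1].values   # shape (m, 3): Ones, Exam 1, Exam 2
y = data.iloc[:, -1].values    # shape (m,): 0/1 Admitted labels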
Optimize the parameters:
import scipy.optimize as opt
result = opt.fmin_tnc(func=cost, x0=np.zeros(X.shape[1]), fprime=gradient, args=(X, y))
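opt.fmin_tnc returns a tuple whose first element is the optimized parameter vector. A quick sanity check is to evaluate the cost at that point; the original exercise quotes a value of about 0.203 for this dataset:

theta_opt = result[0]            # optimized parameters
print(cost(theta_opt, X, y))     # should be close to 0.203 for ex2data1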
Define the prediction function:
def predict(theta, X):
    probability = sigmoid(X @ theta)
    return [1 if x >= 0.5 else 0 for x in probability]
Check the accuracy:
theta_min = result[0]
predictions = predict(theta_min, X)
correct = [1 if (a == 1 and b == 1) or (a == 0 and b == 0) else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(correct) / len(correct)
print(f'accuracy = {accuracy * 100}%')
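The fitted model can also score an individual applicant. For exam scores of 45 and 85, the original exercise quotes an admission probability of roughly 0.776 (the leading 1 is the intercept feature):

prob = sigmoid(np.array([1, 45, 85]) @ result[0])
print(prob)   # roughly 0.776 according to the exercise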
1.3 Decision Boundary
Plot the decision boundary. Since predictions flip at sigmoid(θᵀx) = 0.5, the boundary is the straight line θ0 + θ1·x1 + θ2·x2 = 0, solved here for x2:
x1 = np.linspace(30, 100, 100)
x2 = -(result[0][0] + x1 * result[0][1]) / result[0][2]
plt.plot(x1, x2, 'y', label='Prediction')
plt.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
plt.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
plt.legend()
plt.xlabel('Exam 1 Score')
plt.ylabel('Exam 2 Score')
plt.show()
2. Logistic Regression with L2 Regularization
2.1 Feature Mapping
Define the feature mapping function:
def feature_mapping(x1, x2, power, as_ndarray=False):
    features = {}
    for i in range(power + 1):
        for p in range(i + 1):
            features[f'{i-p}{p}'] = np.power(x1, i - p) * np.power(x2, p)
    if as_ndarray:
        return np.array(pd.DataFrame(features))
    return pd.DataFrame(features)
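A minimal usage check, assuming the function above: for power=6 the mapping produces 28 columns, one per polynomial term x1^(i−p) · x2^p:

demo = feature_mapping(np.array([0.5]), np.array([-0.5]), power=6)
print(demo.shape)   # (1, 28): 28 polynomial feature columns for power = 6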
Read the data and map the features:
path = 'data/ex2data2.txt'
data = pd.read_csv(path, header=None, names=['Microchip 1', 'Microchip 2', 'Accepted'])
x1 = np.array(data['Microchip 1'])
x2 = np.array(data['Microchip 2'])
data2 = feature_mapping(x1, x2, power=6)
print(data2.shape)
print(data2.head())
2.2 Model Implementation
Define the regularized cost function:
def regularized_cost(theta, X, y, l=1):
    return cost(theta, X, y) + (l / (2 * len(X))) * (theta[1:] @ theta[1:])
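The extra term is the L2 penalty, which by convention excludes the intercept θ₀ (hence theta[1:]):

J_reg(θ) = J(θ) + (λ / 2m) · Σ_{j=1..n} θⱼ²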
Define the regularized gradient function:
def regularized_gradient(theta, X, y, l=1):
    reg = (l / len(X)) * theta
    reg[0] = 0                       # do not regularize the intercept term
    return gradient(theta, X, y) + reg
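The optimization step below again assumes X and y are NumPy arrays. Here X is the mapped feature matrix (its first column, x1⁰·x2⁰ = 1, already plays the role of the intercept), and y uses the column name from reading ex2data2.txt above. A minimal sketch:

X = feature_mapping(x1, x2, power=6, as_ndarray=True)   # (m, 28) mapped features
y = np.array(data['Accepted'])                           # 0/1 labels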
Optimize the parameters:
result2 = opt.minimize(fun=regularized_cost, x0=np.zeros(X.shape[1]), args=(X, y), method='CG', jac=regularized_gradient)
Prediction function (same as in Part 1):
def predict(theta, X):
    probability = sigmoid(X @ theta)
    return [1 if x >= 0.5 else 0 for x in probability]
Check the accuracy:
from sklearn.metrics import classification_report

final_theta = result2.x
y_predict = predict(final_theta, X)
print(classification_report(y, y_predict))
2.3 Decision Boundary
Plot the decision boundary. Because the features are a degree-6 polynomial mapping, the boundary θᵀ·feature_mapping(x1, x2) = 0 is a curve, drawn here as the zero-level contour over a grid of points:
positive = data[data['Accepted'].isin([1])]
negative = data[data['Accepted'].isin([0])]
x = np.linspace(-1, 1.5, 50)
xx, yy = np.meshgrid(x, x)
z = feature_mapping(xx.ravel(), yy.ravel(), power=6, as_ndarray=True) @ final_theta
z = z.reshape(xx.shape)
plt.contour(xx, yy, z, levels=[0], colors='black')
plt.ylim(-.8, 1.2)
plt.scatter(positive['Microchip 1'], positive['Microchip 2'], s=50, c='b', marker='o', label='Accepted')
plt.scatter(negative['Microchip 1'], negative['Microchip 2'], s=50, c='r', marker='x', label='Not Accepted')
plt.legend()
plt.xlabel('Microchip 1 Score')
plt.ylabel('Microchip 2 Score')
plt.show()
With the steps above, the implementation and optimization of logistic regression are complete, illustrating how the model behaves both with and without regularization.