Andrew Ng Machine Learning ex2: Logistic Regression (Python)

Logistic regression is a widely used classification algorithm. Below is a step-by-step implementation, in two parts: plain logistic regression, and logistic regression with L2 regularization.

1. Plain Logistic Regression

1.1 Data Visualization

First, import the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Read the data:

path = 'data/ex2data1.txt'
data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])

Plot a scatter chart of the two classes:

positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
plt.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
plt.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
plt.legend()
plt.xlabel('Exam 1 Score')
plt.ylabel('Exam 2 Score')
plt.show()

1.2 Model Implementation

Define the sigmoid function:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
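
As a quick sanity check (a small illustrative snippet, not part of the original), sigmoid(0) should return exactly 0.5, and the output should saturate toward 0 and 1 for large negative and positive inputs:

print(sigmoid(0))                       # 0.5
print(sigmoid(np.array([-10, 0, 10])))  # approx. [4.54e-05, 0.5, 1.0]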

Define the cost function:

def cost(theta, X, y):
    first = (-y) * np.log(sigmoid(X @ theta))
    second = (1 - y) * np.log(1 - sigmoid(X @ theta))
    return np.mean(first - second)
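
For reference, this code vectorizes the cross-entropy cost from the exercise:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big], \qquad h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$$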

Define the gradient function:

def gradient(theta, X, y):
    return (X.T @ (sigmoid(X @ theta) - y)) / len(X)
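
This is the vectorized form of the partial derivatives of $J(\theta)$; it has the same shape as linear regression's gradient, only with the sigmoid hypothesis:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$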

Optimize the parameters:
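
Before calling the optimizer, X and y must be assembled, a step the original never shows; a minimal sketch, assuming the usual convention of prepending a column of ones for the intercept term:

data.insert(0, 'Ones', 1)       # intercept column of ones
X = data.iloc[:, :-1].values    # Ones, Exam 1, Exam 2
y = data.iloc[:, -1].values     # Admitted (0/1)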

import scipy.optimize as opt
result = opt.fmin_tnc(func=cost, x0=np.zeros(X.shape[1]), fprime=gradient, args=(X, y))
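
To confirm convergence, evaluate the cost at the returned theta; for this dataset it should land near 0.203, the value reported in the original exercise (verify on your own run):

print(cost(result[0], X, y))  # expected ~0.2035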

Define the prediction function:

def predict(theta, X):
    probability = sigmoid(X @ theta)
    return [1 if x >= 0.5 else 0 for x in probability]

Check the training accuracy:

theta_min = result[0]  # fmin_tnc returns (theta, n_function_evals, status); np.matrix would break X @ theta
predictions = predict(theta_min, X)
correct = [1 if a == b else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(correct) / len(correct)
print(f'accuracy = {accuracy:.0%}')

1.3 Decision Boundary

The boundary is the set of points where θᵀx = 0; solving θ₀ + θ₁x₁ + θ₂x₂ = 0 for x₂ gives the line plotted below:

x1 = np.linspace(30, 100, 100)
x2 = -(result[0][0] + x1 * result[0][1]) / result[0][2]
plt.plot(x1, x2, 'y', label='Prediction')
plt.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
plt.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
plt.legend()
plt.xlabel('Exam 1 Score')
plt.ylabel('Exam 2 Score')
plt.show()

2. Logistic Regression with L2 Regularization

2.1 Feature Mapping

Define the feature-mapping function, which expands the two raw features into every polynomial term x1^(i-p) * x2^p up to the given total degree:

def feature_mapping(x1, x2, power, as_ndarray=False):
    features = {}
    for i in range(power + 1):
        for p in range(i + 1):
            # term x1^(i-p) * x2^p, for every total degree i up to `power`
            features[f'{i-p}{p}'] = np.power(x1, i - p) * np.power(x2, p)
    if as_ndarray:
        return np.array(pd.DataFrame(features))
    else:
        return pd.DataFrame(features)
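
The mapping produces one column per pair (i-p, p), i.e. (power+1)(power+2)/2 columns in total, bias term included. A quick check with hypothetical inputs:

demo = feature_mapping(np.array([1.0]), np.array([2.0]), power=6)
assert demo.shape[1] == 28  # (6+1)(6+2)/2 = 28 polynomial terms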

Read the second dataset and map its features:

# the read step was missing; column names here are an assumption consistent with the plots below
data = pd.read_csv('data/ex2data2.txt', header=None, names=['Microchip 1', 'Microchip 2', 'Accepted'])
x1 = np.array(data['Microchip 1'])
x2 = np.array(data['Microchip 2'])
data2 = feature_mapping(x1, x2, power=6)
print(data2.shape)
print(data2.head())

2.2 Model Implementation

Define the regularized cost function; theta[1:] excludes the bias term from the penalty:

def regularized_cost(theta, X, y, l=1):
    return cost(theta, X, y) + (l / (2 * len(X))) * (theta[1:] @ theta[1:])
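
This matches the regularized cost from the exercise; the penalty sums over $\theta_1,\dots,\theta_n$ and deliberately skips the bias term $\theta_0$:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$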

Define the regularized gradient:

def regularized_gradient(theta, X, y, l=1):
    reg = (l / len(X)) * theta  # the original used 1 instead of l here
    reg[0] = 0                  # do not penalize the bias term
    return gradient(theta, X, y) + reg

Optimize the parameters with a conjugate-gradient solver:
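
As in Part 1, X and y must be built first. The mapped features already include the bias column (the i = 0 term of the expansion), so no extra column of ones is needed; a minimal sketch, assuming the label column is named 'Accepted' as above:

X = feature_mapping(x1, x2, power=6, as_ndarray=True)  # 118 x 28, bias column included
y = np.array(data['Accepted'])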

result2 = opt.minimize(fun=regularized_cost, x0=np.zeros(X.shape[1]), args=(X, y), method='CG', jac=regularized_gradient)

The prediction function is the same as in Part 1:

def predict(theta, X):
    probability = sigmoid(X @ theta)
    return [1 if x >= 0.5 else 0 for x in probability]

Check accuracy with a classification report:

from sklearn.metrics import classification_report

final_theta = result2.x
y_predict = predict(final_theta, X)
print(classification_report(y, y_predict))

2.3 Decision Boundary

Evaluate θᵀx on a grid of mapped points and draw the contour where it equals zero:

x = np.linspace(-1, 1.5, 50)
xx, yy = np.meshgrid(x, x)
# map every grid point through the same degree-6 expansion, then evaluate theta' x
# (the original evaluated data2 @ final_theta, whose shape does not match the grid)
z = feature_mapping(xx.ravel(), yy.ravel(), power=6, as_ndarray=True) @ final_theta
z = z.reshape(xx.shape)
plt.contour(xx, yy, z, levels=[0], colors='black')  # decision boundary: theta' x = 0
plt.ylim(-.8, 1.2)
positive = data[data['Accepted'].isin([1])]
negative = data[data['Accepted'].isin([0])]
plt.scatter(positive['Microchip 1'], positive['Microchip 2'], s=50, c='b', marker='o', label='Accepted')
plt.scatter(negative['Microchip 1'], negative['Microchip 2'], s=50, c='r', marker='x', label='Rejected')
plt.legend()
plt.xlabel('Microchip 1 Score')
plt.ylabel('Microchip 2 Score')
plt.show()

With these steps complete, we have implemented logistic regression both with and without L2 regularization and can compare how the two behave.
