统计建模与R软件第六章习题…
发布日期:2021-10-16 07:12:21 浏览次数:21 分类:技术文章

本文共 16559 字,大约阅读时间需要 55 分钟。

原文地址:
作者:
Ex6.1

(1)

> x <- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0, 6.4)

> y <- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273, 3113,2493)

> plot(x,y)


由此判断,Y和X有线性关系。

(2)

> lm.sol<-lm(y~1+x)

> summary(lm.sol)


Call:

lm(formula = y ~ 1 + x)


Residuals:

     Min       1Q   Median       3Q      Max

-128.591  -70.978   -3.727   49.263  167.228


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)   140.95     125.11   1.127    0.293   

            364.18      19.26  18.908 6.33e-08 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 96.42 on 8 degrees of freedom

Multiple R-squared: 0.9781,     Adjusted R-squared: 0.9754

F-statistic: 357.5 on 1 and 8 DF,  p-value: 6.33e-08

回归方程为 Y=140.95+364.18X

(3)

β1项很显著,但常数项β0不显著。

回归方程很显著。

(4)

> new <- data.frame(x=7)

> lm.pred<-predict(lm.sol,new,interval="prediction")

> lm.pred

       fit      lwr      upr

1 2690.227 2454.971 2925.484

故Y(7)= 2690.227, [2454.971,2925.484]


Ex6.2

(1)

>pho<-data.frame(x1 <- c(0.4,0.4,3.1,0.6,4.7,1.7,9.4,10.1,11.6,12.6,10.9,23.1,23.1,21.6,23.1,1.9,26.8,29.9), x2 <- c(52,34,19,34,24,65,44,31,29,58,37,46,50,44,56,36,58,51), x3 <- c(158,163,37,157,59,123,46,117,173,112,111,114,134,73,168,143,202,124), y <- c(64,60,71,61,54,77,81,93,93,51,76,96,77,93,95,54,168,99))

> lm.sol<-lm(y~x1+x2+x3,data=pho)

> summary(lm.sol)


Call:

lm(formula = y ~ x1 + x2 + x3, data = pho)


Residuals:

    Min      1Q  Median      3Q     Max

-27.575 -11.160  -2.799  11.574  48.808


Coefficients:

            Estimate Std. Error t value Pr(>|t|)  

(Intercept)  44.9290    18.3408   2.450  0.02806 *

x1            1.8033     0.5290   3.409  0.00424 **

x2           -0.1337     0.4440  -0.301  0.76771  

x3            0.1668     0.1141   1.462  0.16573  

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 19.93 on 14 degrees of freedom

Multiple R-squared: 0.551,      Adjusted R-squared: 0.4547

F-statistic: 5.726 on 3 and 14 DF,  p-value: 0.009004

回归方程为 y=44.9290+1.8033x1-0.1337x2+0.1668x3

(2)

回归方程显著,但有些回归系数不显著。

(3)

> lm.step<-step(lm.sol)

Start:  AIC=111.2

y ~ x1 + x2 + x3


       Df Sum of Sq     RSS     AIC

- x2        36.0  5599.4   109.3

<none>               5563.4   111.2

- x3       849.8  6413.1   111.8

- x1      4617.8 10181.2   120.1


Step:  AIC=109.32

y ~ x1 + x3


       Df Sum of Sq     RSS     AIC

<none>               5599.4   109.3

- x3       833.2  6432.6   109.8

- x1      5169.5 10768.9   119.1


> summary(lm.step)


Call:

lm(formula = y ~ x1 + x3, data = pho)


Residuals:

    Min      1Q  Median      3Q     Max

-29.713 -11.324  -2.953  11.286  48.679


Coefficients:

            Estimate Std. Error t value Pr(>|t|)  

(Intercept)  41.4794    13.8834   2.988  0.00920 **

x1            1.7374     0.4669   3.721  0.00205 **

x3            0.1548     0.1036   1.494  0.15592  

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 19.32 on 15 degrees of freedom

Multiple R-squared: 0.5481,     Adjusted R-squared: 0.4878

F-statistic: 9.095 on 2 and 15 DF,  p-value: 0.002589

x3仍不够显著。

再用drop1函数做逐步回归。

> drop1(lm.step)

Single term deletions


Model:

y ~ x1 + x3

       Df Sum of Sq     RSS     AIC

<none>               5599.4   109.3

x1        5169.5 10768.9   119.1

x3         833.2  6432.6   109.8

可以考虑再去掉x3.

> lm.opt<-lm(y~x1,data=pho);summary(lm.opt)


Call:

lm(formula = y ~ x1, data = pho)


Residuals:

    Min      1Q  Median      3Q     Max

-31.486  -8.282  -1.674   5.623  59.337


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  59.2590     7.4200   7.986 5.67e-07 ***

x1            1.8434     0.4789   3.849  0.00142 **

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 20.05 on 16 degrees of freedom

Multiple R-squared: 0.4808,     Adjusted R-squared: 0.4484

F-statistic: 14.82 on 1 and 16 DF,  p-value: 0.001417

皆显著。


Ex6.3

> x<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,6,6,6,7,7,7,8,8,8,9,11,12,12,12)

> y<-c(0.6,1.6,0.5,1.2,2.0,1.3,2.5,2.2,2.4,1.2,3.5,4.1,5.1,5.7,3.4,9.7,8.6,4.0,5.5,10.5,17.5,13.4,4.5,30.4,12.4,13.4,26.2,7.4)

> plot(x,y)

> lm.sol<-lm(y~1+x)

> summary(lm.sol)


Call:

lm(formula = y ~ 1 + x)


Residuals:

    Min      1Q  Median      3Q     Max

-9.8413 -2.3369 -0.0214  1.0592 17.8320


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  -1.4519     1.8353  -0.791    0.436   

            1.5578     0.2807   5.549 7.93e-06 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 5.168 on 26 degrees of freedom

Multiple R-squared: 0.5422,     Adjusted R-squared: 0.5246

F-statistic:  30.8 on 1 and 26 DF,  p-value: 7.931e-06

线性回归方程为 y=-1.4519+1.5578x,通过F 检验。 常数项参数未通过t 检验。

> abline(lm.sol)


> y.yes<-resid(lm.sol)

> y.fit<-predict(lm.sol)

> y.rst<-rstandard(lm.sol)

> plot(y.yes~y.fit)



> plot(y.rst~y.fit)



残差并非是等方差的。

修正模型,对相应变量Y做开方。

> lm.new<-update(lm.sol,sqrt(.)~.)

> summary(lm.new)


Call:

lm(formula = sqrt(y) ~ x)


Residuals:

     Min       1Q   Median       3Q      Max

-1.54255 -0.45280 -0.01177  0.34925  2.12486


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  0.76650    0.25592   2.995  0.00596 **

           0.29136    0.03914   7.444 6.64e-08 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.7206 on 26 degrees of freedom

Multiple R-squared: 0.6806,     Adjusted R-squared: 0.6684

F-statistic: 55.41 on 1 and 26 DF,  p-value: 6.645e-08

此时所有参数和方程均通过检验。

对新模型做标准化残差图,情况有所改善,不过还是存在一个离群值。第24和第28个值存在问题。



Ex6.4

> toothpaste<-data.frame( X1=c(-0.05, 0.25,0.60,0,0.20, 0.15,-0.15, 0.15,0.10,0.40,0.45,0.35,0.30, 0.50,0.50,0.40,-0.05,-0.05,-0.10,0.20,0.10,0.50,0.60,-0.05,0,0.05,0.55),X2=c(5.50,6.75,7.25,5.50,6.50,6.75,5.25,6.00,6.25,7.00,6.90,6.80,6.80,7.10,7.00,6.80,6.50,6.25,6.00,6.50,7.00,6.80,6.80,6.50,5.75,5.80,6.80),Y=c(7.38,8.51,9.52,7.50,8.28,8.75,7.10,8.00,8.15,9.10,8.86,8.90,8.87,9.26,9.00,8.75,7.95, 7.65,7.27,8.00,8.50,8.75,9.21,8.27,7.67,7.93,9.26))

> lm.sol<-lm(Y~X1+X2,data=toothpaste); summary(lm.sol)


Call:

lm(formula = Y ~ X1 + X2, data = toothpaste)


Residuals:

     Min       1Q   Median       3Q      Max

-0.37130 -0.10114  0.03066  0.10016  0.30162


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)   4.0759     0.6267   6.504 1.00e-06 ***

X1            1.5276     0.2354   6.489 1.04e-06 ***

X2            0.6138     0.1027   5.974 3.63e-06 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.1767 on 24 degrees of freedom

Multiple R-squared: 0.9378,     Adjusted R-squared: 0.9327

F-statistic:   181 on 2 and 24 DF,  p-value: 3.33e-15

回归诊断:

> influence.measures(lm.sol)

Influence measures of

         lm(formula = Y ~ X1 + X2, data = toothpaste) :


     dfb.1_   dfb.X1   dfb.X2   dffit cov.r   cook.d    hat inf

  0.00908  0.00260 -0.00847  0.0121 1.366 5.11e-05 0.1681   

  0.06277  0.04467 -0.06785 -0.1244 1.159 5.32e-03 0.0537   

-0.02809  0.07724  0.02540  0.1858 1.283 1.19e-02 0.1386   

  0.11688  0.05055 -0.11078  0.1404 1.377 6.83e-03 0.1843   *

  0.01167  0.01887 -0.01766 -0.1037 1.141 3.69e-03 0.0384   

-0.43010 -0.42881  0.45774  0.6061 0.814 1.11e-01 0.0936   

  0.07840  0.01534 -0.07284  0.1082 1.481 4.07e-03 0.2364   *

  0.01577  0.00913 -0.01485  0.0208 1.237 1.50e-04 0.0823   

  0.01127 -0.02714 -0.00364  0.1071 1.156 3.95e-03 0.0466   

10 -0.07830  0.00171  0.08052  0.1890 1.155 1.22e-02 0.0726   

11  0.00301 -0.09652 -0.00365 -0.2281 1.127 1.76e-02 0.0735   

12 -0.03114  0.01848  0.03459  0.1542 1.132 8.12e-03 0.0514   

13 -0.09236 -0.03801  0.09940  0.2201 1.071 1.62e-02 0.0522   

14 -0.02650  0.03434  0.02606  0.1179 1.235 4.81e-03 0.0956   

15  0.00968 -0.11445 -0.00857 -0.2545 1.150 2.19e-02 0.0910   

16 -0.00285 -0.06185  0.00098 -0.1608 1.146 8.83e-03 0.0594   

17  0.07201  0.09744 -0.07796 -0.1099 1.364 4.19e-03 0.1731   

18  0.15132  0.30204 -0.17755 -0.3907 1.087 5.04e-02 0.1085   

19  0.07489  0.47472 -0.12980 -0.7579 0.731 1.66e-01 0.1092   

20  0.05249  0.08484 -0.07940 -0.4660 0.625 6.11e-02 0.0384   *

21  0.07557  0.07284 -0.07861 -0.0880 1.471 2.69e-03 0.2304   *

22 -0.17959 -0.39016  0.18241 -0.5494 0.912 9.41e-02 0.1022   

23  0.06026  0.10607 -0.06207  0.1251 1.374 5.42e-03 0.1804   

24 -0.54830 -0.74197  0.59358  0.8371 0.914 2.13e-01 0.1731   

25  0.08541  0.01624 -0.07775  0.1314 1.249 5.97e-03 0.1069   

26  0.32556  0.11734 -0.30200  0.4480 1.018 6.49e-02 0.1033   

27  0.17243  0.32754 -0.17676  0.4127 1.148 5.66e-02 0.1369   

> source("Reg_Diag.R");Reg_Diag(lm.sol) #薛毅老师自己写的程序

      residual s1    standard s2     student s3 hat_matrix s4      DFFITS s5

  0.00443843     0.02753865     0.02695925    0.16811819     0.01211949  

-0.09114255    -0.53021138    -0.52211469    0.05369239    -0.12436727  

  0.07726887     0.47112863     0.46335666    0.13857353     0.18584310  

  0.04805665     0.30111062     0.29532912    0.18427663     0.14036860  

-0.09130271    -0.52689847    -0.51881406    0.03838430    -0.10365442  

  0.30162101     1.79287913     1.88596579    0.09362223     0.60613406  

  0.03066005     0.19855842     0.19453763    0.23641540  0.10824626  

  0.01199519     0.07085860     0.06937393    0.08226537     0.02077047  

  0.08491891     0.49217591     0.48426323    0.04664158     0.10711246  

10  0.11625405     0.68315814     0.67537315    0.07261134     0.18897969  

11 -0.13874451    -0.81570765    -0.80983786    0.07348894    -0.22807820  

12  0.11540228     0.67051940     0.66263761    0.05137589     0.15420864  

13  0.16178406     0.94041623     0.93806144    0.05219432     0.22013204  

14  0.06210727     0.36957277     0.36282531    0.09557411     0.11794546  

15 -0.13650951    -0.81026658    -0.80428349    0.09101221    -0.25449541  

16 -0.11097950    -0.64757782    -0.63955524    0.05943308    -0.16076716  

17 -0.03939381    -0.24515626    -0.24029557    0.17309048    -0.10993940  

18 -0.18593575    -1.11438446    -1.12029013    0.10845395    -0.39073410  

19 -0.33609591    -2.01522068  * -2.16439284  * 0.10922236    -0.75789180  *

20 -0.37130271  * -2.14274943  * -2.33258738  * 0.03838430    -0.46603012  

21 -0.02545527    -0.16420856    -0.16084153    0.23042354  * -0.08801075  

22 -0.26374306    -1.57517595    -1.62848498    0.10217431    -0.54936198  

23  0.04349338     0.27187605     0.26656251    0.18041800     0.12506702  

24  0.28060619     1.74627363     1.82969510    0.17309048     0.83711731  *

25  0.06459859     0.38683016     0.37987153    0.10691352     0.13143357  

26  0.21752520     1.29995371     1.31989945    0.10329116     0.44796770  

27  0.16987516     1.03474390     1.03633781    0.13685835     0.41266341  

   cooks_distance s6  COVRATIO s7

   5.108777e-05    1.3656752  

   5.316885e-03    1.1589547  

   1.190200e-02    1.2827036  

   6.827446e-03    1.3771332  

   3.693897e-03    1.1410104  

   1.106753e-01    0.8143179  

   4.068871e-03    1.4806452  *

   1.500251e-04    1.2372586  

   3.950358e-03    1.1560508  

10   1.218047e-02    1.1550557  

11   1.759216e-02    1.1271148  

12   8.116460e-03    1.1316638  

13   1.623390e-02    1.0710597  

14   4.811117e-03    1.2349272  

15   2.191171e-02    1.1501502  

16   8.832858e-03    1.1457602  

17   4.193532e-03    1.3637206  

18   5.035591e-02    1.0866343  

19   1.659840e-01    0.7313914  

20   6.109050e-02    0.6248838  

21   2.691197e-03    1.4714103  

22   9.412101e-02    0.9121786  

23   5.423856e-03    1.3735324  

24   2.127740e-01  * 0.9139942  

25   5.971157e-03    1.2485557  

26   6.488529e-02    1.0178195  

27   5.658922e-02    1.1479080  

为什么两种方法检测结果不一样呢...不继续了


Ex6.5

> cement<-data.frame(X1=c( 7,  1, 11, 11,  7, 11,  3,  1,  2, 21,  1, 11, 10),X2=c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),X3=c( 6, 15,  8,  8,  6,  9, 17, 22, 18,  4, 23,  9,  8),X4=c(60, 52, 20, 47, 33, 22,  6, 44, 22, 26, 34, 12, 12),Y =c(78.5, 74.3, 104.3,  87.6,  95.9, 109.2, 102.7, 72.5, 93.1,115.9,  83.8, 113.3, 109.4))

> xx<-cor(cement[1:4])

> kappa(xx,exact=T)

[1] 1376.881

大于1000,认为有严重的多重共线性。

> eigen(xx)

$values

[1] 2.235704035 1.576066070 0.186606149 0.001623746


$vectors

           [,1]       [,2]       [,3]      [,4]

[1,] -0.4759552  0.5089794  0.6755002 0.2410522

[2,] -0.5638702 -0.4139315 -0.3144204 0.6417561

[3,]  0.3940665 -0.6049691  0.6376911 0.2684661

[4,]  0.5479312  0.4512351 -0.1954210 0.6767340

即2.235704035X1+1.576066070X2+0.186606149X3+0.001623746X4=0

可以忽略X4项,可以看出X1,X2,X3存在共线性。

删去X3和X4后:

> xx<-cor(cement[1:2])

> kappa(xx,exact=T)

[1] 1.59262

共线性消失了。

如果去掉X3呢?

> cement1<-cbind(cement$X1,cement$X2,cement$X4)

> xx<-cor(cement1)

> kappa(xx,exact=T)

[1] 77.25113

如果去掉X4呢?

> xx<-cor(cement[1:3])

> kappa(xx,exact=T)

[1] 11.12112

看起来去掉X3和X4是合理的。


Ex6.6

>infection<-data.frame(x1<-c(0,1,0,0,0,1,1,1),x2<-c(0,0,1,0,1,0,1,1),x3<-c(0,0,0,1,1,1,0,1),success<-c(0,0,23,8,28,0,11,1),fail<-c(9,0,3,32,30,2,87,17))

> infection$Ymat<-cbind(infection$success,infection$fail)

> glm.sol<-glm(Ymat~x1+x2+x3,family=binomial,data=infection)

> summary(glm.sol)


Call:

glm(formula = Ymat ~ x1 + x2 + x3, family = binomial, data = infection)


Deviance Residuals:

                                                       

-2.56229   0.00000   1.49623   1.21563  -0.78520  -0.15231  -0.07162   0.26470 


Coefficients:

            Estimate Std. Error z value Pr(>|z|)   

(Intercept)  -0.8207     0.4947  -1.659   0.0971 . 

x1           -3.2544     0.4813  -6.761 1.37e-11 ***

x2            2.0299     0.4553   4.459 8.25e-06 ***

x3           -1.0720     0.4254  -2.520   0.0117 * 

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


(Dispersion parameter for binomial family taken to be 1)


    Null deviance: 83.491  on 6  degrees of freedom

Residual deviance: 10.997  on 3  degrees of freedom

AIC: 36.178


Number of Fisher Scoring iterations: 5

回归模型为

P=exp(-0.8207-3.2544x1+2.0299x2-1.0720x3)/(1+exp(-0.8207-3.2544x1+2.0299x2-1.0720x3))


Ex6.7

(1)

> x<-c(rep(0:6,rep(2,7)))

> y<-c(508.1,498.4,568.2,577.3,651.7,657.0,713.4,697.5,755.3,758.9,787.6,792.1,841.4,831.8)

> lm.sol<-lm(y~1+x)

> summary(lm.sol)


Call:

lm(formula = y ~ 1 + x)


Residuals:

    Min      1Q  Median      3Q     Max

-25.400 -11.484  -3.779  14.629  24.921


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  523.800      8.474   61.81  < 2e-16 ***

            54.893      2.350   23.36 2.26e-11 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 17.59 on 12 degrees of freedom

Multiple R-squared: 0.9785,     Adjusted R-squared: 0.9767

F-statistic: 545.5 on 1 and 12 DF,  p-value: 2.265e-11

线性回归模型为y=523.800+54.893x,通过t检验和F检验。

(2)

> lm.sol<-lm(y~1+x+I(x^2));summary(lm.sol)


Call:

lm(formula = y ~ 1 + x + I(x^2))


Residuals:

     Min       1Q   Median       3Q      Max

-10.6643  -5.6622  -0.4655   5.5000  10.6679


Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept) 502.5560     4.8500 103.619  < 2e-16 ***

           80.3857     3.7861  21.232 2.81e-10 ***

I(x^2)       -4.2488     0.6063  -7.008 2.25e-05 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 7.858 on 11 degrees of freedom

Multiple R-squared: 0.9961,     Adjusted R-squared: 0.9953

F-statistic:  1391 on 2 and 11 DF,  p-value: 5.948e-14

多项式回归模型为:

y=502.5560+80.3857x-4.2488x^2,通过t检验和F检验。

(3)作散点图和拟合曲线:

> plot(x,y)

> xfit<-seq(0,6,0.01)

> yfit<-predict(lm.sol,data.frame(x=xfit))

> lines(xfit,yfit)



Ex6.8

读入数据:

> cancer<-read.table("data",header=T)

> cancer

   x1 x2 x3 x4 x5 y

70 64  1 1

60 63  1 0

70 65 11  1 0

40 69 10  1 0

40 63 58  1 0

70 48  1 0

70 48 11  1 0

80 63  1 0

60 63 14  1 0

10 30 53  1 0

11 80 43 12  1 0

12 40 55  1 0

13 60 66 25  1 1

14 40 67 23  1 0

15 20 61 19  1 0

16 50 63  1 0

17 50 66 16  1 0

18 40 68 12  1 0

19 80 41 12  1 1

20 70 53  1 1

21 60 37 13  1 0

22 90 54 12  0 1

23 50 52  0 1

24 70 50  0 1

25 20 65 21  0 0

26 80 52 28  0 1

27 60 70 13  0 0

28 50 40 13  0 0

29 70 36 22  0 0

30 40 44 36  0 0

31 30 54  0 0

32 30 59 87  0 0

33 40 69  0 0

34 60 50 22  0 0

35 80 62  0 0

36 70 68 15  0 0

37 30 39  0 0

38 60 49 11  0 0

39 80 64 10  0 1

40 70 67 18  0 1

> glm.sol<-glm(y~x1+x2+x3+x4+x5,family=binomial,data=cancer);summary(glm.sol)


Call:

glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial,

    data = cancer)


Deviance Residuals:

     Min        1Q    Median        3Q       Max 

-1.71500  -0.66725  -0.22254   0.09936   2.23936 


Coefficients:

            Estimate Std. Error z value Pr(>|z|) 

(Intercept) -7.01140    4.47534  -1.567   0.1172 

x1           0.09994    0.04304   2.322   0.0202 *

x2           0.01415    0.04697   0.301   0.7631 

x3           0.01749    0.05458   0.320   0.7486 

x4          -1.08297    0.58721  -1.844   0.0651 .

x5          -0.61309    0.96066  -0.638   0.5233 

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


(Dispersion parameter for binomial family taken to be 1)


    Null deviance: 44.987  on 39  degrees of freedom

Residual deviance: 28.392  on 34  degrees of freedom

AIC: 40.392


Number of Fisher Scoring iterations: 6

有的系数并不显著。

下面做逐步回归:

> glm.new<-step(glm.sol)

Start:  AIC=40.39

y ~ x1 + x2 + x3 + x4 + x5


       Df Deviance    AIC

- x3     28.484 38.484

- x2     28.484 38.484

- x5     28.799 38.799

<none>      28.392 40.392

- x4     32.642 42.642

- x1     38.306 48.306


Step:  AIC=38.48

y ~ x1 + x2 + x4 + x5


       Df Deviance    AIC

- x2     28.564 36.564

- x5     28.993 36.993

<none>      28.484 38.484

- x4     32.705 40.705

- x1     38.478 46.478


Step:  AIC=36.56

y ~ x1 + x4 + x5


       Df Deviance    AIC

- x5     29.073 35.073

<none>      28.564 36.564

- x4     32.892 38.892

- x1     38.478 44.478


Step:  AIC=35.07

y ~ x1 + x4


       Df Deviance    AIC

<none>      29.073 35.073

- x4     33.535 37.535

- x1     39.131 43.131

只留下x1和x4两个变量。

> summary(glm.new)


Call:

glm(formula = y ~ x1 + x4, family = binomial, data = cancer)


Deviance Residuals:

    Min       1Q   Median       3Q      Max 

-1.4825  -0.6617  -0.1877   0.1227   2.2844 


Coefficients:

            Estimate Std. Error z value Pr(>|z|) 

(Intercept) -6.13755    2.73844  -2.241   0.0250 *

x1           0.09759    0.04079   2.393   0.0167 *

x4          -1.12524    0.60239  -1.868   0.0618 .

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


(Dispersion parameter for binomial family taken to be 1)


    Null deviance: 44.987  on 39  degrees of freedom

Residual deviance: 29.073  on 37  degrees of freedom

AIC: 35.073


Number of Fisher Scoring iterations: 6

回归方程为

P=exp(-6.13755+0.09759x1-1.12524x4)/(1+exp(-6.13755+0.09759x1-1.12524x4))

概率估计略。


Ex6.9

我表示不想做了...

我没弄明白nls()函数里所说的start即初始值怎么设置。好像可以随便设置,只要保证函数收敛即可??










转载地址:https://blog.csdn.net/sjpljr/article/details/70168875 如侵犯您的版权,请留言回复原文章的地址,我们会给您删除此文章,给您带来不便请您谅解!

上一篇:统计建模与R软件第七章习题…
下一篇:统计建模与R软件第五章习题…

发表评论

最新留言

做的很好,不错不错
[***.243.131.199]2024年04月20日 14时15分18秒