Decision Boundaries of Different Machine Learning Models (with Code) (Part 2)


(plt1)/(plt2 + plt3)
[Figure: plt1 stacked above (plt2 + plt3)]

Alternatively, we can rearrange the plots into whatever layout we want and compose them with:
(plt1 + plt2) / (plt5 + plt6)
[Figure: (plt1 + plt2) stacked above (plt5 + plt6)]

I think this looks pretty good.
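For readers new to patchwork, its layout algebra can be tried on its own. This is a minimal, self-contained sketch using iris; p_a, p_b, and p_c are hypothetical stand-ins, not the plt1–plt6 objects above:

```r
library(ggplot2)
library(patchwork)

# Three throwaway plots to arrange
p_a <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) + geom_point()
p_b <- ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) + geom_point()
p_c <- ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot()

# `+` places plots side by side in one row; `/` stacks rows
(p_a + p_b) / p_c
```

The same two operators produce every layout shown in this article.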
Objective
My goal is to build a classification algorithm that distinguishes between the two plant species, and then compute the decision boundary in order to better understand how each model arrives at its predictions. To create a decision boundary plot for every pair of variables, we need the distinct combinations of the variables in the data.
# var_combos holds every ordered pair of the four feature columns,
# with self-pairs (Var1 == Var2) removed
var_combos <- expand.grid(colnames(df[, 1:4]), colnames(df[, 1:4]),
                          stringsAsFactors = FALSE) %>%
  filter(!Var1 == Var2)

var_combos %>%
  head() %>%
  kable(caption = 'Variable Combinations', escape = F, digits = 2) %>%
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'),
                font_size = 9, fixed_thead = T, full_width = F) %>%
  scroll_box(width = '100%', height = '200px')
[Table: the first rows of var_combos]

Next, I will use the variable combinations above to create lists (one per combination) and fill each with synthetic data spanning the minimum to the maximum of each variable in the pair. This will serve as our synthetic test data: we predict on it and use the predictions to draw the decision boundaries.
Note that the final plots are two-dimensional, so each machine learning model is trained on only two variables at a time; for every combination, these are the first two variables of the corresponding element of boundary_lists.
boundary_lists <- map2(
  var_combos$Var1, var_combos$Var2,
  ~ df %>%
    select(all_of(c(.x, .y))) %>%
    summarise(
      minX = min(.[[1]], na.rm = TRUE),
      maxX = max(.[[1]], na.rm = TRUE),
      minY = min(.[[2]], na.rm = TRUE),
      maxY = max(.[[2]], na.rm = TRUE)
    )
) %>%
  map(
    ~ tibble(
      x = seq(.x$minX, .x$maxX, length.out = 200),
      y = seq(.x$minY, .x$maxY, length.out = 200)
    )
  ) %>%
  map(
    ~ tibble(
      xx = rep(.x$x, each = 200),
      yy = rep(.x$y, times = 200)
    )
  ) %>%
  map2(., asplit(var_combos, 1), ~ .x %>% set_names(.y))

We can take a look at the first four observations of the first two (and last two) lists:
boundary_lists %>%
  map(., ~head(., 4)) %>%
  head(2)

## [[1]]
## # A tibble: 4 x 2
##   Sepal.Width Sepal.Length
##         <dbl>        <dbl>
## 1           2         4.3
## 2           2         4.31
## 3           2         4.33
## 4           2         4.34
##
## [[2]]
## # A tibble: 4 x 2
##   Petal.Length Sepal.Length
##          <dbl>        <dbl>
## 1            1         4.3
## 2            1         4.31
## 3            1         4.33
## 4            1         4.34

boundary_lists %>%
  map(., ~head(., 4)) %>%
  tail(2)

## [[1]]
## # A tibble: 4 x 2
##   Sepal.Width Petal.Width
##         <dbl>       <dbl>
## 1           2       0.1
## 2           2       0.109
## 3           2       0.117
## 4           2       0.126
##
## [[2]]
## # A tibble: 4 x 2
##   Petal.Length Petal.Width
##          <dbl>       <dbl>
## 1            1       0.1
## 2            1       0.109
## 3            1       0.117
## 4            1       0.126

Training time
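To see the grid construction in isolation, here is the same min-to-max expansion for a single, illustrative pair of variables (using iris directly rather than the article's df):

```r
library(dplyr)

# Observed range of one variable pair
rng <- iris %>%
  summarise(minX = min(Sepal.Width,  na.rm = TRUE),
            maxX = max(Sepal.Width,  na.rm = TRUE),
            minY = min(Sepal.Length, na.rm = TRUE),
            maxY = max(Sepal.Length, na.rm = TRUE))

# 200 evenly spaced values per axis, crossed into a 200 x 200 lattice:
# xx repeats each value 200 times, yy cycles through its sequence
grid <- tibble(
  xx = rep(seq(rng$minX, rng$maxX, length.out = 200), each  = 200),
  yy = rep(seq(rng$minY, rng$maxY, length.out = 200), times = 200)
)

nrow(grid)  # 40,000 synthetic points covering the observed range
```

Every element of boundary_lists is a grid of exactly this shape, one per variable combination.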
Now that we have built the synthetic test data, I want to train the models on the actually observed data points. I will train each of the following models on every data point shown in the plots above:

  • Logistic regression
  • Support vector machine with a linear kernel
  • Support vector machine with a polynomial kernel
  • Support vector machine with a radial kernel
  • Support vector machine with a sigmoid kernel
  • Random forest
  • XGBoost with default parameters
  • A single-layer Keras neural network (with a linear component)
  • A deeper Keras neural network (with a linear component)
  • An even deeper Keras neural network (with a linear component)
  • LightGBM with default parameters
Side note: I am no expert in deep learning / Keras / TensorFlow, so I am sure better architectures would produce better decision boundaries, but training all these different models with purrr and map turned out to be a lot of fun.
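Before the full purrr pipeline below, it may help to see the end-to-end idea for a single model: fit one classifier on two features of a two-class subset of iris and score a synthetic grid. The subset, feature pair, and plot here are illustrative choices, not the article's exact setup:

```r
library(dplyr)
library(ggplot2)
library(e1071)

# Drop one species so the problem is binary, as in the article
two_class <- iris %>% filter(Species != 'setosa') %>% droplevels()

# One model, two features
fit <- e1071::svm(Species ~ Sepal.Width + Sepal.Length, data = two_class,
                  type = 'C-classification', kernel = 'linear')

# Synthetic grid over the observed range of both features
grid <- expand.grid(
  Sepal.Width  = seq(min(two_class$Sepal.Width),  max(two_class$Sepal.Width),  length.out = 200),
  Sepal.Length = seq(min(two_class$Sepal.Length), max(two_class$Sepal.Length), length.out = 200)
)
grid$pred <- predict(fit, newdata = grid)

# Tiled predictions show the decision regions; points are the training data
ggplot() +
  geom_tile(data = grid, aes(Sepal.Width, Sepal.Length, fill = pred), alpha = 0.3) +
  geom_point(data = two_class, aes(Sepal.Width, Sepal.Length, colour = Species))
```

The pipeline that follows does exactly this, but for every model and every variable combination at once.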
###############################################################################

# params_lightGBM <- ...   (the LightGBM parameter list is only referenced,
#                           commented out, inside the pipeline below)

models_list <- var_combos %>%
  mutate(modeln = str_c('mod', row_number())) %>%
  pmap(~{
    xname <- ..1
    yname <- ..2
    modelname <- ..3
    df %>%
      select(Species, xname, yname) %>%
      group_by(grp = 'grp') %>%
      nest() %>%
      mutate(models = map(data, ~{
        list(
          # Logistic model
          Model_GLM = {
            glm(Species ~ ., data = .x, family = binomial(link = 'logit'))
          },
          # Support vector machine (linear kernel)
          Model_SVM_Linear = {
            e1071::svm(Species ~ ., data = .x,
                       type = 'C-classification', kernel = 'linear')
          },
          # Support vector machine (polynomial kernel)
          Model_SVM_Polynomial = {
            e1071::svm(Species ~ ., data = .x,
                       type = 'C-classification', kernel = 'polynomial')
          },
          # Support vector machine (sigmoid kernel)
          Model_SVM_radial = {
            e1071::svm(Species ~ ., data = .x,
                       type = 'C-classification', kernel = 'sigmoid')
          },
          # Support vector machine (radial kernel)
          Model_SVM_radial_Sigmoid = {
            e1071::svm(Species ~ ., data = .x,
                       type = 'C-classification', kernel = 'radial')
          },
          # Random forest
          Model_RF = {
            randomForest::randomForest(formula = as.factor(Species) ~ ., data = .x)
          },
          # Extreme gradient boosting
          Model_XGB = {
            xgboost(
              objective = 'binary:logistic',
              eval_metric = 'auc',
              data = as.matrix(.x[, 2:3]),
              label = as.matrix(.x$Species),  # binary variable
              nrounds = 10
            )
          },
          # Keras neural network
          Model_Keras = {
            mod <- keras_model_sequential() %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'sigmoid')
            mod %>% compile(
              loss = 'binary_crossentropy',
              optimizer = optimizer_sgd(lr = 0.01, momentum = 0.9),
              metrics = c('accuracy')
            )
            fit(mod,
                x = as.matrix(.x[, 2:3]),
                y = to_categorical(.x$Species, 2),
                epochs = 5,
                batch_size = 5,
                validation_split = 0)
            print(modelname)
            assign(modelname, mod)
          },
          # Deeper Keras neural network
          Model_Keras_2 = {
            mod <- keras_model_sequential() %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'linear', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'sigmoid')
            mod %>% compile(
              loss = 'binary_crossentropy',
              optimizer = optimizer_sgd(lr = 0.01, momentum = 0.9),
              metrics = c('accuracy')
            )
            fit(mod,
                x = as.matrix(.x[, 2:3]),
                y = to_categorical(.x$Species, 2),
                epochs = 5,
                batch_size = 5,
                validation_split = 0)
            print(modelname)
            assign(modelname, mod)
          },
          # Even deeper Keras neural network
          Model_Keras_3 = {
            mod <- keras_model_sequential() %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'linear', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'sigmoid')
            mod %>% compile(
              loss = 'binary_crossentropy',
              optimizer = optimizer_sgd(lr = 0.01, momentum = 0.9),
              metrics = c('accuracy')
            )
            fit(mod,
                x = as.matrix(.x[, 2:3]),
                y = to_categorical(.x$Species, 2),
                epochs = 5,
                batch_size = 5,
                validation_split = 0)
            print(modelname)
            assign(modelname, mod)
          },
          # LightGBM model
          Model_LightGBM = {
            lgb.train(
              data = lgb.Dataset(data = as.matrix(.x[, 2:3]), label = .x$Species),
              objective = 'binary',
              metric = 'auc',
              min_data = 1
              #params = params_lightGBM,
              #learning_rate = 0.1
            )
          }
        )
      }))
  }) %>%
  map(., ~unlist(., recursive = FALSE))

Calibrating the data

