Build Your Own Self-Taught AlphaGo Zero (Part 3)
def train():
    criterion = AlphaLoss()
    dataset = SelfPlayDataset()
    player, checkpoint = load_player(current_time, loaded_version)
    optimizer = create_optimizer(player, lr,
                                 param=checkpoint['optimizer'])
    best_player = deepcopy(player)
    dataloader = DataLoader(dataset, collate_fn=collate_fn,
                            batch_size=BATCH_SIZE, shuffle=True)
    total_ite = 0  # count of training iterations so far

    while True:
        for batch_idx, (state, move, winner) in enumerate(dataloader):

            ## Evaluate a copy of the current network
            if total_ite % TRAIN_STEPS == 0:
                pending_player = deepcopy(player)
                result = evaluate(pending_player, best_player)

                if result:
                    best_player = pending_player

            example = {
                'state': state,
                'winner': winner,
                'move': move
            }
            optimizer.zero_grad()
            winner, probas = pending_player.predict(example['state'])

            loss = criterion(winner, example['winner'],
                             probas, example['move'])
            loss.backward()
            optimizer.step()
            total_ite += 1

            ## Fetch new games
            if total_ite % REFRESH_TICK == 0:
                last_id = fetch_new_games(collection, dataset, last_id)
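train() hands a custom collate_fn to the DataLoader. The tutorial defines it elsewhere; as a rough sketch (hypothetical, assuming each dataset sample is a (state, move, winner) tuple of arrays), it just has to stack the samples into batched arrays:

```python
import numpy as np

def collate_fn(batch):
    """Stack (state, move, winner) samples into batched arrays.
    Hypothetical sketch; the tutorial's real version works on tensors."""
    states, moves, winners = zip(*batch)
    return np.stack(states), np.stack(moves), np.asarray(winners)

# Two toy samples: a 2-plane 3x3 board state, a 10-way move
# distribution, and a scalar game outcome.
batch = [(np.zeros((2, 3, 3)), np.full(10, 0.1), 1.0),
         (np.ones((2, 3, 3)), np.full(10, 0.1), -1.0)]
states, moves, winners = collate_fn(batch)
print(states.shape, moves.shape, winners.shape)  # (2, 2, 3, 3) (2, 10) (2,)
```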
The loss function used for training is defined as follows:
class AlphaLoss(torch.nn.Module):
    def __init__(self):
        super(AlphaLoss, self).__init__()

    def forward(self, pred_winner, winner, pred_probas, probas):
        value_error = (winner - pred_winner) ** 2
        policy_error = torch.sum((-probas *
                                  (1e-6 + pred_probas).log()), 1)
        total_error = (value_error.view(-1) + policy_error).mean()
        return total_error
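The loss is the squared value error plus the cross-entropy between the search probabilities and the network's policy. Here is a minimal plain-Python sketch of that arithmetic for a single example (no PyTorch; the numbers are made up for illustration):

```python
import math

def alpha_loss(pred_winner, winner, pred_probas, probas):
    """Single-example version of AlphaLoss:
    squared value error plus policy cross-entropy."""
    value_error = (winner - pred_winner) ** 2
    policy_error = -sum(p * math.log(1e-6 + q)
                        for p, q in zip(probas, pred_probas))
    return value_error + policy_error

# Toy example: true outcome +1, predicted value 0.8, and a
# 3-move search distribution vs. the network's predicted policy.
loss = alpha_loss(0.8, 1.0,
                  pred_probas=[0.7, 0.2, 0.1],
                  probas=[0.6, 0.3, 0.1])
print(round(loss, 4))
```

The 1e-6 term mirrors the PyTorch version: it keeps the logarithm finite when the network assigns a move probability of exactly zero.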
The third step is evaluation: check whether the newly trained agent has become stronger than the agent currently generating data (the stronger of the two goes back to step one and keeps generating data).
def evaluate(player, new_player):
    results = play(player, opponent=new_player)
    black_wins = 0
    white_wins = 0

    for result in results:
        if result[0] == 1:
            white_wins += 1
        elif result[0] == 0:
            black_wins += 1

    ## Check if the trained player (black) is better than
    ## the current best player depending on the threshold
    if black_wins >= EVAL_THRESH * len(results):
        return True
    return False
This third part matters a great deal: only by repeatedly selecting the best network can the system keep generating high-quality data, and it is that data that raises the AI's playing strength.
The three stages repeat in a cycle, and that is how a strong player is trained.
If you are interested in Go AI, you can try out this PyTorch implementation yourself.
This article is adapted from 量子位 (QbitAI); the original author is Dylan Djian.
Code implementation: [web link]
Original tutorial: [web link]
AlphaGo Zero paper: [web link]