Reinforcement Learning: An Introduction (Sutton) - Documents and Code


The archive contains the English and Chinese editions of Reinforcement Learning: An Introduction, together with the accompanying course code.
Code snippet and file information
#######################################################################
# Copyright (C)                                                       #
# 2016 - 2018 Shangtong Zhang(zhangshangtong.cpp@gmail.com)           #
# 2016 Jan Hakenberg(jan.hakenberg@gmail.com)                         #
# 2016 Tian Jun(tianjun.cpp@gmail.com)                                #
# 2016 Kenta Shimada(hyperkentakun@gmail.com)                         #
# Permission given to modify the code as long as you keep this        #
# declaration at the top                                              #
#######################################################################

import numpy as np
import pickle

BOARD_ROWS = 3
BOARD_COLS = 3
BOARD_SIZE = BOARD_ROWS * BOARD_COLS

class State:
    def __init__(self):
        # the board is represented by an n * n array
        # 1 represents a chessman of the player who moves first
        # -1 represents a chessman of another player
        # 0 represents an empty position
        self.data = np.zeros((BOARD_ROWS, BOARD_COLS))
        self.winner = None
        self.hash_val = None
        self.end = None

    # compute the hash value for one state; it's unique
    def hash(self):
        if self.hash_val is None:
            self.hash_val = 0
            for i in self.data.reshape(BOARD_ROWS * BOARD_COLS):
                if i == -1:
                    i = 2
                self.hash_val = self.hash_val * 3 + i
        return int(self.hash_val)

    # check whether a player has won the game or it's a tie
    def is_end(self):
        if self.end is not None:
            return self.end
        results = []
        # check rows
        for i in range(0, BOARD_ROWS):
            results.append(np.sum(self.data[i, :]))
        # check columns
        for i in range(0, BOARD_COLS):
            results.append(np.sum(self.data[:, i]))

        # check diagonals
        results.append(0)
        for i in range(0, BOARD_ROWS):
            results[-1] += self.data[i, i]
        results.append(0)
        for i in range(0, BOARD_ROWS):
            results[-1] += self.data[i, BOARD_ROWS - 1 - i]

        for result in results:
            if result == 3:
                self.winner = 1
                self.end = True
                return self.end
            if result == -3:
                self.winner = -1
                self.end = True
                return self.end

        # whether it's a tie (every cell is occupied and nobody has won)
        occupied = np.sum(np.abs(self.data))
        if occupied == BOARD_ROWS * BOARD_COLS:
            self.winner = 0
            self.end = True
            return self.end

        # game is still going on
        self.end = False
        return self.end

    # @symbol: 1 or -1
    # put chessman symbol in position (i, j)
    def next_state(self, i, j, symbol):
        new_state = State()
        new_state.data = np.copy(self.data)
        new_state.data[i, j] = symbol
        return new_state

    # print the board
    def print(self):
        for i 
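
The preview above is cut off mid-method. As a rough illustration of the two ideas it covers (the base-3 state hash and the line-sum win check), here is a minimal standalone sketch. The function names `board_hash` and `winner_of` are hypothetical and do not appear in the original file, and the use of `np.trace` and `np.fliplr` for the diagonals is a substitution for the explicit loops in the excerpt, not the authors' code.

```python
import numpy as np

BOARD_ROWS = 3
BOARD_COLS = 3


def board_hash(data):
    """Base-3 encoding: read cells row by row, mapping 0 -> 0, 1 -> 1, -1 -> 2."""
    value = 0
    for cell in data.reshape(BOARD_ROWS * BOARD_COLS):
        digit = 2 if cell == -1 else int(cell)
        value = value * 3 + digit
    return value


def winner_of(data):
    """Return 1 or -1 if that player owns a full line, 0 for a tie, None otherwise."""
    sums = []
    sums.extend(np.sum(data, axis=1))        # row sums
    sums.extend(np.sum(data, axis=0))        # column sums
    sums.append(np.trace(data))              # main diagonal
    sums.append(np.trace(np.fliplr(data)))   # anti-diagonal
    for total in sums:
        if total == BOARD_ROWS:
            return 1
        if total == -BOARD_ROWS:
            return -1
    # no full line: a completely filled board is a tie
    if np.sum(np.abs(data)) == BOARD_ROWS * BOARD_COLS:
        return 0
    return None


empty = np.zeros((BOARD_ROWS, BOARD_COLS))
first_row = empty.copy()
first_row[0, :] = 1      # first player fills the top row
corner = empty.copy()
corner[2, 2] = -1        # second player takes the last cell

print(board_hash(empty))     # 0
print(board_hash(corner))    # 2 (only the last base-3 digit is set)
print(winner_of(first_row))  # 1
print(winner_of(empty))      # None
```

Because each cell contributes one base-3 digit, two boards share a hash only if every cell matches, which is why the comment in the excerpt calls the value unique.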

 Type            Size     Date    Time   Name
----------- ---------  ---------- -----  ----
     File   12613382  2018-09-29 12:23  Reinforcement Learning-增强学习(文档+代码)\Reinforcement Learning:An Introduction.pdf
     File         40  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\.gitignore
     File        148  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\.travis.yml
     File      11177  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter01\tic_tac_toe.py
     File       9048  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter02\ten_armed_testbed.py
     File       3808  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter03\grid_world.py
     File       7391  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter04\car_rental.py
     File       2445  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter04\gamblers_problem.py
     File       3436  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter04\grid_world.py
     File      13065  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter05\blackjack.py
     File       1814  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter05\infinite_variance.py
     File       9355  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter06\cliff_walking.py
     File       4269  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter06\maximization_bias.py
     File       6574  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter06\random_walk.py
     File       4018  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter06\windy_grid_world.py
     File       4249  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter07\random_walk.py
     File       1627  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter08\expectation_vs_sample.py
     File      23252  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter08\maze.py
     File       4892  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter08\trajectory_sampling.py
     File      15793  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter09\random_walk.py
     File       4262  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter09\square_wave.py
     File       9605  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter10\access_control.py
     File      13681  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter10\mountain_car.py
     File      11839  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter11\counterexample.py
     File      12140  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter12\mountain_car.py
     File       9679  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter12\random_walk.py
     File       8012  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\chapter13\short_corridor.py
     File      36003  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\images\example_13_1.png
     File     238133  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\images\example_6_2.png
     File      31488  2018-09-17 16:16  Reinforcement Learning-增强学习(文档+代码)\reinforcement-learning-an-introduction-master\images\example_8_4.png

............ (information for 69 more files omitted)
