Shogi
Usage
or you can directly load Shogi
class
Description
TBA
Specs
Name | Value |
---|---|
Version | v0 |
Number of players | 2 |
Number of actions | 2187 |
Observation shape | (9, 9, 119) |
Observation type | bool |
Rewards | {-1, 0, 1} |
Observation
We follow the observation design of dlshogi, an open-source shogi AI.
Ther original dlshogi implementations are here.
Pgx implementation has [9, 9, 119]
shape and [:, :, x]
denotes:
x |
Description |
---|---|
0:14 |
Where my piece x exists |
14:28 |
Where my pieces x are attacking |
28:31 |
Where the number of my attacking pieces are >= 1,2,3 respectively |
31:45 |
Where opponent's piece x exists |
45:59 |
Where opponent's pieces x are attacking |
59:62 |
Where the number of opponent's attacking pieces are >= 1,2,3 respectively |
The following planes are all ones ore zeros
x |
Description |
---|---|
62:70 |
My hand has >= 1, ..., 8 Pawn |
70:74 |
My hand has >= 1, 2, 3, 4 Lance |
74:78 |
My hand has >= 1, 2, 3, 4 Knight |
78:82 |
My hand has >= 1, 2, 3, 4 Silver |
82:86 |
My hand has >= 1, 2, 3, 4 Gold |
86:88 |
My hand has >= 1, 2 Bishop |
88:90 |
My hand has >= 1, 2 Rook |
90:98 |
Oppnent's hand has >= 1, ..., 8 Pawn |
98:102 |
Oppnent's hand has >= 1, 2, 3, 4 Lance |
102:106 |
Oppnent's hand has >= 1, 2, 3, 4 Knight |
106:110 |
Oppnent's hand has >= 1, 2, 3, 4 Silver |
110:114 |
Oppnent's hand has >= 1, 2, 3, 4 Gold |
114:116 |
Oppnent's hand has >= 1, 2 Bishop |
116:118 |
Oppnent's hand has >= 1, 2 Rook |
118 |
Ones if checked |
Note that piece ids are
Piece | Id |
---|---|
歩 PAWN |
0 |
香 LANCE |
1 |
桂 KNIGHT |
2 |
銀 SILVER |
3 |
角 BISHOP |
4 |
飛 ROOK |
5 |
金 GOLD |
6 |
玉 KING |
7 |
と PRO_PAWN |
8 |
成香 PRO_LANCE |
9 |
成桂 PRO_KNIGHT |
10 |
成銀 PRO_SILVER |
11 |
馬 HORSE |
12 |
龍 DRAGON |
13 |
Action
The design of action also follows that of dlshogi.
There are 2187 = 81 x 27
distinct actions.
The action can be decomposed into
direction
from which the piece moves anddestination
to which the piece moves
by direction, destination = action // 81, action % 81
.
The direction
is encoded by
id | direction |
---|---|
0 | Up |
1 | Up left |
2 | Up right |
3 | Left |
4 | Right |
5 | Down |
6 | Down left |
7 | Down right |
8 | Up2 left |
9 | Up2 right |
10 | Promote + Up |
11 | Promote + Up left |
12 | Promote + Up right |
13 | Promote + Left |
14 | Promote + Right |
15 | Promote + Down |
16 | Promote + Down left |
17 | Promote + Down right |
18 | Promote + Up2 left |
19 | Promote + Up2 right |
20 | Drop Pawn |
21 | Drop Lance |
22 | Drop Knight |
23 | Drop Silver |
24 | Drop Bishop |
25 | Drop Rook |
26 | Drop Gold |
Rewards
Non-zero rewards are given only at the terminal states. The reward at terminal state is described in this table:
Reward | |
---|---|
Win | +1 |
Lose | -1 |
Draw | 0 |
Termination
Termination occurs when
- either player checkmates the opponent, or
512
steps are elapsed (from AlphaZero[Silver+18]
)
Fourfold repetition is not implemented in v0
.
Version History
v0
: Initial release (v1.0.0)