Shogi
 
 
Usage
or you can directly load the `Shogi` class
Description
TBA
Specs
| Name | Value | 
|---|---|
| Version | v1 | 
| Number of players | 2 | 
| Number of actions | 2187 | 
| Observation shape | (9, 9, 119) | 
| Observation type | bool | 
| Rewards | {-1, 0, 1} | 
Observation
We follow the observation design of dlshogi, an open-source shogi AI. 
The original dlshogi implementations are here.
The Pgx implementation has shape `[9, 9, 119]`, and `[:, :, x]` denotes:
| x | Description | 
|---|---|
| 0:14 | Where my piece `x` exists | 
| 14:28 | Where my pieces `x` are attacking | 
| 28:31 | Where the number of my attacking pieces is >= 1, 2, 3, respectively | 
| 31:45 | Where opponent's piece `x` exists | 
| 45:59 | Where opponent's pieces `x` are attacking | 
| 59:62 | Where the number of opponent's attacking pieces is >= 1, 2, 3, respectively | 
The following planes are all ones or all zeros:
| x | Description | 
|---|---|
| 62:70 | My hand has >= 1, ..., 8 Pawn | 
| 70:74 | My hand has >= 1, 2, 3, 4 Lance | 
| 74:78 | My hand has >= 1, 2, 3, 4 Knight | 
| 78:82 | My hand has >= 1, 2, 3, 4 Silver | 
| 82:86 | My hand has >= 1, 2, 3, 4 Gold | 
| 86:88 | My hand has >= 1, 2 Bishop | 
| 88:90 | My hand has >= 1, 2 Rook | 
| 90:98 | Opponent's hand has >= 1, ..., 8 Pawn | 
| 98:102 | Opponent's hand has >= 1, 2, 3, 4 Lance | 
| 102:106 | Opponent's hand has >= 1, 2, 3, 4 Knight | 
| 106:110 | Opponent's hand has >= 1, 2, 3, 4 Silver | 
| 110:114 | Opponent's hand has >= 1, 2, 3, 4 Gold | 
| 114:116 | Opponent's hand has >= 1, 2 Bishop | 
| 116:118 | Opponent's hand has >= 1, 2 Rook | 
| 118 | Ones if checked | 
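The channel ranges in the two tables above can be cross-checked with a small sketch. It is plain Python and does not use Pgx; the names `BOARD_BLOCKS`, `HAND_CAPACITY`, `hand_blocks`, and `hand_planes` are hypothetical helpers introduced only for illustration. The threshold encoding in `hand_planes` is an interpretation of the ">= 1, ..., 8" wording, where each entry stands for one 9 x 9 plane filled with that value.

```python
# Board-related channel blocks, transcribed from the first table: (name, start, stop).
BOARD_BLOCKS = [
    ("my_pieces", 0, 14),
    ("my_attacks", 14, 28),
    ("my_attack_counts", 28, 31),
    ("opp_pieces", 31, 45),
    ("opp_attacks", 45, 59),
    ("opp_attack_counts", 59, 62),
]

# Number of threshold planes per piece type in hand, in the order of the second table.
HAND_CAPACITY = [
    ("pawn", 8), ("lance", 4), ("knight", 4), ("silver", 4),
    ("gold", 4), ("bishop", 2), ("rook", 2),
]


def hand_blocks(offset):
    """Return ([(name, start, stop), ...], next_offset) for one player's hand planes."""
    blocks = []
    for name, capacity in HAND_CAPACITY:
        blocks.append((name, offset, offset + capacity))
        offset += capacity
    return blocks, offset


def hand_planes(count, capacity):
    """Threshold encoding: entry k is 1 if and only if count >= k + 1."""
    return [1 if count >= k + 1 else 0 for k in range(capacity)]


my_hand, end_my = hand_blocks(BOARD_BLOCKS[-1][2])  # my hand starts at channel 62
opp_hand, end_opp = hand_blocks(end_my)             # opponent's hand starts at channel 90
check_channel = end_opp                             # single "checked" plane
total_channels = check_channel + 1

print(my_hand)
print(opp_hand)
print(check_channel, total_channels)
print(hand_planes(3, 8))  # three pawns in hand
```

Under this reading, holding three pawns sets the first three of the eight pawn planes to ones and leaves the remaining five as zeros.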
Note that piece ids are
| Piece | Id | 
|---|---|
| 歩 PAWN | 0 | 
| 香 LANCE | 1 | 
| 桂 KNIGHT | 2 | 
| 銀 SILVER | 3 | 
| 角 BISHOP | 4 | 
| 飛 ROOK | 5 | 
| 金 GOLD | 6 | 
| 玉 KING | 7 | 
| と PRO_PAWN | 8 | 
| 成香 PRO_LANCE | 9 | 
| 成桂 PRO_KNIGHT | 10 | 
| 成銀 PRO_SILVER | 11 | 
| 馬 HORSE | 12 | 
| 龍 DRAGON | 13 | 
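As a rough illustration of how the piece ids relate to the observation channels, the sketch below maps a piece name to a channel index. It assumes that, within each 14-channel block, channels are ordered by piece id; the tables suggest this but do not state it explicitly. The helper names `piece_channel` and `attack_channel` are hypothetical and not part of Pgx.

```python
# Piece ids, transcribed from the table above.
PIECE_IDS = {
    "PAWN": 0, "LANCE": 1, "KNIGHT": 2, "SILVER": 3, "BISHOP": 4,
    "ROOK": 5, "GOLD": 6, "KING": 7, "PRO_PAWN": 8, "PRO_LANCE": 9,
    "PRO_KNIGHT": 10, "PRO_SILVER": 11, "HORSE": 12, "DRAGON": 13,
}

# Block offsets from the observation table.
MY_PIECES, MY_ATTACKS = 0, 14
OPP_PIECES, OPP_ATTACKS = 31, 45


def piece_channel(piece, mine=True):
    """Channel marking where `piece` exists (assumes id order within the block)."""
    return (MY_PIECES if mine else OPP_PIECES) + PIECE_IDS[piece]


def attack_channel(piece, mine=True):
    """Channel marking the squares attacked by `piece` (same ordering assumption)."""
    return (MY_ATTACKS if mine else OPP_ATTACKS) + PIECE_IDS[piece]


print(piece_channel("PAWN"))                 # my pawns
print(piece_channel("DRAGON", mine=False))   # opponent's dragons
print(attack_channel("KING"))                # squares my king attacks
```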
Action
The action design also follows that of dlshogi.
There are 2187 = 81 x 27 distinct actions.
The action can be decomposed into 
- `direction` from which the piece moves and
- `destination` to which the piece moves
by `direction, destination = action // 81, action % 81`.
The direction is encoded by
| id | direction | 
|---|---|
| 0 | Up | 
| 1 | Up left | 
| 2 | Up right | 
| 3 | Left | 
| 4 | Right | 
| 5 | Down | 
| 6 | Down left | 
| 7 | Down right | 
| 8 | Up2 left | 
| 9 | Up2 right | 
| 10 | Promote + Up | 
| 11 | Promote + Up left | 
| 12 | Promote + Up right | 
| 13 | Promote + Left | 
| 14 | Promote + Right | 
| 15 | Promote + Down | 
| 16 | Promote + Down left | 
| 17 | Promote + Down right | 
| 18 | Promote + Up2 left | 
| 19 | Promote + Up2 right | 
| 20 | Drop Pawn | 
| 21 | Drop Lance | 
| 22 | Drop Knight | 
| 23 | Drop Silver | 
| 24 | Drop Bishop | 
| 25 | Drop Rook | 
| 26 | Drop Gold | 
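The decomposition and the direction table can be combined into a short sketch. This is plain Python rather than Pgx code, and `decode_action`, `encode_action`, and `describe` are hypothetical helper names. The destination is left as an index in `0..80`, since the mapping from that index to a board square is not specified here.

```python
NUM_SQUARES = 81
MOVES = ["Up", "Up left", "Up right", "Left", "Right",
         "Down", "Down left", "Down right", "Up2 left", "Up2 right"]
DROPS = ["Pawn", "Lance", "Knight", "Silver", "Bishop", "Rook", "Gold"]

# Direction ids 0..26 in table order: plain moves, promoting moves, then drops.
DIRECTIONS = MOVES + ["Promote + " + m for m in MOVES] + ["Drop " + p for p in DROPS]
NUM_ACTIONS = NUM_SQUARES * len(DIRECTIONS)  # 81 x 27


def decode_action(action):
    """Split an action id into (direction, destination)."""
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError(f"action must be in [0, {NUM_ACTIONS}), got {action}")
    return divmod(action, NUM_SQUARES)  # same as (action // 81, action % 81)


def encode_action(direction, destination):
    """Inverse of decode_action."""
    return direction * NUM_SQUARES + destination


def describe(action):
    """Return (direction name, destination index) for an action id."""
    direction, destination = decode_action(action)
    return DIRECTIONS[direction], destination


print(NUM_ACTIONS)
print(describe(encode_action(20, 40)))
print(describe(2186))
```

Reading the table this way, direction ids `10..19` are the promoting counterparts of ids `0..9`, and ids `20..26` are drops.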
Rewards
Non-zero rewards are given only at terminal states. The reward at a terminal state is given in the following table:
| Outcome | Reward |
|---|---|
| Win | +1 | 
| Lose | -1 | 
| Draw | 0 | 
Termination
Termination occurs when
- either player checkmates the opponent, or
- `512` steps have elapsed (from AlphaZero [Silver+18])
Fourfold repetition is not implemented in v0.
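The reward and termination rules can be summarized in a minimal sketch. It does not reflect the actual Pgx implementation: `is_terminal` and `terminal_rewards` are hypothetical helpers, and treating "512 steps have elapsed" as `step_count >= 512` is an assumption.

```python
MAX_STEPS = 512  # step limit, following AlphaZero


def is_terminal(is_checkmate, step_count):
    """Terminate on checkmate or once the step limit is reached (assumed >=)."""
    return is_checkmate or step_count >= MAX_STEPS


def terminal_rewards(winner):
    """Rewards for (player 0, player 1); `winner` is 0, 1, or None for a draw."""
    if winner is None:
        return (0, 0)
    return (1, -1) if winner == 0 else (-1, 1)


print(is_terminal(False, 511))
print(is_terminal(False, 512))
print(terminal_rewards(1))
print(terminal_rewards(None))
```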
Version History
- v1: Bug fix in current player by @KazukiOta in #1298 (v2.6.0)
- v0: Initial release (v1.0.0)