
Commit 19a0121

Fix docs / website build (#1064)
* Update links
* tweak docs
* Doc tweaks
* Fix docs...
* Always build docs
* Fix last doc issues...
* Docs build again!
* Drop if from docs build
* update paths
1 parent d470d48 commit 19a0121


23 files changed: +128 / -63 lines changed


.github/workflows/ci.yml

Lines changed: 8 additions & 13 deletions
@@ -204,19 +204,11 @@ jobs:
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0
-      - name: Get changed files
-        id: documentation-changed
-        uses: tj-actions/changed-files@v42
-        with:
-          files: |
-            docs/**
-      - if: (steps.documentation-changed.outputs.any_changed == 'true')
-        run: python -m pip install --user matplotlib
+      - run: python -m pip install --user matplotlib
       - uses: julia-actions/setup-julia@v1
         with:
           version: "1"
       - name: Build homepage
-        if: (steps.documentation-changed.outputs.any_changed == 'true')
         run: |
           cd docs/homepage
           julia --project --color=yes -e '
@@ -225,7 +217,6 @@ jobs:
             using Franklin;
             optimize()' > build.log
       - name: Make sure homepage is generated without error
-        if: (steps.documentation-changed.outputs.any_changed == 'true')
         run: |
           if grep -1 "Franklin Warning" build.log; then
             echo "Franklin reported a warning"
@@ -234,12 +225,16 @@ jobs:
             echo "Franklin did not report a warning"
           fi
       - name: Build docs
-        if: (steps.documentation-changed.outputs.any_changed == 'true')
         run: |
           cd docs
           julia --project --color=yes -e '
-            using Pkg; Pkg.instantiate();
-            include("make.jl")'
+            using Pkg; Pkg.instantiate()
+            Pkg.develop(path="../src/ReinforcementLearningBase")
+            Pkg.develop(path="../src/ReinforcementLearningCore")
+            Pkg.develop(path="../src/ReinforcementLearningEnvironments")
+            Pkg.develop(path="../") # ReinforcementLearning meta-package
+            Pkg.develop(path="../src/ReinforcementLearningFarm")
+            include("make.jl")' skiplinks # Temporarily skip broken link checks
           mv build homepage/__site/docs
       - name: Deploy to the main repo
         uses: peaceiris/actions-gh-pages@v3

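The "Build docs" step above now `Pkg.develop`s the in-repo subpackages into the docs environment before running `make.jl`, so the documentation is built against the working tree rather than the registered releases. A rough sketch of reproducing that build locally (assumptions: run from `docs/`, `Pkg.activate(".")` standing in for the workflow's `--project` flag, and the trailing `skiplinks` argument omitted):

```julia
# Local reproduction sketch of the CI docs build.
# Paths are relative to docs/ and assume the monorepo layout shown in the diff above.
using Pkg

Pkg.activate(".")                                       # same effect as `julia --project` in docs/
Pkg.instantiate()                                       # install registered dependencies
Pkg.develop(path="../src/ReinforcementLearningBase")    # use the in-repo subpackages
Pkg.develop(path="../src/ReinforcementLearningCore")
Pkg.develop(path="../src/ReinforcementLearningEnvironments")
Pkg.develop(path="../")                                 # ReinforcementLearning meta-package
Pkg.develop(path="../src/ReinforcementLearningFarm")

include("make.jl")                                      # run Documenter and write the site into build/
```
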
NEWS.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@
 
 - Add `StockTradingEnv` from the paper [Deep Reinforcement Learning for
   Automated Stock Trading: An Ensemble
-  Strategy](https://github.com/AI4Finance-LLC/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020).
+  Strategy](https://github.com/AI4Finance-Foundation/FinRL-Trading).
   This environment is a good testbed for multi-continuous action space
   algorithms. [#428](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/428)

README.md

Lines changed: 2 additions & 2 deletions
@@ -54,9 +54,9 @@ The above simple example demonstrates four core components in a general
 reinforcement learning experiment:
 
 - **Policy**. The
-  [`RandomPolicy`](https://juliareinforcementlearning.org/docs/rlcore/#ReinforcementLearningCore.RandomPolicy)
+  [`RandomPolicy`](https://juliareinforcementlearning.github.io/docs/rlcore/#ReinforcementLearningCore.RandomPolicy)
   is the simplest instance of
-  [`AbstractPolicy`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.AbstractPolicy).
+  [`AbstractPolicy`](https://juliareinforcementlearning.github.io/docs/rlbase/#ReinforcementLearningBase.AbstractPolicy).
   It generates a random action at each step.
 
 - **Environment**. The

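As the README excerpt above says, `RandomPolicy` is just the simplest `AbstractPolicy`. A minimal sketch of what such a policy looks like, assuming the RLBase interface where `RLBase.plan!(policy, env)` returns the next action (`UniformRandomPolicy` is an illustrative name, not part of the package):

```julia
using ReinforcementLearningBase

# Illustrative only: a policy merely has to say which action it takes given the
# current state of the environment.
struct UniformRandomPolicy <: AbstractPolicy end

# Sample uniformly from whatever action_space(env) returns (e.g. 1:n).
RLBase.plan!(::UniformRandomPolicy, env::AbstractEnv) = rand(action_space(env))
```

Any environment from the ecosystem can then be driven by this policy through the usual run loop.
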
docs/homepage/blog/an_introduction_to_reinforcement_learning_jl_design_implementations_thoughts/index.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ Although most existing reinforcement learning related packages are written in Py
 Many existing packages inspired the development of ReinforcementLearning.jl a lot. Following are some important ones.
 
 - [Dopamine](https://google.github.io/dopamine/)\dcite{dayan2009dopamine} provides a clear implementation of the **Rainbow**\dcite{hessel2018rainbow} algorithm. The [gin](https://github.com/google/gin-config) config file template and the concise workflow is the origin of the `Experiment` in ReinforcementLearning.jl.
-- [OpenSpiel](https://github.com/deepmind/open_spiel)\dcite{LanctotEtAl2019OpenSpiel} provides a lot of useful functions to describe many different kinds of games. They are turned into traits in our package.
+- [OpenSpiel](https://github.com/google-deepmind/open_spiel)\dcite{LanctotEtAl2019OpenSpiel} provides a lot of useful functions to describe many different kinds of games. They are turned into traits in our package.
 - [Ray/rllib](https://docs.ray.io/en/master/rllib.html)\dcite{liang2017ray} has many nice abstraction layers in the policy part. We also borrowed the definition of environments here. This is explained with details in section 2.
 - [rlpyt](https://github.com/astooke/rlpyt)\dcite{stooke2019rlpyt} has a nice code structure and we borrowed some implementations of policy gradient algorithms from it.
 - [Acme](https://github.com/deepmind/acme)\dcite{hoffman2020acme} offers a framework for distributed reinforcement learning.

docs/make.jl

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ makedocs(
         ReinforcementLearning,
         ReinforcementLearningBase,
         ReinforcementLearningCore,
-        ReinforcementLearningEnvironments,
+        ReinforcementLearningEnvironments
     ],
     format = Documenter.HTML(
         prettyurls = true,

docs/src/How_to_implement_a_new_algorithm.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ end
 Implementing a new algorithm mainly consists of creating your own `AbstractPolicy` (or `AbstractLearner`, see [this section](#using-resources-from-rlcore)) subtype, its action sampling method (by overloading `Base.push!(policy::YourPolicyType, env)`) and implementing its behavior at each stage. However, ReinforcemementLearning.jl provides plenty of pre-implemented utilities that you should use to 1) have less code to write 2) lower the chances of bugs and 3) make your code more understandable and maintainable (if you intend to contribute your algorithm).
 
 ## Using Agents
-The recommended way is to use the policy wrapper `Agent`. An agent is itself an `AbstractPolicy` that wraps a policy and a trajectory (also called Experience Replay Buffer in reinforcement learning literature). Agent comes with default implementations of `push!(agent, stage, env)` and `plan!(agent, env)` that will probably fit what you need at most stages so that you don't have to write them again. Looking at the [source code](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningCore/src/policies/agent.jl/), we can see that the default Agent calls are
+The recommended way is to use the policy wrapper `Agent`. An agent is itself an `AbstractPolicy` that wraps a policy and a trajectory (also called Experience Replay Buffer in reinforcement learning literature). Agent comes with default implementations of `push!(agent, stage, env)` and `plan!(agent, env)` that will probably fit what you need at most stages so that you don't have to write them again. Looking at the [source code](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningCore/src/policies/agent/agent_base.jl), we can see that the default Agent calls are
 
 ```julia
 function Base.push!(agent::Agent, ::PreEpisodeStage, env::AbstractEnv)

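The paragraph above describes an `Agent` as delegation plus bookkeeping: planning is forwarded to the wrapped policy, and the stage callbacks record the interaction into the trajectory. A condensed, illustrative sketch of that pattern follows (it is not the linked source file; `MyAgent` is a stand-in name, and the exact fields pushed at each stage differ in the real implementation):

```julia
using ReinforcementLearningBase, ReinforcementLearningCore

# Illustrative stand-in for RLCore's Agent: a policy plus a trajectory
# (the experience replay buffer).
struct MyAgent{P,T} <: AbstractPolicy
    policy::P
    trajectory::T
end

# Acting: simply ask the wrapped policy for the next action.
RLBase.plan!(agent::MyAgent, env::AbstractEnv) = RLBase.plan!(agent.policy, env)

# Bookkeeping at a stage: cache what just happened into the trajectory.
function Base.push!(agent::MyAgent, ::PostActStage, env::AbstractEnv, action)
    push!(agent.trajectory, (state = state(env), action = action,
                             reward = reward(env), terminal = is_terminated(env)))
end
```
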
docs/src/How_to_write_a_customized_environment.md

Lines changed: 6 additions & 10 deletions
@@ -6,7 +6,7 @@ write many different kinds of environments based on interfaces defined in
 [ReinforcementLearningBase.jl](@ref).
 
 The most commonly used interfaces to describe reinforcement learning tasks is
-[OpenAI/Gym](https://gym.openai.com/). Inspired by it, we expand those
+[OpenAI/Gym](https://gymnasium.farama.org). Inspired by it, we expand those
 interfaces a little to utilize multiple-dispatch in Julia and to cover
 multi-agent environments.
 
@@ -30,7 +30,7 @@ act!(env::YourEnv, action)
 ## An Example: The LotteryEnv
 
 Here we use an example introduced in [Monte Carlo Tree Search: A
-Tutorial](https://www.informs-sim.org/wsc18papers/includes/files/021.pdf) to
+Tutorial](https://ieeexplore.ieee.org/document/8632344) to
 demonstrate how to write a simple environment.
 
 The game is defined like this: assume you have \$10 in your pocket, and you are
@@ -168,7 +168,7 @@ policy we defined above. A [`QBasedPolicy`](@ref)
 contains two parts: a `learner` and an `explorer`. The `learner` *learn* the
 state-action value function (aka *Q* function) during interactions with the
 `env`. The `explorer` is used to select an action based on the Q value returned
-by the `learner`. Inside of the [`MonteCarloLearner`](@ref), a
+by the `learner`. Inside of the [`TDLearner`](@ref), a
 [`TabularQApproximator`](@ref) is used to estimate the Q value.
 
 That's the problem! A [`TabularQApproximator`](@ref) only accepts states of type `Int`.
@@ -304,11 +304,7 @@ legal_action_space_mask(ttt)
 ```
 
 For some simple environments, we can simply use a `Tuple` or a `Vector` to
-describe the action space. A special space type [`Space`](@ref) is also provided
-as a meta space to hold the composition of different kinds of sub-spaces. For
-example, we can use `Space(((1:3),(true,false)))` to describe the environment
-with two kinds of actions, an integer between `1` and `3`, and a boolean.
-Sometimes, the action space is not easy to be described by some built in data
+describe the action space. Sometimes, the action space is not easy to be described by some built in data
 structures. In that case, you can defined a customized one with the following
 interfaces implemented:
 
@@ -370,7 +366,7 @@ to the perspective from the `current_player(env)`.
 
 In multi-agent environments, sometimes the sum of rewards from all players are
 always `0`. We call the [`UtilityStyle`](@ref) of these environments [`ZeroSum`](@ref).
-`ZeroSum` is a special case of [`ConstantSum`](@ref). In cooperational games, the reward
+`ZeroSum` is a special case of [`ConstantSum`](@ref). In cooperative games, the reward
 of each player are the same. In this case, they are called [`IdenticalUtility`](@ref).
 Other cases fall back to [`GeneralSum`](@ref).
 
@@ -403,7 +399,7 @@ each action, then we call the [`ChanceStyle`](@ref) of these environments are of
 default return value. One special case is that,
 in [Extensive Form Games](https://en.wikipedia.org/wiki/Extensive-form_game), a
 chance node is involved. And the action probability of this special player is
-determined. We define the `ChanceStyle` of these environments as [`EXPLICIT_STOCHASTIC`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.EXPLICIT_STOCHASTIC).
+determined. We define the `ChanceStyle` of these environments as [`EXPLICIT_STOCHASTIC`](@ref).
 For these environments, we need to have the following methods defined:
 
 ```@repl customized_env

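Putting the interfaces discussed in this page together, a complete toy environment needs little more than `action_space`, `state`, `reward`, `is_terminated`, `reset!`, and `act!`. A minimal sketch under those assumptions (`GuessEnv` is a made-up example, not the docs' `LotteryEnv`):

```julia
using ReinforcementLearningBase

# A made-up one-step guessing game: pick 1, 2, or 3 and receive +1 for matching
# a hidden target, -1 otherwise.
mutable struct GuessEnv <: AbstractEnv
    target::Int
    guess::Union{Nothing,Int}
end
GuessEnv() = GuessEnv(rand(1:3), nothing)

RLBase.action_space(::GuessEnv) = 1:3
RLBase.state(env::GuessEnv) = isnothing(env.guess) ? 1 : env.guess + 1
RLBase.state_space(::GuessEnv) = 1:4
RLBase.reward(env::GuessEnv) =
    isnothing(env.guess) ? 0 : (env.guess == env.target ? 1 : -1)
RLBase.is_terminated(env::GuessEnv) = !isnothing(env.guess)
RLBase.reset!(env::GuessEnv) = (env.target = rand(1:3); env.guess = nothing; nothing)
RLBase.act!(env::GuessEnv, action) = (env.guess = action; nothing)
```

With these methods in place, the environment can be driven by any `AbstractPolicy` through the standard run loop.
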
docs/src/rlbase.md

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 
 ```@autodocs
 Modules = [ReinforcementLearningBase]
-```
+```

docs/src/rlcore.md

Lines changed: 3 additions & 3 deletions
@@ -8,8 +8,8 @@ In addition to containing the [run loop](./How_to_implement_a_new_algorithm.md),
 
 ## QBasedPolicy
 
-`QBasedPolicy` is an `AbstractPolicy` that wraps a Q-Value _learner_ (tabular or approximated) and an _explorer_. Use this wrapper to implement a policy that directly uses a Q-value function to
-decide its next action. In that case, instead of creating an `AbstractPolicy` subtype for your algorithm, define an `AbstractLearner` subtype and specialize `RLBase.optimise!(::YourLearnerType, ::Stage, ::Trajectory)`. This way you will not have to code the interaction between your policy and the explorer yourself.
+[`QBasedPolicy`](@ref) is an [`AbstractPolicy`](@ref) that wraps a Q-Value _learner_ (tabular or approximated) and an _explorer_. Use this wrapper to implement a policy that directly uses a Q-value function to
+decide its next action. In that case, instead of creating an [`AbstractPolicy`](@ref) subtype for your algorithm, define an [`AbstractLearner`](@ref) subtype and specialize `RLBase.optimise!(::YourLearnerType, ::Stage, ::Trajectory)`. This way you will not have to code the interaction between your policy and the explorer yourself.
 RLCore provides the most common explorers (such as epsilon-greedy, UCB, etc.). You can find many examples of QBasedPolicies in the DQNs section of RLZoo.
 
 ## Parametric approximators
@@ -29,4 +29,4 @@ The other advantage of `TargetNetwork` is that it uses Julia's multiple dispatch
 
 ## Architectures
 
-Common model architectures are also provided such as the `GaussianNetwork` for continuous policies with diagonal multivariate policies; and `CovGaussianNetwork` for full covariance (very slow on GPUs at the moment).
+Common model architectures are also provided such as the `GaussianNetwork` for continuous policies with diagonal multivariate policies; and `CovGaussianNetwork` for full covariance (very slow on GPUs at the moment).

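The learner/explorer split described in the rlcore.md excerpt above is small enough to sketch: a Q-based policy only needs something that produces Q values for the current state and something that turns those values into an action. Illustrative pseudostructure only (`MyQBasedPolicy` and `predict_q` are placeholder names, not RLCore's definitions):

```julia
using ReinforcementLearningBase

# Placeholder types showing the division of labor between learner and explorer.
struct MyQBasedPolicy{L,E} <: AbstractPolicy
    learner::L    # estimates Q(s, a), tabular or neural
    explorer::E   # maps a vector of Q values to an action, e.g. epsilon-greedy
end

# Hypothetical helper: however your learner exposes its Q values for a state.
predict_q(learner, s) = learner(s)

RLBase.plan!(p::MyQBasedPolicy, env::AbstractEnv) =
    p.explorer(predict_q(p.learner, state(env)))
```
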
docs/src/rlenvs.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@
 </ol>
 ```
 
-**Note**: Many traits are *borrowed* from [OpenSpiel](https://github.com/deepmind/open_spiel).
+**Note**: Many traits are *borrowed* from [OpenSpiel](https://github.com/google-deepmind/open_spiel).
 
 ## 3-rd Party Environments

docs/src/tutorial.md

Lines changed: 1 addition & 1 deletion
@@ -112,5 +112,5 @@ agent = Agent(
 run(agent, env, StopAfterNEpisodes(10), TotalRewardPerEpisode())
 ```
 
-Here the [`Trajectory`](@ref) is used to store the **S**tate,
+Here the `Trajectory` is used to store the **S**tate,
 **A**ction, **R**eward, is_**T**erminated info during interactions with the environment.

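The hook passed to `run` in the tutorial snippet above is also the place to read results back from. A hedged usage sketch, continuing from the tutorial's `agent` and `env` and assuming `TotalRewardPerEpisode` keeps its per-episode totals in a `rewards` field as in recent RLCore versions:

```julia
# Keep a reference to the hook so its collected metric can be inspected afterwards.
hook = TotalRewardPerEpisode()
run(agent, env, StopAfterNEpisodes(10), hook)
hook.rewards   # assumed field: one cumulative reward per finished episode
```
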
src/ReinforcementLearningBase/src/CommonRLInterface.jl

Lines changed: 6 additions & 0 deletions
@@ -94,6 +94,12 @@ ActionStyle(env::RLBaseEnv) =
     CRL.provided(CRL.valid_actions, env.env) ? FullActionSet() : MinimalActionSet()
 
 current_player(env::RLBaseEnv) = CRL.player(env.env)
+
+"""
+    players(env::RLBaseEnv)
+
+Players in the game. This is a no-op for single-player games. `MultiAgent` games should implement this method.
+"""
 players(env::RLBaseEnv) = CRL.players(env.env)
 
 #

src/ReinforcementLearningBase/src/ReinforcementLearningBase.jl

Lines changed: 13 additions & 1 deletion
@@ -1,8 +1,20 @@
 module ReinforcementLearningBase
 
-const RLBase = ReinforcementLearningBase
 export RLBase
 
+"""
+[ReinforcementLearningBase.jl](@ref)
+(**RLBase**) provides common constants, traits, abstractions and interfaces
+in developing reinforcement learning algorithms in Julia.
+
+Foundational types and utilities for two main concepts of reinforcement learning are provided:
+
+- [`AbstractPolicy`](@ref)
+- [`AbstractEnv`](@ref)
+"""
+
+const RLBase = ReinforcementLearningBase
+
 using Random
 using Reexport

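The reordering in this file matters because a Julia string literal placed right before a definition becomes that binding's docstring, so the package overview added here can attach to the `RLBase` alias. A tiny self-contained illustration (toy module, not the package itself):

```julia
# Toy module demonstrating docstring attachment to a const alias, so that
# `?PkgAlias` in the REPL shows the text below.
module SomePackage

"""
[SomePackage.jl] is a toy example: this text documents the `PkgAlias` binding
defined right after it.
"""
const PkgAlias = SomePackage

end # module
```
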