
Commit 062fc09

use gather; fix outdated docs
Co-authored-by: Manikya <[email protected]>
1 parent 7175c36 commit 062fc09

File tree

9 files changed, +44 -36 lines changed


docs/src/gpu.md

+1 -1

@@ -30,7 +30,7 @@ If you define a structured model, like a `Dense` layer or `Chain`, you just need
 ```julia
 d = Dense(10, 5, σ)
 d = fmap(cu, d)
-d.W # CuArray
+d.weight # CuArray
 d(cu(rand(10))) # CuArray output

 m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

docs/src/models/advanced.md

+1 -1

@@ -68,7 +68,7 @@ by simply deleting it from `ps`:

 ```julia
 ps = params(m)
-delete!(ps, m[2].b)
+delete!(ps, m[2].bias)
 ```

 ## Custom multiple input or output layer

docs/src/models/nnlib.md

+7

@@ -67,3 +67,10 @@ NNlib.batched_mul!
 NNlib.batched_adjoint
 NNlib.batched_transpose
 ```
+
+## Gather and Scatter
+
+```@docs
+NNlib.gather
+NNlib.scatter
+```
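
For reference, a rough sketch of what the two newly documented functions do; the array values and indices below are made up for illustration, and the authoritative behaviour is whatever the `NNlib.gather`/`NNlib.scatter` docstrings state:

```julia
using NNlib

src = Float32[10 20 30;
              40 50 60]            # 2×3 source, one item per column

# gather selects slices along the last dimension of src:
NNlib.gather(src, [1, 3, 1])       # 2×3 result holding columns 1, 3, 1 of src

# scatter goes the other way: columns of src are reduced (here with +)
# into the destination positions named by the index vector:
NNlib.scatter(+, src, [1, 2, 1])   # 2×2 result; column 1 is src[:,1] + src[:,3]
```

Roughly speaking, the pullback of `gather` accumulates gradients with a scatter-add, which is presumably part of the motivation for switching the `Embedding` lookup in this commit to `gather`.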

docs/src/models/overview.md

+19 -19

@@ -15,7 +15,7 @@ Here's how you'd use Flux to build and train the most basic of models, step by s

 This example will predict the output of the function `4x + 2`. First, import `Flux` and define the function we want to simulate:

-```
+```julia
 julia> using Flux

 julia> actual(x) = 4x + 2
@@ -28,7 +28,7 @@ This example will build a model to approximate the `actual` function.

 Use the `actual` function to build sets of data for training and verification:

-```
+```julia
 julia> x_train, x_test = hcat(0:5...), hcat(6:10...)
 ([0 1 … 4 5], [6 7 … 9 10])

@@ -42,38 +42,38 @@ Normally, your training and test data come from real world observations, but thi

 Now, build a model to make predictions with `1` input and `1` output:

-```
+```julia
 julia> model = Dense(1, 1)
 Dense(1, 1)

-julia> model.W
-1-element Array{Float64,1}:
- -0.99009055
+julia> model.weight
+1×1 Matrix{Float32}:
+ -1.4925033

-julia> model.b
-1-element Array{Float64,1}:
+julia> model.bias
+1-element Vector{Float32}:
  0.0
 ```

-Under the hood, a dense layer is a struct with fields `W` and `b`. `W` represents a weight and `b` represents a bias. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:
+Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weights' matrix and `bias` represents a bias vector. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:

-```
+```julia
 julia> predict = Dense(1, 1)
 ```

 `Dense(1, 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.

 This model will already make predictions, though not accurate ones yet:

-```
+```julia
 julia> predict(x_train)
-1×6 Array{Float32,2}:
- -1.98018 -5.94054 -9.90091 -13.8613 -17.8216 -21.782
+1×6 Matrix{Float32}:
+ 0.0 -1.4925 -2.98501 -4.47751 -5.97001 -7.46252
 ```

 In order to make better predictions, you'll need to provide a *loss function* to tell Flux how to objectively *evaluate* the quality of a prediction. Loss functions compute the cumulative distance between actual values and predictions.

-```
+```julia
 julia> loss(x, y) = Flux.Losses.mse(predict(x), y)
 loss (generic function with 1 method)

@@ -87,7 +87,7 @@ More accurate predictions will yield a lower loss. You can write your own loss f

 Under the hood, the Flux [`train!`](@ref) function uses *a loss function* and *training data* to improve the *parameters* of your model based on a pluggable [`optimiser`](../training/optimisers.md):

-```
+```julia
 julia> using Flux: train!

 julia> opt = Descent()
@@ -100,12 +100,12 @@ julia> data = [(x_train, y_train)]

 Now, we have the optimiser and data we'll pass to `train!`. All that remains are the parameters of the model. Remember, each model is a Julia struct with a function and configurable parameters. Remember, the dense layer has weights and biases that depend on the dimensions of the inputs and outputs:

-```
-julia> predict.W
+```julia
+julia> predict.weight
 1-element Array{Float64,1}:
  -0.99009055

-julia> predict.b
+julia> predict.bias
 1-element Array{Float64,1}:
  0.0
 ```
@@ -120,7 +120,7 @@ Params([[-0.99009055], [0.0]])
 These are the parameters Flux will change, one step at a time, to improve predictions. Each of the parameters comes from the `predict` model:

 ```
-julia> predict.W in parameters, predict.b in parameters
+julia> predict.weight in parameters, predict.bias in parameters
 (true, true)

 ```
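
As a quick sanity check of the renamed `Dense` fields, a minimal sketch (the weight of a freshly constructed layer is random, so the printed numbers will differ from those in the diff above):

```julia
using Flux

predict = Dense(1, 1)        # default activation is the identity

x = Float32[0 1 2 3 4 5]     # same shape as x_train in the tutorial

# With the new field names, the layer's output is just weight*x .+ bias:
@assert predict(x) ≈ predict.weight * x .+ predict.bias
```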

docs/src/models/regularisation.md

+2 -2

@@ -13,10 +13,10 @@ m = Dense(10, 5)
 loss(x, y) = logitcrossentropy(m(x), y)
 ```

-We can apply L2 regularisation by taking the squared norm of the parameters , `m.W` and `m.b`.
+We can apply L2 regularisation by taking the squared norm of the parameters , `m.weight` and `m.bias`.

 ```julia
-penalty() = sum(abs2, m.W) + sum(abs2, m.b)
+penalty() = sum(abs2, m.weight) + sum(abs2, m.bias)
 loss(x, y) = logitcrossentropy(m(x), y) + penalty()
 ```

src/layers/basic.jl

+2 -1

@@ -475,7 +475,8 @@ function Embedding(in::Integer, out::Integer;
 end

 (m::Embedding)(x::Union{OneHotVector, OneHotMatrix}) = m.weight * x # equivalent to m.weight[:,onecold(x)]
-(m::Embedding)(x::Union{Int,AbstractVector}) = m.weight[:, x]
+(m::Embedding)(x::Integer) = m([x])
+(m::Embedding)(x::AbstractVector) = NNlib.gather(m.weight, x)
 (m::Embedding)(x::AbstractArray) = reshape(m(vec(x)), :, size(x)...)

 function Base.show(io::IO, m::Embedding)
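
A short sketch of how the rewritten `Embedding` methods behave, based only on the three method definitions above (the layer size and indices are arbitrary):

```julia
using Flux, NNlib

emb = Flux.Embedding(10, 4)   # 10 tokens, 4-dimensional embeddings
idx = [3, 3, 7]

# The AbstractVector method now routes through NNlib.gather, which for a
# weight matrix is equivalent to selecting columns:
@assert emb(idx) == emb.weight[:, idx]

# An Integer is wrapped in a one-element vector, and higher-dimensional
# index arrays are reshaped back to (embedding_dim, size(x)...):
size(emb(5))            # (4, 1)
size(emb([1 2; 3 4]))   # (4, 2, 2)
```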

src/utils.jl

+1 -1

@@ -15,7 +15,7 @@ This function is mainly used by weight initializers, e.g., [`kaiming_normal`](@r
 ```jldoctest
 julia> layer = Dense(10, 20);

-julia> Flux.nfan(size(layer.W))
+julia> Flux.nfan(size(layer.weight))
 (10, 20)

 julia> layer = Conv((3, 3), 2=>10);
test/cuda/layers.jl

+1 -1

@@ -140,7 +140,7 @@ end

 @test sum(l(ip)) ≈ 0.f0
 gs = gradient(() -> sum(l(ip)), Flux.params(l))
-@test l.b ∈ gs.params
+@test l.bias ∈ gs.params
 end

 @testset "Extended BatchNorm" begin

test/utils.jl

+10 -10

@@ -226,19 +226,19 @@ end
 m = Chain(Dense(10, 5, relu), Dense(5, 2))
 x64 = rand(Float64, 10)
 x32 = rand(Float32, 10)
-@test eltype(m[1].W) == Float32
+@test eltype(m[1].weight) == Float32
 @test eltype(m(x32)) == Float32
 @test eltype(m(x64)) == Float64
 @test eltype(f64(m)(x32)) == Float64
 @test eltype(f64(m)(x64)) == Float64
-@test eltype(f64(m)[1].W) == Float64
-@test eltype(f32(f64(m))[1].W) == Float32
+@test eltype(f64(m)[1].weight) == Float64
+@test eltype(f32(f64(m))[1].weight) == Float32
 end

 @testset "Zeros" begin
 m = Dense(3,2; bias=false)
-@test f64(m).b === m.b === Zeros()
-@test f32(m).b === m.b === Zeros()
+@test f64(m).bias === m.bias === Zeros()
+@test f32(m).bias === m.bias === Zeros()

 @testset "Gradients for broadcasted $op with sizes $s" for op in (+,-,*), s in ((1,), (2,3))
 o = ones(s)
@@ -340,19 +340,19 @@ end

 nobias(n) = Zeros()
 testdense(m, bt) = @testset "Check layer $i" for (i, (l1, l2)) in enumerate(zip(m, dm(bt)))
-@test l1.W == l2.W
-@test l1.b == l2.b
-@test_skip typeof(l1.b) === typeof(l2.b)
+@test l1.weight == l2.weight
+@test l1.bias == l2.bias
+@test_skip typeof(l1.bias) === typeof(l2.bias)
 end

 @testset "loadparams!" begin
 import Flux: loadparams!
 pars(w, b) = [w, b]
 import Flux: loadparams!, Zeros
 pars(w, b::Zeros) = [w, Flux.zeros(size(w,1))]
-pars(l) = pars(l.W, l.b)
+pars(l) = pars(l.weight, l.bias)
 pararray(m) = mapreduce(pars, vcat, m)
-weights(m) = mapreduce(l -> [l.W], vcat, m)
+weights(m) = mapreduce(l -> [l.weight], vcat, m)
 @testset "Bias type $bt" for bt in (Flux.zeros, nobias)
 m = dm(bt)
 loadparams!(m, params(m))
