Commit 8c01e8f

Merge branch 'develop-3.0' into develop
2 parents fc9b739 + 582bbcc commit 8c01e8f

4 files changed: +260 -24 lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Riak - a distributed, decentralised data storage system.
+
+To build Riak, Erlang OTP 22 or higher is required.
+
+`make rel` will build a release which can be run via `rel/riak/bin/riak start`. Riak is primarily configured via `rel/riak/etc/riak.conf`.
+
+To make a package, install appropriate build tools for your operating system and run `make package`.
+
+To create a local multi-node build environment use `make devclean; make devrel`.
+
+To test Riak use [Riak Test](https://github.com/basho/riak_test/blob/develop-3.0/doc/SIMPLE_SETUP.md).
+
+Up-to-date documentation is not available, but work on [documentation](https://www.tiot.jp/riak-docs/riak/kv/2.9.10/) is ongoing, and the core information available in the [legacy documentation](https://docs.riak.com/riak/kv/latest/index.html) is still generally relevant.
+
+Issues and PRs can be tracked via [Riak Github](https://github.com/basho/riak/issues) or [Riak KV Github](https://github.com/basho/riak_kv/issues).

README.org

Lines changed: 0 additions & 22 deletions
This file was deleted.

RELEASE-NOTES.md

Lines changed: 42 additions & 0 deletions
@@ -1,3 +1,45 @@
+# Riak KV 3.0.10 Release Notes
+
+This release is focused on improving memory management, especially with the leveled backend, and improving the efficiency and ease of configuration of tictacaae full-sync.
+
+- Improved [memory management of leveled](https://github.com/martinsumner/leveled/pull/371) SST files that contain [rarely accessed data](https://github.com/martinsumner/leveled/pull/371)
+
+- Fix a bug whereby leveled_sst files could spend an [extended time in the delete_pending state](https://github.com/martinsumner/leveled/pull/377), causing significant short-term increases in memory usage when there are work backlogs in the penciller.
+
+- Change the queue for reapers and erasers so that [they overflow to disk](https://github.com/basho/riak_kv/issues/1807), rather than simply consuming more and more memory.
+
+- Change the replrtq (nextgenrepl) queue to use the same [overflow queue mechanism](https://github.com/basho/riak_kv/issues/1817) as used by the reaper and erasers.
+
+- Change the default full-sync mechanism for tictacaae (nextgenrepl) full-sync to `auto_check`, which attempts to [automatically learn and use information about modified date-ranges](https://github.com/basho/riak_kv/issues/1815) in full-sync checks. The related changes also make full-sync by default bi-directional, reducing the amount of wasted effort in full-sync queries.
+
+- Add [a peer discovery feature](https://github.com/basho/riak_kv/issues/1804) for replrtq (nextgenrepl) so that new nodes added to the cluster can be automatically recognised without configuration changes. By default this is disabled, and should only be enabled once both clusters have been upgraded to at least 3.0.10.
+
+- Allow for underlying BEAM memory management and scheduler configuration to be exposed via riak.conf to allow for further performance tests on these settings. Note that initial tests indicate the potential for [significant improvements when using the leveled backend](https://github.com/basho/riak_kv/issues/1826).
+
+- Fix a potential issue whereby corrupted objects would prevent AAE (either legacy or nextgenrepl) [tree rebuilds](https://github.com/basho/riak_kv/issues/1824) from completing.
+
+- Improved [handling of key amnesia](https://github.com/basho/riak_kv/issues/1813), to prevent rebounding of objects, and also introduce a reader process (like reaper and eraser) to which read repairs can be queued with overflow to disk.
+
+Some caveats for this release exist:
+
+- The release does not support OTP 20; only OTP 22 is supported. Updating some long out-of-date components has led to a requirement for the OTP version to be lifted.
+
+- Volume and performance testing with the leveled backend now uses the following non-default settings:
+
+```
+erlang.schedulers_busywait = none
+erlang.schedulers_busywait_dirtycpu = none
+erlang.schedulers_busywait_dirtyio = none
+erlang.async_threads = 4
+erlang.schedulers.force_wakeup_interval = 0
+erlang.schedulers.compaction_of_load = true
+leveled_reload_recalc = enabled
+```
+
+- To maintain backwards compatibility with older Linux versions, the [latest version of basho's leveldb](https://github.com/basho/leveldb/releases/tag/2.0.37) is not yet supported. This is likely to change in the next release, where support for older Linux versions will be dropped.
+
+- The release process has [exposed an issue](https://github.com/basho/riak_kv/issues/1831) via a recently extended test. This issue is pre-existing, and not specific to this release.
+
 # Riak KV 3.0.9 Release Notes
 
 This release contains stability, monitoring and performance improvements.
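
The non-default settings listed in the 3.0.10 caveats reach the VM as emulator flags, through the `vm_args.+...` targets declared in `priv/riak.schema`. The following is a minimal sketch of that correspondence, not the way cuttlefish actually generates `vm.args`. The module name `vm_args_sketch` and its functions are hypothetical. `leveled_reload_recalc` is left out because its mapping is not part of this diff.

```erlang
-module(vm_args_sketch).
-export([flags/0, lines/0]).

%% Each entry: riak.conf key, emulator flag taken from the schema's
%% "vm_args.+..." target, and the value quoted in the release notes.
flags() ->
    [{"erlang.schedulers_busywait", "+sbwt", "none"},
     {"erlang.schedulers_busywait_dirtycpu", "+sbwtdcpu", "none"},
     {"erlang.schedulers_busywait_dirtyio", "+sbwtdio", "none"},
     {"erlang.async_threads", "+A", "4"},
     {"erlang.schedulers.force_wakeup_interval", "+sfwi", "0"},
     {"erlang.schedulers.compaction_of_load", "+scl", "true"}].

%% Render one "Flag Value" string per setting (illustrative only).
lines() ->
    [Flag ++ " " ++ Value || {_Key, Flag, Value} <- flags()].
```

Calling `vm_args_sketch:lines()` returns one string per setting, which makes it easy to compare against the flags a node was actually started with.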

priv/riak.schema

Lines changed: 203 additions & 2 deletions
@@ -143,25 +143,226 @@
   merge
 ]}.
 
-%% VM scheduler collapse, part 1 of 2
+%% @doc Riak changes the VM default wakeup interval in order to reduce the
+%% risk of scheduler collapse, prior to the availability of Dirty NIFs in
+%% later OTP versions. When using the leveled backend exclusively (either for
+%% AAE or object storage) this change is likely unnecessary, and the VM default
+%% of 0 can be used.
 {mapping, "erlang.schedulers.force_wakeup_interval", "vm_args.+sfwi", [
   {default, 500},
   {datatype, integer},
   merge
 ]}.
 
-%% VM scheduler collapse, part 2 of 2
+%% @doc Riak changes the compaction_of_load default from true to false. This
+%% is part of the strategy for preventing scheduler collapse in older VMs.
+%% When using the leveled backend exclusively (either for AAE or object
+%% storage), this change from the standard BEAM defaults is likely unnecessary
+%% - and compaction_of_load can be re-enabled.
 {mapping, "erlang.schedulers.compaction_of_load", "vm_args.+scl", [
   {default, "false"},
   merge
 ]}.
 
+%% @doc Sets the number of threads in async thread pool, valid range
+%% is 0-1024. If thread support is available, the default is 64.
+%%
+%% More information at: http://erlang.org/doc/man/erl.html
+%%
+%% Large async_thread pools are likely now unnecessary if exclusively using
+%% the leveled backend due to dirty NIFs, and so can be set to a much smaller
+%% value (potentially 1).
+{mapping, "erlang.async_threads", "vm_args.+A", [
+  {default, 64},
+  {datatype, integer},
+  {validators, ["range:0-1024"]},
+  merge
+]}.
+
 %% VM emulator ignore break signal (prevent ^C / ^Gq)
 {mapping, "erlang.vm.ignore_break_signal", "vm_args.+Bi", [
   {default, "true"},
   merge
 ]}.
 
+%% @doc The VM single block carrier threshold (KB) for process heap
+{mapping, "erlang.eheap_memory.sbct", "vm_args.+MHsbct", [
+  {commented, 512},
+  {datatype, integer},
+  merge
+]}.
+
+%% @doc The VM single block carrier threshold (KB) for binary heap
+{mapping, "erlang.binary_memory.sbct", "vm_args.+MBsbct", [
+  {commented, 512},
+  {datatype, integer},
+  merge
+]}.
+
+%% @doc The VM multi block carrier large size for process heap
+{mapping, "erlang.eheap_memory.lmbcs", "vm_args.+MHlmbcs", [
+  {commented, 5120},
+  {datatype, integer},
+  merge
+]}.
+
+%% @doc The VM multi block carrier large size for binary heap
+{mapping, "erlang.binary_memory.lmbcs", "vm_args.+MBlmbcs", [
+  {commented, 5120},
+  {datatype, integer},
+  merge
+]}.
+
+%% @doc The VM multi block carrier small size for process heap
+{mapping, "erlang.eheap_memory.smbcs", "vm_args.+MHsmbcs", [
+  {commented, 256},
+  {datatype, integer},
+  merge
+]}.
+
+%% @doc The VM multi block carrier small size for binary heap
+{mapping, "erlang.binary_memory.smbcs", "vm_args.+MBsmbcs", [
+  {commented, 256},
+  {datatype, integer},
+  merge
+]}.
+
+%% @doc Set allocation strategy for binary multiblock carriers. Default is
+%% not predictable - do not rely on aoffcbf being the default. For more info
+%% see:
+%% https://github.com/erlang/otp/blob/master/erts/emulator/internal_doc/CarrierMigration.md
+{mapping, "erlang.binary_memory.as", "vm_args.+MBas", [
+  {commented, "aoffcbf"},
+  {datatype, {enum, [bf, aobf, aoff, aoffcbf, aoffcaobf, ageffcaoff, ageffcbf, ageffcaobf, gf]}},
+  merge
+]}.
+
+%% @doc Set allocation strategy for process multiblock carriers. Default is
+%% not predictable - do not rely on aoffcbf being the default. For more info
+%% see:
+%% https://github.com/erlang/otp/blob/master/erts/emulator/internal_doc/CarrierMigration.md
+{mapping, "erlang.eheap_memory.as", "vm_args.+MHas", [
+  {commented, "aoffcbf"},
+  {datatype, {enum, [bf, aobf, aoff, aoffcbf, aoffcaobf, ageffcaoff, ageffcbf, ageffcaobf, gf]}},
+  merge
+]}.
+
+%% @doc Set scheduler binding. This is either unbound (default - u) or can be
+%% set to whatever the default binding condition is, in the deployed release of
+%% OTP (db).
+%% For more info see: https://www.erlang.org/doc/man/erl.html#+sbt
+%% Note that if non-Riak work is activated on the same node - e.g. as part
+%% of batch operational jobs, or monitoring - allowing schedulers to be bound
+%% can result in significant and unpredictable negative outcomes. There may be
+%% other ways of achieving similar performance improvements - e.g. by
+%% right-sizing scheduler counts - that are lower risk than scheduler binding.
+%% If a CPU topology cannot be determined, the binding will default to unbound
+%% even when a binding is configured. To confirm binding, use `remote_console`
+%% and view:
+%% `erlang:system_info(scheduler_bindings).`
+{mapping, "erlang.schedulers_binding", "vm_args.+stbt", [
+  {commented, "u"},
+  {datatype, {enum, [u, db]}},
+  merge
+]}.
+
+%% @doc Busy wait of schedulers
+%% Sets scheduler busy wait threshold. Defaults to medium. The threshold
+%% determines how long schedulers are to busy wait when running out of work
+%% before going to sleep.
+%% Significant improvements in efficiency may be gained by disabling busy
+%% waiting
+{mapping, "erlang.schedulers_busywait", "vm_args.+sbwt", [
+  {commented, "none"},
+  {datatype, {enum, [none, very_short, short, medium, long, very_long]}},
+  merge
+]}.
+
+%% @doc Busy wait of dirty cpu schedulers
+%% Sets scheduler busy wait threshold. Defaults to short. The threshold
+%% determines how long schedulers are to busy wait when running out of work
+%% before going to sleep.
+%% Significant improvements in efficiency may be gained by disabling busy
+%% waiting
+{mapping, "erlang.schedulers_busywait_dirtycpu", "vm_args.+sbwtdcpu", [
+  {commented, "none"},
+  {datatype, {enum, [none, very_short, short, medium, long, very_long]}},
+  merge
+]}.
+
+%% @doc Busy wait of dirty io schedulers
+%% Sets scheduler busy wait threshold. Defaults to short. The threshold
+%% determines how long schedulers are to busy wait when running out of work
+%% before going to sleep.
+%% Significant improvements in efficiency may be gained by disabling busy
+%% waiting
+{mapping, "erlang.schedulers_busywait_dirtyio", "vm_args.+sbwtdio", [
+  {commented, "none"},
+  {datatype, {enum, [none, very_short, short, medium, long, very_long]}},
+  merge
+]}.
+
+%% @doc Set the Percentage of Schedulers to be online
+%% For every vCPU in the system, what percentage should have a scheduler, and
+%% what percentage of those schedulers should be online by default.
+%% Do not set unless guided by performance tests for the specific setup and
+%% workload.
+{mapping, "erlang.schedulers_online_percentage", "vm_args.+SP",[
+  {commented, "100:75"},
+  {validators, ["scheduler_percentage"]},
+  merge
+]}.
+
+%% @doc Set the Percentage of Dirty CPU Schedulers to be online
+%% When using the leveled backend a relatively low number of dirty schedulers
+%% (e.g. 25%) are likely to be required due to the low proportion of NIFs in
+%% use.
+%% The percentages cannot exceed those of the schedulers_online_percentage
+%% which will default to 100% of CPU.
+%% Do not set unless guided by performance tests for the specific setup and
+%% workload.
+{mapping, "erlang.schedulers_dirtycpu_online_percentage", "vm_args.+SDPcpu",[
+  {commented, "50:25"},
+  {validators, ["scheduler_percentage"]},
+  merge
+]}.
+
+%% @doc Set the absolute limit of Dirty IO Schedulers to be online
+%% When using the leveled backend a relatively high number of dirty schedulers
+%% may be required relative to the CPU count, depending on the concurrent disk
+%% throughput possible.
+%% Unlike the scheduler percentages, this is set as an absolute number between
+%% 1 and 1024 (default is 10).
+%% Do not set unless guided by performance tests for the specific setup and
+%% workload.
+{mapping, "erlang.schedulers_dirtyio_online", "vm_args.+SDio",[
+  {commented, 10},
+  {datatype, integer},
+  {validators, ["scheduler_absolute"]},
+  merge
+]}.
+
+{validator,
+ "scheduler_percentage",
+ "must be A:B when B =< A and both A and B 1 < x =< 100",
+ fun(PercPerc) ->
+   case string:tokens(PercPerc, ":") of
+     [A, B] ->
+       AV = list_to_integer(A),
+       BV = list_to_integer(B),
+       AV =< 100 andalso AV > 0 andalso BV =< 100 andalso BV > 0;
+     _ ->
+       false
+   end
+ end}.
+
+{validator,
+ "scheduler_absolute",
+ "must be 1 to 1024",
+ fun(Value) ->
+   is_integer(Value) andalso Value =< 1024 andalso Value >= 1
+ end}.
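
The message on the `scheduler_percentage` validator says `B =< A`, but the fun as committed only range-checks each value on its own, and `list_to_integer/1` raises `badarg` on non-numeric input. The following is a minimal sketch of a stricter check, assuming the intended rule is `1 =< B =< A =< 100`. It is not the committed code, and the module name `scheduler_percentage_sketch` is hypothetical.

```erlang
-module(scheduler_percentage_sketch).
-export([valid/1]).

%% Returns true only for strings of the form "A:B" where A and B are
%% integers with 1 =< B =< A =< 100. Any other input returns false
%% instead of raising an exception.
valid(PercPerc) when is_list(PercPerc) ->
    case string:tokens(PercPerc, ":") of
        [A, B] ->
            %% string:to_integer/1 returns {Int, Rest}; requiring Rest to
            %% be [] rejects trailing junk such as "75x".
            case {string:to_integer(A), string:to_integer(B)} of
                {{AV, []}, {BV, []}} when is_integer(AV), is_integer(BV) ->
                    BV >= 1 andalso BV =< AV andalso AV =< 100;
                _ ->
                    false
            end;
        _ ->
            false
    end;
valid(_) ->
    false.
```

Under this rule `"100:75"` and `"50:25"` (the commented defaults in the schema) are accepted, while `"25:50"` is rejected because the online percentage exceeds the configured percentage.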
+
 {{#devrel}}
 %% Because of the 'merge' keyword in the proplist below, the docs and datatype
 %% are pulled from the leveldb schema.
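
The scheduler-binding comment in the schema suggests confirming the binding from `remote_console` with `erlang:system_info(scheduler_bindings)`. The same approach can be extended to a few of the other settings introduced here. The following is a minimal sketch using only standard `erlang:system_info/1` items, in a hypothetical helper module named `vm_settings_check`. It does not attempt to read back the busy-wait thresholds or allocator settings.

```erlang
-module(vm_settings_check).
-export([report/0]).

%% Collect the running VM's view of settings that the new riak.conf
%% options influence. Intended to be called from remote_console.
report() ->
    [%% +A (erlang.async_threads)
     {async_threads, erlang:system_info(thread_pool_size)},
     %% +SP (erlang.schedulers_online_percentage)
     {schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     %% +SDPcpu (erlang.schedulers_dirtycpu_online_percentage)
     {dirty_cpu_schedulers_online,
      erlang:system_info(dirty_cpu_schedulers_online)},
     %% +SDio (erlang.schedulers_dirtyio_online)
     {dirty_io_schedulers, erlang:system_info(dirty_io_schedulers)},
     %% +stbt (erlang.schedulers_binding); a tuple of 'unbound' atoms
     %% means no binding took effect.
     {scheduler_bindings, erlang:system_info(scheduler_bindings)}].
```

Comparing `vm_settings_check:report()` before and after a configuration change gives a quick indication of whether the new values were picked up on restart.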
