diff --git a/eeps/eep-0079.md b/eeps/eep-0079.md
new file mode 100644
index 0000000..421d814
--- /dev/null
+++ b/eeps/eep-0079.md
@@ -0,0 +1,798 @@
+    Author: Björn Gustavsson, Lukas Backström
+    Status: Draft
+    Type: Standards Track
+    Created: 26-Nov-2024
+    Erlang-Version: OTP-29.0
+    Post-History: 27-Oct-2024, 6-Nov-2025
+****
+EEP 79: Native records
+---
+
+Abstract
+========
+
+This EEP proposes a new native datatype similar to records.
+
+Native records are tied to a specific module and use the same syntax
+as the current (tuple-based) records except for declaration.
+
+The main advantage of the native records is that more fields can
+be added to the definition without having to recompile all code
+that uses the record.
+
+Native records can never be fully compatible with all uses of the old
+records. Therefore, to make native records compelling, they should be
+implemented as efficiently as possible. We believe that a good
+implementation would make them faster than maps.
+
+The old records (based on tuples) must always remain for backwards
+compatibility, and for some usages they could still be the best choice.
+
+There is a new reflection API so that the shell and tools can look into
+any native record.
+
+Example:
+
+```erlang
+-module(average).
+-export([start/0]).
+
+%% New record declaration syntax.
+-record #state{ values = [] :: list(number()), avg = 0.0 :: float() }.
+
+start() ->
+    spawn(fun() -> loop(#state{}) end).
+
+loop(State) ->
+    receive
+        {get_avg, From} ->
+            From ! State#state.avg, loop(State);
+        {get_values, From} ->
+            From ! State#state.values, loop(State#state{values = [], avg = 0.0});
+        {put_value, Value} ->
+            Values = [Value | State#state.values],
+            loop(State#state{values = Values, avg = lists:sum(Values) / length(Values)})
+    end.
+```
+
+Goals
+=====
+
+1. Replace most tuple-record usages without having to update anything
+but the declaration.
+This includes the major parts of the classic
+record syntax (creating, reading, updating, and matching).
+
+2. Create something that is useful both for building APIs and for keeping
+internal state efficiently.
+
+3. Allow IDEs and static analysis tools to easily infer information about the keys.
+
+Hard Non-Goals
+--------------
+
+The following items will never be considered as goals.
+
+1. Replacing **all** tuple record usage scenarios.
+2. Allowing variable key lookup.
+3. Allowing variables as a record name.
+4. Supporting creation of records in guards.
+5. Supporting `element/2` for native records.
+6. Supporting opaque records.
+7. Having `undefined` as default when neither the record definition nor
+   the record creation provides a value for the field.
+8. `_` wildcard default for assigning a value to any field not explicitly
+   initialized.
+
+Soft Non-Goals
+--------------
+
+Here are some features we might consider implementing in a future release.
+
+1. Supporting usage in `lists:key*`.
+2. Supporting usage in `ets` functions.
+3. Supporting record BIFs (`record_info`).
+4. The `#Name.Field` syntax.
+
+Description
+===========
+
+Native-record definitions
+-------------------------
+
+A native record is a data structure for holding a fixed number of
+elements in named fields. Similar to functions, native records are
+defined in a module and can be exported or kept private. A native
+record definition consists of the name of the native record, a list of
+type parameters, and the field names.
+
+Formally:
+
+```erlang
+-record #Name(TVar1, ..., TVarN) {
+    Field1 [= Value1] [:: Type1],
+    ...
+    FieldN [= ValueN] [:: TypeN]
+}.
+```
+
+`Name` and `FieldN` need to be atoms. `Name` is allowed to be used without quotes
+when it is a keyword, a variable, or a non-ASCII string. That is, `div` and `Tillstånd`
+are allowed as `Name` without quoting them.
+
+By default a native-record definition is visible to the defining module only.
+It is visible to other modules if exported via the `-export_record()` directive. The
+`-export_record()` directive is similar to the `-export()` and `-export_type()` directives.
+
+Examples:
+
+```erlang
+-module(example).
+-export_record([user, pair]).
+-record #user{
+    id = -1 :: integer(),
+    name :: binary(),
+    city :: binary()
+}.
+-record #pair(A, B) {
+    first :: A,
+    second :: B
+}.
+-record #state{
+    count
+}.
+```
+
+The order of the keys as declared is preserved and part of the native record
+definition. When a record is printed, the fields appear in the order defined.
+
+There is no defined maximum number of fields in a native record. Because of
+the need to define all fields in a record, it is very hard to accidentally
+create a record with one million elements.
+
+### `-import_record()`
+
+As the following sections show, working with native records outside
+the defining module requires fully qualified names of the native records:
+`#misc:user{}`, ... This can quickly become cumbersome and verbose. The
+`-import_record` directive works similarly to the `-import` and `-import_type` directives.
+Imported native records can be used by their short names.
+
+```erlang
+-module(example2).
+-import_record(example1, [user, pair]).
+```
+
+### Creating native records
+
+The following expression creates a new user native record (in the same
+module `misc` where the native record is defined):
+
+```erlang
+#user{name = ~"John", city = ~"Stockholm"}
+```
+
+The next expression is used to create a new user native record outside
+of the defining module:
+
+```erlang
+#misc:user{name = ~"John", city = ~"Stockholm"}
+```
+
+Similar to function calls, in this case the fully qualified name must
+be used (module name + native-record name). The same syntax can also
+be used within the defining module to create a record of the latest
+version of the module. That is, if the current module has been upgraded
+it will create a native record with the new definition.
+There is no way to externally refer to a native record of an old code generation.
+
+A native record can be imported via the `-import_record` directive and
+then used by its short name.
+
+```erlang
+-module(example).
+-import_record(misc, [user]).
+make_user(Name, City) ->
+    #user{name = Name, city = City}.
+```
+
+A general syntax for native record creation:
+
+```erlang
+#Name{Field1 = Expr1, ..., FieldN = ExprN}
+#Module:Name{Field1 = Expr1, ..., FieldN = ExprN}
+```
+
+`Module`, `Field1`, ..., `FieldN` must be atoms. `Name` must be an atom,
+but does not need quotation for keywords, variables, and non-ASCII
+characters. Fields can be in any order. The compiler validates that
+all `Field1`, ..., `FieldN` are unique, and if the record is defined in
+the same module it will issue a warning if keys are not present.
+
+#### Default values
+
+If no value is provided for a field, and there is a default field
+value in the native record definition, the default value is used. If
+no value is provided for a field and there is no default field value,
+a native record creation fails with a `{novalue,FieldName}` error.
+
+In OTP 29, the default value can only be an expression that can be
+evaluated to a constant at compile-time. Essentially, the allowed
+expressions are guard expressions, with the following additional
+restrictions:
+
+* No variables.
+
+* No calls of any kind.
+
+* No creation of any kind of records.
+
+Example:
+
+```erlang
+-record #default{
+    one = 1,
+    two = 2*20+1
+}.
+```
+
+Reasons for not allowing arbitrary expressions:
+
+* Default values require a lookup into the defining module, and the
+  default values can in turn be other native records or even function
+  calls. If we put this together with code upgrade you can get some
+  very strange behaviours where a native record with three fields
+  using the same native record could end up with three different
+  versions of that native record.
+
+* Making an efficient implementation allowing arbitrary expressions
+  is probably not possible to achieve for OTP 29.
+
+Tuple-based records initialize a field to `undefined` if neither
+the record definition nor the record creation provides a value.
+We consider that to be a mis-feature that will delay the detection of
+bugs to either runtime or when Dialyzer is run.
+
+#### Validation
+
+A native record creation is validated at runtime against the
+native record definition. It fails with a `badrecord` error in the
+following cases:
+
+* There is no corresponding native-record definition.
+
+* The native-record definition is not visible at the call site (it is
+  not exported).
+
+* The native record create expression references the field FN which is
+  not defined (in the structure definition).
+
+* No value is provided for a field FN and the native-record definition
+  has no default value for FN.
+
+#### Native-record values and native-record definitions at runtime
+
+Now that we have seen how to define and create native records, it
+makes sense to specify how they behave at runtime, particularly in the
+context of Erlang's dynamic nature. Native records in this proposal
+are designed to support Erlang dynamism in a flexible way. The main
+scenarios:
+
+* Code upgrade:
+  * Native-record definitions can be upgraded (or even removed).
+
+* Distributed Erlang:
+  * Native records can travel between the Erlang nodes having different
+    versions of code.
+
+From now on let’s employ more precise terminology (when needed):
+*native-record definitions* and *native-record values*.
+
+When a native-record value is created, it “captures” key information
+from the current native-record definition, namely:
+
+* The fully qualified name of the native record (module name and
+  native-record name)
+
+* Field names
+
+* Whether it is exported (through `-export_record`)
+
+The runtime needs the current native-record definition to create a
+record.
+It does not use the native-record definition when updating or
+reading values from a record.
+
+To minimize ambiguity, the following sections use more verbose wording:
+
+* The fields of a native-record value
+* The fields of a native-record definition
+* The native-record value is exported
+* The native-record definition is exported
+
+In the simplest case (a single Erlang node without code reloading),
+native-record values and native-record definitions would always be in
+sync.
+
+### Accessing native-record fields
+
+The syntax for accessing native-record fields is as follows:
+
+```erlang
+Expr#Name.Field
+Expr#Module:Name.Field
+```
+
+These expressions return the value of the specified field of the
+native-record value.
+
+When a field from a native-record value is accessed, the current
+native-record definition is **not** consulted.
+
+An access operation fails with a `{badrecord,Expr}` error if:
+
+* The definition of the record was not exported when the native-record
+  value was created, and it is now used outside its defining
+  module.
+
+* `Expr` does not evaluate to a native-record value of the expected
+  type `#Name` or `#Module:Name` (that is, it is either not a native
+  record at all or it is another native record).
+
+An access operation fails with a `{badfield,Field}` error if:
+
+* The field `Field` is not defined in the native-record value.
+
+The expression can be used in guards; a guard fails if the
+corresponding expression raises.
+
+### Anonymous access of native records
+
+The following syntax allows accessing field `Field` in any record:
+
+```erlang
+Expr#_.Field
+```
+
+This access operation fails with a `{badfield,Field}` error if:
+
+* The field `Field` is not defined in the native-record value.
+
+* The definition of the record was not exported when the native-record
+  value was created, and it is now used outside its defining module.
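+
+The named and anonymous access forms can be sketched as follows. This is
+only an illustration under the syntax proposed in this EEP, so it is not
+expected to compile on OTP 28 or earlier. The module name `access_sketch`
+and its function names are hypothetical, and `user` is assumed to be a
+record exported from `misc` as in the earlier examples.
+
+```erlang
+-module(access_sketch).
+-export([city/1, lives_in_stockholm/1, any_name/1]).
+-import_record(misc, [user]).
+
+%% Named access: per the rules above, this raises {badrecord,U} if U
+%% is not a #misc:user{} value, and {badfield,city} if the value has
+%% no such field.
+city(U) ->
+    U#user.city.
+
+%% Named access in a guard: if the access would raise, the guard
+%% fails and the next clause is tried.
+lives_in_stockholm(U) when U#user.city =:= ~"Stockholm" ->
+    true;
+lives_in_stockholm(_) ->
+    false.
+
+%% Anonymous access: accepts any native-record value that has a
+%% `name` field, whatever the record itself is called.
+any_name(R) ->
+    R#_.name.
+```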
+
+### Updating native records
+
+The syntax for updating native-record values:
+
+```erlang
+Expr#Name{Field1=Expr1, ..., FieldN=ExprN}
+Expr#Module:Name{Field1=Expr1, ..., FieldN=ExprN}
+```
+
+Field names must be atoms.
+
+When a native-record value is updated, its native-record
+definition is **not** consulted.
+
+An update operation fails with a `{badrecord,Expr}` error if:
+
+* The definition of the record was not exported when the native-record
+  value was created, and it is now used outside its defining module.
+
+An update operation fails with a `{badfield,FN}` error if:
+
+* The native-record update expression references the field FN which is
+  not defined in the native-record value.
+
+Native-record update expressions are not allowed in guards.
+
+### Anonymous update of native records
+
+The following syntax allows updating any record that has the given
+fields:
+
+```erlang
+Expr#_{Field1=Expr1, ..., FieldN=ExprN}
+```
+
+An update operation fails with a `{badrecord,Expr}` error if:
+
+* The definition of the record was not exported when the native-record
+  value was created, and it is now used outside its defining module.
+
+* `Expr` does not evaluate to a native-record value of the expected
+  type `#Name` or `#Module:Name` (that is, it is either not a native
+  record at all or it is another native record).
+
+An anonymous update operation fails with a `{badfield,FN}` error if:
+
+* The native-record update expression references the field FN which is
+  not defined (in the structure definition).
+
+### Pattern matching over native records
+
+A pattern that matches a certain native-record value is created in the
+same way as a native-record is created.
+
+The syntax:
+
+```erlang
+#Name{Field1 = Expr1, ..., FieldN = ExprN}
+#Module:Name{Field1 = Expr1, ..., FieldN = ExprN}
+```
+
+Here, `Expr1`, ..., `ExprN` are patterns, and field names must be atoms.
+
+When a native-record value is matched, its native-record definition is
+**not** consulted.
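+
+As an illustration, a function head can match on selected fields only.
+The sketch below assumes the syntax proposed in this EEP and a `user`
+record exported from `misc` as in the earlier examples; the module and
+function names are hypothetical. It also assumes that fields not named
+in the pattern play no part in the match, as with tuple records.
+
+```erlang
+-module(match_sketch).
+-export([describe/1]).
+-import_record(misc, [user]).
+
+%% Match a user whose city is Stockholm and bind the name.
+describe(#user{name = Name, city = ~"Stockholm"}) ->
+    {local, Name};
+%% Match any other #misc:user{} value that has a `name` field.
+describe(#user{name = Name}) ->
+    {visitor, Name};
+%% Anything else, including other native records.
+describe(_) ->
+    unknown.
+```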
+
+Pattern matching fails if:
+
+* The definition of the record was not exported when the native-record
+  value was created, and it is now used outside its defining module.
+
+* The pattern references a FieldK and the native-record value does not
+  contain this field.
+
+Note, however, that it is possible to match on the name of a
+non-exported record. Thus, if the `match_name/1` function in the
+following example is called with an instance of record `r` defined in
+`some_module`, it will succeed even if the record is not exported:
+
+```erlang
+-module(example).
+-export([match_name/1]).
+
+match_name(#some_module:r{}) ->
+    ok.
+```
+
+```erlang
+-module(some_module).
+-export([get_r/1]).
+
+-record #r{a}.
+
+get_r(N) -> #r{a=N}.
+```
+
+The following syntax allows matching any record having the named fields:
+
+```erlang
+#_{Field1 = Expr1, ..., FieldN = ExprN}
+```
+
+### Checking whether a record is current
+
+Do we need a way to check that the native-record is the current version?
+
+Yes, we should add a BIF that essentially does the following:
+
+```erlang
+record:get_fields(Record) =:= record:get_fields(#Module:Record{})
+```
+
+but guaranteed to always work and be more efficient.
+
+TODO: What should the name of the BIF be?
+
+We should also have a BIF that checks whether an instance of a native
+record refers to the current definition of the native record.
+
+TODO: What should the name of that BIF be?
+
+### Fetching field index
+
+Fetching the record index using the `#name.field` syntax is not
+supported, because there is no way it can actually be used, since neither
+ETS nor `element/2` will work with native records.
+
+### Native-record guard BIFs
+
+#### `is_record/3`
+
+The existing `is_record/3` BIF is overloaded to also accept a native record:
+
+```erlang
+-spec is_record(Term :: dynamic(), Module :: module(), Name :: atom()) -> boolean();
+               (Term :: dynamic(), Name :: atom(), Arity :: non_neg_integer()) -> boolean().
+```
+
+If `Module` is a module name and `Name` is an atom, the predicate
+returns `true` if term `Term` is a native-record value with the
+corresponding native-record name.
+
+This function will only check that `Term` is a native record with
+`Name` created from the given module. It will not check whether
+the native record is still defined in the given module, nor whether
+it is exported.
+
+Example:
+
+```erlang
+-module(misc).
+is_user(U) -> is_record(U, some_module, user).
+```
+
+#### `is_record/2`
+
+The existing `is_record/2` function is extended to also work on native
+records:
+
+```erlang
+-spec is_record(Term :: dynamic(), Name :: atom()) -> boolean().
+```
+
+`Name` must be the name of one of the following:
+
+* a tuple record
+* a local native record
+* a native record imported using `-import_record()`
+
+When `is_record/2` is used in a guard, `Name` must be a literal atom;
+otherwise, there will be a compilation error. There will be a
+compilation error if `Name` is neither the name of a local record nor
+an imported native record.
+
+If `is_record/2` is used in a function body, `Name` is allowed to be a
+variable.
+
+If `Name` refers to an imported native record, see the description of
+`is_record/3` for more details.
+
+Examples:
+
+```erlang
+-module(misc).
+-record #user() {a,b,c}.
+
+is_user(U) when is_record(U, user) ->
+    true;
+is_user(_U) ->
+    false.
+```
+
+```erlang
+-module(example).
+-import_record(misc, [user/0]).
+is_user(U) -> is_record(U, user).
+```
+
+#### `is_record/1`
+
+```erlang
+-spec is_record(Term :: term()) -> boolean().
+```
+
+`is_record(Record)` returns `true` if `Record` is any native record value.
+
+### Native records in specs and in the language of types
+
+Native records can be used as types using the following syntax:
+
+```erlang
+%% local or imported native-record
+#RecordName(TField :: TType, ...
+)
+%% remote native-record
+#Module:RecordName(TVar1, ..., TVarN)
+```
+
+If you export a native record, its type will be available for other
+modules to use. Dialyzer will complain if you attempt to use an
+un-exported native record.
+
+Example:
+
+```erlang
+-module(misc).
+-export_record([user/0, pair/2]).
+-record #user() {
+    id = -1 :: integer(),
+    name :: binary(),
+    city :: binary()
+}.
+-record #pair(A, B) {
+    first :: A,
+    second :: B
+}.
+-type int_pair() :: #pair(integer(), integer()).
+-spec mk_user() -> #user().
+mk_user() ->
+    #user{id = 1, name = ~"Alice", city = ~"London"}.
+
+-spec mk_user_limited() -> #user().
+mk_user_limited() ->
+    #user{id = 1, name = ~"Alice", city = ~"London"}.
+
+-spec mk_pair(A, B) -> #pair(A, B).
+mk_pair(A, B) ->
+    #pair{first = A, second = B}.
+```
+
+A new builtin type `record()` is introduced. It denotes the set of all
+possible native record values at runtime.
+
+### Documentation
+
+Native records can be documented just as functions/types/callbacks can be documented.
+An exported record is visible in the documentation by default; add `-doc false.`
+to hide it.
+
+If a spec, type, callback, or native record refers to an undocumented
+local native record, the compiler will issue a warning.
+
+### Compatibility between OTP 28 and OTP 29
+
+When attempting to send a native record to an older node (OTP 28 or earlier),
+the sender should send a message to the logger process and close the connection.
+
+### Ordering and equality
+
+With the addition of native records the runtime values of different types
+are now ordered as follows:
+
+```erlang
+number()
+< atom()
+< reference()
+< fun()
+< port()
+< pid()
+< tuple()
+< record()
+< map()
+< []
+< [_|_]
+< bitstring()
+```
+
+Native-record values are ordered by their fully qualified name, then
+by their visibility, then by their keys, and finally by field values.
+Equality is defined through equality of all the properties: name,
+visibility, keys, and field values.
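+
+The relative order of the already existing types can be checked with the
+standard term comparison operators. The sketch below uses only values
+that exist before this proposal, so it does not depend on native records;
+the position of `record()` between tuples and maps is taken from the list
+above and appears only as a comment. The module name `order_sketch` is
+hypothetical.
+
+```erlang
+-module(order_sketch).
+-export([check/0]).
+
+%% Each match asserts one step of the documented term order and
+%% raises badmatch if the step does not hold.
+check() ->
+    true = 1 < a,            %% number() < atom()
+    true = a < make_ref(),   %% atom() < reference()
+    true = self() < {},      %% pid() < tuple()
+    %% Under this proposal, record() values would sort here:
+    %% after every tuple and before every map.
+    true = {x, y} < #{},     %% tuple() < map()
+    true = #{} < [],         %% map() < []
+    true = [] < [a],         %% [] < [_|_]
+    true = [a] < <<>>,       %% [_|_] < bitstring()
+    ok.
+```
+
+Calling `order_sketch:check()` returns `ok` when every comparison holds.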
+
+### Reflection
+
+A new module `records` provides functionality for basic runtime
+reflection. This is a very preliminary sketch of how it may look:
+
+```erlang
+-module(records).
+-spec get_module(record()) -> module().
+-spec get_name(record()) -> atom().
+-spec is_exported(record()) -> boolean().
+-spec get_field_names(record()) -> [atom()].
+-spec get(record(), atom()) -> dynamic().
+-spec create(Module :: module(), RecordName :: atom(), FieldsMap :: #{atom() => term()}) -> record().
+-spec create(Module :: module(), RecordName :: atom(), FieldsMap :: #{atom() => term()}, Options :: #{ exported => boolean() }) -> record().
+-spec update(Src::record(), Module :: module(), RecordName :: atom(), FieldsMap :: #{atom() => term()}) -> record().
+```
+
+This part needs more thought about enforcing and bypassing visibility
+(exported and private native-record definitions).
+
+* It should provide a means to work with any native-record through
+  reflection, but bypassing visibility and opacity restrictions should
+  be explicit.
+
+The functions in the `records` module are BIFs.
+
+### Printing
+
+Here is how printed native records will look:
+
+```erlang
+%% native-record
+#users:user{id = 1, name = "Alice", city = "London"}
+```
+
+Printing of fields follows the field order. Whether a native-record value is
+exported is not visible through printing.
+
+### External term format
+
+The external term format is extended to support serialization of native-record values.
+
+### Tooling
+
+While the compiler does not validate native-record operations at
+compile time, validation can be easily performed through simple
+linting that ensures:
+
+* The visibility of native-records is respected (a non-exported
+  native-record definition is not used outside of the defining module)
+
+* Fields are used correctly
+
+All of these checks are straightforward, and `xref` can be easily extended to handle them.
+
+### Performance characteristics
+
+While implementing native records is a large undertaking, from the
+runtime perspective they are much closer to maps than to records
+(tuples). So, their performance characteristics should align with maps
+(with insignificant overhead for runtime validation). Additionally,
+given that native-records are more specialized versions of maps (with
+all keys being atoms), there is potential for optimizations.
+
+### Differences compared to tuple records
+
+1. Creation of native records cannot be done in guards.
+
+2. `element/2` will not accept native records.
+
+Thoughts to consider
+====================
+
+Problems with native records
+----------------------------
+
+* Default values require a lookup into the defining module, and the default
+  values can in turn be other native-records or even function calls. If we
+  put this together with code upgrade you can get some very strange behaviours
+  where a native record with three fields using the same native-record could
+  end up with three different versions of that native-record.
+
+How much compile time checking should we do?
+--------------------------------------------
+
+Tuple-based records have checking of keys at compile time. The more
+dynamic we make native records, the less compile-time checking we can
+do. We don't want to end up with the same compile-time dependencies
+as behaviours/parse_transforms.
+
+The proposal in this EEP would allow for compile-time checks within the
+owning module, but not in other modules.
+
+Should we have some private fields?
+-----------------------------------
+
+No. The only mechanism to restrict visibility is to not export a record.
+
+Still, the BIFs in the reflection API will always be able to get the fields
+and their values from any native record.
+
+Should large records be trees?
+------------------------------
+
+When a map becomes large, it switches to a tree so that
+updating an element does not entail a full copy of all
+values.
+Do we want the same for native-records?
+
+No. Because all fields of native records are known, it is possible to
+make access more efficient than for maps.
+
+Supporting native records in ETS
+--------------------------------
+
+In a future release, we could add opt-in support for native records in
+ETS. I suggest that we also support maps in ETS at the same time,
+and that we also re-implement match specs.
+
+Reference Implementation
+------------------------
+
+This is a work-in-progress:
+
+[https://github.com/bjorng/otp/tree/bjorn/native-records/OTP-19785](https://github.com/bjorng/otp/tree/bjorn/native-records/OTP-19785)
+
+Backward Compatibility
+======================
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.
+
+[EmacsVar]: <> "Local Variables:"
+[EmacsVar]: <> "mode: indented-text"
+[EmacsVar]: <> "indent-tabs-mode: nil"
+[EmacsVar]: <> "sentence-end-double-space: t"
+[EmacsVar]: <> "fill-column: 70"
+[EmacsVar]: <> "coding: utf-8"
+[EmacsVar]: <> "End:"
+[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "