Installation

The package is registered in the General registry, so it can be installed from the Julia REPL with:

] add GroupedArrays

Motivation

GroupedArray returns an AbstractArray of integers that identify the group of each element (or missing for elements whose value is missing).

using GroupedArrays
p = repeat(["a", "b", missing], outer = 2)
g = GroupedArray(p)
# 6-element GroupedArray{Int64, 1}:
#  1
#  2
#   missing
#  1
#  2
#   missing

Use the keyword argument coalesce = true to treat missing values as a distinct group:

using GroupedArrays
p = repeat(["a", "b", missing], outer = 2)
g = GroupedArray(p; coalesce = true)
# 6-element GroupedArray{Int64, 1}:
#  1
#  2
#  3
#  1
#  2
#  3

GroupedArray can be used to compute groups across multiple vectors:

p1 = repeat(["a", "b"], outer = 3)
p2 = repeat(["d", "e"], inner = 3)
g = GroupedArray(p1, p2)
# 6-element GroupedArray{Int64, 1}:
#  1
#  2
#  1
#  3
#  4
#  3

See also

Internally, a GroupedArray is stored as a vector of Integers, where 0 corresponds to missing.

The algorithm used to construct a GroupedArray is taken from DataFrames.jl.
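
To make the internal representation described above more concrete, here is a minimal sketch of the "integer codes with 0 for missing" idea. It does not use GroupedArrays.jl and it is not the DataFrames.jl algorithm: it is a plain dictionary-based encoding, and the names encode_groups and decode are hypothetical helpers introduced only for this illustration.

```julia
# Hypothetical helper (not part of GroupedArrays.jl): encode a vector as
# integer group codes, reserving 0 for missing values.
function encode_groups(xs::Vector)
    codes = Vector{Int}(undef, length(xs))
    seen = Dict{Any, Int}()   # value => group code, in order of first appearance
    for (i, x) in enumerate(xs)
        if ismissing(x)
            codes[i] = 0      # 0 is the sentinel for missing
        else
            # Reuse the existing code, or assign the next unused one.
            codes[i] = get!(seen, x, length(seen) + 1)
        end
    end
    return codes, length(seen)
end

# Hypothetical helper: turn stored codes back into user-facing values,
# where the sentinel 0 is shown as missing.
decode(codes) = [c == 0 ? missing : c for c in codes]

codes, ngroups = encode_groups(repeat(["a", "b", missing], outer = 2))
visible = decode(codes)
```

Under these assumptions, codes holds plain integers (so it can be stored compactly and used directly as an index), while decode recovers the view with missing that the first example in the Motivation section displays.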