Provides ECMAScript regular expressions for Lua 5.1, 5.2, 5.3, 5.4 and LuaJIT. Uses libregexp from Fabrice Bellard's QuickJS.
To install jsregexp globally with luarocks, run
sudo luarocks install jsregexp
To install jsregexp for a different Lua version (in this case Lua 5.1 or LuaJIT), run
sudo luarocks --lua-version 5.1 install jsregexp
To install jsregexp locally for your user, run
luarocks --local --lua-version 5.1 install jsregexp
This will place the compiled module in $HOME/.luarocks/lib/lua/5.1, so $HOME/.luarocks/lib/lua/5.1/?.so needs to be added to package.cpath.
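One way to extend package.cpath for a local install is to do it at the top of your script, before the first require. The following is a minimal sketch, assuming the Lua 5.1 path mentioned above; the variable names are only illustrative.

```lua
-- Assumption: the module was installed with --local for Lua 5.1.
local home = os.getenv("HOME") or ""
local rocks_cpath = home .. "/.luarocks/lib/lua/5.1/?.so"

-- Prepend the path only once; the plain find avoids treating "?" as a pattern.
if not package.cpath:find(rocks_cpath, 1, true) then
    package.cpath = rocks_cpath .. ";" .. package.cpath
end
```

After this, require("jsregexp") should also search the user-local directory.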
Running make in this project's root compiles the module jsregexp.so (tested on Linux only).
This module provides two functions:
jsregexp.compile(regex, flags?)
jsregexp.compile_safe(regex, flags?)
Both take an ECMAScript regular expression as a string and an optional string of flags, most notably:
"d": provide tables with begin/end indices of match groups in match objects
"i": case-insensitive search
"g": match globally
"n": enables named groups (not present in JavaScript, needs to be enabled manually if needed)
"u": UTF-16 support if detected in the pattern string (implicitly set)
The complete list of flags can be found in the JavaScript reference.
On success, compile and compile_safe return a RegExp object. On failure, compile throws an error, while compile_safe returns nil and an error message.
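The two failure modes can be handled as in the following sketch. The pattern "(unclosed" is just an illustrative invalid regular expression (an unterminated group); the exact error text is not specified here.

```lua
local jsregexp = require("jsregexp")

-- compile_safe reports failure through its return values.
local re, err = jsregexp.compile_safe("(unclosed", "g")
if not re then
    print("compile_safe failed: " .. tostring(err))
end

-- compile throws instead, so wrap it in pcall if the pattern may be invalid.
local ok, result = pcall(jsregexp.compile, "(unclosed", "g")
if not ok then
    print("compile failed: " .. tostring(result))
end
```

compile_safe is usually the more convenient choice for patterns that come from user input, while compile suits patterns that are fixed in the source code.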
Each RegExp object re has the following fields
re.last_index -- the position at which the next match will be searched in re:exec or re:test (see notes below)
re.source -- the pattern string
re.flags -- a string representing the active flags
re.dot_all -- is the dot_all flag set?
re.global -- is the global flag set?
re.has_indices -- is the indices flag set?
re.ignore_case -- is the ignore_case flag set?
re.multiline -- is the multiline flag set?
re.sticky -- is the sticky flag set?
re.unicode -- is the unicode flag set?
Calling tostring on a RegExp object returns a representation in the form "/<source>/<flags>".
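As a small illustration of these fields, the sketch below compiles a pattern with two flags and prints some of its properties. The pattern and flags are arbitrary examples, and no particular ordering of the characters in re.flags is assumed.

```lua
local jsregexp = require("jsregexp")

local re = jsregexp.compile("\\d+", "gi")

print(re.source)                   -- the pattern string as passed to compile
print(re.flags)                    -- string of active flags
print(re.global, re.ignore_case)   -- both flags were requested above
print(re.multiline)                -- not requested
print(re.last_index)               -- position where the next exec/test starts
print(tostring(re))                -- "/<source>/<flags>" representation
```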
The RegExp object re has the following methods corresponding to JavaScript regular expressions:
re:exec(str) -- returns the next match of re in str (see notes below)
re:test(str) -- returns true if the regex matches str (see notes below)
re:match(str) -- for a global regexp, returns a list of all match strings, or nil if there is no match; otherwise calls re:exec(str)
re:match_all(str) -- returns a closure that repeatedly calls re:exec on a global regexp, to be used in for-loops
re:match_all_list(str) -- returns a list of all matches
re:search(str) -- returns the 1-based index of the first match of re in str, or -1 if no match
re:split(str, limit?) -- splits str at re, at most limit times
re:replace(str, replacement) -- replace the first match of re in str by replacement (all, if global)
re:replace_all(str, replacement) -- replace each match of re in str by replacement
For the documentation of the behaviour of each of these functions, see the JavaScript reference.
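The following sketch exercises a few of these methods on a whitespace pattern. It assumes that re:split returns a Lua list (table) of the pieces, by analogy with the JavaScript array; the input string is only an example.

```lua
local jsregexp = require("jsregexp")

local str = "one two  three"
local first = jsregexp.compile("\\s+")       -- non-global
local all = jsregexp.compile("\\s+", "g")    -- global

print(first:test(str))      -- does the regexp match anywhere in str?
print(first:search(str))    -- 1-based index of the first match, or -1

-- Assumption: split returns a list of the pieces between matches.
local parts = first:split(str)
for i, part in ipairs(parts) do
    print(i, part)
end

local replaced_first = first:replace(str, "_")    -- only the first match
local replaced_all = all:replace_all(str, "_")    -- every match
print(replaced_first)
print(replaced_all)
```

A global regexp is used for replace_all here to stay close to the JavaScript behaviour, where replaceAll expects the global flag.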
Note: Each regexp object has a field last_index, which denotes the position at which the next call to exec or test starts searching for a match.
After each call, last_index is updated accordingly. If you use these methods directly, reset last_index to 1 whenever you want the search to start at the beginning of the string.
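A manual exec loop with an explicit reset might look like the sketch below. It assumes that re:exec returns nil once there are no further matches; the helper name collect_words is hypothetical.

```lua
local jsregexp = require("jsregexp")

local re = jsregexp.compile("\\w+", "g")

-- Hypothetical helper: gather all matches of re in str using exec.
local function collect_words(str)
    re.last_index = 1              -- start from the beginning of the string
    local words = {}
    local m = re:exec(str)
    while m do                     -- assumption: nil signals "no more matches"
        words[#words + 1] = m[0]   -- m[0] is the full match
        m = re:exec(str)
    end
    return words
end

local words = collect_words("Hello World")
for i, w in ipairs(words) do
    print(i, w)
end
```

Resetting last_index at the start of the helper keeps it safe to call on several strings in a row with the same RegExp object.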
Note: Because the regexp engine works with UTF-16 instead of UTF-8, the input string is converted to UTF-16 if necessary. Calling exec or test repeatedly on
non-ASCII strings could therefore introduce a large overhead. For the match* methods this conversion only needs to be done once, so you probably want to use those instead.
A match object m returned by exec and the match* functions has the following fields:
m[0] -- the full match
m[i] -- match group i
m.input -- the input string
m.capture_count -- number of capture groups
m.index -- start of the capture (1-based)
m.groups -- table of the named groups and their content
m.indices -- table of begin/end indices of all match groups (if "d" flag is set)
m.indices.groups -- table of named groups and their begin/end indices (if "d" flag is set)
Calling tostring on a match object returns the full match m[0].
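To illustrate the match object, the sketch below uses named groups, which require the "n" flag as described above. The pattern and input are made-up examples, and the layout of m.indices is not shown because it is not spelled out here.

```lua
local jsregexp = require("jsregexp")

-- "n" enables named groups.
local re = jsregexp.compile("(?<year>\\d{4})-(?<month>\\d{2})", "n")

local m = re:exec("released 2023-07")
if m then
    print(m[0])              -- the full match
    print(m[1], m[2])        -- numbered capture groups
    print(m.index)           -- 1-based start of the match in the input
    print(m.capture_count)   -- number of capture groups
    print(m.groups.year)     -- named groups by name
    print(m.groups.month)
    print(tostring(m))       -- same as m[0]
end
```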
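local jsregexp = require("jsregexp")

-- compile_safe returns nil and an error message instead of throwing
local re, err = jsregexp.compile_safe("(\\w)\\w*", "g")
if not re then
    print(err)
    return
end

local str = "Hello World"
-- iterate over every match of the global regexp
for match in re:match_all(str) do
    print(match) -- tostring(match) is the full match, match[0]
    -- ipairs starts at index 1, so this visits the capture groups only
    for j, group in ipairs(match) do
        print(j, group)
    end
end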