Consider Unicode identifier support #414
Description
There is already support in TextMate, particularly in language-babel.
I've tested this regex and it seems to work fine (see it on Lightshow):
[$_\\p{L}\\p{Nl}][$\\p{L}\\p{Nl}\\p{Mn}\\p{Mc}\\p{Nd}\\p{Pc}\\x{200C}\\x{200D}]*
function a () { }
function foo123 () { }
function $ () { }
function $$abc$$ () { }
function FOO () { }
function _foo_ () { }
function $foo_foo$ () { }
function π() { }
function ლ_ಠ益ಠ_ლ() {}
function абв() {}
function d‿d() {} // \p{Pc}
function Oo̶O() {} // \p{Mn}
function _ැ_() {} // \p{Mc}
function میخواهم() {} // \x{200C}
function _ണ്_() {} // \x{200D}, valid in ECMAScript 6/Unicode 8.0.0, but not in ES3
function _۴_() {} // \p{Nd}
function Ⅳ() {} // \p{Nl}
\p{L}: matches any kind of letter from any language
\p{Nl}: matches a number that looks like a letter, such as a Roman numeral
\p{Mn}: matches a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts)
\p{Mc}: matches a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages)
\p{Nd}: matches a digit zero through nine in any script except ideographic scripts
\p{Pc}: matches a punctuation character such as an underscore that connects words
\x{200C}: zero-width non-joiner
\x{200D}: zero-width joiner
Refs:
JavaScript variable name validator
Unicode Character Categories
What characters are valid for JavaScript variable names? [Stack Overflow]