Commit 68fb06e

Added boost xpressive support. Extended toolkit to match/replace/extract
functionality both in operators and native functions. Updated toolkit description, bumped toolkit to v2.0
1 parent d87b4e6 commit 68fb06e

42 files changed: +2464 -423 lines changed

README.md

+3 -1
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
 streamsx.regex
 ==============
-This toolkit provides support for [RE2](https://code.google.com/p/re2) regular expression library.
+This toolkit provides support for 2 libraries:
+- [Boost Xpressive](https://theboostcpplibraries.com/boost.xpressive) regular expression library.
+- [RE2](https://code.google.com/p/re2) regular expression library.
 
 RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input. Web page with SPLDoc for operators and samples: [streamsx.regex SPLDoc](http://ibmstreams.github.io/streamsx.regex).
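To make the README's linear-time claim concrete, here is a minimal sketch of automaton-based matching. It is not RE2's implementation: it hand-codes a small DFA for the hypothetical pattern `ab*c` (full match) to show that every input byte is examined exactly once, with no backtracking. The function name `matchesAbStarC` is invented for this illustration.

```cpp
#include <string>

// Hand-built DFA for the pattern ab*c (full match).
// States: 0 = start, 1 = seen 'a' followed by zero or more 'b',
//         2 = accepted, -1 = dead (no match possible).
bool matchesAbStarC(const std::string& input) {
    int state = 0;
    for (char ch : input) {  // exactly one transition per input byte: O(n)
        switch (state) {
            case 0:
                state = (ch == 'a') ? 1 : -1;
                break;
            case 1:
                state = (ch == 'b') ? 1 : (ch == 'c') ? 2 : -1;
                break;
            default:
                state = -1;  // any byte after acceptance rejects a full match
                break;
        }
        if (state == -1) return false;
    }
    return state == 2;
}
```

Because the loop never revisits a byte, the running time grows with the input length only, which is the property the README attributes to RE2.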

com.ibm.streamsx.regex/.project

+1
@@ -30,5 +30,6 @@
 <nature>com.ibm.streams.studio.splproject.SPLProjectNature</nature>
 <nature>org.eclipse.xtext.ui.shared.xtextNature</nature>
 <nature>org.eclipse.jdt.core.javanature</nature>
+<nature>com.ibm.etools.systems.projects.core.remoteunixnature</nature>
 </natures>
 </projectDescription>

com.ibm.streamsx.regex/com.ibm.streamsx.regex.re2/RegexMatch/RegexMatch_cpp.cgt

-96
This file was deleted.

com.ibm.streamsx.regex/com.ibm.streamsx.regex.re2/RegexMatch/RegexMatch_h.cgt

-46
This file was deleted.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
+/RegexRun_cpp.pm
+/RegexRun_h.pm

com.ibm.streamsx.regex/com.ibm.streamsx.regex.re2/RegexMatch/RegexMatch.xml → com.ibm.streamsx.regex/com.ibm.streamsx.regex.re2/RegexRun/RegexRun.xml

+50 -8
@@ -3,10 +3,12 @@
 <cppOperatorModel>
 <context>
 <description>
-RegexMatch operator has three custom output functions:
-* RegexSimpleMatch performs a partial match - runs in interpretation mode (like SPL regexMatch function).
+RegexRun operator has five custom output functions:
 * RegexPartialMatch performs a partial match - runs in compilation mode.
 * RegexFullMatch performs a full match - runs in compilation mode.
+* RegexReplace performs a single search/replace - runs in compilation mode.
+* RegexGlobalReplace performs a global search/replace - runs in compilation mode.
+* RegexExtract performs search/extract - runs in compilation mode.
 </description>
 <customOutputFunctions>
 <customOutputFunction>
@@ -48,12 +50,52 @@ RegexMatch operator has three custom output functions:
 <prototype>boolean RegexPartialMatch(blob blb, rstring pattern)</prototype>
 </function>
 <function pseudoFunction="false">
-<description>Tries to match the string with the pattern.</description>
-<prototype>boolean RegexSimpleMatch(rstring str, rstring pattern)</prototype>
+<description>Searches the string with the pattern (defined as the operator parameter) and replaces a first finding with rewrite.</description>
+<prototype>rstring RegexReplace(rstring str, rstring rewrite)</prototype>
 </function>
 <function pseudoFunction="false">
-<description>Tries to match the blob with the pattern.</description>
-<prototype>boolean RegexSimpleMatch(blob blb, rstring pattern)</prototype>
+<description>Searches the string with the pattern (defined as the operator parameter) and replaces a first finding with rewrite.</description>
+<prototype>rstring RegexReplace(blob blb, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Searches the string with the pattern and replaces a first finding with rewrite.</description>
+<prototype>rstring RegexReplace(rstring str, rstring pattern, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Searches the string with the pattern and replaces a first finding with rewrite.</description>
+<prototype>rstring RegexReplace(blob blb, rstring pattern, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Searches the string with the pattern (defined as the operator parameter) and replaces all found with rewrite.</description>
+<prototype>rstring RegexGlobalReplace(rstring str, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Searches the string with the pattern (defined as the operator parameter) and replaces all found with rewrite.</description>
+<prototype>rstring RegexGlobalReplace(blob blb, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Searches the string with the pattern and replaces all found with rewrite.</description>
+<prototype>rstring RegexGlobalReplace(rstring str, rstring pattern, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Searches the string with the pattern and replaces all found with rewrite.</description>
+<prototype>rstring RegexGlobalReplace(blob blb, rstring pattern, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Tries to match the string with the pattern (defined as the operator parameter).</description>
+<prototype>rstring RegexExtract(rstring str, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Tries to match the string with the pattern (defined as the operator parameter).</description>
+<prototype>rstring RegexExtract(blob blb, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Tries to match the string with the pattern (defined as the operator parameter).</description>
+<prototype>rstring RegexExtract(rstring str, rstring pattern, rstring rewrite)</prototype>
+</function>
+<function pseudoFunction="false">
+<description>Tries to match the string with the pattern (defined as the operator parameter).</description>
+<prototype>rstring RegexExtract(blob blb, rstring pattern, rstring rewrite)</prototype>
 </function>
 </customOutputFunction>
 </customOutputFunctions>
@@ -103,7 +145,7 @@ RegexMatch operator has three custom output functions:
 </parameters>
 <inputPorts>
 <inputPortSet>
-<description>The RegexMatch operator is configurable with a single input port. The input port is non-mutating and its punctuation mode is Oblivious.</description>
+<description>The RegexRun operator is configurable with a single input port. The input port is non-mutating and its punctuation mode is Oblivious.</description>
 <windowingDescription></windowingDescription>
 <tupleMutationAllowed>false</tupleMutationAllowed>
 <windowingMode>NonWindowed</windowingMode>
@@ -114,7 +156,7 @@ RegexMatch operator has three custom output functions:
 </inputPorts>
 <outputPorts>
 <outputPortSet>
-<description>The RegexMatch operator is configurable with one output port. The output port is mutating and its punctuation mode is Preserving.</description>
+<description>The RegexRun operator is configurable with one output port. The output port is mutating and its punctuation mode is Preserving.</description>
 <expressionMode>Expression</expressionMode>
 <autoAssignment>true</autoAssignment>
 <completeAssignment>true</completeAssignment>
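
The difference between the replace, global replace and extract output functions described in this operator model can be sketched with the standard library. This is an illustration only: it uses `std::regex` as a stand-in for RE2, and the helper names `replaceFirst`, `replaceAll` and `extract` are hypothetical. Note that `std::regex` rewrite strings refer to capture groups as `$1`, whereas RE2 rewrite strings use `\1`.

```cpp
#include <regex>
#include <string>

// Replace only the first match (compare RegexReplace).
std::string replaceFirst(const std::string& text, const std::string& pattern,
                         const std::string& rewrite) {
    return std::regex_replace(text, std::regex(pattern), rewrite,
                              std::regex_constants::format_first_only);
}

// Replace every non-overlapping match (compare RegexGlobalReplace).
std::string replaceAll(const std::string& text, const std::string& pattern,
                       const std::string& rewrite) {
    return std::regex_replace(text, std::regex(pattern), rewrite);
}

// Build a new string from the capture groups of the first match
// (compare RegexExtract). As in the commit's C++ code, the rewrite
// string itself is returned when nothing matches.
std::string extract(const std::string& text, const std::string& pattern,
                    const std::string& rewrite) {
    std::smatch match;
    if (std::regex_search(text, match, std::regex(pattern))) {
        return match.format(rewrite);
    }
    return rewrite;
}
```

With the input `banana` and the pattern `an`, `replaceFirst` touches one occurrence and `replaceAll` touches both, while `extract` discards everything outside the rewrite template.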
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
+<%SPL::CodeGen::implementationPrologue($model);%>
+
+<%
+    my $inputPort = $model->getInputPortAt(0);
+    my $outputPort = $model->getOutputPortAt(0);
+    my $inTuple = $inputPort->getCppTupleName();
+
+    my $pattern = ($_ = $model->getParameterByName('pattern')) ? $_->getValueAt(0)->getCppExpression() : '""';
+    my $logErrors = ($_ = $model->getParameterByName('logErrors')) ? $_->getValueAt(0)->getCppExpression() : "true";
+    my $maxMemory = ($_ = $model->getParameterByName('maxMemory')) ? $_->getValueAt(0)->getCppExpression() : 1000000;
+%>
+
+void MY_OPERATOR::updateRegexMap(const rstring & pattern) {
+    AutoPortMutex am(_mutex, *this);
+    if(_regexMap.count(pattern) == 0) {
+        _regexMap.insert(pattern, std::auto_ptr<RE2>(new RE2(getStringPiece(pattern), _options)));
+    }
+}
+
+template<typename T>
+bool MY_OPERATOR::RegexFullMatch(const T & data) {
+    return RE2::FullMatch(getStringPiece(data), _regex);
+}
+
+template<typename T>
+bool MY_OPERATOR::RegexFullMatch(const T & data, const rstring & pattern) {
+    updateRegexMap(pattern);
+    return RE2::FullMatch(getStringPiece(data), _regexMap.at(pattern));
+}
+
+template<typename T>
+bool MY_OPERATOR::RegexPartialMatch(const T & data) {
+    return RE2::PartialMatch(getStringPiece(data), _regex);
+}
+
+template<typename T>
+bool MY_OPERATOR::RegexPartialMatch(const T & data, const rstring & pattern) {
+    updateRegexMap(pattern);
+    return RE2::PartialMatch(getStringPiece(data), _regexMap.at(pattern));
+}
+
+template<typename T>
+rstring MY_OPERATOR::RegexReplace(const T & data, const SPL::rstring & rewrite) {
+    re2::StringPiece sp(getStringPiece(data));
+    rstring result(sp.data(), sp.size());
+    RE2::Replace(&result, _regex, getStringPiece(rewrite));
+    return result;
+}
+
+template<typename T>
+rstring MY_OPERATOR::RegexReplace(const T & data, const rstring & pattern, const SPL::rstring & rewrite) {
+    updateRegexMap(pattern);
+    re2::StringPiece sp(getStringPiece(data));
+    rstring result(sp.data(), sp.size());
+    RE2::Replace(&result, _regexMap.at(pattern), getStringPiece(rewrite));
+    return result;
+}
+
+template<typename T>
+rstring MY_OPERATOR::RegexGlobalReplace(const T & data, const SPL::rstring & rewrite) {
+    re2::StringPiece sp(getStringPiece(data));
+    rstring result(sp.data(), sp.size());
+    RE2::GlobalReplace(&result, _regex, getStringPiece(rewrite));
+    return result;
+}
+
+template<typename T>
+rstring MY_OPERATOR::RegexGlobalReplace(const T & data, const rstring & pattern, const SPL::rstring & rewrite) {
+    updateRegexMap(pattern);
+    re2::StringPiece sp(getStringPiece(data));
+    rstring result(sp.data(), sp.size());
+    RE2::GlobalReplace(&result, _regexMap.at(pattern), getStringPiece(rewrite));
+    return result;
+}
+
+template<typename T>
+rstring MY_OPERATOR::RegexExtract(const T & data, const SPL::rstring & rewrite) {
+    rstring result;
+    if( RE2::Extract(getStringPiece(data), _regex, getStringPiece(rewrite), &result))
+        return result;
+    else
+        return rewrite;
+}
+
+template<typename T>
+rstring MY_OPERATOR::RegexExtract(const T & data, const rstring & pattern, const SPL::rstring & rewrite) {
+    updateRegexMap(pattern);
+    rstring result;
+    if( RE2::Extract(getStringPiece(data), _regexMap.at(pattern), getStringPiece(rewrite), &result))
+        return result;
+    else
+        return rewrite;
+}
+
+MY_OPERATOR::MY_OPERATOR() : _options(), _regex(getStringPiece(<%=$pattern%>), _options) {
+    _options.set_log_errors(<%=$logErrors%>);
+    _options.set_max_mem(<%=$maxMemory%>);
+}
+
+@include "../../com.ibm.streamsx.regex/Common/RegexOperator_cpp.cgt"
+
+<%SPL::CodeGen::implementationEpilogue($model);%>
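
The `updateRegexMap` function in this file compiles each distinct pattern once and reuses it on later calls. A minimal standalone sketch of that caching idea follows. It makes several assumptions that are not in the commit: `std::regex` stands in for RE2, `std::unordered_map` with `std::unique_ptr` stands in for `ptr_unordered_map` and `std::auto_ptr` (which was deprecated in C++11 and removed in C++17), and the class name `RegexCache` and its `compilations` counter are hypothetical. It needs C++14 for `std::make_unique`.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <unordered_map>

// Compile-once cache keyed by pattern text.
class RegexCache {
public:
    // Returns the compiled regex for `pattern`, compiling it on first use only.
    // The reference stays valid because the regex lives behind a unique_ptr,
    // so rehashing the map does not move it.
    const std::regex& get(const std::string& pattern) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(pattern);
        if (it == cache_.end()) {
            it = cache_.emplace(pattern, std::make_unique<std::regex>(pattern)).first;
            ++compilations_;
        }
        return *it->second;
    }

    // Number of patterns compiled so far (unsynchronized; for illustration only).
    std::size_t compilations() const { return compilations_; }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, std::unique_ptr<std::regex>> cache_;
    std::size_t compilations_ = 0;
};
```

Asking the cache for the same pattern twice compiles it once, which is the cost the per-call `pattern` overloads avoid when a stream repeats a small set of patterns.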
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
+@include "../../com.ibm.streamsx.regex/Common/RegexInclude_h.cgt"
+
+#include "re2/re2.h"
+
+<%SPL::CodeGen::headerPrologue($model);%>
+
+using SPL::rstring;
+
+class MY_OPERATOR : public MY_BASE_OPERATOR {
+private:
+    typedef ptr_unordered_map<rstring, RE2> RegexMap;
+
+    RE2::Options _options;
+    RE2 _regex;
+
+    template<typename T> bool RegexFullMatch(const T & data);
+    template<typename T> bool RegexFullMatch(const T & data, const rstring & pattern);
+    template<typename T> bool RegexPartialMatch(const T & data);
+    template<typename T> bool RegexPartialMatch(const T & data, const rstring & pattern);
+    template<typename T> rstring RegexReplace(const T & data, const SPL::rstring & rewrite);
+    template<typename T> rstring RegexReplace(const T & data, const rstring & pattern, const SPL::rstring & rewrite);
+    template<typename T> rstring RegexGlobalReplace(const T & data, const SPL::rstring & rewrite);
+    template<typename T> rstring RegexGlobalReplace(const T & data, const rstring & pattern, const SPL::rstring & rewrite);
+    template<typename T> rstring RegexExtract(const T & data, const SPL::rstring & rewrite);
+    template<typename T> rstring RegexExtract(const T & data, const rstring & pattern, const SPL::rstring & rewrite);
+
+    re2::StringPiece getStringPiece(const rstring & str) { return re2::StringPiece(str.data(), str.size()); }
+    re2::StringPiece getStringPiece(const blob & blb) { return re2::StringPiece(reinterpret_cast<const char*>(blb.getData()), blb.getSize()); }
+
+    @include "../../com.ibm.streamsx.regex/Common/RegexOperator_h.cgt"
+};
+
+<%SPL::CodeGen::headerEpilogue($model);%>
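
The header declares `_options` before `_regex`, and C++ initializes members in declaration order, so `_regex` is built from `_options` before the constructor body runs its `set_log_errors` and `set_max_mem` calls. Assuming RE2 copies the options when a pattern is compiled (my reading of the library, not something this commit states), those setters would affect only patterns compiled later, not `_regex`. The sketch below shows the effect with hypothetical stand-in types: `Options` and `Compiled` are not RE2 classes, and `ConfiguredLate` and `ConfiguredEarly` are invented names.

```cpp
#include <string>

// Stand-in for an options struct such as RE2::Options.
struct Options {
    bool logErrors = true;
};

// Stand-in for a compiled regex that copies its options at construction.
struct Compiled {
    Compiled(const std::string& patternText, const Options& opts)
        : pattern(patternText), options(opts) {}
    std::string pattern;
    Options options;
};

// Same shape as the commit's constructor: the setter runs in the body,
// after `regex` has already copied the default options.
struct ConfiguredLate {
    ConfiguredLate() : options(), regex("a+", options) { options.logErrors = false; }
    Options options;
    Compiled regex;
};

// One possible alternative: finish configuring the options in a helper,
// so `regex` is built from the final values.
struct ConfiguredEarly {
    static Options makeOptions() {
        Options o;
        o.logErrors = false;
        return o;
    }
    ConfiguredEarly() : options(makeOptions()), regex("a+", options) {}
    Options options;
    Compiled regex;
};
```

In `ConfiguredLate` the member `options` ends up changed while the copy inside `regex` keeps the defaults; `ConfiguredEarly` keeps the two in agreement.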
