-
Notifications
You must be signed in to change notification settings - Fork 208
Add regex stuff #529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add regex stuff #529
Conversation
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
src/models/capture_group_patterns.rs
Outdated
| fn validate(&self) -> Result<(), String> { | ||
| if self.pattern().starts_with("rgx ") { | ||
| panic!("Regex not supported") | ||
| let mut _val = &self.pattern()[4..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let mut _val = &self.pattern()[4..]; | |
| let mut _val = &self.extract_regex(); |
|
|
||
| # The below three rules do a dummy type migration from List<Integer> to NewList | ||
|
|
||
| # Updates the import statement from `java.util.List` to `com.uber.NEwList` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Updates the import statement from `java.util.List` to `com.uber.NEwList` | |
| # Updates the import statement from `java.util.List` to `com.uber.NewList` |
| replace_node = "m_name" | ||
| replace = "addToNewList" | ||
| holes = ["name"] | ||
| is_seed_rule = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every replace_node in this test suite is a constant, even though the surrounding context is matched by more complex regexps. For generality, we might want a regex that matches a more general pattern on the replaced node. How about converting OurMapOfX to HashMap<X> for various Xs (i.e. the code could have OurMapOfInteger, OurMapOfString, and OurMapOfOurMapOfLong)
Or is something like that not meant to be supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a brilliant idea ! We can absolutely do that!
Done ✅
|
|
||
| // Will not get updated | ||
| List<String> b = getListStr(); | ||
| Integer item = getItemStr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Integer item = getItemStr(); | |
| String itemStr = getItemStr(); |
Or something like that. I know this code doesn't get compiled, but it still should make sense types-wise :)
src/utilities/regex_utilities.rs
Outdated
| /// * `replace_node` - node to replace | ||
| /// | ||
| /// # Returns | ||
| /// The range of the match in the source code and the corresponding mapping from tags to code snippets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Return seems to just be a vector of matches. Where is this mapping encoded?
src/utilities/regex_utilities.rs
Outdated
| pub(crate) fn get_all_matches_for_regex( | ||
| node: &Node, source_code: String, regex: &Regex, recursive: bool, replace_node: Option<String>, | ||
| ) -> Vec<Match> { | ||
| // let code_snippet = node.utf8_text(source_code.as_bytes()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete commented out line
src/utilities/regex_utilities.rs
Outdated
| all_matches | ||
| } | ||
|
|
||
| // Creates an hashmap from the capture group(name) to the corresponding code snippet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Creates an hashmap from the capture group(name) to the corresponding code snippet. | |
| // Creates a hashmap from the capture group (name) to the corresponding code snippet. |
src/utilities/regex_utilities.rs
Outdated
| let range_matches_inside_node = node.start_byte() <= captures.get(0).unwrap().start() | ||
| && node.end_byte() >= captures.get(0).unwrap().end(); | ||
| if (recursive && range_matches_inside_node) || range_matches_node { | ||
| let group_by_tag = if let Some(ref rn) = replace_node { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group by tag is either the matched code string (plus location info) corresponding to the match group for the replace node if present or a the matched code string matching the full regex match if not, correct? Why is the name group_by_tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! your understanding is correct. It represents the match corresponding to the replace node (if present) or the entire match.
renamed to - replace_node_match
| range: Range::from_regex_match(mtch, source_code), | ||
| matches, | ||
| associated_comma: None, | ||
| associated_comments: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we populate associated comments later and does that work for regex matches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
But for testing purposes, I added a scenario where we expect the associated comment to be cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but still curious about this.
src/models/capture_group_patterns.rs
Outdated
| fn validate(&self) -> Result<(), String> { | ||
| if self.pattern().starts_with("rgx ") { | ||
| panic!("Regex not supported") | ||
| let mut _val = &self.pattern()[4..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this logic here and in extract_regex() both? This probably should not be duplicated in case we ever change the identifier, and also should be something like len(REGEX_QUERY_PREFIX) and that prefix used elsewhere to match "rgx "
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of documentation/comment nits, but LGTM once those are addressed!
| # The below three rules do a dummy type migration from OurListOfInteger to List<Integer> | ||
|
|
||
| # Updates the import statement from `java.util.List` to `com.uber.NEwList` | ||
| # Updates the import statement from `java.util.List` to `com.uber.NewList` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Updates the import statement from `java.util.List` to `com.uber.NewList` | |
| # Updates the import statement from `com.uber.OurListOfInteger` to `java.util.List` |
| is_seed_rule = false | ||
|
|
||
|
|
||
| # The below three rules do a dummy type migration like - from OurMapOfStringInteger to HashMap<String, Integer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # The below three rules do a dummy type migration like - from OurMapOfStringInteger to HashMap<String, Integer> | |
| # The below three rules do a dummy type migration from OurMapOf{T1}{T2} to HashMap<T1, T2>. For example, from OurMapOfStringInteger to HashMap<String, Integer>. This is to exercise non-constant regex matches for replace_node. |
| replace_node = "n" | ||
| replace = "" | ||
|
|
||
| # Adds Import to java.util.hashmap if absent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Adds Import to java.util.hashmap if absent | |
| # Adds Import to java.util.HashMap if absent |
| range: Range::from_regex_match(mtch, source_code), | ||
| matches, | ||
| associated_comma: None, | ||
| associated_comments: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but still curious about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aah forgot to push my comments
| range: Range::from_regex_match(mtch, source_code), | ||
| matches, | ||
| associated_comma: None, | ||
| associated_comments: Vec::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
But for testing purposes, I added a scenario where we expect the associated comment to be cleaned up.
src/utilities/regex_utilities.rs
Outdated
| let range_matches_inside_node = node.start_byte() <= captures.get(0).unwrap().start() | ||
| && node.end_byte() >= captures.get(0).unwrap().end(); | ||
| if (recursive && range_matches_inside_node) || range_matches_node { | ||
| let group_by_tag = if let Some(ref rn) = replace_node { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! your understanding is correct. It represents the match corresponding to the replace node (if present) or the entire match.
renamed to - replace_node_match
| replace_node = "m_name" | ||
| replace = "addToNewList" | ||
| holes = ["name"] | ||
| is_seed_rule = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a brilliant idea ! We can absolutely do that!
Done ✅
Implement the logic for matching code with regex. Add tests for : * query as regex * filter as regex * propagate when hole is captured by regex * propagate built in rules when query is regex ------- [ghstack-poisoned]
|
Addressed comments 3ac21b0 and rebased |
ghstack-source-id: 78529c1 Pull Request resolved: uber/piranha#529
Implement the logic for matching code with regex.
Add tests for :
Stack from ghstack (oldest at bottom):