-
Notifications
You must be signed in to change notification settings - Fork 10
First support for secondary metadata query. #114
base: master
Are you sure you want to change the base?
Conversation
Add support for a secondary SQL query to retrieve metadata from a second table. This first implementation supports a model in which metadata is stored in a second table in one or more rows keyed by some unique value from the original doc record. Three new config parameters are used: db.extraMetadataSql (to specify the query), db.extraMetadataSqlParameters (to list the column names whose values are used as parameters in db.extraMetadataSql), and optionally db.extraMetadataColumns (to specify the columns from the metadata result set to use). The addMetadata/addMetadataTo* methods were refactored a bit to allow addMetadata to be used for the extraMetadata table as well. The current implementation requires the key into the metadata table to be part of the doc's unique key, but this is not necessarily a requirement to support retrieving metadata from a second table. The config parameters and new methods use "extraMetadata" in their names to identify them as applying to this secondary query. The prior existence of "db.metadataColumns" as a parameter applying to the primary doc query made new parameter names starting with "db.metadata*" feel confusing. A better naming scheme might be nice. This implementation is expected to be extended with support for other metadata structures, possibly in a manner similar to the ResponseGenerator, so that should be taken into account when thinking about config parameter names.
|
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
wiarlawd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments but mostly minor suggestions or questions.
Check the code coverage. It looks like you're missing tests for urlAndMetadataLister, including both getDocIds and getModifiedDocIds.
| } | ||
| } | ||
|
|
||
| if (!isNullOrEmptyString(cfg.getValue("db.extraMetadataSql"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be null; we've been drawing the conditions more narrowly in new code. I prefer using foo.isEmpty, but "".equals(foo) is popular in the code, too.
| extraMetadataColumns = new MetadataColumns(extraMetadataColumnsConfig); | ||
| log.config("extra metadata columns: " + extraMetadataColumns); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an else to check for non-empty db.extraMetadataColumns and db.extraMetadataSqlParameters and add them to ignored. You should add a test for that as well (I think ignoring both in one test is fine, or maybe even add it to an existing ignored property test; I'm not sure how those are organized.
| private void addMetadata(MetadataHandler meta, ResultSet rs) | ||
| throws SQLException, IOException { | ||
| private void addMetadata(MetadataHandler meta, ResultSet rs, | ||
| MetadataColumns rsMetadataColumns) throws SQLException, IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just columns? rsMetadataColumns made me expect a ResultSetMetaData object.
| private void addExtraMetadataToRecordBuilder( | ||
| final DocIdPusher.Record.Builder builder, Connection conn, | ||
| String uniqueId) throws SQLException, IOException { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete blank line.
| final DocIdPusher.Record.Builder builder, Connection conn, | ||
| String uniqueId) throws SQLException, IOException { | ||
|
|
||
| try (PreparedStatement stmt = getExtraMetadataFromDb(conn, uniqueId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What made you decide against the extra helper method to share this try statement between the two new methods?
| return st; | ||
| } | ||
|
|
||
| private PreparedStatement getExtraMetadataFromDb(Connection conn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're planning to rename these methods, so lets go with prepareExtraMetadataStatement for this one now, so the change doesn't collide with this one.
| return "UniqueKey(" + docIdSqlCols + "," + types + "," + contentSqlCols + "," | ||
| + aclSqlCols + ")"; | ||
| return "UniqueKey(" + docIdSqlCols + "," + types + "," | ||
| + contentSqlCols + "," + aclSqlCols + "," + metadataSqlCols + ")"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why re-wrap this and modify the first line? Random or on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line was triggering my 80-cols check and since I was adding to the string anyway, I changed it to get rid of the warning.
|
|
||
| Map<String, String> configEntries = new HashMap<String, String>(); | ||
| configEntries.put("db.uniqueKey", "id:int"); | ||
| configEntries.put("db.everyDocIdSql", "select * from data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused, here and the other new tests, so put it after the usual comment to that effect.
| configEntries.put("db.extraMetadataSql", | ||
| "select * from metadata where id = ?"); | ||
| configEntries.put("db.extraMetadataSqlParameters", "id"); | ||
| configEntries.put("db.extraMetadataColumns", "OTHER"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could either leave this lowercase, or maybe mixed case? Or, since I think we should have a test with a mapping in db.extraMetadataColumns, we could use uppercase in one, as here, and lower case in the other, as "other:col1".
wiarlawd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly ideas for more tests, and the issue with docId.isUrl and db.extraMetadataSql.
| private void addExtraMetadataToRecordBuilder( | ||
| final DocIdPusher.Record.Builder builder, Connection conn, | ||
| String uniqueId) throws SQLException, IOException { | ||
| MetadataHandler handler = new MetadataHandler() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe inline this as we do in addMetadataToXxx?
| public String toString() { | ||
| return "UniqueKey(" + docIdSqlCols + "," + types + "," + contentSqlCols + "," | ||
| + aclSqlCols + ")"; | ||
| + aclSqlCols + "," + metadataSqlCols + ")"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Har, OK. Either way is fine.
|
|
||
| Builder setMetadataSqlColumns(String metadataSqlColumns) | ||
| throws InvalidConfigurationException { | ||
| if (null == metadataSqlColumns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Yoda conditions, we do not. (Used in the past, they have been, but no longer.)
| } | ||
|
|
||
| @Test | ||
| public void testInit_ignoredPropertyes_extraMetadata() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Propertyes/Properties/, unless these are specifically ye olde style propertyes.
Also, I think testInitExtraMetadata_..., for consistency with the other names (we aren't using testInit_... anywhere, though perhaps that would be better).
| String columnValue = decodeSlashInData(parts[i]); | ||
| zip.put(docIdSqlCols.get(i), columnValue); | ||
| if (docIdIsUrl) { | ||
| zip.put(docIdSqlCols.get(0), uniqueId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this. It feels so special case: if your doc ID is an URL (whether using urlAndMetadataLister mode or not), then your extraMetadataSql can only take that URL as a unique key parameter. We should make that explicit somewhere, maybe in a comment here. As we discussed, I think it would be preferable to try the route you suggested originally where the extraMetadataSqlParameters come from the singleDocContentSql result set, rather than from the doc ID. Maybe just a TODO here for now, I'm OK with exploring that in a separate change, but PJ may not be.
| throws SQLException, IOException { | ||
| synchronized (this) { | ||
| if (metadataColumns == null) { | ||
| metadataColumns = new MetadataColumns(rs.getMetaData()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line isn't tested, since you lifted this out of the helper in two places, only the other line is executed.
| addMetadata(handler, rs, extraMetadataColumns); | ||
| } | ||
| } catch (SQLException ex) { | ||
| throw new IOException(ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is untested. I'm not sure how to trigger this. A flat invalid query would fail in validation. Maybe a type mismatch in the parameter expression: <int-field> = ? with a character parameter? I can look harder if you can't find something easily.
| moreEntries.put("db.uniqueKey", "id:int"); | ||
| moreEntries.put("db.extraMetadataSql", | ||
| "select other from data where id = ?"); | ||
| moreEntries.put("db.extraMetadataSqlParameters", "id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a test that does not specify this? By default, db.uniqueKey should be used, so we should have a test of that, which could maybe be this one. I think we should have some test where the unique key has two columns, one used in singleDocContentSql, and the other used in extraMetadataSql.
| } | ||
|
|
||
| @Test | ||
| public void testExtraMetadata_extraMetadataColumns_Mapping() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Mapping/mapping/
| pusher.getRecords()); | ||
| } | ||
|
|
||
| @Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another test that I don't see from before that we could have had, is to make sure that getDocIds and getModifiedDocIds really do ignore the metadata config when docIdIsUrl. We have lots of init tests, and one each for getDocIds and getModifiedDocIds, to check the URL and date handling. I think we should either add some metadata to those, or have a new pair of tests with just a single row, and both metadataColumns and extraMetadataSql configured, and verify that no metadata is returned (and maybe extraMetadataSql could be syntactically invalid; though I think we "ignore" the property, but still validate the SQL?).
wiarlawd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still waiting for PJ's review.
| private void setSqlValues(PreparedStatement st, String uniqueId, | ||
| List<String> sqlCols) throws SQLException { | ||
| Map<String, String> zip = new TreeMap<>(String.CASE_INSENSITIVE_ORDER); | ||
| // Parameters to SQL queries are taken from the unique id. When reading extraMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
80 cols
| pusher.getRecords()); | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete added blank line.
| "select id, name from data where id = ?"); | ||
| configEntries.put("db.modeOfOperation", "rowToText"); | ||
| configEntries.put("db.metadataColumns", "name"); | ||
| configEntries.put("db.includeAllColumnsAsMetadata", "true"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this setting adds coverage for the case you noted above for line 694.
Add support for a secondary SQL query to retrieve metadata from a
second table. This first implementation supports a model in which
metadata is stored in a second table in one or more rows keyed by
some unique value from the original doc record. Three new config
parameters are used: db.extraMetadataSql (to specify the query),
db.extraMetadataSqlParameters (to list the column names whose
values are used as parameters in db.extraMetadataSql), and
optionally db.extraMetadataColumns (to specify the columns from
the metadata result set to use).
The addMetadata/addMetadataTo* methods were refactored a bit to
allow addMetadata to be used for the extraMetadata table as well.
The current implementation requires the key into the metadata
table to be part of the doc's unique key, but this is not
necessarily a requirement to support retrieving metadata from a
second table.
The config parameters and new methods use "extraMetadata" in
their names to identify them as applying to this secondary
query. The prior existence of "db.metadataColumns" as a parameter
applying to the primary doc query made new parameter names
starting with "db.metadata*" feel confusing. A better naming
scheme might be nice.
This implementation is expected to be extended with support for
other metadata structures, possibly in a manner similar to the
ResponseGenerator, so that should be taken into account when
thinking about config parameter names.