Skip to content
This repository was archived by the owner on Jun 23, 2022. It is now read-only.

Conversation

@aptls
Copy link

@aptls aptls commented Dec 16, 2017

Add support for a secondary SQL query to retrieve metadata from a
second table. This first implementation supports a model in which
metadata is stored in a second table in one or more rows keyed by
some unique value from the original doc record. Three new config
parameters are used: db.extraMetadataSql (to specify the query),
db.extraMetadataSqlParameters (to list the column names whose
values are used as parameters in db.extraMetadataSql), and
optionally db.extraMetadataColumns (to specify the columns from
the metadata result set to use).

The addMetadata/addMetadataTo* methods were refactored a bit to
allow addMetadata to be used for the extraMetadata table as well.

The current implementation requires the key into the metadata
table to be part of the doc's unique key, but this is not
necessarily a requirement to support retrieving metadata from a
second table.

The config parameters and new methods use "extraMetadata" in
their names to identify them as applying to this secondary
query. The prior existence of "db.metadataColumns" as a parameter
applying to the primary doc query made new parameter names
starting with "db.metadata*" feel confusing. A better naming
scheme might be nice.

This implementation is expected to be extended with support for
other metadata structures, possibly in a manner similar to the
ResponseGenerator, so that should be taken into account when
thinking about config parameter names.

Add support for a secondary SQL query to retrieve metadata from a
second table. This first implementation supports a model in which
metadata is stored in a second table in one or more rows keyed by
some unique value from the original doc record. Three new config
parameters are used: db.extraMetadataSql (to specify the query),
db.extraMetadataSqlParameters (to list the column names whose
values are used as parameters in db.extraMetadataSql), and
optionally db.extraMetadataColumns (to specify the columns from
the metadata result set to use).

The addMetadata/addMetadataTo* methods were refactored a bit to
allow addMetadata to be used for the extraMetadata table as well.

The current implementation requires the key into the metadata
table to be part of the doc's unique key, but this is not
necessarily a requirement to support retrieving metadata from a
second table.

The config parameters and new methods use "extraMetadata" in
their names to identify them as applying to this secondary
query. The prior existence of "db.metadataColumns" as a parameter
applying to the primary doc query made new parameter names
starting with "db.metadata*" feel confusing. A better naming
scheme might be nice.

This implementation is expected to be extended with support for
other metadata structures, possibly in a manner similar to the
ResponseGenerator, so that should be taken into account when
thinking about config parameter names.
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot. The email used to register you as an authorized contributor must be the email used for the Git commit.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

Copy link
Contributor

@wiarlawd wiarlawd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of comments but mostly minor suggestions or questions.

Check the code coverage. It looks like you're missing tests for urlAndMetadataLister, including both getDocIds and getModifiedDocIds.

}
}

if (!isNullOrEmptyString(cfg.getValue("db.extraMetadataSql"))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be null; we've been drawing the conditions more narrowly in new code. I prefer using foo.isEmpty, but "".equals(foo) is popular in the code, too.

extraMetadataColumns = new MetadataColumns(extraMetadataColumnsConfig);
log.config("extra metadata columns: " + extraMetadataColumns);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an else to check for non-empty db.extraMetadataColumns and db.extraMetadataSqlParameters and add them to ignored. You should add a test for that as well (I think ignoring both in one test is fine, or maybe even add it to an existing ignored property test; I'm not sure how those are organized.

private void addMetadata(MetadataHandler meta, ResultSet rs)
throws SQLException, IOException {
private void addMetadata(MetadataHandler meta, ResultSet rs,
MetadataColumns rsMetadataColumns) throws SQLException, IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just columns? rsMetadataColumns made me expect a ResultSetMetaData object.

private void addExtraMetadataToRecordBuilder(
final DocIdPusher.Record.Builder builder, Connection conn,
String uniqueId) throws SQLException, IOException {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete blank line.

final DocIdPusher.Record.Builder builder, Connection conn,
String uniqueId) throws SQLException, IOException {

try (PreparedStatement stmt = getExtraMetadataFromDb(conn, uniqueId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What made you decide against the extra helper method to share this try statement between the two new methods?

return st;
}

private PreparedStatement getExtraMetadataFromDb(Connection conn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're planning to rename these methods, so lets go with prepareExtraMetadataStatement for this one now, so the change doesn't collide with this one.

return "UniqueKey(" + docIdSqlCols + "," + types + "," + contentSqlCols + ","
+ aclSqlCols + ")";
return "UniqueKey(" + docIdSqlCols + "," + types + ","
+ contentSqlCols + "," + aclSqlCols + "," + metadataSqlCols + ")";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why re-wrap this and modify the first line? Random or on purpose?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line was triggering my 80-cols check and since I was adding to the string anyway, I changed it to get rid of the warning.


Map<String, String> configEntries = new HashMap<String, String>();
configEntries.put("db.uniqueKey", "id:int");
configEntries.put("db.everyDocIdSql", "select * from data");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused, here and the other new tests, so put it after the usual comment to that effect.

configEntries.put("db.extraMetadataSql",
"select * from metadata where id = ?");
configEntries.put("db.extraMetadataSqlParameters", "id");
configEntries.put("db.extraMetadataColumns", "OTHER");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could either leave this lowercase, or maybe mixed case? Or, since I think we should have a test with a mapping in db.extraMetadataColumns, we could use uppercase in one, as here, and lower case in the other, as "other:col1".

Copy link
Contributor

@wiarlawd wiarlawd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly ideas for more tests, and the issue with docId.isUrl and db.extraMetadataSql.

private void addExtraMetadataToRecordBuilder(
final DocIdPusher.Record.Builder builder, Connection conn,
String uniqueId) throws SQLException, IOException {
MetadataHandler handler = new MetadataHandler() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe inline this as we do in addMetadataToXxx?

public String toString() {
return "UniqueKey(" + docIdSqlCols + "," + types + "," + contentSqlCols + ","
+ aclSqlCols + ")";
+ aclSqlCols + "," + metadataSqlCols + ")";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Har, OK. Either way is fine.


Builder setMetadataSqlColumns(String metadataSqlColumns)
throws InvalidConfigurationException {
if (null == metadataSqlColumns) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Yoda conditions, we do not. (Used in the past, they have been, but no longer.)

}

@Test
public void testInit_ignoredPropertyes_extraMetadata() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Propertyes/Properties/, unless these are specifically ye olde style propertyes.

Also, I think testInitExtraMetadata_..., for consistency with the other names (we aren't using testInit_... anywhere, though perhaps that would be better).

String columnValue = decodeSlashInData(parts[i]);
zip.put(docIdSqlCols.get(i), columnValue);
if (docIdIsUrl) {
zip.put(docIdSqlCols.get(0), uniqueId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. It feels so special case: if your doc ID is an URL (whether using urlAndMetadataLister mode or not), then your extraMetadataSql can only take that URL as a unique key parameter. We should make that explicit somewhere, maybe in a comment here. As we discussed, I think it would be preferable to try the route you suggested originally where the extraMetadataSqlParameters come from the singleDocContentSql result set, rather than from the doc ID. Maybe just a TODO here for now, I'm OK with exploring that in a separate change, but PJ may not be.

throws SQLException, IOException {
synchronized (this) {
if (metadataColumns == null) {
metadataColumns = new MetadataColumns(rs.getMetaData());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't tested, since you lifted this out of the helper in two places, only the other line is executed.

addMetadata(handler, rs, extraMetadataColumns);
}
} catch (SQLException ex) {
throw new IOException(ex);
Copy link
Contributor

@wiarlawd wiarlawd Dec 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is untested. I'm not sure how to trigger this. A flat invalid query would fail in validation. Maybe a type mismatch in the parameter expression: <int-field> = ? with a character parameter? I can look harder if you can't find something easily.

moreEntries.put("db.uniqueKey", "id:int");
moreEntries.put("db.extraMetadataSql",
"select other from data where id = ?");
moreEntries.put("db.extraMetadataSqlParameters", "id");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a test that does not specify this? By default, db.uniqueKey should be used, so we should have a test of that, which could maybe be this one. I think we should have some test where the unique key has two columns, one used in singleDocContentSql, and the other used in extraMetadataSql.

}

@Test
public void testExtraMetadata_extraMetadataColumns_Mapping()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Mapping/mapping/

pusher.getRecords());
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another test that I don't see from before that we could have had, is to make sure that getDocIds and getModifiedDocIds really do ignore the metadata config when docIdIsUrl. We have lots of init tests, and one each for getDocIds and getModifiedDocIds, to check the URL and date handling. I think we should either add some metadata to those, or have a new pair of tests with just a single row, and both metadataColumns and extraMetadataSql configured, and verify that no metadata is returned (and maybe extraMetadataSql could be syntactically invalid; though I think we "ignore" the property, but still validate the SQL?).

Copy link
Contributor

@wiarlawd wiarlawd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still waiting for PJ's review.

private void setSqlValues(PreparedStatement st, String uniqueId,
List<String> sqlCols) throws SQLException {
Map<String, String> zip = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
// Parameters to SQL queries are taken from the unique id. When reading extraMetadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 cols

pusher.getRecords());
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete added blank line.

"select id, name from data where id = ?");
configEntries.put("db.modeOfOperation", "rowToText");
configEntries.put("db.metadataColumns", "name");
configEntries.put("db.includeAllColumnsAsMetadata", "true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this setting adds coverage for the case you noted above for line 694.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants