SPARKC-577, round two #1250
base: b3.0
Conversation
…nently restored KeyspaceDef as a mechanism for accessing downstream "def" types (TableDef, IndexDef, etc.) from a Schema instance.
…, serializable metadata
…umns effectively duplicates columns and was only used in one spot.
…se toInternal. Hope was this would address some of the ongoing IT issues we're seeing... but it doesn't look like that was successful. Also expanded Schema's IT to include tests for additional functionality (now that it's being provided largely by an impl backed by Java driver metadata)
…ter this change so with any luck we should be good now.
*/
def tableFromCassandra(session: CqlSession,
                       keyspaceName: String,
                       tableName: String): DriverTableDef = {

  fromCassandra(session, Some(keyspaceName), Some(tableName)).tables.headOption match {
I agree that the use of headOption here is a bit cumbersome, but unfortunately we're stuck with fromCassandra() for the foreseeable future. There are a fair number of users who access that functionality through com.datastax.spark.connector.schemaFromCassandra(), which is used in a number of places throughout the code base. It's probably cleaner to re-work this API to return KeyspaceDefs or TableDefs directly rather than using Schema as a common return type (which in turn requires this "Schema with only some keyspaces/tables" nonsense used here). That seemed like a larger and/or discrete effort, however... and this ticket is already big enough. :)
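For what it's worth, a rough sketch of what that rework might look like. These signatures are hypothetical and not part of this PR; KeyspaceDef and TableDef are the connector's trait types, and the bodies are elided:

```scala
import com.datastax.oss.driver.api.core.CqlSession

// Hypothetical shape of the reworked API suggested above: callers get the
// specific def back directly instead of fishing it out of a partial Schema.
object SchemaLookup {
  def keyspaceFromCassandra(session: CqlSession, keyspaceName: String): KeyspaceDef = ???

  def tableFromCassandra(session: CqlSession,
                         keyspaceName: String,
                         tableName: String): TableDef = ???
}
```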
@absurdfarce I'm not sure if this is the way to go. I thought that we want to use Driver types and get rid of SCC types completely. The less code to maintain the better. Do you think it would be possible to refactor SCC so that it relies solely on Driver types?
There's a bit of a complication there @jtgrabowski, although I'm not completely sure it's fatal. The old Table/ColumnDef impl served two use cases in the code base: it represented data retrieved from C* metadata, but it also could be created to define tables/columns which should be created by SCC. My original take on this ticket tried to separate those concerns... but it snowballed quickly. That led me to the approach used here of pulling these impls into traits and creating parallel impls.

I suppose we might be able to get there by leaving the Table/ColumnDef impls alone and just changing the blahFromCassandra() calls to return Java driver types, leaving everything else untouched. I believe we do have at least one case where TableDef.copy() is used to tweak something we get from C*, so in that case we'll have to construct a TableDef based on the underlying Java driver type... but that's probably doable.

There's a second (smaller) problem as well: ColumnDef contains info that is maintained in ColumnMetadata as well as some info from TableMetadata. So callers interested in that functionality may need to be changed to get a column and then get the corresponding table. Unfortunately the Java driver doesn't allow that traversal directly; they'll have to get the table name and look it up as a separate op. That's entirely doable, but it does represent a bit of an increase in complexity.

Anyways, if you're okay with the approach suggested above I can try this again and see how feasible that is to implement.
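To illustrate that extra hop, here's a minimal sketch assuming the Java driver 4.x metadata API (the helper name and object are made up for illustration):

```scala
import java.util.Optional
import com.datastax.oss.driver.api.core.CqlSession
import com.datastax.oss.driver.api.core.metadata.schema.{ColumnMetadata, TableMetadata}

object MetadataTraversal {
  // The driver's ColumnMetadata only exposes its parent table's *name*
  // (getParent), so reaching the TableMetadata is a separate lookup
  // through the session metadata rather than a direct reference.
  def parentTable(session: CqlSession, column: ColumnMetadata): Optional[TableMetadata] =
    session.getMetadata
      .getKeyspace(column.getKeyspace)              // Optional[KeyspaceMetadata]
      .flatMap(ks => ks.getTable(column.getParent)) // look the table up by name
}
```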
I see. It's far more complicated than we all expected :) What would we gain with the current approach versus the described one? I initially thought that we were going for less code to maintain, but now I'm not sure if this whole thing is worth the effort. wdyt?
Yeah, it's definitely more complicated than we expected. :) It seems to me that the following things are true:
Given all of these factors I think there's still a decent argument for trying the following:
Doing so gives us the decoupling between table creation and metadata and moves us further along in the goal of interacting with the Java driver types more directly in the code base. wdyt @jtgrabowski?
Let's give it a try if you think it's feasible. I traced a couple of *FromCassandra invocations and it looks like it would require a bit of work to convert them.
Description
How did the Spark Cassandra Connector Work or Not Work Before this Patch
The connector was using internal serializable representations for keyspace/table metadata because the corresponding Java driver classes weren't serializable. This changed in Java driver v4.6.0.
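For illustration, a minimal sketch of what 4.6.0 makes possible, assuming a driver TableMetadata instance is at hand (helper name is made up):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import com.datastax.oss.driver.api.core.metadata.schema.TableMetadata

object SerializationCheck {
  // Since Java driver 4.6.0 the schema metadata types are
  // java.io.Serializable, so they can cross Spark's serialization
  // boundary; before 4.6.0 this would throw NotSerializableException.
  def javaSerialize(table: TableMetadata): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(table) finally out.close()
    bytes.toByteArray
  }
}
```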
General Design of the patch
Existing connector metadata types were converted to traits, with the existing impls becoming a "default" implementation. Trait impls based on Java driver metadata types were also added. Worth noting that the existing representations served two distinct functions:

1. Representing keyspace/table/column metadata retrieved from C*.
2. Defining tables/columns which should be created by SCC.
Schema loading functions return an impl based on Java driver metadata types, so (1) is the default for most operations; (2) is primarily used in the ColumnMapper code.
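A condensed sketch of the pattern (trait members and class bodies here are illustrative, not this PR's actual definitions; DriverTableDef is the driver-backed impl named in the diff):

```scala
import com.datastax.oss.driver.api.core.metadata.schema.TableMetadata

// Illustrative only: the existing concrete type becomes a trait...
trait TableDef {
  def keyspaceName: String
  def tableName: String
}

// (1) impl backed by Java driver metadata, returned by schema loading
class DriverTableDef(meta: TableMetadata) extends TableDef {
  override def keyspaceName: String = meta.getKeyspace.asInternal
  override def tableName: String = meta.getName.asInternal
}

// (2) "default" impl, still constructible by hand to describe tables
// that SCC should create
case class DefaultTableDef(keyspaceName: String, tableName: String) extends TableDef
```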
Also worth noting: the impl for (1) is largely a direct cut-through to Java driver types and methods, but in some cases this wasn't adequate to implement the trait. As an example: the connector metadata types were storing some information (such as clustering column info) at the column level, while the Java metadata types represent this at the table level.

To work around this problem the new impls (based on Java driver metadata types) act as providers of Java metadata types for the current level all the way up to the keyspace. So DriverColumnDef can provide column metadata for the column it represents as well as table and keyspace metadata for the appropriate structures. Distinct traits were implemented to identify which classes can provide which Java metadata type(s).
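A minimal sketch of that provider idea (the provider trait names below are made up for illustration; DriverColumnDef is this PR's driver-backed column impl):

```scala
import com.datastax.oss.driver.api.core.metadata.schema.{ColumnMetadata, KeyspaceMetadata, TableMetadata}

// Illustrative trait names: each driver-backed def advertises which Java
// driver metadata types it can supply, for its own level and every level
// above it.
trait KeyspaceMetadataProvider { def keyspaceMetadata: KeyspaceMetadata }
trait TableMetadataProvider extends KeyspaceMetadataProvider { def tableMetadata: TableMetadata }
trait ColumnMetadataProvider extends TableMetadataProvider { def columnMetadata: ColumnMetadata }

// A column-level def can hand back metadata all the way up to the keyspace,
// covering cases (like clustering info) where the driver keeps at the table
// level what the connector used to keep at the column level.
class DriverColumnDef(ks: KeyspaceMetadata, table: TableMetadata, col: ColumnMetadata)
  extends ColumnMetadataProvider {
  override def keyspaceMetadata: KeyspaceMetadata = ks
  override def tableMetadata: TableMetadata = table
  override def columnMetadata: ColumnMetadata = col
}
```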
Fixes: SPARKC-577
How Has This Been Tested?
Still a WIP, hasn't been tested meaningfully yet
Checklist: