This repository was archived by the owner on Nov 23, 2024. It is now read-only.

created map-enc to force main encoding to specific type #27

Open
wants to merge 1 commit into
base: master

Conversation

boymaas

@boymaas boymaas commented Jan 7, 2018

Needed bigquery-io write-with-schema to work, and as such changed the following. Maybe I am missing something; if not, this could be a potential fix for other places that have the same problem.

and used this to fix a problem I had inserting rows into bigquery
due to the default main output encoding of the df-map function
@bfabry
Contributor

bfabry commented Jan 8, 2018

GitHub's inability to notify me about PRs I actually care about continues... sorry for the delay @boymaas, we'll look at this.

@bfabry
Contributor

bfabry commented Jan 8, 2018

ok, some initial thoughts:

First, thank you for submitting a patch!

I believe you're right there's a bug here. I don't see how that function could work without something specifying that the output coder is a TableRow coder of some type.

I'm currently a bit torn about whether expanding the number of pipeline functions so that we have variants that take a Coder is the best way forward. Another possible option would be a function that takes the PCollection and sets the output coder, which would mean we could reuse that one function for df-map, df-mapcat, or even df-apply-dofn and pardo/create-and-apply. It would look like our existing df/output-is-kv and would be applied something like:

    (-> pcoll
        (df/df-map "make-table-row" [#'table-row-maker schema])
        (df/df-output-is (TableRowJsonCoder/of))
        (.apply "write-to-bquery" (-> (BigQueryIO/writeTableRows)
                                      (.to output)
                                      (.withSchema bq-schema)
                                      (.withWriteDisposition (write-disposition WriteDispositions))
                                      (.withCreateDisposition (create-disposition CreateDispositions)))))

Yet another option would be to create versions of df-map/df-mapcat etc. that take an options map which gets merged with the one passed to create-and-apply, allowing you to override certain things. This is all really a question of where we want to go with the API, so I think we want to wait till next week when @pelletier can weigh in.
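
Purely for illustration, a rough sketch of what such an options-map variant might look like; the names df-map-with-opts and default-df-map-opts, and the way pardo/create-and-apply is called here, are assumptions rather than existing code:

    ;; Hypothetical sketch only. A df-map variant that takes a trailing options
    ;; map and merges it over the defaults before delegating; the signature of
    ;; pardo/create-and-apply is assumed here, not taken from the real code.
    (def default-df-map-opts
      ;; placeholder defaults; the real option keys are not specified here
      {})

    (defn df-map-with-opts
      ([pcoll step-name fn-and-args]
       (df-map-with-opts pcoll step-name fn-and-args {}))
      ([pcoll step-name fn-and-args opts]
       (pardo/create-and-apply pcoll step-name fn-and-args
                               (merge default-df-map-opts opts))))

    ;; e.g. overriding the output coder for a single step:
    ;; (df-map-with-opts pcoll "make-table-row" [#'table-row-maker schema]
    ;;                   {:output-coder (TableRowJsonCoder/of)})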

Thank you for changing the string concatenation to a composite. That's inarguably better.

Finally, I need to figure out what we need to do to accept patches from people who aren't zendesk employees.

@boymaas
Author

boymaas commented Jan 9, 2018

No problem, and thank you for open-sourcing this library! It has saved me a lot of work. map-enc was a quick fix to get things working until I had confirmed this was a bug, and not me using the library the wrong way.

I prefer your solution, as personally I favour composition above all else. The only drawback I see is that in the solution below:

    (-> pcoll
        (df/df-map "make-table-row" [#'table-row-maker schema])
        (df/df-output-is (TableRowJsonCoder/of))
        (.apply "write-to-bquery" (-> (BigQueryIO/writeTableRows)
                                      (.to output)
                                      (.withSchema bq-schema)
                                      (.withWriteDisposition (write-disposition WriteDispositions))
                                      (.withCreateDisposition (create-disposition CreateDispositions)))))

assuming this would add another step in the pipeline, it would make the whole processing pipeline a bit less efficient. If that is the case I would go for options.

Ideally, of course, the composition part of the code would be rewritten to use edn data structures that you can build/compose and compile down to a pipeline, "onyx style". This would allow for further optimisations at a later stage. Most of the hard stuff has already been tackled.
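
Purely as an illustration of the pipeline-as-data idea, an edn description might look something like the following; every key and step name here is made up for the example, none of it exists in the library:

    ;; Illustrative only: a pipeline described as data, "onyx style". A compiler
    ;; function would walk this vector and build the corresponding Beam
    ;; transforms. None of these keys are part of the current library.
    (def pipeline-spec
      [{:step         :df-map
        :name         "make-table-row"
        :fn           'table-row-maker          ;; var to resolve at compile time
        :args         '[schema]
        :output-coder :table-row-json}
       {:step   :bigquery-write
        :name   "write-to-bquery"
        :table  "my-project:my_dataset.my_table" ;; placeholder destination
        :schema 'bq-schema}])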

@bfabry
Contributor

bfabry commented Jan 9, 2018

assuming this would add another step in the pipeline, it would make the whole processing pipeline a bit less efficient. If that is the case I would go for options.

It wouldn't actually add another step to the pipeline; from a Beam point of view it's identical, just different API syntax on our side. No performance impact.

Ideally, of course, the composition part of the code would be rewritten to use edn data structures that you can build/compose and compile down to a pipeline, "onyx style". This would allow for further optimisations at a later stage. Most of the hard stuff has already been tackled.

The idea has merit; the tricky bit is figuring out how to do it in such a way that changes/extensions to Beam can be integrated into that data structure with minimal or no work. For the time being the Beam API has been changing so rapidly and dramatically that it hasn't really been possible.

@pelletier
Contributor

@boymaas Sorry for the late reply. We discussed it internally but forgot to reply back to this pull request.

I do not want to continue expanding the number of df-* functions we have. I'd prefer to see a set-coder function that can be applied to a pipeline to specify the coder to use. This mimics the Beam API (setCoder()) and offers the most flexibility.
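
For illustration, a minimal sketch of such a set-coder, assuming it is nothing more than a thin wrapper over Beam's PCollection.setCoder (the name and final placement are open):

    ;; Hypothetical set-coder helper. PCollection.setCoder returns the
    ;; PCollection itself, so this threads cleanly with ->.
    (defn set-coder
      [^org.apache.beam.sdk.values.PCollection pcoll
       ^org.apache.beam.sdk.coders.Coder coder]
      (.setCoder pcoll coder))

    ;; usage:
    ;; (-> pcoll
    ;;     (df/df-map "make-table-row" [#'table-row-maker schema])
    ;;     (set-coder (TableRowJsonCoder/of)))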

@dgriff1
Contributor

dgriff1 commented Jan 29, 2019

Alternatively, it might be nice to accept & args in the df-* functions. A need I have is getting the context and window, and you could also specify the coders there. Something like (df-map pcoll "transform" #'transform-fn :with-context :with-window :with-coder (Coder.))
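
As a purely illustrative sketch (not library code), trailing flags like these could be parsed into an options map along these lines:

    ;; Illustrative only: turn trailing flags into an options map. Bare flags
    ;; (:with-context, :with-window) become booleans; :with-coder consumes the
    ;; following value as the Coder to set on the output.
    (defn parse-df-flags
      [flags]
      (loop [flags (seq flags), opts {}]
        (if-let [[flag & more] flags]
          (case flag
            :with-context (recur more (assoc opts :context? true))
            :with-window  (recur more (assoc opts :window? true))
            :with-coder   (recur (next more) (assoc opts :coder (first more)))
            (recur more opts))
          opts)))

    ;; (parse-df-flags [:with-context :with-coder (TableRowJsonCoder/of)])
    ;; => {:context? true, :coder <TableRowJsonCoder instance>}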
