-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataframes from PostgresQL queries #28
Comments
The way to do that is to use vector-row-reader. It was created specifically for this purpose, but doesn't appear to have made it into the cl-postgres documentation. If it does what you like, a pull request in the docs will be greatly appreciated (either in cl-postgres and/or here in data-frames. If in data frames, send me the paragraphs and I'll insert so you don't need the whole doc toolchain.) See marijnh/Postmodern#272 for the discussion that led to |
Thanks, I'll check that out ASAP and get back to you. |
That's what, I think, the pivot table is for (refer to the issue in Postmodern). Pivot rows->columns, then use |
(Sorry, deleted my previous reply, probably shouldn't have) I don't think pivoting does what I need; it's not just rows->columns. You can see from the crosstab tutorial that it's creating columns based on the number of distinct values in a particular column of the original data set. That's valid for some use cases but my data is already in postgres exactly how I want it represented in the data-frame so a pivot operation will actually distort things. For clarity: My data is financial OHLCV data, so what I really want is a column vector for each of Timestamp/Open/High/Low/Close/Volume. If I pivot on any of those it'll just make things a mess. I'm starting to think I will have to massage it by hand in CL; not a huge deal, it's just a lot of allocations I was hoping to avoid. |
This just 'feels' like it should be simpler than we're making it. Before doing it by hand in CL, perhaps re-open the issue in the Post-modern repo. Sabre and I talked about this before and he seems to think it's easy. He's a helpful guy. |
It definitely does, but SQL is fundamentally row-based so I'm starting to think it might not be possible unless there's a Postgres feature I'm not aware of. In case anyone else lands here via google, here's what I'm using as a workaround in the interim: ;; wherein postmodern, via the :vectors results style, is returning
;; a vector-of-vectors where each vector/row is of the form:
;; #(id date-time-group open high low close volume)
(defun query-to-df (ohlcv-data)
(let ((n (length ohlcv-data)))
(let ((open (make-array n))
(high (make-array n))
(low (make-array n))
(close (make-array n))
(dtg (make-array n))
(keys #(:dtg :open :high :low :close)))
(loop for row across ohlcv-data
for i from 0
do (progn
(setf (elt open i) (elt row 2))
(setf (elt high i) (elt row 3))
(setf (elt low i) (elt row 4))
(setf (elt close i) (elt row 5))
(setf (elt dtg i) (elt row 1))))
(df:make-df keys (list dtg open high low close))))) |
Curious whether this works for you. Assume ohlcv-data is returned from a query that looks like (with relevant where clause): (pomo:query "select dtg, open, high, low, close from ohlcv") Is the following equivalent to your function above? (defun query-to-df (ohlcv-data)
(let ((keys #(:dtg :open :high :low :close)))
(df:make-df keys (loop for x in (apply #'mapcar #'list ohlcv-data)
collect (coerce x 'vector))))) |
@sabracrolleton, is there a way to get the results of a query as a row-major array? If so then aops:each-index could be used to transpose the matrix: (defparameter A #2A((1 2)
(3 4)))
;; Transpose
(aops:each-index (i j) (aref A j i)) ; => #2A((1 3)
; (2 4)) |
I have to admit I am getting a little confused between when you are talking sets (sql results) and arrays. Postgresql arrays can only have a single data type and it is not clear to me that the dtg column is the same data type as the open, high, low, close columns.
is going to return (just using placeholders here)
but the work was done in the Postmodern cl-postgres::vector-row-reader function, not in Postgresql and I do not think it is in the order @nathanvy wants. My query-to-df function suggestion above obviously does not use the vector-row-reader function, it just transposes the rows first, then coerces them into vectors. I think it does the same thing as @nathanvy 's function, but I did not test the efficiency of either. In terms of thinking about what Postgresql's built in functions could do, if dtg was the same data type as the others, then Postgresql's array_agg function looking something like:
would return something like:
but I do not think that is in the order you want. If dtg is not the same data type as open, high, low, close, then it cannot be in the Postgresql array. So something like:
would result in something that is even less what you want:
You might think about putting open, high, low and close into a Postgresql array (changing the database table structure), but that would have its own pluses and minuses. I am not sure whether any of this is helpful or if I have just confused things more. |
Ah, I seem to remember the homogeneous type requirement being an issue in previous discussions. Perhaps this is the right track then for a generic table->data-frame converter. Loop over all the column names and get them one by one as vectors. It's rare that a data-frame, in an analytics scenario, will have all the same data types. |
Okay, weighing in on mobile so please excuse formatting.
Do I have that about right? If my understanding is correct then I'm happy to play with the profiler and find a computationally cheap way to massage the data from one format to the other. No problem. |
This doesn't eval properly for me. Something about inappropriate usage of |
@nathanvy your understanding is correct. With respect to my alternative query-to-df function. It does NOT use the :vectors result style.
|
Roger that. I suppose we can close this issue now. Thanks both of you. |
One more way this could work using the vector-of-vectors returned by
To create a matrix from a vector of vectors: (aops:combine #(#(0 1) #(2 3))) ; => #2A((0 1)
; (2 3))
(defparameter A #2A((1 2)
(3 4)))
;; Transpose
(aops:each-index (i j) (aref A j i)) ; => #2A((1 3)
; (2 4)) and then (matrix-df #(:a :b) #2A((1 2)
(3 4)))
;#<DATA-FRAME (2 observations of 2 variables)> I don't know how efficient that will be, but it's convenient. @nathanvy, I'd appreciate a report on how well it works when you decide on a method and know it's efficiency. I want to update the |
Will that work with heterogenous data types? It doesn't seem to work for me. Here's a csv of the type of data I'm working with. The columns are, in order:
|
It should work with heterogeneous data types. What error are you encountering? |
(defun query-to-df-2 (ohlcv-data)
(let ((arr (aops:combine ohlcv-data))
(keys #(:dtg :open :high :low :close)))
(df:matrix-df keys (aops:each-index (i j) (aref arr j i)))))
(let* ((the-date (local-time:encode-timestamp 0 0 0 0 1 6 2024))
(data (pomo:with-connection (list (getf *config* :db-name)
(getf *config* :db-username)
(getf *config* :db-pw)
(getf *config* :db-hostname))
(pomo:query (format nil "select dtg, open, high, low, close from ndx where dtg > '~a' and dtg < '~a' order by dtg asc" the-date (local-time:now)) :vectors))))
(query-to-df-2 data))
|
That basically looks correct. As a next step I'd try capturing the output of Edit: I've just tested a round trip from a LS-USER> mtcars
#<DATA-FRAME (32 observations of 12 variables)
Motor Trend Car Road Tests>
LS-USER> (as-array mtcars)
#2A(("Mazda RX4" 21 6 160 110 3.9D0 2.62D0 16.46D0 0 1 4 4) ("Mazda RX4 Wag" 21 6 160 110 3.9D0 2.875D0 17.02D0 0 1 4 4) ("Datsun 710" 22.8D0 4 108 93 3.85D0 2.32D0 18.61D0 1 1 4 1) ("Hornet 4 Drive" 21.4D0 6 258 110 3.08D0 3.215D0 19.44D0 1 0 3 1) ("Hornet Sportabout" 18.7D0 8 360 175 3.15D0 3.44D0 17.02D0 0 0 3 2) ("Valiant" 18.1D0 6 225 105 2.76D0 3.46D0 20.22D0 1 0 3 1) ("Duster 360" 14.3D0 8 360 245 3.21D0 3.57D0 15.84D0 0 0 3 4) ("Merc 240D" 24.4D0 4 146.7D0 62 3.69D0 3.19D0 20 1 0 4 2) ("Merc 230" 22.8D0 4 140.8D0 95 3.92D0 3.15D0 22.9D0 1 0 4 2) ("Merc 280" 19.2D0 6 167.6D0 123 3.92D0 3.44D0 18.3D0 1 0 4 4) ("Merc 280C" 17.8D0 6 167.6D0 123 3.92D0 3.44D0 18.9D0 1 0 4 4) ("Merc 450SE" 16.4D0 8 275.8D0 180 3.07D0 4.07D0 17.4D0 0 0 3 3) ("Merc 450SL" 17.3D0 8 275.8D0 180 3.07D0 3.73D0 17.6D0 0 0 3 3) ("Merc 450SLC" 15.2D0 8 275.8D0 180 3.07D0 3.78D0 18 0 0 3 3) ("Cadillac Fleetwood" 10.4D0 8 472 205 2.93D0 5.25D0 17.98D0 0 0 3 4) ("Lincoln Continental" 10.4D0 8 460 215 3 5.424D0 17.82D0 0 0 3 4) ("Chrysler Imperial" 14.7D0 8 440 230 3.23D0 5.345D0 17.42D0 0 0 3 4) ("Fiat 128" 32.4D0 4 78.7D0 66 4.08D0 2.2D0 19.47D0 1 1 4 1) ("Honda Civic" 30.4D0 4 75.7D0 52 4.93D0 1.615D0 18.52D0 1 1 4 2) ("Toyota Corolla" 33.9D0 4 71.1D0 65 4.22D0 1.835D0 19.9D0 1 1 4 1) ("Toyota Corona" 21.5D0 4 120.1D0 97 3.7D0 2.465D0 20.01D0 1 0 3 1) ("Dodge Challenger" 15.5D0 8 318 150 2.76D0 3.52D0 16.87D0 0 0 3 2) ("AMC Javelin" 15.2D0 8 304 150 3.15D0 3.435D0 17.3D0 0 0 3 2) ("Camaro Z28" 13.3D0 8 350 245 3.73D0 3.84D0 15.41D0 0 0 3 4) ("Pontiac Firebird" 19.2D0 8 400 175 3.08D0 3.845D0 17.05D0 0 0 3 2) ("Fiat X1-9" 27.3D0 4 79 66 4.08D0 1.935D0 18.9D0 1 1 4 1) ("Porsche 914-2" 26 4 120.3D0 91 4.43D0 2.14D0 16.7D0 0 1 5 2) ("Lotus Europa" 30.4D0 4 95.1D0 113 3.77D0 1.513D0 16.9D0 1 1 5 2) ("Ford Pantera L" 15.8D0 8 351 264 4.22D0 3.17D0 14.5D0 0 1 5 4) ("Ferrari Dino" 19.7D0 6 145 175 3.62D0 2.77D0 15.5D0 0 1 5 6) ("Maserati Bora" 15 8 301 335 3.54D0 3.57D0 14.6D0 0 1 5 8) ("Volvo 142E" 21.4D0 4 121 109 4.11D0 2.78D0 18.6D0 1 1 4 2))
LS-USER> (split * 1)
#(#("Mazda RX4" 21 6 160 110 3.9D0 2.62D0 16.46D0 0 1 4 4) #("Mazda RX4 Wag" 21 6 160 110 3.9D0 2.875D0 17.02D0 0 1 4 4) #("Datsun 710" 22.8D0 4 108 93 3.85D0 2.32D0 18.61D0 1 1 4 1) #("Hornet 4 Drive" 21.4D0 6 258 110 3.08D0 3.215D0 19.44D0 1 0 3 1) #("Hornet Sportabout" 18.7D0 8 360 175 3.15D0 3.44D0 17.02D0 0 0 3 2) #("Valiant" 18.1D0 6 225 105 2.76D0 3.46D0 20.22D0 1 0 3 1) #("Duster 360" 14.3D0 8 360 245 3.21D0 3.57D0 15.84D0 0 0 3 4) #("Merc 240D" 24.4D0 4 146.7D0 62 3.69D0 3.19D0 20 1 0 4 2) #("Merc 230" 22.8D0 4 140.8D0 95 3.92D0 3.15D0 22.9D0 1 0 4 2) #("Merc 280" 19.2D0 6 167.6D0 123 3.92D0 3.44D0 18.3D0 1 0 4 4) #("Merc 280C" 17.8D0 6 167.6D0 123 3.92D0 3.44D0 18.9D0 1 0 4 4) #("Merc 450SE" 16.4D0 8 275.8D0 180 3.07D0 4.07D0 17.4D0 0 0 3 3) #("Merc 450SL" 17.3D0 8 275.8D0 180 3.07D0 3.73D0 17.6D0 0 0 3 3) #("Merc 450SLC" 15.2D0 8 275.8D0 180 3.07D0 3.78D0 18 0 0 3 3) #("Cadillac Fleetwood" 10.4D0 8 472 205 2.93D0 5.25D0 17.98D0 0 0 3 4) #("Lincoln Continental" 10.4D0 8 460 215 3 5.424D0 17.82D0 0 0 3 4) #("Chrysler Imperial" 14.7D0 8 440 230 3.23D0 5.345D0 17.42D0 0 0 3 4) #("Fiat 128" 32.4D0 4 78.7D0 66 4.08D0 2.2D0 19.47D0 1 1 4 1) #("Honda Civic" 30.4D0 4 75.7D0 52 4.93D0 1.615D0 18.52D0 1 1 4 2) #("Toyota Corolla" 33.9D0 4 71.1D0 65 4.22D0 1.835D0 19.9D0 1 1 4 1) #("Toyota Corona" 21.5D0 4 120.1D0 97 3.7D0 2.465D0 20.01D0 1 0 3 1) #("Dodge Challenger" 15.5D0 8 318 150 2.76D0 3.52D0 16.87D0 0 0 3 2) #("AMC Javelin" 15.2D0 8 304 150 3.15D0 3.435D0 17.3D0 0 0 3 2) #("Camaro Z28" 13.3D0 8 350 245 3.73D0 3.84D0 15.41D0 0 0 3 4) #("Pontiac Firebird" 19.2D0 8 400 175 3.08D0 3.845D0 17.05D0 0 0 3 2) #("Fiat X1-9" 27.3D0 4 79 66 4.08D0 1.935D0 18.9D0 1 1 4 1) #("Porsche 914-2" 26 4 120.3D0 91 4.43D0 2.14D0 16.7D0 0 1 5 2) #("Lotus Europa" 30.4D0 4 95.1D0 113 3.77D0 1.513D0 16.9D0 1 1 5 2) #("Ford Pantera L" 15.8D0 8 351 264 4.22D0 3.17D0 14.5D0 0 1 5 4) #("Ferrari Dino" 19.7D0 6 145 175 3.62D0 2.77D0 15.5D0 0 1 5 6) #("Maserati Bora" 15 8 301 335 3.54D0 3.57D0 14.6D0 0 1 5 8) #("Volvo 142E" 21.4D0 4 121 109 4.11D0 2.78D0 18.6D0 1 1 4 2))
LS-USER> (combine *)
#2A(("Mazda RX4" 21 6 160 110 3.9D0 2.62D0 16.46D0 0 1 4 4) ("Mazda RX4 Wag" 21 6 160 110 3.9D0 2.875D0 17.02D0 0 1 4 4) ("Datsun 710" 22.8D0 4 108 93 3.85D0 2.32D0 18.61D0 1 1 4 1) ("Hornet 4 Drive" 21.4D0 6 258 110 3.08D0 3.215D0 19.44D0 1 0 3 1) ("Hornet Sportabout" 18.7D0 8 360 175 3.15D0 3.44D0 17.02D0 0 0 3 2) ("Valiant" 18.1D0 6 225 105 2.76D0 3.46D0 20.22D0 1 0 3 1) ("Duster 360" 14.3D0 8 360 245 3.21D0 3.57D0 15.84D0 0 0 3 4) ("Merc 240D" 24.4D0 4 146.7D0 62 3.69D0 3.19D0 20 1 0 4 2) ("Merc 230" 22.8D0 4 140.8D0 95 3.92D0 3.15D0 22.9D0 1 0 4 2) ("Merc 280" 19.2D0 6 167.6D0 123 3.92D0 3.44D0 18.3D0 1 0 4 4) ("Merc 280C" 17.8D0 6 167.6D0 123 3.92D0 3.44D0 18.9D0 1 0 4 4) ("Merc 450SE" 16.4D0 8 275.8D0 180 3.07D0 4.07D0 17.4D0 0 0 3 3) ("Merc 450SL" 17.3D0 8 275.8D0 180 3.07D0 3.73D0 17.6D0 0 0 3 3) ("Merc 450SLC" 15.2D0 8 275.8D0 180 3.07D0 3.78D0 18 0 0 3 3) ("Cadillac Fleetwood" 10.4D0 8 472 205 2.93D0 5.25D0 17.98D0 0 0 3 4) ("Lincoln Continental" 10.4D0 8 460 215 3 5.424D0 17.82D0 0 0 3 4) ("Chrysler Imperial" 14.7D0 8 440 230 3.23D0 5.345D0 17.42D0 0 0 3 4) ("Fiat 128" 32.4D0 4 78.7D0 66 4.08D0 2.2D0 19.47D0 1 1 4 1) ("Honda Civic" 30.4D0 4 75.7D0 52 4.93D0 1.615D0 18.52D0 1 1 4 2) ("Toyota Corolla" 33.9D0 4 71.1D0 65 4.22D0 1.835D0 19.9D0 1 1 4 1) ("Toyota Corona" 21.5D0 4 120.1D0 97 3.7D0 2.465D0 20.01D0 1 0 3 1) ("Dodge Challenger" 15.5D0 8 318 150 2.76D0 3.52D0 16.87D0 0 0 3 2) ("AMC Javelin" 15.2D0 8 304 150 3.15D0 3.435D0 17.3D0 0 0 3 2) ("Camaro Z28" 13.3D0 8 350 245 3.73D0 3.84D0 15.41D0 0 0 3 4) ("Pontiac Firebird" 19.2D0 8 400 175 3.08D0 3.845D0 17.05D0 0 0 3 2) ("Fiat X1-9" 27.3D0 4 79 66 4.08D0 1.935D0 18.9D0 1 1 4 1) ("Porsche 914-2" 26 4 120.3D0 91 4.43D0 2.14D0 16.7D0 0 1 5 2) ("Lotus Europa" 30.4D0 4 95.1D0 113 3.77D0 1.513D0 16.9D0 1 1 5 2) ("Ford Pantera L" 15.8D0 8 351 264 4.22D0 3.17D0 14.5D0 0 1 5 4) ("Ferrari Dino" 19.7D0 6 145 175 3.62D0 2.77D0 15.5D0 0 1 5 6) ("Maserati Bora" 15 8 301 335 3.54D0 3.57D0 14.6D0 0 1 5 8) ("Volvo 142E" 21.4D0 4 121 109 4.11D0 2.78D0 18.6D0 1 1 4 2))
LS-USER> (matrix-df (keys mtcars) *)
#<DATA-FRAME (32 observations of 12 variables)> |
They are, but it looks like there are dozens of zero-elements towards the end of the outer vector. With the |
Hi,
In my application I would like to query a Postgres database using postmodern and then perform some analysis with lisp-stat.
postmodern
returns data as a list of rows, each of those a list. For example for a tableFOO
with columnsid
andbar
, a select statement might return:And in lisp that looks like:
I would like to get the list that postmodern returns into a lisp-stat data frame. Is there a convenient way to do this? Could I just walk the list from postgres using
loop
and then build up a dataframe row-by-row?The text was updated successfully, but these errors were encountered: