non-streaming DuckDBClient.query #1024

Merged 3 commits on Mar 8, 2024
6 changes: 3 additions & 3 deletions docs/sql.md
@@ -44,7 +44,7 @@ SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
 SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
 ```

-This returns an array of 10 rows, inspected here:
+This returns an array of 10 rows as an [Apache Arrow](./lib/arrow) table, inspected here:

 ```js echo
 top10
@@ -144,11 +144,11 @@ Plot.plot({
 SQL fenced code blocks are shorthand for the `sql` tagged template literal. You can invoke the `sql` tagged template literal directly like so:

 ```js echo
-const rows = await sql`SELECT random() AS random`;
+const [row] = await sql`SELECT random() AS random`;
 ```

 ```js echo
-rows[0].random
+row.random
 ```

 The `sql` tag is useful for querying data within JavaScript, such as to query data for visualization without needing to create a separate SQL code block and giving the data a name. For example, below we use DuckDB to bin stars by brightness, and then visualize the bins as a histogram using a [rect mark](https://observablehq.com/plot/marks/rect).
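
Note on the `rows[0].random` to `const [row] = …` change above: the query result is now an Arrow table rather than a JavaScript array, and to my knowledge an Arrow table is iterable and exposes `get(i)` but is not indexable with square brackets. The sketch below illustrates that difference with a hypothetical stand-in class (`FakeTable` is an assumption made for illustration, not the Apache Arrow implementation), so that it runs without any dependencies.

```js
// Hypothetical stand-in for an Arrow table: iterable and addressable via
// get(i), but (unlike an array) not indexable with square brackets.
class FakeTable {
  constructor(rows) {
    this._rows = rows;
    this.numRows = rows.length;
  }
  get(i) {
    return this._rows[i] ?? null;
  }
  *[Symbol.iterator]() {
    yield* this._rows;
  }
}

const table = new FakeTable([{random: 0.25}, {random: 0.75}]);

// Destructuring uses the iterator protocol, so it works on any iterable.
const [row] = table;

// Materialize a plain array when array methods or index access are needed.
const rows = Array.from(table);

console.log(row.random, rows.length, table[0]); // 0.25 2 undefined
```

Code that previously relied on array methods such as `rows.map(…)` would presumably need a similar `Array.from(…)` step, or could iterate over the table directly.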
66 changes: 13 additions & 53 deletions src/client/stdlib/duckdb.js
@@ -91,7 +91,7 @@ export class DuckDBClient {
       throw error;
     }
     return {
-      schema: getArrowTableSchema(batch.value),
+      schema: batch.value.schema,

Review comment (Member, Author):

We’re now deviating from the DatabaseClient specification used in Observable notebooks, but I think it’s better to standardize on the Apache Arrow schema than on our own specification. (Also, this schema is unused in Framework: it exists in notebooks to power the data table cell, and we haven’t written that in Framework yet, see #23, but when we do, it should be based on the Apache Arrow schema.)

       async *readRows() {
         try {
           while (!batch.done) {
@@ -106,15 +106,20 @@ export class DuckDBClient {
   }

   async query(query, params) {
-    const result = await this.queryStream(query, params);
-    const results = [];
-    for await (const rows of result.readRows()) {
-      for (const row of rows) {
-        results.push(row);
+    const connection = await this._db.connect();
+    let table;
+    try {
+      if (params?.length > 0) {
+        const statement = await connection.prepare(query);
+        table = await statement.query(...params);
+      } else {
+        table = await connection.query(query);
       }
+    } catch (error) {
+      await connection.close();
+      throw error;
     }
-    results.schema = result.schema;
-    return results;
+    return table;
   }

   async queryRow(query, params) {
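
Note on the `query` hunk above: as far as the diff shows, `connection.close()` is called only inside the `catch` block, so the connection is closed on failure but not visibly on success (it may be released elsewhere, outside this hunk). The sketch below shows one possible alternative that uses `finally` so that `close()` runs on both paths. It is not the code merged in this PR. `makeFakeConnection` and `queryWithFinally` are hypothetical names, and the fake connection only mimics the `prepare`, `query`, and `close` calls that appear in the hunk.

```js
// Hypothetical stand-in for a DuckDB connection; it only records which
// methods were called and optionally fails when a query is executed.
function makeFakeConnection({fail = false} = {}) {
  const calls = [];
  return {
    calls,
    async query(sql) {
      calls.push("query");
      if (fail) throw new Error("query failed");
      return {sql, params: []};
    },
    async prepare(sql) {
      calls.push("prepare");
      return {
        async query(...params) {
          calls.push("statement.query");
          if (fail) throw new Error("query failed");
          return {sql, params};
        }
      };
    },
    async close() {
      calls.push("close");
    }
  };
}

// Same branching as the hunk above, but with `finally` instead of `catch`,
// so close() runs whether or not the query throws.
async function queryWithFinally(connection, query, params) {
  try {
    if (params?.length > 0) {
      const statement = await connection.prepare(query);
      return await statement.query(...params);
    }
    return await connection.query(query);
  } finally {
    await connection.close();
  }
}

const connection = makeFakeConnection();
queryWithFinally(connection, "SELECT ?::INT AS x", [1]).then(() => {
  console.log(connection.calls); // [ 'prepare', 'statement.query', 'close' ]
});
```

Whether closing the connection immediately is safe depends on whether the returned table is fully materialized before `close()` runs. The non-streaming `query` calls in the hunk suggest it is, but that is an assumption here.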
@@ -361,48 +366,3 @@ function isArrowTable(value) {
     Array.isArray(value.schema.fields)
   );
 }
-
-function getArrowTableSchema(table) {
-  return table.schema.fields.map(getArrowFieldSchema);
-}
-
-function getArrowFieldSchema(field) {
-  return {
-    name: field.name,
-    type: getArrowType(field.type),
-    nullable: field.nullable,
-    databaseType: `${field.type}`
-  };
-}
-
-// https://github.com/apache/arrow/blob/89f9a0948961f6e94f1ef5e4f310b707d22a3c11/js/src/enum.ts#L140-L141
-function getArrowType(type) {
-  switch (type.typeId) {
-    case 2: // Int
-      return "integer";
-    case 3: // Float
-    case 7: // Decimal
-      return "number";
-    case 4: // Binary
-    case 15: // FixedSizeBinary
-      return "buffer";
-    case 5: // Utf8
-      return "string";
-    case 6: // Bool
-      return "boolean";
-    case 8: // Date
-    case 9: // Time
-    case 10: // Timestamp
-      return "date";
-    case 12: // List
-    case 16: // FixedSizeList
-      return "array";
-    case 13: // Struct
-    case 14: // Union
-      return "object";
-    case 11: // Interval
-    case 17: // Map
-    default:
-      return "other";
-  }
-}