diff --git a/docs/document_loader.ipynb b/docs/document_loader.ipynb index 7be30a9b..a098657d 100644 --- a/docs/document_loader.ipynb +++ b/docs/document_loader.ipynb @@ -229,34 +229,6 @@ ")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a table (if not already exists)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_google_cloud_sql_pg import Column\n", - "\n", - "await engine.ainit_document_table(\n", - " table_name=TABLE_NAME,\n", - " content_column=\"product_name\",\n", - " metadata_columns=[\n", - " Column(\"id\", \"SERIAL\", nullable=False),\n", - " Column(\"content\", \"VARCHAR\", nullable=False),\n", - " Column(\"description\", \"VARCHAR\", nullable=False),\n", - " ],\n", - " metadata_json_column=\"metadata\",\n", - " store_metadata=True,\n", - ")" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -286,21 +258,20 @@ ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "z-AZyzAQ7bsf" - }, - "outputs": [], + "cell_type": "markdown", + "metadata": {}, "source": [ - "from langchain_google_cloud_sql_pg import PostgresLoader\n", - "\n", - "# Creating a basic PostgreSQL object\n", - "loader = await PostgresLoader.create(\n", - " engine,\n", - " table_name=TABLE_NAME,\n", - " # schema_name=SCHEMA_NAME,\n", - ")" + "When creating a `PostgresLoader` to fetch data from Cloud SQL for PostgreSQL, you have two main options to specify the data you want to load:\n", + "* using the `table_name` argument - When you specify the `table_name` argument, the loader fetches all the data from the given table.\n", + "* using the `query` argument - When you specify the `query` argument, you can provide a custom SQL query to fetch the data. 
This gives you full control over the SQL query, including selecting specific columns, applying filters, sorting, and joining tables.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load Documents using the `table_name` argument" ] }, { @@ -309,7 +280,7 @@ "id": "PeOMpftjc9_e" }, "source": [ - "### Load Documents via default table\n", + "#### Load Documents via default table\n", "The loader returns a list of Documents from the table using the first column as page_content and all other columns as metadata. The default table will have the first column as\n", "page_content and the second column as metadata (JSON). Each row becomes a document. \n", "\n", @@ -343,7 +314,7 @@ "id": "kSkL9l1Hc9_e" }, "source": [ - "### Load documents via custom table/metadata or custom page content columns" + "#### Load documents via custom table/metadata or custom page content columns" ] }, { @@ -363,6 +334,42 @@ "print(docs)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load Documents using a SQL query\n", + "The `query` parameter lets you specify a custom SQL query, which can include filters, to load specific documents from a database." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_name = \"products\"\n", + "content_columns = [\"product_name\", \"description\"]\n", + "metadata_columns = [\"id\", \"content\"]\n", + "\n", + "loader = await PostgresLoader.create(\n", + " engine=engine,\n", + " query=f\"SELECT * FROM {table_name};\",\n", + " content_columns=content_columns,\n", + " metadata_columns=metadata_columns,\n", + ")\n", + "\n", + "docs = await loader.aload()\n", + "print(docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: If `content_columns` and `metadata_columns` are not specified, the loader treats the first returned column as the document’s `page_content` and all subsequent columns as `metadata`." + ] + }, { "cell_type": "markdown", "metadata": { @@ -396,10 +403,45 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create PostgresSaver\n", + "## Create PostgresSaver\n", "The `PostgresSaver` allows for saving of pre-processed documents to the table using the first column as page_content and all other columns as metadata. This table can easily be loaded via a Document Loader or updated to be a VectorStore. The default table will have the first column as page_content and the second column as metadata (JSON)." 
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a table (if not already exists)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import Column\n", + "\n", + "await engine.ainit_document_table(\n", + " table_name=TABLE_NAME,\n", + " content_column=\"product_name\",\n", + " metadata_columns=[\n", + " Column(\"id\", \"SERIAL\", nullable=False),\n", + " Column(\"content\", \"VARCHAR\", nullable=False),\n", + " Column(\"description\", \"VARCHAR\", nullable=False),\n", + " ],\n", + " metadata_json_column=\"metadata\",\n", + " store_metadata=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create PostgresSaver" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/docs/vector_store.ipynb b/docs/vector_store.ipynb index a253b3a6..4071dbdf 100644 --- a/docs/vector_store.ipynb +++ b/docs/vector_store.ipynb @@ -586,6 +586,37 @@ "\n", "print(docs)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Search for documents without Vector Store.\n", + "You may want to search documents based on Document metadata as a tool or as a part of an exploratory workflow. The Document Loader can be used to customize the search and load data in the form of Documents from your database. 
Learn how to ['Load Documents using a SQL query'](https://github.com/googleapis/langchain-google-cloud-sql-pg-python/blob/main/docs/document_loader.ipynb)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_google_cloud_sql_pg import PostgresLoader\n", + "\n", + "table_name = \"products\"\n", + "content_columns = [\"product_name\", \"description\"]\n", + "metadata_columns = [\"id\", \"content\"]\n", + "\n", + "loader = await PostgresLoader.create(\n", + " engine=engine,\n", + " query=f\"SELECT * FROM {table_name};\",\n", + " content_columns=content_columns,\n", + " metadata_columns=metadata_columns,\n", + ")\n", + "\n", + "docs = await loader.aload()\n", + "print(docs)" + ] } ], "metadata": {