Fix(docs): correct multiple typos in pdf_parsing_for_semantic_retrieval_systems.ipynb #573

Open · wants to merge 1 commit into base: main
16 changes: 8 additions & 8 deletions site/en/docs/pdf_parsing_for_semantic_retrieval_systems.ipynb
@@ -742,7 +742,7 @@
" questions: list[str],\n",
" links: list[str],\n",
" ) -> dict[str, Any]:\n",
" \"\"\"Structred data extraction from image analysis.\"\"\"\n",
" \"\"\"Structured data extraction from image analysis.\"\"\"\n",
" return {\n",
" 'title': title,\n",
" 'key_words': key_words,\n",
@@ -774,7 +774,7 @@
" function_declarations=[glm.FunctionDeclaration(\n",
" name=\"structured_data_extraction\",\n",
" description=textwrap.dedent(\"\"\"\\\n",
" Structred data extraction from image analysis.\n",
" Structured data extraction from image analysis.\n",
" \"\"\"),\n",
" parameters=glm.Schema(\n",
" type=glm.Type.OBJECT,\n",
@@ -815,20 +815,20 @@
" \"\"\"Extracts metadata from the image provided and returns it in a structured dict.\"\"\"\n",
" prompt = textwrap.dedent(f\"\"\"\n",
" You are an expert image analyzer. Given an image of a PDF page, your job is to write the following for each and every image.\n",
" 1. Generate key-words that matches the content from the image. (at most 10.)\n",
" 1. Generate key-words that match the content from the image. (at most 10.)\n",
" 2. Suggest a one-word title for the image.\n",
" 3. Generate 1-2 short questions from the image.\n",
" 4. Extract links that are present in the image.\n",
"\n",
" Your answer should follow the following format.\n",
" ** 1. Key-words**\n",
" [list of relevant key-words to descibe the content of the image]\n",
" [list of relevant key-words to describe the content of the image]\n",
"\n",
" **2. Title**\n",
" Suggest a one-word title based on the content here.\n",
"\n",
" **3. Questions**\n",
" [lst of generated questions here...]\n",
" [list of generated questions here...]\n",
" ....\n",
"\n",
" **4. Links**\n",
@@ -952,7 +952,7 @@
"id": "-1q_v21t2E94"
},
"source": [
"Neat! The models were successfuly able to extract your custom metadata from the given information sources!"
"Neat! The models were successfully able to extract your custom metadata from the given information sources!"
]
},
{
@@ -1019,7 +1019,7 @@
" is_separator_regex=False,\n",
" )\n",
"\n",
" # iter through all PDF files.\n",
" # iterate through all PDF files.\n",
" for filename, file_bytes in pdfs.items():\n",
" print(f\"Extracting data from file: {filename}\")\n",
"\n",
@@ -1239,7 +1239,7 @@
"id": "vZl-A8EMVCZu"
},
"source": [
"`relevant_chunks` has chunks that matched our search results. Each chunk returned has a `chunk_relevance_score` and `chunk`. Where `chunk_relevance_score` deontes the degree to which the `user_query` is semantically similar to the contents from `chunk`."
"`relevant_chunks` has chunks that matched our search results. Each chunk returned has a `chunk_relevance_score` and `chunk`. Where `chunk_relevance_score` denotes the degree to which the `user_query` is semantically similar to the contents from `chunk`."
]
},
{
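Note: for the `relevant_chunks` description in the last hunk, a hedged sketch of inspecting the results, assuming the response structure of the Semantic Retriever API and that `relevant_chunks` comes from a corpus query earlier in the notebook:

    for relevant_chunk in relevant_chunks:
        # Higher scores mean the chunk is semantically closer to user_query.
        print(f"score: {relevant_chunk.chunk_relevance_score:.3f}")
        print(relevant_chunk.chunk.data.string_value)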