- `model_name`: The name of the LLM model used by the node.
- `prompt`: Manually set the prompt for the specific model in the configuration. The prompt can either be passed in as a string of text or as a path to a text file containing the desired prompting.
- `service`: Specifies the service for running the LLM inference. (Set to `nvfoundation` if using NIM.)
- `max_tokens`: Defines the maximum number of tokens that can be generated in one output step.
- `temperature`: Controls randomness in the output. A lower temperature produces more deterministic results.
- `return_intermediate_steps`: Controls whether to return intermediate steps taken by the agent, and include them in the output file. Helpful for troubleshooting agent responses.
- `return_source_documents`: Controls whether to return source documents from the VDB tools, and include them in the intermediate steps output. Helpful for identifying the source files used in agent responses.
  - Note: Enabling this will also include source documents in the agent's memory and increase the agent's prompt length.
- `max_concurrency`: Controls the maximum number of concurrent requests to the LLM. Default is `None`, which doesn't limit concurrency.
- Embedding model for generating VDB for RAG: `rag_embedding`
- `_type`: Defines the source of the model used for generating embeddings (e.g., `nim`, `huggingface`, `openai`).
- Other model-dependent parameters, such as `model`/`model_name`, `api_key`, `truncate`, or `encode_kwargs`: see the [embedding model customization](#customizing-the-embedding-model) section below for more details.
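
As a hedged sketch, these settings might sit together in the config file roughly as follows. The `engine.agent` nesting is inferred from the `engine.agent.max_concurrency` path used in the troubleshooting section below, while the placement of `rag_embedding`, the model names, and the example values are illustrative assumptions rather than values copied from a shipped config:

```
"engine": {
    "agent": {
        "model_name": "meta/llama-3.1-405b-instruct",
        "prompt": "./data/agent_prompt.txt",
        "service": "nvfoundation",
        "max_tokens": 2000,
        "temperature": 0.0,
        "return_intermediate_steps": true,
        "return_source_documents": false,
        "max_concurrency": 5
    },
    "rag_embedding": {
        "_type": "nim",
        "model": "nvidia/nv-embedqa-e5-v5",
        "truncate": "END"
    }
}
```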
To post the output to an HTTP endpoint, update the JSON object in the config file as follows, replacing the domain, port, and endpoint with the desired destination (note the trailing slash in the "url" field). The output will be sent as JSON data.
NVIDIA offers optimized models and tools like NIMs ([build.nvidia.com/explore/retrieval](https://build.nvidia.com/explore/retrieval)) and cuVS ([github.com/rapidsai/cuvs](https://github.com/rapidsai/cuvs)).
### Service errors
#### National Vulnerability Database (NVD)
These errors typically resolve on their own. Please wait and try running the pipeline again later.

#### Rate limit (429) errors

429 errors can occur when your requests exceed the rate limit for the model. Try setting `engine.agent.max_concurrency` to a low value, such as 5, to reduce the rate of requests. Example error:
```
Exception: [429] Too Many Requests
```
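
As a hedged sketch, assuming the dotted path maps directly onto the config file's JSON nesting, the fix looks like:

```
"engine": {
    "agent": {
        "max_concurrency": 5
    }
}
```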
### Running out of credits
If you run out of credits for the NVIDIA API Catalog, you will need to obtain more credits to continue using the API. Please contact your NVIDIA representative to get more credits added.