diff --git a/docs/plugins/vertex-ai.md b/docs/plugins/vertex-ai.md
index 65f59ce712..36f41c02a8 100644
--- a/docs/plugins/vertex-ai.md
+++ b/docs/plugins/vertex-ai.md
@@ -2,21 +2,22 @@

 The Vertex AI plugin provides interfaces to several AI services:

-* [Google generative AI models](https://cloud.google.com/vertex-ai/generative-ai/docs/):
-  * Gemini text generation
-  * Imagen2 and Imagen3 image generation
-  * Text embedding generation
-  * Multimodal embedding generation
-* A subset of evaluation metrics through the Vertex AI [Rapid Evaluation API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/evaluation):
-  * [BLEU](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#bleuinput)
-  * [ROUGE](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#rougeinput)
-  * [Fluency](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#fluencyinput)
-  * [Safety](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#safetyinput)
-  * [Groundeness](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#groundednessinput)
-  * [Summarization Quality](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#summarizationqualityinput)
-  * [Summarization Helpfulness](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#summarizationhelpfulnessinput)
-  * [Summarization Verbosity](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#summarizationverbosityinput)
-* [Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview)
+* [Google generative AI models](https://cloud.google.com/vertex-ai/generative-ai/docs/):
+  * Gemini text generation
+  * Imagen2 and Imagen3 image generation
+  * Text embedding generation
+  * Multimodal embedding generation
+* A subset of evaluation metrics through the Vertex
+  AI [Rapid Evaluation API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/evaluation):
+  * [BLEU](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#bleuinput)
+  * [ROUGE](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#rougeinput)
+  * [Fluency](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#fluencyinput)
+  * [Safety](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#safetyinput)
+  * [Groundedness](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#groundednessinput)
+  * [Summarization Quality](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#summarizationqualityinput)
+  * [Summarization Helpfulness](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#summarizationhelpfulnessinput)
+  * [Summarization Verbosity](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations/evaluateInstances#summarizationverbosityinput)
+* [Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview)

 ## Installation

@@ -24,7 +25,9 @@
 npm i --save @genkit-ai/vertexai
 ```
-If you want to locally run flows that use this plugin, you also need the [Google Cloud CLI tool](https://cloud.google.com/sdk/docs/install) installed.
+If you want to locally run flows that use this plugin, you also need
+the [Google Cloud CLI tool](https://cloud.google.com/sdk/docs/install)
+installed.

 ## Configuration

@@ -41,21 +44,39 @@
 });
 ```

-The plugin requires you to specify your Google Cloud project ID, the [region](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations) to which you want to make Vertex API requests, and your Google Cloud project credentials.
+The plugin requires you to specify your Google Cloud project ID,
+the [region](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations)
+to which you want to make Vertex API requests, and your Google Cloud project
+credentials.
+
+* You can specify your Google Cloud project ID either by setting `projectId` in
+  the `vertexAI()` configuration or by setting the `GCLOUD_PROJECT` environment
+  variable. If you're running your flow from a Google Cloud environment (Cloud
+  Functions, Cloud Run, and so on), `GCLOUD_PROJECT` is automatically set to the
+  project ID of the environment.
+* You can specify the API location either by setting `location` in the
+  `vertexAI()` configuration or by setting the `GCLOUD_LOCATION` environment
+  variable.
+* To provide API credentials, you need to set up Google Cloud Application
+  Default Credentials.
+  1. To specify your credentials:
+
+     * If you're running your flow from a Google Cloud environment (Cloud
+       Functions, Cloud Run, and so on), this is set automatically.
+     * On your local dev environment, do this by running:
+
+       ```posix-terminal
+       gcloud auth application-default login --project YOUR_PROJECT_ID
+       ```

-* You can specify your Google Cloud project ID either by setting `projectId` in the `vertexAI()` configuration or by setting the `GCLOUD_PROJECT` environment variable. If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), `GCLOUD_PROJECT` is automatically set to the project ID of the environment.
-* You can specify the API location either by setting `location` in the `vertexAI()` configuration or by setting the `GCLOUD_LOCATION` environment variable.
-* To provide API credentials, you need to set up Google Cloud Application Default Credentials.
-  1. To specify your credentials:
-     * If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), this is set automatically.
-     * On your local dev environment, do this by running:
+     * For other environments, see
+       the [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc)
+       docs.

-     ```posix-terminal
-     gcloud auth application-default login --project YOUR_PROJECT_ID
-     ```
-
-     * For other environments, see the [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) docs.
-  1. In addition, make sure the account is granted the Vertex AI User IAM role (`roles/aiplatform.user`). See the Vertex AI [access control](https://cloud.google.com/vertex-ai/generative-ai/docs/access-control) docs.
+  1. In addition, make sure the account is granted the Vertex AI User IAM role
+     (`roles/aiplatform.user`). See the Vertex AI
+     [access control](https://cloud.google.com/vertex-ai/generative-ai/docs/access-control)
+     docs.
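+
+For example, a minimal configuration sketch that relies on the environment
+variables described above instead of explicit options (this assumes
+`GCLOUD_PROJECT` and `GCLOUD_LOCATION` are already set and Application Default
+Credentials are configured):
+
+```ts
+import { genkit } from 'genkit';
+import { vertexAI } from '@genkit-ai/vertexai';
+
+// With no options, the plugin falls back to GCLOUD_PROJECT, GCLOUD_LOCATION,
+// and Application Default Credentials.
+const ai = genkit({
+  plugins: [vertexAI()],
+});
+```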
 ## Usage

@@ -80,9 +101,15 @@
 const llmResponse = await ai.generate({
 });
 ```

-This plugin also supports grounding Gemini text responses using [Google Search](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#web-ground-gemini) or [your own data](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#private-ground-gemini).
+This plugin also supports grounding Gemini text responses
+using [Google Search](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#web-ground-gemini)
+or [your own data](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#private-ground-gemini).

-Important: Vertex AI charges a fee for grounding requests in addition to the cost of making LLM requests. See the [Vertex AI pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing) page and be sure you understand grounding request pricing before you use this feature.
+Important: Vertex AI charges a fee for grounding requests in addition to the
+cost of making LLM requests. See
+the [Vertex AI pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing)
+page and be sure you understand grounding request pricing before you use this
+feature.

 Example:

@@ -110,13 +137,15 @@
 })
 ```

-This plugin also statically exports a reference to the Gecko text embedding model:
+This plugin also statically exports a reference to the Gecko text embedding
+model:

 ```ts
 import { textEmbedding004 } from '@genkit-ai/vertexai';
 ```

-You can use this reference to specify which embedder an indexer or retriever uses. For example, if you use Chroma DB:
+You can use this reference to specify which embedder an indexer or retriever
+uses. For example, if you use Chroma DB:

 ```ts
 const ai = genkit({
@@ -150,7 +179,7 @@
 This plugin can also handle multimodal embeddings:

 ```ts
 import { multimodalEmbedding001, vertexAI } from '@genkit-ai/vertexai';

 const ai = genkit({
-  plugins: [vertextAI({location: 'us-central1' })],
+  plugins: [vertexAI({ location: 'us-central1' })],
 });

 const embeddings = await ai.embed({
@@ -198,7 +227,7 @@
 const response = await ai.generate({
   model: imagen3,
   output: { format: 'media' },
   prompt: [
-    { media: { url: `data:image/png;base64,${baseImg}` }},
+    { media: { url: `data:image/png;base64,${baseImg}` } },
     { media: { url: `data:image/png;base64,${maskImg}` },
       metadata: { type: 'mask' },
@@ -215,11 +244,16 @@
 return response.media();
 ```

-Refer to [Imagen model documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/imagen-api#edit_images_2) for more detailed options.
+Refer
+to [Imagen model documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/imagen-api#edit_images_2)
+for more detailed options.

 #### Anthropic Claude 3 on Vertex AI Model Garden

-If you have access to Claude 3 models ([haiku](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-haiku), [sonnet](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-sonnet) or [opus](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-opus)) in Vertex AI Model Garden you can use them with Genkit.
+If you have access to Claude 3
+models ([haiku](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-haiku), [sonnet](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-sonnet),
+or [opus](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-opus))
+in Vertex AI Model Garden, you can use them with Genkit.

 Here's a sample configuration for enabling Vertex AI Model Garden models:

@@ -251,9 +285,67 @@
 });
 ```

+##### Prompt Caching
+
+> Anthropic cache control is in a pre-General Availability (GA) state on Google
+Vertex. For more information, see the Google Vertex
+Anthropic [prompt caching](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude-prompt-caching)
+documentation.
+
+The Anthropic Claude models offer prompt caching to reduce latency and costs
+when reusing the same content in multiple requests. When you send a query, you
+can cache all or specific parts of your input so that subsequent queries can use
+the cached results from the previous request. This avoids additional compute and
+network costs. Caches are unique to your Google Cloud project and cannot be used
+by other projects.
+
+For details about how to structure your prompts, see the Anthropic [Prompt
+caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
+documentation.
+
+When using Anthropic models, you can set up cache control for system messages.
+You can enable this feature by adding a `cacheControl` property to the `custom`
+field of a message. The only currently supported cache type is `ephemeral`.
+
+For example, to use cache control for a specific system message, you would set
+it up like this:
+
+```ts
+const llmResponse = await ai.generate({
+  model: claude3Sonnet, // or another Anthropic model
+  messages: [
+    {
+      role: 'system',
+      content: [
+        {
+          text: 'This is an important instruction that can be cached.',
+          custom: {
+            cacheControl: {
+              type: 'ephemeral',
+            },
+          },
+        },
+      ],
+    },
+    {
+      role: 'user',
+      content: [{ text: 'What should I do when I visit Melbourne?' }],
+    },
+  ],
+});
+```
+
+Using this feature allows the Anthropic model to cache certain system messages,
+potentially reducing response times and costs on subsequent requests.
+
 #### Llama 3.1 405b on Vertex AI Model Garden

-First you'll need to enable [Llama 3.1 API Service](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama3-405b-instruct-maas) in Vertex AI Model Garden.
+First you'll need to
+enable [Llama 3.1 API Service](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama3-405b-instruct-maas)
+in Vertex AI Model Garden.

 Here's sample configuration for Llama 3.1 405b in Vertex AI plugin:

@@ -282,7 +374,10 @@
 const llmResponse = await ai.generate({

 #### Mistral Models on Vertex AI Model Garden

-If you have access to Mistral models ([Mistral Large](https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/mistral-large), [Mistral Nemo](https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/mistral-nemo), or [Codestral](https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/codestral)) in Vertex AI Model Garden, you can use them with Genkit.
+If you have access to Mistral
+models ([Mistral Large](https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/mistral-large), [Mistral Nemo](https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/mistral-nemo),
+or [Codestral](https://console.cloud.google.com/vertex-ai/publishers/mistralai/model-garden/codestral))
+in Vertex AI Model Garden, you can use them with Genkit.

 Here's a sample configuration for enabling Vertex AI Model Garden models:

@@ -322,6 +417,7 @@
 ```

 The models support:
+
 - `mistralLarge`: Latest Mistral large model with function calling capabilities
 - `mistralNemo`: Optimized for efficiency and speed
 - `codestral`: Specialized for code generation tasks
@@ -346,7 +442,8 @@
 for await (const chunk of response.stream) {

 ### Evaluators

-To use the evaluators from Vertex AI Rapid Evaluation, add an `evaluation` block to your `vertexAI` plugin configuration.
+To use the evaluators from Vertex AI Rapid Evaluation, add an `evaluation` block
+to your `vertexAI` plugin configuration.

 ```ts
 import { genkit } from 'genkit';
@@ -373,34 +470,69 @@
 });
 ```

-The configuration above adds evaluators for the `Safety` and `ROUGE` metrics. The example shows two approaches- the `Safety` metric uses the default specification, whereas the `ROUGE` metric provides a customized specification that sets the rouge type to `rougeLsum`.
+The configuration above adds evaluators for the `Safety` and `ROUGE` metrics.
+The example shows two approaches: the `Safety` metric uses the default
+specification, whereas the `ROUGE` metric provides a customized specification
+that sets the rouge type to `rougeLsum`.

-Both evaluators can be run using the `genkit eval:run` command with a compatible dataset: that is, a dataset with `output` and `reference` fields. The `Safety` evaluator can also be run using the `genkit eval:flow -e vertexai/safety` command since it only requires an `output`.
+Both evaluators can be run using the `genkit eval:run` command with a compatible
+dataset: that is, a dataset with `output` and `reference` fields. The `Safety`
+evaluator can also be run using the `genkit eval:flow -e vertexai/safety`
+command since it only requires an `output`.

 ### Indexers and retrievers

-The Genkit Vertex AI plugin includes indexer and retriever implementations backed by the Vertex AI Vector Search service.
+The Genkit Vertex AI plugin includes indexer and retriever implementations
+backed by the Vertex AI Vector Search service.

-(See the [Retrieval-augmented generation](../rag.md) page to learn how indexers and retrievers are used in a RAG implementation.)
+(See the [Retrieval-augmented generation](../rag.md) page to learn how indexers
+and retrievers are used in a RAG implementation.)

-The Vertex AI Vector Search service is a document index that works alongside the document store of your choice: the document store contains the content of documents, and the Vertex AI Vector Search index contains, for each document, its vector embedding and a reference to the document in the document store. After your documents are indexed by the Vertex AI Vector Search service, it can respond to search queries, producing lists of indexes into your document store.
+The Vertex AI Vector Search service is a document index that works alongside the
+document store of your choice: the document store contains the content of
+documents, and the Vertex AI Vector Search index contains, for each document,
+its vector embedding and a reference to the document in the document store.
+After your documents are indexed by the Vertex AI Vector Search service, it can
+respond to search queries, producing lists of indexes into your document store.

-The indexer and retriever implementations provided by the Vertex AI plugin use either Cloud Firestore or BigQuery as the document store. The plugin also includes interfaces you can implement to support other document stores.
+The indexer and retriever implementations provided by the Vertex AI plugin use
+either Cloud Firestore or BigQuery as the document store. The plugin also
+includes interfaces you can implement to support other document stores.

-Important: Pricing for Vector Search consists of both a charge for every gigabyte of data you ingest and an hourly charge for the VMs that host your deployed indexes. See [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing#vectorsearch). This is likely to be most cost-effective when you are serving high volumes of traffic. Be sure to understand the billing implications the service will have on your project before using it.
+Important: Pricing for Vector Search consists of both a charge for every
+gigabyte of data you ingest and an hourly charge for the VMs that host your
+deployed indexes.
+See [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing#vectorsearch).
+This is likely to be most cost-effective when you are serving high volumes of
+traffic. Be sure to understand the billing implications the service will have on
+your project before using it.

 To use Vertex AI Vector Search:

-1. Choose an embedding model. This model is responsible for creating vector embeddings from text or media. Advanced users might use an embedding model optimized for their particular data sets, but for most users, Vertex AI's `text-embedding-004` model is a good choice for English text, the `text-multilingual-embedding-002` model is good for multilingual text, and the `multimodalEmbedding001` model is good for mixed text, images, and video.
-2. In the [Vector Search](https://console.cloud.google.com/vertex-ai/matching-engine/indexes) section of the Google Cloud console, create a new index. The most important settings are:
-   * **Dimensions:** Specify the dimensionality of the vectors produced by your chosen embedding model. The `text-embedding-004` and `text-multilingual-embedding-002` models produce vectors of 768 dimensions. The `multimodalEmbedding001` model can produce vectors of 128, 256, 512, or 1408 dimensions for text and image, and will produce vectors of 1408 dimensions for video.
-   * **Update method:** Select streaming updates.
-
-   After you create the index, deploy it to a standard (public) endpoint.
+1. Choose an embedding model. This model is responsible for creating vector
+   embeddings from text or media. Advanced users might use an embedding model
+   optimized for their particular data sets, but for most users, Vertex AI's
+   `text-embedding-004` model is a good choice for English text, the
+   `text-multilingual-embedding-002` model is good for multilingual text, and
+   the `multimodalEmbedding001` model is good for mixed text, images, and video.
+2. In
+   the [Vector Search](https://console.cloud.google.com/vertex-ai/matching-engine/indexes)
+   section of the Google Cloud console, create a new index. The most important
+   settings are:
+
+   * **Dimensions:** Specify the dimensionality of the vectors produced by your
+     chosen embedding model. The `text-embedding-004` and
+     `text-multilingual-embedding-002` models produce vectors of 768 dimensions.
+     The `multimodalEmbedding001` model can produce vectors of 128, 256, 512, or
+     1408 dimensions for text and image, and will produce vectors of 1408
+     dimensions for video.
+   * **Update method:** Select streaming updates.
+
+   After you create the index, deploy it to a standard (public) endpoint.

 3. Get a document indexer and retriever for the document store you want to use:

-  **Cloud Firestore**
+   **Cloud Firestore**

    ```ts
    import { getFirestoreDocumentIndexer, getFirestoreDocumentRetriever } from '@genkit-ai/vertexai/vectorsearch';

@@ -415,7 +547,7 @@
    const firestoreDocumentIndexer = getFirestoreDocumentIndexer(db, FIRESTORE_COLLECTION);
    ```

-  **BigQuery**
+   **BigQuery**

    ```ts
    import { getBigQueryDocumentIndexer, getBigQueryDocumentRetriever } from '@genkit-ai/vertexai/vectorsearch';
@@ -427,9 +559,10 @@
    const bigQueryDocumentIndexer = getBigQueryDocumentIndexer(bq, BIGQUERY_TABLE, BIGQUERY_DATASET);
    ```

-  **Other**
+   **Other**

-  To support other documents stores you can provide your own implementations of `DocumentRetriever` and `DocumentIndexer`:
+   To support other document stores, you can provide your own implementations of
+   `DocumentRetriever` and `DocumentIndexer`:

    ```ts
    const myDocumentRetriever = async (neighbors) => {
@@ -442,7 +575,8 @@
    }
    ```

-  For an example, see [Sample Vertex AI Plugin Retriever and Indexer with Local File](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-custom).
+   For an example,
+   see [Sample Vertex AI Plugin Retriever and Indexer with Local File](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-custom).

 4. Add a `vectorSearchOptions` block to your `vertexAI` plugin configuration:

    ```ts
@@ -472,14 +606,25 @@
    });
    ```

-   Provide the embedder you chose in the first step and the document indexer and retriever you created in the previous step.
+   Provide the embedder you chose in the first step and the document indexer and
+   retriever you created in the previous step.
+
+   To configure the plugin to use the Vector Search index you created earlier,
+   you need to provide several values, which you can find in the Vector Search
+   section of the Google Cloud console:

-   To configure the plugin to use the Vector Search index you created earlier, you need to provide several values, which you can find in the Vector Search section of the Google Cloud console:
+   * `indexId`: listed on
+     the [Indexes](https://console.cloud.google.com/vertex-ai/matching-engine/indexes)
+     tab
+   * `indexEndpointId`: listed on
+     the [Index Endpoints](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints)
+     tab
+   * `deployedIndexId` and `publicDomainName`: listed on the "Deployed index
+     info" page, which you can open by clicking the name of the deployed index on
+     either of the tabs mentioned earlier

-   * `indexId`: listed on the [Indexes](https://console.cloud.google.com/vertex-ai/matching-engine/indexes) tab
-   * `indexEndpointId`: listed on the [Index Endpoints](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints) tab
-   * `deployedIndexId` and `publicDomainName`: listed on the "Deployed index info" page, which you can open by clicking the name of the deployed index on either of the tabs mentioned earlier
-5. Now that everything is configured, you can use the indexer and retriever in your Genkit application:
+5. Now that everything is configured, you can use the indexer and retriever in
+   your Genkit application:

    ```ts
    import {
@@ -506,17 +651,23 @@

 See the code samples for:

-* [Vertex Vector Search + BigQuery](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-bigquery)
-* [Vertex Vector Search + Firestore](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-firestore)
-* [Vertex Vector Search + a custom DB](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-custom)
+* [Vertex Vector Search + BigQuery](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-bigquery)
+* [Vertex Vector Search + Firestore](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-firestore)
+* [Vertex Vector Search + a custom DB](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-custom)

 ## Context Caching

-The Vertex AI Genkit plugin supports **Context Caching**, which allows models to reuse previously cached content to optimize token usage when dealing with large pieces of content. This feature is especially useful for conversational flows or scenarios where the model references a large piece of content consistently across multiple requests.
+The Vertex AI Genkit plugin supports **Context Caching**, which allows models to
+reuse previously cached content to optimize token usage when dealing with large
+pieces of content. This feature is especially useful for conversational flows or
+scenarios where the model references a large piece of content consistently
+across multiple requests.

 ### How to Use Context Caching

-To enable context caching, ensure your model supports it. For example, `gemini15Flash` and `gemini15Pro` are models that support context caching, and you will have to specify version number `001`.
+To enable context caching, ensure your model supports it. For example,
+`gemini15Flash` and `gemini15Pro` are models that support context caching, and
+you will have to specify version number `001`.
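+
+For example, here is a minimal sketch of pinning a supported version through the
+model config (the exact `version` string is an assumption; check the model
+reference for valid values):
+
+```ts
+import { gemini15Flash } from '@genkit-ai/vertexai';
+
+const response = await ai.generate({
+  model: gemini15Flash,
+  // Context caching requires an explicit `001` model version.
+  config: { version: 'gemini-1.5-flash-001' },
+  prompt: 'Summarize the attached document.',
+});
+```
+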
You can define a caching mechanism in your application like this: @@ -551,12 +702,15 @@ const llmResponse = await ai.generate({ ``` In this setup: + - **`messages`**: Allows you to pass conversation history. -- **`metadata.cache.ttlSeconds`**: Specifies the time-to-live (TTL) for caching a specific response. +- **`metadata.cache.ttlSeconds`**: Specifies the time-to-live (TTL) for caching + a specific response. ### Example: Leveraging Large Texts with Context -For applications referencing long documents, such as *War and Peace* or *Lord of the Rings*, you can structure your queries to reuse cached contexts: +For applications referencing long documents, such as *War and Peace* or *Lord of +the Rings*, you can structure your queries to reuse cached contexts: ```ts @@ -588,14 +742,20 @@ const llmResponse = await ai.generate({ ``` ### Benefits of Context Caching -1. **Improved Performance**: Reduces the need for repeated processing of large inputs. -2. **Cost Efficiency**: Decreases API usage for redundant data, optimizing token consumption. + +1. **Improved Performance**: Reduces the need for repeated processing of large + inputs. +2. **Cost Efficiency**: Decreases API usage for redundant data, optimizing token + consumption. 3. **Better Latency**: Speeds up response times for repeated or related queries. ### Supported Models for Context Caching -Only specific models, such as `gemini15Flash` and `gemini15Pro`, support context caching, and currently only on version numbers `001`. If an unsupported model is used, an error will be raised, indicating that caching cannot be applied. +Only specific models, such as `gemini15Flash` and `gemini15Pro`, support context +caching, and currently only on version numbers `001`. If an unsupported model is +used, an error will be raised, indicating that caching cannot be applied. ### Further Reading -See more information regarding context caching on Vertex AI in their [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview). +See more information regarding context caching on Vertex AI in +their [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview). 
diff --git a/js/plugins/vertexai/package.json b/js/plugins/vertexai/package.json
index d378eb4aaa..fb9c6e8da1 100644
--- a/js/plugins/vertexai/package.json
+++ b/js/plugins/vertexai/package.json
@@ -35,8 +35,8 @@
   "author": "genkit",
   "license": "Apache-2.0",
   "dependencies": {
-    "@anthropic-ai/sdk": "^0.24.3",
-    "@anthropic-ai/vertex-sdk": "^0.4.0",
+    "@anthropic-ai/sdk": "^0.35.0",
+    "@anthropic-ai/vertex-sdk": "^0.6.0",
     "@google-cloud/aiplatform": "^3.23.0",
     "@mistralai/mistralai-gcp": "^1.3.5",
     "@google-cloud/vertexai": "^1.9.3",
diff --git a/js/plugins/vertexai/src/modelgarden/anthropic.ts b/js/plugins/vertexai/src/modelgarden/anthropic.ts
index 9fac5c5fd5..3c7e2f5b96 100644
--- a/js/plugins/vertexai/src/modelgarden/anthropic.ts
+++ b/js/plugins/vertexai/src/modelgarden/anthropic.ts
@@ -16,6 +16,7 @@
 import {
   ContentBlock as AnthropicContent,
+  CacheControlEphemeral,
   ImageBlockParam,
   Message,
   MessageCreateParamsBase,
@@ -27,7 +28,7 @@
   ToolResultBlockParam,
   ToolUseBlock,
   ToolUseBlockParam,
-} from '@anthropic-ai/sdk/resources/messages';
+} from '@anthropic-ai/sdk/resources/messages/messages';
 import { AnthropicVertex } from '@anthropic-ai/vertex-sdk';
 import {
   GENKIT_CLIENT_HEADER,
@@ -146,20 +147,30 @@
 export function toAnthropicRequest(
   model: string,
   input: GenerateRequest
 ): MessageCreateParamsBase {
-  let system: string | undefined = undefined;
+  let system: Array<TextBlockParam> | undefined = undefined;
   const messages: MessageParam[] = [];
   for (const msg of input.messages) {
     if (msg.role === 'system') {
-      system = msg.content
+      const textBlocks = msg.content
         .map((c) => {
           if (!c.text) {
             throw new Error(
               'Only text context is supported for system messages.'
             );
           }
-          return c.text;
-        })
-        .join();
+          const textBlock = {
+            type: 'text',
+            text: c.text,
+          } as TextBlockParam;
+          // Map Genkit's custom cacheControl flag to Anthropic's
+          // cache_control block property.
+          if (c.custom?.cacheControl) {
+            textBlock.cache_control = {
+              type: 'ephemeral',
+            } as CacheControlEphemeral;
+          }
+          return textBlock;
+        });
+      system ??= [];
+      system.push(...textBlocks);
     }
     // If the last message is a tool response, we need to add a user message.
     // https://docs.anthropic.com/en/docs/build-with-claude/tool-use#handling-tool-use-and-tool-result-content-blocks
@@ -175,6 +186,7 @@
       });
     }
   }
+
   const request = {
     model,
     messages,
@@ -182,7 +194,7 @@
     max_tokens: input.config?.maxOutputTokens ?? 4096,
   } as MessageCreateParamsBase;
   if (system) {
-    request['system'] = system;
+    request.system = system;
   }
   if (input.tools) {
     request.tools = input.tools?.map((tool) => {
@@ -297,6 +309,15 @@
     role: 'model',
     content: parts.map(fromAnthropicPart),
   };
+  // Surface Anthropic's cache token counts in the usage stats when present.
+  let usageCustom: Record<string, number> | undefined = undefined;
+  if (response.usage.cache_creation_input_tokens) {
+    usageCustom = {};
+    usageCustom['cacheCreationInputTokens'] = response.usage.cache_creation_input_tokens;
+  }
+  if (response.usage.cache_read_input_tokens) {
+    usageCustom = usageCustom || {};
+    usageCustom['cacheReadInputTokens'] = response.usage.cache_read_input_tokens;
+  }
   return {
     message,
     finishReason: toGenkitFinishReason(
@@ -316,6 +337,7 @@
       ...getBasicUsageStats(input.messages, message),
       inputTokens: response.usage.input_tokens,
       outputTokens: response.usage.output_tokens,
+      custom: usageCustom,
     },
   };
 }
diff --git a/js/plugins/vertexai/tests/modelgarden/anthropic_test.ts b/js/plugins/vertexai/tests/modelgarden/anthropic_test.ts
index abf0988ec0..2ef7358a9b 100644
--- a/js/plugins/vertexai/tests/modelgarden/anthropic_test.ts
+++ b/js/plugins/vertexai/tests/modelgarden/anthropic_test.ts
@@ -17,7 +17,7 @@
 import {
   Message,
   MessageCreateParamsBase,
-} from '@anthropic-ai/sdk/resources/messages.mjs';
+} from '@anthropic-ai/sdk/resources/messages/messages';
 import * as assert from 'assert';
 import { GenerateRequest, GenerateResponseData } from 'genkit';
 import { describe, it } from 'node:test';
@@ -82,6 +82,68 @@
       ],
     },
   },
+  {
+    should: 'should transform system message with caching',
+    input: {
+      messages: [
+        {
+          role: 'system',
+          content: [
+            {
+              text: 'You are an AI assistant tasked with analyzing legal documents.',
+            },
+          ],
+        },
+        {
+          role: 'system',
+          content: [
+            {
+              text: 'Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]',
+              custom: {
+                cacheControl: {
+                  type: 'ephemeral',
+                },
+              },
+            },
+          ],
+        },
+        {
+          role: 'user',
+          content: [
+            {
+              text: 'What are the key terms and conditions in this agreement?',
+            },
+          ],
+        },
+      ],
+    },
+    expectedOutput: {
+      max_tokens: 4096,
+      model: MODEL_ID,
+      system: [
+        {
+          type: 'text',
+          text: 'You are an AI assistant tasked with analyzing legal documents.',
+        },
+        {
+          type: 'text',
+          text: 'Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]',
+          cache_control: { type: 'ephemeral' },
+        },
+      ],
+      messages: [
+        {
+          role: 'user',
+          content: [
+            {
+              type: 'text',
+              text: 'What are the key terms and conditions in this agreement?',
+            },
+          ],
+        },
+      ],
+    },
+  },
   {
     should:
       'should transform genkit message (inline base64 image content) correctly',
@@ -158,6 +220,8 @@
         usage: {
           input_tokens: 123,
           output_tokens: 234,
+          cache_creation_input_tokens: 0,
+          cache_read_input_tokens: 0,
         },
         stop_sequence: null,
         type: 'message',
@@ -240,6 +304,8 @@
         usage: {
           input_tokens: 123,
           output_tokens: 234,
+          cache_creation_input_tokens: 0,
+          cache_read_input_tokens: 0,
         },
         content: [
           {
diff --git a/js/pnpm-lock.yaml b/js/pnpm-lock.yaml
index a19d3e0108..3a2e73e40c 100644
--- a/js/pnpm-lock.yaml
+++ b/js/pnpm-lock.yaml
@@ -778,11 +778,11 @@
   importers:
     plugins/vertexai:
       dependencies:
         '@anthropic-ai/sdk':
-          specifier: ^0.24.3
-          version: 0.24.3(encoding@0.1.13)
+          specifier: ^0.35.0
+          version: 0.35.0(encoding@0.1.13)
'@anthropic-ai/vertex-sdk': - specifier: ^0.4.0 - version: 0.4.0(encoding@0.1.13) + specifier: ^0.6.0 + version: 0.6.4(encoding@0.1.13) '@google-cloud/aiplatform': specifier: ^3.23.0 version: 3.25.0(encoding@0.1.13) @@ -1388,7 +1388,7 @@ importers: version: link:../../plugins/ollama genkitx-openai: specifier: ^0.10.1 - version: 0.10.1(@genkit-ai/ai@1.4.0)(@genkit-ai/core@1.4.0) + version: 0.10.1(@genkit-ai/ai@1.8.0)(@genkit-ai/core@1.8.0) devDependencies: rimraf: specifier: ^6.0.1 @@ -1800,14 +1800,14 @@ packages: resolution: {integrity: sha512-30iZtAPgz+LTIYoeivqYo853f02jBYSd5uGnGpkFV0M3xOt9aN73erkgYAmZU43x4VfqcnLxW9Kpg3R5LC4YYw==} engines: {node: '>=6.0.0'} - '@anthropic-ai/sdk@0.24.3': - resolution: {integrity: sha512-916wJXO6T6k8R6BAAcLhLPv/pnLGy7YSEBZXZ1XTFbLcTZE8oTy3oDW9WJf9KKZwMvVcePIfoTSvzXHRcGxkQQ==} + '@anthropic-ai/sdk@0.35.0': + resolution: {integrity: sha512-JxVuNIRLjcXZbDW/rJa3vSIoYB5c0wgIQUPsjueeqli9OJyCJpInj0UlvKSSk6R2oCYyg0y2M0H8n8Wyt0l1IA==} '@anthropic-ai/sdk@0.9.1': resolution: {integrity: sha512-wa1meQ2WSfoY8Uor3EdrJq0jTiZJoKoSii2ZVWRY1oN4Tlr5s59pADg9T79FTbPe1/se5c3pBeZgJL63wmuoBA==} - '@anthropic-ai/vertex-sdk@0.4.0': - resolution: {integrity: sha512-E/FL/P1+wDNrhuVg7DYmbiLdW6+xU9d2Vn/dmpJbKF7Vt81SnGxUFYn9zjDk2QOptvQFSOcUb5OCtpEvej+daQ==} + '@anthropic-ai/vertex-sdk@0.6.4': + resolution: {integrity: sha512-rMBlO2jF53TfMRmsQMm1bPO2JRUh4jYddjq/OJLj8DSAkfbCrNWhc0yhDed6oLYJg5s+VpDbvlPzMggqHhTfMw==} '@babel/code-frame@7.25.7': resolution: {integrity: sha512-0xZJFNE5XMpENsgfHYTw8FbX4kv53mFLn2i3XPoq69LyhYSCBJtitaHx9QnsVTrsogI4Z3+HtEfZ2/GFPOtf5g==} @@ -2520,11 +2520,11 @@ packages: '@firebase/webchannel-wrapper@1.0.3': resolution: {integrity: sha512-2xCRM9q9FlzGZCdgDMJwc0gyUkWFtkosy7Xxr6sFgQwn+wMNIWd7xIvYNauU1r64B5L5rsGKy/n9TKJ0aAFeqQ==} - '@genkit-ai/ai@1.4.0': - resolution: {integrity: sha512-s0YZ7quoYF4LYFFVnJz/3GvBmXPl8Ty9a5ZMOCB8k0xmAopiFwKEpaCMFbpIyF04EmB2U8x5/k3bjliD32eZXQ==} + '@genkit-ai/ai@1.8.0': + resolution: {integrity: sha512-TIhFgQCThdVOyrk6qiVF8dPfz4XmL3RxE9OCidhqcpGrGE5YeRvle+nWIbkNIojdLURQf3/dBxNdaqZZ7B3msQ==} - '@genkit-ai/core@1.4.0': - resolution: {integrity: sha512-Y85RsvXfejH7vQOH/O8/GgaKqDeqiDDMnWNKa2Cy2ugwlsy1P5jSHkQ5wUPgCCTSwQG4eOfdmmwGpFVvNi0QXw==} + '@genkit-ai/core@1.8.0': + resolution: {integrity: sha512-XvK/Gq7fi8pFCJftzby/6EWoVBj5EUSai/9/104Y699wbub3qreoIGH2DN/CKOUzt0gD2i7fHeYlyTAnJ+TPbw==} '@gerrit0/mini-shiki@1.24.4': resolution: {integrity: sha512-YEHW1QeAg6UmxEmswiQbOVEg1CW22b1XUD/lNTliOsu0LD0wqoyleFMnmbTp697QE0pcadQiR5cVtbbAPncvpw==} @@ -6111,6 +6111,7 @@ packages: node-domexception@1.0.0: resolution: {integrity: sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==} engines: {node: '>=10.5.0'} + deprecated: Use your platform's native DOMException instead node-ensure@0.0.0: resolution: {integrity: sha512-DRI60hzo2oKN1ma0ckc6nQWlHU69RH6xN0sjQTjMpChPfTYvKZdcQFfdYK2RWbJcKyUizSIy/l8OTGxMAM1QDw==} @@ -7327,7 +7328,7 @@ snapshots: '@jridgewell/gen-mapping': 0.3.5 '@jridgewell/trace-mapping': 0.3.25 - '@anthropic-ai/sdk@0.24.3(encoding@0.1.13)': + '@anthropic-ai/sdk@0.35.0(encoding@0.1.13)': dependencies: '@types/node': 18.19.53 '@types/node-fetch': 2.6.11 @@ -7336,7 +7337,6 @@ snapshots: form-data-encoder: 1.7.2 formdata-node: 4.4.1 node-fetch: 2.7.0(encoding@0.1.13) - web-streams-polyfill: 3.3.3 transitivePeerDependencies: - encoding @@ -7354,9 +7354,9 @@ snapshots: transitivePeerDependencies: - encoding - '@anthropic-ai/vertex-sdk@0.4.0(encoding@0.1.13)': + 
'@anthropic-ai/vertex-sdk@0.6.4(encoding@0.1.13)': dependencies: - '@anthropic-ai/sdk': 0.24.3(encoding@0.1.13) + '@anthropic-ai/sdk': 0.35.0(encoding@0.1.13) google-auth-library: 9.14.2(encoding@0.1.13) transitivePeerDependencies: - encoding @@ -8088,9 +8088,9 @@ snapshots: '@firebase/webchannel-wrapper@1.0.3': {} - '@genkit-ai/ai@1.4.0': + '@genkit-ai/ai@1.8.0': dependencies: - '@genkit-ai/core': 1.4.0 + '@genkit-ai/core': 1.8.0 '@opentelemetry/api': 1.9.0 '@types/node': 20.17.17 colorette: 2.0.20 @@ -8102,7 +8102,7 @@ snapshots: transitivePeerDependencies: - supports-color - '@genkit-ai/core@1.4.0': + '@genkit-ai/core@1.8.0': dependencies: '@opentelemetry/api': 1.9.0 '@opentelemetry/context-async-hooks': 1.30.1(@opentelemetry/api@1.9.0) @@ -8435,7 +8435,7 @@ snapshots: '@jest/console@29.7.0': dependencies: '@jest/types': 29.6.3 - '@types/node': 20.16.9 + '@types/node': 20.17.17 chalk: 4.1.2 jest-message-util: 29.7.0 jest-util: 29.7.0 @@ -8568,7 +8568,7 @@ snapshots: dependencies: '@jest/types': 29.6.3 '@sinonjs/fake-timers': 10.3.0 - '@types/node': 20.16.9 + '@types/node': 20.17.17 jest-message-util: 29.7.0 jest-mock: 29.7.0 jest-util: 29.7.0 @@ -8590,7 +8590,7 @@ snapshots: '@jest/transform': 29.7.0 '@jest/types': 29.6.3 '@jridgewell/trace-mapping': 0.3.25 - '@types/node': 20.16.9 + '@types/node': 20.17.17 chalk: 4.1.2 collect-v8-coverage: 1.0.2 exit: 0.1.2 @@ -9622,13 +9622,13 @@ snapshots: '@types/bunyan@1.8.9': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/caseless@0.12.5': {} '@types/connect@3.4.36': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/connect@3.4.38': dependencies: @@ -9661,7 +9661,7 @@ snapshots: '@types/graceful-fs@4.1.9': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/handlebars@4.1.0': dependencies: @@ -9692,13 +9692,13 @@ snapshots: '@types/jsonwebtoken@9.0.6': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/long@4.0.2': {} '@types/memcached@2.2.10': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/mime@1.3.5': {} @@ -9706,11 +9706,11 @@ snapshots: '@types/mysql@2.15.22': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/node-fetch@2.6.11': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 form-data: 4.0.0 '@types/node@18.19.53': @@ -9741,7 +9741,7 @@ snapshots: '@types/pg@8.6.1': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 pg-protocol: 1.6.0 pg-types: 2.2.0 @@ -9760,7 +9760,7 @@ snapshots: '@types/request@2.48.12': dependencies: '@types/caseless': 0.12.5 - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/tough-cookie': 4.0.5 form-data: 2.5.1 @@ -9769,7 +9769,7 @@ snapshots: '@types/send@0.17.4': dependencies: '@types/mime': 1.3.5 - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/serve-static@1.15.5': dependencies: @@ -9783,7 +9783,7 @@ snapshots: '@types/tedious@4.0.14': dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 '@types/tough-cookie@4.0.5': {} @@ -11096,10 +11096,10 @@ snapshots: - encoding - supports-color - genkitx-openai@0.10.1(@genkit-ai/ai@1.4.0)(@genkit-ai/core@1.4.0): + genkitx-openai@0.10.1(@genkit-ai/ai@1.8.0)(@genkit-ai/core@1.8.0): dependencies: - '@genkit-ai/ai': 1.4.0 - '@genkit-ai/core': 1.4.0 + '@genkit-ai/ai': 1.8.0 + '@genkit-ai/core': 1.8.0 openai: 4.53.0(encoding@0.1.13) zod: 3.24.1 transitivePeerDependencies: @@ -11650,7 +11650,7 @@ snapshots: '@jest/expect': 29.7.0 '@jest/test-result': 29.7.0 '@jest/types': 29.6.3 - 
'@types/node': 20.16.9 + '@types/node': 20.17.17 chalk: 4.1.2 co: 4.6.0 dedent: 1.5.3 @@ -11875,7 +11875,7 @@ snapshots: '@jest/environment': 29.7.0 '@jest/fake-timers': 29.7.0 '@jest/types': 29.6.3 - '@types/node': 20.16.9 + '@types/node': 20.17.17 jest-mock: 29.7.0 jest-util: 29.7.0 @@ -11885,7 +11885,7 @@ snapshots: dependencies: '@jest/types': 29.6.3 '@types/graceful-fs': 4.1.9 - '@types/node': 20.16.9 + '@types/node': 20.17.17 anymatch: 3.1.3 fb-watchman: 2.0.2 graceful-fs: 4.2.11 @@ -11959,7 +11959,7 @@ snapshots: '@jest/test-result': 29.7.0 '@jest/transform': 29.7.0 '@jest/types': 29.6.3 - '@types/node': 20.16.9 + '@types/node': 20.17.17 chalk: 4.1.2 emittery: 0.13.1 graceful-fs: 4.2.11 @@ -11987,7 +11987,7 @@ snapshots: '@jest/test-result': 29.7.0 '@jest/transform': 29.7.0 '@jest/types': 29.6.3 - '@types/node': 20.16.9 + '@types/node': 20.17.17 chalk: 4.1.2 cjs-module-lexer: 1.2.3 collect-v8-coverage: 1.0.2 @@ -12052,7 +12052,7 @@ snapshots: dependencies: '@jest/test-result': 29.7.0 '@jest/types': 29.6.3 - '@types/node': 20.16.9 + '@types/node': 20.17.17 ansi-escapes: 4.3.2 chalk: 4.1.2 emittery: 0.13.1 @@ -12061,7 +12061,7 @@ snapshots: jest-worker@29.7.0: dependencies: - '@types/node': 20.16.9 + '@types/node': 20.17.17 jest-util: 29.7.0 merge-stream: 2.0.0 supports-color: 8.1.1 @@ -12911,7 +12911,7 @@ snapshots: '@protobufjs/path': 1.1.2 '@protobufjs/pool': 1.1.0 '@protobufjs/utf8': 1.1.0 - '@types/node': 20.16.9 + '@types/node': 20.17.17 long: 5.2.3 protobufjs@7.3.2: