README.md (29 additions & 82 deletions)

OpenAI-compatible RESTful APIs for Amazon Bedrock

## Breaking Changes

The source code has been refactored to use the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Amazon Bedrock, which offers native support for tool calls.

This solution can now **automatically detect** new models supported in Amazon Bedrock, so whenever new models are added to Amazon Bedrock, you can try them immediately without waiting for code changes in this repo.

This is achieved by using the Amazon Bedrock `ListFoundationModels` and `ListInferenceProfiles` APIs. Due to this change, additional IAM permissions are required for your Lambda/Fargate role.

If you see the error 'Unsupported model xxx, please use models API to get a list of supported models' even though the model ID is correct,
please either update your existing stack (**recommended**) with the new template in the deployment folder, or manually add the permissions below to the related Lambda/Fargate role.

```json
{
  "Action": [
    "bedrock:ListFoundationModels",
    "bedrock:ListInferenceProfiles"
  ],
  "Resource": "*",
  "Effect": "Allow"
}
```
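
For reference, this is roughly what the model discovery looks like with boto3 — a minimal sketch assuming default AWS credentials and an illustrative region, not the exact code used by the gateway:

```python
import boto3

# Region name is illustrative; use the region where you deployed the gateway.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Foundation models available in this region (needs bedrock:ListFoundationModels).
model_ids = [m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]]

# Cross-region inference profiles (needs bedrock:ListInferenceProfiles).
profile_ids = [
    p["inferenceProfileId"]
    for p in bedrock.list_inference_profiles()["inferenceProfileSummaries"]
]

print(model_ids + profile_ids)
```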

Please raise a GitHub issue if you still have problems.

## Overview

If you find this GitHub repository useful, please consider giving it a free star.

Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.

> **Note:** The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported; you should switch to the chat completion API.

Supported Amazon Bedrock model families:

- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus) / 3.5 Sonnet
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding

> **Note:** The default model is set to `anthropic.claude-3-sonnet-20240229-v1:0` and can be changed via the Lambda environment variable `DEFAULT_MODEL`. You can call the [Models API](./docs/Usage.md#models-api) to get the full list of supported model IDs.
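
For example, to list the available model IDs once the gateway is deployed (assuming `OPENAI_BASE_URL` and `OPENAI_API_KEY` are set as in the examples below):

```bash
curl -s $OPENAI_BASE_URL/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```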

## Get Started


Please check the [Usage Guide](./docs/Usage.md) for more details about how to use the embedding API, multimodal API, and tool calls.

### Bedrock Cross-Region Inference

Cross-Region Inference supports accessing foundation models across regions, allowing users to invoke models hosted in different AWS regions for inference. Main advantages:
- **Improved Availability**: Provides regional redundancy and enhanced fault tolerance. When issues occur in the primary region, services can failover to backup regions, ensuring continuous service availability and business continuity.
- **Reduced Latency**: Enables selection of regions geographically closest to users, optimizing network paths and reducing transmission time, resulting in better user experience and response times.
- **Better Performance and Capacity**: Implements load balancing to distribute request pressure, provides greater service capacity and throughput, and better handles traffic spikes.
- **Flexibility**: Allows selection of models from different regions based on requirements, meets specific regional compliance requirements, and enables more flexible resource allocation and management.
- **Cost Benefits**: Enables selection of more cost-effective regions, reduces overall operational costs through resource optimization, and improves resource utilization efficiency.


Please check [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html) for details.

**Limitation:**
Currently, Bedrock Access Gateway only supports cross-region inference for the following models:
- Claude 3 Haiku
- Claude 3 Opus
- Claude 3 Sonnet
- Claude 3.5 Sonnet
- Meta Llama 3.1 8B Instruct
- Meta Llama 3.1 70B Instruct
- Meta Llama 3.2 1B Instruct
- Meta Llama 3.2 3B Instruct
- Meta Llama 3.2 11B Vision Instruct
- Meta Llama 3.2 90B Vision Instruct

**Prerequisites:**
- IAM policies must allow cross-region access: callers need permissions to access models and inference profiles in both regions (already added in the CloudFormation template)
- Model access must be enabled in both regions for the models defined in the inference profiles

**Example API Usage:**
- To use Bedrock cross-region inference, include an inference profile when running model inference by specifying the inference profile ID as the model ID, e.g. `us.anthropic.claude-3-5-sonnet-20240620-v1:0`, as in the example below

```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "max_tokens": 2048,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
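
The same call with the OpenAI Python SDK — a sketch assuming `OPENAI_API_KEY` and `OPENAI_BASE_URL` are exported in your environment:

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment.
client = OpenAI()

completion = client.chat.completions.create(
    model="us.anthropic.claude-3-5-sonnet-20240620-v1:0",  # inference profile ID
    max_tokens=2048,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```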



## Other Examples

### AutoGen

Below is an image of setting up the model in AutoGen Studio.

![AutoGen Model](assets/autogen-model.png)

### LangChain

Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`
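
A minimal sketch with the `langchain-openai` package — the base URL, API key, and model ID below are placeholders for your own deployment:

```python
from langchain_openai import ChatOpenAI

# All values are placeholders; point base_url at your deployed gateway.
chat = ChatOpenAI(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    base_url="http://localhost:8000/api/v1",
    api_key="<your-api-key>",
)
print(chat.invoke("Hello!").content)
```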
Short answer is that API Gateway does not support server-sent events (SSE) for streaming.

### Which regions are supported?

This solution only supports the regions where Amazon Bedrock is available; as of now, the supported regions are listed below.

- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3

Generally speaking, all regions that Amazon Bedrock supports are also supported; if not, please raise an issue on GitHub.

Note that not all models are available in those regions.

### Which models are supported?

You can use the [Models API](./docs/Usage.md#models-api) to get/refresh a list of supported models in the current region.

### Can I build and use my own ECR image?

Yes, you can clone the repo and build the container image yourself (`src/Dockerfile`) and then push it to your ECR repo. You can use `scripts/push-to-ecr.sh` as a reference.

Replace the repo URL in the CloudFormation template before you deploy.
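
A sketch of that flow — the account ID, region, and repository name below are placeholders:

```bash
# All values are placeholders; adjust to your own account, region, and repo.
AWS_ACCOUNT=123456789012
REGION=us-east-1
REPO=bedrock-access-gateway

aws ecr create-repository --repository-name $REPO --region $REGION
docker build -t $REPO -f src/Dockerfile src
docker tag $REPO:latest $AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest
aws ecr get-login-password --region $REGION | \
  docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com
docker push $AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest
```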

### Can I run this locally?

Yes, you can run this locally, e.g. by running the command below under the `src` folder:

```bash
uvicorn api.app:app --host 0.0.0.0 --port 8000
```

The API base URL should look like `http://localhost:8000/api/v1`.
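
A quick smoke test against the local server (replace the bearer token with whatever API key you configured — the value here is a placeholder):

```bash
curl http://localhost:8000/api/v1/models \
  -H "Authorization: Bearer <your-api-key>"
```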

README_CN.md (31 additions & 78 deletions)

## Breaking Changes

The source code has been refactored to use the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which offers native support for tool calls.

This solution can now **automatically detect** new models supported in Amazon Bedrock, so whenever new models are added to Amazon Bedrock, you can try them immediately without waiting for updates to this codebase.

This is achieved by using the Amazon Bedrock `ListFoundationModels` and `ListInferenceProfiles` APIs. Due to this change, you need to add additional IAM permissions to your Lambda/Fargate role.

If you see the error "Unsupported model xxx, please use models API to get a list of supported models" even though the model ID is correct,
please either update your existing stack (**recommended**) with the new template in the deployment folder, or manually add the permissions below to the related Lambda/Fargate role.

```json
{
  "Action": [
    "bedrock:ListFoundationModels",
    "bedrock:ListInferenceProfiles"
  ],
  "Resource": "*",
  "Effect": "Allow"
}
```

Please raise a GitHub issue if you still have problems.

## Overview

…OpenAI's APIs or SDKs to seamlessly integrate with and try Amazon Bedrock models, without the need to…

Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the new APIs.

> **Note:** The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported; please switch to the Chat Completion API.

Supported Amazon Bedrock model families:

- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus) / 3.5 Sonnet
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding

> **Note:** The default model is set to `anthropic.claude-3-sonnet-20240229-v1:0` and can be changed via the Lambda environment variable `DEFAULT_MODEL`. You can call the [Models API](./docs/Usage.md#models-api) to get the full list of supported model IDs.

## Get Started


Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the Embedding API, multimodal API, and tool calls.

### Bedrock Cross-Region Inference

Cross-Region Inference supports accessing foundation models across regions, allowing users to invoke models hosted in other AWS regions for inference. Main advantages:
- **Improved availability**: Provides regional redundancy and enhanced fault tolerance. When the primary region has issues, traffic can fail over to a backup region, ensuring continuous service availability and business continuity.
- **Reduced latency**: Lets you choose the region geographically closest to your users, optimizing network paths and reducing transmission time for a better user experience and faster responses.
- **Performance and capacity optimization**: Enables load balancing to spread request pressure, providing greater service capacity and throughput and better handling of traffic spikes.
- **Flexibility**: Choose models in different regions as needed, meet region-specific compliance requirements, and allocate and manage resources more flexibly.
- **Cost effectiveness**: Choose more cost-effective regions and reduce overall operating costs through optimized resource usage.

For details, please check [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html).

**Limitations:**
Currently, the gateway only supports cross-region inference for the following models:
- Claude 3 Haiku
- Claude 3 Opus
- Claude 3 Sonnet
- Claude 3.5 Sonnet
- Meta Llama 3.1 8B Instruct
- Meta Llama 3.1 70B Instruct
- Meta Llama 3.2 1B Instruct
- Meta Llama 3.2 3B Instruct
- Meta Llama 3.2 11B Vision Instruct
- Meta Llama 3.2 90B Vision Instruct

**Prerequisites:**
- The IAM policy has permissions for inference profiles and for invoking the models (already added in the CloudFormation template)
- Model access is enabled for the models and regions defined in the inference profiles

**Usage:**
- When invoking a model, set the model ID to the inference profile ID, e.g. `us.anthropic.claude-3-5-sonnet-20240620-v1:0`, as in the example below

```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "max_tokens": 2048,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```


## Other Examples

### AutoGen

For example, setting up and using the model in AutoGen Studio:

![AutoGen Model](assets/autogen-model.png)

### LangChain

Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`.

### Which regions are supported?

Only the regions where Amazon Bedrock is available are supported; as of now, these include:

- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3

Generally speaking, all regions that Amazon Bedrock supports are also supported; if not, please raise a GitHub issue.

Note that not all models are available in the regions above.

### Which models are supported?

You can use the [Models API](./docs/Usage_CN.md#models-api) to get (or refresh) the list of models supported in the current region.

### Can I build and use my own ECR image?

Yes, you can clone the repo and build the container image yourself (`src/Dockerfile`), then push it to your own ECR repo. You can refer to `scripts/push-to-ecr.sh`.

### Can I run this locally?

Yes, you can run this locally, e.g. by running the command below under the `src` folder:

```bash
uvicorn api.app:app --host 0.0.0.0 --port 8000
```

The API base URL should look like `http://localhost:8000/api/v1`.

### Is there any performance penalty or added latency when using the proxy APIs?

Binary file removed assets/autogen-agent.png
Binary file removed assets/autogen-model.png