Commit 2e87c2e

Add back example that does work
1 parent 72cfa6c commit 2e87c2e

File tree

2 files changed: +107 −1 lines changed

CHANGELOG.md

−1 line

```diff
@@ -26,7 +26,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [BREAKING] Backwards compatibility for `require "ruby/openai"` is removed - from v8 on you MUST use `require "openai"`. This fixes a deprecation warning with Ruby 3.4. Thanks to [@ndemianc](https://github.com/ndemianc) for this PR.
 - [BREAKING] Removed support for Ruby 2.6. ruby-openai may still work with this version but it's no longer supported.
 - Removed the 'OpenAI-Beta' header from Batches API requests.
-- Removed the "Exploring chunks used in file search" spec from the README as it no longer seems to work.

 ## [7.4.0] - 2025-02-10
```

README.md

+107 lines

@@ -1344,6 +1344,113 @@ end

Note that you have 10 minutes to submit your tool output before the run expires.
#### Exploring chunks used in File Search

Take a deep breath. You might need a drink for this one.

It's possible to get OpenAI to share the chunks it used in its internal RAG pipeline to create its file search results.

An example spec that does this can be found [here](https://github.com/alexrudall/ruby-openai/blob/main/spec/openai/client/assistant_file_search_spec.rb), just so you know it's possible.

Here's how to get the chunks used in a file search. In this example I'm using [this file](https://css4.pub/2015/textbook/somatosensory.pdf):
```ruby
require "openai"

# Make a client
client = OpenAI::Client.new(
  access_token: "access_token_goes_here",
  log_errors: true # Don't log errors in production.
)

# Upload your file(s)
file_id = client.files.upload(
  parameters: {
    file: "path/to/somatosensory.pdf",
    purpose: "assistants"
  }
)["id"]

# Create a vector store to store the vectorised file(s)
vector_store_id = client.vector_stores.create(parameters: {})["id"]

# Vectorise the file(s)
vector_store_file_id = client.vector_store_files.create(
  vector_store_id: vector_store_id,
  parameters: { file_id: file_id }
)["id"]

# Check that the file is vectorised (wait for status to be "completed")
client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: vector_store_file_id)["status"]

# Create an assistant, referencing the vector store
assistant_id = client.assistants.create(
  parameters: {
    model: "gpt-4o",
    name: "Answer finder",
    instructions: "You are a file search tool. Find the answer in the given files, please.",
    tools: [
      { type: "file_search" }
    ],
    tool_resources: {
      file_search: {
        vector_store_ids: [vector_store_id]
      }
    }
  }
)["id"]

# Create a thread with your question
thread_id = client.threads.create(parameters: {
  messages: [
    { role: "user",
      content: "Find the description of a nociceptor." }
  ]
})["id"]

# Run the thread to generate the response. Include the "GIVE ME THE CHUNKS" incantation.
run_id = client.runs.create(
  thread_id: thread_id,
  parameters: {
    assistant_id: assistant_id
  },
  query_parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
)["id"]

# Get the steps that happened in the run
steps = client.run_steps.list(
  thread_id: thread_id,
  run_id: run_id,
  parameters: { order: "asc" }
)

# Retrieve all the steps. Include the "GIVE ME THE CHUNKS" incantation again.
steps = steps["data"].map do |step|
  client.run_steps.retrieve(
    thread_id: thread_id,
    run_id: run_id,
    id: step["id"],
    parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
  )
end

# Now we've got the chunk info, buried deep. Loop through the steps and find chunks if included:
chunks = steps.flat_map do |step|
  included_results = step.dig("step_details", "tool_calls", 0, "file_search", "results")

  next if included_results.nil? || included_results.empty?

  included_results.flat_map do |result|
    result["content"].map do |content|
      content["text"]
    end
  end
end.compact

# The first chunk will be the closest match to the prompt. Finally, if you want to view the completed message(s):
client.messages.list(thread_id: thread_id)
```
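To sanity-check the digging logic without hitting the API, you can run the same extraction against a hand-built sample of retrieved run steps. The hash below is a hypothetical sketch, shaped only by the fields the example above digs into, not real API output:

```ruby
# Hypothetical run steps, shaped like the fields dug into above (not real API output).
sample_steps = [
  {
    "step_details" => {
      "tool_calls" => [
        { "file_search" => {
            "results" => [
              { "content" => [{ "text" => "A nociceptor is a sensory receptor for painful stimuli." },
                              { "text" => "Nociceptors are found in the skin and deeper tissues." }] }
            ] } }
      ]
    }
  },
  # A step with no file_search results should be skipped by the dig/next/compact dance.
  { "step_details" => { "type" => "message_creation" } }
]

chunks = sample_steps.flat_map do |step|
  included_results = step.dig("step_details", "tool_calls", 0, "file_search", "results")
  next if included_results.nil? || included_results.empty?

  included_results.flat_map do |result|
    result["content"].map { |content| content["text"] }
  end
end.compact

puts chunks.first # => "A nociceptor is a sensory receptor for painful stimuli."
```

Because `next` inside `flat_map` yields `nil` for steps without results, the trailing `.compact` is what keeps the final array clean.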
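The status check in the example above reads the status only once; in practice you would poll until vectorisation finishes. Here's a minimal sketch of that — the `wait_for_vectorisation` helper and the fake client are hypothetical illustrations, not part of ruby-openai:

```ruby
# Hypothetical helper: poll the vector store file until it leaves the "in_progress" state.
def wait_for_vectorisation(client, vector_store_id:, id:, interval: 1, max_attempts: 30)
  max_attempts.times do
    status = client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: id)["status"]
    return status unless status == "in_progress"
    sleep interval
  end
  raise "Vector store file #{id} did not finish vectorising in time"
end

# Stand-in client for illustration only: reports "in_progress" twice, then "completed".
FakeVectorStoreFiles = Struct.new(:statuses) do
  def retrieve(vector_store_id:, id:)
    { "status" => statuses.shift }
  end
end
FakeClient = Struct.new(:vector_store_files)

fake = FakeClient.new(FakeVectorStoreFiles.new(["in_progress", "in_progress", "completed"]))
puts wait_for_vectorisation(fake, vector_store_id: "vs_123", id: "vsf_123", interval: 0)
# => completed
```

With a real client you'd pass `client` and the ids from the walkthrough above instead of the fakes.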
### Image Generation

Generate images using DALL·E 2 or DALL·E 3!
