Commit 2e87c2e

Add back example that does work
1 parent 72cfa6c commit 2e87c2e

File tree

2 files changed: +107 −1 lines changed

CHANGELOG.md

−1 line

```diff
@@ -26,7 +26,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [BREAKING] Backwards compatibility for `require "ruby/openai"` is removed - from v8 on you MUST use `require "openai"`. This fixes a deprecation warning with Ruby 3.4. Thanks to [@ndemianc](https://github.com/ndemianc) for this PR.
 - [BREAKING] Removed support for Ruby 2.6. ruby-openai may still work with this version but it's no longer supported.
 - Removed the 'OpenAI-Beta' header from Batches API requests.
-- Removed the "Exploring chunks used in file search" spec from the README as it no longer seems to work.

 ## [7.4.0] - 2025-02-10
```

README.md

+107 lines

@@ -1344,6 +1344,113 @@ end

Note that you have 10 minutes to submit your tool output before the run expires.
#### Exploring chunks used in File Search

Take a deep breath. You might need a drink for this one.

It's possible to get OpenAI to share the chunks it used in its internal RAG pipeline to create its file search results.

An example spec that does this can be found [here](https://github.com/alexrudall/ruby-openai/blob/main/spec/openai/client/assistant_file_search_spec.rb), just so you know it's possible.

Here's how to get the chunks used in a file search. In this example I'm using [this file](https://css4.pub/2015/textbook/somatosensory.pdf):
```ruby
require "openai"

# Make a client
client = OpenAI::Client.new(
  access_token: "access_token_goes_here",
  log_errors: true # Don't log errors in production.
)

# Upload your file(s)
file_id = client.files.upload(
  parameters: {
    file: "path/to/somatosensory.pdf",
    purpose: "assistants"
  }
)["id"]

# Create a vector store to store the vectorised file(s)
vector_store_id = client.vector_stores.create(parameters: {})["id"]

# Vectorise the file(s)
vector_store_file_id = client.vector_store_files.create(
  vector_store_id: vector_store_id,
  parameters: { file_id: file_id }
)["id"]

# Check that the file is vectorised (wait for status to be "completed")
client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: vector_store_file_id)["status"]

# Create an assistant, referencing the vector store
assistant_id = client.assistants.create(
  parameters: {
    model: "gpt-4o",
    name: "Answer finder",
    instructions: "You are a file search tool. Find the answer in the given files, please.",
    tools: [
      { type: "file_search" }
    ],
    tool_resources: {
      file_search: {
        vector_store_ids: [vector_store_id]
      }
    }
  }
)["id"]

# Create a thread with your question
thread_id = client.threads.create(parameters: {
  messages: [
    { role: "user",
      content: "Find the description of a nociceptor." }
  ]
})["id"]

# Run the thread to generate the response. Include the "GIVE ME THE CHUNKS" incantation.
run_id = client.runs.create(
  thread_id: thread_id,
  parameters: {
    assistant_id: assistant_id
  },
  query_parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
)["id"]

# Get the steps that happened in the run
steps = client.run_steps.list(
  thread_id: thread_id,
  run_id: run_id,
  parameters: { order: "asc" }
)

# Retrieve all the steps. Include the "GIVE ME THE CHUNKS" incantation again.
steps = steps["data"].map do |step|
  client.run_steps.retrieve(
    thread_id: thread_id,
    run_id: run_id,
    id: step["id"],
    parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
  )
end

# Now we've got the chunk info, buried deep. Loop through the steps and find chunks if included:
chunks = steps.flat_map do |step|
  included_results = step.dig("step_details", "tool_calls", 0, "file_search", "results")

  next if included_results.nil? || included_results.empty?

  included_results.flat_map do |result|
    result["content"].map do |content|
      content["text"]
    end
  end
end.compact

# The first chunk will be the closest match to the prompt. Finally, if you want to view the completed message(s):
client.messages.list(thread_id: thread_id)
```
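To sanity-check the digging logic without hitting the API, you can run the same extraction against a hand-built sample of retrieved run steps. The hash below is a hypothetical sketch, shaped only by the fields the example above digs into, not real API output:

```ruby
# Hypothetical run steps, shaped like the fields dug into above (not real API output).
sample_steps = [
  {
    "step_details" => {
      "tool_calls" => [
        { "file_search" => {
            "results" => [
              { "content" => [{ "text" => "A nociceptor is a sensory receptor for painful stimuli." },
                              { "text" => "Nociceptors are found in the skin and deeper tissues." }] }
            ] } }
      ]
    }
  },
  # A step with no file_search results should be skipped by the dig/next/compact dance.
  { "step_details" => { "type" => "message_creation" } }
]

chunks = sample_steps.flat_map do |step|
  included_results = step.dig("step_details", "tool_calls", 0, "file_search", "results")
  next if included_results.nil? || included_results.empty?

  included_results.flat_map do |result|
    result["content"].map { |content| content["text"] }
  end
end.compact

puts chunks.first # => "A nociceptor is a sensory receptor for painful stimuli."
```

Because `next` inside `flat_map` yields `nil` for steps without results, the trailing `.compact` is what keeps the final array clean.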
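The status check in the example above reads the status only once; in practice you would poll until vectorisation finishes. Here's a minimal sketch of that — the `wait_for_vectorisation` helper and the fake client are hypothetical illustrations, not part of ruby-openai:

```ruby
# Hypothetical helper: poll the vector store file until it leaves the "in_progress" state.
def wait_for_vectorisation(client, vector_store_id:, id:, interval: 1, max_attempts: 30)
  max_attempts.times do
    status = client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: id)["status"]
    return status unless status == "in_progress"
    sleep interval
  end
  raise "Vector store file #{id} did not finish vectorising in time"
end

# Stand-in client for illustration only: reports "in_progress" twice, then "completed".
FakeVectorStoreFiles = Struct.new(:statuses) do
  def retrieve(vector_store_id:, id:)
    { "status" => statuses.shift }
  end
end
FakeClient = Struct.new(:vector_store_files)

fake = FakeClient.new(FakeVectorStoreFiles.new(["in_progress", "in_progress", "completed"]))
puts wait_for_vectorisation(fake, vector_store_id: "vs_123", id: "vsf_123", interval: 0)
# => completed
```

With a real client you'd pass `client` and the ids from the walkthrough above instead of the fakes.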
### Image Generation

Generate images using DALL·E 2 or DALL·E 3!
