|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 |
| - "## Streaming data from APIs" |
| 7 | + "# Bottom-up approach to devise a structure: r0b0t" |
8 | 8 | ]
|
9 | 9 | },
|
10 | 10 | {
|
11 | 11 | "cell_type": "markdown",
|
12 | 12 | "metadata": {},
|
13 | 13 | "source": [
|
14 |
| - "A module is a group of elements whose interactions can be contained, abstracted and hidden under a small interface compared to the interface that would be the result exposing their full domain.\n", |
15 |
| - "\n", |
16 |
| - "The below function, `readAnswer` contains logic for consuming an asynrchronous stream of strings, and sending it to a consumer until there's no more strings to consume. Loops are one of the typical domains that can be contained and exposed through a small interface.\n", |
17 |
| - "\n", |
18 |
| - "Another of the guiding principiles of this design is to expose the minimal interface of existing abstractions. That's the case of `MailboxProcessor<Message>`, which is used in `readAnswer` just an `unit -> Async<string option>` (type `ReadString`).\n", |
19 |
| - "\n", |
20 |
| - "This has the advantage that pieces with substantial internal logic can be implemented in isolation and then assembled into a full program. The challenge then becomes to find how we can decompose our design into such self-contained modules." |
| 14 | + "**r0b0t** is a small program for interacting with Large Language Models, which you can find [here](https://github.com/lamg/r0b0t). This article shows how it was restructured by finding complex operations that could be contained and exposed through simpler interfaces." |
21 | 15 | ]
|
22 | 16 | },
|
23 | 17 | {
|
24 | 18 | "cell_type": "markdown",
|
25 | 19 | "metadata": {},
|
26 | 20 | "source": [
|
27 |
| - "Module for consuming a stream" |
28 |
| - ] |
29 |
| - }, |
30 |
| - { |
31 |
| - "cell_type": "code", |
32 |
| - "execution_count": 6, |
33 |
| - "metadata": { |
34 |
| - "dotnet_interactive": { |
35 |
| - "language": "fsharp" |
36 |
| - }, |
37 |
| - "polyglot_notebook": { |
38 |
| - "kernelName": "fsharp" |
39 |
| - }, |
40 |
| - "vscode": { |
41 |
| - "languageId": "polyglot-notebook" |
42 |
| - } |
43 |
| - }, |
44 |
| - "outputs": [], |
45 |
| - "source": [ |
46 |
| - "type StopInsert = {insertWord: string -> unit; stop: unit -> unit}\n", |
47 |
| - "\n", |
48 |
| - "type ReadString = unit -> Async<string option>\n", |
49 |
| - "\n", |
50 |
| - "let readAnswer (read: ReadString) (si: StopInsert) =\n", |
51 |
| - " let rec loop ()=\n", |
52 |
| - " task {\n", |
53 |
| - " let! r = read()\n", |
54 |
| - " match r with\n", |
55 |
| - " | Some w -> \n", |
56 |
| - " si.insertWord w\n", |
57 |
| - " return! loop ()\n", |
58 |
| - " | None -> \n", |
59 |
| - " si.stop()\n", |
60 |
| - " }\n", |
61 |
| - " loop() |> Async.AwaitTask |> Async.Start\n", |
62 |
| - " " |
| 21 | + "## The [Stream](https://github.com/lamg/r0b0t/tree/master/Lib/Stream) module" |
63 | 22 | ]
|
64 | 23 | },
|
65 | 24 | {
|
66 | 25 | "cell_type": "markdown",
|
67 | 26 | "metadata": {},
|
68 | 27 | "source": [
|
69 |
| - "Module for defining stream\n", |
| 28 | + "A module is a group of elements whose interactions can be contained, abstracted and hidden under a small interface compared to the one that would result from exposing their full domain.\n", |
70 | 29 | "\n",
|
| 30 | + "The modules defined below follow that principle.\n", |
71 | 31 | "\n",
|
72 |
| - "Another indicators of possible decomposition are:\n", |
73 |
| - "- we need code to initialize values\n", |
74 |
| - "- we have producers and consumers " |
75 |
| - ] |
76 |
| - }, |
77 |
| - { |
78 |
| - "cell_type": "code", |
79 |
| - "execution_count": 7, |
80 |
| - "metadata": { |
81 |
| - "dotnet_interactive": { |
82 |
| - "language": "fsharp" |
83 |
| - }, |
84 |
| - "polyglot_notebook": { |
85 |
| - "kernelName": "fsharp" |
86 |
| - }, |
87 |
| - "vscode": { |
88 |
| - "languageId": "polyglot-notebook" |
89 |
| - } |
90 |
| - }, |
91 |
| - "outputs": [ |
92 |
| - { |
93 |
| - "data": { |
94 |
| - "text/html": [ |
95 |
| - "<div><div></div><div></div><div><strong>Installed Packages</strong><ul><li><span>FSharp.Control.AsyncSeq, 3.2.1</span></li></ul></div></div>" |
96 |
| - ] |
97 |
| - }, |
98 |
| - "metadata": {}, |
99 |
| - "output_type": "display_data" |
100 |
| - } |
101 |
| - ], |
102 |
| - "source": [ |
103 |
| - "#r \"nuget: FSharp.Control.AsyncSeq\"\n", |
| 32 | + "`Stream` exposes the following interface:\n", |
104 | 33 | "\n",
|
105 |
| - "open FSharp.Control\n", |
106 |
| - "\n", |
107 |
| - "type Message = AnswerSegment of AsyncReplyChannel<string>\n", |
| 34 | + "```fsharp\n", |
| 35 | + "type GetProvider = unit -> AsyncSeq<string option>\n", |
| 36 | + "type StopInsert = {insertWord: string -> unit; stop: unit -> unit}\n", |
108 | 37 | "\n",
|
109 |
| - "type Provider = MailboxProcessor<Message> -> Async<unit>\n", |
110 |
| - "type GetProvider = unit -> AsyncSeq<string>\n", |
| 38 | + "val main: GetProvider -> StopInsert -> unit\n", |
| 39 | + "```\n", |
111 | 40 | "\n",
|
112 |
| - "let readSegments (inbox: MailboxProcessor<Message>) (xs: AsyncSeq<string>) =\n", |
113 |
| - " xs\n", |
114 |
| - " |> AsyncSeq.takeWhileAsync (fun x ->\n", |
115 |
| - " async {\n", |
116 |
| - " let! msg = inbox.TryReceive()\n", |
| 41 | + "The purpose of this module is to consume a `None`-terminated sequence of strings, `AsyncSeq<string option>`, produced by an LLM, and send it to a GUI, hidden in the implementation of `StopInsert`. This module was chosen as the starting point because it is one of the places where the most complex operations happen, namely, the coordination between the consumption of asynchronous data and its display on a GUI.\n", |
117 | 42 | "\n",
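| | + "A minimal sketch of how such a consumer could look, assuming the `FSharp.Control.AsyncSeq` package. The `consume` helper and the body of `main` are illustrative guesses, not the repository's implementation:\n", |
| | + "\n", |
| | + "```fsharp\n", |
| | + "#r \"nuget: FSharp.Control.AsyncSeq\"\n", |
| | + "open FSharp.Control\n", |
| | + "\n", |
| | + "type GetProvider = unit -> AsyncSeq<string option>\n", |
| | + "type StopInsert = { insertWord: string -> unit; stop: unit -> unit }\n", |
| | + "\n", |
| | + "// Hypothetical helper: forwards words until the first None\n", |
| | + "// (or the end of the sequence), then calls stop\n", |
| | + "let consume (getProvider: GetProvider) (si: StopInsert) : Async<unit> =\n", |
| | + "    async {\n", |
| | + "        do!\n", |
| | + "            getProvider ()\n", |
| | + "            |> AsyncSeq.takeWhile Option.isSome\n", |
| | + "            |> AsyncSeq.iter (Option.iter si.insertWord)\n", |
| | + "\n", |
| | + "        si.stop ()\n", |
| | + "    }\n", |
| | + "\n", |
| | + "// Assumed shape of main: start the consumer without blocking the caller\n", |
| | + "let main (getProvider: GetProvider) (si: StopInsert) : unit =\n", |
| | + "    consume getProvider si |> Async.Start\n", |
| | + "```\n", |
| | + "\n", |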
|
118 |
| - " return\n", |
119 |
| - " match msg with\n", |
120 |
| - " | Some (AnswerSegment chan) ->\n", |
121 |
| - " chan.Reply x\n", |
122 |
| - " true\n", |
123 |
| - " | _ -> false\n", |
124 |
| - " })\n", |
125 |
| - " |> AsyncSeq.toListAsync\n", |
126 |
| - " |> Async.Ignore\n", |
| 43 | + "This coordination between the asynchronous stream of strings and the GUI seems unavoidable for this project, therefore it's one of its atomic and complex components.\n", |
127 | 44 | "\n",
|
128 |
| - "let stream (g: GetProvider) =\n", |
129 |
| - " let mb = MailboxProcessor.Start (fun inbox -> g() |> readSegments inbox)\n", |
130 |
| - " fun () -> mb.PostAndTryAsyncReply(AnswerSegment, timeout = 1000)" |
| 45 | + "By focusing first on such complex interactions I try to create abstractions that effectively contain concerns that could otherwise leak into other project components. That way experimentation and development can be done inside modules without outside interference. This doesn't imply the components are or should be reusable, because their interfaces are still quite particular to this project." |
131 | 46 | ]
|
132 | 47 | },
|
133 | 48 | {
|
134 | 49 | "cell_type": "markdown",
|
135 | 50 | "metadata": {},
|
136 | 51 | "source": [
|
137 |
| - "Definining and consuming stream" |
| 52 | + "## The [GetProviderImpl](https://github.com/lamg/r0b0t/blob/master/Lib/GetProviderImpl.fs) module" |
138 | 53 | ]
|
139 | 54 | },
|
140 | 55 | {
|
141 |
| - "cell_type": "code", |
142 |
| - "execution_count": 8, |
143 |
| - "metadata": { |
144 |
| - "dotnet_interactive": { |
145 |
| - "language": "fsharp" |
146 |
| - }, |
147 |
| - "polyglot_notebook": { |
148 |
| - "kernelName": "fsharp" |
149 |
| - }, |
150 |
| - "vscode": { |
151 |
| - "languageId": "polyglot-notebook" |
152 |
| - } |
153 |
| - }, |
154 |
| - "outputs": [], |
| 56 | + "cell_type": "markdown", |
| 57 | + "metadata": {}, |
155 | 58 | "source": [
|
156 |
| - "let streamFlow (g: GetProvider) (si: StopInsert) = readAnswer (stream g) si" |
| 59 | + "The above module relies on `GetProvider` and `StopInsert`. We can focus now on implementing them. Let's start with a module for `GetProvider`:\n", |
| 60 | + "\n", |
| 61 | + "```fsharp\n", |
| 62 | + "type Conf =\n", |
| 63 | + " { active: Active\n", |
| 64 | + " providers: Map<Provider, ProviderImpl> }\n", |
| 65 | + "\n", |
| 66 | + "val initConf: ProviderModule list -> Provider -> Conf\n", |
| 67 | + "val getProvider: (unit -> Conf) -> (unit -> Prompt) -> Stream.Types.GetProvider\n", |
| 68 | + "```\n", |
| 69 | + "\n", |
| 70 | + "Note that instead of `Conf` and `Prompt`, `getProvider` receives `unit -> Conf` and `unit -> Prompt`. The reason is that the configuration and the prompt are stored in mutable variables in the user interface, so they have to be read each time a provider is requested." |
157 | 71 | ]
|
158 | 72 | },
|
159 | 73 | {
|
160 | 74 | "cell_type": "markdown",
|
161 | 75 | "metadata": {},
|
162 | 76 | "source": [
|
163 |
| - "## Implementing **GetProvider**" |
| 77 | + "## The [ProviderModule](https://github.com/lamg/r0b0t/tree/master/Lib/ProviderModuleImpl) implementation" |
164 | 78 | ]
|
165 | 79 | },
|
166 | 80 | {
|
167 |
| - "cell_type": "code", |
168 |
| - "execution_count": 9, |
169 |
| - "metadata": { |
170 |
| - "dotnet_interactive": { |
171 |
| - "language": "fsharp" |
172 |
| - }, |
173 |
| - "polyglot_notebook": { |
174 |
| - "kernelName": "fsharp" |
175 |
| - }, |
176 |
| - "vscode": { |
177 |
| - "languageId": "polyglot-notebook" |
178 |
| - } |
179 |
| - }, |
180 |
| - "outputs": [], |
| 81 | + "cell_type": "markdown", |
| 82 | + "metadata": {}, |
181 | 83 | "source": [
|
182 |
| - "open FSharp.Control\n", |
183 |
| - "\n", |
184 |
| - "type Model = string\n", |
185 |
| - "type Provider = string\n", |
186 |
| - "type Prompt = string\n", |
187 |
| - "type Key = string\n", |
188 |
| - "type KeyEnvVar = string\n", |
189 |
| - "\n", |
190 |
| - "type Active = {provider: Provider; model: Model}\n", |
191 |
| - "type ProviderImpl = {models: Model list; answerer: Model -> Prompt -> AsyncSeq<string>}\n", |
192 |
| - "\n", |
193 |
| - "type Conf = {active: Active; providers: Map<Provider, ProviderImpl>}\n", |
194 |
| - "\n", |
195 |
| - "type ProviderModule = {implementation: Key -> ProviderImpl; keyVar: KeyEnvVar; provider: Provider}\n", |
| 84 | + "In the above module, `initConf` relies on `ProviderModule`, which is the interface that hides the interaction with GitHub Copilot, OpenAI's API, or any other provider that could be added in the future. You can find implementations for them under the directory `ProviderModuleImpl`.\n", |
196 | 85 | "\n",
|
197 |
| - "let getenv s =\n", |
198 |
| - " System.Environment.GetEnvironmentVariable s |> Option.ofObj\n", |
| 86 | + "Their exposed interface is the following:\n", |
199 | 87 | "\n",
|
200 |
| - "let initProviders (xs: ProviderModule list) (_default: Provider) =\n", |
201 |
| - " let providers = \n", |
202 |
| - " xs\n", |
203 |
| - " |> List.choose (fun pm ->\n", |
204 |
| - " getenv pm.keyVar |> Option.map (fun key -> \n", |
205 |
| - " pm.provider, pm.implementation key\n", |
206 |
| - " ) \n", |
207 |
| - " )\n", |
208 |
| - " |> Map.ofList\n", |
209 |
| - " let active = {provider = _default; model = providers[_default].models.Head}\n", |
210 |
| - " {active = active; providers = providers}\n", |
211 |
| - "\n", |
212 |
| - "let getProvider (conf: unit -> Conf) (getPrompt: unit -> Prompt) () =\n", |
213 |
| - " let c = conf ()\n", |
214 |
| - " let prompt = getPrompt ()\n", |
215 |
| - " c.providers[c.active.provider].answerer c.active.model prompt\n", |
216 |
| - " " |
| 88 | + "```fsharp\n", |
| 89 | + "val providerModule: GetProviderImpl.ProviderModule\n", |
| 90 | + "```" |
217 | 91 | ]
|
218 | 92 | },
|
219 | 93 | {
|
220 | 94 | "cell_type": "markdown",
|
221 | 95 | "metadata": {},
|
222 | 96 | "source": [
|
223 |
| - "## Implementing **StopInsert**" |
| 97 | + "## [GUI](https://github.com/lamg/r0b0t/blob/master/Lib/GUI.fs) module: initialization and mutable state" |
224 | 98 | ]
|
225 | 99 | },
|
226 | 100 | {
|
227 |
| - "cell_type": "code", |
228 |
| - "execution_count": 10, |
229 |
| - "metadata": { |
230 |
| - "dotnet_interactive": { |
231 |
| - "language": "fsharp" |
232 |
| - }, |
233 |
| - "polyglot_notebook": { |
234 |
| - "kernelName": "fsharp" |
235 |
| - }, |
236 |
| - "vscode": { |
237 |
| - "languageId": "polyglot-notebook" |
238 |
| - } |
239 |
| - }, |
240 |
| - "outputs": [ |
241 |
| - { |
242 |
| - "data": { |
243 |
| - "text/html": [ |
244 |
| - "<div><div></div><div></div><div><strong>Installed Packages</strong><ul><li><span>GtkSharp, 3.24.24.95</span></li></ul></div></div>" |
245 |
| - ] |
246 |
| - }, |
247 |
| - "metadata": {}, |
248 |
| - "output_type": "display_data" |
249 |
| - } |
250 |
| - ], |
| 101 | + "cell_type": "markdown", |
| 102 | + "metadata": {}, |
251 | 103 | "source": [
|
252 |
| - "#r \"nuget:GtkSharp, 3.24.24.95\"\n", |
253 |
| - "open Gtk\n", |
254 |
| - "\n", |
255 |
| - "type AdjustWord =\n", |
256 |
| - " { chatDisplay: TextView\n", |
257 |
| - " adjustment: Adjustment }\n", |
258 |
| - "\n", |
259 |
| - "let insertWord (b: Builder) : string -> unit =\n", |
260 |
| - " let adj = b.GetObject \"text_adjustment\" :?> Adjustment\n", |
261 |
| - " let chatDisplay = b.GetObject \"chat_display\" :?> TextView\n", |
262 |
| - " let f w =\n", |
263 |
| - " chatDisplay.Buffer.PlaceCursor chatDisplay.Buffer.EndIter\n", |
264 |
| - " chatDisplay.Buffer.InsertAtCursor w\n", |
265 |
| - " adj.Value <- adj.Upper\n", |
266 |
| - "\n", |
267 |
| - " f\n", |
268 |
| - "\n", |
269 |
| - "let newStopInsert (builder: Builder) =\n", |
270 |
| - " let answerSpinner = builder.GetObject \"answer_spinner\" :?> Spinner\n", |
271 |
| - "\n", |
272 |
| - " { stop = answerSpinner.Stop\n", |
273 |
| - " insertWord = insertWord builder }" |
| 104 | + "With the above modules implemented, there's nothing left but to create the context where the configuration is used and mutated by the GUI. This allows us to call `initConf`, `getProvider`, and finally `Stream.main` to get answers from LLMs. `Stream.main` also relies on `StopInsert`, but its implementation turned out to be so tied to the `GUI` module that it didn't deserve a separate one." |
274 | 105 | ]
|
275 | 106 | }
|
276 | 107 | ],
|
|