AI-first Documentation

Context: I explored "AI-first documentation" as a prompt engineering strategy to get an AI to write and edit code that represents math and programming diagrams. In my case, the AI used a Python library, still-manim, which has associated docs.

In my approach, the unique features of AI-first documentation are:

  • network-accessible, well-structured text resources (not HTML)
  • an overview of all the resources for an AI to select the relevant ones for a task

Here's my approach in more detail:

RAG with AI-Retrieval

gpt-4o doesn't have my custom library still-manim in its weights, so I must inject the relevant info into the context (RAG). Unlike typical RAG, the strategy I use does not rely on embeddings. For smaller text corpora, an AI can be used for retrieval too, not just generation.

Prompting strategy

  1. The AI receives both the task information and a total overview of the entire docs.
  2. The AI then generates an initial plan to accomplish the task, using both the overview and its general knowledge. The AI then lists the filenames that are most relevant to this plan.
  3. My program hits the docs endpoints to retrieve the mdx files for these filenames.
  4. With the mdx files injected into its context, the AI generates a final plan and SEARCH/REPLACE code blocks to execute this plan (see an example of these blocks below, and a sketch of the whole loop just after this list).
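Here is a minimal sketch of this loop. It assumes an OpenAI-style client, a hypothetical DOCS_BASE_URL standing in for the real docs endpoints, and regex parsing of the "Relevant Files" list; the prompt templates themselves are elided.

```python
import re

import requests
from openai import OpenAI

client = OpenAI()
DOCS_BASE_URL = "https://example.com/docs"  # hypothetical stand-in for the real docs endpoint


def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def run_pipeline(planning_prompt: str, editing_prompt_template: str) -> str:
    # Steps 1-2: the planning pass (task info + total overview) produces a plan
    # plus a "Relevant Files:" list of mdx filenames.
    plan = ask(planning_prompt)
    filenames = re.findall(r"-\s*(\S+\.mdx)", plan)

    # Step 3: fetch the selected mdx files from the docs endpoints.
    docs = "\n\n".join(
        requests.get(f"{DOCS_BASE_URL}/{name}").text for name in filenames
    )

    # Step 4: the editing pass, with the mdx files in context, produces the
    # final plan and SEARCH/REPLACE blocks.
    editing_prompt = editing_prompt_template.format(documentation=docs, initial_plan=plan)
    return ask(editing_prompt)
```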

Notes:

  • This prompting strategy generally aligns with findings from aider's benchmarks, though there's lots of room for improvement.
  • Prefer having the AI output plaintext and parsing it with regex, rather than requiring JSON output.
  • Future:
    • The AI might generate "filters" which the program can apply before showing the total overview.
    • Finer-grained resources might be used for retrieval.
    • The AI might navigate a hierarchy of overviews.

Here is a high-level overview of the prompts as of 5/31/24:

Planning Prompt:

""" SMANIM CHEATSHEET: ... DIAGRAM CODE: ... SELECTED MOBJECTS (which the user likely refers to in their instruction): 0. A vmobject mobject, accessed as `graph.vertices[2]`, defined on line 6 1. A vmobject mobject, accessed as `graph.vertices[3]`, defined on line 6 USER INSTRUCTION: set these to red """

Planning Result:

""" Plan: Access the vertices 2 and 3 from the graph and use the `set_color` method to set their color to red. Relevant Files: - graph.mdx - mobject-overview.mdx """

(Cheatsheet, graph.mdx, mobject-overview.mdx)

Editing Prompt:

""" SMANIM DOCUMENTATION: ... DIAGRAM CODE: ... SELECTED MOBJECTS (which the user likely refers to in their instruction): 0. A vmobject mobject, accessed as `graph.vertices[2]`, defined on line 6 1. A vmobject mobject, accessed as `graph.vertices[3]`, defined on line 6 USER INSTRUCTION: set these to red INITIAL PLAN: Access the vertices 2 and 3 from the graph and use the `set_color` method to set their color to red. """

Editing Result:

""" test ### Updated Plan: 1. Access the vertices 2 and 3 from the graph. 2. Use the `set_color` method to set their color to red. 3. Add the modified vertices back to the canvas to ensure the changes are rendered. Now let's implement this updated plan with SEARCH/REPLACE blocks: '''python # ==== SEARCH START ==== canvas.add(graph) # ==== SEARCH END ==== # ==== REPLACE START ==== graph.vertices[2].set_color(RED) graph.vertices[3].set_color(RED) canvas.add(graph) # ==== REPLACE END ==== ''' """

Documentation Structure and Functionality

The docs serve mdx files, not just HTML files.

  • The AI can request these files and a program can hit the endpoint (Example).
  • Typical docs serve only HTML files, which must be manually processed into legible text for an AI. In this process, structural information is lost and the resulting text is typically uglier.
  • mdx files are well-structured with clear headings, code blocks, lists, etc.
  • Existing solutions that map from HTML to LLM-friendly text do not perform well. Jina.ai is designed to do exactly this, but in this example it misses the headings and garbles the code block.
  • Note: Copying the files directly into the prompting repo might lead to stale docs. Serving them ensures the AI has up-to-date access.

Each mdx file covers either a concept or an overview, and each of these page types has a consistent layout.

  • Concept files have a description of the main uses of the concept and a list of examples (Example).
  • Overview files are included in their entirety (Example).
  • Both types of files are rigidly structured so that regex can parse them and extract the main description and example descriptions for the total overview (a rough sketch of this extraction follows this list).
  • Future: Having a rigid layout might not be important when AI is 10x cheaper and faster. At the build step of the static website, the program could identify updated docs and the AI could extract specific details from those docs.
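As an illustration, here is a rough sketch of that extraction, assuming each concept file starts with a `# Title` heading, a main-description paragraph, and `## Example ...` headings each followed by a one-line description (the real layout may differ):

```python
import re


def summarize_mdx(mdx_text: str) -> str:
    """Pull the main description and example descriptions out of one concept file."""
    title = re.search(r"^# (.+)$", mdx_text, re.MULTILINE)
    # First line of the paragraph that follows the title.
    description = re.search(r"^# .+\n+([^#\n].*)", mdx_text, re.MULTILINE)
    # First line after each example heading.
    examples = re.findall(r"^## (Example.*)\n+([^#\n].*)", mdx_text, re.MULTILINE)

    lines = [f"## {title.group(1) if title else 'Untitled'}"]
    if description:
        lines.append(description.group(1).strip())
    for heading, first_line in examples:
        lines.append(f"- {heading}: {first_line.strip()}")
    return "\n".join(lines)
```

The total overview is then just these per-file summaries concatenated, one per mdx file.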

The docs generate a total overview that reasonably fits into a single AI context window.

  • This overview has main descriptions and example descriptions from each file.
  • The overview itself is an mdx file (since AIs are familiar with mdx files).
  • The full docs are too large to include directly: most good AIs cannot fit all the tokens, and AIs with ultra-large context windows are prohibitively expensive and slow.
  • My current overview is ~3000 tokens, which is about $0.015 worth of input tokens for gpt-4o.
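As a sanity check on that size and cost claim, the overview's token count can be measured with tiktoken (assuming a recent version that includes the o200k_base encoding used by gpt-4o; the $5 per million input tokens figure is gpt-4o's mid-2024 pricing and may change):

```python
import tiktoken

# o200k_base is the tokenizer used by gpt-4o.
encoding = tiktoken.get_encoding("o200k_base")

with open("total-overview.mdx") as f:  # hypothetical path to the generated overview
    overview = f.read()

num_tokens = len(encoding.encode(overview))
cost = num_tokens * 5.00 / 1_000_000  # USD, at $5 per 1M input tokens
print(f"{num_tokens} tokens ~= ${cost:.4f} of input per request")
```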

Implementation

  • I copied the mdx files into a directory and wrote a simple Vercel function to serve them (Code); a rough sketch of an equivalent endpoint follows this list.
  • I generated the total overview from these files (Code).
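For illustration, here is a minimal equivalent of that serving function, written as a small Flask app rather than a Vercel function; the directory name and route are assumptions, not the real endpoint:

```python
from pathlib import Path

from flask import Flask, Response, abort

app = Flask(__name__)
DOCS_DIR = Path("docs")  # hypothetical directory holding the copied mdx files


@app.route("/docs/<filename>")
def serve_mdx(filename: str) -> Response:
    """Serve one mdx file as plain text so a program (or AI) can fetch it by name."""
    path = DOCS_DIR / filename
    # Only serve mdx files that actually exist in the docs directory.
    if path.suffix != ".mdx" or not path.is_file():
        abort(404)
    return Response(path.read_text(), mimetype="text/plain")
```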