Part 1 [Part 2]
The Still Manim Editor is a local, real-time coding editor for creating diagrams with still-manim, a Python library. To edit a still-manim diagram, users can edit the code directly or send natural language commands like "create a weighted graph with 5 vertices." See examples at the link above, and see the source code here.
Currently, AI cannot independently create a good diagram. No AI tutor can explain concepts with neat visuals, whether in ASCII art, pixels, or code.
But AI can reliably accomplish small subtasks by editing code that represents a diagram. With the still-manim editor, my goal was to build a human-AI environment for creating conceptual diagrams via code and natural language.
As in p5.js, the main interface is a code editor adjacent to the resulting graphic. The most interesting feature is that users can send natural language commands, which an AI fulfills by editing the Python code.
Requesting the first step of Dijkstra's algorithm in the still-manim editor, with the prompt "show me the first step of dijkstra's":
While the language commands can be useful, they don't always work. If I ask the AI to show the distance labels or the third step of the algorithm, it might misinterpret me or get stuck. Fortunately, unlike an image or ASCII art, diagram code is something that I, as a human, can iterate on with a lot of control.
See a better diagram in a larger context here.

A core challenge in designing an editor for creating diagrams is making it intuitive for users to "point at" objects, especially while interacting with an AI. Users might refer to objects verbally, saying things like "the left window on the bedroom above the garage" or "the vertex labeled 6". More intuitively, users could directly select specific semantic objects.
Here is a demo that I created within the web editor, which can also be viewed under "Previous Examples" > "language_commands".
Because still-manim diagrams are defined by code, the semantic objects are segmented by design. The objects in the diagram are part of a tree structure, similar to the HTML DOM of a webpage. Each node in the diagram's tree structure is either a singular object, an object with subobjects, or a group. In the example below, the lemon group object contains the lemon object, which contains the lemon spoke/line object.
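To make the tree concrete, here's a minimal sketch using the same `smanim` calls that appear later in this post (substituting a square and a graph for the lemon objects):

```python
from smanim import *

s = Square()        # a singular object (a leaf)
g = Graph()         # an object with subobjects (its vertices and edges)
grp = Group(s, g)   # a group node whose children are the two subtrees
canvas.add(grp)
canvas.draw()
```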
Selecting objects within still-manim diagrams requires more than a single click, because objects can overlap; the interactive objects of a webpage, in contrast, do not. To address this, I implemented a simple click-based scheme for selecting within the tree structure (a sketch follows below).
To select multiple objects, hold Cmd (Mac) or Ctrl (Windows) while clicking each object. This selection scheme is good enough for a working demo, but it's limited. For example, two top-level objects with large, overlapping bounding boxes can prevent deeper selection from working in that entire region. It's also not possible to select two objects nested within separate top-level branches. Figma's approach handles these cases, but it's harder to implement.
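Here's a sketch of one way such tree-walking selection could work. The `Node` structure and the descend-one-level-per-click rule are my assumptions for illustration, not still-manim's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical diagram-tree node with an axis-aligned bounding box."""
    name: str
    bbox: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    children: list["Node"] = field(default_factory=list)

def contains(bbox: tuple[float, float, float, float], x: float, y: float) -> bool:
    x0, y0, x1, y1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1

def select(root: Node, x: float, y: float, prev: Node | None = None) -> Node | None:
    """Return the next selection for a click at (x, y): descend one level
    into the previous selection if possible, else pick a top-level hit."""
    if prev is not None:
        for child in prev.children:
            if contains(child.bbox, x, y):
                return child
    for top in root.children:
        if contains(top.bbox, x, y):
            return top
    return None
```

Under a rule like this, the first top-level object whose bounding box contains the click always wins the initial hit, which is exactly the kind of overlap limitation described above.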
Once selected, an object must carry metadata that tells the AI what it is. In this case, the AI needs to understand the object in the context of the diagram code, so the object's metadata is a line number and an access path. The access path is a string that directly accesses the object, starting from the variable assigned on the specified line number. The LLM prompt might look like this:
"""
...
DIAGRAM CODE:
...
SELECTED MOBJECTS:
0. A vmobject mobject, accessed as `graph.vertices[2]`, defined on line 6
1. A vmobject mobject, accessed as `graph.vertices[3]`, defined on line 6
USER INSTRUCTION:
set these to red
"""
How are these access paths determined?
- Variable assignments are traced during program execution using `sys.settrace`. Whenever a mobject is assigned to a variable (e.g. `g = Graph()`), the line number and variable name are set as instance attributes of the object (Source Code).[2]
- A mobject added to the canvas without a variable assignment can still be reached by index, as `canvas.mobjects[mob_index]`.
- Subobjects are reached by extending a parent's path with attribute and index accesses (e.g. `graph.vertices[2]` from the earlier prompt).
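Here's a minimal, simplified sketch of the tracing idea (the real implementation is in the linked Source Code; this `Mobject` stand-in and tracer are mine): when a new object appears in a frame's locals, it was bound by the previously executed line, so stamp it with that line number and the variable name.

```python
import sys

class Mobject:
    """Stand-in for a still-manim object; records where it was assigned."""
    def __init__(self):
        self.var_name = None
        self.line_no = None

_last_line = None

def _tracer(frame, event, arg):
    global _last_line
    if event == "line":
        # An unstamped Mobject in locals was bound by the line that
        # executed just before this one.
        for name, value in list(frame.f_locals.items()):
            if isinstance(value, Mobject) and value.var_name is None:
                value.var_name = name
                value.line_no = _last_line
        _last_line = frame.f_lineno
    return _tracer

def run_diagram_code():
    s = Mobject()
    g = Mobject()
    return s, g

sys.settrace(_tracer)
s, g = run_diagram_code()
sys.settrace(None)
print(s.var_name, s.line_no)  # "s" and the line of `s = Mobject()`
```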
From the above rules, it's possible that an object has multiple access paths:
```python
from smanim import *

s = Square()
g = Group(s, Graph())
canvas.add(s, g)
canvas.draw()
# the square's access paths include both "s" and "g[0]"
# the graph's first vertex's access paths include "g[1].vertices[0]"
```
Which access path should be prioritized and provided to the LLM? The editor applies precedence rules to choose among them.
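As an illustration only (these particular rules are my assumption, not necessarily still-manim's): prefer paths rooted at a user-assigned variable over generic canvas index paths, and shorter paths over longer ones.

```python
def path_rank(path: str) -> tuple[int, int]:
    """Hypothetical ranking: variable-rooted paths beat canvas index paths;
    ties break toward the shorter path."""
    is_canvas_path = path.startswith("canvas.mobjects")
    return (1 if is_canvas_path else 0, len(path))

def best_access_path(paths: list[str]) -> str:
    return min(paths, key=path_rank)

print(best_access_path(["g[0]", "s"]))                   # "s"
print(best_access_path(["canvas.mobjects[0]", "g[1]"]))  # "g[1]"
```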
As a side effect of each object carrying access-path metadata, a neat feature is now possible: when the user selects an object, its corresponding line in the code can be highlighted. I've found this saves me time when navigating the code.
While this tool currently doesn't work beyond a demo, this project has some interesting and pointy features:
- a declarative constructor for diagram objects like graphs (e.g. `Graph(vertices, edges)`)
- access paths for pointing at nested objects (e.g. `graph.vertices[0]`)
This project is worth revisiting when AI can use custom programming languages that are not in the training data. For now, it's in Demo Status.
If I could redo this tool, I would consider treating layout relations like `next_to` and `close_to`
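A sketch of the deferred-constraint variant (an entirely hypothetical API, not still-manim's): relations are recorded during execution and solved just before drawing, so later edits to an anchor still propagate.

```python
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    x: float = 0.0
    y: float = 0.0

@dataclass
class NextTo:
    mover: Box
    anchor: Box
    dx: float = 1.5  # desired horizontal offset from the anchor

constraints: list[NextTo] = []

def next_to(mover: Box, anchor: Box, dx: float = 1.5) -> None:
    """Record the relation instead of moving `mover` immediately."""
    constraints.append(NextTo(mover, anchor, dx))

def solve() -> None:
    """Satisfy all recorded constraints at the end, in declaration order.
    A real solver would handle cycles and competing constraints."""
    for c in constraints:
        c.mover.x = c.anchor.x + c.dx
        c.mover.y = c.anchor.y

a, b = Box("a"), Box("b")
next_to(b, a)
a.x = 2.0  # a later edit to the anchor still propagates
solve()
print(b.x, b.y)  # 3.5 0.0
```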
as constraints to be maintained during program execution or satisfied at the end.

Challenge: Design a programming library that generates medium-complex diagrams. These diagrams should include network graphs, cartesian graphs, structural formulas, flowcharts, bar graphs, and any other lightweight graphics we often draw on a whiteboard or create in a Jupyter notebook.
We already have separate tools to create these diagrams. Why would we want a single library to combine all these seemingly separate domains?