Show HN: Magentic – Use LLMs as simple Python functions (github.com/jackmpcollins)
283 points by jackmpcollins on Sept 26, 2023 | 63 comments
This is a Python package that allows you to write function signatures to define LLM queries. This makes it easy to mix regular code with calls to LLMs, which enables you to use the LLM for its creativity and reasoning while also enforcing structure/logic as necessary. LLM output is parsed for you according to the return type annotation of the function, including complex return types such as streaming an array of structured objects.
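
For a quick sense of the API, here is a minimal sketch based on the README's superhero example (the exact prompt template wording is illustrative):

    from magentic import prompt
    from pydantic import BaseModel

    class Superhero(BaseModel):
        name: str
        age: int
        power: str
        enemies: list[str]

    @prompt("Create a superhero named {name}.")
    def create_superhero(name: str) -> Superhero:
        ...  # never executed - the LLM output is parsed into a Superhero

    create_superhero("Garden Man")
    # Superhero(name='Garden Man', age=30, power='Control over plants', enemies=['Pollution Man', 'Concrete Woman'])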

I built this to show that we can think about using LLMs more fluidly than just chains and chats, i.e. more interchangeably with regular code, and to make it easy to do that.

Please let me know what you think! Contributions welcome.

https://github.com/jackmpcollins/magentic



This looks really useful. Langchain is not my idea of a fun time.

Love the examples too. Low-effort humor is the best:

> create_superhero("Garden Man")

> # Superhero(name='Garden Man', age=30, power='Control over plants', enemies=['Pollution Man', 'Concrete Woman'])


FWIW, at my last company we had a section in the developer guide encouraging using humor in tests - not only did it make them more fun to write, but it engaged the readership better.


I’ve been integrating humor into our unit tests for a bit now and have gotten feedback from a few engineers who really seem to appreciate it.


Would check out https://www.askmarvin.ai/ if you're into this.

I haven't downloaded 1.5 yet, but they released this last week: https://www.askmarvin.ai/prompting/prompt_function/


At first I was like: "Okay, it's just a decorator to add a prompt when you have str as an input and str as an output."

Then I kept on reading, and I have to admit that the object creation with LLMs is really amazing!


The API looks very clean. Today I learned about "..." in Python.


It is just a noop, but here it looks very appropriate/readable because it reads as saying "AI will fill this in".


It's a misuse of the Python Ellipsis, though PEP has no opinion on it. The Ellipsis is "Special value used mostly in conjunction with extended slicing syntax for user-defined container data types."

In other words, it happens to work and look neat, but pass is the correct way to do it.


Ellipsis is actually used in quite a few places. See the answers and comments on this Stack Overflow post [0]. The usage most similar to what I have in the magentic examples is with the `@overload` decorator in the typing module [1] (sketched below).

With that said, you are free to put any code in the function body including `pass`, just a docstring, or even `raise NotImplementedError` - it will not be executed. Using Ellipsis satisfies VSCode/pyright type checking and seemed neatest to me for the examples and docs. I have some additional notes on this in the README [2].

[0] https://stackoverflow.com/q/772124/9995080

[1] https://docs.python.org/3/library/typing.html#typing.overloa...

[2] https://github.com/jackmpcollins/magentic#type-checking
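
As a quick illustration of the `@overload` usage (a minimal sketch; the `double` function here is hypothetical, and only the overload stubs use `...`):

    from typing import overload

    @overload
    def double(x: int) -> int: ...
    @overload
    def double(x: str) -> str: ...

    def double(x):
        # the real implementation; the stubs above exist only for the type checker
        return x * 2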


The same as “pass”?


I built a similar package for TypeScript [0], with the goal of having type-safe LLM responses automagically.

It's pretty fun, but I've found that having the LLM write code is often what I actually want.

[0] https://github.com/jumploops/magic


Awesome job with the simplicity, gonna play with it. Have you tried using YAML as the format with the models instead of JSON? Feels like you'd use far fewer tokens to describe the same thing. Perhaps it's a bit more forgiving as well.

EDIT: Just tried using the decorator to output a fairly complex pydantic model and it failed with "magentic.chat_model.openai_chat_model.StructuredOutputError: Failed to parse model output. You may need to update your prompt to encourage the model to return a specific type."

I typically try to give examples in the pydantic Config class; perhaps those could be piped in for some few-shot methods. It would also help to have some iteration to correct the output syntax when the model output is not perfectly parseable.


Yes, I'm working on allowing few-shot examples to be provided as part of defining the prompt-function, which should help in cases like this. Unfortunately from my testing just now it appears that OpenAI ignores examples added to the model config.

In the meantime, have a look at the ValidationError traceback which might highlight a specific field that is causing the issue. Some options to resolve the issue might be: the type for this field could be made more lenient (e.g. str); the `Annotated` type hint could be used to give the field a description to help correct the error [0]; the field could be removed. You could also try using gpt-4 by setting the env var MAGENTIC_OPENAI_MODEL [1].

If none of these help resolve it or it appears to be an issue with magentic itself please file a github issue with an example. Comments on how to improve error messages and debugging are also welcome! Thanks for trying it out.

[0] https://docs.pydantic.dev/latest/concepts/fields/#using-anno...

[1] https://github.com/jackmpcollins/magentic#configuration
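
For illustration, a minimal sketch of the Annotated approach (the model and field names here are just examples):

    from typing import Annotated
    from pydantic import BaseModel, Field

    class Invoice(BaseModel):
        # the description becomes part of the schema sent to the model,
        # which can nudge it toward output that parses correctly
        due_date: Annotated[str, Field(description="Due date in ISO 8601 format, e.g. 2023-09-26")]
        total: float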


Very cool! At first the title reminded me of a project my colleague and I are working on called OpenAI-Functools [1], but your concept is quite the opposite: combining LLMs into your code rather seamlessly instead of the other way around. Quite cool, and interesting examples :)

I’ll definitely try to apply it in one of my pet projects. Good stuff

[1] https://github.com/Jakob-98/openai-functools


I've personally found frameworks like this get in the way of quality CoT: it's rare for a prompt that takes great advantage of the LLM's reasoning to fit the format these generators encourage.

A friend mentioned how terrible most cold email generators are at actually generating natural-feeling emails. It just took asking him how actual people in marketing come up with emails to build a chain of thought that produces intentionally uncanny emails for a wide range of inputs: https://rentry.co/54hbz

It's not like you can't technically fit what I described into a bunch of comments (or an obnoxiously long multiline comment), but it'd be bulky and not conducive to the general happiness of anyone involved.

I much prefer repurposing Nunjucks templates to keep all of that in a separate document that's easy to manage with version control.


With magentic you could do chain-of-thought in two or more steps: one function that generates a string output containing the chain-of-thought reasoning and answer, and a second that takes that output and converts it to the final answer object. I agree though that this is not encouraged or made obvious by the framework.

The approach I'm encouraging with this is to write many functions to achieve your goal. So in the case of your email writing example you might have some of the following prompt-functions:

- write key bullet points for email about xyz -> list[str]

- write email based on bullet points -> str

- generate feedback for email to meet criteria abc -> str

- update email based on feedback -> str

- does email meet all criteria abc -> bool

And between these you could have regular Python code check things like blacklist/whitelist of keywords, length of paragraphs, and even add hardcoded strings to the feedback based on these checks, roughly as sketched below.
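
A rough sketch of how that could be wired together (prompt templates and function names are illustrative):

    from magentic import prompt

    @prompt("Write key bullet points for an email about {topic}.")
    def outline_email(topic: str) -> list[str]:
        ...

    @prompt("Write an email based on these bullet points: {points}")
    def write_email(points: list[str]) -> str:
        ...

    @prompt("Does this email meet the criteria {criteria}? Email: {email}")
    def meets_criteria(email: str, criteria: str) -> bool:
        ...

    points = outline_email("our new plant-care product")
    draft = write_email(points)
    if not meets_criteria(draft, "friendly, under 200 words"):
        # regular Python between calls: adjust inputs and regenerate
        draft = write_email(points + ["Keep it friendly and under 200 words."])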


Why would you add a second function for the answer object when you can return an answer object in the same response as the chain of thought?

Overall your second approach makes for really terrible UX and dramatically weakens the performance at the task unless you go and repeat every single definition along the way: ensuring you now have X copies of the prompt spread across the code base and have blown up your token count.

Once you get to that level of granularity between calls, you've pretty much fallen back into doing a slower, more expensive version of NLP pre-ChatGPT.


This is great. I hacked a smaller version of this together when I built an LLM app with Elixir. Honestly, Elixir's async-by-default is so much better suited to this stuff, especially as it's just API calls.

Tempted to have a go at porting these ideas. Should be v doable with the macro system.


Really like how this is implemented with decorators. Everything just feels really smooth.


Looks great! I don't normally like these LLM libraries but this one sparks joy. I'll try it out on my next experiment.

Could you highlight how you're parsing to structured objects and how it can fail? Ever since I discovered guidance's method of pattern guides I've been wanting this more and more (it only works for local Hugging Face models though). Wish OpenAI offered a similar API.


Thanks! Currently magentic just uses OpenAI function-calling; it provides it a function schema that matches the structure of the output object. So it fails in the same ways as function-calling - struggles to match complex schemas, occasionally returns empty arrays, ...
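
To make that concrete, the schema comes from pydantic, roughly like this (a sketch; the exact wrapping magentic applies may differ):

    from pydantic import BaseModel

    class Superhero(BaseModel):
        name: str
        age: int

    # pydantic generates the JSON schema that becomes the "parameters" of the
    # function definition sent to the OpenAI API for function-calling
    print(Superhero.model_json_schema())
    # {'properties': {'name': {'title': 'Name', 'type': 'string'},
    #                 'age': {'title': 'Age', 'type': 'integer'}},
    #  'required': ['name', 'age'], 'title': 'Superhero', 'type': 'object'}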


The comments on great API design got me thinking of a world in which you can have multiple of these frameworks orchestrate together. I could see use in adding this to a platform I'm currently building, to overcome some of the issues that e.g. LlamaIndex introduces.


Curious as to why you chose to do it as a decorator instead of just a function call?


I found this was the most compact way to represent what I wanted to define, and it makes it easy to keep the type hints for parameters. If you look inside `@prompt`, it's creating a `PromptFunction` instance, which I think would be a similar API to what you would end up with without using decorators: https://github.com/jackmpcollins/magentic/blob/afdb22513385b...


I never got on board with decorators in Python, but you sold me on it.


Another really good use is performance metrics -- decorators to track execution time, for instance, with the ability to specify things like function groups, logic concepts, etc. It makes it trivial to add this sort of observability to your code.
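
E.g., a bare-bones version of that idea (names are illustrative):

    import functools
    import time

    def timed(group: str):
        """Log execution time, tagged with a function group."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    elapsed = time.perf_counter() - start
                    print(f"[{group}] {func.__name__} took {elapsed:.3f}s")
            return wrapper
        return decorator

    @timed(group="llm-calls")
    def slow_task():
        time.sleep(0.1)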


Looking at the code, it seems to be a way to support typing; just making it a function that takes the string template would let you return a dynamically-defined function, but I think that would make it harder to get static typing.


See also: `antiscope`, an experiment in subjunctive programming

https://github.com/MillionConcepts/antiscope


Looks super cool! A few questions:

1) Can you get the actual code output, or will this end up calling OpenAI on each function call?

2) What latency does it add? What about token usage?

3) Is the functionality deterministic?


1) The OpenAI API will be queried each time a "prompt-function" is called in python code. If you provide the `functions` argument in order to use function-calling then magentic will not execute the function the LLM has chosen, instead it returns a `FunctionCall` instance which you can validate before calling.

2) I haven't measured additional latency but it should be negligible in comparison to the speed of generation of the LLM. And since it makes it easy to use streaming and async functions you might be able to achieve much faster generation speeds overall - see the Async section in the README. Token usage should also be a negligible change from calling the OpenAI API directly - the only "prompting" magentic does currently is in naming the functions sent to OpenAI, all other input tokens are written by the user. A user switching from explicitly defining the output schema in the prompt to using function-calling via magentic might actually save a few tokens.

3) Functionality is not deterministic, even with `temperature=0`, but since we're working with python functions one option is to just add the `@cache` decorator. This would save you tokens and time when calling the same prompt-function with the same inputs.

---

1) https://github.com/jackmpcollins/magentic#usage

2) https://github.com/jackmpcollins/magentic#asyncio

3) https://docs.python.org/3/library/functools.html#functools.c...
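
For example, a sketch of the caching idea (assuming the arguments are hashable, e.g. strings; the prompt-function here is illustrative):

    from functools import cache

    from magentic import prompt

    @cache  # functools.cache requires hashable arguments
    @prompt("Summarize this in one sentence: {text}")
    def summarize(text: str) -> str:
        ...

    summarize("Lorem ipsum dolor sit amet.")  # queries the OpenAI API
    summarize("Lorem ipsum dolor sit amet.")  # same input, returns the cached result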


Nice! I’m going to try it out and possibly integrate it into my Python package: https://vanna.ai


Does this do System vs. Assistant vs. User prompting?


Right now we just pass a single user prompt to the chat model. Setting the system prompt could also be done in the `@prompt` decorator. I've added a github issue to track https://github.com/jackmpcollins/magentic/issues/31


Update: I've added the ability to add chat messages using a new decorator `@chatprompt` in v0.7.0. See https://github.com/jackmpcollins/magentic/releases/tag/v0.7....


I am amazed that `...` is valid syntax in Python, not pseudo-grammar.

This library is impressive; I appreciate it and will apply it to my project.


What's the difference between '...' and the more common 'pass'?


I find students correctly infer what to do with "..." whereas they're afraid to touch "pass".

E.g., if I gave them this:

    def foo(x):
      ...  #add your implementation here
    
    def bar(x):
      pass #add your implementation here

I'd get back this:

    def foo(x):
      return x+1
      
    def bar(x):
      return x+1
      pass


In code, using `...` implies that the code is yet to be written; `pass` means it's explicitly a no-op.


In this case, functionally, nothing. Some other commenters have suggested it does something interesting by implying "AI will provide the logic," whereas "pass" doesn't necessarily do that.


This is neat. It makes it easy to prototype, and then you can just remove the decorator and write a specific implementation if you need to.


Is it really LLMs (plural) when you only have OpenAI integration?


Right now it just works with OpenAI chat models (gpt-3.5-turbo, gpt-4) but if there's interest I plan to extend it to have several backends. These would probably each be an existing library that implements generating structured output like https://github.com/outlines-dev/outlines or https://github.com/guidance-ai/guidance. If you have ideas how this should be done let me know - on a github issue would be great to make it visible to others.


Oh, and some companies offer APIs that match the OpenAI API, and there are some open-source projects that do this for Llama models running locally. Since those are compatible with the openai Python package they will work with magentic too - though some of them do not support function calling.

See for example Anyscale Endpoints https://app.endpoints.anyscale.com/landing and https://github.com/AmineDiro/cria
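
For illustration, with the pre-1.0 openai package this is just module-level configuration (the URL is a placeholder for whatever local server you run):

    import openai

    # point the openai package (which magentic calls into) at a local
    # OpenAI-compatible server instead of api.openai.com
    openai.api_base = "http://localhost:8000/v1"  # placeholder URL
    openai.api_key = "not-needed-for-local-servers"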


There's also LocalAI[0] which allows the use of local LLMs with an OpenAI compatible API.

[0] https://github.com/go-skynet/LocalAI


Thanks for sharing! LocalAI supports function calling[0] so this should work for most or all features of magentic - I'm interested to see if concurrent requests work. I will test this out.

[0] https://localai.io/features/openai-functions/


I tried out guidance. Encountered endless bugs.


OpenAI offers a few different LLMs :)


text-generation-webui offers an OpenAI API implementation specifically to support OpenAI API clients, so you can get more than just OpenAI support simply by wrapping the OpenAI API.

You could get more flexibility by abstracting out the underlying LLM APIs, but then you also have to deal with differences in supported features between APIs, the same conceptual feature exposed with very different parameter structures, etc.


Super cool! Looks quite intuitive, especially for function calls.


Just wanna say, that's pretty great API design :)


Do you support custom LLMs?


At the moment only those that support the OpenAI Chat API, with function calling for the structured outputs. For example you can use LocalAI[0][1] to run models locally.

[0] https://github.com/go-skynet/LocalAI

[1] https://localai.io/features/openai-functions/


Seems a lot like https://github.com/PrefectHQ/marvin?

The prompting you do seems awfully similar to:

https://www.askmarvin.ai/prompting/prompt_function/


Pretty cool, I made something similar (lambdaprompt[1]), with the same ideal of functions being the best interface for LLMs.

Also, here's some discussion about this style of prompting and ways of working with LLMs from a while ago [2].

[1] https://github.com/approximatelabs/lambdaprompt/

[2] https://news.ycombinator.com/context?id=34422917


Are you familiar with https://github.com/PrefectHQ/marvin? This looks very similar


Yes, similar ideas. Marvin [asks the LLM to mimic the python function](https://github.com/PrefectHQ/marvin/blob/f37ad5b15e2e77dd998...), whereas in magentic the function signature just represents the inputs/outputs to the prompt-template/LLM, so the LLM “doesn’t know” that it is pretending to be a python function - you specify all the prompts.


(Completely off-topic, but oh how I wish HN supported markdown)


Write some tests for those functions. It will be worth it. No, I am not kidding: especially for AI we need tests, but we should report accuracy instead of a hard pass/fail.


No problem, boss.

    @prompt("Find out if {programs} are correct.")
    def do_they_work(programs: list) -> bool:
        ...

I just pushed it to production. Dashboard is all green. See you when I get back from vacation!


We need a new language/DSL. Python is a lost cause for strings as first-class.


How so? What disadvantages does having strings as a first-class type have?


I expressed myself too succinctly and without context, sorry.

I meant we need a new DSL better suited for prompt engineering, and a UI that better supports longer strings. Actually this UI could be something compatible with Python.

But overall a reimagination of the dev experience is what I am getting at (like Jupyter for LLMs).

Dm me [redacted] on X for more.


This is also similar in spirit to LMQL

https://github.com/eth-sri/lmql



