Show HN: Magentic – Use LLMs as simple Python functions (github.com/jackmpcollins)
283 points by jackmpcollins on Sept 26, 2023 | 63 comments
This is a Python package that allows you to write function signatures to define LLM queries. This makes it easy to mix regular code with calls to LLMs, which enables you to use the LLM for its creativity and reasoning while also enforcing structure/logic as necessary. LLM output is parsed for you according to the return type annotation of the function, including complex return types such as streaming an array of structured objects.
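
For a quick sense of the API, here is a minimal sketch based on the README's superhero example (the exact prompt template wording is illustrative):

    from magentic import prompt
    from pydantic import BaseModel

    class Superhero(BaseModel):
        name: str
        age: int
        power: str
        enemies: list[str]

    @prompt("Create a superhero named {name}.")
    def create_superhero(name: str) -> Superhero:
        ...  # never executed - the LLM output is parsed into a Superhero

    create_superhero("Garden Man")
    # Superhero(name='Garden Man', age=30, power='Control over plants', enemies=['Pollution Man', 'Concrete Woman'])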

I built this to show that we can think about using LLMs more fluidly than just chains and chats, i.e. more interchangeably with regular code, and to make it easy to do that.

Please let me know what you think! Contributions welcome.

https://github.com/jackmpcollins/magentic



This looks really useful. Langchain is not my idea of a fun time.

Love the examples too. Low-effort humor is the best:

> create_superhero("Garden Man")

> # Superhero(name='Garden Man', age=30, power='Control over plants', enemies=['Pollution Man', 'Concrete Woman'])


FWIW, at my last company we had a section in the developer guide encouraging using humor in tests - not only did it make them more fun to write, but it engaged the readership better.


I’ve been integrating humor into our unit tests for a bit now and have gotten feedback from a few engineers who really seem to appreciate it.


Would check out https://www.askmarvin.ai/ if you're into this.

I haven't downloaded 1.5 yet, but they released this last week: https://www.askmarvin.ai/prompting/prompt_function/


At first I was like: "Okay, it's just a decorator to add a prompt when you have str as an input and str as an output."

Then I kept on reading, and I have to admit that the object creation with LLMs is really amazing!


The API looks very clean. Today I learned about "..." in Python.


It is just a noop, but here it looks very appropriate/readable because it reads as saying "AI will fill this in".


It's a misuse of the Python Ellipsis, though PEP has no opinion on it. The Ellipsis is "Special value used mostly in conjunction with extended slicing syntax for user-defined container data types."

In other words, it happens to work and look neat, but pass is the correct way to do it.


Ellipsis is actually used in quite a few places. See the answers and comments on this Stack Overflow post [0]. The usage most similar to what I have in the magentic examples is with the `@overload` decorator in the typing module [1] (sketched below).

With that said, you are free to put any code in the function body including `pass`, just a docstring, or even `raise NotImplementedError` - it will not be executed. Using Ellipsis satisfies VSCode/pyright type checking and seemed neatest to me for the examples and docs. I have some additional notes on this in the README [2].

[0] https://stackoverflow.com/q/772124/9995080

[1] https://docs.python.org/3/library/typing.html#typing.overloa...

[2] https://github.com/jackmpcollins/magentic#type-checking
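
As a quick illustration of the `@overload` usage (a minimal sketch; the `double` function here is hypothetical, and only the overload stubs use `...`):

    from typing import overload

    @overload
    def double(x: int) -> int: ...
    @overload
    def double(x: str) -> str: ...

    def double(x):
        # the real implementation; the stubs above exist only for the type checker
        return x * 2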


The same as “pass”?


I built a similar package for TypeScript [0], with the goal of having type-safe LLM responses automagically.

It's pretty fun, but I've found that having the LLM write code is often what I actually want.

[0] https://github.com/jumploops/magic


Awesome job with the simplicity, gonna play with it. Have you tried using YAML as the format with the models instead of JSON? Feels like you'd use far fewer tokens to describe the same thing. Perhaps it's a bit more forgiving as well.

EDIT: Just tried using the decorator to output a fairly complex pydantic model and it failed with "magentic.chat_model.openai_chat_model.StructuredOutputError: Failed to parse model output. You may need to update your prompt to encourage the model to return a specific type."

I typically try to give examples in the pydantic Config class; perhaps those could be piped in for some few-shot methods. It would also help to have some iteration to correct the output syntax when the model output is not perfectly parseable.


Yes, I'm working on allowing few-shot examples to be provided as part of defining the prompt-function, which should help in cases like this. Unfortunately from my testing just now it appears that OpenAI ignores examples added to the model config.

In the meantime, have a look at the ValidationError traceback which might highlight a specific field that is causing the issue. Some options to resolve the issue might be: the type for this field could be made more lenient (e.g. str); the `Annotated` type hint could be used to give the field a description to help correct the error [0]; the field could be removed. You could also try using gpt-4 by setting the env var MAGENTIC_OPENAI_MODEL [1].

If none of these help resolve it or it appears to be an issue with magentic itself please file a github issue with an example. Comments on how to improve error messages and debugging are also welcome! Thanks for trying it out.

[0] https://docs.pydantic.dev/latest/concepts/fields/#using-anno...

[1] https://github.com/jackmpcollins/magentic#configuration
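
For illustration, a minimal sketch of the Annotated approach (the model and field names here are just examples):

    from typing import Annotated
    from pydantic import BaseModel, Field

    class Invoice(BaseModel):
        # the description becomes part of the schema sent to the model,
        # which can nudge it toward output that parses correctly
        due_date: Annotated[str, Field(description="Due date in ISO 8601 format, e.g. 2023-09-26")]
        total: float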


Very cool! At first the title reminded me of a project my colleague and I are working on called OpenAI-Functools [1], but your concept is quite the opposite: combining LLMs into your code rather seamlessly instead of the other way around. Quite cool, and interesting examples :)

I’ll definitely try to apply it in one of my pet projects. Good stuff

[1] https://github.com/Jakob-98/openai-functools


I've personally found frameworks like this get in the way of quality CoT: it's rare for a prompt that takes great advantage of the LLM's reasoning to fit the format these generators encourage.

A friend mentioned how terrible most cold email generators are at actually generating natural-feeling emails. It just took asking him how actual people in marketing come up with emails to build a chain of thought that produces intentionally uncanny emails for a wide range of inputs: https://rentry.co/54hbz

It's not like you can't technically fit what I described into a bunch of comments (or an obnoxiously long multiline comment), but it'd be bulky and not conducive to the general happiness of anyone involved.

I much prefer repurposing Nunjucks templates to keep all of that in a separate document that's easy to manage with version control.


With magentic you could do chain-of-thought in two or more steps: one function that generates a string output containing the chain-of-thought reasoning and answer, and a second that takes that output and converts it to the final answer object. I agree though that this is not encouraged or made obvious by the framework.

The approach I'm encouraging with this is to write many functions to achieve your goal. So in the case of your email writing example you might have some of the following prompt-functions:

- write key bullet points for email about xyz -> list[str]

- write email based on bullet points -> str

- generate feedback for email to meet criteria abc -> str

- update email based on feedback -> str

- does email meet all criteria abc -> bool

And between these you could have regular Python code check things like blacklist/whitelist of keywords, length of paragraphs, and even add hardcoded strings to the feedback based on these checks, roughly as sketched below.
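
A rough sketch of how that could be wired together (prompt templates and function names are illustrative):

    from magentic import prompt

    @prompt("Write key bullet points for an email about {topic}.")
    def outline_email(topic: str) -> list[str]:
        ...

    @prompt("Write an email based on these bullet points: {points}")
    def write_email(points: list[str]) -> str:
        ...

    @prompt("Does this email meet the criteria {criteria}? Email: {email}")
    def meets_criteria(email: str, criteria: str) -> bool:
        ...

    points = outline_email("our new plant-care product")
    draft = write_email(points)
    if not meets_criteria(draft, "friendly, under 200 words"):
        # regular Python between calls: adjust inputs and regenerate
        draft = write_email(points + ["Keep it friendly and under 200 words."])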


Why would you add a second function for the answer object when you can return an answer object in the same response as the chain of thought?

Overall your second approach makes for really terrible UX and dramatically weakens the performance at the task unless you go and repeat every single definition along the way: ensuring you now have X copies of the prompt spread across the code base and have blown up your token count.

Once you get to that level of granularity between calls, you've pretty much fallen back into doing a slower, more expensive version of NLP pre-ChatGPT.


This is great. I hacked a smaller version of this together when I built an LLM app with Elixir. Honestly, Elixir's async-by-default is so much better suited to this stuff, especially as it's just API calls.

Tempted to have a go at porting these ideas. Should be v doable with the macro system.


Really like how this is implemented with decorators. Everything just feels really smooth.


Looks great! I don't normally like these LLM libraries but this one sparks joy. I'll try it out on my next experiment.

Could you highlight how you're parsing to structured objects and how it can fail? Ever since I discovered guidance's method of pattern guides I've been wanting this more and more (it only works for local Hugging Face models though). Wish OpenAI offered a similar API.


Thanks! Currently magentic just uses OpenAI function-calling; it provides it a function schema that matches the structure of the output object. So it fails in the same ways as function-calling - struggles to match complex schemas, occasionally returns empty arrays, ...
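
To make that concrete, the schema comes from pydantic, roughly like this (a sketch; the exact wrapping magentic applies may differ):

    from pydantic import BaseModel

    class Superhero(BaseModel):
        name: str
        age: int

    # pydantic generates the JSON schema that becomes the "parameters" of the
    # function definition sent to the OpenAI API for function-calling
    print(Superhero.model_json_schema())
    # {'properties': {'name': {'title': 'Name', 'type': 'string'},
    #                 'age': {'title': 'Age', 'type': 'integer'}},
    #  'required': ['name', 'age'], 'title': 'Superhero', 'type': 'object'}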


The comments on great API design got me thinking of a world in which you can have multiple of these frameworks orchestrate together. I could see use in adding this to a platform I'm currently building, to overcome some of the issues that e.g. LlamaIndex introduces.


Curious as to why you chose to do it as a decorator instead of just a function call?


I found this was the most compact way to represent what I wanted to define, and it makes it easy to keep the type hints for parameters. If you look inside `@prompt`, it's creating a `PromptFunction` instance, which I think would be a similar API to what you would end up with without using decorators: https://github.com/jackmpcollins/magentic/blob/afdb22513385b...


I never got on board with decorators in Python, but you sold me on it.


Another really good use is performance metrics -- decorators to track execution time, for instance, with the ability to specify things like function groups, logic concepts, etc. It makes it trivial to add this sort of observability to your code.
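
E.g., a bare-bones version of that idea (names are illustrative):

    import functools
    import time

    def timed(group: str):
        """Log execution time, tagged with a function group."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    elapsed = time.perf_counter() - start
                    print(f"[{group}] {func.__name__} took {elapsed:.3f}s")
            return wrapper
        return decorator

    @timed(group="llm-calls")
    def slow_task():
        time.sleep(0.1)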


Looking at the code, it seems to be a way to support typing; just making it a function that takes the string template would let you return a dynamically-defined function, but I think that would make it harder to get static typing.


See also: `antiscope`, an experiment in subjunctive programming

https://github.com/MillionConcepts/antiscope


Looks super cool! A few questions:

1) Can you get the actual code output, or will this end up calling OpenAI on each function call?

2) What latency does it add? What about token usage?

3) Is the functionality deterministic?


1) The OpenAI API will be queried each time a "prompt-function" is called in python code. If you provide the `functions` argument in order to use function-calling then magentic will not execute the function the LLM has chosen, instead it returns a `FunctionCall` instance which you can validate before calling.

2) I haven't measured additional latency but it should be negligible in comparison to the speed of generation of the LLM. And since it makes it easy to use streaming and async functions you might be able to achieve much faster generation speeds overall - see the Async section in the README. Token usage should also be a negligible change from calling the OpenAI API directly - the only "prompting" magentic does currently is in naming the functions sent to OpenAI, all other input tokens are written by the user. A user switching from explicitly defining the output schema in the prompt to using function-calling via magentic might actually save a few tokens.

3) Functionality is not deterministic, even with `temperature=0`, but since we're working with python functions one option is to just add the `@cache` decorator. This would save you tokens and time when calling the same prompt-function with the same inputs.

---

1) https://github.com/jackmpcollins/magentic#usage

2) https://github.com/jackmpcollins/magentic#asyncio

3) https://docs.python.org/3/library/functools.html#functools.c...
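
For example, a sketch of the caching idea (assuming the arguments are hashable, e.g. strings; the prompt-function here is illustrative):

    from functools import cache

    from magentic import prompt

    @cache  # functools.cache requires hashable arguments
    @prompt("Summarize this in one sentence: {text}")
    def summarize(text: str) -> str:
        ...

    summarize("Lorem ipsum dolor sit amet.")  # queries the OpenAI API
    summarize("Lorem ipsum dolor sit amet.")  # same input, returns the cached result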


Nice! I’m going to try it out and possibly integrate it into my Python package: https://vanna.ai


Does this do System vs. Assistant vs. User prompting?


Right now we just pass a single user prompt to the chat model. Setting the system prompt could also be done in the `@prompt` decorator. I've added a github issue to track https://github.com/jackmpcollins/magentic/issues/31


Update: I've added the ability to add chat messages using a new decorator `@chatprompt` in v0.7.0. See https://github.com/jackmpcollins/magentic/releases/tag/v0.7....


I am amazed that `...` is valid syntax in Python, not pseudo-grammar.

This library is impressive; I appreciate it and will apply it to my project.


What's the difference between '...' and the more common 'pass'?


I find students correctly infer what to do with "..." whereas they're afraid to touch "pass".

E.g., if I gave them this:

    def foo(x):
      ...  #add your implementation here
    
    def bar(x):
      pass #add your implementation here

I'd get back this:

    def foo(x):
      return x+1
      
    def bar(x):
      return x+1
      pass


In code, using `...` implies that the code is yet to be written; `pass` means it's explicitly a no-op.


In this case, functionally, nothing. Some other commenters have suggested it does something interesting by implying "AI will provide the logic," whereas "pass" doesn't necessarily do that.


This is neat. It makes it easy to prototype, and then you can just remove the decorator and write a specific implementation if you need to.


Is it really LLMs (plural) when you only have OpenAI integration?


Right now it just works with OpenAI chat models (gpt-3.5-turbo, gpt-4) but if there's interest I plan to extend it to have several backends. These would probably each be an existing library that implements generating structured output like https://github.com/outlines-dev/outlines or https://github.com/guidance-ai/guidance. If you have ideas how this should be done let me know - on a github issue would be great to make it visible to others.


Oh, and some companies offer APIs that match the OpenAI API, and there are some open-source projects that do this for Llama models running locally. Since those are compatible with the openai Python package they will work with magentic too - though some of them do not support function calling.

See for example Anyscale Endpoints https://app.endpoints.anyscale.com/landing and https://github.com/AmineDiro/cria
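
For illustration, with the pre-1.0 openai package this is just module-level configuration (the URL is a placeholder for whatever local server you run):

    import openai

    # point the openai package (which magentic calls into) at a local
    # OpenAI-compatible server instead of api.openai.com
    openai.api_base = "http://localhost:8000/v1"  # placeholder URL
    openai.api_key = "not-needed-for-local-servers"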


There's also LocalAI[0] which allows the use of local LLMs with an OpenAI compatible API.

[0] https://github.com/go-skynet/LocalAI


Thanks for sharing! LocalAI supports function calling[0] so this should work for most or all features of magentic - I'm interested to see if concurrent requests work. I will test this out.

[0] https://localai.io/features/openai-functions/


I tried out guidance. Encountered endless bugs.


OpenAI offers a few different LLMs :)


text-generation-webui offers an OpenAI API implementation specifically to support OpenAI API clients, so you can get more than just OpenAI support simply by wrapping the OpenAI API.

You could get more flexibility by abstracting out the underlying LLM APIs, but then you also have to deal with differences in supported features between APIs, the same conceptual feature exposed with very different parameter structures, etc.


Super cool! Looks quite intuitive, especially for function calls.


Just wanna say, that's pretty great API design :)


Do you support custom LLMs?


At the moment only those that support the OpenAI Chat API, with function calling for the structured outputs. For example you can use LocalAI[0][1] to run models locally.

[0] https://github.com/go-skynet/LocalAI

[1] https://localai.io/features/openai-functions/


Seems a lot like https://github.com/PrefectHQ/marvin?

The prompting you do seems awfully similar to:

https://www.askmarvin.ai/prompting/prompt_function/


Pretty cool, I made something similar (lambdaprompt[1]), with the same ideal of functions being the best interface for LLMs.

Also, here's some discussion about this style of prompting and ways of working with LLMs from a while ago [2].

[1] https://github.com/approximatelabs/lambdaprompt/

[2] https://news.ycombinator.com/context?id=34422917


Are you familiar with https://github.com/PrefectHQ/marvin? This looks very similar


Yes, similar ideas. Marvin [asks the LLM to mimic the python function](https://github.com/PrefectHQ/marvin/blob/f37ad5b15e2e77dd998...), whereas in magentic the function signature just represents the inputs/outputs to the prompt-template/LLM, so the LLM “doesn’t know” that it is pretending to be a python function - you specify all the prompts.


(Completely off-topic, but oh how I wish HN supported markdown)


Write some tests for those functions. It will be worth it. No, I am not kidding: especially for AI we need tests, but we should report accuracy instead of a hard pass/fail.


No problem, boss.

    @prompt("Find out if {programs} are correct.")
    def do_they_work(programs: list) -> bool:
        ...

I just pushed it to production. Dashboard is all green. See you when I get back from vacation!


We need a new language/DSL. Python is a lost cause for strings as first-class.


How so? What disadvantages does having strings as a first-class type have?


I expressed myself too succinctly and without context, sorry.

I meant we need a new DSL better suited for prompt engineering, and a UI that better supports longer strings. Actually this UI could be something compatible with Python.

But overall a reimagination of the dev experience is what I am getting at (like Jupyter for LLMs).

Dm me [redacted] on X for more.


This is also similar in spirit to LMQL

https://github.com/eth-sri/lmql



