Django ORM vs. SQLAlchemy (reddit.com)
72 points by googletron on Jan 29, 2012 | hide | past | favorite | 24 comments


SQLAlchemy is one of the best designed libraries I have ever used for any software task, period. I've done all sorts of crazy reflexive joins with it, used the metaprogramming capabilities to bootstrap my ORM from nothing, added my own Rails-like scope features to the session maker, etc. SQLAlchemy just plain works(TM). If you're a Python guy, learning it is one of the best things you can do.


SQLAlchemy is just a beautiful piece of engineering work. What I always marvel at is how detailed, descriptive and just plain helpful its error messages are. It's a small yet extremely useful feature, given the natural complexity of ORMs and the bazillion ways that things may go wrong when mapping from relations to objects.


The big difference between the Django ORM and SQLAlchemy is that SQLA takes longer to get going with since it's more complex, but can do a lot more. Thus SQLA is more dangerous: it allows you to take your application further and further down the rabbit hole, making you think that you can actually use an ORM for everything you do and pretend that the row <==> object relationship is one-to-one.

For every project, there exists a point where an ORM stops being an asset and starts being a liability. With the Django ORM it happens pretty early on, once certain types of queries cannot be run with it, since it is very limited in scope. With SQLA this point comes when performance drops (by my metrics it adds quite a bit of CPU time instantiating SQLA object wrappers around rows). This usually happens late in the game, after you've done all your development, deployed, and have been running for a little while. In other words: at the most inconvenient time.

On top of that, consider the following piece of code:

  print zoo.location
  print len(zoo.animals)
  print zoo.name
Notice that line 2 of the above code may or may not do external IO, talking to an unreliable database over TCP/IP. It is a small miracle that it works. However, the code looks so right. It seems so easy to fall for the convenience of an ORM instead of architecting your database connector, thus decoupling your data storage from your application.
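A minimal sketch of that failure mode (the Zoo/Animal models and the in-memory SQLite engine are invented for illustration): by default the collection is lazy, so line 2 silently does IO on first access; eager loading is the usual way to make the IO explicit and up-front.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker

Base = declarative_base()

class Zoo(Base):
    __tablename__ = "zoo"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    location = Column(String)
    animals = relationship("Animal")

class Animal(Base):
    __tablename__ = "animal"
    id = Column(Integer, primary_key=True)
    zoo_id = Column(Integer, ForeignKey("zoo.id"))

engine = create_engine("sqlite://")  # in-memory stand-in for the remote DB
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Zoo(name="Bronx", location="NY", animals=[Animal(), Animal()]))
session.commit()

# With plain attribute access, len(zoo.animals) issues a second SELECT.
# Eager loading pulls the collection in the same round trip instead:
zoo = session.query(Zoo).options(joinedload(Zoo.animals)).first()
print(zoo.location)
print(len(zoo.animals))  # already in memory; no further IO here
print(zoo.name)
```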

Edit: I'd like to add that if you have a small project with well-defined scope, it is possible to use an ORM efficiently. However, if performance is of any consideration or you anticipate that the project will grow (most projects do), then consider creating an API to access your data instead. In the short term it will be almost as quick to develop. In the long term, it will save you many headaches.
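One way to picture the "API instead of ORM" suggestion, sketched here with the stdlib sqlite3 module and a made-up zoo schema: callers ask a question through a function, and the storage mechanism stays hidden behind it.

```python
import sqlite3

# Minimal in-memory stand-in for the real storage.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zoo (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE animal (id INTEGER PRIMARY KEY, zoo_id INTEGER);
    INSERT INTO zoo VALUES (1, 'Bronx', 'NY');
    INSERT INTO animal (zoo_id) VALUES (1), (1);
""")

def get_zoo_summary(conn, zoo_id):
    """One round trip; callers never see tables, rows, or lazy collections."""
    row = conn.execute(
        """SELECT z.name, z.location, COUNT(a.id)
           FROM zoo z LEFT JOIN animal a ON a.zoo_id = z.id
           WHERE z.id = ? GROUP BY z.id""", (zoo_id,)).fetchone()
    return {"name": row[0], "location": row[1], "animal_count": row[2]}

summary = get_zoo_summary(conn, 1)
```

Swapping the storage backend later means reimplementing functions like this one, not rewriting business logic.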


create an API and then... write hand-coded SQL in the implementation? The notion that Django is better than SQLAlchemy only because it's so limited, and therefore you won't accidentally write inefficient code, sounds very strange. Profiling a SQLAlchemy application is very straightforward (watch the SQL at all times when testing) and should be done throughout every stage of development. There's no "surprise at the most inconvenient time" if you're paying appropriate attention up to that point.

SQLAlchemy is tailored for performance above all, in that it allows the database to be used in its natural mode of set-based thinking. You can write real queries, and keep decisions the database is capable of making on the database side. The naive system of "row by agonizing row" programming, where you load everything into memory in order to reason about your data, is not really how relational databases are supposed to be used. You need a tool that works along with the relational paradigm in order to work that way, or else it's back to hand-written SQL strings. And it is absolutely false that a large scale application written out as hundreds of hardcoded SQL strings is easier to manage or expand upon than one that makes good use of an abstraction layer.
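A sketch of that set-based style (table and data invented for illustration), where the aggregation decision stays on the database side instead of loading every row into memory:

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, select)

engine = create_engine("sqlite://")
metadata = MetaData()
animal = Table(
    "animal", metadata,
    Column("id", Integer, primary_key=True),
    Column("species", String),
)
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(animal.insert(),
                 [{"species": "lion"}, {"species": "lion"}, {"species": "emu"}])

# Instead of fetching every row and counting in Python ("row by agonizing
# row"), express the question as a set and let the database answer it:
stmt = select(animal.c.species, func.count()).group_by(animal.c.species)
with engine.connect() as conn:
    counts = dict(conn.execute(stmt).all())
```

The generated SQL is a single GROUP BY query; only the per-species totals cross the wire.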

Also, your example about len(zoo.animals) applies to any ORM. Django, Storm, Hibernate, Active Record: all have lazily evaluated collections. At least with SQLAlchemy and Hibernate, "zoo.animals" is only evaluated once, and not on each access.


There are two "bite you in the ass" points about ORMs that I am talking about. The first is writing inefficient SQL, which, as you mention, may typically be avoided by reading all queries. The second is the RAM and CPU required by the ORM. When you require massive concurrency and very small response times, SQLAlchemy simply requires too many resources. There is no way to avoid that.

I agree that loading rows one by one and hard-coding their relationships into your business logic is not the most convenient thing. However, the point of SQL is not really to let you say "zoo.animals". The point is: you ask it questions and it returns answers. Where ORMs break down is where you start talking about aggregate rows or complex computed properties. What do you mean, sometimes an animal has the property potential_mates_with_offspring_count and sometimes it does not?

Yes, any ORM that uses lazy evaluation will suffer from the issue above. That is why I am not a fan of any ORM. Instead, a carefully written and thought-through data access layer is much more scalable and may provide true abstraction of your data storage mechanism.

Lastly, no, zoo.animals is not guaranteed to be cached, since it may be a declared attribute. The session is not a cache, so you cannot rely on it for that.


> When you require massive concurrency and very small response times, SQLAlchemy simply requires too many resources. There is no way to avoid that.

For the full-object loading use case, you can stretch it much further by using query caching, and hopefully PyPy will help a lot here as well. But you can ditch most of the object loading overhead by just requesting individual columns back, then using just the abstraction layer as the next level, then finally feeding your query into a raw cursor. But these are optimizations that you apply only in those spots where they're needed. You certainly don't have to ditch the automation of query rendering; the rendered form of a query can be cached too.
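That ladder of optimizations might look like this (the User model and data are hypothetical):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([User(name="ada"), User(name="bob")])
session.commit()

# Level 1: full ORM objects -- identity map, change tracking,
# per-row object instantiation.
users = session.query(User).all()

# Level 2: individual columns -- plain tuple-like rows, most of the
# instantiation overhead gone.
names = session.query(User.id, User.name).all()

# Level 3: drop to a raw DBAPI cursor for the true hot path.
cursor = session.connection().connection.cursor()
cursor.execute("SELECT id, name FROM users")
rows = cursor.fetchall()
```

Each step sheds overhead but also abstraction; the point is that you only pay for level 3 in the spots that profiling says need it.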

I also certainly agree that data access or service layers are a good thing, and perhaps you have a notion that "using an ORM" means "the ORM is your public data layer" - I don't actually think that's the case. But in my view you still want to make full use of automation in order to implement such a layer.


I really, really wanted to like the Django ORM; I was going to use the very nice Django admin as a quick replacement for the administrative tools we had for an existing database. The ORM just couldn't keep up with anything but the lowest common denominator of database design. I think this comment really nails the difference:

The way I try to think of it is, Django's ORM was designed for web applications, and SQLAlchemy was designed for databases.


Next up, emacs vs. vi, Linux vs. Mac.

Having just read through that thread, and zzzeek's earlier post (http://www.reddit.com/r/Python/comments/olech/is_django_cons...) I still don't see that any of this has changed from the last time I read a similar thread.

My take:

Yes, SQLAlchemy can do more queries than Django's ORM.

No, you are unlikely to need these in practice for most web apps (zzzeek's example (for a generic User<->Address schema) is "Give me all households in New York with exactly two occupants where neither occupant has any residences outside of the city." Really? When was the last time you had a query like this in a web app?)

Yes, SQLAlchemy is a PITA to learn and use, unless you use the Django/Rails style declarative base interface.

Yes, you will lose out on basically all of the Django infrastructure if you use SQLAlchemy: django-admin, fixtures, introspecting the schema to generate models, django-south, etc.

If you need complex queries or like writing web frameworks from scratch, SQLAlchemy is the way to go. If you just want to use Django and get your user data out of a table, then just use Django's ORM.


> No, you are unlikely to need these in practice for most web apps (zzzeek's example (for a generic User<->Address schema) is "Give me all households in New York with exactly two occupants where neither occupant has any residences outside of the city." Really? When was the last time you had a query like this in a web app?)

I write web applications in the financial analysis sector, and the general technique of writing queries where subqueries must be composed together to combine groups of coarse-grained sets into an ever more specifically narrowed down set is very common - we do reporting of a fairly complex temporally-oriented schema and we can throw up bar charts and excel spreadsheets right up on the client, straight from the DB, without loading intermediary objects into memory or anything like that. The queries are highly composed, we're using aggregates to flatten out sets, then joining to that, we're using window functions in some cases, and we can do it all straight in the ORM with not a literal scrap of SQL anywhere. When the dataset grows from 1000 rows to 1000000 we'll be able to scale since we're using the relational database as it was meant to be used.
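A contrived sketch of that composition style, using a made-up household/occupant schema: an aggregate subquery flattens the set first, then is joined back to narrow it, with no literal SQL anywhere.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Household(Base):
    __tablename__ = "household"
    id = Column(Integer, primary_key=True)
    city = Column(String)

class Occupant(Base):
    __tablename__ = "occupant"
    id = Column(Integer, primary_key=True)
    household_id = Column(Integer, ForeignKey("household.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

ny_pair, ny_single, boston = (Household(city="New York"),
                              Household(city="New York"),
                              Household(city="Boston"))
session.add_all([ny_pair, ny_single, boston])
session.flush()
session.add_all([Occupant(household_id=ny_pair.id),
                 Occupant(household_id=ny_pair.id),
                 Occupant(household_id=ny_single.id),
                 Occupant(household_id=boston.id),
                 Occupant(household_id=boston.id)])
session.commit()

# Flatten occupants into per-household counts first...
counts = (session.query(Occupant.household_id, func.count().label("n"))
          .group_by(Occupant.household_id)
          .subquery())
# ...then join back to narrow the coarse set: two-occupant NY households.
results = (session.query(Household)
           .join(counts, counts.c.household_id == Household.id)
           .filter(counts.c.n == 2, Household.city == "New York")
           .all())
```

The whole thing renders as one composed SELECT, so the database, not Python, does the narrowing.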

I learned how to write those kinds of queries when I worked at MLB.com. There, we wrote scripts that generated all kinds of elements of the homepage and article pages, reams of baseball stats, game wrapups, schedules - again all pulled straight from a really, really vast schema spreading over three databases. Before I worked at MLB I barely knew what a join was.

Looking through my source here, I just counted nine such queries just in two source files related to reporting alone. If you come out to Pycon this year, come find me and I can show you some of what I do.

Obviously the example in my post/talk had to be contrived since I couldn't exactly put up code from work. Perhaps I'll try to think of a more real world example for my Pycon talk this year.


I think you just proved my point. If I had to pick problem domains for which Django's ORM is poorly suited, "complex temporally-oriented schema ... in the financial analysis sector", or something like it, would be pretty high up on the list :)


Yet financial institutions are flocking to Django like everyone else. My goal is to try to keep the relational flame alive, I guess.


as reflected in the comments there, they are really quite different. if you're writing a simple (these days, even not-so-simple) django app that wants to store some objects, then django-orm works just fine. alternatively, if you want a python interface to sql (ie you want to construct sql queries in an embedded dsl) then sqlalchemy is beautiful.

and if you're a guru who wants to use django with an orm, but with more control, then you can also use sqlalchemy; but for most django users it's overkill.

more interesting is the question: "what is sqlalchemy for java?". the closest i could find a while back was empiredb - they were great guys, very helpful, but it felt like a very small project. is there anything else?


hibernate, which was pretty influential on sqlalchemy.


bobbyi: It seems that your comments are autodead for no good reason. From what I understand you can mail info@ycombinator.com to correct this situation.


looks like you've been hell-banned, bobbyi [for people who don't have showdead enabled, they suggested hibernate].

my understanding of hibernate is that philosophically hibernate is more like django-orm than sqlalchemy, in that it doesn't embed sql in a dsl (you either map objects directly or use hql in strings). the kind of thing i meant by "sqlalchemy in java" was something more like c#'s linq, or the scala equivalent(s?).

but i may be wrong, as i haven't used hibernate much. in which case please educate me...

[some context for java people: sqlalchemy will let you define an sql schema in python, then create the database from that, and then let you construct - again, in python (objects and chained method calls) - sql queries against that database. so it's more like "sql in python" than an orm, although it does also support orm if you want (and i suspect because of that sqlalchemy's orm is likely closer to ibatis than hibernate).]


Well DSLs aren't very feasible in Java due to the language, so they came up with HQL instead - the goal is the same, a rapid way to describe SQL statements in terms of objects. In my view, there's not a huge difference between parsing HQL into a tokenized structure, versus building the tokenized structure directly using a generative system like `query(class).filter(class.foo=='bar')`. They do also have a slightly more DSL-like thing called "criteria" queries, which are of the "criteria.add(Restrictions.like("name", "ed"))" variety, a far cry from linq though.

But an ORM is about a lot more than just building SELECT queries. Hibernate has a Session (SQLAlchemy copied this name from Hibernate), built on top of similar unit of work/identity map patterns - it also uses pretty much the same notion of transaction integration (that is, flush() any number of times in a transaction, including before queries, then commit() at the end). SQLAlchemy originally copied the name for their "save_or_update()" method, until taking a cue from Storm and renaming it to "add()". The merge() method in SQLAlchemy is directly from Hibernate, to the point where I read its source to determine its exact behavior in some edge cases.

The notion of "cascades" and the options they accept are almost identical to Hibernate's. There's also the fact that collections are kept persistently in memory once loaded, rather than the "load on every iteration" pattern that it appears all the other Python ORMs use, as this was the pattern that seemed more familiar to me as well as a lot more efficient in the majority of cases.

Collection loading via joins, or via a second query that loads for a full set of parent objects at once, is pretty much inspired by Hibernate, though I sought to dramatically improve on Hibernate's behavior here by getting "joined eager loading" to work totally transparently, including when the parent query has LIMIT/OFFSET on it.

Originally we had also copied a Hibernate concept called "entity name mappings", until realizing it was a terrible idea, not at all relevant to Python. The horizontal sharding extension, which I've always viewed as just a proof-of-concept/experimental thing, was designed pretty much verbatim from Hibernate's similarly little-used API... people were really bugging me for a transparent, horizontal sharding system for a while, even though such systems still seem unwise to me, so I figured copying their idea was better than just totally guessing how to do it.

Overall, SQLAlchemy can be seen as a product largely derived from Hibernate in Java and SQLObject in Python, rounded out with various patterns from "Patterns of Enterprise Application Architecture", and later on from the Storm ORM as well.


Isn't Hibernate SQLAlchemy for Java?


half-joke: sqlalchemy with jython?


For most web applications peewee will be just as nice as Django's ORM.



The only thing I really want is a good and easy way to use SQLAlchemy to manage a change set for my schema as well, so that I can upgrade/downgrade at will. Something like what Rails has built in would be awesome.

Is something like that already out there? Something that works well and can easily be integrated into a project?


There's SQLAlchemy-Migrate, as someone mentioned, and I've also recently released Alembic (https://bitbucket.org/zzzeek/alembic), which is already being used in some production environments. Alembic includes semi-automatic version generation in a similar way as South does for Django.
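For reference, an Alembic migration is a plain Python file with paired upgrade/downgrade functions, which is what makes moving in either direction at will possible. A minimal sketch of such a file (the revision ids and the zoo table are invented, and it only runs inside Alembic's migration context, not standalone):

```python
"""add a location column to zoo"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic (values here are illustrative)
revision = "1a2b3c4d5e6f"
down_revision = None

def upgrade():
    op.add_column("zoo", sa.Column("location", sa.String()))

def downgrade():
    op.drop_column("zoo", "location")
```

A file like this is generated with `alembic revision`, applied with `alembic upgrade head`, and rolled back with `alembic downgrade -1`.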



There is a lot to recommend in Django. However, there are some infuriating limitations. I would love a version of Django that was based on SQLAlchemy and that supported all the fun things Django users love, like the admin and South. We'd have the best of both worlds.



