It's worth noting that a lot of the early database designs, including the one in this 2018 video, pre-date some dramatic improvements to DynamoDB's usability.
I think the biggest ones were:
- an increase in the number of GSIs you can create per table (Dec 2018) [1]
- the launch of on-demand capacity mode (Nov 2018) [2]
- an increase in the default limit on the number of tables you can create (Mar 2022) [3]
I don't think these new features necessarily make the single-table, overloaded-GSI strategy discussed in the video obsolete, but they let growing applications add GSIs incrementally and spread data across multiple tables as their access patterns mature.
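For anyone who hasn't watched the video, here is a minimal in-memory sketch of what an "overloaded" single-table layout looks like. It is not DynamoDB code: the attribute names (PK, SK, GSI1PK, GSI1SK), the entity prefixes, and the query helper are all hypothetical, and the helper only imitates a key-condition query (exact partition key plus a begins_with on the sort key). The point is that one table and one GSI serve several different access patterns because each entity type puts a different kind of value into the same generic key attributes.

```python
from typing import Any

# Hypothetical single-table layout: customers and orders share one table, and
# the generic key attributes are "overloaded" with different meanings per entity.
ITEMS: list[dict[str, Any]] = [
    {"PK": "CUSTOMER#alice", "SK": "PROFILE",
     "GSI1PK": "EMAIL#alice@example.com", "GSI1SK": "CUSTOMER#alice"},
    {"PK": "CUSTOMER#alice", "SK": "ORDER#2023-01-05#1001",
     "GSI1PK": "ORDER#1001", "GSI1SK": "ORDER#1001", "status": "SHIPPED"},
    {"PK": "CUSTOMER#alice", "SK": "ORDER#2023-02-11#1002",
     "GSI1PK": "ORDER#1002", "GSI1SK": "ORDER#1002", "status": "PENDING"},
    {"PK": "CUSTOMER#bob", "SK": "PROFILE",
     "GSI1PK": "EMAIL#bob@example.com", "GSI1SK": "CUSTOMER#bob"},
]


def query(items: list[dict[str, Any]], pk_attr: str, sk_attr: str,
          pk: str, sk_prefix: str = "") -> list[dict[str, Any]]:
    """Imitate a key-condition query: exact partition key match plus a
    begins_with on the sort key, results ordered by sort key. Items that lack
    the key attributes are skipped, like items missing from a sparse index."""
    hits = [i for i in items
            if i.get(pk_attr) == pk and str(i.get(sk_attr, "")).startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i[sk_attr])


# Access pattern 1 (base table): all orders for a customer, oldest first.
alice_orders = query(ITEMS, "PK", "SK", "CUSTOMER#alice", "ORDER#")
# Access pattern 2 (GSI1): look a customer up by email.
bob = query(ITEMS, "GSI1PK", "GSI1SK", "EMAIL#bob@example.com")
# Access pattern 3 (same GSI1, different entity): look an order up by ID.
order_1002 = query(ITEMS, "GSI1PK", "GSI1SK", "ORDER#1002")
```

Because the sort key embeds the order date, a single begins_with query returns a customer's orders already in date order; adding a new access pattern later means either fitting it into the existing overloaded attributes or adding another GSI, which is where the higher GSI limit helps.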
Some other posters have recommended Alex DeBrie's The DynamoDB Book, and I also think it's an excellent resource. Still, I'd caution people getting into DynamoDB not to be scared off by claims that it's inflexible to changes in data access patterns, since AWS has been adding a lot of functionality to support multiple tables, unknown access patterns, secondary indexes added after the fact, etc.
Something else worth mentioning is that DynamoDB now reconsolidates partitions.
This is a lousy explanation, but read/write quota is split evenly over all partitions. Items are assigned to partitions based on their hash key, and there's an upper limit on how much data any given partition can store. So if you end up with a hot hash key with a lot of data in it, that data gets split over more and more partitions, and the throughput you actually get goes down (because quota is split evenly over partitions).
I believe this is still a general risk, and you need to choose your hash key very carefully to avoid it, but historically partitions couldn't be reconsolidated. You'd end up with a table in a terrible state, with quota set sky-high just to get acceptable performance. The only option then was to rotate tables entirely: create a new table with a better hash key and migrate the data (or do whatever else you needed to do).
Now, at least, once the data is gone the partitions will reconsolidate, so an entire table isn't a complete loss.
This bit me badly: we had an application that autoscaled heavily and hit a peak of 30,000 read/write requests per second, but typically did more like 300.
The Amazon support engineer told us that we had over a hundred partitions (which even he admitted was high for those numbers), so our quota was effectively giving us close to zero IOPS per partition. That obviously didn't work, and their only solution was "scale it back up and copy everything to a new table". We did, but it was an engineering effort I'd rather have avoided.
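The "split but never merge" behaviour described in this thread can be illustrated with a toy model. This is a sketch under stated assumptions, not how DynamoDB is implemented: the per-partition limits below (3,000 read units, 1,000 write units, 10 GB) are figures from older DynamoDB documentation and should be treated as assumptions, and the model ignores adaptive capacity and on-demand mode entirely. The class and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Assumed per-partition limits (historical documentation figures, not guarantees).
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def partitions_needed(rcu: int, wcu: int, size_gb: float) -> int:
    """Partitions required for the given throughput and data size under the assumed limits."""
    by_throughput = math.ceil(rcu / RCU_PER_PARTITION + wcu / WCU_PER_PARTITION)
    by_size = math.ceil(size_gb / GB_PER_PARTITION)
    return max(1, by_throughput, by_size)


@dataclass
class LegacyTable:
    """Toy model of the old behaviour: partitions split when needed but never merge."""
    rcu: int
    wcu: int
    size_gb: float = 0.0
    partitions: int = 1

    def __post_init__(self) -> None:
        self.partitions = max(self.partitions,
                              partitions_needed(self.rcu, self.wcu, self.size_gb))

    def provision(self, rcu: int, wcu: int) -> None:
        self.rcu, self.wcu = rcu, wcu
        # The partition count acts as a high-water mark: scaling down never lowers it.
        self.partitions = max(self.partitions,
                              partitions_needed(rcu, wcu, self.size_gb))

    def per_partition(self) -> tuple[float, float]:
        # Provisioned capacity is divided evenly across every partition.
        return self.rcu / self.partitions, self.wcu / self.partitions


table = LegacyTable(rcu=150, wcu=150)     # quiet baseline: 1 partition
table.provision(rcu=15000, wcu=15000)     # autoscaling peak: 5 + 15 = 20 partitions
table.provision(rcu=150, wcu=150)         # back to normal, but still 20 partitions
reads_per_partition, writes_per_partition = table.per_partition()
```

In this model each partition ends up with 7.5 read and 7.5 write units after scaling back down, even though a single partition was enough before the peak. Applying the same even split to the numbers in the story above (roughly 300 requests per second over more than a hundred partitions) leaves each partition with about 3 or fewer, which is why a hot key landing on one partition gets throttled.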
People don't need to be scared; they just need to do their homework.
In my opinion, having more tables and more GSIs available won't help you much if you started with a flawed data model (unless you kept making the same design mistakes 256 times). A team that tries to claw back from a flawed table design by piling up GSIs is in for a world of pain.
So if you are planning to go with Dynamo:
- Read about the data modeling techniques
- Figure out your access patterns
- Check if your application and model can withstand the eventual consistency of GSIs
- Have a plan to rework your data model if requirements change: Are you going to incrementally rewrite your table? Are you going to export it and bulk load a fixed data model? How much is that going to cost?
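To make the eventual-consistency point concrete: GSIs only support eventually consistent reads, because the index is updated asynchronously after the base-table write. The sketch below is a toy in-memory model of that effect, not DynamoDB's actual implementation; the class, its methods, and the explicit propagate() step are hypothetical and stand in for the background replication delay.

```python
from __future__ import annotations

from collections import deque


class ToyTableWithGSI:
    """Toy model: base-table writes are visible immediately, while index
    updates are queued and only applied when propagate() runs."""

    def __init__(self, index_attr: str) -> None:
        self.index_attr = index_attr
        self.base: dict[str, dict] = {}
        self.index: dict[str, dict[str, dict]] = {}  # index value -> {pk: projected copy}
        self._pending: deque[str] = deque()

    def put_item(self, pk: str, item: dict) -> None:
        self.base[pk] = dict(item)   # visible to get_item right away
        self._pending.append(pk)     # index update is deferred

    def get_item(self, pk: str) -> dict | None:
        return self.base.get(pk)

    def query_index(self, value: str) -> list[dict]:
        # Returns whatever the index currently holds, which may be stale.
        return list(self.index.get(value, {}).values())

    def propagate(self) -> None:
        """Apply queued index updates (stands in for asynchronous replication)."""
        while self._pending:
            pk = self._pending.popleft()
            for entries in self.index.values():
                entries.pop(pk, None)  # drop the old projection, if any
            item = self.base.get(pk)
            if item is not None and self.index_attr in item:
                self.index.setdefault(item[self.index_attr], {})[pk] = dict(item)


orders = ToyTableWithGSI(index_attr="status")
orders.put_item("order#1", {"id": "order#1", "status": "PENDING"})
before = orders.query_index("PENDING")        # index not updated yet
orders.propagate()
after = orders.query_index("PENDING")         # now the order shows up
orders.put_item("order#1", {"id": "order#1", "status": "SHIPPED"})
stale = orders.query_index("PENDING")         # still reports the old status
orders.propagate()
```

The question from the checklist is whether your application tolerates the "before" and "stale" windows: for example, a dashboard listing pending orders can probably live with a short lag, while a workflow that writes an item and immediately looks it up through a GSI needs to read from the base table instead.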
- [1] https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dy...
- [2] https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-n...
- [3] https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-dy...