Wow, congrats on making it through with CometD and Bayeux. I think Salesforce has realized that a lot of their APIs aren't useful when using non-standard approaches. I think the move from SOAP to also including REST was a signal that they're trying to be more useful in this realm. I definitely agree a RabbitMQ sink would've been so ideal! I'm hoping it's somewhere on their roadmap to make our lives easier.
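For anyone who hasn't been through it: the CometD dance boils down to a couple of Bayeux protocol messages. A minimal sketch below, with field names taken from the Bayeux 1.0 spec; the client id and topic channel are hypothetical placeholders, not values from any real org.

```python
# Sketch of the two Bayeux messages a CometD client sends to subscribe
# to a Salesforce Streaming API channel. Field names follow the Bayeux
# 1.0 protocol; "abc123" and the /topic/... channel are made up.

def handshake_message():
    # Step 1: negotiate the connection on the /meta/handshake channel.
    return {
        "channel": "/meta/handshake",
        "version": "1.0",
        "supportedConnectionTypes": ["long-polling"],
    }

def subscribe_message(client_id, topic):
    # Step 2: subscribe to an event channel, using the clientId that
    # came back in the handshake response.
    return {
        "channel": "/meta/subscribe",
        "clientId": client_id,
        "subscription": topic,
    }

msg = subscribe_message("abc123", "/topic/AccountUpdates")
```

Both messages are POSTed as JSON to the server's CometD endpoint; the actual transport/retry handling is what a CometD client library takes off your hands.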
That's awesome that you set this up, thanks for paving the way! Our users who want real-time syncs and are watchful of their REST API limits typically opt for our streaming solution (I believe the Pub/Sub API was a recent addition to their change data capture APIs). Too much polling definitely runs into limits, as you mentioned, but we let our users set their own polling frequency to meet their needs.
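The user-configurable frequency idea can be sketched as a plain poll loop. This is an illustration, not our actual implementation; `fetch_changes` and `poll_interval_s` are hypothetical names, and a production version would also track calls against the org's daily REST request cap.

```python
import time

# Hedged sketch of a user-configurable polling loop. fetch_changes and
# poll_interval_s are assumptions for illustration, not a real API.

def run_poller(fetch_changes, poll_interval_s, max_polls=None):
    """Call fetch_changes() every poll_interval_s seconds.

    A limit-aware deployment would also count requests per 24 hours,
    since Salesforce enforces org-wide daily API request caps.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        fetch_changes()
        polls += 1
        if max_polls is not None and polls >= max_polls:
            break
        time.sleep(poll_interval_s)
    return polls
```

Letting the user pick `poll_interval_s` is exactly the trade-off described above: shorter intervals mean fresher data but burn through the API allowance faster.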
If I'm understanding correctly, the scheduled jobs refer to the Bulk API (I agree it executes at seemingly random speeds). We only use the Bulk API on the initial "seed", where we write a large amount of data from Salesforce to Postgres. Otherwise, when it comes to reading/writing data, we stick to the REST API, which we've found pretty performant and which Heroku Connect seems to rely on, too: https://devcenter.heroku.com/articles/mapping-configuration-...
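For context, the "seed" step maps onto creating a Bulk API 2.0 query job over plain REST. A rough sketch, assuming the endpoint/payload shape from the Bulk API 2.0 docs; the instance URL, API version, and SOQL query here are placeholders, not values from our setup.

```python
# Sketch of creating a Bulk API 2.0 query job (the one-time "seed").
# Endpoint and payload shape follow the Bulk API 2.0 docs; the instance
# URL, API version, and SOQL below are illustrative placeholders.

API_VERSION = "v58.0"  # assumption: any recent API version looks the same

def bulk_query_job_request(instance_url, soql):
    url = f"{instance_url}/services/data/{API_VERSION}/jobs/query"
    payload = {"operation": "query", "query": soql}
    # e.g. requests.post(url, json=payload, headers={"Authorization": ...})
    return url, payload

url, payload = bulk_query_job_request(
    "https://example.my.salesforce.com",
    "SELECT Id, Name FROM Account",
)
```

After the job completes you page through the result CSV, which is why it suits a one-time bulk seed better than ongoing incremental reads/writes, where the plain REST resources are simpler.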
> Are you limited to working with customers that are paying Salesforce enough…
Yeah, right now we do require that users are on Salesforce plans that include API access, which are Performance, Developer, Unlimited, and Enterprise (or Professional w/ API add-ons).
I'm guessing the GP's scheduled jobs are running within Salesforce, probably as Apex. I'd note that I've seen inconsistent async processing delays even for EE and UE customers. For one, I'm pretty sure everyone is on shared infrastructure, and for another, the delay is at least in part relative to the amount of recent processing.
Yeah, definitely agree; we had to rework our logic for these cases. We've worked with objects of ~4 million rows. In general, with our polling-strategy syncs, the problem was more the size of each record: a table with 1M rows but 400 fields was far more problematic than one with 2M rows and just 5 fields.
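The back-of-the-envelope arithmetic makes the point: what matters for a full sync is roughly rows times fields, not rows alone.

```python
# Why table width mattered more than row count in the example above:
# the total number of field values moved in a full sync is rows * fields.

def field_values(rows, fields):
    return rows * fields

wide = field_values(1_000_000, 400)   # 1M rows x 400 fields
narrow = field_values(2_000_000, 5)   # 2M rows x 5 fields

# The table with "fewer" rows carries 40x more field values per sync.
ratio = wide // narrow
```

So the 1M-row table moves 400M field values against the 2M-row table's 10M, which is why the wide table dominated sync cost.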
Yes, you're correct on that! Sorry for the lack of clarity in the docs. The snapshots would be stored in a client-side table in these cases; these snapshot tables/collections can be stored in either your own PG or MongoDB.
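To make the client-side table concrete, here's a hypothetical sketch of what a Postgres snapshot table and its write could look like: keyed by the Salesforce record Id so re-running a snapshot upserts instead of duplicating. The table and column names are illustrative, not our actual schema.

```python
# Hypothetical client-side snapshot table for Postgres. Keying on the
# Salesforce record Id lets repeated snapshots upsert idempotently.
# Table/column names are illustrative, not a real product schema.

SNAPSHOT_DDL = """
CREATE TABLE IF NOT EXISTS sf_snapshot (
    sf_id       TEXT PRIMARY KEY,   -- Salesforce record Id
    payload     JSONB NOT NULL,     -- full field map at snapshot time
    captured_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

def upsert_sql():
    # ON CONFLICT makes re-running a snapshot idempotent per record.
    return (
        "INSERT INTO sf_snapshot (sf_id, payload) VALUES (%s, %s) "
        "ON CONFLICT (sf_id) DO UPDATE SET payload = EXCLUDED.payload, "
        "captured_at = now()"
    )
```

A MongoDB variant is the same idea with an upsert on `sf_id` (e.g. `replace_one(..., upsert=True)` keyed on the record Id).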