I only know Mastodon: There's a large LGBTQ+ community who don't want to be indexed out of concern for their safety (because, for example, there are small far-right communities on other instances looking to cause mayhem). The ability to be oneself online without worrying about getting brigaded, doxxed, or whatever, is a selling point. Indexing runs counter to that.
Agreed, but is it a choice or a requirement from the software? I love choice: instances (and users!) can and should get to choose the level of indexing. If it's software-wide, I'm not sure I can get behind that. Especially since... doesn't ActivityPub sort of ruin that? I.e., any instance you federate with could expose your data, regardless of Mastodon's rules.
Federated instance data visibility and agreed visibility rules of course come into play. But if I understand this right, it would mean you can only federate with instances or software that agree to your privacy rules. Which is totally viable, but it also means Mastodon can't federate with, for example, a Reddit clone that wants to be indexed.
It's a complex web of requirements. Definitely something I'll be thinking about as I implement my own ActivityPub instance.
I don't know enough/anything about ActivityPub, but maybe different...apps? (Mastodon et al vs kbin/lemmy) could have different agreed-on rules. The forum-style ones would be appropriate in search results, so they could be indexed. People linking to Mastodon posts on lemmy might be unavoidable I guess, but there's at least a bit more separation.
One thing I do hope, though, is that if an instance or a user wants to hide content, the instance also publishes that preference during syncing. Which is to say, I want to write a custom instance, but I don't want to mistakenly leak something someone wanted to be private.
In theory it's a well-understood problem space in the Mastodon stack, so hopefully they only federate/publish what they are okay with being public. But it's concerning regardless.
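For what it's worth, here's a rough sketch of the kind of check I have in mind for a custom instance before it exposes anything to search. ActivityPub marks public posts by addressing them to the special collection `https://www.w3.org/ns/activitystreams#Public` (compacted JSON-LD may write it as `as:Public` or `Public`). Everything else here is my own assumption: the function names are made up, requiring Public in `to` (rather than `cc`) is just a conservative policy I picked, and the `indexable` actor flag is how I understand newer Mastodon versions signal opt-in, so check the docs before relying on it.

```python
# The public collection defined by ActivityPub, plus its compacted JSON-LD forms.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
PUBLIC_ALIASES = {AS_PUBLIC, "as:Public", "Public"}


def _as_list(value):
    """Addressing fields may be missing, a single string, or a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def is_publicly_addressed(obj: dict) -> bool:
    """True if the object is addressed to the public collection in `to` or `cc`."""
    recipients = _as_list(obj.get("to")) + _as_list(obj.get("cc"))
    return any(r in PUBLIC_ALIASES for r in recipients)


def may_index(obj: dict, actor: dict) -> bool:
    """Hypothetical, deliberately conservative indexing policy.

    Assumptions (mine, not from any spec):
    - only objects with Public in `to` count, so posts that merely
      cc Public (e.g. "unlisted"-style posts) are skipped;
    - the actor must carry an explicit `indexable: true` flag;
      a missing flag is treated as "no".
    """
    public_in_to = any(r in PUBLIC_ALIASES for r in _as_list(obj.get("to")))
    return public_in_to and actor.get("indexable") is True


# Example objects (URLs are placeholders).
followers = "https://example.social/users/alice/followers"
public_note = {"to": [AS_PUBLIC], "cc": [followers]}
unlisted_note = {"to": [followers], "cc": [AS_PUBLIC]}
followers_only_note = {"to": followers}

print(may_index(public_note, {"indexable": True}))    # opted in, public
print(may_index(public_note, {}))                      # no flag, so skipped
print(may_index(unlisted_note, {"indexable": True}))   # only cc'd to Public
```

The point is to default to "don't index" whenever a signal is missing. It doesn't fix the trust problem, though: a remote instance that ignores these hints can still expose whatever it receives.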