@schmittlauch @lain

𝐼𝒻 𝐼 𝓂𝒶𝓎 𝒾𝓃𝓉𝑒𝓇𝒿𝑒𝒸𝓉

I really wish DHTs would be removed from the literature because they seem so good but they have so many unsolved problems that they trick people into designing systems that can't be safely implemented.

1. DoS -> If you're hashing the /64 of IPv6 space then I need to get a /48 (trivial); getting a /32 is not so trivial, but it's not really hard. Domains are actually probably less bad than /64s.
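
To put rough numbers on the prefix argument above, here is a minimal sketch using Python's `ipaddress` module. It assumes, as in the post, that a node's DHT identity is derived by hashing its /64; the helper names and the `2001:db8::` documentation prefix are illustrative and not taken from any proposal in this thread.

```python
import hashlib
import ipaddress

def node_id_from_ip(addr: str) -> str:
    """Hypothetical node-ID scheme: hash the /64 network that contains addr."""
    net = ipaddress.ip_network(f"{addr}/64", strict=False)
    return hashlib.sha256(net.network_address.packed).hexdigest()

def distinct_ids_available(prefix_len: int) -> int:
    """Number of /64 networks (and so distinct node IDs) inside one prefix."""
    return 2 ** (64 - prefix_len)

slash48 = distinct_ids_available(48)  # 2**16 node IDs from a single /48
slash32 = distinct_ids_available(32)  # 2**32 node IDs from a single /32
```

Under this assumed scheme, every address in the same /64 maps to one ID, but whoever holds a /48 can still pick among 65,536 IDs.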

@cjd @schmittlauch do you know of anything else that could solve the problem here?

@lain @schmittlauch
My instinct is to do repeated searches with dns-like TTL caches. So forwarding a search to the entire fediverse is obviously pretty horrible, but nodes who didn't recently use a tag shouldn't be bothered with the question.

So perhaps use a pubsub which allows you to subscribe to recent tag activity, for example "I have messages from the past 1 minute which use the tags" [ .... ]
Then you can limit the number of nodes you have to search...
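
A rough sketch of the "recent tag activity" idea above, assuming nodes announce which tags they used and a searcher only queries nodes whose announcement is still within a TTL. The class name, the 60-second default, and the injectable clock are assumptions made for illustration.

```python
import time

class RecentTagIndex:
    """Remembers which nodes announced activity for a tag, with a TTL."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # tag -> {node: time of last announcement}

    def announce(self, node: str, tags: list[str]) -> None:
        """Record that `node` reported recent messages using `tags`."""
        now = self.clock()
        for tag in tags:
            self._seen.setdefault(tag, {})[node] = now

    def nodes_to_query(self, tag: str) -> list[str]:
        """Return only nodes with a fresh announcement; drop expired ones."""
        now = self.clock()
        fresh = {
            node: seen_at
            for node, seen_at in self._seen.get(tag, {}).items()
            if now - seen_at <= self.ttl
        }
        self._seen[tag] = fresh
        return sorted(fresh)
```

A search for a tag would then go to `nodes_to_query(tag)` rather than to every known node.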

@cjd @lain Searches alone don't provide any subscription functionality, as having to poll for posts will just overload the network at high interest for a tag.
Furthermore, many use cases mandate post delivery to happen at least close to real-time. This wouldn't be possible with TTL-flooding at all.

Regarding flooding, I hope you know how "well" Gnutella worked?

@schmittlauch @lain
My idea is definitely not well thought out, but the reason why I tried to make it work on top of querying/pubsub is because when someone is running a shitty server (freefedifollowers), you can just block the server and be done with it, whereas in a DHT you would have to get everybody onboard.

@cjd @lain Well, the operator of the responsible relay can still block some instances (whether it should is another question).
In my design subscribing/querying instances just receive the post IDs and can still decide not to fetch all these posts from the originating node, based on the URI.

@schmittlauch @lain
What about a fully replicated gossip net? I mean even the bittorrent DHT isn't really a DHT anymore, there are nodes which are just storing the entire dataset (and forwarding it to trackers and answering DHT requests, and acting as a sybil net)

@schmittlauch @lain
So thinking about it a bit more, I think what I'd do is the following:
1. Gossip all of the data in order to reduce the load
2. When you get an update about server X from server Y, next time you want to learn about server X, ask server Y again (unless server Y goes down, in which case you switch)
3. Publish the chain of servers through which the updates from server X reached you
This way you can do blacklisting and whitelisting which is resistant to fakery.
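
The three steps above could be sketched roughly as follows. This is only an illustration: the `Update` type and function names are made up here, and the path is not authenticated, so real resistance to fakery would additionally need signatures, which this sketch leaves out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    origin: str        # the server the update is about (server X)
    message_url: str   # what is being gossiped
    path: tuple = ()   # servers the update has passed through so far

def forward(update: Update, via: str) -> Update:
    """Step 3: a relaying server appends itself before passing the update on."""
    return Update(update.origin, update.message_url, update.path + (via,))

def accept(update: Update, blocked: set) -> bool:
    """Reject updates that originate from, or travelled through, a blocked server."""
    return update.origin not in blocked and not (set(update.path) & blocked)

def preferred_source(update: Update) -> str:
    """Step 2: next time, ask whoever handed us this update (the last hop)."""
    return update.path[-1] if update.path else update.origin
```

A whitelist would be the same check inverted: require every hop in `path` to be a trusted server.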

@cjd @lain
> 1. Gossip all of the data

What do you mean with "all of the data"? Apart from the hashtag posts assigned to an instance by the DHT as a relay or storage node, instances shall only receive the posts they subscribed to or queried. Otherwise we'll get an overload-causing "all or nothing" situation for smaller nodes like with current relays.

2. & 3.: So we're building ourselves a multicast tree by learning, right? That is a possible approach, but how does it perform better in ->

@schmittlauch
Everything -> everything you proposed to put in the dht (I like your idea of a message id only)

re performance, nodes can set a preference number to indicate how much they want you to pull from them so you tend to build a tree with hub nodes that can handle it. Ofc each update from node x should be signed and time-stamped by node x so it can't be tampered with.
@lain
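
One possible reading of the "preference number" above is a weight used when choosing whom to pull from, so that hubs advertising a high number end up with more children. The sketch below assumes exactly that; the function name and the weights are hypothetical, and signing and time-stamping of updates are not shown.

```python
import random

def pick_upstream(preferences: dict, rng: random.Random) -> str:
    """Choose a peer to pull from, weighted by its advertised preference number."""
    peers = sorted(preferences)
    weights = [preferences[peer] for peer in peers]
    return rng.choices(peers, weights=weights, k=1)[0]

# Example: a large hub advertises 90, a small instance advertises 10.
rng = random.Random(0)
picks = [pick_upstream({"hub.example": 90, "tiny.example": 10}, rng) for _ in range(1000)]
```

With weights like these, most pulls land on the hub, while a node advertising 0 is never chosen.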

@cjd
> Everything -> everything you proposed to put in the dht

Still not sure what you mean.
The DHT part assigns responsibility of handling a bunch of hashtags to an instance. All other hashtag posts don't have to reside on that instance if it doesn't deliberately fetch them/ subscribe to them at another instance out of interest.

->
@lain
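
For readers unfamiliar with how a DHT can assign responsibility for a hashtag to an instance, here is a minimal consistent-hashing sketch in the style of a Chord ring. It is not the paper's design: the SHA-256 positions, the lowercasing of tags, and the class name are all assumptions made for illustration.

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    """Map a string to a point on the ring (here: the SHA-256 digest as an integer)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")

class HashtagRing:
    def __init__(self, instances):
        # Instances sorted by their position on the ring.
        self._ring = sorted((ring_position(name), name) for name in instances)
        self._positions = [pos for pos, _ in self._ring]

    def responsible_for(self, hashtag: str) -> str:
        """Return the first instance at or after the hashtag's position, wrapping around."""
        i = bisect.bisect_left(self._positions, ring_position(hashtag.lower()))
        return self._ring[i % len(self._ring)][1]
```

Only the responsible instance has to handle the tag; removing some other instance from the ring leaves that assignment unchanged.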

@cjd

So either instances can fetch all posts of all hashtags they can get a hold on, which would benefit dissemination. Or they just take what they're interested in themselves, which'd decrease the probability of finding posts of unpopular hashtags.

@lain

@schmittlauch
I can write what I'm thinking on a pad tomorrow morning if you think it is interesting, at least then it would be properly written rather than me phone posting bits and pieces of how it could work...
@lain

@cjd
Yes, feel free to write a consistent wrap-up of your ideas. Although I'm a bit sceptical, it will widen my thoughts on other approaches and structures, I'll definitely read it.
@lain

@schmittlauch @cjd @lain Just wanted to say that I'm happy to see this discussion happening here. We effectively need to solve the same problems for Datashards, so... timely!

@cwebber @schmittlauch @lain
Ok, I think that more or less covers it, please let me know if you have any questions or if I'm doing something wrong...

@schmittlauch @cwebber @lain @cj @yaaps
Thanks for giving it such a thorough consideration !
We can continue the conversation in gist comments...

@cjd @schmittlauch @lain Looping in @emacsen. This is exactly the same stuff we need to figure out for how to distribute Datashards content across the fediverse. Can we collaborate on this vision?

@cwebber @schmittlauch @lain @emacsen
My suggestion is just that, a suggestion. I think it provides rich tools for dealing with bad actors but that's just me...

The only table I really want to bang my fist on is please don't slip on the DHT-banana-peel.

AFAICT the only protocol using DHT at scale is bittorrent (are there others?) and their usage is very unique. I would argue that in their usage it's a motte and bailey.

@cjd @schmittlauch @lain @emacsen I am getting the feeling that gossip networks are closer to what we want than xmpp, yeah. But I don't actually know how to build them.

I'm really glad we have more people taking interest; this is the part I know very little about. It would be good to have other people step up and take leadership in this area.

@kaniini @emacsen @cjd @lain @schmittlauch That's good news. Hooray! Now we need to figure out where to coordinate this work... we have on freenode, but I'm getting the sense we need to do something more long-lived.

A few options:
- The W3C Credentials CG might be interested in picking it up w3c-ccg.github.io/ and we could use their calls, mailing lists
- We could maybe coordinate it on socialhub.activitypub.rocks once we have it up. Thoughts @how ?
- Something else?

@cwebber @kaniini @emacsen @cjd @lain
@schmittlauch

I have set up integration with GitHub so we can go back and forth with the W3C workflow. We can definitely dedicate a space on SocialHub to address this

@cwebber @schmittlauch @lain @emacsen
The bailey in this case is the trackers. They work really well, they're fast, they're centrally administered so if something goes wrong, someone can deal with it.

But if the baddies threaten to take down the trackers, the bittorrent people say "ohh you are fools, you can take down the trackers all you like, we have <drumroll> The DHT", and that's true, if the trackers go down, the network will continue to function.

But then you have the DHT attacks...

@cjd @schmittlauch @lain @emacsen Do you think hosting such things over tor .onion services or I2P helps? Makes it harder to take down nodes. But OTOH, I'd also love to be able to use the fediverse servers we already have to distribute content without setting up separate daemons necessarily (I'm guessing that's where the Pleroma devs plan to take things)

@cwebber @schmittlauch @lain @emacsen

I think it would be really cool if actually everything was gossiped, so then the fediverse could cross network boundaries (some nodes in tor, some in i2p, some in Hyperboria, some in China), but that's just a dream and the bandwidth to move media around makes such a thing untenable.

@cjd @schmittlauch @lain @emacsen Also, there are two things that can be gossiped:
- who has the content
- the content

presumably we'd do the former, but occasionally as part of the system "grab" bits of interesting stuff?

@cjd @schmittlauch @lain @emacsen Another thing I'm really unsure of:

Should a node, once it has content that is "important" to it (eg, let's say my node containing this very post) continue to hold onto it and respond to queries asking for content?

On the one hand, this helps important content survive. On the other hand, it helps reveal who has the content.

I wonder if we can make progress on this without going full-freenet ;)

@cwebber @schmittlauch @lain @emacsen

Having the originating server store the content and other servers only "cache" it makes logical sense because the originating server is the one which has the direct relationship with the person who created the content (who is probably the relevant data-subject).

@cwebber @schmittlauch @lain @emacsen
Interesting question: Why not just gossip ALL public messages between nodes ?
This solves:
* Hashtags
* Groups
* Full text search

@cjd @cwebber @lain @emacsen As you're already starting the discussion before I had time to read your proposal thoroughly:
Do you intend to store-and-forward messages through all nodes on the path, or is gossiping just used for discovery and delivery is done directly routed?

@schmittlauch @cwebber @lain @emacsen
I had sketched it as a store-and-forward of the message URLs, but perhaps we find it worth store-and-forwarding the content as well (?)

@schmittlauch @cwebber @lain @emacsen
Also I don't want to come off as rushing a solution.

You have a strong interest in this, evidenced by the paper you put significant time and effort into, I'm occupied by other things and I only have a marginal interest in making the fediverse more flexible in how it deals with attacks.

At this point your proposal is more standards-ready than mine, yours has a champion (you), mine doesn't because I don't have the time.

@cjd @cwebber @lain @emacsen I'll try to read your proposal as soon as possible. I like your enthusiasm and how quickly you get onto things, but I'm also a bit appalled by how quickly you put together an alternative suggestion and people started discussing it.

I need to remember my considerations for *not* building gossip (I didn't know that term back then) trees 6 months(!) ago 😅

@schmittlauch @cjd @lain @emacsen At least conversations are starting and people are excited!

Both of you have made proposals; nobody has gotten to implementation yet, so it's ok, there's still plenty of time for us to unpack and discuss.

@cwebber @emacsen @cjd @schmittlauch i think having both dht and gossip wouldn't be too bad, just like bittorrent has dht and trackers. i think instances will be here for quite a while so we don't need to go full p2p all the time.

@lain @emacsen @cjd @schmittlauch I think one thing that happened at APConf is that a lot of us started to get excited about the viability of bringing Datashards to the fediverse. It seems to me that the Pleroma team is looking to take leadership here, and that's really great and increases my confidence.

We didn't have Mastodon devs at the table when these conversations occurred; eventually we want to start looping in @gargron and @nightpool and others about what we're thinking.

@cwebber @nightpool @Gargron @schmittlauch @cjd @emacsen i mean, that's what the discourse forum will be for :) right now it's just bouncing off some ideas.

@lain @nightpool @emacsen @cjd @gargron @schmittlauch Yes you're completely right. I hear some jerk needs to set up the DNS so everyone can start using the Discourse forum ;)

@lain @emacsen @cjd @schmittlauch @gargron @nightpool I also want to say that we want to be careful about rolling this stuff out in testing stages; Datashards is still in flux and *will be shaped by* the participation of the fediverse. We want to be careful about not rolling it out completely to the wider fediverse before we're sure about how it works.

@cwebber @lain @emacsen @cjd

Unfortunately I mustn't get too excited as I still have exams and other projects to do /0\

Remember @lain & @cwebber, not everyone works full-time on Fediverse stuff :P

@schmittlauch @lain @emacsen @cjd That's true and full ack (and empathy) there! Your opinions and review are still valuable though :)

@cwebber @schmittlauch @lain @cjd I don't remember if @cj is on this thread. There are too many threads!

I also have a call with Arne from Freenet on Monday.

Will Discourse help keep things easy to follow? I'm finding this challenging.

@cjd @schmittlauch @lain @emacsen The good news also is that we don't have to do the 100% best thing initially; your statement here is *at minimum* an extremely good starting place and is way better than how the *current* fediverse distribution works. We have the advantage that Datashards doesn't specify the routing algorithm and can compose with multiple approaches, so we can tweak that later.

@cwebber @schmittlauch @lain @emacsen
But, like a rabbit has two holes, the bittorrent people can weather an attack on the DHT because everything will simply fallback to.. the trackers.

Then there is a third actor, which is a sort of hidden tracker. Back in the day, they were running what were effectively sybil nets in the DHT which were "good sybil nets" that were answering requests just like a traditional tracker.

This system shouldn't be derided, it won a war. But opacity was a big part of it.

@cjd @emacsen @lain @schmittlauch @cwebber what sorts of attacks on bittorrent DHTs are there, and how reliable are they at effectively keeping torrents suppressed?

@cwebber @cjd @schmittlauch @lain @emacsen I also mentioned a Chord-like discovery/routing algorithm for content addressing data, nodes appearing and disappearing, and with as much or little data redundancy required, but it was during the busy rush and I don't have great internet atm

@cj @cwebber @cjd @schmittlauch @lain @emacsen
I remember that. Chord turned into a very productive starting point for me looking into the characteristics of routing topologies. Tx

I'm trying to equip myself to express these ideas clearly, because I'm convinced that community engagement is essential to uptake, and we need to democratize this vocabulary for that to happen

What are the families of routing protocols that share common characteristics - DHTs, gossip, multicast, etc.? Which ones are decentralized, rather than distributed, and why? This community has reasons for being decentralized rather than distributed. How does a given architecture address those reasons?

How do you define message scopes? Is there a way to define public scope that will lead to people with similar opinions as me discovering my post in greater numbers than those opposed to my perspective? This takes labor from the community, how is it different from what we have now and why is that important? Forget what's possible for skilled attackers for 30 seconds, how do marginalized people dealing with Basic Becky Bigot benefit from a given feature?

@cj @cwebber @cjd @schmittlauch @lain @emacsen
I should've read the paper before the thread :newlol:

In the FUD about DHTs, I missed the critical point that this paper isn't addressing general delivery or arbitrary retrieval of messages in the public scope, which many actually want to be something a little less public, but tagged posts, which is (currently) a signal that the poster is looking for broad discoverability

With these considerations in mind, the abuse profile is minimal. Disruption of the network means a fallback to the status quo. Targeted disruption of an instance means that the instance drops from the DHT network

There are 2 differences between this and my "n-dimensional hyper-torus" thread recently. Besides the fact that this paper is coherent, I would add that instances should participate in a Chord for each hashtag. This may result in multiple networks around a given tag, e.g. loli, as those given to opinions on some topics may also be given to sharp disagreements

The strength of the fediverse is social discovery along affinity groups. An intellectually rigorous proposal to create "one ring" for hash tag discovery might encounter resistance where a more accessible document describing a *slightly* more complex proposal that clearly shows the participation requirements continuing to scale with the size of the base would be better received

The only weak point I noticed in the analysis is that instance size has a long tail of small instances and the assertion "Storing 24 GiB of data for a year is manageable for a single node," is erroneous. If the storage requirement was commensurate with participation in hash tag usage, then a single user instance on lean hardware would be more consistently able to participate

@cj @cwebber @cjd @schmittlauch @lain @emacsen
If I'm (finally) reading the paper correctly, the suggested topology is that each node has a predecessor and a successor in each of two DHTs, where each DHT has a separate realm of function in a single network where the content is addressed by a hash of the tag. It's a toroid

The topology I'm suggesting is 2 or 4 relations per hash tag where posts are addressed by a content hash

@yaaps @cj @cwebber @cjd @lain @emacsen
Thx for the feedback.
I don't really see why you'd want to create a new Chord ring of multiple nodes for each hashtag: How do you use the key-value lookup capabilities of a DHT on just a single hashtag?
Regarding the resilience against deliberate blocking of a tag: Relay/ storage instances don't have to store the (questionable) content, but just the post IDs. ->

@yaaps @cj @cwebber @cjd @lain @emacsen But you're correct, they could possibly decide to drop all posts of that tag. But at least there is a basic level of redundancy per tag.

The 24 GiB calculation might be a bit misleading: This is the required number for one of the *largest* hashtags and was just used for considering hash tag splitting necessary or unnecessary.

@schmittlauch
Thanks for taking the time to make considered and well thought out responses when you've already invested considerable labor on this. It's after midnight here and I feel that I owe you the same courtesy 👍

@schmittlauch @cj @cwebber @cjd @lain @emacsen
The perception of the Fediverse as having a shape where a ring topology would be a natural choice for routing is highly local. Not only do instances grow towards others with affinities, many aggressively prune unfavorable connections. Unfortunately, this proposal isn't sufficiently agnostic to content for those instances to co-exist despite their antagonisms (cont'd)

@schmittlauch @cj @cwebber @cjd @lain @emacsen
While the network is sufficiently robust against technical attacks, it is wide open on common social vectors. When technical tags are allocated to justice-oriented instances, many will feel that it is not only acceptable, but a moral requirement, to avoid replicating messages and index entries to/from instances that don't conform to their expectations of conduct (cont'd)

@schmittlauch @cj @cwebber @cjd @lain @emacsen
The most common abuse of tags is hijacking. It's fairly common for interests operating in opposition to have turf wars over a hash tag. While this is sometimes necessary and always difficult to prevent, it would be helpful if posts representing the interests in these contests were routed to minimize interactions between the combatants (cont'd)

@schmittlauch @cj @cwebber @cjd @lain @emacsen
That's where we start looking at the requirements for a network of consent and consider what topology can accommodate those requirements

You need multiple interconnected networks with a division of work that preserves the social affinities of the networks involved and scales with instance size. Instance level blocks should not create undesired behavior and we have to anticipate that average packet sizes will grow with Datashards, Pixelfed, federated blogs, and gaming platforms entering the Fediverse

The requirement for multiple networks can be intuited from graphs of the existing relationships in the fedi or derived from the definition of consent in that meaningful consent cannot be determined in the absence of viable alternatives

@schmittlauch @cj @cwebber @cjd @lain @emacsen
Here's the link to my "n-dimensional hypertorus" post. It's part of a thread, but the only required context for understanding that post is that I'm describing idea spaces using polar (actually hyper-spherical) coordinates

banana.dog/@yaaps/102716695692

I didn't set out in this thread to promote my own idea. The paper is thorough. It takes a good idea and presents it well, but it's disjoint to the needs of the community. That can be reconciled by rotating the coverage 90° and iterating the pattern over local subsections

@yaaps @cj @cwebber @cjd @lain @emacsen
I understand your point, but I'm afraid the outcome of your ideas might be too far off from my goals for global hashtags:

> Accessing an object requires knowing it's identifier, possessing the decryption key (if encrypted), and having a relationship with an affinity group homing the object

This sounds more relevant to groups, where like-minded folks are interacting.
But just imagine something like #MeToo being limited to affinity groups: ->

@yaaps @cj @cwebber @cjd @lain @emacsen

Criticising abuse of power and sexism would've never made it out of the feminist filter bubble (affinity group)!
