"a lot of money to change the google results for Github ICE" probably sums this up.

Still, I have thoughts.


If you're doing a very long-term software archive, you might want to think about bootstrapping.

guix has. guix will not be included, because they have the good taste to not use github. (Well, I found a 4-year-old mirror of it on there.)

Debian won't be included either, same reason. Maybe there's some linux distro that will be, as a coherent whole, not a vast array of random repos and forks?

I'm still waiting, BTW, for Software Heritage to provide a way for a git repo not on github to be archived.

I've been waiting for years.

Currently the only way I have to get code into their archive is to get the code into Debian.

While the Internet Archive does have good APIs, it's still obnoxious that both these orgs are privileging the github pipeline, thus providing one more incentive to use github.

Actually I found this

But when I put in a git repo from git.joeyh.name, I get "the provided origin url is blacklisted". Wha?

hmm, maybe it says that if that's already on their list of domains that they fully archive. It also says blacklisted when I put in a salsa.debian.org git repo.

Some people working for Software Heritage are on the fediverse: cc @zacchiro @david

hi @joeyh
I see no reason why you get such a 'blacklisted URL' response... Archiving requests that do not match our list of blacklisted ones (and your domain does not seem to be on it, as far as I can see) are for now granted by hand. Meanwhile, I've added a listing task for your cgit, so all git repos there should be loaded some day (no guaranteed ETA, however).
cc @arthurlutzim @zacchiro

@david @arthurlutzim @zacchiro I mean, it's nice I can complain to people I know and maybe get my stuff archived eventually

while meanwhile anyone who drinks the github koolaid can get anything archived instantly

@david @arthurlutzim @zacchiro looks like you're both dependant on and facilitating proprietary software to me

@joeyh @david @arthurlutzim
hi Joey, thanks for your feedback here !
(brief thread follows addressing the various points you've raised)

@joeyh @david @arthurlutzim
I remember well your "you're facilitating proprietary services" argument, which I think was at LibrePlanet a few years back.
You were absolutely right, and your argument struck me. It had a big influence on our decision to deploy the "save code now" service at the beginning of this year: softwareheritage.org/2019/01/1

@joeyh @david @arthurlutzim
that service is meant to enable everyone to save *any* VCS (among those whose protocol we currently support). There is a moderation queue (mod-q) in front of it, but the way it works is that the first time we see a domain, we review it, and can then whitelist *all* repos hosted there.

@joeyh @david @arthurlutzim
If you get the message of being blacklisted it can be a bug in either the mod-q management, or possibly a simple "bad message" saying you're blacklisted while in fact we simply haven't processed your request yet (to whitelist it).
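The domain-level moderation described in the two posts above can be modelled roughly as follows. This is a hypothetical sketch, not Software Heritage's code: the class, status names, and messages are invented for illustration. It shows the behaviour the posts describe, where one review settles a whole host, and where an unreviewed domain is reported as pending rather than blacklisted.

```python
from enum import Enum
from urllib.parse import urlsplit


class DomainStatus(Enum):
    WHITELISTED = "whitelisted"  # reviewed: every repo on this host is accepted
    BLACKLISTED = "blacklisted"  # reviewed and refused
    PENDING = "pending"          # seen, but not reviewed yet


class SaveModerationQueue:
    """Hypothetical domain-level moderation queue for save requests."""

    def __init__(self):
        self.domains = {}       # host -> DomainStatus
        self.review_queue = []  # origin URLs waiting for a human reviewer

    def submit(self, origin_url):
        host = urlsplit(origin_url).hostname or ""
        status = self.domains.get(host)
        if status is DomainStatus.WHITELISTED:
            return "accepted"
        if status is DomainStatus.BLACKLISTED:
            return "rejected: domain is blacklisted"
        # Unknown or still-unreviewed domain: queue it for review and say so,
        # rather than reporting it as blacklisted.
        self.domains[host] = DomainStatus.PENDING
        self.review_queue.append(origin_url)
        return "pending: domain awaiting review"

    def review(self, host, approve):
        """Record a decision for a whole domain; return the queued URLs it settles."""
        self.domains[host] = DomainStatus.WHITELISTED if approve else DomainStatus.BLACKLISTED
        settled = [u for u in self.review_queue if urlsplit(u).hostname == host]
        self.review_queue = [u for u in self.review_queue if urlsplit(u).hostname != host]
        return settled


if __name__ == "__main__":
    q = SaveModerationQueue()
    print(q.submit("https://git.example.org/a.git"))
    q.review("git.example.org", approve=True)
    print(q.submit("https://git.example.org/b.git"))
```

Keying the decision on the host rather than on each URL is what lets a single human review cover every repo on a self-hosted forge.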

@joeyh @arthurlutzim
if you can give me—or, even better, @david :-) — more details, we'll be happy to investigate, because it really sounds just like a bug.

And, to be clear, I'm not giving special attention to this request of yours because we know each other, it's just the first time we receive a bug report about the issue. (It might be a new issue, or just that we have few users requesting saving of repos not on the forges we crawl.)

@joeyh @arthurlutzim @david
all these things standing, I'd love to hear if (and if so, why) you still think we're encouraging using proprietary services, because we definitely do *not* want to do that

for example, curl archive.softwareheritage.org/a shows there are no save requests yet for that repo. When I paste the repo url into archive.softwareheritage.org/s it says it's blacklisted.

Posting to the API successfully queues it, so I think the form is broken.
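Posting to the API directly, as above, can be sketched with only the Python standard library. This assumes the endpoint layout `/api/1/origin/save/<visit_type>/url/<origin_url>/` from the Software Heritage Web API documentation, where a POST queues a save request and a GET on the same endpoint lists existing ones; authentication or rate limits may apply, and the repository URL is a hypothetical placeholder. Building the request is kept separate from sending it so it can be inspected offline.

```python
import json
import urllib.request

# Assumed base URL for the Software Heritage Web API; verify against current docs.
API_BASE = "https://archive.softwareheritage.org/api/1"


def save_request_url(origin_url, visit_type="git"):
    """Endpoint for one origin; the origin URL is appended as-is, per the assumed layout."""
    return f"{API_BASE}/origin/save/{visit_type}/url/{origin_url}/"


def build_save_request(origin_url, visit_type="git"):
    """A POST on this endpoint is assumed to queue a new save request."""
    return urllib.request.Request(save_request_url(origin_url, visit_type), method="POST")


def submit(origin_url):
    """Send the request and decode the JSON status returned (needs network access)."""
    with urllib.request.urlopen(build_save_request(origin_url)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical repository URL, for illustration only.
    req = build_save_request("https://git.example.org/myrepo.git")
    print(req.get_method(), req.full_url)
```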

@zacchiro problem seems particular to firefox, it works in chromium.

Actual error from the API endpoint when firefox hits it is:
{"detail":"CSRF Failed: Referer checking failed - no Referer."}

@zacchiro my firefox has network.http.sendRefererHeader set to 1 (unsure why; maybe a privacy setting sets that), which prevents it from sending a Referer header except when clicking on links.
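The error text above looks like Django's CSRF check as reported by Django REST Framework, which swh-web appears to be built on. For unsafe methods over HTTPS, older Django versions reject a request that carries no Referer at all; Django 4.0 and later also accept a valid Origin header instead. The sketch below is a hypothetical, heavily simplified model of that rule combined with the Firefox pref, not Django's or Firefox's code. It assumes the save form submits via script (so it is not a link click), and the host name is a placeholder.

```python
from urllib.parse import urlsplit

# Hypothetical, simplified model of the HTTPS Referer rule in older Django CSRF
# middleware. Real Django also checks the CSRF token, trusted origins, ports,
# and (4.0+) the Origin header.
REASON_NO_REFERER = "Referer checking failed - no Referer."
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def referer_check(method, is_https, headers, site_host):
    """Return None if the request passes, else a rejection reason string."""
    if method in SAFE_METHODS or not is_https:
        return None  # only unsafe methods over HTTPS get the strict check
    referer = headers.get("Referer")
    if referer is None:
        return REASON_NO_REFERER
    parts = urlsplit(referer)
    if parts.scheme != "https" or parts.hostname != site_host:
        return f"Referer checking failed - {referer} does not match the site."
    return None


def firefox_sends_referer(pref_value, link_click):
    """Model of network.http.sendRefererHeader:
    0 = never, 1 = only when a link is clicked, 2 = normally (the default)."""
    if pref_value == 0:
        return False
    if pref_value == 1:
        return link_click
    return True


if __name__ == "__main__":
    host = "archive.example.org"  # hypothetical host for the demo
    # A script-driven form POST is not a link click, so with the pref at 1
    # no Referer header is attached and the check fails.
    headers = {}
    if firefox_sends_referer(1, link_click=False):
        headers["Referer"] = f"https://{host}/"
    print(referer_check("POST", True, headers, host))
```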

@joeyh thanks for the additional details, we'll forward to our web dev and make sure it also works with Firefox out-of-the-box

@joeyh @david @arthurlutzim

I'm also curious why you say "dependant on" proprietary software here. If you mean because we have a copy on Azure, yes, we do (because each additional copy is good!), but we also have an independent copy on premises at Inria, deployed using only Free Software.

If you mean something else, I'd like to hear about it and, if it's actually a real issue, fix it.

@zacchiro you're largely relying on whatever github does to moderate content, to allow you to ingest all the content from them, without further moderation.

@joeyh it is true the moderation standards are different. But FWIW it's not only GitHub; there are a bunch of other source code providers we take content from without moderation, and many of them are entirely built/deployed with free software only (Debian, PyPI, GitLab [for CE instances], etc.) →

@joeyh → the reason we don't moderate those by default is that software there either already undergoes some curation (for distros) or has a chance of being hit by takedown requests (or similar) *before* we archive it. →

@joeyh → The day we have a much larger legal team, we'll be happy to remove moderation from user-submitted "save code now" requests too.

It's essentially a capacity issue and where it is most important to invest human resources. Right now we're prioritizing expanding archive coverage and archive features. YMMV (and that's fair).

@joeyh I gave a eulogy a few years ago where I explained to the family that the software of the deceased had landed in Debian, and what that meant.