Conversation

Eniko | Kitsune Tails out now!

Do we need a project for archiving the internet archive?

@eniko yes! was just last week daydreaming about a bittorrent tracker that works in reverse to back up the whole thing by sending all the peers a little piece of it.
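
A rough sketch of what that "reverse tracker" might do when handing out pieces. This is not part of the BitTorrent protocol; `assign_pieces`, the peer names, and the `copies` parameter are hypothetical, made up only to illustrate spreading a big archive across volunteers with some redundancy.

```python
def assign_pieces(num_pieces, peers, copies=3):
    """Give every piece to `copies` distinct peers, spreading load round-robin.

    Hypothetical helper: a real system would also need piece hashes,
    peer churn handling, and re-replication when peers disappear.
    """
    if copies > len(peers):
        raise ValueError("need at least as many peers as copies per piece")
    assignment = {peer: [] for peer in peers}
    for piece in range(num_pieces):
        # Consecutive slots modulo len(peers) always land on distinct peers
        # as long as copies <= len(peers).
        for k in range(copies):
            peer = peers[(piece * copies + k) % len(peers)]
            assignment[peer].append(piece)
    return assignment


if __name__ == "__main__":
    plan = assign_pieces(10, ["peer-a", "peer-b", "peer-c", "peer-d"], copies=2)
    for peer, pieces in plan.items():
        print(peer, pieces)
```

Round-robin keeps every peer's share within one piece of the others, and losing any single peer still leaves at least `copies - 1` holders of each piece.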

@eniko That would require like, almost as much money/resources as the Archive itself. But from what I've seen/heard (I was there today in fact lmao), they got a few parts of the sites online and are working on the rest.

People should absolutely donate though https://archive.org/donate

@eniko nearly everything on the archive has a .torrent file associated with it. if people took distributed seeding seriously, all you'd need is a copy of the index/website to browse for the torrent files, which is a somewhat smaller task than trying to back up the whole archive.

it doesn’t come without risks or problems of its own though

@eramdam yeah. Just, it feels like such a huge centralized point of failure. A bit scary given how much it's getting attacked (and I don't mean just the cyber vandalism)

@eniko yeah. Maybe there are already plans to make it less so. Also, from what I've heard from someone there, the data is stored in multiple locations and such, so I wouldn't expect it to go "poof" in a single swing, but don't quote me on that lmao

@eniko

Better redundancy and internet-facing security would be a better idea, I suspect.

This comes down to resources and funding. The larger players on the Internet should step up and offer more comprehensive support.

@eniko Yes! Getting sites *out* of the Wayback Machine is surprisingly challenging!

@eniko I'm working on it but I need more floppies

@foone if you stacked all the floppies you'd need to store the internet archive's data would it reach to the moon?

@feedmd @eniko

In the ccmp space, we need two things: a curated and constantly updated directory of where to find things, and people willing to keep redundant chunks of IA and other sites where only a single copy of things exists. I'm thinking a federation of topic specialists working on the catalog. This effort is way too big and the content too diverse to curate without federation.

The file distribution has worked with bitsavers for decades; the one thing I never attempted was the catalog. I had relied on search engines to ferret into the content, but over time, especially with LLMs, they are now failing miserably.

I don't think anyone is going to pick up the ball archiving what is on IA outside of their knowledge domain.

Nope. Only about 1/3 of the way there if you've got the double-sided extended density (ED) 2.88MB floppy.

> The Internet Archive, as of January 2024, attests to have stored well over 99 petabytes of data so far.

https://en.m.wikipedia.org/wiki/Wayback_Machine

A 3.5" floppy is 3.3mm thick and at most could hold 2.88 MB

https://retrocomputing.stackexchange.com/questions/25911/dimension-of-a-3-5-inch-floppy-disk

https://www.wolframalpha.com/input?i=%2899+petabytes+%2F+2.88+megabyte+%29+*+3.3+millimeter+

https://www.wolframalpha.com/input?i=distance+to+the+moon

@eniko
@foone

@JessTheUnstill @eniko @foone i was doing roughly the same math before i saw your reply - qalculate is great for this kinda stuff!

if you take a standard 1440kB HD 3.5" floppy, you get a storage density of 436.3 GB/km, so 99PB / ( 436.3 GB/km ) = 226 875 km, about 3/5ths of the way to the moon, and you'd need 68.75 billion of them.

of course, if you can only get your hands on billions of 720kB floppies, you blow right past the moon, but even stacking them on the long end gets you nowhere near mars
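
For anyone who wants to redo the stack math without qalculate or Wolfram Alpha, here's a minimal sketch. The 99 PB and 3.3 mm figures are the ones quoted in this thread; the 384,400 km mean Earth-Moon distance and the decimal capacities (1440 kB = 1.44e6 bytes) are assumptions, and `stack` is just a made-up helper name.

```python
# Figures from the thread
ARCHIVE_BYTES = 99e15      # 99 PB, decimal petabytes
THICKNESS_M = 3.3e-3       # 3.3 mm per 3.5" floppy

# Assumption, not from the thread: mean Earth-Moon distance
MOON_KM = 384_400

# Nominal capacities in decimal bytes, as used in the posts above
CAPACITIES = {
    "720 kB (DD)": 720e3,
    "1.44 MB (HD)": 1.44e6,
    "2.88 MB (ED)": 2.88e6,
}


def stack(capacity_bytes):
    """Return (number of disks, stack height in km) for the whole archive."""
    disks = ARCHIVE_BYTES / capacity_bytes
    height_km = disks * THICKNESS_M / 1000
    return disks, height_km


if __name__ == "__main__":
    for label, capacity in CAPACITIES.items():
        disks, km = stack(capacity)
        print(f"{label}: {disks / 1e9:.2f} billion disks, "
              f"{km:,.0f} km, {km / MOON_KM:.2f} of the way to the moon")
```

Under those assumptions the 1.44 MB case comes out to the 226,875 km quoted above, the 2.88 MB case to half that, and the 720 kB case to double, which is past the moon.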

@andor @irina @JessTheUnstill @eniko it just so happens I bought an LS-120 drive on Sunday! Perfect timing

@JessTheUnstill @eniko @foone And that's not even accounting for compression… of the floppies in the lower part of the stack into much thinner floppies

(decompression is left as an exercise to the reader)

@JessTheUnstill @eniko @foone If they can manage to get 1.68MB on a 1.44MB floppy, maybe we can get 3.36MB on a 2.88MB, further decreasing the stack size!

The radiation will probably do a number on the data integrity though, so we might need some redundancy.
