The r/datahoarder subreddit started archiving and working together to get the newest (Jan 30th, 2026) DOJ drop of the Epstein files out there. However, sometime the next day, users and posts started getting deleted - first by the mods, and then by Reddit themselves.

Here’s what they started: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Short version: most of the datasets are now archived and available with magnet links / as torrents. See https://lemmy.world/post/42440468 for a list of that info.

Dataset 9 remains only partially archived, with two larger partial versions floating around: one covering 49MB of the 180MB, and one covering 101MB of the 180MB.

Neither is verified, and neither has been comprehensively reviewed to see what’s missing. Yet.

The purpose of this community is to figure out a system to ensure that:

  • we have all the files, and
  • the files are made available in perpetuity in case the DOJ decides to remove some or all of them.

I suspect the answer will be that we need to manually go through each page to ensure everything’s been archived. For instance, some pages have PDFs that are merely blank, or state they’re a placeholder. But users have discovered that if you change the file extension in the URL, you may find the file is actually an mp4, an m4a, or all manner of other audio, video and image files.
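That extension check can be automated. Here’s a rough sketch (Python, stdlib only) that builds candidate URLs for a placeholder PDF and probes them with HEAD requests; the example URL and the extension list are assumptions, not the DOJ’s actual layout:

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Assumed list of extensions worth probing; extend as people report more.
EXTS = [".mp4", ".m4a", ".mp3", ".wav", ".jpg", ".png"]

def alternate_urls(pdf_url, exts=EXTS):
    """For a placeholder .pdf URL, build candidate URLs with other extensions."""
    scheme, netloc, path, query, frag = urlsplit(pdf_url)
    if not path.lower().endswith(".pdf"):
        return []
    stem = path[: -len(".pdf")]
    return [urlunsplit((scheme, netloc, stem + ext, query, frag)) for ext in exts]

def url_exists(url):
    """HEAD the URL and report whether the server says it's there."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        return False
```

Usage would be `[u for u in alternate_urls(placeholder) if url_exists(u)]`, ideally with rate limiting so the DOJ site isn’t hammered.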

Others have discovered that if you go through the pages manually on the DOJ site, you’ll often see a PDF with the same filename as a jpeg or video file. This most likely means the PDF is a placeholder page and the matching file is the real content. Great if you’re trying to download a few files, not so great when there could be over a million files and you want to automate the process.
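One way to spot those pairs without paging through the site: group the URL list by filename stem and flag any stem that appears with both a .pdf and some other extension. A sketch (the sample URLs below are made up):

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit

def placeholder_pairs(urls):
    """Group URLs by filename stem; a stem carrying a .pdf plus another
    extension is likely a placeholder page + real-file pair."""
    by_stem = defaultdict(set)
    for url in urls:
        name = posixpath.basename(urlsplit(url).path)
        stem, dot, ext = name.rpartition(".")
        if dot:
            by_stem[stem].add("." + ext.lower())
    return {stem: exts for stem, exts in by_stem.items()
            if ".pdf" in exts and len(exts) > 1}
```

Run against the full dataset 9 URL list, this would give a worklist of suspect PDFs to re-download under their other extension.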

So here’s my suggestion: we come at this from two angles.

One, we use the dataset 9 file list (compiled by an r/datahoarder user and archived in the Internet Archive - note, this is a very large txt file https://dn721809.ca.archive.org/0/items/dataset9_url_list/dataset9_url_list.txt), and check off each file as it’s archived. Any suggestions on how to do this are welcome.

Two, we manually go through each page of dataset 9 on the DOJ website, and download each file individually to ensure nothing’s missed. We create threads for every 100 files, so that people can jump in and help, as well as share where they’ve archived each batch of files.
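Carving the master list into those 100-file claim threads can be scripted too; a sketch (the label format is just a suggestion):

```python
def thread_batches(urls, size=100):
    """Split a URL list into fixed-size batches, one per claim thread.
    Returns (label, batch) pairs like ('files 00001-00100', [...])."""
    batches = []
    for start in range(0, len(urls), size):
        batch = urls[start:start + size]
        label = f"files {start + 1:05d}-{start + len(batch):05d}"
        batches.append((label, batch))
    return batches
```

Each thread title gets a label, and the thread body gets that batch’s URLs for people to claim and report back on.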

Feel free to use this thread to discuss ways to go about this, or share what others are already doing.

  • blepblapbeep (OP, mod) · 23 days ago

    Having now manually worked through the list of URLs, the pagination starts at page 0 and goes to page 3319 (the 3320th page). As there are 50 links per page, this means there are 3320 pages, and 166,000 files released in total.
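The arithmetic in that comment checks out:

```python
first_page, last_page = 0, 3319       # pagination runs 0..3319 inclusive
links_per_page = 50
pages = last_page - first_page + 1    # zero-indexed, so add one
total_files = pages * links_per_page
print(pages, total_files)             # prints: 3320 166000
```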