The r/datahoarder subreddit started archiving and working together to get the newest (Jan 30th, 2026) DOJ drop of the Epstein files out there. However, sometime the next day, users and posts started getting deleted - first by the mods, and then by Reddit themselves.

Here’s what they started: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Short version: most of the datasets are now archived and available with magnet links / as torrents. See https://lemmy.world/post/42440468 for a list of that info.

Dataset 9 remains only partially archived, with two larger partial versions floating around: one covering 49MB of the 180MB, and one covering 101MB of the 180MB.

Neither is verified, and neither has been comprehensively reviewed to see what’s missing. Yet.

The purpose of this community is to figure out a system to ensure that:

  • we have all the files, and
  • the files are made available in perpetuity in case the DOJ decides to remove some or all of them.

I suspect the answer will be that we need to manually go through each page to ensure everything’s been archived. For instance, some pages have PDFs that are merely blank, or state they’re a placeholder. But users have discovered that if you change the file extension in the URL, you may find the file is actually an mp4, an m4a, or all manner of other audio, video and image files.
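That extension check can be automated. Here’s a rough sketch (Python, stdlib only) that builds candidate URLs for a placeholder PDF and probes them with HEAD requests; the example URL and the extension list are assumptions, not the DOJ’s actual layout:

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Assumed list of extensions worth probing; extend as people report more.
EXTS = [".mp4", ".m4a", ".mp3", ".wav", ".jpg", ".png"]

def alternate_urls(pdf_url, exts=EXTS):
    """For a placeholder .pdf URL, build candidate URLs with other extensions."""
    scheme, netloc, path, query, frag = urlsplit(pdf_url)
    if not path.lower().endswith(".pdf"):
        return []
    stem = path[: -len(".pdf")]
    return [urlunsplit((scheme, netloc, stem + ext, query, frag)) for ext in exts]

def url_exists(url):
    """HEAD the URL and report whether the server says it's there."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        return False
```

Usage would be `[u for u in alternate_urls(placeholder) if url_exists(u)]`, ideally with rate limiting so the DOJ site isn’t hammered.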

Others have discovered that if you go through the pages manually on the DOJ site, you’ll often see a PDF with the same filename as a jpeg or video file. This most likely means the PDF is a placeholder page and the matching file is the real content. Great if you’re trying to download a few files, not so great when there could be over a million files and you want to automate the process.
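One way to spot those pairs without paging through the site: group the URL list by filename stem and flag any stem that appears with both a .pdf and some other extension. A sketch (the sample URLs below are made up):

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit

def placeholder_pairs(urls):
    """Group URLs by filename stem; a stem carrying a .pdf plus another
    extension is likely a placeholder page + real-file pair."""
    by_stem = defaultdict(set)
    for url in urls:
        name = posixpath.basename(urlsplit(url).path)
        stem, dot, ext = name.rpartition(".")
        if dot:
            by_stem[stem].add("." + ext.lower())
    return {stem: exts for stem, exts in by_stem.items()
            if ".pdf" in exts and len(exts) > 1}
```

Run against the full dataset 9 URL list, this would give a worklist of suspect PDFs to re-download under their other extension.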

So here’s my suggestion: we come at this from two angles.

One, we use the dataset 9 file list (compiled by an r/datahoarder user and archived in the Internet Archive - note, this is a very large txt file https://dn721809.ca.archive.org/0/items/dataset9_url_list/dataset9_url_list.txt), and check off each file as it’s archived. Any suggestions on how to do this are welcome.

Two, we manually go through each page of dataset 9 on the DOJ website, and download each file individually to ensure nothing’s missed. We create threads for every 100 files, so that people can jump in and help, as well as share where they’ve archived each batch of files.
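Carving the master list into those 100-file claim threads can be scripted too; a sketch (the label format is just a suggestion):

```python
def thread_batches(urls, size=100):
    """Split a URL list into fixed-size batches, one per claim thread.
    Returns (label, batch) pairs like ('files 00001-00100', [...])."""
    batches = []
    for start in range(0, len(urls), size):
        batch = urls[start:start + size]
        label = f"files {start + 1:05d}-{start + len(batch):05d}"
        batches.append((label, batch))
    return batches
```

Each thread title gets a label, and the thread body gets that batch’s URLs for people to claim and report back on.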

Feel free to use this thread to discuss ways to go about this, or share what others are already doing.

  • blepblapbeep (OP, mod) · 23 days ago

    Having now manually worked through the list of URLs, the pagination starts at page 0 and goes to page 3319 (the 3320th page). As there are 50 links per page, this means there are 3320 pages, and 166,000 files released in total.
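The arithmetic in that comment checks out:

```python
first_page, last_page = 0, 3319       # pagination runs 0..3319 inclusive
links_per_page = 50
pages = last_page - first_page + 1    # zero-indexed, so add one
total_files = pages * links_per_page
print(pages, total_files)             # prints: 3320 166000
```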