I think this is a pretty niche need, and probably more of a topic for r/DataHoarder, but anyway, here I am.

I created this application to have a way to store my WhatsApp messages away from the Google/Meta servers, or at least not depend so much on Google backup.

WhatsApp has very limited export functionality, which any user can access through the app’s own interface. Once messages and media have been exported, you can place them in a folder monitored by ChatVault, send them to an email address monitored by ChatVault, or upload them via the interface. Once ingested, ChatVault writes the chat media to disk and saves the messages to a database in a structured way. The messages can then be browsed in a front end similar to a chat application.
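To give an idea of what ingestion deals with, here is a rough parsing sketch, not ChatVault’s actual code: it assumes the Android-style export layout "dd/MM/yyyy HH:mm - Sender: message" and a Kotlin/JVM backend, and the ChatMessage holder is only illustrative.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustrative holder only; the real entities may look different.
data class ChatMessage(val sentAt: LocalDateTime, val author: String, val body: String)

// Assumes an Android-style export line such as "12/06/2023 21:15 - Alice: hello".
private val LINE = Regex("""(\d{2}/\d{2}/\d{4} \d{2}:\d{2}) - ([^:]+): (.*)""")
private val FORMATTER = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm")

fun parseLine(line: String): ChatMessage? {
    // System notices and continuation lines of multi-line messages don't match and return null.
    val match = LINE.matchEntire(line) ?: return null
    val (timestamp, author, body) = match.destructured
    return ChatMessage(LocalDateTime.parse(timestamp, FORMATTER), author, body)
}

fun main() {
    println(parseLine("12/06/2023 21:15 - Alice: exported from WhatsApp"))
}
```

Multi-line messages, system notices, and attachment markers all need extra handling on top of this, which is part of why the export format is awkward to work with.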

It’s still under development and some things need to be improved (mainly the UI). It’s still far from ideal, true: the way WhatsApp lets us export messages is quite bad, which makes the whole export-and-ingest process tightly coupled to that format. Still, it can be useful for anyone who wants to store their messages independently, just like I wanted.

https://github.com/vitormarcal/chatvault

Edit: added an application interface image.

The UI still needs some work, but it serves the purpose.

  • teslawhytho@alien.topB · 10 months ago

    Please don’t take this as criticism, this is a great idea and I fully plan on contributing to the codebase.

    With that said, I spent a few hours trying to get it to work. No luck. Docker, no. Docker Compose, no.

    I took the code and built/ran it manually. That worked, but then I couldn’t import a chat. I tested with a single line with no attachments. From just that one line, here are the problems so far:

    • it doesn’t seem like WhatsApp has a standard way of exporting the text file. Your text file and my text file are different. In the US the format is [datetime] name msg. Yours is different, so it breaks the moment it hits the [.
    • unfortunately it doesn’t account for locale. The US stupidly uses mm/dd/yy; you have hardcoded the formatter for dd/mm/yy. Maybe you need a locale selection in the UI before import. Without that, no US messages are coming in.
    • it doesn’t account for 6/6/23; it expects 06/06/2023. Again, a formatter and padding can fix that (see the sketch after this list).
    • the UI creates an entry in the chats table for every attempt, regardless of whether a message was imported.
    • exceptions in other languages.
    • missing tests for the stuff above
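
    For the locale and padding issues, something along these lines could work (assuming the backend is Kotlin/JVM; this is only a sketch, and the monthFirst flag just stands in for whatever locale selection ends up in the UI):

    ```kotlin
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter
    import java.time.format.DateTimeFormatterBuilder
    import java.time.format.DateTimeParseException
    import java.time.temporal.ChronoField

    // Hypothetical helper, not ChatVault's actual parser: builds a formatter for the chosen
    // day/month order and tolerates unpadded values ("6/6/23") as well as 2- or 4-digit years.
    fun dateFormatter(monthFirst: Boolean): DateTimeFormatter {
        val dayAndMonth = if (monthFirst) "M/d/" else "d/M/"
        return DateTimeFormatterBuilder()
            .appendPattern(dayAndMonth)                        // "d" and "M" accept one or two digits
            .appendValueReduced(ChronoField.YEAR, 2, 4, 2000)  // "23" -> 2023, "2023" stays 2023
            .toFormatter()
    }

    fun parseDate(raw: String, monthFirst: Boolean): LocalDate? =
        try {
            LocalDate.parse(raw, dateFormatter(monthFirst))
        } catch (e: DateTimeParseException) {
            null // let the caller decide how to report an unparseable line
        }

    fun main() {
        println(parseDate("6/6/23", monthFirst = false))    // 2023-06-06
        println(parseDate("12/31/2023", monthFirst = true)) // 2023-12-31 (US order)
    }
    ```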

    Again none of that is supposed to be criticism. This is a great idea and I fully intend to help out with it.

    Good job!

    • NeoJackOfBlades@alien.topOPB · 10 months ago

      Hello, criticism is certainly welcome!
      If you can open an issue on GitHub it will be easier for me to follow, as I may not see these comments.
      About the message date, you are right! Until I shared the project I was its only user, so I didn’t know there could be multiple date formats, and I developed for one specific format (which isn’t even uncommon).
      Someone had already warned me about this, so I started working on something that can parse the message date correctly. It’s almost ready, but unfortunately some dates end up being ambiguous, like 01/01/2023, and it’s not possible to infer the correct format, so I’ll probably have to add an environment variable for that, which I really didn’t want.
      If you look on GitHub, I opened a bug issue for this (although maybe it’s not really a bug but rather an improvement, because it works, just only with a specific format), and on the GitHub Projects board it’s already under development.
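      Just to illustrate the direction, the environment variable could drive the formatter roughly like the sketch below; the variable name CHATVAULT_DATE_FORMAT and the dd/MM/yyyy default are placeholders, not the final implementation.

      ```kotlin
      import java.time.LocalDate
      import java.time.format.DateTimeFormatter

      // Placeholder variable name and default; an env override is the only reliable way to
      // disambiguate dates like 01/01/2023, where both dd/MM and MM/dd parse successfully.
      private val EXPORT_DATE_PATTERN: String =
          System.getenv("CHATVAULT_DATE_FORMAT") ?: "dd/MM/yyyy"

      private val EXPORT_DATE_FORMATTER: DateTimeFormatter =
          DateTimeFormatter.ofPattern(EXPORT_DATE_PATTERN)

      fun parseExportDate(raw: String): LocalDate = LocalDate.parse(raw, EXPORT_DATE_FORMATTER)

      fun main() {
          // Without an override this resolves 01/01/2023 with day first, i.e. 1 January 2023.
          println(parseExportDate("01/01/2023"))
      }
      ```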
      Regarding the duplication of data, I mentioned it in a comment here on this post, but perhaps I should have made it clearer. As I said in the post, the project is still far from ideal, even though it works very well for my use case (every import I do only contains new messages).
      Anyway, I know this is an important point, so I added a deduplication step that uses the last message in the database as the cutoff. This is already in the latest version of the Docker image.
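      Roughly, the cutoff idea looks like this (a simplified sketch, not the actual code; the field names are made up):

      ```kotlin
      import java.time.LocalDateTime

      // Field names are illustrative, not ChatVault's actual schema.
      data class ImportedMessage(val sentAt: LocalDateTime, val author: String, val body: String)

      // Cutoff-based dedup: anything at or before the newest message already stored for the
      // chat is treated as a re-import and dropped; a brand-new chat keeps everything.
      fun dedupByCutoff(incoming: List<ImportedMessage>, lastStoredAt: LocalDateTime?): List<ImportedMessage> =
          if (lastStoredAt == null) incoming
          else incoming.filter { it.sentAt.isAfter(lastStoredAt) }
      ```
      Messages that share the exact cutoff timestamp are the gray area; a stricter check would also compare author and body.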
      Regarding Docker, more information about the error is welcome. You weren’t the first person to mention it, but I couldn’t replicate the problem: I tested on Fedora and on an Ubuntu server, built the image locally, pulled it from the registry, ran docker system prune --volumes -a, and it still worked as expected.