Dump WeChat Messages from Android

Export WeChat chat data from Android

WeChat (微信), the most popular mobile IM app in China, gives users no way to export their message history in a well-formatted form. This tool parses and exports WeChat messages from a rooted Android phone.

Right now it can dump messages in text-only mode, or generate a single-file HTML page containing voice messages, images, emoji, etc.

NEWS: WeChat 6.0+ uses silk to encode audio. The code is updated.

NEWS: WeChat 6.3 uses a new avatar storage. The code is updated.

If this tool works for you, please take a moment to add your phone/OS to the wiki. If it doesn't work, please open an issue and include your phone/OS/WeChat version.

How to use:

Dependencies:

  • python-PIL
  • PyQuery
  • pysox
  • pysqlcipher
  • numpy
  • csscompressor (suggested, optional)
  • adb and a rooted Android phone connected to a computer running Linux or Mac OS.
  • Silk audio decoder (included; just run ./third-party/compile_silk.sh)
  • gnu-sed
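
Before going further it can help to check which of the dependencies above are actually present. The following is a minimal sketch, not part of this repository: `find_missing` is a hypothetical helper, and the import names in `MODULES` (for example `PIL` for python-PIL) and the executable names in `TOOLS` are assumptions that may differ on your system (gnu-sed is often installed as `gsed` on Mac OS).

```python
import importlib.util
import shutil

# ASSUMPTION: top-level import names for the packages listed above.
MODULES = ["PIL", "pyquery", "pysox", "pysqlcipher", "numpy", "csscompressor"]
# ASSUMPTION: executable names as they usually appear on PATH.
TOOLS = ["adb", "sed"]


def find_missing(modules, executables):
    """Return (missing_modules, missing_executables) without importing anything."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    mods, tools = find_missing(MODULES, TOOLS)
    print("missing Python modules:", ", ".join(mods) or "none")
    print("missing executables:", ", ".join(tools) or "none")
```

The check only looks for top-level modules and executables; it does not verify versions or that the Silk decoder has been compiled.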

Get Necessary Data:

Note that commands involving ./android-interact.sh are meant to be run on the computer.

  • (Requires Linux or Mac) Get the decrypted WeChat database and the avatar index:

    • Automatic: ./android-interact.sh db-decrypt
    • Manual:

      • Figure out your ${userid} by inspecting the contents of /data/data/com.tencent.mm/MicroMsg on the root filesystem of the device. It should be a 32-character-long name consisting of hexadecimal digits.
      • Get /data/data/com.tencent.mm/MicroMsg/${userid}/{EnMicroMsg.db,sfs/avatar.index} from the device, possible ways are:
        • ./android-interact.sh db
        • Use your rooted file system manager app
      • Get WeChat uin (an integer), possible ways are:
        • ./android-interact.sh uin, which pulls the value from /data/data/com.tencent.mm/shared_prefs/system_config_prefs.xml
        • Log in to web WeChat and read wxuin=1234567 from document.cookie
      • Get your phone IMEI number (a positive integer), possible ways are:
        • ./android-interact.sh imei
        • Call *#06# on your phone
        • Find IMEI in system settings
      • Decrypt the database, which produces decrypted.db:

        ./decrypt-db.py <path to EnMicroMsg.db> <imei> <uin>
        

    NOTE: you may need to try different ways of getting the IMEI and uin, because phones behave differently.

    Also, if the decryption doesn't work with pysqlcipher, try the version of sqlcipher in legacy.

  • Copy the WeChat user resource directory /mnt/sdcard/tencent/MicroMsg/${userid}/{emoji,image2,sfs,video,voice2} from the phone's SD card to the resource directory:

    • ./android-interact.sh res
    • You might need to tweak RES_DIR in the script if the default doesn't work
    • This can take a long time. Some ways to do this faster:

      • If there's enough free space on the SD card, you can combine all the files via busybox tar without compression in the adb shell, use adb pull to copy the tar archive to the computer, and then extract it. BusyBox is needed as the Android system's tar may choke on long paths.
      • Alternatively, you can use pipes. This is slower, but doesn't require any free space on the SD card:

        # copy MicroMsg to the current directory
        adb shell 'cd /mnt/sdcard/tencent &&
                   busybox tar czf - MicroMsg 2>/dev/null | busybox base64' |
            base64 -di | tar xzf -
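
Two parts of the manual route above are easy to get wrong by hand: picking the right ${userid} directory and combining the IMEI and uin for decryption. The sketch below is illustrative only. Both helper names are hypothetical, and the key scheme shown (the first seven hex characters of the MD5 of the IMEI followed by the uin) is an assumption based on commonly reported behaviour of EnMicroMsg.db, not something this README states; ./decrypt-db.py remains the reference.

```python
import hashlib
import re

# A ${userid} directory name is described above as 32 hexadecimal digits.
USERID_RE = re.compile(r"[0-9a-fA-F]{32}")


def find_userid_dirs(names):
    """Keep only the entries that look like a ${userid} directory name."""
    return [n for n in names if USERID_RE.fullmatch(n)]


def derive_db_key(imei, uin):
    """ASSUMPTION: key = first 7 hex chars of md5(imei + uin)."""
    material = (str(imei) + str(uin)).encode("ascii")
    return hashlib.md5(material).hexdigest()[:7]


if __name__ == "__main__":
    # Example input: names as they might appear in a directory listing of MicroMsg.
    listing = ["0123456789abcdef0123456789abcdef", "avatar", "CheckResUpdate"]
    print(find_userid_dirs(listing))
    # Placeholder IMEI and uin; substitute the values from your own device.
    print(derive_db_key("123456789012345", 1234567))
```

If the derived key does not open the database, the NOTE above applies: the IMEI or uin reported by one method may differ from the value WeChat actually used.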
        

Run:

  • Parse and dump text messages of every chat (requires decrypted.db):

    ./dump-msg.py decrypted.db output_dir
    
  • List all chats (requires decrypted.db):

    ./list-chats.py decrypted.db
    
  • Generate statistical report on text messages (requires output_dir from ./dump-msg.py):

    ./count-message.sh output_dir
    
  • Dump messages of one contact to html, containing voice messages, emojis, and images (requires decrypted.db, avatar.index, and resource):

    ./dump-html.py decrypted.db avatar.index resource "<contact_name>" output.html
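
To export several contacts in one go, the dump-html.py command above can be driven from a short script. This is a rough sketch under stated assumptions: `build_dump_html_commands` and `run_all` are hypothetical helpers, the contact names are supplied by you (for example after inspecting the output of ./list-chats.py), and the numbered output file names are an arbitrary choice made to avoid problems with unusual characters in names.

```python
import subprocess
from pathlib import Path


def build_dump_html_commands(contacts, db="decrypted.db",
                             avatar_index="avatar.index",
                             resource="resource", out_dir="html"):
    """Build one ./dump-html.py argument list per contact name."""
    commands = []
    for i, name in enumerate(contacts):
        # Numbered file names keep the output path independent of the contact name.
        output = str(Path(out_dir) / f"chat_{i:03d}.html")
        commands.append(["./dump-html.py", db, avatar_index, resource, name, output])
    return commands


def run_all(commands, out_dir="html"):
    """Create the output directory and run each command, stopping on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder contact names; replace with real ones.
    run_all(build_dump_html_commands(["Alice", "Bob Li"]))
```

Passing each command as an argument list rather than a shell string means contact names containing spaces or quotes need no extra escaping.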
    

Examples:

See here for an example html.

Screenshots of generated html:

byvoid

TODO List

  • Search by uid/username
  • Faster way to copy a directory from android (I don't know..).
  • Fix rare unhandled types: > 10000 and < 0
  • Better user experience; see grep 'TODO' wechat -R
  • Easier to use for non-programmers (GUI?)

Donate!

Paypal: [paypal]