bluedrag (bluedrag) wrote,

LJ -> DW import: adjusting links

Let's say you've decided to leave LiveJournal and migrate to Dreamwidth. You do a full import of your journal to DW and everything looks fine and dandy, but there is one problem: if your LJ posts linked to each other, the import does not adjust those links, and they still point to livejournal.com. Clearly unacceptable! I wrote a quick-and-dirty script to fix that. Details below.

1) Get the ljdump.py script and put it in a directory.

2) In the same directory create a config file, ljdump.config:
<xml>
  <server>https://www.dreamwidth.org</server>
  <username>bluedrag</username>
  <password>password</password>
  <journal>bluedrag</journal>
</xml>
(substitute bluedrag with your own journal name, and password with your password).

3) Still in the same directory, run ljdump.py. It will create a subdirectory named after your journal, containing a full backup of all entries and comments. My script relies on that backup and on the config file (ljdump.config).

4) Save my script to a file (fix_links.py) and run it in the same directory. It goes over your entries (but not comments) and tries to change all http:// links to your LiveJournal posts (and tags) into the corresponding Dreamwidth links. It asks before writing each entry back (if you feel especially bold, you can comment out the raw_input line together with the two-line check that follows it; commenting out raw_input alone leaves the variable s undefined).
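To make step 4 more concrete, here is a minimal sketch of the two pieces of logic involved: deriving the old LJ URL from an imported entry, and rewriting links in a post body. It is written for Python 3 (the script below is Python 2), and the sample entry XML, the URLs, and the helper names lj_to_dw and rewrite_links are made up for illustration; only the field names (url, props/import_source) and the regular expressions come from the script.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dumped entry, reduced to the fields the script reads.
SAMPLE_ENTRY = """<event>
  <itemid>42</itemid>
  <url>https://bluedrag.dreamwidth.org/10752.html</url>
  <props>
    <import_source>livejournal.com/bluedrag/12345</import_source>
  </props>
</event>"""

def lj_to_dw(entry_xml):
    """Return (lj_url, dw_url) for one entry, or None if it was not imported."""
    root = ET.fromstring(entry_xml)
    dw_url = root.findtext('url')
    import_source = root.findtext('props/import_source')
    if dw_url is None or import_source is None:
        return None
    # "livejournal.com/<user>/<id>" -> "http://<user>.livejournal.com/<id>.html"
    lj_url = re.sub(r'livejournal\.com/(.*?)/(.*)',
                    r'http://\1.livejournal.com/\2.html', import_source)
    return lj_url, dw_url

def rewrite_links(text, url_map):
    """Rewrite LJ tag links and known LJ post links to Dreamwidth."""
    # Tag pages keep their path; only the host changes.
    text = re.sub(r'http://([\w-]+)\.livejournal\.com/tag/',
                  r'http://\1.dreamwidth.org/tag/', text)
    # Post links are replaced by plain string substitution from the map.
    for lj, dw in url_map.items():
        text = text.replace(lj, dw)
    return text

url_map = dict([lj_to_dw(SAMPLE_ENTRY)])
old = ('<a href="http://bluedrag.livejournal.com/12345.html">post</a> '
       '<a href="http://bluedrag.livejournal.com/tag/linux">tag</a>')
new = rewrite_links(old, url_map)
print(new)
```

Note that both patterns only match http:// links, so any https:// links to LiveJournal would be left untouched.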

Disclaimers: No warranties, and only tested on my journal. But it worked, and I feel great about the result.

Hasta la vista, LJ!

The script (big chunks borrowed from ljdump):


#!/usr/bin/python

import glob, os, re, urllib2, xml.dom.minidom, xmlrpclib
import xml.etree.ElementTree as ET

url = {}
posts = {}


try:
    from hashlib import md5
except ImportError:
    import md5 as _md5
    md5 = _md5.new

def calcchallenge(challenge, password):
    return md5(challenge+md5(password).hexdigest()).hexdigest()

def flatresponse(response):
    r = {}
    while True:
        name = response.readline()
        if len(name) == 0:
            break
        if name[-1] == '\n':
            name = name[:len(name)-1]
        value = response.readline()
        if value[-1] == '\n':
            value = value[:len(value)-1]
        r[name] = value
    return r

def getljsession(server, username, password):
    r = urllib2.urlopen(server+"/interface/flat", "mode=getchallenge")
    response = flatresponse(r)
    r.close()
    r = urllib2.urlopen(server+"/interface/flat", "mode=sessiongenerate&user=%s&auth_method=challenge&auth_challenge=%s&auth_response=%s" % (username, response['challenge'], calcchallenge(response['challenge'], password)))
    response = flatresponse(r)
    r.close()
    return response['ljsession']

def dochallenge(server, params, password):
    challenge = server.LJ.XMLRPC.getchallenge()
    params.update({
        'auth_method': "challenge",
        'auth_challenge': challenge['challenge'],
        'auth_response': calcchallenge(challenge['challenge'], password)
    })
    return params


def process(server_url, username, password, journal):
    for filename in sorted(glob.glob(journal+'/L-*')):
        try:
            tree = ET.parse(filename)
        except ET.ParseError as e:
            print '%s: %s' % (filename, e)
            continue
        root = tree.getroot()
        dw_url = root.find('url').text
        try:
            import_source = root.find('props').find('import_source').text
        except AttributeError:
            print '%s: LJ url not found' % filename
            continue

        lj_url = re.sub(r'livejournal\.com/(.*?)/(.*)', r'http://\1.livejournal.com/\2.html', import_source)
        #print "%s -> %s" % (lj_url, dw_url)
        url[lj_url] = dw_url
        posts[dw_url] = root

    ljsession = getljsession(server_url, username, password)
    server = xmlrpclib.ServerProxy(server_url + "/interface/xmlrpc")
         
    for dw_url, post in sorted(posts.iteritems()):
        # An entry with an empty or missing body yields None; treat it as ''.
        old_text = post.findtext('event') or ''
        new_text = re.sub(r'http://([\w\d_-]+)\.livejournal\.com/tag/',
                          r'http://\1.dreamwidth.org/tag/', old_text)
        for lj, dw in url.iteritems():
            new_text = new_text.replace(lj, dw)

        if old_text != new_text:
            print new_text
            print dw_url
            print

            itemid = post.find('itemid').text
            # A missing or empty <subject> becomes '' (None cannot be sent over XML-RPC).
            subject = post.findtext('subject') or ''
            print itemid, subject

            s = raw_input('Proceed? (y/n) ')
            if s != 'y':
                continue
            
            e = server.LJ.XMLRPC.editevent(dochallenge(server, {
                'username': username,
                'ver': 1,
                'event': new_text,
                'itemid': itemid,
                'subject': subject,
                #'lineendings': 'unix',
            }, password))
            print "Edit result:", e
            print


if os.access("ljdump.config", os.F_OK):
    config = xml.dom.minidom.parse("ljdump.config")
    server = config.documentElement.getElementsByTagName("server")[0].childNodes[0].data
    username = config.documentElement.getElementsByTagName("username")[0].childNodes[0].data
    password = config.documentElement.getElementsByTagName("password")[0].childNodes[0].data
    journals = config.documentElement.getElementsByTagName("journal")
    if journals:
        for e in journals:
            process(server, username, password, e.childNodes[0].data)
    else:
        process(server, username, password, username)
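
A note for anyone trying this on Python 3: the script as posted is Python 2 (print statements, urllib2, xmlrpclib, raw_input). Among other changes, hashlib.md5 accepts only bytes there, so the challenge-response helper would need to encode its inputs. A minimal sketch of that one function, assuming UTF-8 for both the password and the challenge (the encoding is an assumption) and using a made-up challenge string:

```python
import hashlib

def calcchallenge(challenge, password):
    """Return hex MD5 of challenge + hex MD5 of password, as the script does."""
    password_hash = hashlib.md5(password.encode('utf-8')).hexdigest()
    return hashlib.md5((challenge + password_hash).encode('utf-8')).hexdigest()

# The challenge below is a placeholder; a real one comes from the server.
response = calcchallenge('c0:1234567890:example', 'password')
print(response)
```

The rest of a port (urllib.request in place of urllib2, xmlrpc.client in place of xmlrpclib, input in place of raw_input, print as a function) is not covered here.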
        


This entry was originally posted at http://bluedrag.dreamwidth.org/296158.html. Please comment there using OpenID.
Tags: #print, linux, meta