bluedrag (bluedrag) wrote,

LJ -> DW import: adjusting links

Let's say you decided to leave LiveJournal and migrate to Dreamwidth. You do a full import of your journal to DW and everything looks fine and dandy, but there is one problem: if your LJ posts linked to each other, the import process does not adjust those links, and they still point back to LiveJournal. Clearly unacceptable! I wrote a quick and dirty script to fix that. Details below.

1) Get the ljdump script and put it in a directory.

2) In the same directory create a config file, ljdump.config:
(substitute bluedrag with your own journal name, and password with your password).
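
For orientation, the script further down reads the `server`, `username` and `password` elements (plus optional `journal` elements) from this file. Below is a minimal sketch of a config with that shape, parsed the same way. The root element name `ljdump` and the server placeholder are assumptions made for illustration, so check ljdump's own documentation for the exact format.

```python
import xml.dom.minidom

# Hypothetical config contents. The element names match what the script
# reads; the root element name and the server value are assumptions.
SAMPLE_CONFIG = """<?xml version="1.0"?>
<ljdump>
    <server>https://example.invalid</server>
    <username>bluedrag</username>
    <password>password</password>
</ljdump>
"""

def read_config(xml_text):
    """Return (server, username, password, journals) from config text."""
    doc = xml.dom.minidom.parseString(xml_text).documentElement

    def first(tag):
        # Text of the first element with this tag name.
        return doc.getElementsByTagName(tag)[0].childNodes[0].data

    journals = [e.childNodes[0].data
                for e in doc.getElementsByTagName("journal")]
    return first("server"), first("username"), first("password"), journals

server, username, password, journals = read_config(SAMPLE_CONFIG)
print(server, username, journals)
```

When no `journal` element is present, the list comes back empty and the script falls back to the username as the journal name.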

3) Still in the same directory, run ljdump. It will create a subdirectory named after your journal, holding a full backup of all entries and comments. My script relies on that backup and on the config file (ljdump.config).

4) Save my script (below) to a file and run it in the same directory. It will go over your entries (but not comments) and try to change all links to your LiveJournal posts (and tags) to the corresponding Dreamwidth links. It asks before writing each entry back (if you feel especially bold, you can comment out the raw_input prompt and the check that follows it).
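
The link rewriting in step 4 can be sketched on its own, without touching any journal. The snippet below is an illustration only: it assumes the imported entry records its origin as `livejournal.com/<user>/<itemid>` and that tag pages live under `http://<user>.dreamwidth.org/tag/`. The helper names and sample data are hypothetical.

```python
import re

def lj_url_from_import_source(import_source):
    """Turn 'livejournal.com/<user>/<id>' into the public LJ entry URL."""
    return re.sub(r'livejournal\.com/(.*?)/(.*)',
                  r'http://\1.livejournal.com/\2.html', import_source)

def rewrite_links(text, url_map):
    """Replace tag links and every known LJ entry URL with its DW counterpart."""
    text = re.sub(r'http://([\w-]+)\.livejournal\.com/tag/',
                  r'http://\1.dreamwidth.org/tag/', text)
    for lj, dw in url_map.items():
        text = text.replace(lj, dw)
    return text

# Hypothetical sample data: one imported entry and a post that links to it.
url_map = {
    lj_url_from_import_source('livejournal.com/bluedrag/12345'):
        'http://bluedrag.dreamwidth.org/678.html',
}
old = ('See <a href="http://bluedrag.livejournal.com/12345.html">this post</a> '
       'and <a href="http://bluedrag.livejournal.com/tag/linux">the linux tag</a>.')
print(rewrite_links(old, url_map))
```

Links to entries that are not in the map are left alone, which is why a complete backup matters.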

Disclaimers: No warranties, and only tested on my journal. But it worked, and I feel great about the result.

Hasta la vista, LJ!

The script (big chunks borrowed from ljdump):


# Python 2 script (uses urllib2, xmlrpclib and print statements).
import glob
import os
import re
import urllib2
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xmlrpclib

url = {}
posts = {}

try:
    from hashlib import md5
except ImportError:
    # Very old Pythons have no hashlib; fall back to the md5 module.
    import md5 as _md5
    md5 = _md5.new

def calcchallenge(challenge, password):
    return md5(challenge+md5(password).hexdigest()).hexdigest()

def flatresponse(response):
    r = {}
    while True:
        name = response.readline()
        if len(name) == 0:
            break
        if name[-1] == '\n':
            name = name[:len(name)-1]
        value = response.readline()
        if value[-1] == '\n':
            value = value[:len(value)-1]
        r[name] = value
    return r

def getljsession(server, username, password):
    r = urllib2.urlopen(server+"/interface/flat", "mode=getchallenge")
    response = flatresponse(r)
    r = urllib2.urlopen(server+"/interface/flat", "mode=sessiongenerate&user=%s&auth_method=challenge&auth_challenge=%s&auth_response=%s" % (username, response['challenge'], calcchallenge(response['challenge'], password)))
    response = flatresponse(r)
    return response['ljsession']

def dochallenge(server, params, password):
    # Fetch a fresh challenge and add the challenge-response fields to params.
    challenge = server.LJ.XMLRPC.getchallenge()
    params.update({
        'auth_method': "challenge",
        'auth_challenge': challenge['challenge'],
        'auth_response': calcchallenge(challenge['challenge'], password)
    })
    return params

def process(server_url, username, password, journal):
    for filename in sorted(glob.glob(journal+'/L-*')):
        try:
            tree = ET.parse(filename)
        except ET.ParseError as e:
            print '%s: %s' % (filename, e)
            continue
        root = tree.getroot()
        dw_url = root.find('url').text
        try:
            import_source = root.find('props').find('import_source').text
        except AttributeError:
            print '%s: LJ url not found' % filename
            continue

        # Rebuild the public LJ URL from the import source (user, then item id).
        lj_url = re.sub(r'livejournal\.com/(.*?)/(.*)', r'http://\1.livejournal.com/\2.html', import_source)
        #print "%s -> %s" % (lj_url, dw_url)
        url[lj_url] = dw_url
        posts[dw_url] = root

    ljsession = getljsession(server_url, username, password)
    server = xmlrpclib.ServerProxy(server_url + "/interface/xmlrpc")
    for dw_url, post in sorted(posts.iteritems()):
        old_text = post.find('event').text
        # Point tag links at the same journal name on Dreamwidth.
        new_text = re.sub(r'http://([\w\d_-]+)\.livejournal\.com/tag/',
                          r'http://\1.dreamwidth.org/tag/', old_text)
        for lj, dw in url.iteritems():
            new_text = new_text.replace(lj, dw)

        if old_text != new_text:
            print new_text
            print dw_url

            itemid = post.find('itemid').text
            try:
                subject = post.find('subject').text
            except AttributeError:
                subject = ''
            print itemid, subject

            s = raw_input('Proceed? (y/n) ')
            if s != 'y':
                continue
            e = server.LJ.XMLRPC.editevent(dochallenge(server, {
                'username': username,
                'ver': 1,
                'event': new_text,
                'itemid': itemid,
                'subject': subject,
                #'lineendings': 'unix',
            }, password))
            print "Edit result:", e

if os.access("ljdump.config", os.F_OK):
    config = xml.dom.minidom.parse("ljdump.config")
    server = config.documentElement.getElementsByTagName("server")[0].childNodes[0].data
    username = config.documentElement.getElementsByTagName("username")[0].childNodes[0].data
    password = config.documentElement.getElementsByTagName("password")[0].childNodes[0].data
    journals = config.documentElement.getElementsByTagName("journal")
    if journals:
        for e in journals:
            process(server, username, password, e.childNodes[0].data)
    else:
        process(server, username, password, username)
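
The challenge-response step used by calcchallenge can be tried in isolation. This is a small Python 3 sketch of the same computation, md5(challenge + md5(password)) in hex. The function name, the UTF-8 encoding choice and the sample challenge string are assumptions for illustration; a real challenge comes from the server's getchallenge call.

```python
from hashlib import md5

def calc_challenge_response(challenge, password):
    """md5(challenge + md5(password)) as hex, on UTF-8 encoded input."""
    password_hash = md5(password.encode('utf-8')).hexdigest()
    return md5((challenge + password_hash).encode('utf-8')).hexdigest()

# Hypothetical challenge string, made up for this example.
challenge = 'c0:1234567890:123:60:abcdef'
response = calc_challenge_response(challenge, 'password')
print(response)
```

The point of the scheme is that the password itself never goes over the wire, only a hash tied to a one-time challenge.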

This entry was originally posted on Dreamwidth. Please comment there using OpenID.
Tags: #print, linux, meta