Wikivoyage parser with heuristics

While travelling through Vietnam, I used Wikivoyage quite intensively as a travel guide, so I started contributing to some articles myself. When editing an article, cleaning up and structuring long, semi-formatted lists of hotels and restaurants was especially annoying, and because the lists are only semi-structured, automating the formatting is not straightforward.

Annoyed enough by the editing, I took it as a challenge and wrote a parser that uses a bunch of heuristic rules to classify the list entries, split them into chunks, apply formatting rules to the chunks, and merge them back together into a nicely formatted list entry. So some ugly unstructured listings like

* '''Birmingham Buddhist Centre''', 11 Park Rd, Moseley (''#1, #35 or #50 bus''), ''+44 121'' 449 5279 (''[mailto:[email protected] [email protected]]''), [http://www.birminghambuddhistcentre.org.uk/]. A centre run by the Friends of the Western Buddhist Order'' .

* '''Hotel Indah Manila''' 350 A J Villegas St. Tel: ''+63 2'' 5361188, 5362288. [http://www.hotelindah.com/] Rates start at ₱2000 for this modest 76-room hotel. Facilities include Café Indah and conference and function rooms. Airport and city transfers, tour assistance, and laundry service are available.

becomes nicely formatted into

* {{vCard| type=sight| subtype=religious| name=Birmingham Buddhist Centre| address=11 Park Rd, Moseley| directions=#1, #35 or #50 bus| phone=+44 121 449 5279| email=[email protected]| url=http://www.birminghambuddhistcentre.org.uk/| description=A centre run by the Friends of the Western Buddhist Order.}}

* {{vCard| type=hotel| subtype=hotel| name=Hotel Indah Manila| address=350 A J Villegas St| phone=+63 2 5361188, 5362288| url=http://www.hotelindah.com/| price=Rates start at ₱2000 for this modest 76-room hotel| description=Facilities include Café Indah and conference and function rooms. Airport and city transfers, tour assistance, and laundry service are available.}}
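
To give a rough idea of the approach, here is a minimal sketch of the classify, split, format and merge steps. It is not the actual parser: the function names (`classify`, `parse_listing`, `format_vcard`), the keyword table and the three regular expressions are assumptions made up for illustration, and the sample entry is invented. A real rule set needs many more heuristics than this.

```python
import re

# Hypothetical keyword table; the real parser's rules are not shown in this post.
TYPE_KEYWORDS = {
    "hotel": ("hotel", "hostel", "guesthouse"),
    "restaurant": ("restaurant", "cafe", "bistro"),
}
FIELD_ORDER = ("type", "name", "address", "phone", "url", "description")
SEP = "\n"  # marks where a recognised chunk was cut out of the entry


def classify(name):
    """Guess the listing type from keywords in the name."""
    lowered = name.lower()
    for listing_type, keywords in TYPE_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return listing_type
    return "other"


def _extract(pattern, text):
    """Return (first captured group or None, text with the match cut out)."""
    match = re.search(pattern, text)
    if not match:
        return None, text
    return match.group(1).strip(), text[:match.start()] + SEP + text[match.end():]


def parse_listing(line):
    """Split one wikitext list entry into named chunks using simple heuristics."""
    rest = line.lstrip("* ").strip()
    fields = {}
    # Assumed heuristics: bold text is the name, a bracketed link is the URL,
    # and "+" followed by digits and spaces is a phone number.
    fields["name"], rest = _extract(r"'''(.+?)'''", rest)
    fields["url"], rest = _extract(r"\[(https?://[^\s\]]+)\]", rest)
    fields["phone"], rest = _extract(r"(?:Tel:\s*)?(\+\d[\d ]{5,}\d)", rest)

    # Whatever is left between the cut-out chunks is address or description.
    pieces = [piece.strip(" .,") for piece in rest.split(SEP)]
    pieces = [piece for piece in pieces if piece]
    if pieces and any(ch.isdigit() for ch in pieces[0]):
        fields["address"] = pieces.pop(0)  # a leading piece with a number in it
    if pieces:
        fields["description"] = " ".join(pieces) + "."
    fields["type"] = classify(fields["name"] or "")
    return fields


def format_vcard(fields):
    """Merge the chunks back into a single vCard list entry."""
    parts = [f"{key}={fields[key]}" for key in FIELD_ORDER if fields.get(key)]
    return "* {{vCard| " + "| ".join(parts) + "}}"


entry = ("* '''Hotel Example''' 1 Main St. Tel: +63 2 5361188. "
         "[http://www.example.com/] A modest hotel.")
print(format_vcard(parse_listing(entry)))
```

Cutting each recognised chunk out and leaving a separator behind keeps the leftover text in its original order, so the position of a leftover piece can itself be used as a hint (here: a leading piece containing a number is treated as the address).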

I wrote it as a library with two frontends: a web frontend using CGI, and a standalone version using Bottle. After using Python intensively for several years, this is actually the first time I have used it instead of PHP to serve web content, and I was a bit surprised how straightforward it was. So give it a try and let me know what you think! The source code is available on GitHub.