Dev/Command-line/Wikidata
From XOWA: the free, open-source, offline wiki application
XOWA can import Wikidata through the command line.
Import using the XML dump
XOWA can build Wikidata using the XML dump at https://dumps.wikimedia.org/wikidatawiki/. This import is essentially the same as an import of any other wiki.
The script for the XML import follows.
// build wikidata database; this only needs to be done once, whenever wikidata is updated
add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
add ('www.wikidata.org' , 'util.download') {dump_type = 'pages-articles';}
add ('www.wikidata.org' , 'util.download') {dump_type = 'categorylinks';}
add ('www.wikidata.org' , 'util.download') {dump_type = 'page_props';}
add ('www.wikidata.org' , 'util.download') {dump_type = 'image';}
add ('www.wikidata.org' , 'text.init');
add ('www.wikidata.org' , 'text.page');
add ('www.wikidata.org' , 'text.cat.core');
add ('www.wikidata.org' , 'text.cat.link');
add ('www.wikidata.org' , 'text.cat.hidden');
add ('www.wikidata.org' , 'text.term');
add ('www.wikidata.org' , 'text.css');
add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
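The util.download steps above fetch standard Wikimedia dump files. As an illustration only (XOWA's util.download resolves its own URLs internally), the files corresponding to each dump_type can be derived from the conventional "latest" layout on dumps.wikimedia.org; the mapping below is an assumption about that layout, not XOWA code:

```python
# Illustrative sketch: map the dump_type values used in the script above to
# the files published on dumps.wikimedia.org. The "latest" directory layout
# is an assumption about the mirror; XOWA resolves actual URLs itself.
DUMP_FILES = {
    'pages-articles': 'pages-articles.xml.bz2',  # full page text (XML)
    'categorylinks':  'categorylinks.sql.gz',    # category membership (SQL)
    'page_props':     'page_props.sql.gz',       # page properties (SQL)
    'image':          'image.sql.gz',            # image metadata (SQL)
}

def dump_url(wiki_db, dump_type):
    """Build the conventional 'latest' dump URL for a given dump type."""
    suffix = DUMP_FILES[dump_type]
    return ('https://dumps.wikimedia.org/{0}/latest/{0}-latest-{1}'
            .format(wiki_db, suffix))

print(dump_url('wikidatawiki', 'pages-articles'))
# https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles.xml.bz2
```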
Import using the JSON dump
As of v2.6.3, XOWA also provides basic support for building Wikidata from the JSON dump. This support was added for the following reasons:
- Current delay in XML dumps: The last good wikidata XML dump was 2+ months old due to problems with dump generation. See: https://phabricator.wikimedia.org/T98585
- JSON dumps recommended: Wikidata seems to prefer using the JSON dump over the XML dump. See: http://www.wikidata.org/wiki/Wikidata:Database_download
- JSON dumps are more frequent: The JSON dumps have been generated regularly on a weekly basis. In contrast, the XML dumps take 3 to 4 weeks.
Despite these reasons, there are limitations to the JSON dump.
- Non-JSON pages not available: The JSON dump doesn't provide other pages, such as the Main Page or MediaWiki pages. Only pages in the main and property namespaces are available. This is by design. See: https://lists.wikimedia.org/pipermail/wikidata/2015-June/006441.html
- Page metadata not available: Certain properties, such as page_id and last_modified, are not available. XOWA provides substitutes for these values, but they will not match the Wikimedia versions.
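Because page_id and last_modified are absent from JSON-dump entities, an importer has to synthesize stand-in values. The sketch below is hypothetical (it is not XOWA's actual substitution logic): it reuses the numeric part of the entity id as a stand-in page_id and stamps last_modified at import time, which is why the values cannot match Wikimedia's:

```python
import json
import time

def substitute_metadata(entity_line, import_time=None):
    """Derive stand-in page_id / last_modified for one JSON-dump entity.

    Hypothetical sketch: the numeric part of the entity id (Q42 -> 42) is
    used as a stand-in page_id, and last_modified defaults to the import
    time. Neither will match the values in the Wikimedia XML dump.
    """
    # Dump lines end with a trailing comma; strip it before parsing.
    entity = json.loads(entity_line.rstrip().rstrip(','))
    page_id = int(entity['id'][1:])  # drop the Q/P prefix
    last_modified = import_time or time.strftime(
        '%Y-%m-%dT%H:%M:%SZ', time.gmtime())
    return entity['id'], page_id, last_modified

sample = '{"type": "item", "id": "Q42", "labels": {"en": {"language": "en", "value": "Douglas Adams"}}},'
print(substitute_metadata(sample, '2015-07-01T00:00:00Z'))
# ('Q42', 42, '2015-07-01T00:00:00Z')
```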
The script for the JSON import follows.
add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
// TODO: add ('www.wikidata.org' , 'util.download') {dump_type = 'wikidata-json';}
add ('www.wikidata.org' , 'wbase.json_dump');
add ('www.wikidata.org' , 'text.term');
add ('www.wikidata.org' , 'text.css');
add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz|*.json');}
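The JSON dump that the wbase.json_dump step processes is published as one large JSON array with one entity per line ('[' and ']' on their own lines, entities separated by trailing commas). That layout allows streaming the file line by line instead of loading it whole; a minimal reader sketch under that assumption (illustrative, not XOWA code):

```python
import io
import json

def iter_entities(fileobj):
    """Yield one entity dict per line of a Wikidata JSON dump.

    The dump is a single huge JSON array, but each entity sits on its own
    line, so it can be streamed without parsing the whole array at once.
    """
    for line in fileobj:
        line = line.strip().rstrip(',')  # drop the inter-entity comma
        if line in ('[', ']', ''):       # skip the array brackets
            continue
        yield json.loads(line)

# Tiny in-memory stand-in for a real (multi-gigabyte) dump file.
sample_dump = io.StringIO(
    '[\n'
    '{"id": "Q1", "type": "item"},\n'
    '{"id": "P31", "type": "property"}\n'
    ']\n')
ids = [entity['id'] for entity in iter_entities(sample_dump)]
print(ids)  # ['Q1', 'P31']
```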