Dev/Command-line
XOWA can import a wiki from the command line, driven by a plain-text build script.
Import simple.wikipedia.org through the command-line
- Open a command prompt. For example, on Windows, run cmd
- Run the following: java -jar C:\000\200_dev\110_java\100_core\out\production\400_xowa\ --cmd_file C:\xowa_release\xowa_build.gfs --app_mode cmd
- Wait about 10 minutes for the script to complete
- Launch XOWA and enter simple.wikipedia.org in the URL bar
Import a different wiki by editing the build script
- Open the following file in a text editor: C:\xowa_release\xowa_build.gfs. See Script below for the full text.
- Replace all instances of simple.wikipedia.org with the domain name of the wiki you want to import. For example, for English Wikipedia, use en.wikipedia.org
- Run the command-line import again.
- Launch XOWA and enter the domain name in the URL bar.
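The search-and-replace step can also be scripted. The following is a minimal sketch in Python, not part of XOWA; the function name retarget_build_script is hypothetical, and it assumes the build script contains the old domain only where it should be replaced.

```python
def retarget_build_script(script_text: str, new_domain: str,
                          old_domain: str = "simple.wikipedia.org") -> str:
    """Return script_text with every occurrence of old_domain replaced by new_domain.

    Hypothetical helper: it performs a plain text substitution, which mirrors
    the manual "replace all instances" step in a text editor.
    """
    if not new_domain:
        raise ValueError("new_domain must not be empty")
    return script_text.replace(old_domain, new_domain)


if __name__ == "__main__":
    # Two lines in the style of the build script, used only as sample input.
    sample = (
        "add ('simple.wikipedia.org' , 'text.init');\n"
        "add ('simple.wikipedia.org' , 'text.page');\n"
    )
    print(retarget_build_script(sample, "en.wikipedia.org"))
```

To apply it to the real file, read C:\xowa_release\xowa_build.gfs into a string, pass it through the function, and write the result back before running the command-line import again.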
Import a wiki with a manual download
Download the wiki dump
- Navigate to https://dumps.wikimedia.org/enwiki
- Click on the latest directory
- Download the file just under "Articles, templates, media/file descriptions, and primary meta-pages.". It should read enwiki-latest-pages-articles.xml.bz2
- The download is 11+ GB and may take anywhere between 2 and 5 hours to complete.
- If you also want talk pages, you should download the "Recombine all pages, current versions only." version. It should read enwiki-latest-pages-meta-current.xml.bz2. Note that this dump is twice the size of the regular dump.
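The file names above follow a regular pattern, so the download location can be built programmatically. The sketch below is illustrative Python, not part of XOWA; the helper names are hypothetical, and it assumes the file sits directly in the latest directory under the wiki's dump name (for example enwiki), as the steps above describe.

```python
import shutil
import urllib.request

# Base address of the dump site, taken from the steps above.
DUMPS_BASE = "https://dumps.wikimedia.org"


def dump_file_name(wiki: str, dump_type: str = "pages-articles") -> str:
    """Build a file name such as enwiki-latest-pages-articles.xml.bz2."""
    return f"{wiki}-latest-{dump_type}.xml.bz2"


def dump_url(wiki: str, dump_type: str = "pages-articles") -> str:
    """Build the URL of a dump file in the wiki's latest directory."""
    return f"{DUMPS_BASE}/{wiki}/latest/{dump_file_name(wiki, dump_type)}"


def download_dump(wiki: str, dest_path: str, dump_type: str = "pages-articles") -> None:
    """Stream the dump to dest_path without loading it all into memory.

    Assumption: the server accepts a plain HTTP GET from urllib. The English
    Wikipedia file is very large, so expect this call to run for hours.
    """
    with urllib.request.urlopen(dump_url(wiki, dump_type)) as response, \
            open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)


if __name__ == "__main__":
    # Print the two URLs discussed above; nothing is downloaded here.
    print(dump_url("enwiki"))
    print(dump_url("enwiki", "pages-meta-current"))
```

Pass "pages-meta-current" as dump_type to get the larger dump that includes talk pages.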
Specify location of the wiki dump
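- In the build script, find the following line:
- add ('simple.wikipedia.org' , 'text.init');
- Replace it with the following, substituting the path to your downloaded dump:
- add ('simple.wikipedia.org', 'text.init') {src_bz2_fil = '/your_directory/simplewiki-20130103-pages-articles.xml.bz2';}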
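This edit to the text.init line can be automated as well. The following Python sketch is an illustration only; the function name point_init_at_local_dump is hypothetical, and it assumes the build script contains the plain text.init line exactly once, in the form shown in the Script section.

```python
import re


def point_init_at_local_dump(script_text: str, domain: str, dump_path: str) -> str:
    """Rewrite the plain text.init command so it carries a src_bz2_fil setting.

    Hypothetical helper: it matches add ('<domain>' , 'text.init'); with
    flexible spacing and raises if that line is not found.
    """
    pattern = re.compile(
        r"add\s*\(\s*'" + re.escape(domain) + r"'\s*,\s*'text\.init'\s*\)\s*;"
    )
    replacement = f"add ('{domain}', 'text.init') {{src_bz2_fil = '{dump_path}';}}"
    # A function replacement keeps backslashes in Windows paths literal.
    new_text, count = pattern.subn(lambda _match: replacement, script_text)
    if count == 0:
        raise ValueError(f"no plain text.init line found for {domain}")
    return new_text


if __name__ == "__main__":
    sample = (
        "add ('simple.wikipedia.org' , 'text.init');\n"
        "add ('simple.wikipedia.org' , 'text.page');\n"
    )
    print(point_init_at_local_dump(
        sample,
        "simple.wikipedia.org",
        "/your_directory/simplewiki-20130103-pages-articles.xml.bz2",
    ))
```

Only the text.init line changes; every other command in the script is passed through untouched.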
Script
// do not show a "Press enter to continue" at the end of the script
app.bldr.pause_at_end = 'n';

// run xowa.gfs
app.scripts.run_file_by_type('xowa_cfg_app');

// import wiki; for more info see [[Dev/Command-line]]
app.bldr.cmds {
  // delete all files in directory; note that subdirectories and file databases ("-file.xowa") will not be deleted
  add ('simple.wikipedia.org' , 'util.cleanup') {delete_all = 'y';}

  // download main dump file; contains all articles
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}

  // download categorylinks file; contains links from category to pages
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}

  // download page_props file; contains information on hidden categories
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'page_props';}

  // start wiki import
  add ('simple.wikipedia.org' , 'text.init');

  // import articles
  add ('simple.wikipedia.org' , 'text.page');

  // generate search data
  add ('simple.wikipedia.org' , 'text.search');

  // end import
  add ('simple.wikipedia.org' , 'text.term');

  // import css into wiki
  add ('simple.wikipedia.org' , 'text.css');

  // create main category table (also mark hidden categories)
  add ('simple.wikipedia.org' , 'wiki.page_props');

  // create category links
  add ('simple.wikipedia.org' , 'wiki.categorylinks');

  // cleanup temp files; delete xml and bz2
  add ('simple.wikipedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
}

// run cmds
app.bldr.run;