Thumbs.html 48 KB

  1. <!DOCTYPE html>
  2. <html dir="ltr">
  3. <head>
  4. <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
  5. <title>Dev/Command-line/Thumbs - XOWA</title>
  6. <link rel="shortcut icon" href="https://gnosygnu.github.io/xowa/xowa_logo.png" />
  7. <link rel="stylesheet" href="https://gnosygnu.github.io/xowa/xowa_common.css" type="text/css">
  8. <style data-source="xowa" type="text/css">
  9. .console {font-family: monospace; color: #EEEEEE ; background-color: black ; border: medium solid black;}
  10. .code
  11. ,.path
  12. ,.url {font-family: monospace; color: black ; background-color: #f9f9f9 ; border: medium solid #f9f9f9;}
  13. .bold {font-weight: 900;}
  14. </style>
  15. </head>
  16. <body class="mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck="false">
  17. <div id="mw-page-base" class="noprint"></div>
  18. <div id="mw-head-base" class="noprint"></div>
  19. <div id="content" class="mw-body">
  20. <h1 id="firstHeading" class="firstHeading"><span>Dev/Command-line/Thumbs</span></h1>
  21. <div id="bodyContent" class="mw-body-content">
  22. <div id="siteSub">From XOWA: the free, open-source, offline wiki application</div>
  23. <div id="contentSub"></div>
  24. <div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr">
  25. <p>
  26. XOWA can make complete wikis which will have the following:
  27. </p>
  28. <ul>
  29. <li>
  30. All images downloaded offline
  31. </li>
  32. <li>
  33. All pages compiled into HTML (pages will load faster)
  34. </li>
  35. </ul>
  36. <p>
  37. This process is run by a custom command-line <code>make</code> script.
  38. </p>
  39. <p>
  40. <br>
  41. </p>
  42. <table class="metadata plainlinks ambox ambox-delete" style="">
  43. <tr>
  44. <td class="mbox-empty-cell">
  45. </td>
  46. <td class="mbox-text" style="">
  47. <p>
  48. <span class="mbox-text-span">Please note that this script is for power users. It is not meant for casual users.</span>
  49. </p>
  50. <p>
  51. <span class="mbox-text-span">Please read through these instructions carefully. If you fail to follow these instructions, you may end up downloading millions of images by accident, and have your IP address banned by Wikimedia.</span>
  52. </p>
  53. <p>
  54. <span class="mbox-text-span">Also, the script will change in the future, and without any warning. There is no backward compatibility. Although the XOWA databases have a fixed format, the scripts do not. If you discover that your script breaks, please refer to this page, contact me for assistance, or go through the code.</span>
  55. </p>
  56. </td>
  57. </tr>
  58. </table>
  59. <p>
  60. <br>
  61. </p>
  62. <div id="toc" class="toc">
  63. <div id="toctitle" class="toctitle">
  64. <h2>
  65. Contents
  66. </h2>
  67. </div>
  68. <ul>
  69. <li class="toclevel-1 tocsection-1">
  70. <a href="#Overview"><span class="tocnumber">1</span> <span class="toctext">Overview</span></a>
  71. </li>
  72. <li class="toclevel-1 tocsection-2">
  73. <a href="#Process"><span class="tocnumber">2</span> <span class="toctext">Process</span></a>
  74. </li>
  75. <li class="toclevel-1 tocsection-3">
  76. <a href="#Script"><span class="tocnumber">3</span> <span class="toctext">Script</span></a>
  77. <ul>
  78. <li class="toclevel-2 tocsection-4">
  79. <a href="#make_commons"><span class="tocnumber">3.1</span> <span class="toctext">make_commons</span></a>
  80. </li>
  81. <li class="toclevel-2 tocsection-5">
  82. <a href="#make_wikidata"><span class="tocnumber">3.2</span> <span class="toctext">make_wikidata</span></a>
  83. </li>
  84. <li class="toclevel-2 tocsection-6">
  85. <a href="#make_wiki"><span class="tocnumber">3.3</span> <span class="toctext">make_wiki</span></a>
  86. </li>
  87. <li class="toclevel-2 tocsection-7">
  88. <a href="#Resuming"><span class="tocnumber">3.4</span> <span class="toctext">Resuming</span></a>
  89. </li>
  90. </ul>
  91. </li>
  92. <li class="toclevel-1 tocsection-8">
  93. <a href="#Appendix"><span class="tocnumber">4</span> <span class="toctext">Appendix</span></a>
  94. <ul>
  95. <li class="toclevel-2 tocsection-9">
  96. <a href="#Requirements"><span class="tocnumber">4.1</span> <span class="toctext">Requirements</span></a>
  97. <ul>
  98. <li class="toclevel-3 tocsection-10">
  99. <a href="#Hardware"><span class="tocnumber">4.1.1</span> <span class="toctext">Hardware</span></a>
  100. </li>
  101. <li class="toclevel-3 tocsection-11">
  102. <a href="#Internet-connectivity"><span class="tocnumber">4.1.2</span> <span class="toctext">Internet-connectivity</span></a>
  103. </li>
  104. <li class="toclevel-3 tocsection-12">
  105. <a href="#Pre-existing_image_databases_for_your_wiki_(optional)"><span class="tocnumber">4.1.3</span> <span class="toctext">Pre-existing image databases for your wiki (optional)</span></a>
  106. </li>
  107. </ul>
  108. </li>
  109. <li class="toclevel-2 tocsection-13">
  110. <a href="#gfs_script"><span class="tocnumber">4.2</span> <span class="toctext">gfs script</span></a>
  111. </li>
  112. <li class="toclevel-2 tocsection-14">
  113. <a href="#Terms"><span class="tocnumber">4.3</span> <span class="toctext">Terms</span></a>
  114. <ul>
  115. <li class="toclevel-3 tocsection-15">
  116. <a href="#lnki"><span class="tocnumber">4.3.1</span> <span class="toctext">lnki</span></a>
  117. </li>
  118. <li class="toclevel-3 tocsection-16">
  119. <a href="#orig"><span class="tocnumber">4.3.2</span> <span class="toctext">orig</span></a>
  120. </li>
  121. <li class="toclevel-3 tocsection-17">
  122. <a href="#xfer"><span class="tocnumber">4.3.3</span> <span class="toctext">xfer</span></a>
  123. </li>
  124. <li class="toclevel-3 tocsection-18">
  125. <a href="#fsdb"><span class="tocnumber">4.3.4</span> <span class="toctext">fsdb</span></a>
  126. </li>
  127. </ul>
  128. </li>
  129. <li class="toclevel-2 tocsection-19">
  130. <a href="#Examples"><span class="tocnumber">4.4</span> <span class="toctext">Examples</span></a>
  131. <ul>
  132. <li class="toclevel-3 tocsection-20">
  133. <a href="#Simple_Wikipedia_example_with_documentation"><span class="tocnumber">4.4.1</span> <span class="toctext">Simple Wikipedia example with documentation</span></a>
  134. </li>
  135. <li class="toclevel-3 tocsection-21">
  136. <a href="#Script:_gnosygnu's_actual_English_Wikipedia_script_(dirty;_provided_for_reference_only)"><span class="tocnumber">4.4.2</span> <span class="toctext">Script: gnosygnu's actual English Wikipedia script (dirty; provided for reference only)</span></a>
  137. </li>
  138. </ul>
  139. </li>
  140. </ul>
  141. </li>
  142. <li class="toclevel-1 tocsection-22">
  143. <a href="#Change_log"><span class="tocnumber">5</span> <span class="toctext">Change log</span></a>
  144. </li>
  145. </ul>
  146. </div>
  147. <h2>
  148. <span class="mw-headline" id="Overview">Overview</span>
  149. </h2>
  150. <p>
  151. The <code>make</code> script works in the following way:
  152. </p>
  153. <ul>
  154. <li>
  155. Loads the wikitext for a page.
  156. </li>
  157. <li>
  158. Converts the wikitext to HTML and saves it.
  159. </li>
  160. <li>
  161. Gathers a list of [[File]] links.
  162. </li>
  163. <li>
  164. Repeats for each page until there are no more pages
  165. </li>
  166. <li>
  167. Downloads every file in the gathered [[File]] list to create the XOWA file databases.
  168. </li>
  169. </ul>
  170. <h2>
  171. <span class="mw-headline" id="Process">Process</span>
  172. </h2>
  173. <ul>
  174. <li>
  175. Open up a terminal
  176. <ul>
  177. <li>
  178. On Windows, run <code>cmd</code>
  179. </li>
  180. <li>
  181. On Linux / Mac OS X, run the Terminal app
  182. </li>
  183. </ul>
  184. </li>
  185. <li>
  186. Change to the xowa root directory
  187. <ul>
  188. <li>
  189. For example, if xowa is set up in <code>C:\xowa</code>, run <code>cd C:\xowa</code>
  190. </li>
  191. </ul>
  192. </li>
  193. <li>
  194. Create a text file in your xowa root folder called <code>make_xowa.gfs</code> with a text-editor.
  195. <ul>
  196. <li>
  197. For Windows, Notepad++ is recommended, or any other text editor that does not force Windows line endings. (Do not use Notepad.)
  198. </li>
  199. <li>
  200. For other systems, you can use a text-editor like Atom, jEdit, or whatever you're most comfortable with
  201. </li>
  202. </ul>
  203. </li>
  204. <li>
  205. Copy each of the scripts below to the text file
  206. </li>
  207. <li>
  208. Run the following command. Make sure the jar path and jar file name match your installation.
  209. </li>
  210. </ul>
  211. <dl>
  212. <dd>
  213. <code>java -jar C:\xowa\xowa_windows_64.jar --app_mode cmd --cmd_file C:\xowa\make_xowa.gfs --show_license n --show_args n</code>
  214. </dd>
  215. </dl>
  216. <ul>
  217. <li>
  218. Wait for the script to complete
  219. </li>
  220. </ul>
  221. <h2>
  222. <span class="mw-headline" id="Script">Script</span>
  223. </h2>
  224. <p>
  225. The <code>make</code> script should be run in 3 parts:
  226. </p>
  227. <ol>
  228. <li>
  229. <code>make_commons</code> script: Builds <b>commons.wikimedia.org</b> which is needed to provide image metadata for the download
  230. </li>
  231. <li>
  232. <code>make_wikidata</code> script: Builds <b>www.wikidata.org</b>, which is needed for data from {{#property}} calls or Module code.
  233. </li>
  234. <li>
  235. <code>make_wiki</code> script: Builds the actual wiki
  236. </li>
  237. </ol>
  238. <p>
  239. Note that other wikis can re-use the same commons and wikidata. For example, if you want to build enwiki and dewiki, you only need to run <code>make_commons</code> and <code>make_wikidata</code> once.
  240. </p>
  241. <h3>
  242. <span class="mw-headline" id="make_commons"><code>make_commons</code></span>
  243. </h3>
  244. <ul>
  245. <li>
  246. Copy the following into <code>make_xowa.gfs</code>
  247. </li>
  248. </ul>
  249. <pre class='code'>
  250. app.bldr.pause_at_end_('n');
  251. app.scripts.run_file_by_type('xowa_cfg_app');
  252. app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
  253. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
  254. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
  255. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
  256. app.bldr.cmds {
  257. // build commons database; this only needs to be done once, whenever commons is updated
  258. add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
  259. add ('commons.wikimedia.org' , 'util.download') {dump_type = 'pages-articles';}
  260. add ('commons.wikimedia.org' , 'util.download') {dump_type = 'page_props';}
  261. add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
  262. add ('commons.wikimedia.org' , 'text.init');
  263. add ('commons.wikimedia.org' , 'text.page');
  264. add ('commons.wikimedia.org' , 'text.term');
  265. add ('commons.wikimedia.org' , 'text.css');
  266. add ('commons.wikimedia.org' , 'wiki.page_props');
  267. add ('commons.wikimedia.org' , 'wiki.image');
  268. add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y';}
  269. add ('commons.wikimedia.org' , 'wiki.page_dump.make');
  270. add ('commons.wikimedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  271. add ('commons.wikimedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  272. }
  273. app.bldr.run;
  274. </pre>
  275. <ul>
  276. <li>
  277. Run the script using the process above
  278. <ul>
  279. <li>
  280. As of 2020-02, this script takes about 7 hours to complete and uses 125 GB of disk space.
  281. </li>
  282. </ul>
  283. </li>
  284. </ul>
  285. <h3>
  286. <span class="mw-headline" id="make_wikidata"><code>make_wikidata</code></span>
  287. </h3>
  288. <ul>
  289. <li>
  290. Copy the following into <code>make_xowa.gfs</code>
  291. </li>
  292. </ul>
  293. <pre class='code'>
  294. app.bldr.pause_at_end_('n');
  295. app.scripts.run_file_by_type('xowa_cfg_app');
  296. app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
  297. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
  298. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
  299. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
  300. app.bldr.cmds {
  301. // build wikidata database; this only needs to be done once, whenever wikidata is updated
  302. add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
  303. add ('www.wikidata.org' , 'util.download') {dump_type = 'pages-articles';}
  304. add ('www.wikidata.org' , 'util.download') {dump_type = 'categorylinks';}
  305. add ('www.wikidata.org' , 'util.download') {dump_type = 'page_props';}
  306. add ('www.wikidata.org' , 'util.download') {dump_type = 'image';}
  307. add ('www.wikidata.org' , 'text.init');
  308. add ('www.wikidata.org' , 'text.page');
  309. add ('www.wikidata.org' , 'text.term');
  310. add ('www.wikidata.org' , 'text.css');
  311. add ('www.wikidata.org' , 'wiki.page_props');
  312. add ('www.wikidata.org' , 'wiki.categorylinks');
  313. add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  314. }
  315. app.bldr.run;
  316. </pre>
  317. <ul>
  318. <li>
  319. Run the script using the process above
  320. <ul>
  321. <li>
  322. As of 2020-02, this script can take about 24 hours to complete and use 250 GB of disk space.
  323. </li>
  324. </ul>
  325. </li>
  326. </ul>
  327. <h3>
  328. <span class="mw-headline" id="make_wiki"><code>make_wiki</code></span>
  329. </h3>
  330. <ul>
  331. <li>
  332. Copy the following into <code>make_xowa.gfs</code>
  333. </li>
  334. </ul>
  335. <pre class='code'>
  336. app.bldr.pause_at_end_('n');
  337. app.scripts.run_file_by_type('xowa_cfg_app');
  338. app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
  339. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
  340. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
  341. app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
  342. app.bldr.cmds {
  343. // build simple.wikipedia.org
  344. add ('simple.wikipedia.org' , 'util.cleanup') {delete_all = 'y';}
  345. add ('simple.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}
  346. add ('simple.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}
  347. add ('simple.wikipedia.org' , 'util.download') {dump_type = 'page_props';}
  348. add ('simple.wikipedia.org' , 'util.download') {dump_type = 'image';}
  349. add ('simple.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';} // needed for sorting search results by PageRank
  350. add ('simple.wikipedia.org' , 'util.download') {dump_type = 'imagelinks';}
  351. add ('simple.wikipedia.org' , 'text.init');
  352. add ('simple.wikipedia.org' , 'text.page') {
  353. // calculate redirect_id for #REDIRECT pages. needed for html databases
  354. redirect_id_enabled = 'y';
  355. }
  356. add ('simple.wikipedia.org' , 'text.search');
  357. // upload desktop css
  358. add ('simple.wikipedia.org' , 'text.css');
  359. // upload mobile css
  360. add ('simple.wikipedia.org' , 'text.css') {css_key = 'xowa.mobile'; /* css_dir = 'C:\xowa\user\anonymous\wiki\simple.wikipedia.org-mobile\html\'; */}
  361. add ('simple.wikipedia.org' , 'text.term');
  362. add ('simple.wikipedia.org' , 'wiki.page_props');
  363. add ('simple.wikipedia.org' , 'wiki.categorylinks');
  364. // create local "page" tables in each "text" database for "lnki_temp"
  365. add ('simple.wikipedia.org' , 'wiki.page_dump.make');
  366. // create a redirect table for pages in the File namespace
  367. add ('simple.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  368. // create an "image" table to get the metadata for all files in the current wiki
  369. add ('simple.wikipedia.org' , 'wiki.image');
  370. // create an "imagelinks" table to find out which images are used for the wiki
  371. add ('simple.wikipedia.org' , 'wiki.imagelinks');
  372. // parse all page-to-page links
  373. add ('simple.wikipedia.org' , 'wiki.page_link');
  374. // calculate a score for each page using the page-to-page links
  375. add ('simple.wikipedia.org' , 'search.page__page_score') {iteration_max = 100;}
  376. // update link score statistics for the search tables
  377. add ('simple.wikipedia.org' , 'search.link__link_score') {page_rank_enabled = 'y';}
  378. // update word count statistics for the search_word table
  379. add ('simple.wikipedia.org' , 'search.word__link_count');
  380. // cleanup all downloaded files as well as temporary files
  381. add ('simple.wikipedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  382. // v2 html generator; allows for multi-threaded / multi-machine builds
  383. add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14|8';}}
  384. // uncomment the next line to resume parsing. See the Resuming section below
  385. // add ('simple.wikipedia.org' , 'wiki.mass_parse.resume');
  386. // NOTE: manual_now must be changed for each build; see the note after this script
  387. add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
  388. cfg {
  389. // locks time to a specific value so all pages use the same time when calling Date.Now()
  390. manual_now = '2020-02-01 01:02:03';
  391. // number of threads; set to 1 to skip multi-threaded behavior
  392. num_wkrs = 8;
  393. // enables building full-text search indexes
  394. indexer_enabled = 'y';
  395. // optimization; loads all templates in memory instead of loading each one from disk
  396. load_all_templates = 'y';
  397. // optimization; loads all imglinks in memory instead of loading each one from disk
  398. // an imglink maps a given image (File:Abc.png) to a repo (commons vs local wiki) as well as a rename
  399. load_all_imglinks = 'y';
  400. // number of pages after which XOWA empties cache
  401. cleanup_interval = 50;
  402. // DEPRECATED: uncomment these 2 lines to use custom HTML zip compression
  403. // hzip_enabled = 'y';
  404. // hdiff_enabled = 'y';
  405. // uncomment these 3 lines if using the build script as a "worker" helping a "server"
  406. // num_pages_in_pool = 32000;
  407. // mgr_url = '\\server_machine_name\xowa\wiki\en.wikipedia.org\tmp\xomp\';
  408. // wkr_machine_name = 'worker_machine_1';
  409. }
  410. }
  411. // note that if multi-machine mode is enabled, all worker directories must be manually copied to the server directory (a build command will be added later)
  412. add ('simple.wikipedia.org' , 'wiki.mass_parse.make');
  413. // aggregate the lnkis
  414. add ('simple.wikipedia.org' , 'file.lnki_regy');
  415. // generate orig metadata for files in the current wiki (for example, for pages in en.wikipedia.org/wiki/File:*)
  416. add ('simple.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
  417. // generate all orig metadata for all lnkis
  418. add ('simple.wikipedia.org' , 'file.orig_regy');
  419. // generate list of files to download based on "orig_regy" and XOWA image code
  420. add ('simple.wikipedia.org' , 'file.xfer_temp.thumb');
  421. // aggregate list one more time
  422. add ('simple.wikipedia.org' , 'file.xfer_regy');
  423. // identify images that have already been downloaded
  424. add ('simple.wikipedia.org' , 'file.xfer_regy_update');
  425. // download images. This step may also take a long time, depending on how many images are needed
  426. add ('simple.wikipedia.org' , 'file.fsdb_make') {
  427. commit_interval = 1000; progress_interval = 200; select_interval = 10000;
  428. ns_ids = '0|4|14';
  429. // specify whether original wiki databases are v1 (.sqlite3) or v2 (.xowa)
  430. src_bin_mgr__fsdb_version = 'v1';
  431. // always redownload certain files
  432. src_bin_mgr__fsdb_skip_wkrs = 'page_gt_1|small_size';
  433. // allow downloads from wikimedia
  434. src_bin_mgr__wmf_enabled = 'y';
  435. }
  436. // generate registry of original metadata by file title
  437. add ('simple.wikipedia.org' , 'file.orig_reg');
  438. // drop page_dump tables
  439. add ('simple.wikipedia.org' , 'wiki.page_dump.drop');
  440. }
  441. app.bldr.run;
  442. </pre>
  443. <ul>
  444. <li>
  445. Change the <code>manual_now</code> above to match the first day of the current month. For example, if today is <code>2020-02-16</code>, change it to <code>manual_now = '2020-02-01 01:02:03'</code>.
  446. </li>
  447. <li>
  448. Run the script using the process above
  449. <ul>
  450. <li>
  451. As of 2020-02, this script can take about 1 hour to complete and use 5 GB of disk space.
  452. </li>
  453. </ul>
  454. </li>
  455. </ul>
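<p>
For the multi-machine mode mentioned in the script comments, a worker's <code>wiki.mass_parse.exec</code> command might look like the sketch below, with the three worker lines uncommented. This is an excerpt that belongs inside <code>app.bldr.cmds { ... }</code>, not a standalone script. <code>server_machine_name</code> and <code>worker_machine_1</code> are the placeholders from the script above and must be replaced with real machine names. The <code>mgr_url</code> path is copied as-is and is an assumption to adjust for your own server and wiki.
</p>
<pre class='code'>
add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
  cfg {
    manual_now = '2020-02-01 01:02:03';
    num_wkrs = 8;
    indexer_enabled = 'y';
    load_all_templates = 'y';
    load_all_imglinks = 'y';
    cleanup_interval = 50;
    // worker settings, copied from the commented-out lines in the script above
    num_pages_in_pool = 32000;
    // placeholder path: the machine name and wiki folder are assumptions to adjust
    mgr_url = '\\server_machine_name\xowa\wiki\en.wikipedia.org\tmp\xomp\';
    wkr_machine_name = 'worker_machine_1';
  }
}
</pre>
<p>
As the script comment notes, all worker directories must then be copied manually to the server directory.
</p>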
  456. <h3>
  457. <span class="mw-headline" id="Resuming">Resuming</span>
  458. </h3>
  459. <p>
  460. The <code>wiki.mass_parse.exec</code> command may take many hours. For English Wikipedia, it can take up to 5 days, even with 8 threads.
  461. </p>
  462. <p>
  463. During this time, the build can be interrupted by any of the following:
  464. </p>
  465. <ul>
  466. <li>
  467. Manual: User presses Ctrl+C
  468. </li>
  469. <li>
  470. Unanticipated: Process dies or machine shuts down
  471. </li>
  472. </ul>
  473. <p>
  474. To resume the build, apply the following steps:
  475. </p>
  476. <ul>
  477. <li>
  478. Comment out all commands before <code>wiki.mass_parse.exec</code> using a block comment
  479. <ul>
  480. <li>
  481. Place a <code>/*</code> before the first line with 'util.cleanup' (the one with <code>delete_all = 'y'</code>)
  482. </li>
  483. <li>
  484. Place a <code>*/</code> after the line with 'wiki.mass_parse.init'
  485. </li>
  486. </ul>
  487. </li>
  488. <li>
  489. Uncomment the line for 'wiki.mass_parse.resume'
  490. </li>
  491. <li>
  492. Run the command line again:
  493. </li>
  494. </ul>
  495. <dl>
  496. <dd>
  497. <code>java -jar C:\xowa\xowa_windows_64.jar --app_mode cmd --cmd_file C:\xowa\make_xowa.gfs --show_license n --show_args n</code>
  498. </dd>
  499. </dl>
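<p>
As a rough sketch of what <code>make_xowa.gfs</code> could look like after these steps (abbreviated, and using only commands from the <code>make_wiki</code> script above): the block comment wraps everything from the first 'util.cleanup' through 'wiki.mass_parse.init', and the 'wiki.mass_parse.resume' line is uncommented. The placeholder text inside the block comment stands for the original commands, which stay in place unchanged.
</p>
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
  /*
  add ('simple.wikipedia.org' , 'util.cleanup') {delete_all = 'y';}
  ... all other commands from the original script, unchanged ...
  add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14|8';}}
  */
  // resume line is now uncommented
  add ('simple.wikipedia.org' , 'wiki.mass_parse.resume');
  add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
    cfg {
      manual_now = '2020-02-01 01:02:03';
      num_wkrs = 8;
      indexer_enabled = 'y';
      load_all_templates = 'y';
      load_all_imglinks = 'y';
      cleanup_interval = 50;
    }
  }
  // the remaining commands, from wiki.mass_parse.make through wiki.page_dump.drop, stay as in the original script
}
app.bldr.run;
</pre>
<p>
One caveat that this page does not settle: the full script contains an inline <code>/* ... */</code> comment (the <code>css_dir</code> setting on the mobile css line). If gfs block comments do not nest, that inner <code>*/</code> would end the outer comment early, so it may be safer to delete the inline comment before wrapping the block.
</p>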
  500. <h2>
  501. <span class="mw-headline" id="Appendix">Appendix</span>
  502. </h2>
  503. <h3>
  504. <span class="mw-headline" id="Requirements">Requirements</span>
  505. </h3>
  506. <h4>
  507. <span class="mw-headline" id="Hardware">Hardware</span>
  508. </h4>
  509. <p>
  510. You should have a recent-generation machine with relatively high-performance hardware, especially if you're planning to run the <code>make</code> script for English Wikipedia.
  511. </p>
  512. <p>
  513. For context, here is my current machine setup for generating the image dumps:
  514. </p>
  515. <ul>
  516. <li>
  517. Processor: Intel Core i7-4770K; 3.5 GHz with 8 MB L3 cache
  518. </li>
  519. <li>
  520. Memory: 16 GB DDR3 SDRAM, DDR3-1600 (PC3-12800)
  521. </li>
  522. <li>
  523. Hard Drive: 1 TB SSD
  524. </li>
  525. <li>
  526. Operating System: openSUSE 13.2
  527. </li>
  528. </ul>
  529. <p>
  530. (Note: The hardware was assembled in late 2013.)
  531. </p>
  532. <p>
  533. For English Wikipedia, it takes about 50 hours for the entire process.
  534. </p>
  535. <h4>
  536. <span class="mw-headline" id="Internet-connectivity">Internet-connectivity</span>
  537. </h4>
  538. <p>
  539. You should have a broadband connection to the internet. The script will need to download dump files from Wikimedia and some dump files (like English Wikipedia) will be in the tens of GB.
  540. </p>
  541. <p>
  542. <br>
  543. </p>
  544. <h4>
  545. <span class="mw-headline" id="Pre-existing_image_databases_for_your_wiki_(optional)">Pre-existing image databases for your wiki (optional)</span>
  546. </h4>
  547. <p>
  548. XOWA will automatically re-use the images from existing image databases so that you do not have to redownload them. This is particularly useful for large wikis where redownloading millions of images would be unwanted.
  549. </p>
  550. <p>
  551. It is strongly advised that you download the image database for your wiki. You can find a full list here: <a href="http://xowa.sourceforge.net/image_dbs.html" rel="nofollow" class="external free">http://xowa.sourceforge.net/image_dbs.html</a>. Note that if an image database does not exist for your wiki, you can still proceed to use the script.
  552. </p>
  553. <ul>
  554. <li>
  555. If you have v1 image databases, they should be placed in <code>/xowa/file/wiki_domain-prv</code>. For example, English Wikipedia should have <code>/xowa/file/en.wikipedia.org-prv/fsdb.main/fsdb.bin.0000.sqlite3</code>
  556. </li>
  557. <li>
  558. If you have v2 image databases, they should be placed in <code>/xowa/wiki/wiki_domain/prv</code>. For example, English Wikipedia should have <code>/xowa/wiki/en.wikipedia.org/prv/en.wikipedia.org-file-ns.000-db.001.xowa</code>
  559. </li>
  560. </ul>
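<p>
If the pre-existing image databases are v2, the <code>file.fsdb_make</code> command in the <code>make_wiki</code> script would presumably need its <code>src_bin_mgr__fsdb_version</code> changed to match. The excerpt below is a sketch under that assumption: the value <code>'v2'</code> is inferred from the script comment ("v1 (.sqlite3) or v2 (.xowa)") and is not confirmed elsewhere on this page. All other settings are copied from the script. The excerpt belongs inside <code>app.bldr.cmds { ... }</code> and is not a standalone script.
</p>
<pre class='code'>
add ('simple.wikipedia.org' , 'file.fsdb_make') {
  commit_interval = 1000; progress_interval = 200; select_interval = 10000;
  ns_ids = '0|4|14';
  // assumption: 'v2' selects pre-existing .xowa image databases placed under /xowa/wiki/wiki_domain/prv
  src_bin_mgr__fsdb_version = 'v2';
  src_bin_mgr__fsdb_skip_wkrs = 'page_gt_1|small_size';
  src_bin_mgr__wmf_enabled = 'y';
}
</pre>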
  561. <h3>
  562. <span class="mw-headline" id="gfs_script">gfs script</span>
  563. </h3>
  564. <p>
  565. The script is written in the <code>gfs</code> format. This is a custom scripting format specific to XOWA. It is similar to JSON, but also supports comments.
  566. </p>
  567. <p>
  568. Unfortunately, the error handling for gfs is quite minimal. When making changes, please make them in small steps and be prepared to restore from backups.
  569. </p>
  570. <p>
  571. The following is a brief list of rules:
  572. </p>
  573. <ul>
  574. <li>
  575. Comments run either from "//" to the end of the line, or from "/*" to "*/". For example: <code>// single-line comment</code> or <code>/* multi-line comment*/</code>
  576. </li>
  577. <li>
  578. Booleans are "y" and "n" (yes / no or true / false). For example: <code>enabled = 'y';</code>
  579. </li>
  580. <li>
  581. Numbers are 32-bit integers and are not enclosed in quotes. For example, <code>count = 10000;</code>
  582. </li>
  583. <li>
  584. Strings are surrounded by apostrophes (') or quotes ("). For example: <code>key = 'val';</code>
  585. </li>
  586. <li>
  587. Statements are terminated by a semi-colon (;). For example: <code>procedure1;</code>
  588. </li>
  589. <li>
  590. Statements can take arguments in parentheses. For example: <code>procedure1('argument1', 'argument2', 'argument3');</code>
  591. </li>
  592. <li>
  593. Statements are grouped with curly braces ({}). For example: <code>group {procedure1; procedure2; procedure3;}</code>
  594. </li>
  595. </ul>
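<p>
The rules above can be seen together in one small snippet. It is only a syntax illustration: <code>group</code>, <code>procedure1</code>, <code>enabled</code>, <code>count</code> and <code>key</code> are the placeholder names from the list, not real XOWA commands, so the snippet is not meant to be run.
</p>
<pre class='code'>
// single-line comment: runs to the end of the line
/* multi-line comment:
   runs until the closing marker */
group {
  enabled = 'y';                         // boolean: 'y' or 'n'
  count = 10000;                         // number: 32-bit integer, no quotes
  key = 'val';                           // string: enclosed in apostrophes
  procedure1;                            // statement without arguments
  procedure1('argument1', 'argument2');  // statement with arguments in parentheses
}
</pre>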
  596. <h3>
  597. <span class="mw-headline" id="Terms">Terms</span>
  598. </h3>
  599. <h4>
  600. <span class="mw-headline" id="lnki">lnki</span>
  601. </h4>
  602. <p>
  603. A <code>lnki</code> is short for "<b>l</b>i<b>nk</b> <b>i</b>nternal". It refers to all wikitext with the double bracket syntax: [[A]]. A more elaborate example for files would be [[File:A.png|thumb|200x300px|upright=.80]]. Note that the abbreviation was chosen to differentiate it from <code>lnke</code>, which is short for "<b>l</b>i<b>nk</b> <b>e</b>xternal".
  604. </p>
  605. <p>
  606. For the purposes of this script, all lnki data comes from the wikitext in the current wiki's data dump.
  607. </p>
  608. <h4>
  609. <span class="mw-headline" id="orig">orig</span>
  610. </h4>
  611. <p>
  612. An <code>orig</code> is short for "<b>orig</b>inal file". It refers to the original file metadata.
  613. </p>
  614. <p>
  615. For the purposes of this script, all orig data comes from commons.wikimedia.org.
  616. </p>
  617. <h4>
  618. <span class="mw-headline" id="xfer">xfer</span>
  619. </h4>
  620. <p>
  621. An <code>xfer</code> is short for "transfer file". It refers to the actual file to be downloaded.
  622. </p>
  623. <h4>
  624. <span class="mw-headline" id="fsdb">fsdb</span>
  625. </h4>
  626. <p>
  627. The <code>fsdb</code> is short for "<b>f</b>ile <b>s</b>ystem <b>d</b>ata<b>b</b>ase". It refers to the file as it is stored in the internal table format of the XOWA image databases.
  628. </p>
<p>
<br>
</p>
<h3>
<span class="mw-headline" id="Examples">Examples</span>
</h3>
<h4>
<span class="mw-headline" id="Simple_Wikipedia_example_with_documentation">Simple Wikipedia example with documentation</span>
</h4>
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
  // build commons database; this only needs to be done once, whenever commons is updated
  add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
  add ('commons.wikimedia.org' , 'util.download') {dump_type = 'pages-articles';}
  add ('commons.wikimedia.org' , 'util.download') {dump_type = 'page_props';}
  add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
  add ('commons.wikimedia.org' , 'text.init');
  add ('commons.wikimedia.org' , 'text.page');
  add ('commons.wikimedia.org' , 'text.term');
  add ('commons.wikimedia.org' , 'text.css');
  add ('commons.wikimedia.org' , 'wiki.page_props');
  add ('commons.wikimedia.org' , 'wiki.image');
  add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
  add ('commons.wikimedia.org' , 'wiki.page_dump.make');
  add ('commons.wikimedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  add ('commons.wikimedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  // build wikidata database; this only needs to be done once, whenever wikidata is updated
  add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'pages-articles';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'categorylinks';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'page_props';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'image';}
  add ('www.wikidata.org' , 'text.init');
  add ('www.wikidata.org' , 'text.page');
  add ('www.wikidata.org' , 'text.term');
  add ('www.wikidata.org' , 'text.css');
  add ('www.wikidata.org' , 'wiki.page_props');
  add ('www.wikidata.org' , 'wiki.categorylinks');
  add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  // build simple.wikipedia.org
  add ('simple.wikipedia.org' , 'util.cleanup') {delete_all = 'y';}
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'page_props';}
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'image';}
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';} // needed for sorting search results by PageRank
  add ('simple.wikipedia.org' , 'util.download') {dump_type = 'imagelinks';}
  add ('simple.wikipedia.org' , 'text.init');
  add ('simple.wikipedia.org' , 'text.page') {
    // calculate redirect_id for #REDIRECT pages. needed for html databases
    redirect_id_enabled = 'y';
  }
  add ('simple.wikipedia.org' , 'text.search');
  // upload desktop css
  add ('simple.wikipedia.org' , 'text.css');
  // upload mobile css
  add ('simple.wikipedia.org' , 'text.css') {css_key = 'xowa.mobile'; /* css_dir = 'C:\xowa\user\anonymous\wiki\simple.wikipedia.org-mobile\html\'; */}
  add ('simple.wikipedia.org' , 'text.term');
  add ('simple.wikipedia.org' , 'wiki.page_props');
  add ('simple.wikipedia.org' , 'wiki.categorylinks');
  // create local "page" tables in each "text" database for "lnki_temp"
  add ('simple.wikipedia.org' , 'wiki.page_dump.make');
  // create a redirect table for pages in the File namespace
  add ('simple.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  // create an "image" table to get the metadata for all files in the current wiki
  add ('simple.wikipedia.org' , 'wiki.image');
  // create an "imagelinks" table to find out which images are used for the wiki
  add ('simple.wikipedia.org' , 'wiki.imagelinks');
  // parse all page-to-page links
  add ('simple.wikipedia.org' , 'wiki.page_link');
  // calculate a score for each page using the page-to-page links
  add ('simple.wikipedia.org' , 'search.page__page_score') {iteration_max = 100;}
  // update link score statistics for the search tables
  add ('simple.wikipedia.org' , 'search.link__link_score') {page_rank_enabled = 'y';}
  // update word count statistics for the search_word table
  add ('simple.wikipedia.org' , 'search.word__link_count');
  // cleanup all downloaded files as well as temporary files
  add ('simple.wikipedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  // OBSOLETE: use v2
  // v1 html generator
  // parse every page in the listed namespace and gather data on their lnkis.
  // this step will take the longest amount of time.
  /*
  add ('simple.wikipedia.org' , 'file.lnki_temp') {
    // save data every # of pages
    commit_interval = 10000;
    // update progress every # of pages
    progress_interval = 50;
    // free memory by flushing internal caches every # of pages
    cleanup_interval = 50;
    // specify # of pages to read into memory at a time, where # is in MB. For example, 25 means read approximately 25 MB of page text into memory
    select_size = 25;
    // namespaces to parse. See en.wikipedia.org/wiki/Help:Namespaces
    ns_ids = '0|4|14';
    // enable generation of ".html" databases. This will increase processing time by 20% - 25%
    hdump_bldr {
      // generate html databases
      enabled = 'y';
      // compression method for html: 1=none; 2=zip; 3=gz; 4=bz2
      zip_tid = 3;
      // enable additional custom compression
      hzip_enabled = 'y';
      // perform extra validation step of custom compression
      hzip_diff = 'y';
    }
  }
  */
  // v2 html generator; allows for multi-threaded / multi-machine builds
  add ('simple.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|14|8';}}
  add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
    cfg {
      num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-08-01 01:02:03';
      load_all_imglinks = 'y';
      // uncomment the following 3 lines if using the build script as a "worker" helping a "server"
      // num_pages_in_pool = 32000;
      // mgr_url = '\\server_machine_name\xowa\wiki\en.wikipedia.org\tmp\xomp\';
      // wkr_machine_name = 'worker_machine_1'
    }
  }
  // note that if multi-machine mode is enabled, all worker directories must be manually copied to the server directory (a build command will be added later)
  add ('simple.wikipedia.org' , 'wiki.mass_parse.make');
  // aggregate the lnkis
  add ('simple.wikipedia.org' , 'file.lnki_regy');
  // generate orig metadata for files in the current wiki (for example, for pages in en.wikipedia.org/wiki/File:*)
  add ('simple.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
  // generate all orig metadata for all lnkis
  add ('simple.wikipedia.org' , 'file.orig_regy');
  // generate list of files to download based on "orig_regy" and XOWA image code
  add ('simple.wikipedia.org' , 'file.xfer_temp.thumb');
  // aggregate list one more time
  add ('simple.wikipedia.org' , 'file.xfer_regy');
  // identify images that have already been downloaded
  add ('simple.wikipedia.org' , 'file.xfer_regy_update');
  // download images. This step may also take a long time, depending on how many images are needed
  add ('simple.wikipedia.org' , 'file.fsdb_make') {
    commit_interval = 1000; progress_interval = 200; select_interval = 10000;
    ns_ids = '0|4|14';
    // specify whether original wiki databases are v1 (.sqlite3) or v2 (.xowa)
    src_bin_mgr__fsdb_version = 'v1';
    // always redownload certain files
    src_bin_mgr__fsdb_skip_wkrs = 'page_gt_1|small_size';
    // allow downloads from wikimedia
    src_bin_mgr__wmf_enabled = 'y';
  }
  // generate registry of original metadata by file title
  add ('simple.wikipedia.org' , 'file.orig_reg');
  // drop page_dump tables
  add ('simple.wikipedia.org' , 'wiki.page_dump.drop');
}
app.bldr.run;
</pre>
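<p>
For a multi-machine build, the comments in the script above say to uncomment three settings when the build script runs as a "worker" helping a "server". The fragment below sketches what the <code>wiki.mass_parse.exec</code> command might then look like on a worker. The values of <code>mgr_url</code> and <code>wkr_machine_name</code> are the placeholders from the script's own comments, not real paths or host names, and the semicolon after <code>wkr_machine_name</code> is an assumption added for consistency with the other settings.
</p>
<pre class='code'>
add ('simple.wikipedia.org' , 'wiki.mass_parse.exec') {
  cfg {
    num_wkrs = 8; load_all_templates = 'y'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-08-01 01:02:03';
    load_all_imglinks = 'y';
    // worker settings, uncommented from the example above
    num_pages_in_pool = 32000;
    // placeholder path copied as-is from the example; replace with the server's actual directory
    mgr_url = '\\server_machine_name\xowa\wiki\en.wikipedia.org\tmp\xomp\';
    // placeholder machine name; the trailing semicolon is an assumption
    wkr_machine_name = 'worker_machine_1';
  }
}
</pre>
<p>
The script also notes that, in multi-machine mode, all worker directories must be manually copied to the server directory.
</p>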
<h4>
<span class="mw-headline" id="Script:_gnosygnu's_actual_English_Wikipedia_script_(dirty;_provided_for_reference_only)">Script: gnosygnu's actual English Wikipedia script (dirty; provided for reference only)</span>
</h4>
<pre class='code'>
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.cfg.set_temp('app', 'xowa.app.web.enabled', 'y');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.text', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.html', '0');
app.cfg.set_temp('app', 'xowa.bldr.db.layout_size.file', '0');
app.bldr.cmds {
  /*
  add ('www.wikidata.org' , 'util.cleanup') {delete_all = 'y';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'pages-articles';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'categorylinks';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'page_props';}
  add ('www.wikidata.org' , 'util.download') {dump_type = 'image';}
  add ('www.wikidata.org' , 'text.init');
  add ('www.wikidata.org' , 'text.page');
  add ('www.wikidata.org' , 'text.term');
  add ('www.wikidata.org' , 'text.css');
  add ('www.wikidata.org' , 'wiki.image');
  add ('www.wikidata.org' , 'wiki.page_dump.make');
  add ('www.wikidata.org' , 'wiki.page_props');
  add ('www.wikidata.org' , 'wiki.categorylinks');
  add ('www.wikidata.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  // add ('www.wikidata.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  add ('commons.wikimedia.org' , 'util.cleanup') {delete_all = 'y';}
  add ('commons.wikimedia.org' , 'util.download') {dump_type = 'pages-articles';}
  add ('commons.wikimedia.org' , 'util.download') {dump_type = 'image';}
  add ('commons.wikimedia.org' , 'util.download') {dump_type = 'page_props';}
  add ('commons.wikimedia.org' , 'text.init');
  add ('commons.wikimedia.org' , 'text.page');
  add ('commons.wikimedia.org' , 'text.term');
  add ('commons.wikimedia.org' , 'text.css');
  add ('commons.wikimedia.org' , 'wiki.image');
  add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
  add ('commons.wikimedia.org' , 'wiki.page_dump.make');
  add ('commons.wikimedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  // add ('commons.wikimedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
  add ('en.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}
  add ('en.wikipedia.org' , 'util.download') {dump_type = 'pagelinks';}
  add ('en.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}
  add ('en.wikipedia.org' , 'util.download') {dump_type = 'page_props';}
  add ('en.wikipedia.org' , 'util.download') {dump_type = 'image';}
  add ('en.wikipedia.org' , 'util.download') {dump_type = 'imagelinks';}
  */
  /*
  // en.wikipedia.org
  add ('en.wikipedia.org' , 'text.init');
  add ('en.wikipedia.org' , 'text.page') {redirect_id_enabled = 'y';}
  add ('en.wikipedia.org' , 'text.search');
  add ('en.wikipedia.org' , 'text.css');
  add ('en.wikipedia.org' , 'text.term');
  add ('en.wikipedia.org' , 'wiki.image');
  add ('en.wikipedia.org' , 'wiki.imagelinks');
  add ('en.wikipedia.org' , 'wiki.page_dump.make');
  add ('en.wikipedia.org' , 'wiki.redirect') {commit_interval = 1000; progress_interval = 100; cleanup_interval = 100;}
  add ('en.wikipedia.org' , 'wiki.page_link');
  add ('en.wikipedia.org' , 'search.page__page_score') {iteration_max = 100;}
  add ('en.wikipedia.org' , 'search.link__link_score') {page_rank_enabled = 'y';
    score_adjustment_mgr {
      match_mgr {
        get(0) {
          add('bgn', 'mult', '.999', 'List_of_', 'National_Register_of_Historic_Places_listings_');
          add('end', 'mult', '.999', '_United_States_Census');
          add('all', 'mult', '.999', 'Copyright_infringement', 'Time_zone', 'Daylight_saving_time');
          add('all', 'add' , '0' , 'Animal');
        }
      }
    }
  }
  add ('en.wikipedia.org' , 'search.word__link_count');
  add ('en.wikipedia.org' , 'wiki.page_props');
  add ('en.wikipedia.org' , 'wiki.categorylinks');
  */
  /*
  add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n'}
  add ('en.wikipedia.org' , 'wiki.mass_parse.init') {cfg {ns_ids = '0|4|100|14|8';}}
  // add ('en.wikipedia.org' , 'wiki.mass_parse.resume');
  add ('en.wikipedia.org' , 'wiki.mass_parse.exec') {cfg {
    num_wkrs = 8; load_all_templates = 'y'; load_ifexists_ns = '*'; cleanup_interval = 25; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2017-01-01 01:02:03';}
    // num_wkrs = 1; load_all_templates = 'n'; load_all_imglnks = 'n'; cleanup_interval = 50; hzip_enabled = 'y'; hdiff_enabled ='y'; manual_now = '2016-07-28 01:02:03';}
  }
  add ('en.wikipedia.org' , 'wiki.mass_parse.make');
  */
  /*
  add ('en.wikipedia.org' , 'file.lnki_temp') {
    commit_interval = 10000; progress_interval = 50; cleanup_interval = 50; select_size = 25;
    ns_ids = '0|4|14|100|12|8|6|10|828|108|118|446|710|2300|2302|2600';
    hdump_bldr {enabled = 'y'; hzip_enabled = 'y'; hzip_diff = 'y';}
  }
  */
  /*
  add ('commons.wikimedia.org' , 'file.page_regy') {build_commons = 'y'}
  add ('en.wikipedia.org' , 'file.page_regy') {build_commons = 'n';}
  add ('en.wikipedia.org' , 'file.lnki_regy');
  // add ('en.wikipedia.org' , 'wiki.image');
  add ('en.wikipedia.org' , 'file.orig_regy');
  add ('en.wikipedia.org' , 'file.xfer_temp.thumb');
  add ('en.wikipedia.org' , 'file.xfer_regy');
  add ('en.wikipedia.org' , 'file.xfer_regy_update');
  */
  /*
  add ('en.wikipedia.org' , 'file.fsdb_make') {
    commit_interval = 1000; progress_interval = 200; select_interval = 10000;
    ns_ids = '0|4|100|14|8';
    // // specify whether original wiki databases are v1 (.sqlite3) or v2 (.xowa)
    // src_bin_mgr__fsdb_version = 'v2';
    // trg_bin_mgr__fsdb_version = 'v1';
    // always redownload certain files
    src_bin_mgr__fsdb_skip_wkrs = 'page_gt_1|small_size';
    // allow downloads from wikimedia
    src_bin_mgr__wmf_enabled = 'y';
  }
  add ('en.wikipedia.org' , 'file.orig_reg');
  add ('en.wikipedia.org' , 'wiki.page_dump.drop');
  add ('en.wikipedia.org' , 'file.page_file_map.create');
  */
}
app.bldr.run;
</pre>
<h2>
<span class="mw-headline" id="Change_log">Change log</span>
</h2>
<ul>
<li>
2016-10-12: explicitly set web_access_enabled to y
</li>
<li>
2017-02-02: updated script for multi-threaded version and new options
</li>
<li>
2020-02-16: rewrote page to provide more explicit step-by-step instructions; moved content to glossary
</li>
</ul>
</div>
</div>
</div>
<div id="mw-head" class="noprint">
<div id="left-navigation">
<div id="p-namespaces" class="vectorTabs">
<h3>Namespaces</h3>
<ul>
<li id="ca-nstab-main" class="selected"><span><a id="ca-nstab-main-href" href="index.html">Page</a></span></li>
</ul>
</div>
</div>
</div>
<div id='mw-panel' class='noprint'>
<div id='p-logo'>
<a style="background-image: url(https://gnosygnu.github.io/xowa/xowa_logo.png);" href="http://xowa.org/" title="Visit the main page"></a>
</div>
<div class="portal" id='xowa-portal-home'>
<h3>XOWA</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/index.html" title='Visit the main page'>Main page</a></li>
<li><a href="http://xowa.org/screenshots.html" title='See screenshots of XOWA'>Screenshots</a></li>
<li><a href="https://www.youtube.com/watch?v=q0qbXYXEH6M" title="See a video of XOWA Desktop in action">Video</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Download_XOWA.html" title='Download the XOWA application'>Download XOWA</a></li>
<li><a href="http://xowa.org/home/wiki/Dashboard/Image_databases.html" title='Download offline wikis and image databases'>Download wikis</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-started'>
<h3>Getting started</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/App/Setup/System_requirements.html" title='Get XOWA&apos;s system requirements'>Requirements</a></li>
<li><a href="http://xowa.org/home/wiki/App/Setup/Installation.html" title='Get instructions for installing XOWA'>Installation</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/Simple_Wikipedia.html" title='Learn how to set up Simple Wikipedia'>Simple Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/English_Wikipedia.html" title='Learn how to set up English Wikipedia'>English Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/App/Import/Other_wikis.html" title='Learn how to set up other Wikipedias'>Other Wikipedias</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-android'>
<h3>Android</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Android/Setup.html" title='Setup XOWA on your Android device'>Setup</a></li>
<li><a href="https://www.youtube.com/watch?v=jsMTBxGweUw" title="See a video of XOWA Android in action">Video</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-help'>
<h3>Help</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Help/About.html" title='Get more information about XOWA'>About</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Contents.html" title='View a list of help topics'>Contents</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Media.html" title='Read what others have written about XOWA'>Media</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Feedback.html" title='Questions? Comments? Leave feedback for XOWA'>Feedback</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-blog'>
<h3>Blog</h3>
<div class="body">
<ul>
<li><a href="http://xowa.org/home/wiki/Blog.html" title="Follow XOWA's development process">Current</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-links'>
<h3>Links</h3>
<div class="body">
<ul>
<li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
<li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
<li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
</ul>
</div>
</div>
<div class="portal" id='xowa-portal-donate'>
<h3>Donate</h3>
<div class="body">
<ul>
<li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
<li><a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title="Support Wikipedia!">Wikipedia</a></li>
<li><a href="http://xowa.org/home/wiki/Help/Donate.html" title="Support XOWA!">XOWA</a></li>
</ul>
</div>
</div>
</div>
</body>
</html>