  1. <!DOCTYPE html>
  2. <html dir="ltr">
  3. <head>
  4. <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
  5. <title>App/Category/Internals - XOWA</title>
  6. <link rel="shortcut icon" href="https://gnosygnu.github.io/xowa/xowa_logo.png" />
  7. <link rel="stylesheet" href="https://gnosygnu.github.io/xowa/xowa_common.css" type="text/css">
  8. </head>
  9. <body class="mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck="false">
  10. <div id="mw-page-base" class="noprint"></div>
  11. <div id="mw-head-base" class="noprint"></div>
  12. <div id="content" class="mw-body">
  13. <h1 id="firstHeading" class="firstHeading"><span>App/Category/Internals</span></h1>
  14. <div id="bodyContent" class="mw-body-content">
  15. <div id="siteSub">From XOWA: the free, open-source, offline wiki application</div>
  16. <div id="contentSub"></div>
  17. <div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr">
<p>
This page documents some of the internals of the V2 Category system.
</p>
  21. <div id="toc" class="toc">
  22. <div id="toctitle">
  23. <h2>
  24. Contents
  25. </h2>
  26. </div>
  27. <ul>
  28. <li class="toclevel-1 tocsection-1">
  29. <a href="#Builder_commands"><span class="tocnumber">1</span> <span class="toctext">Builder commands</span></a>
  30. <ul>
  31. <li class="toclevel-2 tocsection-2">
  32. <a href="#ctg.hiddencat_sql"><span class="tocnumber">1.1</span> <span class="toctext">ctg.hiddencat_sql</span></a>
  33. </li>
  34. <li class="toclevel-2 tocsection-3">
  35. <a href="#ctg.hiddencat_ttl"><span class="tocnumber">1.2</span> <span class="toctext">ctg.hiddencat_ttl</span></a>
  36. </li>
  37. <li class="toclevel-2 tocsection-4">
  38. <a href="#ctg.link_sql"><span class="tocnumber">1.3</span> <span class="toctext">ctg.link_sql</span></a>
  39. </li>
  40. <li class="toclevel-2 tocsection-5">
  41. <a href="#ctg.link_idx"><span class="tocnumber">1.4</span> <span class="toctext">ctg.link_idx</span></a>
  42. </li>
  43. </ul>
  44. </li>
  45. <li class="toclevel-1 tocsection-6">
  46. <a href="#.2Fcategory2.2F"><span class="tocnumber">2</span> <span class="toctext">/category2/</span></a>
  47. <ul>
  48. <li class="toclevel-2 tocsection-7">
  49. <a href="#.2Fmain.2F"><span class="tocnumber">2.1</span> <span class="toctext">/main/</span></a>
  50. </li>
  51. <li class="toclevel-2 tocsection-8">
  52. <a href="#.2Flink.2F"><span class="tocnumber">2.2</span> <span class="toctext">/link/</span></a>
  53. </li>
  54. </ul>
  55. </li>
  56. </ul>
  57. </div>
  58. <h2>
  59. <span class="mw-headline" id="Builder_commands">Builder commands</span>
  60. </h2>
<p>
For reference, this is the current script that sets up the V2 Category system:
</p>
<pre>
app.bldr.pause_at_end_('n');
app.bldr.cmds
.add_many('simple.wikipedia.org', 'ctg.hiddencat_sql', 'ctg.hiddencat_ttl', 'ctg.link_sql', 'ctg.link_idx').owner
;
app.bldr.run;
</pre>
<p>
Note that 'ctg.link_sql' and 'ctg.link_idx' are required.
</p>
<p>
Note that 'ctg.hiddencat_sql' and 'ctg.hiddencat_ttl' can be omitted. However, running them is recommended: for English Wikipedia, they add less than 5 minutes to the entire process.
</p>
  77. <h3>
  78. <span class="mw-headline" id="ctg.hiddencat_sql">ctg.hiddencat_sql</span>
  79. </h3>
  80. <ul>
  81. <li>
  82. This command will look for a file matching *page_props.sql in the wiki directory
  83. </li>
  84. </ul>
  85. <dl>
  86. <dd>
For example: /xowa/wiki/simple.wikipedia.org/simplewiki-latest-page_props.sql. Each row in this SQL file has the format (page_id, prop_name, prop_val).
  88. </dd>
  89. </dl>
  90. <ul>
  91. <li>
  92. It will then parse the .sql file and look for entries having a prop_name of "hiddencat". For example (1, 'hiddencat', '')
  93. </li>
  94. </ul>
  95. <ul>
  96. <li>
When it's done, it will generate a Base85-encoded list of the page_ids of all matching entries.
  98. </li>
  99. </ul>
  100. <dl>
  101. <dd>
  102. The output directory will be /xowa/wiki/simple.wikipedia.org/tmp/ctg.hiddencat_sql/make/
  103. </dd>
  104. <dd>
  105. An example of a file would be:
  106. </dd>
  107. </dl>
  108. <pre>
  109. !!!!#
  110. !!!!$
  111. </pre>
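<p>
The two sample lines above can be decoded with a short Python sketch. This page does not spell out the Base85 alphabet, so the sketch assumes that each digit is an ASCII character from '!' (value 0) to 'u' (value 84), most significant digit first, padded to five characters. The names <code>b85_decode</code> and <code>b85_encode</code> are illustrative and are not XOWA functions.
</p>

```python
# Assumption: digits are ASCII '!' (33) through 'u' (117), most significant
# digit first, left-padded with '!' to a fixed width of 5 characters.
BASE = 85
OFFSET = ord("!")
WIDTH = 5


def b85_decode(text):
    """Decode a Base85 string into a non-negative integer."""
    value = 0
    for ch in text:
        digit = ord(ch) - OFFSET
        if digit not in range(BASE):
            raise ValueError("not a Base85 digit: %r" % ch)
        value = value * BASE + digit
    return value


def b85_encode(value, width=WIDTH):
    """Encode a non-negative integer as a fixed-width Base85 string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    for _ in range(width):
        value, digit = divmod(value, BASE)
        digits.append(chr(digit + OFFSET))
    if value:
        raise ValueError("value does not fit in %d digits" % width)
    return "".join(reversed(digits))


# Decode the two sample lines from the make/ file shown above.
page_ids = [b85_decode(line) for line in ["!!!!#", "!!!!$"]]
print(page_ids)
```

<p>
Under the stated assumption the sample decodes to page ids 2 and 3. A different alphabet or digit order would give different values.
</p>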
  112. <h3>
  113. <span class="mw-headline" id="ctg.hiddencat_ttl">ctg.hiddencat_ttl</span>
  114. </h3>
  115. <ul>
  116. <li>
  117. This command will look at the output of ctg.hiddencat_sql and find the appropriate title for the given id
  118. </li>
  119. </ul>
  120. <dl>
  121. <dd>
  122. This step is necessary as the category indexes are sorted by title, not by id.
  123. </dd>
  124. </dl>
  125. <ul>
  126. <li>
  127. When it's done, it will generate a sorted list of title|id.
  128. </li>
  129. </ul>
  130. <dl>
  131. <dd>
  132. The output directory will be /xowa/wiki/simple.wikipedia.org/tmp/ctg.hiddencat_ttl/make/
  133. </dd>
  134. <dd>
  135. An example of a file would be:
  136. </dd>
  137. </dl>
  138. <pre>
  139. A|!!!!#
  140. B|!!!!$
  141. </pre>
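<p>
The id-to-title step can be sketched in Python as follows. This is only an illustration: <code>title_by_id</code> is a hypothetical in-memory lookup standing in for the wiki's page data, the Base85 alphabet ('!' = 0 through 'u' = 84, most significant digit first) is an assumption, and Python's default string ordering may differ from the ordering XOWA uses.
</p>

```python
def b85_decode(text):
    # Assumption: digits are ASCII '!' (33) to 'u' (117), most significant first.
    value = 0
    for ch in text:
        value = value * 85 + (ord(ch) - 33)
    return value


def build_title_rows(encoded_ids, title_by_id):
    """Pair each encoded id with its title and sort the rows by title."""
    rows = []
    for encoded in encoded_ids:
        title = title_by_id.get(b85_decode(encoded))
        if title is None:
            continue  # Sketch only: ids without a known title are skipped.
        rows.append((title, encoded))
    rows.sort()  # Tuples sort by title first, then by encoded id.
    return ["%s|%s" % row for row in rows]


# Hypothetical lookup table standing in for the wiki's page data.
titles = {2: "A", 3: "B"}
print(build_title_rows(["!!!!$", "!!!!#"], titles))
```

<p>
The ids are kept in their encoded form in the output, matching the title|id sample above.
</p>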
  142. <h3>
  143. <span class="mw-headline" id="ctg.link_sql">ctg.link_sql</span>
  144. </h3>
  145. <ul>
  146. <li>
  147. This command will look for a file matching *categorylinks.sql in the wiki directory
  148. </li>
  149. </ul>
  150. <dl>
  151. <dd>
  152. For example: /xowa/wiki/simple.wikipedia.org/simplewiki-latest-categorylinks.sql.
  153. </dd>
  154. </dl>
  155. <ul>
  156. <li>
  157. It will then parse the .sql file and extract the following data: category_name, page_id, page_member_type, page_sortkey, page_member_add_date
  158. </li>
  159. </ul>
  160. <ul>
  161. <li>
  162. When it's done, it will generate a sorted list of category|type|sortkey|id|date.
  163. </li>
  164. </ul>
  165. <dl>
  166. <dd>
  167. The output directory will be /xowa/wiki/simple.wikipedia.org/tmp/ctg.link_sql/make/
  168. </dd>
  169. <dd>
  170. An example of a file would be:
  171. </dd>
  172. </dl>
  173. <pre>
  174. A|p|Page_1_sortkey|!!!!%|!!!@!|
  175. B|p|Page_2_sortkey|!!!!^|!!!@@|
  176. </pre>
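<p>
A row in this format can be read back with a small Python sketch. The <code>LinkRow</code> type and <code>parse_link_row</code> function are illustrative names, the Base85 alphabet ('!' = 0 through 'u' = 84, most significant digit first) is an assumption, and the sketch assumes that sortkeys contain no '|' character. The date is decoded to an integer only; its meaning is not interpreted here.
</p>

```python
from collections import namedtuple

LinkRow = namedtuple("LinkRow", "category member_type sortkey page_id add_date")


def b85_decode(text):
    # Assumption: digits are ASCII '!' (33) to 'u' (117), most significant first.
    value = 0
    for ch in text:
        value = value * 85 + (ord(ch) - 33)
    return value


def parse_link_row(line):
    """Split one category|type|sortkey|id|date| row into typed fields."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()  # Drop the empty field after the trailing '|'.
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    category, member_type, sortkey, page_id, add_date = fields
    return LinkRow(category, member_type, sortkey,
                   b85_decode(page_id), b85_decode(add_date))


row = parse_link_row("A|p|Page_1_sortkey|!!!!%|!!!@!|")
print(row.category, row.member_type, row.sortkey, row.page_id, row.add_date)
```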
  177. <h3>
  178. <span class="mw-headline" id="ctg.link_idx">ctg.link_idx</span>
  179. </h3>
  180. <ul>
  181. <li>
  182. This command will generate the /category2/ hive based on the output of the above commands. It uses the following:
  183. <ul>
  184. <li>
  185. Category link data as built in /xowa/wiki/simple.wikipedia.org/tmp/ctg.link_sql/make/.
  186. </li>
  187. <li>
  188. Category hidden data as built in /xowa/wiki/simple.wikipedia.org/tmp/ctg.hiddencat_ttl/make/.
  189. </li>
  190. </ul>
  191. </li>
  192. </ul>
  193. <ul>
  194. <li>
It will then merge the above data and generate the /main/ and /link/ subdirectories in /category2/
  196. </li>
  197. </ul>
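<p>
The merge can be pictured with the Python sketch below. It is not XOWA's implementation: it only groups the link rows by category, counts members per type, and marks a category as hidden when its title appears in the ctg.hiddencat_ttl output. The type codes 'c' (subcategory) and 'f' (file) are assumptions; only 'p' appears in the samples on this page.
</p>

```python
from collections import Counter
from itertools import groupby


def summarize_categories(link_lines, hidden_lines):
    """Return (category, hidden, subcats, files, pages) for each category."""
    # The hidden list has rows of title|id; only the title is needed here.
    hidden_titles = {line.split("|", 1)[0] for line in hidden_lines}
    parsed = [line.split("|") for line in link_lines]
    summaries = []
    # groupby only merges adjacent rows, so the input must already be
    # sorted by category, as the ctg.link_sql output is described to be.
    for category, members in groupby(parsed, key=lambda fields: fields[0]):
        counts = Counter(fields[1] for fields in members)
        summaries.append((category, category in hidden_titles,
                          counts["c"], counts["f"], counts["p"]))
    return summaries


links = [
    "A|p|Page_1_sortkey|!!!!%|!!!@!|",
    "B|p|Page_2_sortkey|!!!!^|!!!@@|",
]
hidden = ["A|!!!!#"]
for summary in summarize_categories(links, hidden):
    print(summary)
```

<p>
Each tuple carries the information that a /main/ header row would need; writing the hive files themselves is outside the scope of this sketch.
</p>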
  198. <h2>
  199. <span class="mw-headline" id=".2Fcategory2.2F">/category2/</span>
  200. </h2>
  201. <h3>
  202. <span class="mw-headline" id=".2Fmain.2F">/main/</span>
  203. </h3>
  204. <p>
The main files are located at /xowa/wiki/simple.wikipedia.org/site/category2/main/. They follow the same hive structure as the other directories: a main reg.csv plus subdirectories of the form /00/00/00/00/0123456789.xdat.
  206. </p>
  207. <p>
  208. Each file contains header information for a category. Presently, this includes the following:
  209. </p>
  210. <ul>
  211. <li>
  212. Category name
  213. </li>
  214. <li>
  215. Hidden: "y" means hidden; "n" means not hidden
  216. </li>
  217. <li>
  218. Number of subcategories (Base85 encoded)
  219. </li>
  220. <li>
  221. Number of files (Base85 encoded)
  222. </li>
  223. <li>
  224. Number of pages (Base85 encoded)
  225. </li>
  226. </ul>
  227. <dl>
  228. <dd>
  229. EX: <code>A|y|!!!!!|!!!!!|!!!!!|</code>
  230. </dd>
  231. </dl>
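<p>
A header row such as the example above could be read with the following Python sketch. <code>CategoryHeader</code> and <code>parse_header</code> are illustrative names, and the Base85 alphabet ('!' = 0 through 'u' = 84, most significant digit first) is an assumption.
</p>

```python
from collections import namedtuple

CategoryHeader = namedtuple("CategoryHeader", "name hidden subcats files pages")


def b85_decode(text):
    # Assumption: digits are ASCII '!' (33) to 'u' (117), most significant first.
    value = 0
    for ch in text:
        value = value * 85 + (ord(ch) - 33)
    return value


def parse_header(line):
    """Parse one name|hidden|subcats|files|pages| header row."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()  # Drop the empty field after the trailing '|'.
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    name, hidden, subcats, files, pages = fields
    if hidden not in ("y", "n"):
        raise ValueError("hidden flag must be 'y' or 'n'")
    return CategoryHeader(name, hidden == "y", b85_decode(subcats),
                          b85_decode(files), b85_decode(pages))


header = parse_header("A|y|!!!!!|!!!!!|!!!!!|")
print(header)
```

<p>
Under the stated assumption, the example describes a hidden category named A with zero subcategories, files, and pages.
</p>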
  232. <h3>
  233. <span class="mw-headline" id=".2Flink.2F">/link/</span>
  234. </h3>
  235. <p>
  236. The link files are located at /xowa/wiki/simple.wikipedia.org/site/category2/link/. They also follow the same hive structure as the other directories.
  237. </p>
  238. <p>
  239. Each file contains members of a category. Presently, this includes the following:
  240. </p>
  241. <ul>
  242. <li>
  243. Category name
  244. </li>
  245. <li>
  246. Length of subcategories data
  247. </li>
  248. <li>
  249. Length of files data
  250. </li>
  251. <li>
  252. Length of pages data
  253. </li>
  254. <li>
  255. A series of entries listing category members
  256. <ul>
  257. <li>
  258. Note that these entries are broken into subgroups (subcategories / files / pages) depending on the preceding lengths.
  259. </li>
  260. <li>
Each entry is in a semicolon-delimited format
  262. <ul>
  263. <li>
  264. page_id (Base85 encoded)
  265. </li>
  266. <li>
  267. page_member_add_date (Base85 encoded)
  268. </li>
  269. <li>
  270. page_sortkey
  271. </li>
  272. </ul>
  273. </li>
  274. </ul>
  275. </li>
  276. </ul>
  277. <dl>
  278. <dd>
  279. <dl>
  280. <dd>
  281. EX (for entry): <code>|!!!!%;!!!@!;Page_1_sortkey|</code>
  282. </dd>
  283. </dl>
  284. </dd>
  285. <dd>
  286. EX (for all): <code>A|!!!!!|!!!!!|!!!!X|!!!!%;!!!@!;Page_1_sortkey|!!!!^;!!!@@;Page_2_sortkey|</code>
  287. </dd>
  288. </dl>
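<p>
A single member entry can be split with the Python sketch below. <code>parse_member</code> is an illustrative name and the Base85 alphabet ('!' = 0 through 'u' = 84, most significant digit first) is an assumption. The sketch handles one entry only; how the three length fields delimit the subcategory, file, and page groups is not exercised here.
</p>

```python
def b85_decode(text):
    # Assumption: digits are ASCII '!' (33) to 'u' (117), most significant first.
    value = 0
    for ch in text:
        value = value * 85 + (ord(ch) - 33)
    return value


def parse_member(entry):
    """Split one page_id;add_date;sortkey entry into its three fields."""
    # Limit the split to two separators so that a sortkey containing ';'
    # stays in one piece. Surrounding '|' delimiters are removed first.
    page_id, add_date, sortkey = entry.strip("|").split(";", 2)
    return b85_decode(page_id), b85_decode(add_date), sortkey


print(parse_member("|!!!!%;!!!@!;Page_1_sortkey|"))
```

<p>
The sortkey is placed last in the entry, so it is the only field whose content can safely contain the delimiter.
</p>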
  289. <p>
  290. <br>
  291. </p>
  292. </div>
  293. </div>
  294. </div>
  295. <div id="mw-head" class="noprint">
  296. <div id="left-navigation">
  297. <div id="p-namespaces" class="vectorTabs">
  298. <h3>Namespaces</h3>
  299. <ul>
  300. <li id="ca-nstab-main" class="selected"><span><a id="ca-nstab-main-href" href="index.html">Page</a></span></li>
  301. </ul>
  302. </div>
  303. </div>
  304. </div>
  305. <div id='mw-panel' class='noprint'>
  306. <div id='p-logo'>
  307. <a style="background-image: url(https://gnosygnu.github.io/xowa/xowa_logo.png);" href="http://xowa.org/" title="Visit the main page"></a>
  308. </div>
  309. <div class="portal" id='xowa-portal-home'>
  310. <h3>XOWA</h3>
  311. <div class="body">
  312. <ul>
  313. <li><a href="http://xowa.org/index.html" title='Visit the main page'>Main page</a></li>
  314. <li><a href="http://xowa.org/screenshots.html" title='See screenshots of XOWA'>Screenshots</a></li>
  315. <li><a href="http://xowa.org/home/wiki/Help/Download_XOWA.html" title='Download the XOWA application'>Download XOWA</a></li>
  316. <li><a href="http://xowa.org/home/wiki/Dashboard/Image_databases.html" title='Download offline wikis and image databases'>Download wikis</a></li>
  317. </ul>
  318. </div>
  319. </div>
  320. <div class="portal" id='xowa-portal-started'>
  321. <h3>Getting started</h3>
  322. <div class="body">
  323. <ul>
  324. <li><a href="http://xowa.org/home/wiki/App/Setup/System_requirements.html" title='Get XOWA&apos;s system requirements'>Requirements</a></li>
  325. <li><a href="http://xowa.org/home/wiki/App/Setup/Installation.html" title='Get instructions for installing XOWA'>Installation</a></li>
  326. <li><a href="http://xowa.org/home/wiki/App/Import/Simple_Wikipedia.html" title='Learn how to set up Simple Wikipedia'>Simple Wikipedia</a></li>
  327. <li><a href="http://xowa.org/home/wiki/App/Import/English_Wikipedia.html" title='Learn how to set up English Wikipedia'>English Wikipedia</a></li>
  328. <li><a href="http://xowa.org/home/wiki/App/Import/Other_wikis.html" title='Learn how to set up other Wikipedias'>Other Wikipedias</a></li>
  329. </ul>
  330. </div>
  331. </div>
  332. <div class="portal" id='xowa-portal-android'>
  333. <h3>Android</h3>
  334. <div class="body">
  335. <ul>
<li><a href="http://xowa.org/home/wiki/Android/Setup.html" title='Set up XOWA on your Android device'>Setup</a></li>
  337. </ul>
  338. </div>
  339. </div>
  340. <div class="portal" id='xowa-portal-help'>
  341. <h3>Help</h3>
  342. <div class="body">
  343. <ul>
  344. <li><a href="http://xowa.org/home/wiki/Help/About.html" title='Get more information about XOWA'>About</a></li>
  345. <li><a href="http://xowa.org/home/wiki/Help/Contents.html" title='View a list of help topics'>Contents</a></li>
  346. <li><a href="http://xowa.org/home/wiki/Help/Media.html" title='Read what others have written about XOWA'>Media</a></li>
  347. <li><a href="http://xowa.org/home/wiki/Help/Feedback.html" title='Questions? Comments? Leave feedback for XOWA'>Feedback</a></li>
  348. </ul>
  349. </div>
  350. </div>
  351. <div class="portal" id='xowa-portal-blog'>
  352. <h3>Blog</h3>
  353. <div class="body">
  354. <ul>
<li><a href="http://xowa.org/home/wiki/Blog.html" title='Follow XOWA&apos;s development process'>Current</a></li>
  356. </ul>
  357. </div>
  358. </div>
  359. <div class="portal" id='xowa-portal-links'>
  360. <h3>Links</h3>
  361. <div class="body">
  362. <ul>
<li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
  364. <li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
  365. <li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
  366. </ul>
  367. </div>
  368. </div>
  369. <div class="portal" id='xowa-portal-donate'>
  370. <h3>Donate</h3>
  371. <div class="body">
  372. <ul>
  373. <li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
  374. <li><a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title="Support Wikipedia!">Wikipedia</a></li>
  375. <!-- <li><a href="" title="Support XOWA! (but only after you've supported archive.org and Wikipedia)">XOWA</a></li> -->
  376. </ul>
  377. </div>
  378. </div>
  379. <div class="portal" id='xowa-portal-image'>
  380. <br/>
  381. <a href="https://play.google.com/store/apps/details?id=org.xowa" class="image">
  382. <img width='140px' src="https://gnosygnu.github.io/xowa/en-play-badge.png" />
  383. </a>
  384. </div>
  385. </div>
  386. </body>
  387. </html>