App/Category
As of v3.9.2.1, XOWA has one Category system: v3.
- The first part of this page discusses v3.
- The second part of the page is an archived copy of the earlier v1 and v2 explanation.
Version 3
Version 3 was introduced to handle Categories for HTML dumps on PC and Android. The high-level details are as follows:
- Uses the Wikimedia categorylinks and page_props dumps: Like v2, v3 downloads separate MediaWiki dumps for categorylinks.sql and page_props.sql. Both files are needed to generate accurate renditions of the Wikipedia Category system.
- Generates "*-xtn.category.*.xowa" files: v3 stores all the category info in "*-xtn.category.*.xowa" files. For smaller wikis, the data is stored in the "-core.xowa" file instead.
- Works with both Wikitext databases and HTML databases: v3 will work with wikis imported by "Import/Online" (Wikitext) as well as "Download Central" (HTML)
- Is backwards compatible with v1 and v2: v3.9.2.1 will work with v1, v2, and v3 category systems
- Is generated automatically with import: v3 is now generated automatically when importing a wiki. Previously, v2 would require a separate post-processing step under Import/Offline.
- Smaller size: v3 makes some database changes to reduce file size. For English Wikipedia, that means about 8 GB instead of 10 GB. 8 GB may sound like a lot for Categories, but keep in mind there are over 100 million page-to-category links.
- Does not work for text database dumps: XOWA originally stored its data in text files instead of SQLite files. I switched over to SQLite three years ago and phased out text databases two years ago. It's possible that some users with old wikis (three or more years old) still have these text databases. If so, the new Category system won't work with them.
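As a rough illustration of the file layout described above, the following sketch checks a wiki directory for files matching the "*-xtn.category.*.xowa" pattern and falls back to a "-core.xowa" file. The function name, the assumption that the core file carries a prefix before "-core.xowa", and the idea that all files sit directly in one directory are hypothetical; this is not XOWA's own lookup logic.

```python
from pathlib import Path

# Patterns taken from the file names mentioned above.
# Assumption: the core file has some prefix before "-core.xowa".
CATEGORY_PATTERN = "*-xtn.category.*.xowa"
CORE_PATTERN = "*-core.xowa"


def locate_category_storage(wiki_dir):
    """Guess where category data would live for a wiki directory.

    Returns a (kind, file_names) tuple where kind is one of
    "xtn.category", "core", or "none". Hypothetical helper.
    """
    root = Path(wiki_dir)

    # Larger wikis: dedicated category database files.
    category_files = sorted(p.name for p in root.glob(CATEGORY_PATTERN))
    if category_files:
        return ("xtn.category", category_files)

    # Smaller wikis: category data lives inside the core database.
    core_files = sorted(p.name for p in root.glob(CORE_PATTERN))
    if core_files:
        return ("core", core_files)

    # Neither found: possibly an old text database or an empty directory.
    return ("none", [])
```

A result of "none" does not prove that a wiki uses the old text databases; it only means no SQLite file with the expected name was found in that directory.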
Version 1
Version 1 is a simplistic category system.
- It relies only on page content inside the xml file. It does not use any of the category*.sql dumps.
Note the following limitations:
- Does not work with large categories. Load time grows linearly with the number of members (do not use it to load a category with over 10,000 members)
- Does not support paging. If a category has 1,000 members, it will load title information on all 1,000 (instead of just the first 200)
- Does not use sortkey. For example, Jimmy Wales will alphabetize under J (for Jimmy Wales) instead of W (for Wales, Jimmy)
- Does not accurately reflect page membership in categories.
  - For example, most hidden categories are added to a template which is then included in a page.
  - Specifically, a page called "File:GNU.png" may belong to "All free media". However, the "File:GNU.png" page doesn't contain [[Category:All_free_media]] itself. Instead, it embeds a template {{All_non_free_media}} which contains [[Category:All_free_media]].
  - Since a full parse (with templates) of the entire xml file would take many hours, this membership data is omitted.
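The sortkey and paging limitations above can be illustrated with a small sketch. The member list and its sortkeys are made-up sample data, the function names are hypothetical, and the page size of 200 is taken from the paging example above; none of this is XOWA code.

```python
PAGE_SIZE = 200  # page size used in the paging example above

# (title, sortkey) pairs; the sortkeys are illustrative sample data.
members = [
    ("Jimmy Wales", "Wales, Jimmy"),
    ("Larry Sanger", "Sanger, Larry"),
    ("Ward Cunningham", "Cunningham, Ward"),
]


def order_by_title(pairs):
    """V1-style ordering: alphabetize by page title only."""
    return sorted(title for title, _sortkey in pairs)


def order_by_sortkey(pairs):
    """Sortkey-aware ordering: alphabetize by sortkey, return titles."""
    return [title for title, _sortkey in sorted(pairs, key=lambda p: p[1])]


def first_page(titles, page_size=PAGE_SIZE):
    """Paging: return only the first page instead of every member."""
    return titles[:page_size]


by_title = order_by_title(members)
by_sortkey = order_by_sortkey(members)
```

Here `by_title` places "Jimmy Wales" first (under J), while `by_sortkey` places him last (under W), which is the ordering a sortkey-aware system would produce.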
V1 should be considered obsolete. No significant changes will be made to it, as V2 is the official category system.
However, because V1 is faster to set up than V2, it still remains the default (with a strong recommendation to upgrade to V2 when time permits).
Version 2
Version 2 is an accurate category system.
- It uses the Wikimedia dump files: categorylinks.sql, page_props.sql
It addresses each of the limitations of version 1, including:
- Works with large categories
- Supports paging
- Uses sortkey
- Accurately includes all members of a category
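To make the paging and sortkey points concrete, here is a minimal sketch of fetching one page of category members in sortkey order from SQLite. The table is a simplified stand-in modeled loosely on MediaWiki's categorylinks columns (cl_from, cl_to, cl_sortkey); it is an assumption for illustration and not XOWA's actual schema. The helper names are hypothetical, and the page size of 200 comes from the earlier paging example.

```python
import sqlite3


def build_demo_db(rows):
    """Create an in-memory table of (page_id, category, sortkey) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categorylinks "
        "(cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
    )
    # The index lets SQLite walk one category in sortkey order.
    conn.execute(
        "CREATE INDEX categorylinks_sort "
        "ON categorylinks (cl_to, cl_sortkey, cl_from)"
    )
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", rows)
    return conn


def fetch_page(conn, category, page_size=200, after=None):
    """Return up to page_size (page_id, sortkey) rows for a category.

    `after` is the last row of the previous page, or None for page one.
    """
    sql = "SELECT cl_from, cl_sortkey FROM categorylinks WHERE cl_to = ?"
    params = [category]
    if after is not None:
        last_page_id, last_sortkey = after
        # Resume strictly after the last row seen (ties broken by page id).
        sql += " AND (cl_sortkey > ? OR (cl_sortkey = ? AND cl_from > ?))"
        params += [last_sortkey, last_sortkey, last_page_id]
    sql += " ORDER BY cl_sortkey, cl_from LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()
```

Passing the last row of one page as `after` for the next call means each request reads only one page of rows, regardless of how many members the category has.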
It has a few limitations:
- It requires additional dump files (as mentioned above).
- It takes longer to set up. A separate .sql file must be parsed. For English Wikipedia this process takes about another hour.
- It takes more disk space. The v2 system stores sortkeys individually per entry (just like Wikipedia). However, this text data greatly increases the overall file size. English Wikipedia will have about 10.0 GB of extra data.
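For a sense of scale, the disk-space figure can be turned into a rough per-link cost. The sketch below assumes decimal gigabytes (10^9 bytes) and uses the "over 100 million page to category links" figure given for English Wikipedia in the Version 3 section; since that count is a lower bound, the results are upper-bound estimates and not measured values.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_link(total_gb, link_count):
    """Average storage per page-to-category link, in bytes."""
    return total_gb * GB / link_count


LINK_COUNT = 100_000_000  # "over 100 million" links, used as a lower bound

v2_bytes = bytes_per_link(10.0, LINK_COUNT)  # v2: about 10.0 GB
v3_bytes = bytes_per_link(8.0, LINK_COUNT)   # v3: about 8 GB
```

Under these assumptions v2 works out to at most roughly 100 bytes per link and v3 to at most roughly 80 bytes per link.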
V2 is the official category system and should generate Category pages just like Wikipedia.
For more information about V2 setup, see App/Category/Building.
For more information about V2 internals, see App/Category/Internals.