Dev/Design/Sqlite/File-sizes

From XOWA: the free, open-source, offline wiki application

The XOWA SQLite import currently defaults to a multi-file format. This format was chosen for two reasons:

  • Large wikis and FAT32:
    • Most flash memory cards use a FAT32 file system. FAT32 is particularly convenient when exchanging files between Windows, Linux, Mac OS X and Android.
    • FAT32 limits any single file to 4 GB. A large wiki like en.wikipedia.org will easily take 20 GB.
    • Multiple files allow the 20 GB of data to be broken into smaller pieces, each less than 4 GB.
  • Slight performance gains:
    • A smaller database file may be easier to query than a large one because all the pages will be grouped closer together on disk.
    • For example, consider a wiki page that requires 50 template pages.
      • With a single-file format, these 50 pages may be scattered anywhere over the 20 GB file.
      • With a multi-file format, these 50 pages may be scattered anywhere over a smaller 280 MB file. A disk drive will have to seek over a smaller section of disk. For a smaller wiki, the entire template file may be stored in the hard disk cache.
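
The FAT32 arithmetic above can be sketched in a few lines of Python. This is only an illustration, not XOWA code: the function name `min_file_count` is hypothetical, and the 20 GB figure is read here as 20 GiB. The FAT32 cap of 4 GiB minus one byte per file is a property of the file system itself.

```python
# FAT32 stores file sizes in 32 bits, so one file can hold at most 2**32 - 1 bytes.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def min_file_count(total_bytes: int, max_file_bytes: int = FAT32_MAX_FILE_BYTES) -> int:
    """Smallest number of files needed so that no single file exceeds max_file_bytes."""
    # Integer ceiling division avoids floating-point rounding on large byte counts.
    return (total_bytes + max_file_bytes - 1) // max_file_bytes


# Assumption for illustration: a 20 GiB wiki, as in the en.wikipedia.org example.
wiki_bytes = 20 * 1024**3
print(min_file_count(wiki_bytes))  # 6
```

In practice the import produces more files than this lower bound, because data is also separated by type and capped by the arguments described below.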

The file format is controlled by the following arguments:

ns_file_map

The ns_file_map argument is a new-line/semi-colon delimited string. The default value is the following:

Template;Module

Each line holds a list of namespace names. Multiple namespaces on a line are delimited with ";". Each namespace name must be the "canonical" English name.

An empty string causes everything to be stored in the core database. If a single-file database is desired, specify "".
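
As a rough illustration of the format, the following Python sketch splits an ns_file_map value into groups of namespace names. It is not XOWA's parser: the function name `parse_ns_file_map` is hypothetical, and the reading that each line forms one group of namespaces stored together is an assumption based on the description above.

```python
def parse_ns_file_map(raw: str) -> list[list[str]]:
    """Split an ns_file_map string into one list of namespace names per line."""
    groups = []
    for line in raw.splitlines():
        # Namespaces on the same line are separated by ";"; blank entries are ignored.
        names = [name.strip() for name in line.split(";") if name.strip()]
        if names:
            groups.append(names)
    return groups


print(parse_ns_file_map("Template;Module"))  # [['Template', 'Module']]
print(parse_ns_file_map(""))                 # [] (no groups: everything stays in the core database)
```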

db_text_max value

This is a number that represents the maximum number of MB of text data that can be stored in a file. Note the following:

  • Once a file reaches that number, it will spill over into a new file.
    • For example, file 002 is the text database. After 3,000 MB of text data is stored in file 002, the next 3,000 MB of text data will be stored in file 003.
  • The number is a rough approximation of total database size. A precise value cannot be used because of the following non-deterministic variables:
    • SQLite database page size (data / indexes will not fill up an entire page)
    • SQLite table / database overhead
  • As such, please use a number that is no more than 80% of the desired size. For example, if you want a database no greater than 4,000 MB (4.0 GB), use 3,000 (80% would be 3,200; 3,000 leaves extra headroom).
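
The spill-over rule and the 80% guideline can be sketched as follows. This is a simplified model, not XOWA's implementation: both function names are hypothetical, and the sketch assumes that text files are numbered consecutively starting at 002, as in the example above.

```python
def text_file_number(text_mb_already_stored: int, db_text_max: int,
                     first_text_file: int = 2) -> int:
    """File number that receives the next piece of text data."""
    # Every full db_text_max MB of stored text moves writing on to the next file.
    return first_text_file + text_mb_already_stored // db_text_max


def suggested_db_text_max(desired_file_mb: int, percent: int = 80) -> int:
    """Upper bound for db_text_max so the finished file stays near desired_file_mb."""
    return desired_file_mb * percent // 100


print(text_file_number(2999, 3000))   # 2 (file 002 is not yet full)
print(text_file_number(3000, 3000))   # 3 (spills over into file 003)
print(suggested_db_text_max(4000))    # 3200 (the page's own example is more conservative: 3,000)
```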

db_categorylink_max value

This is a number that represents the maximum number of MB of categorylink data that can be stored in the file. Note the following:

  • This number functions similarly to the db_text_max value above: once the max is reached, new data will spill over into a new file.
  • However, it is more precise than db_text_max. The number specified is about 90% of the actual file size (presumably due to less page fragmentation).
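
Reading the 90% figure in the other direction gives a quick estimate of how large a categorylink file will be for a given setting. The helper below is a hypothetical sketch, and the 90% ratio is the approximate figure quoted above rather than a guaranteed value.

```python
def estimated_file_mb(db_categorylink_max: int, percent_of_actual: int = 90) -> int:
    """Approximate on-disk size in MB when the setting is percent_of_actual of the real size."""
    # Ceiling division so the estimate errs on the large side.
    return (db_categorylink_max * 100 + percent_of_actual - 1) // percent_of_actual


print(estimated_file_mb(3600))  # 4000
print(estimated_file_mb(3000))  # 3334
```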

db_wikidata_max value

This is a number that represents the maximum number of MB of wikidata label data that can be stored in the file. Note the following:

  • This number only affects www.wikidata.org wikis
  • This number only distinguishes between 0 and any non-zero value.
    • To keep all wikidata data in a single database, use 0
    • To put all wikidata data in a separate database, use any number > 0
