- <!DOCTYPE html>
- <html dir="ltr">
- <head>
- <title>XOWA: Set up English Wikipedia</title>
- <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
- <link rel="shortcut icon" href="//gnosygnu.github.io/xowa/xowa_logo.png" />
- <link rel="stylesheet" href="//gnosygnu.github.io/xowa/xowa_common.css" type="text/css">
- </head>
- <body class="mediawiki ltr sitedir-ltr ns-0 ns-subject skin-vector action-submit vector-animateLayout" spellcheck="false">
- <div id="mw-page-base" class="noprint"></div>
- <div id="mw-head-base" class="noprint"></div>
- <div id="content" class="mw-body">
- <h1 id="firstHeading" class="firstHeading"><span>Set up English Wikipedia</span></h1>
- <div id="bodyContent" class="mw-body-content">
- <div id="siteSub">From XOWA: a free, open-source, offline wiki application</div>
- <div id="contentSub"></div>
- <div id="mw-content-text" lang="en" dir="ltr" class="mw-content-ltr">
- <!-- page_bgn -->
- <div id="toc" class="toc">
- <div id="toctitle">
- <h2>
- Contents
- </h2>
- </div>
- <ul>
- <li class="toclevel-1 tocsection-1">
- <a href="#Overview"><span class="tocnumber">1</span> <span class="toctext">Overview</span></a>
- </li>
- <li class="toclevel-1 tocsection-2">
- <a href="#Part_1:_Set_up_the_wiki"><span class="tocnumber">2</span> <span class="toctext">Part 1: Set up the wiki</span></a>
- <ul>
- <li class="toclevel-2 tocsection-3">
- <a href="#Option_1:_Import_the_wiki_with_XOWA"><span class="tocnumber">2.1</span> <span class="toctext">Option 1: Import the wiki with XOWA</span></a>
- <ul>
- <li class="toclevel-3 tocsection-4">
- <a href="#Overview_2"><span class="tocnumber">2.1.1</span> <span class="toctext">Overview</span></a>
- </li>
- <li class="toclevel-3 tocsection-5">
- <a href="#Steps"><span class="tocnumber">2.1.2</span> <span class="toctext">Steps</span></a>
- </li>
- </ul>
- </li>
- <li class="toclevel-2 tocsection-6">
- <a href="#Option_2:_Download_the_wiki_from_archive.org"><span class="tocnumber">2.2</span> <span class="toctext">Option 2: Download the wiki from archive.org</span></a>
- <ul>
- <li class="toclevel-3 tocsection-7">
- <a href="#Overview_3"><span class="tocnumber">2.2.1</span> <span class="toctext">Overview</span></a>
- </li>
- <li class="toclevel-3 tocsection-8">
- <a href="#Steps_2"><span class="tocnumber">2.2.2</span> <span class="toctext">Steps</span></a>
- </li>
- </ul>
- </li>
- </ul>
- </li>
- <li class="toclevel-1 tocsection-9">
- <a href="#Part_2:_Download_the_images"><span class="tocnumber">3</span> <span class="toctext">Part 2: Download the images</span></a>
- <ul>
- <li class="toclevel-2 tocsection-10">
- <a href="#Steps_3"><span class="tocnumber">3.1</span> <span class="toctext">Steps</span></a>
- </li>
- </ul>
- </li>
- <li class="toclevel-1 tocsection-11">
- <a href="#Updating_the_wiki"><span class="tocnumber">4</span> <span class="toctext">Updating the wiki</span></a>
- </li>
- <li class="toclevel-1 tocsection-12">
- <a href="#Disk_space_usage"><span class="tocnumber">5</span> <span class="toctext">Disk space usage</span></a>
- </li>
- <li class="toclevel-1 tocsection-13">
- <a href="#Notes"><span class="tocnumber">6</span> <span class="toctext">Notes</span></a>
- </li>
- </ul>
- </div>
- <h2>
- <span class='mw-headline' id='Overview'>Overview</span>
- </h2>
- <p>
- English Wikipedia has a lot of data: over 15 million pages with more than 20 GB of text, as well as over 4 million thumbnails.
- </p>
- <p>
- Setting all this up on your computer will take a while. As a general estimate, you will need about 30 GB of disk space and 5 hours of processing time. If you want images as well, this rises to about 100 GB of disk space and 30+ hours of processing time. However, when you are done, you will have a complete, recent copy of English Wikipedia with images that can fit on a 128 GB SD card.
- </p>
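- <p>
- As a rough pre-flight check, the following Python sketch compares the free space on a drive with the estimates above (about 30 GB for text only, about 100 GB with images). It is a hypothetical helper, not part of XOWA. The thresholds are the approximate figures from this page, treated here as decimal gigabytes, and you choose which folder to check.
- </p>

```python
import shutil

# Approximate requirements quoted on this page, in decimal gigabytes.
TEXT_ONLY_GB = 30
WITH_IMAGES_GB = 100
GB = 10 ** 9


def required_bytes(with_images: bool) -> int:
    """Rough disk space needed for English Wikipedia, with or without images."""
    return (WITH_IMAGES_GB if with_images else TEXT_ONLY_GB) * GB


def free_bytes_at(path: str) -> int:
    """Free space on the drive that contains `path` (the path must exist)."""
    return shutil.disk_usage(path).free


def has_enough_space(free_bytes: int, with_images: bool) -> bool:
    """True if `free_bytes` meets the rough estimate for the chosen setup."""
    return free_bytes >= required_bytes(with_images)
```

- <p>
- For example, <code>has_enough_space(free_bytes_at(r"J:\gplx\xowa"), with_images=True)</code> checks the example folder used on this page for a setup that includes images.
- </p>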
- <p>
- Although the process itself is not hard, I <b>strongly recommend</b> that you try Simple Wikipedia first. Simple Wikipedia has 184,000 pages and 90,000 images. The text version uses 200 MB and sets up in 5 minutes. With images, it grows to 2 GB and about 30 minutes of download time. Simple Wikipedia is a reasonably accurate simulation of English Wikipedia, just much smaller, and it will also give you a good idea of what XOWA can do.
- </p>
- <h2>
- <span class='mw-headline' id='Part_1:_Set_up_the_wiki'>Part 1: Set up the wiki</span>
- </h2>
- <p>
- The first part is setting up the wiki. There are two ways to do this: import the wiki with XOWA, or download a prebuilt copy from archive.org.
- </p>
- <h3>
- <span class='mw-headline' id='Option_1:_Import_the_wiki_with_XOWA'>Option 1: Import the wiki with XOWA</span>
- </h3>
- <h4>
- <span class='mw-headline' id='Overview_2'>Overview</span>
- </h4>
- <ul>
- <li>
- XOWA will download the database dump from the Wikimedia backup servers.
- </li>
- <li>
- The database dump is 10+ GB and takes about 3 hours to download.
- </li>
- <li>
- XOWA will take about 2.5 hours to build the wiki. The final wiki will use about 20 GB of disk space.<sup id="cite_ref-wiki_files_0-0" class="reference"><a href="#cite_note-wiki_files-0">[1]</a></sup>
- </li>
- </ul>
- <h4>
- <span class='mw-headline' id='Steps'>Steps</span>
- </h4>
- <ul>
- <li>
- Launch XOWA
- </li>
- <li>
- From the menu bar, select Tools -> Import From List. Alternatively, enter <code>home/wiki/Help:Import/List</code> in the address bar.
- </li>
- <li>
- Find <b>en.wikipedia.org</b> in the list.
- </li>
- <li>
- Click the "download" link to its left.
- </li>
- </ul>
- <p>
- That's it: the import process has now started. It takes at least 5 hours, so you may want to let it run unattended. When it is done, XOWA will automatically load the Main Page.
- </p>
- <h3>
- <span class='mw-headline' id='Option_2:_Download_the_wiki_from_archive.org'>Option 2: Download the wiki from archive.org</span>
- </h3>
- <h4>
- <span class='mw-headline' id='Overview_3'>Overview</span>
- </h4>
- <ul>
- <li>
- The download will be approximately 20 GB.
- </li>
- <li>
- When the download completes, extract the files so that they end up in J:\gplx\xowa\wiki\en.wikipedia.org. This page uses J:\gplx\xowa\ as an example XOWA folder; substitute your own.
- </li>
- </ul>
- <h4>
- <span class='mw-headline' id='Steps_2'>Steps</span>
- </h4>
- <ul>
- <li>
- Download the file from <a href="https://archive.org/details/Xowa_enwiki_latest" class="external text" rel="nofollow">https://archive.org/details/Xowa_enwiki_latest</a>.
- </li>
- <li>
- After the download completes, extract the archive into J:\gplx\xowa\. When you are done, you should have a file such as J:\gplx\xowa\wiki\en.wikipedia.org\en.wikipedia.org-core.xowa.
- </li>
- <li>
- Launch XOWA
- </li>
- <li>
- Enter <code>w:</code> in the address bar. The Main Page should load.
- </li>
- </ul>
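- <p>
- After extracting, you can confirm that the archive landed in the right place by checking for the core database file named in the steps above. This is a minimal, hypothetical Python sketch, not part of XOWA. It assumes the folder layout shown above, takes the XOWA root folder as a parameter, and checks only that the file exists and is not empty.
- </p>

```python
from pathlib import Path

WIKI_DOMAIN = "en.wikipedia.org"


def core_db_path(xowa_root: Path) -> Path:
    """Expected core database: <root>/wiki/en.wikipedia.org/en.wikipedia.org-core.xowa."""
    return xowa_root / "wiki" / WIKI_DOMAIN / f"{WIKI_DOMAIN}-core.xowa"


def core_db_ready(xowa_root: Path) -> bool:
    """True if the core database file exists and is not empty."""
    path = core_db_path(xowa_root)
    return path.is_file() and path.stat().st_size > 0
```

- <p>
- For example, <code>core_db_ready(Path(r"J:\gplx\xowa"))</code> should return <code>True</code> once extraction into the example folder has finished.
- </p>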
- <h2>
- <span class='mw-headline' id='Part_2:_Download_the_images'>Part 2: Download the images</span>
- </h2>
- <p>
- This part takes much longer to complete. It requires at least 70 GB of disk space and 24+ hours of download time. You will be downloading compressed files from archive.org.<sup id="cite_ref-images_are_thumbnails_1-0" class="reference"><a href="#cite_note-images_are_thumbnails-1">[2]</a></sup>
- </p>
- <h3>
- <span class='mw-headline' id='Steps_3'>Steps</span>
- </h3>
- <ul>
- <li>
- Go to <a href="https://archive.org/details/Xowa_enwiki_latest" class="external text" rel="nofollow">https://archive.org/details/Xowa_enwiki_latest</a>
- </li>
- <li>
- Download each listed file marked <code>image</code>.
- </li>
- <li>
- Extract the files to J:\gplx\xowa\. When you are done, you will have the files J:\gplx\xowa\wiki\en.wikipedia.org\en.wikipedia.org-file-ns.000-db.001.xowa through J:\gplx\xowa\wiki\en.wikipedia.org\en.wikipedia.org-file-ns.000-db.023.xowa, as well as several others.
- </li>
- </ul>
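- <p>
- Because the image databases are numbered, you can easily check for gaps after extraction. The hypothetical Python sketch below lists which of db.001 through db.023 are still missing from the wiki folder. It is not a XOWA feature, and it does not check the "several others" mentioned in the steps, because their names are not given on this page.
- </p>

```python
from __future__ import annotations

from pathlib import Path

WIKI_DOMAIN = "en.wikipedia.org"
FIRST_DB, LAST_DB = 1, 23  # db.001 through db.023, as listed on this page


def expected_image_dbs() -> list[str]:
    """Names of the numbered image databases described on this page."""
    return [
        f"{WIKI_DOMAIN}-file-ns.000-db.{n:03d}.xowa"
        for n in range(FIRST_DB, LAST_DB + 1)
    ]


def missing_image_dbs(wiki_dir: Path) -> list[str]:
    """Return the expected image databases that are not yet in `wiki_dir`."""
    return [name for name in expected_image_dbs() if not (wiki_dir / name).is_file()]
```

- <p>
- For example, <code>missing_image_dbs(Path(r"J:\gplx\xowa\wiki\en.wikipedia.org"))</code> returns an empty list once all 23 numbered files are present.
- </p>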
- <h2>
- <span class='mw-headline' id='Updating_the_wiki'>Updating the wiki</span>
- </h2>
- <p>
- Wikipedia is constantly changing: new pages are added, and existing pages are edited to include different images. The steps above give you a complete set of images as of 2015-04-03. To stay up to date with Wikipedia, you may also want to download the monthly updates.
- </p>
- <p>
- Monthly updates will be posted at the same URL: <a href="https://archive.org/details/Xowa_enwiki_latest" class="external text" rel="nofollow">https://archive.org/details/Xowa_enwiki_latest</a>. Each update will appear as a new link named after its wiki dump, for example <code>2015-05-02</code>, and will contain the new images introduced in that month's Wikipedia dump. Download and extract these updates in order (first 2015-05-02, then 2015-06-02, and so on). Some files appear in more than one update; the most recent copy of a file should always replace the earlier version.
- </p>
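- <p>
- The rule for monthly updates (apply them oldest first, so that a newer copy of a file always replaces an older one) can be modeled with the hypothetical Python sketch below. Update names are dump dates in YYYY-MM-DD form, so parsing them as dates gives the correct order and rejects malformed names. The sketch only works on in-memory lists of file names. It does not download or extract anything, and the file names used in any example are placeholders.
- </p>

```python
from __future__ import annotations

from datetime import date


def extraction_order(update_names: list[str]) -> list[str]:
    """Sort update names such as '2015-06-02' from oldest to newest."""
    return sorted(update_names, key=date.fromisoformat)


def final_sources(updates: dict[str, list[str]]) -> dict[str, str]:
    """Map each file name to the update whose copy should end up on disk.

    Applying updates oldest first means a later update overwrites any
    file that an earlier update also contained.
    """
    result: dict[str, str] = {}
    for name in extraction_order(list(updates)):
        for filename in updates[name]:
            result[filename] = name  # a newer update replaces the older copy
    return result
```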
- <p>
- If you update your wiki, you do not have to update the images; the two are independent of each other. For example, you can use the 2017-01-01 English Wikipedia XML dump with the 2015-04-03 English Wikipedia images. However, images that are new in the 2017-01-01 dump will not show up until you download the corresponding monthly updates.
- </p>
- <h2>
- <span class='mw-headline' id='Disk_space_usage'>Disk space usage</span>
- </h2>
- <p>
- Some may wonder why XOWA needs so much disk space compared to other apps. Here is a brief list of reasons:
- </p>
- <ul>
- <li>
- XOWA is complete. It includes all articles across all namespaces, including the Wikipedia namespace, the Portal namespace, the Help namespace, and several others. It also includes redirect stubs. Other apps will only provide articles in the Main namespace.
- </li>
- <li>
- XOWA includes Categories as well. Other apps will skip Categories altogether.
- </li>
- <li>
- XOWA shows all content on the page. Other apps omit parts of the page, such as the table of contents or the navigation boxes at the bottom.
- </li>
- <li>
- XOWA includes all images for the Main namespace, the Portal namespace, and the Wikipedia namespace. Other apps only provide images for the Main namespace.
- </li>
- <li>
- XOWA provides accurately sized thumbnails for an article. If an article shows an 800-pixel-wide image, XOWA shows an 800-pixel-wide image. Other apps show a smaller, 220-pixel-wide image instead.
- </li>
- <li>
- XOWA includes the latest content. Other apps may be many months (if not years) behind.
- </li>
- </ul>
- <h2>
- <span class='mw-headline' id='Notes'>Notes</span>
- </h2>
- <ol class="references">
- <li id="cite_note-wiki_files-0">
- <span class="mw-cite-backlink"><a href="#cite_ref-wiki_files_0-0">^</a></span> <span class="reference-text">When the import completes, XOWA moves the 10 GB dump file to /xowa/wiki/#dump/done. This file can be safely deleted. XOWA does not delete it automatically, because some users may want to keep it for archival purposes, and redownloading 10 GB would be time-consuming.</span>
- </li>
- <li id="cite_note-images_are_thumbnails-1">
- <span class="mw-cite-backlink"><a href="#cite_ref-images_are_thumbnails_1-0">^</a></span> <span class="reference-text">These images are thumbnails, not the originals. They display correctly in the context of the article, but if you want the original files, you will need to download the tarballs. See <a href="/wiki/Help:Offline_images" id="xowa_lnki_2" title="Offline images">Help:Offline images</a>.</span>
- </li>
- </ol>
- <!-- page_end -->
- </div>
- </div>
- </div>
- <div id="mw-head" class="noprint">
- <div id="left-navigation">
- <div id="p-namespaces" class="vectorTabs">
- <h3>Namespaces</h3>
- <ul>
- <li id="ca-nstab-main" class="selected"><span><a id="ca-nstab-main-href" href="index.html">Page</a></span></li>
- </ul>
- </div>
- </div>
- </div>
- <div id='mw-panel' class='noprint'>
- <div id='p-logo'>
- <a style="background-image: url(//gnosygnu.github.io/xowa/xowa_logo.png);" href="//gnosygnu.github.io/xowa/" title="Visit the main page"></a>
- </div>
- <div class="portal" id='xowa-portal-home'>
- <h3>XOWA</h3>
- <div class="body">
- <ul>
- <li><a href="//gnosygnu.github.io/xowa/" title='Visit the main page'>Main page</a></li>
- <li><a href="//gnosygnu.github.io/xowa/screenshots.html" title='See screenshots of XOWA'>Screenshots</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Help/Download_XOWA.html" title='Download the XOWA application'>Download XOWA</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Dashboard/Image_databases.html" title='Download offline wikis and image databases'>Download wikis</a></li>
- </ul>
- </div>
- </div>
- <div class="portal" id='xowa-portal-stargin'>
- <h3>Getting started</h3>
- <div class="body">
- <ul>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/App/Setup/System_requirements.html" title="Get XOWA's system requirements">Requirements</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/App/Setup/Installation.html" title='Get instructions for installing XOWA'>Installation</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/App/Import/Simple_Wikipedia.html" title='Learn how to set up Simple Wikipedia'>Simple Wikipedia</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/App/Import/English_Wikipedia.html" title='Learn how to set up English Wikipedia'>English Wikipedia</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/App/Import/Other_wikis.html" title='Learn how to set up other Wikipedias'>Other Wikipedias</a></li>
- </ul>
- </div>
- </div>
- <div class="portal" id='xowa-portal-help'>
- <h3>Help</h3>
- <div class="body">
- <ul>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Help/About.html" title='Get more information about XOWA'>About</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Help/Contents.html" title='View a list of help topics'>Contents</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Help/Media.html" title='Read what others have written about XOWA'>Media</a></li>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Help/Feedback.html" title='Questions? Comments? Leave feedback for XOWA'>Feedback</a></li>
- </ul>
- </div>
- </div>
-
- <div class="portal" id='xowa-portal-blog'>
- <h3>Blog</h3>
- <div class="body">
- <ul>
- <li><a href="//gnosygnu.github.io/xowa/wiki/home/page/Blog.html" title="Follow XOWA's development process">Current</a></li>
- </ul>
- </div>
- </div>
- <div class="portal" id='xowa-portal-links'>
- <h3>Links</h3>
- <div class="body">
- <ul>
- <li><a href="http://dumps.wikimedia.org/backup-index.html" title="Get wiki database dumps directly from Wikimedia">Wikimedia dumps</a></li>
- <li><a href="https://archive.org/search.php?query=xowa" title="Search archive.org for XOWA files">XOWA @ archive.org</a></li>
- <li><a href="http://en.wikipedia.org" title="Visit Wikipedia (and compare to XOWA!)">English Wikipedia</a></li>
- </ul>
- </div>
- </div>
- <div class="portal" id='xowa-portal-donate'>
- <h3>Donate</h3>
- <div class="body">
- <ul>
- <li><a href="https://archive.org/donate/index.php" title="Support archive.org!">archive.org</a></li><!-- listed first due to recent fire damages: http://blog.archive.org/2013/11/06/scanning-center-fire-please-help-rebuild/ -->
- <li><a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector" title="Support Wikipedia!">Wikipedia</a></li>
- <!-- <li><a href="" title="Support XOWA! (but only after you've supported archive.org and Wikipedia)">XOWA</a></li> -->
- </ul>
- </div>
- </div></div>
- </body>
- </html>