Getting started
---------------
Open a terminal and go to the root of the project folder (Wikipedia_to_hdoc).

Generating the .hdoc of a Wikipedia article from a URL
------------------------------------------------------
1 - Run the command corresponding to your OS.

On Windows:
    runURL.bat yourWikipediaUrl yourFilename

On Linux:
    sh runURL.sh yourWikipediaUrl yourFilename

yourWikipediaUrl is the URL of the Wikipedia article.
yourFilename is the name of the directory in which the output files will be placed.

For instance, on Windows:
    runURL.bat https://fr.wikipedia.org/wiki/Constructeur_(programmation) constructeur

For instance, on Linux (quote the URL, otherwise the shell rejects the unquoted parentheses as a syntax error):
    sh runURL.sh "https://fr.wikipedia.org/wiki/Constructeur_(programmation)" constructeur

2 - Get the .hdoc from the output/yourFilename folder.

Generating the .hdoc of a Wikipedia article from a local file
-------------------------------------------------------------
1 - Copy the content of the Wikipedia article you want to convert into a file called "source.xml" in the directory named "input".

2 - Run the command corresponding to your OS.

On Windows:
    runFile.bat

On Linux:
    sh runFile.sh

3 - Get the .hdoc from the output/source folder.

To do
-----
Concerning images:

1 - Extract the metadata from the meta.xml file of each image. You can do that by creating an XSL file that is called from the ant task generated by xslt/get_ressources_urls.xsl. From there you have access to each meta.xml file.

2 - Verify that the images are zipped correctly, to avoid any problem while testing in Opale.

3 - Images inside paragraphs break the validation against the hdoc schema. Make a proposal to change the schema so that this case is handled.
Concerning listings:

1 - Detect the programming language of the code excerpts in the Wikipedia article.

Be aware of the following things
--------------------------------
1 - Not all images have metadata information (only the ones that ...).

2 - The titles of the images have metadata information (only the ones that are not included in the text).

BUG
---
1 - The Linux sh scripts do not work behind the UTC proxy, but they work outside UTC.
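The URL workflow from the "Getting started" steps can be wrapped in a small helper. This is only a sketch under stated assumptions, not part of the project: the function name convert_article and the RUN_SCRIPT variable are hypothetical, and it assumes that the result is a file with the .hdoc extension somewhere under output/yourFilename. It quotes the URL, so article names containing parentheses are passed through safely.

```shell
#!/bin/sh
# Sketch only: convert_article and RUN_SCRIPT are illustrative names,
# they do not exist in the project.

# Script that performs the conversion (the project's Linux entry point by default).
RUN_SCRIPT="${RUN_SCRIPT:-runURL.sh}"

convert_article() {
    url="$1"
    name="$2"

    if [ -z "$url" ] || [ -z "$name" ]; then
        echo "usage: convert_article <wikipedia-url> <output-name>" >&2
        return 2
    fi

    # Both arguments are quoted, so parentheses in the URL are not
    # interpreted by the shell.
    sh "$RUN_SCRIPT" "$url" "$name" || return 1

    # Assumption: the generated file ends in .hdoc and lives under output/<name>.
    out_dir="output/$name"
    hdoc=$(find "$out_dir" -name '*.hdoc' 2>/dev/null | head -n 1)
    if [ -z "$hdoc" ]; then
        echo "no .hdoc found in $out_dir" >&2
        return 1
    fi

    # Print the path of the first .hdoc found.
    echo "$hdoc"
}

# Usage, from the root of Wikipedia_to_hdoc:
#   convert_article "https://fr.wikipedia.org/wiki/Constructeur_(programmation)" constructeur
```

The helper returns a non-zero status when the conversion script fails or when no .hdoc is found, which makes it usable in a loop over several articles.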