Converter : Wikipedia_to_Hdoc
===============

Licence : 
---------------
GPL 3.0
http://www.gnu.org/licenses/gpl-3.0.txt


Credits :
---------------
Carrel Billiard Harold

Harriga Merouane

Lhomme Nicolas

Previous developers

Presentation
---------------
This converter transforms a Wikipedia page (from a link or a saved page) into an Hdoc document.

User Documentation
---------------

Open a terminal and go to the root of the project folder (Wikipedia_to_hdoc).

Generating .hdoc of a Wikipedia article with a URL
--------------------------------------------------

1 - Run the command corresponding to your OS
        
        On Windows : 
            runURL.bat yourWikipediaUrl yourFilename
                yourWikipediaUrl is the Wikipedia URL
                yourFilename is the name of the directory in which output files will be placed
                
            For instance : runURL.bat https://fr.wikipedia.org/wiki/Constructeur_(programmation) constructeur
        
        On Linux : 
            sh runURL.sh yourWikipediaUrl yourFilename
                yourWikipediaUrl is the Wikipedia URL
                yourFilename is the name of the directory in which output files will be placed
            
            For instance (the URL is quoted because it contains parentheses) : sh runURL.sh "https://fr.wikipedia.org/wiki/Constructeur_(programmation)" constructeur
            
2 - Get the .hdoc in the output/yourFilename folder
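
The two steps above can be wrapped in a small helper. This is only a sketch: the function names `expected_output_dir` and `convert_url` are hypothetical and not part of the repository. It assumes that it is run from the project root, where `runURL.sh` lives, and that the result is a file with a `.hdoc` extension. The URL is passed quoted so that characters such as parentheses are not interpreted by the shell.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the repository.

# Print the folder where the converter places its output (output/yourFilename).
expected_output_dir() {
    printf 'output/%s\n' "$1"
}

# Run the conversion, then list the generated .hdoc files.
# Usage: convert_url "https://fr.wikipedia.org/wiki/Constructeur_(programmation)" constructeur
convert_url() {
    if [ "$#" -ne 2 ]; then
        echo "usage: convert_url <wikipedia-url> <filename>" >&2
        return 2
    fi
    # Quoting "$1" keeps parentheses and other special characters intact.
    sh runURL.sh "$1" "$2" || return 1
    # Assumption: the generated file carries a .hdoc extension.
    ls "$(expected_output_dir "$2")"/*.hdoc
}
```

After sourcing this file, `convert_url "yourWikipediaUrl" yourFilename` runs step 1 and shows the file mentioned in step 2.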

Generating .hdoc of a Wikipedia article with a local file
---------------------------------------------------------

1 - Copy the source of the Wikipedia article you want to convert into a file named "source.xml" in the directory named "input".
    Display the source code of the Wikipedia page, copy it, and paste it into the new file source.xml.
    Make sure to copy and paste the source code rather than saving the page directly as a file.

2 - Run the command corresponding to your OS
        
        
        On Windows : 
            runFile.bat
        
        On Linux : 
            sh runFile.sh
                       
3 - Get the .hdoc in the output/source folder
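
Before running step 2, it can help to confirm that the source file is in place. The sketch below is hypothetical (`source_ready` and `convert_local` are not part of the repository) and assumes it is run from the project root and that the result has a `.hdoc` extension.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of the repository.

# Succeed only if the given file exists and is not empty.
# Defaults to input/source.xml, the path used in step 1.
source_ready() {
    src="${1:-input/source.xml}"
    if [ ! -s "$src" ]; then
        echo "missing or empty: $src" >&2
        return 1
    fi
}

# Run the local-file conversion only when the source is in place,
# then list the result expected in output/source.
convert_local() {
    source_ready && sh runFile.sh && ls output/source/*.hdoc
}
```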

To do
---------------------------------------------------------
In general :

1 - Handle Notes and References

2 - Display the table on the right in the introduction (text and images)


Concerning images :

1 - Extract the metadata from the meta.xml file of each image. You can do that by creating an XSL file that is called from the ant task generated by xslt/get_ressources_urls.xsl. In that file you have access to each meta.xml file.

2 - Verify that images are correctly zipped, to avoid problems when testing in Opale

3 - Images inside paragraphs break validation against the hdoc schema. Propose a change to the schema to handle that.


Concerning listings :

1 - Detect the programming language of the code listings in the Wikipedia article


Concerning tables : 

1 - Solve the encoding problem

2 - Change the Hdoc schema to accept images in tables?

3 - Display complex tables as tables in Opale (not as external files)
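
For the listings item above, one possible starting point is the class attribute of highlighted code blocks. The sketch below rests on an assumption that should be checked against real pages: that Wikipedia marks each highlighted listing with a class of the form `mw-highlight-lang-<language>`. The function name `listing_languages` is hypothetical, and in the converter itself the equivalent test would more likely be written in XSL.

```shell
#!/bin/sh
# Hypothetical helper, not part of the converter.
# Assumption: highlighted listings carry a class such as "mw-highlight-lang-java".

# Read HTML on stdin and print each distinct listing language, one per line.
listing_languages() {
    grep -o 'mw-highlight-lang-[A-Za-z0-9_+#-]*' \
        | sed 's/^mw-highlight-lang-//' \
        | sort -u
}
```

For instance, `listing_languages < input/source.xml` would list the languages found in a saved page, if the assumption holds.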

Technical notes
---------------
For images, refer to get-ressources-with-meta.xsl and official-meta.xml. The comments in these files will help you finish the to-do items regarding images. These files are included only to preserve that knowledge for future developers.

Be aware of the following things
---------------------------------------------------------
1 - Not all images in Wikipedia pages have metadata.

2 - Image titles are not present in the metadata that you will retrieve. Use the @alt attribute of the image to get its title.
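
To check quickly which title an image would get from its @alt attribute, a small shell filter is enough. This is a hypothetical sketch for inspection only (`image_title` is not part of the converter, where the same selection would be done in XSL), and it assumes the attribute is written with double quotes.

```shell
#!/bin/sh
# Hypothetical inspection helper, not part of the converter.

# Read HTML on stdin; for each line containing an <img> tag with an alt
# attribute, print the alt text of the last such tag on that line.
image_title() {
    sed -n 's/.*<img[^>]* alt="\([^"]*\)".*/\1/p'
}
```

Lines whose images have no alt attribute produce no output, which makes missing titles easy to spot.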

BUG
---
1 - The Linux sh file doesn't work behind the UTC proxy, but works outside of UTC.

2 - Random errors might occur
Wikipedia is a great tool: everyone can participate. However, contributors do not all follow the same practices, so articles are written in many different ways. This is why the converter might not handle some situations (even though all the files we tried worked), and in its current state it might not be able to convert some Wikipedia articles.

3 - Small issues with Opale
Links can be invisible if you use an old version of Opale. This problem does not come from the Wikipedia to Hdoc converter. Make sure you use an up-to-date version of Opale to test your scar archives.
Opale might also report that the scar file contains errors once it is imported. These "errors" are actually warnings. The archives work, since they were validated when the scar file was built. The warnings come from Opale and can be ignored.
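
Regarding bug 1 (the proxy issue), one avenue to investigate is passing the proxy settings to the Java virtual machine that runs Ant. The sketch below is untested against this project. It assumes that the sh script launches Ant and that the failure comes from the JVM not knowing about the proxy; this README confirms neither. The function name `proxy_opts` and the host and port shown are placeholders.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.

# Build the standard Java proxy properties for a given host and port.
proxy_opts() {
    printf '%s\n' "-Dhttp.proxyHost=$1 -Dhttp.proxyPort=$2 -Dhttps.proxyHost=$1 -Dhttps.proxyPort=$2"
}

# Possible usage (placeholder host and port), relying on the ANT_OPTS
# environment variable that Ant passes on to the JVM:
#   ANT_OPTS="$(proxy_opts proxy.example.org 3128)" sh runURL.sh "yourWikipediaUrl" yourFilename
```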