Converter hdoc_to_pdf
-----------------------

The purpose of this converter is to obtain a PDF file from a hdoc document.


License GPL3.0
--------------

http://www.gnu.org/licenses/gpl-3.0.txt


Credits
-------

*   2016
    - Raphaël Debray
    - Baptiste Perraud


Dependencies
------------


This project can be used alone if you only want to convert a hdoc file into a PDF file.


User documentation
------------------

There are two ways to use the hdoc_to_pdf converter: by running the run.bat/run.sh script, or from the command line in a terminal (which lets the user specify some parameters).

The samples folder contains a hdoc file that can be used for testing.

#### Running the script run.bat/run.sh:

Use this method if you do not want to use a terminal.

1. Download hdoc_converter.zip and unzip it.
2. Add your source file into the input folder. It must be a .hdoc file.
3. Place _only one file_ in that folder.
4. On Linux or Mac, run the script run.sh. On Windows, run the script run.bat.
5. Your file is converted: the result is in the output folder.
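
The "only one file" rule can be checked before the conversion starts. The sketch below is a hypothetical Java helper (`SingleInputCheck`, not part of hdoc_to_pdf), assuming the folder is named `input` and that only files ending in `.hdoc` are counted; the actual scripts may behave differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not shipped with hdoc_to_pdf: enforces the
// "only one .hdoc file in the input folder" rule before converting.
class SingleInputCheck {

    // Returns the single .hdoc file found in inputDir,
    // or throws if the folder holds zero or several of them.
    static Path findSingleHdoc(Path inputDir) throws IOException {
        List<Path> candidates = new ArrayList<>();
        // The glob only matches names ending in ".hdoc"; other files are ignored.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.hdoc")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    candidates.add(p);
                }
            }
        }
        if (candidates.size() != 1) {
            throw new IllegalStateException("Expected exactly one .hdoc file in "
                    + inputDir + ", found " + candidates.size());
        }
        return candidates.get(0);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default folder name, relative to the current directory.
        Path input = Paths.get(args.length > 0 ? args[0] : "input");
        System.out.println("Source file: " + findSingleHdoc(input));
    }
}
```

Failing early with an explicit count is friendlier than letting the conversion silently pick one of several files.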

#### Terminal:

From the terminal, you can currently pass one parameter to the conversion: the source file.

1. Download hdoc_converter.zip and unzip it.
2. Open your terminal and go into the folder hdoc_to_pdf.
3. Run the following command:

       ant -buildfile hdoc_to_pdf.ant

   You can specify the source file by adding a parameter: use -DInputFile to set it.
   Example:

       ant -buildfile hdoc_to_pdf.ant -DInputFile=sample.hdoc


This parameter is optional. Once the conversion has finished, the result is in the output folder.
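
If the conversion has to be launched from another Java program, the command above can be built as an argument list. A minimal sketch, assuming `ant` is on the `PATH` (on Windows it may need to be `ant.bat`); `AntCommand` is a hypothetical helper, and -DInputFile is only added when a source file is given, since the parameter is optional.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the documented Ant command; not part of the converter.
class AntCommand {

    // Builds the argument list; pass null as inputFile to omit the optional parameter.
    static List<String> build(String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ant");           // assumed to be on the PATH (ant.bat may be needed on Windows)
        cmd.add("-buildfile");
        cmd.add("hdoc_to_pdf.ant");
        if (inputFile != null && !inputFile.isEmpty()) {
            // -D sets the Ant property InputFile, which selects the source file
            cmd.add("-DInputFile=" + inputFile);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build(args.length > 0 ? args[0] : null);
        System.out.println(String.join(" ", cmd));
        // To actually run the conversion from the hdoc_to_pdf folder:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing a list to `ProcessBuilder` rather than a single string avoids quoting problems when the file name contains spaces.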



Flying Saucer limitations
-------------------------

* Nested ul elements inside an ol are sometimes rendered as ol (noticed only once, to be verified).
* FS does not seem to support the max-width or max-height properties on img tags, which makes proper scaling harder. As a temporary workaround, all images are scaled to a width of 80mm.
* ToC lines are sometimes rendered poorly when the title label is too long: the dotted leader or even the page number may wrap to the following line, sometimes colliding with each other.
* Inline elements such as em cause poor paragraph justification when they are rendered at the beginning of a new line.
* FS does not support the CSS widows and orphans properties, which makes handling them harder.
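
Because every image is forced to 80mm wide, its printed height depends only on its aspect ratio. The arithmetic below is illustrative, assuming the height follows the original aspect ratio (it is not set explicitly); `ImageScaling` is a hypothetical name.

```java
// Illustrative arithmetic only (hypothetical class): printed size of an image
// forced to 80mm wide, assuming its height keeps the original aspect ratio.
class ImageScaling {

    static final double TARGET_WIDTH_MM = 80.0;

    // Height in mm of a widthPx x heightPx image once scaled to 80mm wide.
    static double scaledHeightMm(int widthPx, int heightPx) {
        if (widthPx <= 0 || heightPx <= 0) {
            throw new IllegalArgumentException("Image dimensions must be positive");
        }
        return TARGET_WIDTH_MM * heightPx / widthPx;
    }

    public static void main(String[] args) {
        System.out.println(scaledHeightMm(1600, 1200)); // landscape 4:3 image
        System.out.println(scaledHeightMm(300, 1200));  // tall portrait image
    }
}
```

A 1600x1200 px image is printed at 80mm x 60mm, whereas a 300x1200 px portrait image would be 320mm tall, more than the 297mm height of an A4 page: this is the case a working max-height would prevent.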


Known bugs
----------

* There are still occasional unwanted page breaks before a heading followed by a list (e.g. an h4 followed by an ol).
* Jing runs a schema validation during the hdoc_to_pdf conversion. Normally, if the validation fails, the process should abort, since the input is not a valid hdoc file. For now, however, the script only warns the user of the error and carries on, because the schemas and the opale_to_hdoc converter are not yet synchronized (this needs to be fixed).
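
The difference between the current and the intended behaviour boils down to one decision. A hypothetical sketch, assuming the validator signals failure through a non-zero exit code; the `strict` switch is an assumption and does not exist in the current script.

```java
// Hypothetical sketch of the intended behaviour; "strict" is an assumed switch,
// not an existing option of the hdoc_to_pdf script.
class ValidationPolicy {

    // Decides whether the conversion may go on, assuming the validator
    // reports failure through a non-zero exit code.
    static boolean mayContinue(int validatorExitCode, boolean strict) {
        if (validatorExitCode == 0) {
            return true;                       // valid hdoc file
        }
        if (strict) {
            System.err.println("Schema validation failed: aborting the conversion.");
            return false;                      // intended behaviour
        }
        System.err.println("Warning: schema validation failed, continuing anyway.");
        return true;                           // current behaviour
    }

    public static void main(String[] args) {
        System.out.println(mayContinue(1, false)); // current: warn and go on
        System.out.println(mayContinue(1, true));  // intended: abort
    }
}
```

Once the schemas and opale_to_hdoc are synchronized, switching to strict mode would only require changing the default.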

Generic Todo
------------

* Rework the hdoc_to_pdf.ant and find_content.xsl scripts to allow multifile handling.
* Handle widows and orphans as fully as possible, trying to match Prince's layout and implementing the suitable CSS rules (which FS will not interpret).
* Allow the user to override some specific CSS rules, according to the main logical layout rules.
* Provide the user with a full set of options/parameters to customise the output: bound/unbound, odd/even margins, report/article LaTeX format (first-page formatting), etc.
* Bonus: find an HTML editor for manually adding line breaks to a hdoc file, in order to fix widow and orphan problems after the PDF file has been generated.
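
As a reference for the widows/orphans work, the CSS rule can be reduced to a small computation: when a paragraph does not fit, keep at least `orphans` lines at the bottom of the current page and at least `widows` lines at the top of the next one, otherwise move the whole paragraph. The sketch below (hypothetical `WidowOrphanSplit` class) is illustrative only, not the algorithm used by FS or Prince, and it assumes the remaining lines fit on the next page. Both properties default to 2 in CSS.

```java
// Illustrative sketch of the CSS widows/orphans constraint (hypothetical class),
// not the algorithm implemented by Flying Saucer or Prince.
class WidowOrphanSplit {

    // How many lines of an n-line paragraph to keep on the current page when
    // `available` lines are left there. 0 means "move the whole paragraph".
    // Assumes the remaining lines fit on the next page.
    static int linesOnCurrentPage(int n, int available, int orphans, int widows) {
        if (n <= available) {
            return n;                                   // the whole paragraph fits
        }
        int keep = Math.min(available, n - widows);     // leave at least `widows` lines for the next page
        return keep >= orphans ? keep : 0;              // avoid fewer than `orphans` lines at the bottom
    }

    public static void main(String[] args) {
        int orphans = 2, widows = 2;                    // CSS initial values
        System.out.println(linesOnCurrentPage(5, 4, orphans, widows)); // 3 here, 2 on the next page
        System.out.println(linesOnCurrentPage(5, 1, orphans, widows)); // 0: a single line would be an orphan
        System.out.println(linesOnCurrentPage(3, 2, orphans, widows)); // 0: no split satisfies both rules
    }
}
```

The same computation applies to lists, tables and images once each is reduced to a number of "lines" or rows, which is how the per-element items in the specific todo list could be approached.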

Specific Todo list
------------------

* Add the binding ("bound") parameter to the Ant script
* Integrate the CSS styles depending on the "bound" parameter into an XSL file
* Add the two-sided ("recto-verso") parameter to the Ant script
* Integrate the CSS styles depending on the two-sided parameter into an XSL file
* Add margin support for bound one-sided documents
* Add margin support for bound two-sided documents
* Identify the main CSS rules for table handling
* Handle widow/orphan spacing for paragraphs
* Handle widow/orphan spacing for lists
* Handle widow/orphan spacing for tables
* Handle widow/orphan spacing for images
* Object support: add an instruction to the README to convert any graphic object (odg, etc.) into an image before running the conversion
* Object support: add XSL rules transforming `<object>` elements into `<img>` elements
* Allow the user to override the CSS rules according to the logical rules of the default layout
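
The "bound" and two-sided items could share one margin computation: binding adds space on the spine side, and two-sided output mirrors the margins on left-hand pages. A hypothetical sketch that generates the CSS an XSL file could embed; the parameter names, all margin values and the `PageMargins` class are assumptions. It relies on the CSS Paged Media `:left`/`:right` page selectors, and whether FS honours them should be checked.

```java
import java.util.Locale;

// Hypothetical sketch: the parameters (bound, twoSided) and all margin values are assumptions.
class PageMargins {

    static final double INNER_MM = 20.0;   // assumed margin on the spine side
    static final double OUTER_MM = 20.0;   // assumed margin on the opposite edge
    static final double BINDING_MM = 10.0; // assumed extra space taken by the binding

    // Returns CSS @page rules for the requested layout.
    static String css(boolean bound, boolean twoSided) {
        double inner = INNER_MM + (bound ? BINDING_MM : 0.0);
        if (!twoSided) {
            // One-sided: every page is a recto, with the spine on its left edge.
            return rule("@page", inner, OUTER_MM);
        }
        // Two-sided: right (recto) pages have the spine on the left, left (verso)
        // pages have it on the right, so the margins are mirrored.
        return rule("@page :right", inner, OUTER_MM) + "\n"
                + rule("@page :left", OUTER_MM, inner);
    }

    private static String rule(String selector, double left, double right) {
        // Locale.ROOT keeps plain ASCII digits whatever the system locale.
        return String.format(Locale.ROOT, "%s { margin-left: %.0fmm; margin-right: %.0fmm; }",
                selector, left, right);
    }

    public static void main(String[] args) {
        System.out.println(css(true, true));
    }
}
```

Keeping the margin logic in one place means the Ant parameters only have to select a combination rather than duplicate CSS for every case.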



Technical notes
---------------

* For now, this converter handles _only one_ hdoc file in the input folder: please empty the folder before adding the hdoc file you want to convert to PDF. Once multifile support is added to the hdoc_to_pdf converter, the opale_to_pdf converter should work as well, since it already implements the opale_to_hdoc multifile handling (copying all the resulting hdoc files into the input directory of the hdoc_to_pdf converter).
* The Java classes used by the project are located in the "lib/MyPDFGenerator Sources" folder. Modify them there if needed, then compile them and add the new jar file to the lib folder. In Eclipse, once the class is modified and ready to be exported, choose the "Runnable JAR file" export option.
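
For the multifile support mentioned above, a first step could be pairing every input file with the PDF it should produce. A hypothetical sketch (`MultiFilePlan` is not part of the project), assuming each PDF keeps the base name of its hdoc file and lands in the output folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch for multifile handling: the naming scheme
// (same base name, .pdf extension) is an assumption.
class MultiFilePlan {

    // Maps every .hdoc file in inputDir to the PDF it would produce in outputDir.
    static Map<Path, Path> plan(Path inputDir, Path outputDir) throws IOException {
        Map<Path, Path> jobs = new TreeMap<>(); // sorted, so files are processed in a stable order
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.hdoc")) {
            for (Path in : stream) {
                String name = in.getFileName().toString();
                String base = name.substring(0, name.length() - ".hdoc".length());
                jobs.put(in, outputDir.resolve(base + ".pdf"));
            }
        }
        return jobs;
    }

    public static void main(String[] args) throws IOException {
        plan(Paths.get("input"), Paths.get("output"))
                .forEach((in, out) -> System.out.println(in + " -> " + out));
    }
}
```

Each entry of the map could then be fed to the existing single-file conversion, which keeps the current pipeline unchanged.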

User Story
----------

* Converting a hdoc file:
  * The user has a hdoc file as input and wants a paginated PDF file as output.
  * They go to the hdoc_to_pdf converter (its dedicated folder).
  * They place the hdoc file in the input folder.
  * They launch the run.bat/run.sh script, or directly execute the Ant script hdoc_to_pdf.ant.
  * They retrieve the PDF file from the output folder.


Capitalisation (lessons learned)
--------------------------------

* A16: during this semester, we built a hdoc_to_pdf converter from scratch, intended to be integrated into the global hdoc project. We use the Java library Flying Saucer (FS) for this purpose, but this tool has some limitations; those we have noticed so far are listed above.
At the moment, the converter is functional and handles the main PDF layout properties: title and authors, page numbering, heading levels, ToC generation, basic inline formatting (including fonts) and nested lists, for instance. Some elements still need work, especially widow/orphan behaviour for lists. Other elements still need to be handled, such as tables and specific objects (e.g. odg resources).
The main objective has been to follow the proper formatting and typographic rules whenever possible (often taking LaTeX's as a reference), and thus to deliver a readable printed document.