Commit c853c8cc authored by Aghiles

Merge branch 'master' of https://gitlab.utc.fr/crozatst/hdoc

parents 2ea5f2f6 16fe948d
!tmp/.gitkeep
!output/.gitkeep
......@@ -29,6 +29,7 @@ User documentation
------------------
There are two different ways to use the converter hdoc_to_pdf: by running a script run.bat/run.sh or by command line using a terminal (allows the user to specify some parameters).
The folder samples contains a hdoc file which may be used for some tests.
#### Running the script run.bat/run.sh:
......@@ -42,7 +43,7 @@ Use this method if you do not want to use a terminal.
#### Terminal:
By using the terminal you can specify some parameters to the conversion at the moment: the source file.
By using the terminal you can specify one parameter to the conversion at the moment: the source file.
1. Download hdoc_converter.zip and unzip it.
2. Open your terminal and go into the folder hdoc_to_pdf.
......@@ -54,46 +55,78 @@ By using the terminal you can specify some parameters to the conversion at the m
Use -DInputFile to specify the source file.
Example:
"ant -buildfile hdoc_to_optim.ant -DInputFile=sample.hdoc"
"ant -buildfile hdoc_to_pdf.ant -DInputFile=sample.hdoc"
This parameter is optional. Your file has been converted, the result is in the output folder.
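
The two invocations described above (with and without the optional source file) can be sketched as a small shell helper. This is only an illustration: the function name `hdoc_to_pdf_cmd` is hypothetical and not part of the converter, and it only prints the command line instead of running ant.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hdoc_to_pdf): prints the ant command
# line for a conversion. -DInputFile is added only when a source file is given,
# since the parameter is optional.
hdoc_to_pdf_cmd() {
    if [ -n "$1" ]; then
        printf 'ant -buildfile hdoc_to_pdf.ant -DInputFile=%s\n' "$1"
    else
        printf 'ant -buildfile hdoc_to_pdf.ant\n'
    fi
}
```

Calling `hdoc_to_pdf_cmd sample.hdoc` prints the same command as the example above; calling it without an argument prints the command without `-DInputFile`.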
Flying Saucer limitations
-------------------------
* Nested ul in ol are sometimes converted to ol... [only noticed once, to be verified]
* It seems that FS doesn't support the max-width or max-height for img tags, which makes proper scaling harder... For now, as a temporary solution, we scale all images at a width of 80mm.
* ToC line rendering is sometimes ugly if the title label is too long: the dotted leader or even the page number may appear on the following line, sometimes colliding with each other.
* Inline elements like em cause poor paragraph justification if they are rendered at the beginning of a new line.
* FS doesn't support the CSS widows/orphans properties, which makes their handling harder.
Known bugs
----------
* Nested ul in ol are sometimes converted to ol.
* It seems that FS doesn't support the max-width for img tags, which makes proper scaling harder.
* ToC lines rendering is sometimes ugly if the title label is too long.
* Inline elements like em cause poor paragraph justification.
* Sometimes, there are still unwanted page breaks before a heading + list (e.g. h4 then ol).
* jing runs a schema validation during the hdoc_to_pdf conversion. Normally, a failed validation should abort the process, because the input is not a valid hdoc file. For now, however, the script only warns the user of the error and continues, because the schemas and the opale_to_hdoc converter are not yet synchronized (this needs to be corrected).
Generic Todo
------------
* Generate a clean PDF file (using the LaTeX formatting example)
- Create a default CSS file with basic spine rules
- Get the right free font (equivalent to the LaTeX's one)
* Generate the ToC according to the converted (by XSL) headings of the hdoc
* Handle widows and orphans as fully as possible, trying to match Prince's layout and implementing the suitable CSS rules (which shall not be interpreted by FS)
* Allow the user to override some specific CSS rules, according to the main layout logical rules
* Bonus: find an HTML editor to manually add line breaks to a hdoc file in order to resolve widow and orphan problems after the PDF file's generation
* Rework the hdoc_to_pdf.ant and find_content.xsl scripts to allow multi-file handling.
* Handle widows and orphans as fully as possible, trying to match Prince's layout and implementing the suitable CSS rules (which shall not be interpreted by FS).
* Allow the user to override some specific CSS rules, according to the main layout logical rules.
* Provide the user with a full set of options/parameters to customise the output: bound/unbound, odd/even margins, report/article LaTeX format (first page formatting), etc.
* Bonus: find an HTML editor to manually add line breaks to a hdoc file in order to resolve widow and orphan problems after the PDF file's generation.
Specific Todo list
------------------
* Add the binding parameter ("bound") to the ant script
* Integrate the CSS styles depending on the "bound" parameter in an xsl
* Add the double-sided printing parameter to the ant script
* Integrate the CSS styles depending on the double-sided printing parameter in an xsl
* Add margin support for bound oneside documents
* Add margin support for bound twoside documents
* Identify the main CSS rules for table handling
* Handle widow/orphan spacing for paragraphs
* Handle widow/orphan spacing for lists
* Handle widow/orphan spacing for tables
* Handle widow/orphan spacing for images
* Object support: add an instruction in the README to convert any graphic object (odg, etc.) to an image before running the converter
* Object support: add xsl rules transforming <object> into <img>
* Allow the user to override the CSS rules, according to the logical rules of the default layout
Technical notes
---------------
* This converter works with _only one_ hdoc file in the input folder; please clean the folder before adding the hdoc you want to convert to PDF.
* This converter currently works with _only one_ hdoc file in the input folder; please clean the folder before adding the hdoc you want to convert to PDF. Once multi-file support is implemented in the hdoc_to_pdf converter, the opale_to_pdf converter should work naturally, because it already implements the opale_to_hdoc multi-file handling (copying all the hdoc results into the input directory of the hdoc_to_pdf converter).
* The java classes we use for the project are located in the "lib/MyPDFGenerator Sources" folder, please modify these if needed before compiling and adding the new jar file to the lib folder. In Eclipse, when the class is modified and ready to be exported, please choose the "Runnable jar file" export option.
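
Since the converter expects exactly one hdoc file in the input folder, a pre-flight check along the following lines could be run before launching the conversion. This is a minimal sketch under assumptions: the function names `count_hdoc` and `check_single_hdoc` are hypothetical, the check is not part of run.sh, and it assumes the input file carries the `.hdoc` extension.

```shell
#!/bin/sh
# count_hdoc DIR: prints the number of regular *.hdoc files directly inside DIR.
# (Hypothetical helper; if nothing matches, the glob stays literal and the
# -f test rejects it, so the count is 0.)
count_hdoc() {
    count=0
    for f in "$1"/*.hdoc; do
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# check_single_hdoc DIR: returns 0 only when DIR holds exactly one hdoc file,
# otherwise prints a message on stderr and returns 1.
check_single_hdoc() {
    n=$(count_hdoc "$1")
    if [ "$n" -eq 1 ]; then
        return 0
    fi
    echo "Expected exactly one .hdoc file in $1, found $n" >&2
    return 1
}
```

A caller could then guard the conversion with something like `check_single_hdoc input && ant -buildfile hdoc_to_pdf.ant`.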
User Story
----------
* The user has an hdoc file as input and wants a paginated PDF file as output.
* They open the hdoc_to_pdf converter (dedicated folder).
* They place the hdoc file in the input folder.
* They run the run.bat/run.sh script or directly execute the ant script hdoc_to_pdf.ant.
* They retrieve the PDF file from the output folder.
* Case of a single hdoc file to convert:
* The user has an hdoc file as input and wants a paginated PDF file as output.
* They open the hdoc_to_pdf converter (dedicated folder).
* They place the hdoc file in the input folder.
* They run the run.bat/run.sh script or directly execute the ant script hdoc_to_pdf.ant.
* They retrieve the PDF file from the output folder.
Capitalisation
--------------
* A16: during this semester, we built a hdoc_to_pdf converter from scratch, which is meant to be integrated into the global hdoc project. We use the Java library Flying Saucer (FS) for this purpose, but the tool has some limitations; the ones we have already noticed are listed above.
At the moment, the converter is functional and deals with the main PDF layout properties: title and authors, page numbering, heading levels, ToC generation, basic inline formatting (+ fonts) and nested lists, for instance. Some elements still need to be worked on, especially the widows/orphans behaviour for lists. Other elements still need to be handled, like tables or specific objects (e.g. odg resources).
The main objective has been to keep, whenever possible, the right formatting and typographic rules (often in comparison to the LaTeX ones), and thus deliver a readable printed document at the end.
......@@ -45,14 +45,21 @@
@bottom-center { content: counter(page) }
}
body {
html, body {
padding: 0;
margin: 0;
height: auto;
}
* {
font-family: "CMU Serif";
}
a {
text-decoration: none;
color: black;
}
p {
text-align: justify;
text-indent: 2em;
......@@ -62,6 +69,8 @@ p {
word-break: normal;
}
span[data-hdoc-type="syntax"] {font-family: "CMU Typewriter Text"}
h1 {font-size: 2em}
h2 {font-size: 1.6em}
h3 {font-size: 1.4em}
......@@ -77,26 +86,11 @@ h6 {font-style: italic}
body {counter-reset: h2}
h2 {counter-reset: h3}
h3 {counter-reset: h4}
/*
h4 {counter-reset: h5}
h5 {counter-reset: h6}
*/
h2::before {counter-increment: h2; content: counter(h2) ".\0000a0\0000a0"}
h3::before {counter-increment: h3; content: counter(h2) "." counter(h3) ".\0000a0\0000a0"}
h4::before {counter-increment: h4; content: counter(h2) "." counter(h3) "." counter(h4) ".\0000a0\0000a0"}
/*
h5::before {counter-increment: h5; content: counter(h2) "." counter(h3) "." counter(h4) "." counter(h5) ".\0000a0\0000a0"}
h6::before {counter-increment: h6; content: counter(h2) "." counter(h3) "." counter(h4) "." counter(h5) "." counter(h6) ".\0000a0\0000a0"}
*/
/*
h2.nocount:before,
h3.nocount:before,
h4.nocount:before,
h5.nocount:before,
h6.nocount:before { content: ""; counter-increment: none }
*/
h2.nocount:before,
h3.nocount:before,
h4.nocount:before { content: ""; counter-increment: none }
......@@ -110,6 +104,7 @@ ul > li p, ol > li p {
ul {
list-style-type: none;
padding-left: 3em;
}
ul > li {
......@@ -118,7 +113,7 @@ ul > li {
ul > li::before {
position: absolute;
left: -5mm;
left: -1em;
content: "\2013";
}
......@@ -146,27 +141,31 @@ ul.toc > li {
ul.toc > li::before {
left: 0;
position: absolute;
content: counters(toc, ".") ". ";
content: counters(toc, ".") "\A0";
counter-increment: toc;
}
ul.toc.level2 > li {padding-left: 8mm;}
ul.toc.level3 > li {padding-left: 11mm;}
ul.toc.level4 > li {padding-left: 13mm;}
ul.toc.level5 > li {padding-left: 15mm;}
ul.toc.level6 > li {padding-left: 17mm;}
ul.toc.level2 > li {
padding-left: 8mm;
font-family: "CMU Serif Extra";
}
ul.toc > li > a::after {
content: leader(dotted);
text-decoration: none;
color: black;
ul.toc.level2 > li > a {
font-family: "CMU Serif Extra";
}
ul.toc.level3 > li {padding-left: 11mm}
ul.toc.level4 > li {padding-left: 13mm}
ul.toc.level3 > li > a::after, ul.toc.level4 > li > a::after {content: leader(dotted)}
ul.toc > li > span {
position: absolute;
right: -10mm;
}
ul.toc.level2 > li > span {font-family: "CMU Serif Extra"}
ul.toc > li > span::after {
content: target-counter(attr(href), page);
}
......@@ -189,7 +188,7 @@ p.authors {
img {
display: block;
position: relative;
width: 10cm;
width: 80mm;
margin: auto;
padding-top: 5mm;
padding-bottom: 5mm;
......
......@@ -6,7 +6,11 @@
<pathelement location="lib/ant-contrib.jar"/>
</classpath>
</taskdef>
<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask">
<classpath>
<pathelement location="lib/jing.jar"/>
</classpath>
</taskdef>
<!-- Arguments properties -->
<property name="InputPath" location="input"/>
......@@ -69,9 +73,10 @@
<trycatch property="foo" reference="bar">
<try>
<jing file="${tmpdir}/META-INF/container.xml" rngfile="${Schema}/container/hdoc1-container.rng"></jing>
<echo>container.xml is valid</echo>
</try>
<catch>
<echo>Validation failed</echo>
<echo>Warning : Validation for container.xml failed</echo>
</catch>
</trycatch>
......
!SESSION 2016-12-17 12:27:04.837 -----------------------------------------------
eclipse.buildId=debbuild
java.version=1.8.0_111
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=fr_FR
Command-line arguments: -os linux -ws gtk -arch x86_64
!ENTRY org.eclipse.core.resources 2 10035 2016-12-17 12:27:08.384
!MESSAGE The workspace exited with unsaved changes in the previous session; refreshing workspace to recover changes.
!SESSION 2016-12-19 10:30:32.885 -----------------------------------------------
eclipse.buildId=debbuild
java.version=1.8.0_111
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=fr_FR
Command-line arguments: -os linux -ws gtk -arch x86_64
!ENTRY org.eclipse.core.resources 2 10035 2016-12-19 10:30:47.991
!MESSAGE The workspace exited with unsaved changes in the previous session; refreshing workspace to recover changes.
!SESSION 2017-01-09 11:50:02.177 -----------------------------------------------
eclipse.buildId=debbuild
java.version=1.8.0_111
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=fr_FR
Command-line arguments: -os linux -ws gtk -arch x86_64
!ENTRY org.eclipse.core.resources 2 10035 2017-01-09 11:50:05.612
!MESSAGE The workspace exited with unsaved changes in the previous session; refreshing workspace to recover changes.
!SESSION 2017-01-09 11:54:48.350 -----------------------------------------------
eclipse.buildId=debbuild
java.version=1.8.0_111
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=fr_FR
Command-line arguments: -os linux -ws gtk -arch x86_64
!ENTRY org.eclipse.core.resources 2 10035 2017-01-09 11:55:07.703
!MESSAGE The workspace exited with unsaved changes in the previous session; refreshing workspace to recover changes.