Commit b4664a3f authored by Hachem Rihab

Merge branch 'master' of gitlab.utc.fr:crozatst/hdoc

parents 381abc42 b91fb2d1
# HDOC CONVERTER PROJECT
# Hdoc Converter Project
License GPL3.0
http://www.gnu.org/licenses/gpl-3.0.txt
Credits :
Université de Technologie de Compiègne (http://www.utc.fr)
NF29 students (http://www4.utc.fr/~nf29)
## What is Hdoc ?
Please refer to the [Hdoc converter project website](http://hdoc.crzt.fr/www/co/hdocConverter.html)
>The aim of the project is to propose:
>- a generic XML schema based on XHTML5 for documentary purpose (Hdoc format);
>- a set of converters to transform document formats from and to Hdoc;
>- a web site to manage the converters (Hdoc Converter Portal).
Please refer to the Hdoc Converter Project website:
http://hdoc.crzt.fr
## What is this repository ?
This repository gathers some of the Hdoc converters, if not all of them.
Project URL : https://gitlab.utc.fr/crozatst/hdoc.git
## How to use Hdoc Converters ?
This repository gathers some of the Hdoc converters, if not all of them.
In order to use a converter, choose the corresponding folder and consult README.md for instructions.
# antce
"antce" is not for use, it is just a base for autonomous multi-OS ANT launcher
# Etherpad to Hdoc -- HDOC CONVERTER PROJECT
## [TL;DR](http://i.imgur.com/18B7f07.jpg)
- This module is able to convert several [etherpad](http://etherpad.org/) files (exported as html files) to the hdoc format.
- To do so :
1. please place your html files in the `/input` folder
# Converter etherpad_to_hdoc
## License
License GPL3.0
http://www.gnu.org/licenses/gpl-3.0.txt
## Credits
- Jean-Côme Douteau
- Gabrielle Rit
- Jean Vintache
- Fecherolle Cécile
## Presentation
This module is able to convert several [etherpad](http://etherpad.org/) files (exported as html files) to the hdoc format.
## User documentation
### Running etherpad_to_hdoc.ant
1. Create an etherpad document and export it as an html file.
1. please place your html files in the `/input` folder
2. run the `run.[bat|sh]` script of your choice depending on your OS
3. and retrieve the hdoc outputs in the `/output` folder
## Unsupported
- Markdown
- Authorship attribution
- Etherpad timeline
- Chat
## Known bugs
- Nested lists in lists are not supported
Example :
`<ul>
<li>
<ul>
<li>
Never gonna give you up.
</li>
</ul>
</li>
</ul>`
- As a consequence, etherpad indentation is not supported because it is coded as nested lists.
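As a rough pre-check, nested lists can be detected in an exported html file before running the converter. The sketch below is not part of the converter: `NestedListFinder` and `count_nested_lists` are hypothetical names, and it relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class NestedListFinder(HTMLParser):
    """Counts <ul>/<ol> elements that are opened inside another list."""

    LIST_TAGS = {"ul", "ol"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of lists currently open
        self.nested = 0  # lists opened while another list was still open

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            if self.depth > 0:
                self.nested += 1
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.LIST_TAGS and self.depth > 0:
            self.depth -= 1


def count_nested_lists(html_text):
    """Return how many lists in html_text are nested inside another list."""
    finder = NestedListFinder()
    finder.feed(html_text)
    finder.close()
    return finder.nested
```

A result greater than zero means the file contains at least one nested list and is likely to hit the bug described above.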
## TODO
- Markdown
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the required libraries (antlib, htmlcleaner, jing)
- Create the directory tree
#### Transformations
- Use htmlcleaner to transform the input file from html to xhtml. For more info, see http://htmlcleaner.sourceforge.net/index.php.
- Apply html2xhtml.xsl : this XSL extracts the content into <body> tags
- Apply html2xhtmlv1.xsl : this XSL is a fix that adds a br tag at the end of lists (ul and ol)
- Apply html2xhtmlv2.xsl : this XSL surrounds each text line with p tags and transforms non-hdoc tags such as s, u and strong into hdoc tags
- Apply html2xhtml3.xsl : this XSL is a fix that deletes a p tag when its child is a ul or ol
- Apply xhtml2hdoc.xsl : this XSL transforms the content into the hdoc structure
#### Post-transformations actions
- Build hdoc structure
- Jing checks that the output file is valid against the corresponding RNG schema
- Zip the directory into hdoc archive
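The final zip step can be sketched outside Ant as follows. This is only an illustration in Python, not the converter's code: `zip_hdoc` is a hypothetical helper, and the check for `META-INF/container.xml` assumes the layout that the Ant build files of this project read from a decompressed hdoc.

```python
import zipfile
from pathlib import Path


def zip_hdoc(source_dir, archive_path):
    """Zip source_dir into archive_path, keeping paths relative to source_dir.

    Assumption: a valid hdoc directory contains META-INF/container.xml.
    archive_path should be located outside source_dir, otherwise the
    archive would try to include itself.
    """
    source = Path(source_dir)
    if not (source / "META-INF" / "container.xml").is_file():
        raise FileNotFoundError("META-INF/container.xml is missing")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store entries with forward slashes, relative to the root.
                archive.write(path, path.relative_to(source).as_posix())
    return archive_path
```

For example, `zip_hdoc("output/mydoc", "output/mydoc.hdoc")` would produce the archive next to the directory; both paths are placeholders.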
### Supported tags
html tags -> hdoc tags
- u, s, em, strong -> em
- li -> li
- ol -> ol
- br -> p
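The mapping above can be written as a simple lookup table. This is a sketch in Python for illustration only (the converter itself performs the mapping in XSL); `HTML_TO_HDOC` and `to_hdoc_tag` are hypothetical names.

```python
# Mapping copied from the "Supported tags" list (html tag -> hdoc tag).
HTML_TO_HDOC = {
    "u": "em",
    "s": "em",
    "em": "em",
    "strong": "em",
    "li": "li",
    "ol": "ol",
    "br": "p",
}


def to_hdoc_tag(html_tag):
    """Return the hdoc tag for an html tag, or None if the tag is not listed."""
    return HTML_TO_HDOC.get(html_tag.lower())
```

Tags that are not in the list return `None`, which leaves the decision about unsupported tags to the caller.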
## Capitalisation
We learned how to use XSL stylesheets with a text file as input : we had to use regular expressions to extract the content.
# Etherpad To Lexique -- HDOC CONVERTER PROJECT
How to retrieve an Etherpad document and transform it into a lexique document.
*the file paths given here are relative to this readme file*
1. Download an Etherpad document in HTML format
1. Create or join an etherpad document, then export it in html format (`Import/Export` button)
2. **Save the document as `pad.html`** in the `/input` folder (create the folder if it does not exist)
NB: for now, there must be only one file with this name.
2. Run `/run.bat` or `/run.sh` depending on the OS ; a `.scar` file is created in the `/output` folder
3. Open the resulting document with lexique
1. Open Scenari, open the list of remote repositories and choose UTC-etu_lexique.
2. Go to the sandBox/etherpad-to-lexique folder.
3. Drag and drop your `.scar` file into the folder, or right-click the folder and choose Import.
4. Open the newly created Main.xml file.
# Etherpad2Lexique -- HDOC CONVERTER PROJECT
## License
[GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt)
## Credits
- Rit Gabrielle
- Vintache Jean
- Douteau Jean-Côme
- Fecherolle Cécile (2014)
## Presentation
How to transform an etherpad document into a lexique document.
File paths in this document are relative to this readme file.
## Dependencies
- Etherpad2Hdoc
- Hdoc2Lexique
## User Documentation
1. Download an etherpad document in html format.
1. Create or join an etherpad document then export it in html format (Import/Export Button) in the `/input` directory (if the directory does not exist, create it).
2. Name it pad.html
2. Execute the file `/run.bat` or `/run.sh` depending on the OS. A `.scar` file is created in the directory `/output`
*If the `/input` directory contains multiple files, they will not all be processed.*
3. Open the document with Scenari
1. Open Scenari, and choose "UTC-etu_lexique" as the remote repository.
2. Go to the directory `sandBox/etherpad-to-lexique`.
3. Import your `.scar` file into the directory.
4. Open the newly created Main.xml file.
## Unsupported
- Markdown
- Timeline and authorship attribution
- Chat
## Known bugs
Nested lists are not supported
example :
`<ul>
<li>
<ul>
<li>
Never gonna let you down.
</li>
</ul>
</li>
</ul>`
## TODO
- Work with markdown
- Correct nested lists
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the required libraries (antlib, htmlcleaner, jing)
- Create the directory tree
#### Transformations
- Use htmlcleaner to transform the input file from html to xhtml. For more info, see http://htmlcleaner.sourceforge.net/index.php.
- Apply html2xhtml.xsl : this XSL extracts the content into <body> tags
- Apply html2xhtmlv1.xsl : this XSL is a fix that adds a br tag at the end of lists (ul and ol)
- Apply html2xhtmlv2.xsl : this XSL surrounds each text line with p tags and transforms non-hdoc tags such as s, u and strong into hdoc tags
- Apply html2xhtml3.xsl : this XSL is a fix that deletes a p tag when its child is a ul or ol
- Apply xhtml2hdoc.xsl : this XSL transforms the content into the hdoc structure
#### Post-transformations actions
- Build hdoc structure
- Jing checks that the output file is valid against the corresponding RNG schema
- Zip the directory into hdoc archive
## Capitalisation
# Etherpad To Opale -- HDOC CONVERTER PROJECT
How to retrieve an Etherpad document and transform it into an opale document.
*the file paths given here are relative to this readme file*
1. Download an Etherpad document in HTML format
1. Create or join an etherpad document, then export it in html format (`Import/Export` button) in the `/input` folder (create the folder if it does not exist)
2. Run `/run.bat` or `/run.sh` depending on the OS ; a `.scar` file is created in the `/output` folder
*if the `/input` folder contains several html files, they are all processed*
3. Open the resulting document with opale
1. Open Scenari, open the list of remote repositories and choose UTC-etu_opale.
2. Go to the sandBox/etherpad-to-opale folder.
3. Drag and drop your `.scar` file into the folder, or right-click the folder and choose Import.
4. Open the newly created Main.xml file.
# Etherpad2Opale -- HDOC CONVERTER PROJECT
## License
[GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt)
## Credits
- Rit Gabrielle
- Vintache Jean
- Douteau Jean-Côme
- Fecherolle Cécile (2014)
## Presentation
How to transform an etherpad document into an opale document.
File paths in this document are relative to this readme file.
## Dependencies
- Etherpad2Hdoc
- Hdoc2Opale
## User Documentation
1. Download an etherpad document in html format.
1. Create or join an etherpad document then export it in html format (Import/Export Button) in the `/input` directory (if the directory does not exist, create it).
2. Execute the file `/run.bat` or `/run.sh` depending on the OS. A `.scar` file is created in the directory `/output`
*If the `/input` directory contains multiple files, they are all processed.*
3. Open the document with Opale
1. Open Scenari, and choose "UTC-etu_opale" as the remote repository.
2. Go to the directory `sandBox/etherpad-to-opale`.
3. Import your `.scar` file into the directory.
4. Open the newly created Main.xml file.
## Unsupported
- Markdown
- Timeline and authorship attribution
- Chat
## Known bugs
Nested lists are not supported
example :
`<ul>
<li>
<ul>
<li>
Never gonna give you up.
</li>
</ul>
</li>
</ul>`
## TODO
- Work with markdown
- Correct nested lists
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the required libraries (antlib, htmlcleaner, jing)
- Create the directory tree
#### Transformations
- Use htmlcleaner to transform the input file from html to xhtml. For more info, see http://htmlcleaner.sourceforge.net/index.php.
- Apply html2xhtml.xsl : this XSL extracts the content into <body> tags
- Apply html2xhtmlv1.xsl : this XSL is a fix that adds a br tag at the end of lists (ul and ol)
- Apply html2xhtmlv2.xsl : this XSL surrounds each text line with p tags and transforms non-hdoc tags such as s, u and strong into hdoc tags
- Apply html2xhtml3.xsl : this XSL is a fix that deletes a p tag when its child is a ul or ol
- Apply xhtml2hdoc.xsl : this XSL transforms the content into the hdoc structure
#### Post-transformations actions
- Build hdoc structure
- Jing checks that the output file is valid against the corresponding RNG schema
- Zip the directory into hdoc archive
## Capitalisation
# Etherpad To Optim -- HDOC CONVERTER PROJECT
How to retrieve an Etherpad document and transform it into an optim document.
*the file paths given here are relative to this readme file*
1. Download an Etherpad document in HTML format
1. Create or join an etherpad document, then export it in html format (`Import/Export` button)
2. **Save the document as `pad.html`** in the `/input` folder (create the folder if it does not exist)
NB: for now, there must be only one file with this name.
2. Run `/run.bat` or `/run.sh` depending on the OS ; a `.scar` file is created in the `/output` folder
3. Open the resulting document with optim
1. Open Scenari, open the list of remote repositories and choose UTC-etu_optim.
2. Go to the sandBox/etherpad-to-optim folder.
3. Drag and drop your `.scar` file into the folder, or right-click the folder and choose Import.
4. Open the newly created Main.xml file.
# Etherpad2Optim -- HDOC CONVERTER PROJECT
## License
[GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt)
## Credits
- Rit Gabrielle
- Vintache Jean
- Douteau Jean-Côme
- Fecherolle Cécile (2014)
## Presentation
How to transform an etherpad document into an optim document.
File paths in this document are relative to this readme file.
## Dependencies
- Etherpad2Hdoc
- Hdoc2Optim
## User Documentation
1. Download an etherpad document in html format.
1. Create or join an etherpad document then export it in html format (Import/Export Button) in the `/input` directory (if the directory does not exist, create it).
2. Name it pad.html
2. Execute the file `/run.bat` or `/run.sh` depending on the OS. A `.scar` file is created in the directory `/output`
*If the `/input` directory contains multiple files, they will not all be processed.*
3. Open the document with Scenari
1. Open Scenari, and choose "UTC-etu_optim" as the remote repository.
2. Go to the directory `sandBox/etherpad-to-optim`.
3. Import your `.scar` file into the directory.
4. Open the newly created Main.xml file.
## Unsupported
- Markdown
- Timeline and authorship attribution
- Chat
## Known bugs
Nested lists are not supported
example :
`<ul>
<li>
<ul>
<li>
Never gonna let you down.
</li>
</ul>
</li>
</ul>`
## TODO
- Work with markdown
- Correct nested lists
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the required libraries (antlib, htmlcleaner, jing)
- Create the directory tree
#### Transformations
- Use htmlcleaner to transform the input file from html to xhtml. For more info, see http://htmlcleaner.sourceforge.net/index.php.
- Apply html2xhtml.xsl : this XSL extracts the content into <body> tags
- Apply html2xhtmlv1.xsl : this XSL is a fix that adds a br tag at the end of lists (ul and ol)
- Apply html2xhtmlv2.xsl : this XSL surrounds each text line with p tags and transforms non-hdoc tags such as s, u and strong into hdoc tags
- Apply html2xhtml3.xsl : this XSL is a fix that deletes a p tag when its child is a ul or ol
- Apply xhtml2hdoc.xsl : this XSL transforms the content into the hdoc structure
#### Post-transformations actions
- Build hdoc structure
- Jing checks that the output file is valid against the corresponding RNG schema
- Zip the directory into hdoc archive
## Capitalisation
@@ -17,20 +17,15 @@
<target name="convert">
<delete dir="${tmp}" failonerror="false"/>
<sleep seconds="2"/>
<sleep seconds="1"/>
<mkdir dir="${tmp}"/>
<delete dir="${out}" failonerror="false"/>
<sleep seconds="2"/>
<sleep seconds="1"/>
<mkdir dir="${out}"/>
<delete dir="${log}" failonerror="false"/>
<sleep seconds="2"/>
<sleep seconds="1"/>
<mkdir dir="${log}"/>
<antcall target="UnzipHdocFile"/>
<antcall target="ValidateInput" />
<antcall target="FindContentFiles"/>
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
@@ -38,187 +33,137 @@
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<ant antfile="${tmp}/${filename}/generateContentPath.xml">
<property name="filename" value="${filename}"/>
</ant>
</sequential>
</for>
<antcall target="ValidateOutput"/>
<antcall target="DivideOutput"/>
<antcall target="UnzipHdocFile">
<param name="filename" value="${filename}"/>
</antcall>
<antcall target="ZipOutput"/>
<antcall target="ZipDividedOutput"/>
<antcall target="ValidateInput">
<param name="filename" value="${filename}"/>
</antcall>
</target>
<antcall target="FindContentFiles">
<param name="filename" value="${filename}"/>
</antcall>
<target name="CleanDirectory">
<delete>
<fileset dir="${tmp}">
<include name="*"/>
</fileset>
</delete>
</target>
<ant antfile="${tmp}/${filename}/generateContentPath.xml">
<property name="filename" value="${filename}"/>
</ant>
<target name="UnzipHdocFile">
<!-- Unzip the input hdoc file. Decompressed folder is named "decompressedHdoc" : this name is the only one which
refers to the hdoc file furthermore in this project. -->
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<unzip src="${in}/${filename}" dest="${tmp}/${filename}/decompressedHdoc"/>
<chmod dir="${tmp}/${filename}/decompressedHdoc" perm="777"/>
</sequential>
</for>
</target>
<antcall target="ValidateOutput">
<param name="filename" value="${filename}"/>
</antcall>
<target name="FindContentFiles">
<!-- Finds the absolute path of container.xml and applies transformation0.xsl on it.-->
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<first id="first">
<fileset dir="${tmp}/${filename}/decompressedHdoc/META-INF" includes="**/container.xml"/>
</first>
<xslt in="${toString:first}" out="${tmp}/${filename}/generateContentPath.xml" style="${xsl}/transformation0.xsl" processor="org.apache.tools.ant.taskdefs.optional.TraXLiaison">
<param name="filename" expression="${filename}"/>
<param name="lib" expression="${lib}"/>
</xslt>
<chmod file="${tmp}/${filename}/generateContentPath.xml" perm="777"/>
</sequential>
</for>
</target>
<antcall target="DivideOutput">
<param name="filename" value="${filename}"/>
</antcall>
<target name="ZipOutput">
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<propertyregex property="properFilename" input="${filename}" regexp=".hdoc" replace="" casesensitive="false" override="true" />
<antcall target="ZipOutput">
<param name="filename" value="${filename}"/>
</antcall>
<copy file="${bibtex}/.wspmeta" todir="${tmp}/${filename}/decompressedOpale"/>
<mkdir dir="${tmp}/${filename}/decompressedOpale/res"/>
<ant antfile="${tmp}/${filename}/moveRessourceFiles.xml"/>
<zip basedir="${tmp}/${filename}/decompressedOpale" destfile="${out}/${properFilename}/output.scar" encoding="UTF-8"/>
</sequential>
</for>
</target>
<antcall target="ZipDividedOutput">
<param name="filename" value="${filename}"/>
</antcall>
<target name="ZipDividedOutput">
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<propertyregex property="properFilename" input="${filename}" regexp=".hdoc" replace="" casesensitive="false" override="true" />
<copy file="${bibtex}/.wspmeta" todir="${tmp}/${filename}/decompressedOpaleDivided"/>
<copy todir="${tmp}/${filename}/decompressedOpaleDivided/res" >
<fileset dir="${tmp}/${filename}/decompressedOpale/res" includes="**"/>
</copy>
<copy todir="${tmp}/${filename}/decompressedOpaleDivided/references" >
<fileset dir="${tmp}/${filename}/decompressedOpale/references" includes="**"/>
</copy>
<zip basedir="${tmp}/${filename}/decompressedOpaleDivided" destfile="${out}/${properFilename}/dividedOutput.scar" encoding="UTF-8"/>
</sequential>
</for>
</target>
<!-- Validating the XML container file -->
<target name="ValidateInput">
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<trycatch property="foo" reference="bar">
<try>
<jing file="${tmp}/${filename}/decompressedHdoc/META-INF/container.xml" rngfile="${schema}/hdoc1-container.rng"></jing>
</try>
<catch>
<echo>Validation failed</echo>
</catch>
</trycatch>
</sequential>
</for>
</target>
<!-- Validating the XML output -->
<target name="ValidateOutput">
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<trycatch property="foo" reference="bar">
<try>
<jing file="${tmp}/${filename}/decompressedOpale/main.xml" rngfile="${schema}/op_ue.rng"></jing>
</try>
<catch>
<echo>Validation failed</echo>
</catch>
</trycatch>
</sequential>
</for>
<target name="UnzipHdocFile">
<!-- Unzip the input hdoc file. Decompressed folder is named "decompressedHdoc" : this name is the only one which
refers to the hdoc file furthermore in this project. -->
<unzip src="${in}/${filename}" dest="${tmp}/${filename}/decompressedHdoc"/>
<chmod dir="${tmp}/${filename}/decompressedHdoc" perm="777"/>
</target>
<target name="DivideOutput">
<for param="inputFile">
<path>
<fileset dir="${in}" includes="**/*.hdoc"/>
</path>
<sequential>
<local name="filename"/>
<basename property="filename" file="@{inputFile}"/>
<mkdir dir="${tmp}/${filename}/decompressedOpaleDivided"/>
<!-- Adding IDS to the general output file -->
<xslt
in="${tmp}/${filename}/decompressedOpale/main.xml"
out="${tmp}/${filename}/outputWithCourseUcIds.xml"
style="${xsl}/addCourseUcIds.xsl"
processor="org.apache.tools.ant.taskdefs.optional.TraXLiaison"
/>
<!-- Generating the root file (with refs to other files) -->
<xslt
in="${tmp}/${filename}/outputWithCourseUcIds.xml"
out="${tmp}/${filename}/decompressedOpaleDivided/main.xml"
style="${xsl}/addCourseUcReferences.xsl"
processor="org.apache.tools.ant.taskdefs.optional.TraXLiaison"
/>
<!-- Generating the ANT file that will copy the files -->
<xslt
in="${tmp}/${filename}/outputWithCourseUcIds.xml"
out="${tmp}/${filename}/exportUnits.ant"
style="${xsl}/prepareCourseUcCopies.xsl"
processor="org.apache.tools.ant.taskdefs.optional.TraXLiaison"
>
<param name="filename" expression="${filename}"/>
</xslt>
<!-- Executing that ANT file -->
<ant antfile="${tmp}/${filename}/exportUnits.ant"/>
</sequential>
</for>
</target>
<target name="FindContentFiles">
<!-- Finds the absolute path of container.xml and applies transformation0.xsl on it.-->
<first id="first">
<fileset dir="${tmp}/${filename}/decompressedHdoc/META-INF" includes="**/container.xml"/>
</first>
<xslt in="${toString:first}" out="${tmp}/${filename}/generateContentPath.xml" style="${xsl}/transformation0.xsl" processor="org.apache.tools.ant.taskdefs.optional.TraXLiaison">
<param name="filename" expression="${filename}"/>
<param name="lib" expression="${lib}"/>
</xslt>
<chmod file="${tmp}/${filename}/generateContentPath.xml" perm="777"/>
</target>