Commit 119a60f9 authored by aperdria

Merge branch 'master' of gitlab.utc.fr:crozatst/hdoc

parents 2281bcf4 50e4eeec
# Hdoc Converter Project
License GPL3.0
http://www.gnu.org/licenses/gpl-3.0.txt
Credits:
Université de Technologie de Compiègne (http://www.utc.fr)
NF29 students (http://www4.utc.fr/~nf29)
## What is Hdoc?
Please refer to the [Hdoc converter project website](http://hdoc.crzt.fr/www/co/hdocConverter.html)
>The aim of the project is to propose:
>- a generic XML schema based on XHTML5 for documentary purpose (Hdoc format);
>- a set of converters to transform document formats from and to Hdoc;
>- a web site to manage the converters (Hdoc Converter Portal).
## What is this repository?
This repository gathers some, if not all, of the Hdoc converters.
Project URL: https://gitlab.utc.fr/crozatst/hdoc.git
## How to use Hdoc converters?
To use a converter, open the corresponding folder and follow the instructions in its README.md.
# antce
"antce" is not meant to be used directly; it is only a base for an autonomous, multi-OS Ant launcher.
# Etherpad to Hdoc -- HDOC CONVERTER PROJECT
## [TL;DR](http://i.imgur.com/18B7f07.jpg)
- This module converts one or more [Etherpad](http://etherpad.org/) files (exported as HTML) to the Hdoc format.
- To do so:
# Converter etherpad_to_hdoc
## License
License GPL3.0
http://www.gnu.org/licenses/gpl-3.0.txt
## Credits
- Jean-Côme Douteau
- Gabrielle Rit
- Jean Vintache
- Fecherolle Cécile
## Presentation
This module converts one or more [Etherpad](http://etherpad.org/) files (exported as HTML) to the Hdoc format.
## User documentation
### Running etherpad_to_hdoc.ant
1. Create an Etherpad document and export it as an HTML file.
2. Place your HTML files in the `/input` folder.
3. Run the `run.[bat|sh]` script that matches your OS.
4. Retrieve the Hdoc outputs from the `/output` folder.
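The steps above can also be scripted, for example from a larger build. Below is a minimal Python sketch. It assumes the converter folder holds an `input/` directory and the `run.bat`/`run.sh` launchers described above. `launcher_name`, `pending_inputs` and `run_converter` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical helpers around the converter's run.bat / run.sh launchers.


def launcher_name(platform_name=sys.platform):
    """Return run.bat on Windows and run.sh on other systems."""
    return "run.bat" if platform_name.startswith("win") else "run.sh"


def pending_inputs(converter_dir):
    """List the .html files waiting in <converter_dir>/input (empty if absent)."""
    input_dir = Path(converter_dir) / "input"
    if not input_dir.is_dir():
        return []
    return sorted(p.name for p in input_dir.glob("*.html"))


def run_converter(converter_dir):
    """Run the OS-specific launcher from the converter folder; raises on failure."""
    script = launcher_name()
    # Assumption: run.bat runs under cmd.exe and run.sh under a POSIX sh.
    cmd = ["cmd", "/c", script] if script == "run.bat" else ["sh", script]
    subprocess.run(cmd, cwd=converter_dir, check=True)
```

Calling `pending_inputs(".")` before `run_converter(".")` shows which exports the run will pick up.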
## Unsupported
- Markdown
- Author attribution
- Etherpad timeline
- Chat
## Known bugs
Nested lists (a list inside a list item) are not supported.
Example:
```html
<ul>
  <li>
    <ul>
      <li>
        Never gonna give you up.
      </li>
    </ul>
  </li>
</ul>
```
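Because nested lists are not supported, it can help to check an Etherpad export before converting it. The following is a minimal sketch using only Python's standard `html.parser`. It flags any `ul`/`ol` that opens while another list is still open. `find_nested_lists` is a hypothetical helper, not part of the converter.

```python
from html.parser import HTMLParser

LIST_TAGS = {"ul", "ol"}


class NestedListFinder(HTMLParser):
    """Record where a ul/ol opens while another ul/ol is still open."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # number of list elements currently open
        self.nested_lines = []   # 1-based source lines of nested list openings

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            if self.depth > 0:
                self.nested_lines.append(self.getpos()[0])
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in LIST_TAGS and self.depth > 0:
            self.depth -= 1


def find_nested_lists(html_text):
    """Return the line numbers of lists nested inside another list."""
    finder = NestedListFinder()
    finder.feed(html_text)
    finder.close()
    return finder.nested_lines
```

For a list nested like the example above, the function reports the line where the inner `<ul>` opens. An empty result means the export contains no nested lists.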
## TODO
- Markdown
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the necessary classes (antlib, htmlcleaner, jing)
- Create the working directory tree
#### Transformations
- Use htmlcleaner to convert the input file from HTML to XHTML. For more information, see http://htmlcleaner.sourceforge.net/index.php.
- Apply `html2xhtml.xsl`: extracts the content inside the `<body>` element.
- Apply `html2xhtmlv1.xsl`: a fix that adds a `br` tag at the end of lists (`ul` and `ol`).
- Apply `html2xhtmlv2.xsl`: wraps each text line in a `p` tag and transforms non-Hdoc tags such as `s`, `u` and `strong` into Hdoc tags.
- Apply `html2xhtml3.xsl`: a fix that removes the `p` tag when its child is a `ul` or `ol`.
- Apply `xhtml2hdoc.xsl`: transforms the content into the Hdoc structure.
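The `html2xhtml3.xsl` fix can be pictured with the following Python sketch using `xml.etree.ElementTree`. It is not the project's XSL. It assumes that "removing the `p` tag" means unwrapping it: the list and any surrounding text are kept, and only the enclosing `p` disappears. `unwrap_list_paragraphs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ul", "ol"}


def local_name(tag):
    """Strip a '{namespace}' prefix so XHTML and plain tags compare equally."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def _append_text(parent, kept, text):
    """Attach text after the last kept child, or to parent.text if none is kept yet."""
    if not text:
        return
    if kept:
        kept[-1].tail = (kept[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def unwrap_list_paragraphs(root):
    """Replace each <p> that has a <ul>/<ol> child by the <p>'s own content."""
    for parent in list(root.iter()):
        kept, changed = [], False
        for child in list(parent):
            is_list_p = local_name(child.tag) == "p" and any(
                local_name(c.tag) in LIST_TAGS for c in child)
            if is_list_p:
                _append_text(parent, kept, child.text)   # text before the list
                kept.extend(list(child))                  # the list itself, etc.
                _append_text(parent, kept, child.tail)   # text after the </p>
                changed = True
            else:
                kept.append(child)
        if changed:
            parent[:] = kept
    return root
```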
#### Post-transformation actions
- Build the Hdoc structure
- Validate the output file against the appropriate RNG schema with Jing
- Zip the directory into an Hdoc archive
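The final archiving step can be reproduced outside Ant, for instance to repackage a directory after editing it by hand. Below is a minimal sketch using Python's standard `zipfile` module. The `.hdoc` file name is an assumption, and the sketch does not check that the directory has the internal layout the Hdoc schema expects. `zip_directory` is a hypothetical helper.

```python
import os
import zipfile


def zip_directory(src_dir, archive_path):
    """Zip every file below src_dir, storing paths relative to src_dir."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Store "sub/file.xml" rather than an absolute path.
                arcname = os.path.relpath(full_path, src_dir)
                archive.write(full_path, arcname)
    return archive_path
```

Write the archive outside `src_dir`; otherwise the walk may pick up the archive being written.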
### Supported tags
HTML tags -> Hdoc tags:
- u, s, em, strong -> em
- li -> li
- ol -> ol
- br -> p
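The mapping above can be illustrated with Python's `xml.etree.ElementTree`. This is a sketch, not the project's XSL, and it makes two assumptions. First, the input is un-namespaced XHTML. Second, `br -> p` is read as "the content between two line breaks becomes a paragraph", which is an interpretation of the list. `rename_tags` and `split_on_br` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Inline and list mappings taken from the "Supported tags" list above.
RENAME = {"u": "em", "s": "em", "em": "em", "strong": "em", "li": "li", "ol": "ol"}


def rename_tags(root):
    """Rename supported HTML elements in place to their Hdoc counterparts."""
    for elem in root.iter():
        if elem.tag in RENAME:
            elem.tag = RENAME[elem.tag]
    return root


def split_on_br(container):
    """Turn runs of content separated by <br/> into consecutive <p> elements."""
    paragraphs = []
    current = ET.Element("p")
    current.text = container.text
    for child in list(container):
        if child.tag == "br":
            paragraphs.append(current)
            current = ET.Element("p")
            current.text = child.tail  # text after the break opens the next paragraph
        else:
            current.append(child)  # the child's tail text moves along with it
    paragraphs.append(current)
    container.text = None
    # Drop paragraphs that would be empty (e.g. after a trailing <br/>).
    container[:] = [p for p in paragraphs if (p.text and p.text.strip()) or len(p)]
    return container
```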
## Capitalisation
We learned how to use XSL stylesheets with a text file as input: we had to use regular expressions to extract the content.
# Etherpad To Lexique -- HDOC CONVERTER PROJECT
How to retrieve an Etherpad document and transform it into a Lexique document.
*The paths given here are relative to this README file.*
1. Download an Etherpad document in HTML format
   1. Create or join an Etherpad document, then export it in HTML format (`Import/Export` button)
   2. **Save the document as `pad.html`** in the `/input` folder (create the folder if it does not exist)
      NB: for now, there must be only one file with this name.
2. Run `/run.bat` or `/run.sh` depending on your OS; a `.scar` file is created in the `/output` folder
3. Open the generated document with Lexique
   1. Open Scenari, open the list of remote repositories and choose UTC-etu_lexique.
   2. Go to the sandBox/etherpad-to-lexique folder.
   3. Drag and drop your `.scar` file into the folder, or right-click the folder and choose Import.
   4. Open the newly created Main.xml file.
# Etherpad2Lexique -- HDOC CONVERTER PROJECT
## License
[GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt)
## Credits
- Rit Gabrielle
- Vintache Jean
- Douteau Jean-Côme
- Fecherolle Cécile (2014)
## Presentation
How to transform an Etherpad document into a Lexique document.
File paths in this document are relative to this README file.
## Dependencies
- Etherpad2Hdoc
- Hdoc2Lexique
## User Documentation
1. Download an Etherpad document in HTML format.
   1. Create or join an Etherpad document, then export it in HTML format (Import/Export button) into the `/input` directory (create the directory if it does not exist).
   2. Name it `pad.html`.
2. Run `/run.bat` or `/run.sh` depending on your OS. A `.scar` file is created in the `/output` directory.
   *If the `/input` directory contains several files, not all of them are processed.*
3. Open the document with Scenari.
   1. Open Scenari and choose "UTC-etu_lexique" as the remote repository.
   2. Go to the `sandBox/etherpad-to-lexique` directory.
   3. Import your `.scar` file into the directory.
   4. Open the newly created Main.xml file.
## Unsupported
- Markdown
- Timeline and author attribution
- Chat
## Known bugs
Nested lists (a list inside a list item) are not supported.
Example:
```html
<ul>
  <li>
    <ul>
      <li>
        Never gonna let you down.
      </li>
    </ul>
  </li>
</ul>
```
## TODO
- Work with markdown
- Correct nested lists
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the necessary classes (antlib, htmlcleaner, jing)
- Create the working directory tree
#### Transformations
- Use htmlcleaner to convert the input file from HTML to XHTML. For more information, see http://htmlcleaner.sourceforge.net/index.php.
- Apply `html2xhtml.xsl`: extracts the content inside the `<body>` element.
- Apply `html2xhtmlv1.xsl`: a fix that adds a `br` tag at the end of lists (`ul` and `ol`).
- Apply `html2xhtmlv2.xsl`: wraps each text line in a `p` tag and transforms non-Hdoc tags such as `s`, `u` and `strong` into Hdoc tags.
- Apply `html2xhtml3.xsl`: a fix that removes the `p` tag when its child is a `ul` or `ol`.
- Apply `xhtml2hdoc.xsl`: transforms the content into the Hdoc structure.
#### Post-transformation actions
- Build the Hdoc structure
- Validate the output file against the appropriate RNG schema with Jing
- Zip the directory into an Hdoc archive
## Capitalisation
# Etherpad To Opale -- HDOC CONVERTER PROJECT
How to retrieve an Etherpad document and transform it into an Opale document.
*The paths given here are relative to this README file.*
1. Download an Etherpad document in HTML format
   1. Create or join an Etherpad document, then export it in HTML format (`Import/Export` button) into the `/input` folder (create the folder if it does not exist)
2. Run `/run.bat` or `/run.sh` depending on your OS; a `.scar` file is created in the `/output` folder
   *If the `/input` folder contains several HTML files, they are all processed.*
3. Open the generated document with Opale
   1. Open Scenari, open the list of remote repositories and choose UTC-etu_opale.
   2. Go to the sandBox/etherpad-to-opale folder.
   3. Drag and drop your `.scar` file into the folder, or right-click the folder and choose Import.
   4. Open the newly created Main.xml file.
# Etherpad2Opale -- HDOC CONVERTER PROJECT
## License
[GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt)
## Credits
- Rit Gabrielle
- Vintache Jean
- Douteau Jean-Côme
- Fecherolle Cécile (2014)
## Presentation
How to transform an Etherpad document into an Opale document.
File paths in this document are relative to this README file.
## Dependencies
- Etherpad2Hdoc
- Hdoc2Opale
## User Documentation
1. Download an Etherpad document in HTML format.
   1. Create or join an Etherpad document, then export it in HTML format (Import/Export button) into the `/input` directory (create the directory if it does not exist).
2. Run `/run.bat` or `/run.sh` depending on your OS. A `.scar` file is created in the `/output` directory.
   *If the `/input` directory contains several files, all of them are processed.*
3. Open the document with Opale.
   1. Open Scenari and choose "UTC-etu_opale" as the remote repository.
   2. Go to the `sandBox/etherpad-to-opale` directory.
   3. Import your `.scar` file into the directory.
   4. Open the newly created Main.xml file.
## Unsupported
- Markdown
- Timeline and author attribution
- Chat
## Known bugs
Nested lists (a list inside a list item) are not supported.
Example:
```html
<ul>
  <li>
    <ul>
      <li>
        Never gonna give you up.
      </li>
    </ul>
  </li>
</ul>
```
## TODO
- Work with markdown
- Correct nested lists
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the necessary classes (antlib, htmlcleaner, jing)
- Create the working directory tree
#### Transformations
- Use htmlcleaner to convert the input file from HTML to XHTML. For more information, see http://htmlcleaner.sourceforge.net/index.php.
- Apply `html2xhtml.xsl`: extracts the content inside the `<body>` element.
- Apply `html2xhtmlv1.xsl`: a fix that adds a `br` tag at the end of lists (`ul` and `ol`).
- Apply `html2xhtmlv2.xsl`: wraps each text line in a `p` tag and transforms non-Hdoc tags such as `s`, `u` and `strong` into Hdoc tags.
- Apply `html2xhtml3.xsl`: a fix that removes the `p` tag when its child is a `ul` or `ol`.
- Apply `xhtml2hdoc.xsl`: transforms the content into the Hdoc structure.
#### Post-transformation actions
- Build the Hdoc structure
- Validate the output file against the appropriate RNG schema with Jing
- Zip the directory into an Hdoc archive
## Capitalisation
# Etherpad To Optim -- HDOC CONVERTER PROJECT
How to retrieve an Etherpad document and transform it into an Optim document.
*The paths given here are relative to this README file.*
1. Download an Etherpad document in HTML format
   1. Create or join an Etherpad document, then export it in HTML format (`Import/Export` button)
   2. **Save the document as `pad.html`** in the `/input` folder (create the folder if it does not exist)
      NB: for now, there must be only one file with this name.
2. Run `/run.bat` or `/run.sh` depending on your OS; a `.scar` file is created in the `/output` folder
3. Open the generated document with Optim
   1. Open Scenari, open the list of remote repositories and choose UTC-etu_optim.
   2. Go to the sandBox/etherpad-to-optim folder.
   3. Drag and drop your `.scar` file into the folder, or right-click the folder and choose Import.
   4. Open the newly created Main.xml file.
# Etherpad2Optim -- HDOC CONVERTER PROJECT
## License
[GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt)
## Credits
- Rit Gabrielle
- Vintache Jean
- Douteau Jean-Côme
- Fecherolle Cécile (2014)
## Presentation
How to transform an Etherpad document into an Optim document.
File paths in this document are relative to this README file.
## Dependencies
- Etherpad2Hdoc
- Hdoc2Optim
## User Documentation
1. Download an Etherpad document in HTML format.
   1. Create or join an Etherpad document, then export it in HTML format (Import/Export button) into the `/input` directory (create the directory if it does not exist).
   2. Name it `pad.html`.
2. Run `/run.bat` or `/run.sh` depending on your OS. A `.scar` file is created in the `/output` directory.
   *If the `/input` directory contains several files, not all of them are processed.*
3. Open the document with Scenari.
   1. Open Scenari and choose "UTC-etu_optim" as the remote repository.
   2. Go to the `sandBox/etherpad-to-optim` directory.
   3. Import your `.scar` file into the directory.
   4. Open the newly created Main.xml file.
## Unsupported
- Markdown
- Timeline and author attribution
- Chat
## Known bugs
Nested lists (a list inside a list item) are not supported.
Example:
```html
<ul>
  <li>
    <ul>
      <li>
        Never gonna let you down.
      </li>
    </ul>
  </li>
</ul>
```
## TODO
- Work with markdown
- Correct nested lists
## Technical notes
### Description of etherpad_to_hdoc.ant
#### Prelude
- Import the necessary classes (antlib, htmlcleaner, jing)
- Create the working directory tree
#### Transformations
- Use htmlcleaner to convert the input file from HTML to XHTML. For more information, see http://htmlcleaner.sourceforge.net/index.php.
- Apply `html2xhtml.xsl`: extracts the content inside the `<body>` element.
- Apply `html2xhtmlv1.xsl`: a fix that adds a `br` tag at the end of lists (`ul` and `ol`).
- Apply `html2xhtmlv2.xsl`: wraps each text line in a `p` tag and transforms non-Hdoc tags such as `s`, `u` and `strong` into Hdoc tags.
- Apply `html2xhtml3.xsl`: a fix that removes the `p` tag when its child is a `ul` or `ol`.
- Apply `xhtml2hdoc.xsl`: transforms the content into the Hdoc structure.
#### Post-transformation actions
- Build the Hdoc structure
- Validate the output file against the appropriate RNG schema with Jing
- Zip the directory into an Hdoc archive
## Capitalisation
@@ -12,7 +12,7 @@
<xsl:template match="h:html">
<project name="moveRessourceFiles" basedir="." default="moveRessourceFiles">
<property file="global.properties"/>
<property name="filename" location="{$filename}"/>
<property name="filename2" location="{$filename}"/>
<target name="moveRessourceFiles">
<xsl:apply-templates select="./*"/>
</target>
@@ -32,16 +32,16 @@
<!-- Targeted markups. -->
<xsl:template match="h:img">
<copy tofile="${{filename}}/decompressedOpale/res/{./@src}" file="${{filename}}/decompressedHdoc/{./@src}"/>
<copy tofile="${{filename2}}/decompressedOpale/res/{./@src}" file="${{filename2}}/decompressedHdoc/{./@src}"/>
</xsl:template>
<xsl:template match="h:audio">
<copy tofile="${{filename}}/decompressedOpale/res/{./@src}" file="${{filename}}/decompressedHdoc/{./@src}"/>
<copy tofile="${{filename2}}/decompressedOpale/res/{./@src}" file="${{filename2}}/decompressedHdoc/{./@src}"/>
</xsl:template>
<xsl:template match="h:video">
<copy tofile="${{filename}}/decompressedOpale/res/{./@src}" file="${{filename}}/decompressedHdoc/{./@src}"/>
<copy tofile="${{filename2}}/decompressedOpale/res/{./@src}" file="${{filename2}}/decompressedHdoc/{./@src}"/>
</xsl:template>
<xsl:template match="h:object">
<copy tofile="${{filename}}/decompressedOpale/res/{./@data}" file="${{filename}}/decompressedHdoc/{./@data}"/>
<copy tofile="${{filename2}}/decompressedOpale/res/{./@data}" file="${{filename2}}/decompressedHdoc/{./@data}"/>
</xsl:template>
<!-- These markups are matched in order to minimize "apply-templates" side-effects (i.e. their contents are not relevant for this transformation). -->
......
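The stylesheet above generates an Ant file with one `copy` per media resource (`img`, `audio`, `video` via `src`, `object` via `data`). Each resource goes from `decompressedHdoc/` to `decompressedOpale/res/`. The same idea in Python, as a hedged sketch: it assumes the `h:` prefix is bound to the XHTML namespace, which the visible hunk does not show. `resource_paths` and `copy_resources` are hypothetical names.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

XHTML = "{http://www.w3.org/1999/xhtml}"  # assumed binding of the h: prefix
# Element -> attribute holding the resource path, as in the templates above.
RESOURCE_ATTRS = {"img": "src", "audio": "src", "video": "src", "object": "data"}


def resource_paths(content_root):
    """Collect the resource paths referenced by media elements."""
    paths = []
    for local, attr in RESOURCE_ATTRS.items():
        for elem in content_root.iter(XHTML + local):
            value = elem.get(attr)
            if value:
                paths.append(value)
    return paths


def copy_resources(base_dir, content_root):
    """Copy each resource from decompressedHdoc/ to decompressedOpale/res/."""
    base = Path(base_dir)
    copied = []
    for rel in resource_paths(content_root):
        source = base / "decompressedHdoc" / rel
        target = base / "decompressedOpale" / "res" / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        copied.append(target)
    return copied
```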
@@ -18,7 +18,7 @@
<property name="lib" location="${{basedir}}/lib"/>
<property name="log" location="${{basedir}}/log"/>
<property name="schema" location="${{basedir}}/schema"/>
<property name="filename" location="{$filename}"/>
<property name="filename" location="${$filename}"/>
<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask">
<classpath>
<pathelement location="../${lib}/jing.jar"/>
......
@@ -36,7 +36,7 @@
style="${{xsl}}/moveRessourceFiles.xsl"
processor="org.apache.tools.ant.taskdefs.optional.TraXLiaison"
>
<param name="filename" expression="${{tmp}}/${{filename}}"/>
<param name="filename" expression="${{tmp}}\${{filename}}"/>
</xslt>
<chmod file="${{tmp}}/${{filename}}/moveRessourceFiles.xml" perm="777"/>
<xslt
......
@@ -55,7 +55,7 @@
</target>
<target name="buildOutput" depends="unzipSource">
<xslt classpath="${libdir}/saxon9he.jar" style="${xsldir}/termToRef.xsl" basedir="${srcdir}" destdir="${OutputPath}" followsymlinks="false" extension=".ref">
<xslt classpath="${libdir}/saxon9he.jar" style="${xsldir}/termToRdf.xsl" basedir="${srcdir}" destdir="${OutputPath}" followsymlinks="false" extension=".rdf">
<include name="**/*.term"/>
</xslt>
<copy file="opale.wspmeta" tofile="${OutputPath}/.wspmeta"></copy>
......
@@ -4,11 +4,9 @@
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:ont="nf29ont"
xmlns:sp="http://www.utc.fr/ics/scenari/v3/primitive"
xmlns:sc="http://www.utc.fr/ics/scenari/v3/core"
xmlns:lx="scpf.org:lexicon"
xmlns:op="utc.fr:ics/opale3"
exclude-result-prefixes="xs"
exclude-result-prefixes="xs sp sc lx"
version="2.0">
<xsl:output method="xml" indent="yes" />
......
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological