Commit 28def063 authored by Decorde Jeffrey
parent 50e4eeec
Hdoc to Opale
Put the `.hdoc` files in the input folder, run `run.bat` or `` according to your operating system, and collect the `.scar` files from the output folder.
The script can now handle several files.
This project is under [GPL 3.0 license](
### Autumn 2015
* Jeffrey DECORDE
* Jean-Baptiste MARTIN
### Previous work
* Rémi Vansteelandt
"Hdoc to Opale" is an hdoc to Opale converter. It is a set of ANT scripts and XSL files.
No particular dependencies are needed to run the converter.
User Documentation
### Running the script
* Put the `.hdoc` files in the input folder
* Run `run.bat` or `` according to your operating system
* The output files are in the output folder
### How it works
Previously, the converter processed all files at the same time. It now processes the files one by one, which will allow some improvements in the future.
### Single file conversion with parameter
The script currently does not support any parameter to specify the file to convert. If the `input` directory contains multiple files, the script converts all of them.
Known bugs
### New bugs
* The build failed for an unknown reason : This bug happens when the converter is run on Windows. It is actually a permissions issue, but we have not been able to solve it. We added sleep instructions to reduce the error rate, but this is clearly not a proper fix.
### Previous bugs not solved
* Resource files : Each resource file used in content.xml is copied into the scar archive, and links work in the outputFile. However, once you are in Opale, the explorer shows each resource file's icon crossed out (the outputFile's icon is not).
* Hdoc's "container.xml" namespace : All hdoc samples given in the "Download" section have been tested and should work well. If you want to use your own hdoc files, make sure that the root element of container.xml carries the version="1.0" and xmlns="" attributes. Otherwise the ANT build will fail.
* Hdoc's "content.xml" path : The hdoc standard does not specify it, but I assume that the full path of hdoc's content file (usually named "content.xml") is a relative path that starts at the root of the hdoc archive. That means that the full-path attribute of `<rootfile>` (in container.xml) begins with an ordinary character, and not with a special sequence such as ".", ".." or "/". This is already the case for my samples, but if you want to use your own samples, make sure this full-path attribute is correct.
* Table cell size : The hdoc `<table>` markup is supported, yet cell sizes may look awkward in Opale. To see this, convert sample02.hdoc and import the .scar file into Opale : the first column of its table is oversized.
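The two container.xml points above can be checked before running the converter. The following is a minimal Python sketch, not part of the converter itself: the names `check_container` and `expected_ns` are hypothetical, and the expected namespace is left for the caller to supply rather than hard-coded.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def namespace_of(tag):
    """Return the namespace URI of a tag, or '' if it has none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def check_container(xml_text, expected_ns=None):
    """Return a list of problems found in a container.xml document.

    expected_ns is a hypothetical parameter: pass the namespace your
    hdoc files are supposed to declare, or None to skip that comparison.
    """
    problems = []
    root = ET.fromstring(xml_text)

    # The root element must carry version="1.0".
    if root.get("version") != "1.0":
        problems.append('root element lacks version="1.0"')

    # The root element must declare a (non-empty) default namespace.
    ns = namespace_of(root.tag)
    if not ns:
        problems.append("root element has no xmlns attribute")
    elif expected_ns is not None and ns != expected_ns:
        problems.append("unexpected namespace: " + ns)

    # full-path must be relative to the archive root, without a
    # leading "/", "./" or "../".
    rootfiles = [el for el in root.iter() if local_name(el.tag) == "rootfile"]
    if not rootfiles:
        problems.append("no <rootfile> element found")
    for rootfile in rootfiles:
        path = rootfile.get("full-path", "")
        if not path or path.startswith(("/", "./", "../")) or path in (".", ".."):
            problems.append("suspicious full-path: " + repr(path))
    return problems
```

An empty list means that none of the conditions described above were violated. It does not guarantee that the conversion will succeed.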
Todo list
* Single file conversion with parameter :
* Solve bugs
* (Optional) Port to XSLT 2.0
Technical Notes
### How the converter works
This converter uses the standard NF29 conversion project structure : a main ANT file (hdoc_to_opale.ant) handles routine tasks (zipping/unzipping archives, copying files), XSLT transformations, and calls to other ANT scripts.
This main ANT file is composed of several targets : we chose not to use their "depends" attributes in order to make the build process easier to understand (and, if needed, to correct).
Basically, the default target (named "convert") calls every other target or ANT script in a precise order via the `<antcall>` (or `<ant>`) task.
### Temporary files
During the conversion process, we use several temporary files : their content depends on the hdoc's files (such as "container.xml", "content.xml" and resource files). See details below.
### What does the main ANT file do?
It processes the files one by one. For each file, it performs the following steps in order :
* It unzips the hdoc file. The unzipped hdoc folder is named "decompressedHdoc".
* It gets the path of hdoc's "container.xml" and applies "transformation0.xsl" to it.
* The output file is named "generateContentPath.xml" (it is an ANT script) and consists of three XSLT transformations. These transformations are :
    * A transformation that matches hdoc's content file (usually named "content.xml") and applies "moveRessourceFiles.xsl" to it. The output file is named "moveRessourceFiles.xml" (an ANT script) and is used to move resource files (such as images, audio files, video files etc.) from "decompressedHdoc" to the future scar archive. This ANT script is called later in the conversion process.
    * A transformation that matches hdoc's content file (usually named "content.xml") and applies "transformation2.xsl" to it : this is the main XSLT transformation, which converts hdoc content into Opale content. The output file is named "outputFile.xml".
    * A transformation that matches hdoc's content file (usually named "content.xml") and applies "prepareReferencesConversions.xsl" to it. This is used to find the bibtexml files and convert them to Opale's format. This transformation generates an ANT file that copies the reference files and converts them to Opale's reference file format using bibtex_to_opale.
* It calls "generateContentPath.xml".
* It creates a folder named "decompressedOpale" (this is the future .scar archive). Then, it moves "outputFile.xml", ".wspmeta", and every resource file into decompressedOpale before zipping it into a .scar archive.
* It cleans up all temporary files and folders created during the conversion process : "generateContentPath.xml", "moveRessourceFiles.xml", "decompressedHdoc" and "decompressedOpale".
* Once this is done, the converter creates the "divided" output, which is the same content, but divided into several XML files. This output is useful when the documents are larger, but it should be used in most cases regardless.
* In order to create this output, there are several steps :
    * The main ANT file runs addCourseUcIds.xsl, which adds ID attributes and file names to the `<courseUc>` elements of the previous output, and puts the result in tmp/outputWithCourseUcIds.xml.
    * Once this is done, it runs addCourseUcReferences.xsl, which removes the courseUc contents from the main output and adds references to external XML files instead. The result is put in main.xml.
    * It then runs prepareCourseUcCopies.xsl, which finds all course UCs and creates an ANT file in tmp/exportUnits.ant that will execute a copy for each course UC and put it in an external file.
* Finally, it runs tmp/exportUnits.ant, which copies all course UCs into separate files (that are referenced by the main file).
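The idea behind the "divided" output can be illustrated outside of XSLT. The sketch below is a Python approximation under stated assumptions, not the converter's actual implementation. The function name `split_course_units`, the generated file names and the `refUri` attribute are all hypothetical, and nested course UCs are not handled.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def split_course_units(xml_text):
    """Split a single output document into a main document and units.

    Returns (main_xml, units), where units maps a generated file name
    to the serialized courseUc element moved out of the main document.
    """
    root = ET.fromstring(xml_text)
    units = {}
    # Snapshot the elements first so that replacing children is safe.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if local_name(child.tag) != "courseUc":
                continue
            # Hypothetical naming scheme: courseUc1.xml, courseUc2.xml, ...
            file_name = "courseUc%d.xml" % (len(units) + 1)
            # Keep the trailing text in the main document, not in the unit.
            tail, child.tail = child.tail, None
            units[file_name] = ET.tostring(child, encoding="unicode")
            # Replace the content with an empty element that points to
            # the external file ("refUri" is an assumed attribute name).
            reference = ET.Element(child.tag, {"refUri": file_name})
            reference.tail = tail
            parent[index] = reference
    return ET.tostring(root, encoding="unicode"), units
```

Writing each entry of `units` to its own file and the first return value to main.xml would mirror the last two steps of the list above.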
The converter also performs checks on both the input and the output. It first checks that all XML input files are valid (according to the schema stored in the schema/ folder), and then checks the output files too in order to ensure that Opale will be able to open the output files.
### What is .wspmeta?
This is a simple file that contains useful information for Opale. Every .scar archive must contain this file; what exactly it does is outside the scope of this project.
### Resource file management
As you may have noticed, I use a specific transformation in order to copy images, video files, audio files etc. from decompressedHdoc to decompressedOpale. I used to copy every file and folder in decompressedHdoc to decompressedOpale with an ANT task in my main ANT file, but there is a better way to do it.
When moveRessourceFiles.xsl is applied to hdoc's content file (usually named "content.xml"), it creates the appropriate ANT copy task for each `<img>`, `<audio>`, `<video>` or `<object>` element. All these tasks are included in a single target which is called from the main ANT file (via moveRessourceFiles.xml).
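To make the idea concrete, here is a minimal Python sketch of the same selection logic. It is an illustration only, not the converter's code. It assumes that `<img>`, `<audio>` and `<video>` reference their file through a `src` attribute and `<object>` through a `data` attribute, and that these paths are relative to the archive root. The function name `plan_resource_copies` is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

# Assumed attribute holding the file reference for each element name.
RESOURCE_ATTRIBUTES = {"img": "src", "audio": "src", "video": "src", "object": "data"}


def plan_resource_copies(content_xml,
                         source_dir="decompressedHdoc",
                         target_dir="decompressedOpale"):
    """Return the (source, destination) pairs to copy, without duplicates."""
    root = ET.fromstring(content_xml)
    copies = []
    for element in root.iter():
        # Ignore the namespace part of the tag, if any.
        name = element.tag.rsplit("}", 1)[-1]
        attribute = RESOURCE_ATTRIBUTES.get(name)
        if attribute is None:
            continue
        path = element.get(attribute)
        if not path:
            continue
        pair = (posixpath.join(source_dir, path), posixpath.join(target_dir, path))
        if pair not in copies:
            copies.append(pair)
    return copies
```

Only the files actually referenced by the content are listed, which is the advantage over copying the whole decompressedHdoc folder.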
### Main XSLT transformation (transformation2.xsl)
It consists of a simple ANT task : the main transformation file ("transformation2.xsl") matches hdoc markup and tries to convert it into Opale markup.
For any further details (supported/unsupported markup and microdata), I suggest reading the comments in "transformation2.xsl" and its Supported/Unsupported section (if needed, send me an e-mail).