# Hdoc to Opale

This project is under [GPL 3.0 license](


### Autumn 2015

* Jeffrey DECORDE
* Jean-Baptiste MARTIN

### Previous work

* Rémi Vansteelandt


"Hdoc to Opale" is an hdoc to Opale converter. It is a set of ANT scripts and XSL files.


No particular dependencies are needed to run the converter.

## User Documentation

### Running the script

* Put the `.hdoc` files in the input folder
* Run `run.bat` or `` according to your operating system
* The output files are in the output folder

### How it works

Previously, the converter processed all files at the same time. Now it processes the files one by one, which will allow some improvements in the future.


### Single file conversion with parameter

The script currently doesn't support any parameter to specify the file to convert. If there are multiple files in the `input` directory, the script converts all of them.

## Known bugs

### New bugs

* The build failed for an unknown reason : This bug happens when you run the converter on Windows. It is actually a file permissions issue, but we haven't been able to solve it. We added sleep instructions to reduce the error rate, but this is clearly not a proper solution.

### Previous bugs not solved

* Resource files : Each resource file used in content.xml is copied into the scar archive, and links work in the outputFile. But once you are in Opale you will see (in the explorer) that the icon of each resource file is crossed out (the outputFile's isn't, though).
* Hdoc's "container.xml" namespace : All hdoc samples given in the "Download" section have been tested and should work well. If you want to use your own hdoc files, make sure that the root element of container.xml has version="1.0" and xmlns="" attributes. Otherwise the ANT build will fail.
* Hdoc's "content.xml" path : It is not specified in the hdoc standard, but I assume that the full path of hdoc's content file (usually named "content.xml") is a relative path that starts at the root of the hdoc archive. That means that the full-path attribute of <rootfile> (in container.xml) begins with an ordinary character, and not with a special sequence such as ".", ".." or "/". This is already the case for my samples, but if you want to use your own samples, make sure this full-path attribute is correct.
* Table cell size : The hdoc <table> markup is supported, yet cell sizes may look awkward in Opale. To see what I mean, convert sample02.hdoc and put the .scar file into Opale : there is a table whose first column is oversized.
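
To illustrate the two container.xml requirements above, here is a minimal sketch of a file that should satisfy them. The `<rootfiles>` wrapper is an assumption borrowed from the EPUB-style container layout, and the `xmlns` value is left as a placeholder: copy the real one from the provided samples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: replace the xmlns placeholder with the namespace used by the provided hdoc samples. -->
<container version="1.0" xmlns="...">
  <!-- The <rootfiles> wrapper is assumed, not confirmed by this README. -->
  <rootfiles>
    <!-- full-path is relative to the archive root: no leading ".", ".." or "/". -->
    <rootfile full-path="content.xml"/>
  </rootfiles>
</container>
```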

## Todo list

* Single file conversion with parameter :
* Solve bugs
* (Optional) Port to XSLT 2.0

## Technical Notes

### How the converter works

This converter uses the standard NF29 conversion project structure : a main ANT file (hdoc_to_opale.ant) handles routine tasks (zipping/unzipping archives, copying files), XSL-XSLT transformations and calls to other ANT scripts.
This main ANT file is composed of several targets : we chose not to use their "depends" attributes in order to make the build process easier to understand (and, if needed, to correct).
Basically, the default target (named "convert") calls every other target or ANT script in a precise order via the <antcall> (or <ant>) task.
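
The structure described above can be sketched as follows. This is an illustration only, not the content of hdoc_to_opale.ant: every target name except "convert" is hypothetical, and the target bodies are replaced by `<echo>` placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project name="hdoc_to_opale" default="convert" basedir=".">

  <!-- Default target: no "depends" attribute, the order is spelled out explicitly. -->
  <target name="convert">
    <antcall target="unzipHdoc"/>
    <antcall target="transform"/>
    <antcall target="zipScar"/>
    <antcall target="clean"/>
  </target>

  <!-- Hypothetical targets standing in for the real conversion steps. -->
  <target name="unzipHdoc">
    <echo message="unzip the hdoc archive"/>
  </target>
  <target name="transform">
    <echo message="run the XSL-XSLT transformations"/>
  </target>
  <target name="zipScar">
    <echo message="zip the result into a .scar archive"/>
  </target>
  <target name="clean">
    <echo message="delete temporary files and folders"/>
  </target>

</project>
```

One consequence of this design: `<antcall>` runs each target in a new project context, so properties set inside a called target are not visible to the caller. Values have to be passed down with nested `<param>` elements.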

### Temporary files

During the conversion process, we use several temporary files : their content depends on the hdoc's files (such as "container.xml", "content.xml" and resource files). See details below.

### What does the main ANT file do ?

It processes each file one by one. For each file, it performs the following steps in order :
* It unzips the hdoc file. The unzipped hdoc folder is named "decompressedHdoc".
* It gets the path of hdoc's "container.xml" and applies "transformation0.xsl" to it.
* The output file is named "generateContentPath.xml" (it is an ANT script) and consists of three XSL-XSLT transformations. These transformations are :
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "moveRessourceFiles.xsl" to it. The output file is named "moveRessourceFiles.xml" (an ANT script) and is used to move resource files (such as images, audio files, video files etc.) from "decompressedHdoc" to the future scar archive. This ANT script is called later in the conversion process.
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "transformation2.xsl" to it : this is the main XSL-XSLT transformation, which converts hdoc content into Opale content. The output file is named "outputFile.xml".
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "prepareReferencesConversions.xsl" to it. This is used to find the bibtexml files and convert them to Opale's format. This transformation generates an ANT file that copies the reference files and converts them to Opale's reference file format using bibtex_to_opale.
* It calls "generateContentPath.xml".
* It creates a folder named "decompressedOpale" (this is the future .scar archive). Then, it moves "outputFile.xml", ".wspmeta" and every resource file into decompressedOpale before zipping it into a .scar archive.
* It cleans up every temporary file and folder created during the conversion process : "generateContentPath.xml", "moveRessourceFiles.xml", "decompressedHdoc" and "decompressedOpale".
* Once this is done, the converter creates the "divided" output, which is the same content, but divided into several XML files. This output is mostly useful for larger documents, but it should be used in most cases regardless.
* Creating this output takes several steps :
  * The main ANT file runs addCourseUcIds.xsl, which adds ID attributes and file names to the <courseUc> elements of the previous output, and puts the result in tmp/outputWithCourseUcIds.xml.
  * Once this is done, it runs addCourseUcReferences.xsl, which removes the courseUc contents from the main output and adds references to external XML files instead. The result is put in main.xml.
  * It then runs prepareCourseUcCopies.xsl, which finds all course UCs and creates an ANT file, tmp/exportUnits.ant, that will copy each course UC into an external file.
* Finally, it runs tmp/exportUnits.ant, which copies all course UCs into separate files (that are referenced by the main file).

The converter also performs checks on both the input and the output. It first checks that all XML input files are valid (according to the schema stored in the schema/ folder), and then checks the output files too, to ensure that Opale will be able to open them.
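
The per-file steps listed above could look roughly like this in ANT. It is a sketch under assumptions, not a copy of the real build file: the target name, the `input/sample.hdoc` and `output/sample.scar` paths, the `META-INF/` location of container.xml and the location of the XSL files are all hypothetical. Only the task sequence and the file names quoted in the list come from this README.

```xml
<target name="convertOneFile">
  <!-- 1. Unzip the hdoc archive (input path is hypothetical). -->
  <unzip src="input/sample.hdoc" dest="decompressedHdoc"/>

  <!-- 2. Turn container.xml into an ANT script (container.xml location is assumed). -->
  <xslt in="decompressedHdoc/META-INF/container.xml"
        out="generateContentPath.xml"
        style="transformation0.xsl"/>

  <!-- 3. Run the generated script: it produces moveRessourceFiles.xml,
          outputFile.xml and the script that converts the references. -->
  <ant antfile="generateContentPath.xml"/>

  <!-- 4. Build the future .scar archive. -->
  <mkdir dir="decompressedOpale"/>
  <move file="outputFile.xml" todir="decompressedOpale"/>
  <copy file=".wspmeta" todir="decompressedOpale"/>
  <ant antfile="moveRessourceFiles.xml"/>
  <zip destfile="output/sample.scar" basedir="decompressedOpale"/>

  <!-- 5. Remove temporary files and folders. -->
  <delete file="generateContentPath.xml"/>
  <delete file="moveRessourceFiles.xml"/>
  <delete dir="decompressedHdoc"/>
  <delete dir="decompressedOpale"/>
</target>
```

The "divided" output follows the same pattern: three more `<xslt>` calls, then one `<ant>` call on tmp/exportUnits.ant.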

### What is .wspmeta?

This is a simple file that contains useful information for Opale. Every .scar archive must contain this file; what exactly it does is beyond the scope of this project.

### Resource files management

As you may have noticed, I use a specific transformation to copy images, video files, audio files etc. from decompressedHdoc to decompressedOpale. I used to copy every file and folder in decompressedHdoc to decompressedOpale with an ANT task in the main ANT file, but generating one copy task per referenced file is a better way to do it.
When moveRessourceFiles.xsl is applied to hdoc's content file (usually named "content.xml"), it creates the appropriate ANT copy task for each <img>, <audio>, <video> or <object> markup. All these tasks are included in a single target, which is called from the main ANT file (via moveRessourceFiles.xml).
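
A minimal sketch of that idea, not the actual moveRessourceFiles.xsl: the stylesheet below emits one ANT `<copy>` task per referenced file. It assumes that `<img>`, `<audio>` and `<video>` reference their file through a `src` attribute and `<object>` through `data`, that paths are relative to the root of `decompressedHdoc`, and that files keep the same relative path in `decompressedOpale`. The project and target names are hypothetical. Elements are matched by local name so that the sketch does not depend on the hdoc namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Generate a small ANT project whose single target holds all copy tasks. -->
  <xsl:template match="/">
    <project name="moveRessourceFiles" default="copyResources">
      <target name="copyResources">
        <!-- Select every attribute that points to a resource file (attribute names are assumed). -->
        <xsl:for-each select="//*[local-name()='img' or local-name()='audio' or local-name()='video']/@src
                              | //*[local-name()='object']/@data">
          <!-- "." is the current attribute, so {.} expands to the referenced path. -->
          <copy file="decompressedHdoc/{.}" tofile="decompressedOpale/{.}"/>
        </xsl:for-each>
      </target>
    </project>
  </xsl:template>

</xsl:stylesheet>
```

Applied to a content file containing `<img src="images/logo.png"/>` (a hypothetical path), this sketch would generate a `<copy file="decompressedHdoc/images/logo.png" tofile="decompressedOpale/images/logo.png"/>` task inside the target.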

### Main XSL-XSLT transformation (transformation2.xsl)

It consists of a simple ANT task : the main transformation file ("transformation2.xsl") matches hdoc markup and tries to convert it into Opale markup.
For any further details (supported/unsupported markup and microdata), I suggest you read the comments in "transformation2.xsl" and its Supported/Unsupported section (if needed, send me an e-mail).