Hdoc to Opale
===

License
-------

This project is licensed under the [GPL 3.0](http://www.gnu.org/licenses/gpl-3.0.txt).

Credits
-------

### Autumn 2015

* Ivan D'HALLUIN
* Jeffrey DECORDE
* Jean-Baptiste MARTIN

### Previous work

* Maxime MARGERIN
* Rémi Vansteelandt

Presentation
---

"Hdoc to Opale" converts hdoc files into Opale files. It is a set of ANT scripts and XSL files.

Dependencies
---

No particular dependencies are needed to run the converter.

User Documentation
---

### Running the script

* Put the `.hdoc` files in the input folder
* Run `run.bat` or `run.sh` according to your operating system
* The output files are in the output folder

### How it works

Previously, the converter processed all files at the same time. It now processes the files one by one, which will allow some improvements in the future.

Unsupported
---

### Single file conversion with parameter

The script does not currently support a parameter to specify which file to convert. If the `input` directory contains several files, the script converts all of them. To support single-file conversion, I suggest adding an optional parameter to `run.bat` and `run.sh` whose value is the name of an input file. Then, in the main ANT file, `hdoc_to_opale.ant`, check whether the parameter is present and choose the processing accordingly.
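
The selection logic suggested above could look like the following. This is only a sketch in Python rather than ANT, and `select_hdoc_files` and its `only` parameter are hypothetical names that do not exist in the project.

```python
from pathlib import Path
from typing import List, Optional


def select_hdoc_files(input_dir: Path, only: Optional[str] = None) -> List[Path]:
    """Return the .hdoc files to convert.

    `only` is the hypothetical optional parameter: the name of a single
    file inside `input_dir`. When it is omitted, every .hdoc file is
    returned, which matches the current behaviour of the converter.
    """
    if only is None:
        return sorted(input_dir.glob("*.hdoc"))
    candidate = input_dir / only
    if not candidate.is_file():
        raise FileNotFoundError(f"{only} not found in {input_dir}")
    return [candidate]
```

In the real scripts, the same decision would be made in `hdoc_to_opale.ant` once `run.bat` or `run.sh` forwards the parameter.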

Known bugs
---

### New bugs

* The build fails for an unknown reason: this bug happens when you run the converter on Windows. It is actually a permissions (access rights) issue, but we have not been able to solve it. We added sleep instructions to reduce the error rate, but this is clearly not a proper solution.

### Previous bugs not solved

* Resource files: each resource file used in content.xml is copied into the scar archive, and links work in the outputFile. However, once you are in Opale, you will see (in the explorer) that the icon of each resource file is crossed out (the outputFile's icon is not).
* Hdoc's "container.xml" namespace: all hdoc samples given in the "Download" section have been tested and should work well. If you want to use your own hdoc files, make sure that the root element of container.xml has the attributes `version="1.0"` and `xmlns="urn:utc.fr:ics:hdoc:container"`. Otherwise the ANT build will fail.
* Hdoc's "content.xml" path: it is not specified in the hdoc standard, but I assume that the full path of hdoc's content file (usually named "content.xml") is a relative path that starts at the root of the hdoc archive. This means that the `full-path` attribute of `<rootfile>` (in container.xml) begins with a regular character, and not with a special sequence such as ".", ".." or "/". This is already the case for my samples, but if you want to use your own samples, make sure this `full-path` attribute is correct.
* Table cell size: hdoc's `<table>` markup is supported, yet cell sizes may look awkward in Opale. To see what I mean, convert sample02.hdoc and import the .scar file into Opale: it contains a table whose first column is oversized.
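
If you use your own hdoc files, the two container.xml constraints above can be checked before running the converter. The following Python sketch is not part of the project; `check_container` is a hypothetical helper, and it only assumes what is stated above (the root attributes and the `full-path` attribute of `<rootfile>`).

```python
import xml.etree.ElementTree as ET
from typing import List

HDOC_CONTAINER_NS = "urn:utc.fr:ics:hdoc:container"


def check_container(xml_text: str) -> List[str]:
    """Return a list of problems found in a container.xml document."""
    problems = []
    root = ET.fromstring(xml_text)
    # ElementTree exposes a namespaced tag as "{namespace}localname".
    if not root.tag.startswith("{" + HDOC_CONTAINER_NS + "}"):
        problems.append("root element is not in the hdoc container namespace")
    if root.get("version") != "1.0":
        problems.append('root element lacks version="1.0"')
    # Match <rootfile> by local name, whatever its namespace is.
    rootfiles = [e for e in root.iter() if e.tag.rsplit("}", 1)[-1] == "rootfile"]
    if not rootfiles:
        problems.append("no <rootfile> element found")
    for rootfile in rootfiles:
        path = rootfile.get("full-path")
        if not path:
            problems.append("<rootfile> has no full-path attribute")
        elif path.startswith(("./", "../", "/")):
            problems.append(f"full-path {path!r} must start at the archive root")
    return problems
```

An empty list means the file satisfies both constraints.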


Todo list
---

* Single file conversion with parameter
* Clean up bibtex_to_opale: the code in this folder seems never to be used, although it is called in the file prepareReferencesConversions.xsl. If this folder really is unused, it should be deleted.
* Determine which Opale version is supported
* Solve bugs

Technical Notes
---

### How the converter works

This converter uses the standard NF29 conversion project structure: a main ANT file (hdoc_to_opale.ant) handles routine tasks (zipping/unzipping archives, copying files), XSL-XSLT transformations and calls to other ANT scripts.
This main ANT file is composed of several targets: we chose not to use their "depends" attributes in order to make the build process easier to understand (and, if needed, to correct).
Basically, the default target (named "convert") calls every other target or ANT script in a precise order via the `<antcall>` (or `<ant>`) task.
The files in the input folder are processed one by one.

### Temporary files

During the conversion process, we use several temporary files: their content depends on the hdoc's files (such as "container.xml", "content.xml" and resource files). There is a temporary folder for each file processed.

### What does the main ANT file do?

It processes the files one by one. For each file, it performs the following steps, in this order:

* It unzips the hdoc file. The unzipped hdoc folder is named "decompressedHdoc".
* It gets the path of hdoc's "container.xml" and applies "transformation0.xsl" to it.
* The output file is named "generateContentPath.xml" (it is an ANT script) and consists of three XSL-XSLT transformations. These transformations are:
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "moveRessourceFiles.xsl" to it. The output file is named "moveRessourceFiles.xml" (an ANT script) and is used to move resource files (such as images, audio files, video files etc.) from "decompressedHdoc" to the future scar archive. This ANT script is called later in the conversion process.
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "transformation2.xsl" to it: this is the main XSL-XSLT transformation, which converts hdoc content into Opale content. The output file is named "outputFile.xml".
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "prepareReferencesConversions.xsl" to it. This is used to find the bibtexml files and convert them to Opale's format. This transformation generates an ANT file that copies the reference files and converts them to Opale's reference file format using bibtex_to_opale.
* It calls "generateContentPath.xml".
* It creates a folder named "decompressedOpale" (this is the future .scar archive). Then, it moves "outputFile.xml", ".wspmeta" and every resource file into decompressedOpale before zipping it into a .scar archive.
* It cleans up every temporary file and folder created during the conversion process: "generateContentPath.xml", "moveRessourceFiles.xml", "decompressedHdoc" and "decompressedOpale".
* Once this is done, the converter creates the "divided" output, which has the same content, but divided into several XML files. This output is most useful for larger documents, but it should be used in most cases regardless.
* In order to create this output, there are several steps:
  * The main ANT file runs addCourseUcIds.xsl, which adds ID attributes and file names to the `<courseUc>` elements of the previous output, and puts the result in tmp/outputWithCourseUcIds.xml.
  * Once this is done, it runs addCourseUcReferences.xsl, which removes the courseUc contents from the main output and adds references to external XML files instead. The result is put in main.xml.
  * It then runs prepareCourseUcCopies.xsl, which finds all course UCs and creates an ANT file, tmp/exportUnits.ant, that will execute a copy for each course UC and put it in an external file.
* Finally, it runs tmp/exportUnits.ant, which copies all course UCs into separate files (that are referenced by the main file).

The converter also performs checks on both the input and the output. It first checks that all XML input files are valid (according to the schema stored in the schema/ folder), and then checks the output files too, in order to ensure that Opale will be able to open them.
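
To make the per-file flow easier to picture, here is a rough Python sketch of the same sequence (unzip, transform, zip into a .scar, clean up). It is not how the converter is implemented: the real work is done by ANT and XSL, and `convert_one` and its `transform` callback are hypothetical stand-ins for the transformation steps.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def convert_one(hdoc: Path, output_dir: Path,
                transform: Callable[[Path, Path], None]) -> Path:
    """Convert a single .hdoc archive into a .scar archive.

    `transform(src_dir, dst_dir)` is a hypothetical stand-in for the XSL
    steps: it reads the unzipped hdoc and writes the Opale files.
    """
    scar = output_dir / (hdoc.stem + ".scar")
    # One temporary folder per processed file, deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "decompressedHdoc"
        dst = Path(tmp) / "decompressedOpale"
        src.mkdir()
        dst.mkdir()
        with zipfile.ZipFile(hdoc) as archive:
            archive.extractall(src)
        transform(src, dst)
        # Zip the content of decompressedOpale into the .scar archive.
        with zipfile.ZipFile(scar, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(dst.rglob("*")):
                if path.is_file():
                    archive.write(path, path.relative_to(dst).as_posix())
    return scar
```

Calling `convert_one` in a loop over the files of the input folder mirrors the "one by one" processing described above.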

### What is .wspmeta?

This is a simple file that contains useful information for Opale. Every .scar archive must contain this file; what exactly it does is outside the scope of this project.

### Resource file management

As you may have noticed, I use a specific transformation to copy images, video files, audio files etc. from decompressedHdoc to decompressedOpale. I used to copy every file and folder in decompressedHdoc to decompressedOpale with an ANT task in the main ANT file, but there is a better way to do it.
When moveRessourceFiles.xsl is applied to hdoc's content file (usually named "content.xml"), it creates the appropriate ANT copy task for each `<img>`, `<audio>`, `<video>` or `<object>` element. All these tasks are included in a single target, which is called from the main ANT file (via moveRessourceFiles.xml).
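
The idea behind this transformation (list only the files that the content actually references) can be illustrated in Python. This is a sketch, not a port of moveRessourceFiles.xsl: `resource_paths` is a hypothetical function, and the attribute that holds the file path on each element (`src`, or `data` for `<object>`) is an assumption based on the usual HTML conventions.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: each element references its file through this attribute.
RESOURCE_ATTRIBUTES = {"img": "src", "audio": "src", "video": "src", "object": "data"}


def resource_paths(content_xml: str) -> List[str]:
    """List the resource files referenced by a content file, in document order."""
    paths: List[str] = []
    for element in ET.fromstring(content_xml).iter():
        # Drop the "{namespace}" prefix so the check works in any namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        attribute = RESOURCE_ATTRIBUTES.get(local_name)
        if attribute is None:
            continue
        path = element.get(attribute)
        if path and path not in paths:
            paths.append(path)
    return paths
```

Each returned path corresponds to one copy task in the generated ANT script.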

### Main XSL-XSLT transformation (transformation2.xsl)

It consists of a simple ANT task: the main transformation file ("transformation2.xsl") matches hdoc markup and tries to convert it into Opale markup.