Hdoc to Opale
===

License
-------

This project is under [GPL 3.0 license](http://www.gnu.org/licenses/gpl-3.0.txt).

Credits
-------

### Autumn 2015

* Ivan D'HALLUIN
* Jeffrey DECORDE
* Jean-Baptiste MARTIN

### Previous work

* Maxime MARGERIN
* Rémi Vansteelandt

Presentation
---

"Hdoc to Opale" converts hdoc files to Opale files. It is a set of ANT scripts and XSL files.

Dependencies
---

No particular dependencies are needed to run the converter.

User Documentation
---

### Running the script

* Put the `.hdoc` files in the input folder
* Run `run.bat` or `run.sh` according to your operating system
* The output files are in the output folder

### How it works

Previously, the converter processed all files at the same time. Now it processes the files one by one, which will allow some improvements in the future.

Unsupported
---

### Single file conversion with parameter

The script currently doesn't support any parameter to specify the file to convert. If multiple files are in the `input` directory, the script converts all of them. To support single-file conversion, I suggest adding an optional parameter to the `run.bat` and `run.sh` files whose value is the name of an input file. Then, in the main ANT file, `hdoc_to_opale.ant`, you will have to check for the presence of the parameter and choose the processing accordingly.
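
The selection logic suggested above can be sketched as follows. This is only an illustration written in Python, not part of the converter (which is ANT-based): the function name `select_inputs` and the `requested` parameter are hypothetical, and the real implementation would live in the run scripts and in `hdoc_to_opale.ant`.

```python
from pathlib import Path


def select_inputs(input_dir, requested=None):
    """Return the .hdoc files to convert, sorted by name.

    `requested` is a hypothetical optional parameter holding one file name.
    """
    input_dir = Path(input_dir)
    if requested is None:
        # Current behaviour: every .hdoc file in the input directory.
        return sorted(input_dir.glob("*.hdoc"))
    candidate = input_dir / requested
    if not candidate.is_file():
        raise FileNotFoundError("no such input file: %s" % candidate)
    # Suggested behaviour: only the file named by the parameter.
    return [candidate]
```

Without the parameter the function keeps today's behaviour, so existing usage would not change.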

Known bugs
---

### New bugs

* The build failed for an unknown reason: This bug happens when you run the converter on Windows. It is actually a permissions issue, but we haven't been able to solve it. We added sleep instructions to reduce the error rate, but this is clearly not the right solution.

### Previous bugs not solved

* Resource files: Each resource file used in content.xml is copied into the scar archive, and links work in the outputFile. But once you are in Opale, you will see (in the explorer) that each resource file's icon is crossed out (the outputFile's isn't, though).
* Hdoc's "container.xml" namespace: All hdoc samples given in the "Download" section have been tested and should work well. If you want to use your own hdoc files, make sure that the root element of container.xml has the attributes `version="1.0"` and `xmlns="urn:utc.fr:ics:hdoc:container"`. Otherwise the ANT build will fail.
* Hdoc's "content.xml" path: The hdoc standard does not specify it, but I assume that the full path of the hdoc content file (usually named "content.xml") is a relative path that starts at the root of the hdoc archive. That means that the `full-path` attribute of `<rootfile>` (in container.xml) begins with a regular character, and not with a special sequence such as ".", ".." or "/". This is already the case for my samples, but if you want to use your own samples, make sure this `full-path` attribute is correct.
* Table cell size: The hdoc `<table>` markup is supported, yet cell sizes may look awkward in Opale. To see this, convert sample02.hdoc and import the .scar file into Opale: the document contains a table whose first column is oversized.
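
The two container.xml conditions listed above can be checked before running the converter. The following is a minimal sketch in Python, independent of the converter itself; the function name `check_container` is hypothetical, and the check treats "./", "../" and "/" prefixes as the special sequences to reject.

```python
import xml.etree.ElementTree as ET

HDOC_CONTAINER_NS = "urn:utc.fr:ics:hdoc:container"


def check_container(xml_text):
    """Return a list of problems found in a container.xml document (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)

    # ElementTree exposes a namespaced tag as "{namespace}localname".
    if not root.tag.startswith("{" + HDOC_CONTAINER_NS + "}"):
        problems.append("root element is not in namespace " + HDOC_CONTAINER_NS)
    if root.get("version") != "1.0":
        problems.append('root element lacks version="1.0"')

    # Inspect every <rootfile>, whatever namespace it ends up in.
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] != "rootfile":
            continue
        full_path = element.get("full-path", "")
        if not full_path or full_path.startswith(("./", "../", "/")):
            problems.append("full-path %r is not a plain relative path" % full_path)
    return problems
```

An empty list means the file satisfies both conditions; each string in a non-empty list names one attribute to fix.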


Todo list
---

* Single file conversion with parameter
* Clean up bibtex_to_opale: the code in this folder seems never to be used, although it is called in the file prepareReferencesConversions.xsl. If this folder is indeed never used, it has to be deleted.
* Qualify the supported Opale version
* Solve bugs

Technical Notes
---

### How the converter works

This converter uses the standard NF29 conversion project structure: a main ANT file (hdoc_to_opale.ant) handles routine tasks (zipping/unzipping archives, copying files), XSLT transformations and calls to other ANT scripts.
This main ANT file is composed of several targets: we chose not to use their "depends" attributes in order to make the build process easier to understand (and, if needed, to correct).
Basically, the default target (named "convert") calls every other target or ANT script in a precise order via the `<antcall>` (or `<ant>`) task.
The files in the input folder are processed one by one.

### Temporary files

During the conversion process, we use several temporary files: their content depends on the hdoc's files (such as "container.xml", "content.xml" and resource files). There is a temporary folder for each file processed.

### What does the main ANT file do?

It processes the files one by one. For each file, it performs the following steps in order:

* It unzips the hdoc file. The unzipped hdoc folder is named "decompressedHdoc".
* It gets the path of hdoc's "container.xml" and applies "transformation0.xsl" to it.
* The output file is named "generateContentPath.xml" (it is an ANT script) and consists of three XSLT transformations. These transformations are:
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "moveRessourceFiles.xsl" to it. The output file is named "moveRessourceFiles.xml" (an ANT script) and is used to move resource files (such as images, audio files, video files etc.) from "decompressedHdoc" to the future scar archive. This ANT script is called later in the conversion process.
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "transformation2.xsl" to it: this is the main XSLT transformation, which converts hdoc content into Opale content. The output file is named "outputFile.xml".
  * A transformation that matches hdoc's content file (usually named "content.xml") and applies "prepareReferencesConversions.xsl" to it. This is used to find the bibtexml files and convert them to Opale's format. This transformation generates an ANT file that copies the reference files and converts them to Opale's reference file format using bibtex_to_opale.
* It calls "generateContentPath.xml".
* It creates a folder named "decompressedOpale" (this is the future .scar archive). Then, it moves "outputFile.xml", ".wspmeta" and every resource file into decompressedOpale before zipping it into a .scar archive.
* It cleans up every temporary file and folder created during the conversion process: "generateContentPath.xml", "moveRessourceFiles.xml", "decompressedHdoc" and "decompressedOpale".
* Once this is done, the converter creates the "divided" output, which is the same content, but divided into several XML files. This output is useful when documents are large, but it should be used in most cases regardless.
* In order to create this output, there are several steps:
  * The main ANT file runs addCourseUcIds.xsl, which adds ID attributes as well as file names to the `<courseUc>` elements of the previous output, and puts the result in tmp/outputWithCourseUcIds.xml.
  * Once this is done, it runs addCourseUcReferences.xsl, which removes the courseUc contents from the main output and adds references to external XML files instead. The result is put in main.xml.
  * It then runs prepareCourseUcCopies.xsl, which finds all course UCs and creates an ANT file in tmp/exportUnits.ant that will execute a copy for each course UC and put it in an external file.
  * Finally, it runs tmp/exportUnits.ant, which copies all course UCs into separate files (that are referenced by the main file).

The converter also performs checks on both the input and the output. It first checks that all XML input files are valid (according to the schema stored in the schema/ folder), and then checks the output files too, in order to ensure that Opale will be able to open them.
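
To make the first two steps of the list above more concrete (opening the hdoc archive and reading container.xml to locate the content file), here is a minimal sketch in Python. It is not how the converter does it (the converter uses ANT and "transformation0.xsl"); the function name `find_content_path` is hypothetical, and the sketch assumes the archive holds a single member named container.xml, wherever it sits in the archive.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile


def find_content_path(hdoc_path):
    """Return the full-path of the content file declared in an hdoc archive."""
    with zipfile.ZipFile(hdoc_path) as archive:
        # Assumption: exactly one archive member is named container.xml.
        container_name = next(
            (name for name in archive.namelist()
             if posixpath.basename(name) == "container.xml"),
            None,
        )
        if container_name is None:
            raise ValueError("no container.xml in " + str(hdoc_path))
        root = ET.fromstring(archive.read(container_name))

    # The first <rootfile> element gives the path of the content file.
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "rootfile":
            return element.get("full-path")
    raise ValueError("no <rootfile> element in " + container_name)
```

The returned value is the path that the later transformations would be applied to, relative to the root of the unzipped archive.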

### What is .wspmeta?

This is a simple file that contains useful information for Opale. Every .scar archive must contain this file; what exactly it does is beyond the scope of this project.

### Resource files management

As you may have noticed, I use a specific transformation to copy images, video files, audio files etc. from decompressedHdoc to decompressedOpale. I used to copy every file and folder in decompressedHdoc to decompressedOpale with an ANT task in my main ANT file, but there is a better way to do it.
When moveRessourceFiles.xsl is applied to hdoc's content file (usually named "content.xml"), it creates the appropriate ANT copy task for each `<img>`, `<audio>`, `<video>` or `<object>` markup. All these tasks are included in a single target which is called from the main ANT file (via moveRessourceFiles.xml).
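
The idea of deriving one copy operation per referenced resource can be illustrated outside XSLT. The sketch below, in Python, only lists the files that would need copying; it is not the converter's code. The function name `list_resource_files` is hypothetical, and the attribute holding the path for each markup (`src` for `<img>`, `<audio>` and `<video>`, `data` for `<object>`) is an assumption, not something stated in this README.

```python
import xml.etree.ElementTree as ET

# Assumption: the attribute that carries the file path for each markup.
RESOURCE_ATTRIBUTES = {"img": "src", "audio": "src", "video": "src", "object": "data"}


def list_resource_files(content_xml_text):
    """Return the resource paths referenced by a content file, in document order."""
    root = ET.fromstring(content_xml_text)
    paths = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        attribute = RESOURCE_ATTRIBUTES.get(local_name)
        if attribute is None:
            continue
        path = element.get(attribute)
        # Keep each path once, even if several markups reference it.
        if path and path not in paths:
            paths.append(path)
    return paths
```

Copying only the listed paths, rather than the whole decompressedHdoc folder, keeps unreferenced files out of the resulting archive.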

### Main XSLT transformation (transformation2.xsl)

It consists of a simple ANT task: the main transformation file ("transformation2.xsl") matches hdoc markups and tries to convert them into Opale markups.