# Converter etherpad_to_hdoc

## License
License GPL3.0
http://www.gnu.org/licenses/gpl-3.0.txt

## Credits
- 2015
    - Jean-Côme Douteau
    - Gabrielle Rit
    - Jean Vintache
- 2014
    - Fecherolle Cécile

## Presentation
This module converts one or more [etherpad](http://etherpad.org/) documents, exported as HTML files, to the hdoc format.

## User documentation

## Running etherpad_to_hdoc.ant
1. Create an etherpad document and export it as an HTML file.
2. Place your HTML files in the `/input` folder.
3. Run the `run.[bat|sh]` script that matches your OS.
4. Retrieve the hdoc outputs from the `/output` folder.

## Unsupported
- Markdown
- Authorship attribution
- Etherpad timeline
- Chat

## Known bugs
- Nested lists (a list inside another list) are not supported.
Example:
`<ul>
	<li>
		<ul>
			<li>
			Never gonna give you up.
			</li>
		</ul>
	</li>
</ul>`
- As a consequence, etherpad indentation is not supported, because etherpad encodes it as nested lists.
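
Because nested lists are not converted, it can help to check an exported file before running the converter. The following is a minimal sketch in Python, not part of this module: the class and function names (`NestedListDetector`, `has_nested_lists`) are hypothetical, and it only assumes that the export uses ordinary `ul` and `ol` tags.

```python
from html.parser import HTMLParser


class NestedListDetector(HTMLParser):
    """Flags any <ul>/<ol> opened while another list is still open."""

    LIST_TAGS = {"ul", "ol"}

    def __init__(self):
        super().__init__()
        self.depth = 0           # number of lists currently open
        self.has_nested = False  # set once a list is found inside a list

    def handle_starttag(self, tag, attrs):
        if tag in self.LIST_TAGS:
            self.depth += 1
            if self.depth > 1:
                self.has_nested = True

    def handle_endtag(self, tag):
        if tag in self.LIST_TAGS and self.depth > 0:
            self.depth -= 1


def has_nested_lists(html_text):
    """Return True if the HTML contains a list inside another list."""
    detector = NestedListDetector()
    detector.feed(html_text)
    detector.close()
    return detector.has_nested
```

Calling `has_nested_lists` on the text of each file in the `/input` folder indicates which documents use nested lists or etherpad indentation and are therefore likely to hit this bug.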

## TODO
- Markdown

## Technical notes
### Description of etherpad_to_hdoc.ant

#### Prelude
- Import the required libraries (antlib, htmlcleaner, jing)
- Create the directory tree

#### Transformations
- Use htmlcleaner to convert the input file from HTML to XHTML. For more information, see http://htmlcleaner.sourceforge.net/index.php.
- Apply html2xhtml.xsl: this stylesheet extracts the content inside the `<body>` tags.
- Apply html2xhtmlv1.xsl: this stylesheet is a fix that adds a `br` tag at the end of each list (`ul` and `ol`).
- Apply html2xhtmlv2.xsl: this stylesheet wraps each line of text in `p` tags and transforms non-hdoc tags, such as `s`, `u` and `strong`, into hdoc tags.
- Apply html2xhtml3.xsl: this stylesheet is a fix that removes a `p` tag when its child is a `ul` or `ol`.
- Apply xhtml2hdoc.xsl: this stylesheet transforms the content into the hdoc structure.
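
To make the `p` removal fix concrete, here is a rough Python sketch of the same idea. It is an illustration only, not the stylesheet itself: the function name `unwrap_list_paragraphs` is hypothetical, and it assumes a `p` counts as a wrapper only when a single `ul` or `ol` is its sole content. The actual matching rule used by the XSL is not shown in this README.

```python
import xml.etree.ElementTree as ET


def unwrap_list_paragraphs(element):
    """Replace, in place, each <p> that only wraps a <ul>/<ol> by that list."""
    for index, child in enumerate(list(element)):
        # Handle deeper levels first so inner wrappers are removed too.
        unwrap_list_paragraphs(child)
        inner = list(child)
        is_wrapper = (
            child.tag == "p"
            and len(inner) == 1
            and inner[0].tag in ("ul", "ol")
            and not (child.text or "").strip()  # assumption: no text before the list
        )
        if is_wrapper:
            # Keep whatever followed the <p> attached after the list.
            inner[0].tail = child.tail
            # One-for-one replacement, so the remaining indexes stay valid.
            element[index] = inner[0]


body = ET.fromstring("<body><p><ul><li>a</li></ul></p><p>text</p></body>")
unwrap_list_paragraphs(body)
print(ET.tostring(body, encoding="unicode"))
```

Paragraphs that contain ordinary text are left untouched; only the paragraphs that wrap a list are replaced.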

#### Post-transformation actions
- Build the hdoc structure
- Use Jing to validate the output file against the appropriate RNG schema
- Zip the directory into an hdoc archive
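
The zipping step is done by the Ant script. As a rough equivalent, the sketch below shows how a directory can be packed into a zip archive with Python's standard library. The function name `zip_directory` and any paths passed to it are hypothetical, and the file layout expected inside a real hdoc archive is not described here.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, archive_path):
    """Pack every file under source_dir into a zip archive at archive_path."""
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir, with forward slashes.
                archive.write(path, path.relative_to(source).as_posix())
    return archive_path
```

The archive should be written outside `source_dir`; otherwise the walk would pick up the archive file while it is being created.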

### Supported tags
- html tags -> hdoc tags
- u, s, em, strong -> em
- li -> li
- ol -> ol
- br -> p
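
The mapping above can be written as a simple lookup table. This Python sketch only restates the list; the names `TAG_MAP` and `hdoc_tag` are hypothetical, and the real conversion is done by the XSL stylesheets. In particular, `br -> p` involves restructuring the text rather than a plain rename.

```python
# HTML tag -> hdoc tag, as listed in this README.
TAG_MAP = {
    "u": "em",
    "s": "em",
    "em": "em",
    "strong": "em",
    "li": "li",
    "ol": "ol",
    "br": "p",
}


def hdoc_tag(html_tag):
    """Return the hdoc tag for an HTML tag, or None if it is not listed."""
    return TAG_MAP.get(html_tag.lower())
```

A tag that returns `None` is not in the list of supported tags.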

## Lessons learned
Using regular expressions in XSL is a good way to parse a non-XML file.