# Converter etherpad_to_hdoc

## License
GNU General Public License v3.0 (GPL-3.0)
http://www.gnu.org/licenses/gpl-3.0.txt

## Credits
	- Jean-Côme Douteau
	- Gabrielle Rit
	- Jean Vintache
	- Fecherolle Cécile

## Presentation
This module converts one or more [etherpad](http://etherpad.org/) documents, exported as html files, to the hdoc format.

## User documentation

### Running etherpad_to_hdoc.ant
	1. Create an etherpad document and export it as an html file.
	2. Place your html files in the `/input` folder.
	3. Run the `run.bat` or `run.sh` script, depending on your OS.
	4. Retrieve the hdoc outputs from the `/output` folder.

## Unsupported
	- Markdown
	- Authorship attribution
	- Etherpad timeline
	- Chat

## Known bugs

## TODO
	- Markdown

## Technical notes
### Description of etherpad_to_hdoc.ant

#### Prelude
	- Import the required libraries (antlib, htmlcleaner, jing)
	- Create the directory tree

#### Transformations
	- Use htmlcleaner to convert the input file from html to xhtml. For more info, see http://htmlcleaner.sourceforge.net/index.php.
	- Apply html2xhtml.xsl: this xsl extracts the content found in the `<body>` tags.
	- Apply html2xhtmlv1.xsl: this xsl is a fix that adds a br tag at the end of lists (ul and ol).
	- Apply html2xhtmlv2.xsl: this xsl surrounds each line of text with p tags and transforms non-hdoc tags such as s, u and strong into hdoc tags.
	- Apply html2xhtml3.xsl: this xsl is a fix that removes a p tag when its child is a ul or ol.
	- Apply xhtml2hdoc.xsl: this xsl transforms the content into the hdoc structure.
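
The list fix-up performed by html2xhtml3.xsl can be illustrated outside XSLT. The following is only a minimal Python sketch of the idea, not the project's stylesheet: the function name `unwrap_list_paragraphs` is hypothetical, and it assumes the input is already well-formed xhtml without namespaces.

```python
import xml.etree.ElementTree as ET


def unwrap_list_paragraphs(xhtml: str) -> str:
    """Replace every <p> whose only child is a <ul> or <ol> by that list.

    Hypothetical helper, written for illustration only.
    """
    root = ET.fromstring(xhtml)
    # Take a snapshot of the elements so the tree can be edited safely.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            wraps_only_a_list = (
                child.tag == "p"
                and len(child) == 1
                and child[0].tag in ("ul", "ol")
                and not (child.text or "").strip()
            )
            if wraps_only_a_list:
                inner_list = child[0]
                inner_list.tail = child.tail  # keep text that followed the <p>
                parent.remove(child)
                parent.insert(index, inner_list)
    return ET.tostring(root, encoding="unicode")
```

For instance, `unwrap_list_paragraphs("<body><p><ul><li>a</li></ul></p></body>")` leaves the list as a direct child of `body`, while ordinary paragraphs are left untouched.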

#### Post-transformation actions
	- Build the hdoc structure
	- Use Jing to validate the output file against the appropriate rng schema
	- Zip the directory into an hdoc archive
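
The final packaging step amounts to zipping a directory under a different extension. In the project this is done by the Ant script; the sketch below only illustrates the same idea in Python. The function name `zip_to_hdoc` and the paths in the usage note are hypothetical, and the sketch makes no claim about which files a valid hdoc archive must contain.

```python
import zipfile
from pathlib import Path


def zip_to_hdoc(source_dir: str, archive_path: str) -> Path:
    """Zip every file under source_dir into archive_path.

    Hypothetical helper: entries are stored with paths relative to
    source_dir. archive_path should live outside source_dir.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(source).as_posix())
    return Path(archive_path)
```

A call such as `zip_to_hdoc("tmp/mydoc", "output/mydoc.hdoc")` (both paths are placeholders) would produce a single archive file from the built directory.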

### Supported tags
html tags -> hdoc tags
	- u, s, em, strong -> em
	- li -> li
	- ol -> ol
	- br -> p
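
To make the mapping above concrete, here is a small Python sketch. It is not how the converter works (the converter uses XSLT); it only restates the table as code. The names `TAG_MAP`, `map_tags` and `lines_to_paragraphs` are hypothetical, the regular expression only handles tags without attributes, and reading `br -> p` as "each br-terminated line becomes a paragraph" is an assumption.

```python
import re

# Mapping copied from the list above (html tag -> hdoc tag).
TAG_MAP = {"u": "em", "s": "em", "em": "em", "strong": "em", "li": "li", "ol": "ol"}


def map_tags(xhtml: str) -> str:
    """Rename attribute-free opening and closing tags according to TAG_MAP."""
    def rename(match: re.Match) -> str:
        slash, name = match.group(1), match.group(2)
        # Tags that are not in the mapping are left unchanged.
        return f"<{slash}{TAG_MAP.get(name, name)}>"
    return re.sub(r"<(/?)([a-z]+)>", rename, xhtml)


def lines_to_paragraphs(xhtml: str) -> str:
    """Assumed reading of 'br -> p': wrap each br-separated line in <p>."""
    parts = re.split(r"<br\s*/?>", xhtml)
    return "".join(f"<p>{part.strip()}</p>" for part in parts if part.strip())
```

Applying `map_tags` first and `lines_to_paragraphs` second on a fragment such as `"<strong>a</strong><br>b<br/>"` would give two paragraphs, the first containing an `em` element.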

## Lessons learned
We learned how to apply xsl stylesheets to a text file used as input: we had to use regular expressions to extract the content.