public class Clean
extends java.lang.Object
...
....
Such rules are applied to the element's content and then to the element itself until no more rules apply. Once all the rules have been applied to an element, it will have a style attribute with one or more properties. Other rules strip the element they apply to, replacing it with style properties on its contents, e.g. ...
... These rules are applied to an element before its content is processed, and they replace the current element with the first element in the exposed content. After both sets of rules have been applied, the style attribute can be replaced by a class value and a corresponding style rule in the document head. To support this, an association between styles and class names is built. A naive approach relies on string matching to test whether two property lists are the same. A better approach sorts the properties before matching, so that two lists containing the same properties in a different order are still recognized as equal.
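The sorted-matching idea can be sketched as follows. This is an illustration under stated assumptions, not Clean's actual implementation: `StyleClassRegistry`, `normalize`, `classFor`, `styleSheet` and the `c1`, `c2`, ... naming scheme are all hypothetical. Each style value is split into properties, property names are trimmed and lower-cased, the properties are sorted, and the normalized string is used as the lookup key for a generated class name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Clean API: associates inline style
// values with generated class names (c1, c2, ...).
class StyleClassRegistry {
    // Normalized property list -> class name, kept in first-use order.
    private final Map<String, String> classes = new LinkedHashMap<>();

    // Splits "b: 2; a: 1" into properties, trims each one, lower-cases the
    // property name, sorts the properties, and rejoins them with "; ".
    static String normalize(String style) {
        return Arrays.stream(style.split(";"))
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .map(StyleClassRegistry::normalizeProperty)
                .sorted()
                .collect(Collectors.joining("; "));
    }

    // Rewrites "Name :value" as "name: value" so spacing differences do not matter.
    private static String normalizeProperty(String property) {
        int colon = property.indexOf(':');
        if (colon < 0) {
            return property.toLowerCase();
        }
        String name = property.substring(0, colon).trim().toLowerCase();
        String value = property.substring(colon + 1).trim();
        return name + ": " + value;
    }

    // Returns the class for this style, allocating a new name on first use.
    String classFor(String style) {
        String key = normalize(style);
        String name = classes.get(key);
        if (name == null) {
            name = "c" + (classes.size() + 1);
            classes.put(key, name);
        }
        return name;
    }

    // Renders one rule per class, e.g. for a style element in the document head.
    String styleSheet() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : classes.entrySet()) {
            sb.append('.').append(e.getValue())
              .append(" { ").append(e.getKey()).append(" }\n");
        }
        return sb.toString();
    }
}
```

For example, `classFor("font-weight: bold; color: red")` and `classFor("color:red;font-weight:bold;")` both return `c1`, whereas plain string comparison would have treated them as two different styles. Sorting only removes differences in ordering and spacing; it does not, for instance, treat a shorthand property as equal to its longhand equivalents.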
| Constructor and Description |
|---|
| `Clean(TagTable tagTable)`: Instantiates a new Clean. |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `bQ2Div(Node node)`: Replace implicit blockquotes with a div, reducing nested blockquotes to a single div whose indent matches the nesting depth. |
| `void` | `cleanTree(Lexer lexer, Node doc)`: Clean an HTML tree. |
| `void` | `cleanWord2000(Lexer lexer, Node node)`: A major cleanup that strips out all the extra markup produced when a document is saved as a web page from Word 2000. |
| `void` | `dropSections(Lexer lexer, Node node)`: Drop the if/endif sections inserted by Word 2000. |
| `void` | `emFromI(Node node)`: Replace `i` with `em` and `b` with `strong`. |
| `static void` | `fixNodeLinks(Node node)`: Ensure bidirectional links are consistent. |
| `boolean` | `isWord2000(Node root)`: Check whether the current document is a converted Word document. |
| `void` | `list2BQ(Node node)`: Some people use `dir` or `ul` without an `li` to indent the content. |
| `void` | `nestedEmphasis(Node node)`: Simplifies ... |
| `Node` | `pruneSection(Lexer lexer, Node node)`: If node is `<![if ...]>`, prune up to `<![endif]>`. |
| `void` | `purgeWord2000Attributes(Node node)`: Remove Word 2000 attributes from node. |
| `Node` | `stripSpan(Lexer lexer, Node span)`: Word 2000 uses span excessively, so span elements are stripped out. |
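The `bQ2Div` entry above describes collapsing nested implicit blockquotes into one indented div. The sketch below models that idea on a hypothetical minimal `El` class rather than JTidy's `Node`, treats every blockquote as implicit, and assumes an indent of 2em per nesting level; `El`, `BlockquoteToDiv`, `bq2Div` and `EM_PER_LEVEL` are illustrative names, not taken from Clean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for a document node; NOT JTidy's org.w3c.tidy.Node.
class El {
    String tag;
    String style; // null when the element has no style attribute
    List<El> children = new ArrayList<>();

    El(String tag, El... kids) {
        this.tag = tag;
        this.children.addAll(Arrays.asList(kids));
    }

    // Serializes the subtree as compact markup for inspection.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("<").append(tag);
        if (style != null) {
            sb.append(" style=\"").append(style).append('"');
        }
        sb.append('>');
        for (El c : children) {
            sb.append(c);
        }
        return sb.append("</").append(tag).append('>').toString();
    }
}

class BlockquoteToDiv {
    // Assumed indent per nesting level, chosen for illustration only.
    static final int EM_PER_LEVEL = 2;

    // Replaces each blockquote (all treated as implicit here) by a div. A chain
    // of blockquotes that each have a single blockquote child collapses into
    // one div whose margin-left reflects the length of the chain.
    static void bq2Div(El node) {
        for (El child : node.children) {
            if (child.tag.equals("blockquote")) {
                int depth = 1;
                El inner = child;
                while (inner.children.size() == 1
                        && inner.children.get(0).tag.equals("blockquote")) {
                    inner = inner.children.get(0);
                    depth++;
                }
                child.tag = "div";
                child.style = "margin-left: " + (EM_PER_LEVEL * depth) + "em";
                // Splice out the intermediate blockquotes of the chain.
                child.children = inner.children;
            }
            bq2Div(child); // continue into the (possibly rewritten) subtree
        }
    }
}
```

With this model, `<body><blockquote><blockquote><p></p></blockquote></blockquote></body>` becomes `<body><div style="margin-left: 4em"><p></p></div></body>`. A blockquote that holds other content besides a nested blockquote is not collapsed; each level becomes its own div, so the indents still accumulate when rendered.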
public Clean(TagTable tagTable)

Parameters: tagTable - tag table instance

public static void fixNodeLinks(Node node)
Parameters: node - root node

public void cleanTree(Lexer lexer, Node doc)
Parameters: lexer - Lexer; doc - root node

public void nestedEmphasis(Node node)
Parameters: node - root Node

public void emFromI(Node node)
Parameters: node - root Node

public void list2BQ(Node node)
Parameters: node - root Node

public void bQ2Div(Node node)
Parameters: node - root Node

public Node pruneSection(Lexer lexer, Node node)
If node is `<![if ...]>`, prune up to `<![endif]>`.

Parameters: lexer - Lexer; node - Node

public void dropSections(Lexer lexer, Node node)
Parameters: lexer - Lexer; node - root Node

public void purgeWord2000Attributes(Node node)
Parameters: node - node to clean up

public Node stripSpan(Lexer lexer, Node span)
Parameters: lexer - Lexer; span - span Node

public void cleanWord2000(Lexer lexer, Node node)
Parameters: lexer - Lexer; node - node to clean up

public boolean isWord2000(Node root)
Parameters: root - root Node

Returns: true if the document has been generated by Microsoft Word.
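`isWord2000` has to decide from the parsed tree alone whether Word produced the document. One plausible heuristic, assumed here for illustration and not confirmed as what Clean actually checks, is to look for the Office namespace declaration `xmlns:o` on the html element, or for a meta element whose `name` is "generator" and whose `content` starts with "Microsoft Word". The sketch works on plain attribute maps instead of JTidy's `Node`; `WordDetector` and `looksLikeWord2000` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

class WordDetector {
    // Heuristic sketch (assumed, not taken from Clean): a document is treated
    // as Word output if the html element declares the "o" Office namespace
    // prefix, or if a meta element names Microsoft Word as its generator.
    static boolean looksLikeWord2000(Map<String, String> htmlAttributes,
                                     List<Map<String, String>> metaElements) {
        if (htmlAttributes.containsKey("xmlns:o")) {
            return true;
        }
        for (Map<String, String> meta : metaElements) {
            String name = meta.get("name");
            String content = meta.get("content");
            // Meta names are compared case-insensitively ("GENERATOR" vs "generator").
            if (name != null && content != null
                    && name.equalsIgnoreCase("generator")
                    && content.startsWith("Microsoft Word")) {
                return true;
            }
        }
        return false;
    }
}
```

Checking two independent signals makes the guess more robust: a document may keep the namespace declaration after its meta elements were edited, or the other way round.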