org.w3c.tidy
Class Node

java.lang.Object
  extended by org.w3c.tidy.Node

public class Node
extends java.lang.Object

Used for elements and text nodes. The element name is null for text nodes. start and end are offsets into lexbuf, which contains the textual content of all elements in the parse tree. parent and content allow traversal of the parse tree in any direction. Attributes are represented as a linked list of AttVal nodes, which hold the strings for attribute/value pairs.
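
The span idea in the description can be sketched as follows. This is a minimal, self-contained illustration and not JTidy's code: the class SpanSketch, the method textOf, the UTF-8 decoding, and the treatment of end as an exclusive offset are all assumptions made for the example. Only the notion of start and end offsets into a shared byte array comes from the description.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a tidy node: it stores no text of its own,
// only offsets into a buffer shared by the whole parse tree.
class SpanSketch {
    final byte[] textarray; // shared buffer (called lexbuf in the description)
    final int start;        // offset of the first byte of this node's text
    final int end;          // offset one past the last byte (assumed exclusive)

    SpanSketch(byte[] textarray, int start, int end) {
        this.textarray = textarray;
        this.start = start;
        this.end = end;
    }

    // Decode the bytes in [start, end) as UTF-8 (the encoding is an assumption).
    String textOf() {
        return new String(textarray, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] lexbuf = "helloworld".getBytes(StandardCharsets.UTF_8);
        SpanSketch first = new SpanSketch(lexbuf, 0, 5);
        SpanSketch second = new SpanSketch(lexbuf, 5, 10);
        System.out.println(first.textOf() + " " + second.textOf());
    }
}
```

Because every node points into the same array, two nodes can describe adjacent text without copying any bytes.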

Version:
$Revision: 930 $ ($Author: aditsu $)
Author:
Dave Raggett dsr@w3.org, Andy Quick ac.quick@sympatico.ca (translation to Java), Fabrizio Giustina

Field Summary
protected  org.w3c.dom.Node adapter
          DOM adapter.
static short ASP_TAG
          node type: asp tag.
protected  AttVal attributes
          Attribute/Value linked list.
static short CDATA_TAG
          node type: CDATA.
protected  boolean closed
          true if closed by explicit end tag.
static short COMMENT_TAG
          node type: comment.
protected  Node content
          Contained node.
static short DOCTYPE_TAG
          node type: doctype.
protected  java.lang.String element
          Tag name.
protected  int end
          end of span onto text array.
static short END_TAG
          End tag.
protected  boolean implicit
          true if inferred.
static short JSTE_TAG
          node type: jste tag.
protected  Node last
          last node.
protected  boolean linebreak
          true if followed by a line break.
protected  Node next
          next node.
protected  Node parent
          parent node.
static short PHP_TAG
          node type: php tag.
protected  Node prev
          previous node.
static short PROC_INS_TAG
          node type: processing instruction.
static short ROOT_NODE
          node type: root.
static short SECTION_TAG
          node type: section tag.
protected  int start
          start of span onto text array.
static short START_END_TAG
          Combined start/end tag, e.g. <br/>.
static short START_TAG
          Start tag.
protected  Dict tag
          tag's dictionary definition.
static short TEXT_NODE
          node type: text.
protected  byte[] textarray
          the text array.
protected  short type
          TextNode, StartTag, EndTag etc.
protected  Dict was
          old tag when it was changed.
static short XML_DECL
          node type: XML declaration.
 
Constructor Summary
Node()
          Instantiates a new text node.
Node(short type, byte[] textarray, int start, int end)
          Instantiates a new node.
Node(short type, byte[] textarray, int start, int end, java.lang.String element, TagTable tt)
          Instantiates a new node.
 
Method Summary
 void addAttribute(java.lang.String name, java.lang.String value)
          Adds an attribute to the node.
 void addClass(java.lang.String classname)
          Add a css class to the node.
 void checkAttributes(Lexer lexer)
          Default method for checking an element's attributes.
 boolean checkNodeIntegrity()
          Checks for node integrity.
protected  Node cloneNode(boolean deep)
          Clone this node.
static void coerceNode(Lexer lexer, Node node, Dict tag)
          Coerce a node.
 void discardDocType()
          Discard the doctype node.
static Node discardElement(Node element)
          Remove node from markup tree and discard it.
protected static Node escapeTag(Lexer lexer, Node element)
          Escapes the given tag.
 boolean expectsContent()
          Does the node expect contents?
 Node findBody(TagTable tt)
          Find the body node.
 Node findDocType()
          Find the doctype element.
 Node findHEAD(TagTable tt)
          Find the head tag.
 Node findHTML(TagTable tt)
          Find the "html" element.
 Node findTITLE(TagTable tt)
          Find the title element.
static void fixEmptyRow(Lexer lexer, Node row)
          If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
protected  org.w3c.dom.Node getAdapter()
          Returns a DOM Node which wraps the current tidy Node.
 AttVal getAttrByName(java.lang.String name)
          Returns an attribute with the given name in the current node.
 boolean hasOneChild()
          Does the node have one (and only one) child?
static void insertDocType(Lexer lexer, Node element, Node doctype)
          The doctype has been found after other tags, and needs moving to before the html element.
static boolean insertMisc(Node element, Node node)
          Insert a node at the end.
 void insertNodeAfterElement(Node node)
          Insert node into markup tree after element.
static void insertNodeAsParent(Node element, Node node)
          Insert node into markup tree in place of element, which is moved to become the child of the node.
 void insertNodeAtEnd(Node node)
          Insert node into markup tree as the last child of this node.
 void insertNodeAtStart(Node node)
          Insert node into markup tree as the first child of this node.
static void insertNodeBeforeElement(Node element, Node node)
          Insert node into markup tree before element.
 boolean isBlank(Lexer lexer)
          Is the node content empty or blank? Assumes node is a text node.
 boolean isDescendantOf(Dict tag)
          Is this node contained in a given tag?
 boolean isElement()
          Is the node an element?
 boolean isJavaScript()
          Used to check script node for script language.
 boolean isNewNode()
          Is this a new (user defined) node? Used to determine how attributes without values should be printed.
static void moveBeforeTable(Node row, Node node, TagTable tt)
          Unexpected content in table row is moved to just before the table in accordance with Netscape and IE.
 void removeAttribute(AttVal attr)
          Remove an attribute from node and then free it.
 void removeNode()
          Extract this node and its children from a markup tree.
 void repairDuplicateAttributes(Lexer lexer)
          The same attribute name can't be used more than once in each element.
protected  void setType(short newType)
          Setter for node type.
 java.lang.String toString()
           
static void trimEmptyElement(Lexer lexer, Node element)
          Trim an empty element.
static void trimInitialSpace(Lexer lexer, Node element, Node text)
          This maps <p>hello<em> world</em> to <p>hello <em>world</em>.
static void trimSpaces(Lexer lexer, Node element)
          Move initial and trailing space out.
static void trimTrailingSpace(Lexer lexer, Node element, Node last)
          This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ROOT_NODE

public static final short ROOT_NODE
node type: root.

See Also:
Constant Field Values

DOCTYPE_TAG

public static final short DOCTYPE_TAG
node type: doctype.

See Also:
Constant Field Values

COMMENT_TAG

public static final short COMMENT_TAG
node type: comment.

See Also:
Constant Field Values

PROC_INS_TAG

public static final short PROC_INS_TAG
node type: processing instruction.

See Also:
Constant Field Values

TEXT_NODE

public static final short TEXT_NODE
node type: text.

See Also:
Constant Field Values

START_TAG

public static final short START_TAG
Start tag.

See Also:
Constant Field Values

END_TAG

public static final short END_TAG
End tag.

See Also:
Constant Field Values

START_END_TAG

public static final short START_END_TAG
Combined start/end tag, e.g. <br/>.

See Also:
Constant Field Values

CDATA_TAG

public static final short CDATA_TAG
node type: CDATA.

See Also:
Constant Field Values

SECTION_TAG

public static final short SECTION_TAG
node type: section tag.

See Also:
Constant Field Values

ASP_TAG

public static final short ASP_TAG
node type: asp tag.

See Also:
Constant Field Values

JSTE_TAG

public static final short JSTE_TAG
node type: jste tag.

See Also:
Constant Field Values

PHP_TAG

public static final short PHP_TAG
node type: php tag.

See Also:
Constant Field Values

XML_DECL

public static final short XML_DECL
node type: XML declaration.

See Also:
Constant Field Values

parent

protected Node parent
parent node.


prev

protected Node prev
previous node.


next

protected Node next
next node.


last

protected Node last
last node.


start

protected int start
start of span onto text array.


end

protected int end
end of span onto text array.


textarray

protected byte[] textarray
the text array.


type

protected short type
TextNode, StartTag, EndTag etc.


closed

protected boolean closed
true if closed by explicit end tag.


implicit

protected boolean implicit
true if inferred.


linebreak

protected boolean linebreak
true if followed by a line break.


was

protected Dict was
old tag when it was changed.


tag

protected Dict tag
tag's dictionary definition.


element

protected java.lang.String element
Tag name.


attributes

protected AttVal attributes
Attribute/Value linked list.


content

protected Node content
Contained node.


adapter

protected org.w3c.dom.Node adapter
DOM adapter.

Constructor Detail

Node

public Node()
Instantiates a new text node.


Node

public Node(short type,
            byte[] textarray,
            int start,
            int end)
Instantiates a new node.

Parameters:
type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
textarray - array of bytes contained in the Node
start - start position
end - end position

Node

public Node(short type,
            byte[] textarray,
            int start,
            int end,
            java.lang.String element,
            TagTable tt)
Instantiates a new node.

Parameters:
type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
textarray - array of bytes contained in the Node
start - start position
end - end position
element - tag name
tt - tag table instance
Method Detail

getAttrByName

public AttVal getAttrByName(java.lang.String name)
Returns an attribute with the given name in the current node.

Parameters:
name - attribute name.
Returns:
AttVal instance or null if no attribute with the given name is found
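
A lookup over a linked list of attributes can be sketched as below. This is not JTidy's AttVal: the class AttrListSketch, its fields, the static helper findByName, and the case-insensitive comparison are hypothetical choices made so the example stands on its own.

```java
// Hypothetical, simplified attribute node; JTidy's AttVal carries more state.
class AttrListSketch {
    final String name;
    final String value;
    final AttrListSketch next; // next attribute in the list, or null at the end

    AttrListSketch(String name, String value, AttrListSketch next) {
        this.name = name;
        this.value = value;
        this.next = next;
    }

    // Walk the list from head and return the first entry whose name matches
    // (case-insensitively, which is an assumption), or null if none does.
    static AttrListSketch findByName(AttrListSketch head, String name) {
        for (AttrListSketch a = head; a != null; a = a.next) {
            if (a.name != null && a.name.equalsIgnoreCase(name)) {
                return a;
            }
        }
        return null;
    }
}
```

Callers of a lookup like this need a null check before reading the value, since a missing attribute is reported as null rather than as an exception.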

checkAttributes

public void checkAttributes(Lexer lexer)
Default method for checking an element's attributes.

Parameters:
lexer - Lexer

repairDuplicateAttributes

public void repairDuplicateAttributes(Lexer lexer)
The same attribute name can't be used more than once in each element. Discard or join attributes according to configuration.

Parameters:
lexer - Lexer
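
The two outcomes named above, discarding or joining, can be illustrated with a small sketch. The class DuplicateAttrSketch, the repair method, the boolean join switch, and the "first value wins" rule are all assumptions for illustration; JTidy's actual configuration options and rules are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateAttrSketch {
    // attrs holds name/value pairs in source order. When join is true,
    // duplicate values are concatenated with a space; otherwise the first
    // value wins and later duplicates are discarded. Both policies are
    // illustrative assumptions, not JTidy's exact rules.
    static Map<String, String> repair(List<String[]> attrs, boolean join) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String[] pair : attrs) {
            String name = pair[0];
            String value = pair[1];
            if (!result.containsKey(name)) {
                result.put(name, value);
            } else if (join) {
                result.put(name, result.get(name) + " " + value);
            }
            // Otherwise the later duplicate is simply dropped.
        }
        return result;
    }
}
```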

addAttribute

public void addAttribute(java.lang.String name,
                         java.lang.String value)
Adds an attribute to the node.

Parameters:
name - attribute name
value - attribute value

removeAttribute

public void removeAttribute(AttVal attr)
Remove an attribute from node and then free it.

Parameters:
attr - attribute to remove

findDocType

public Node findDocType()
Find the doctype element.

Returns:
doctype node or null if not found

discardDocType

public void discardDocType()
Discard the doctype node.


discardElement

public static Node discardElement(Node element)
Remove node from markup tree and discard it.

Parameters:
element - discarded node
Returns:
next node

insertNodeAtStart

public void insertNodeAtStart(Node node)
Insert a node into the markup tree as the first child of this node.

Parameters:
node - Node to insert

insertNodeAtEnd

public void insertNodeAtEnd(Node node)
Insert a node into the markup tree as the last child of this node.

Parameters:
node - Node to insert
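
Inserting at the start or end of a node's children, using the parent, prev, next, content, and last links listed under Field Detail, might look like the sketch below. TreeSketch and its methods are hypothetical stand-ins, not JTidy's implementation; the sketch assumes content points at the first child and last at the final child.

```java
// Hypothetical tree node using the same link names as the fields above.
class TreeSketch {
    TreeSketch parent, prev, next, content, last;
    final String name;

    TreeSketch(String name) {
        this.name = name;
    }

    // Append child after the current last child.
    void insertAtEnd(TreeSketch child) {
        child.parent = this;
        child.prev = this.last;
        child.next = null;
        if (this.last != null) {
            this.last.next = child;
        } else {
            this.content = child; // first child of an empty node
        }
        this.last = child;
    }

    // Prepend child before the current first child.
    void insertAtStart(TreeSketch child) {
        child.parent = this;
        child.prev = null;
        child.next = this.content;
        if (this.content != null) {
            this.content.prev = child;
        } else {
            this.last = child; // only child of a previously empty node
        }
        this.content = child;
    }

    // Comma-separated child names in document order, for inspection.
    String childNames() {
        StringBuilder sb = new StringBuilder();
        for (TreeSketch c = content; c != null; c = c.next) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(c.name);
        }
        return sb.toString();
    }
}
```

Keeping both content and last up to date is what makes each insertion a constant-time operation in this sketch.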

insertNodeAsParent

public static void insertNodeAsParent(Node element,
                                      Node node)
Insert node into markup tree in place of element, which is moved to become the child of the node.

Parameters:
element - child node; it becomes a child of node
node - parent node
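
Wrapping an existing element in a new parent can be sketched as follows. WrapSketch, append, and insertAsParent are hypothetical names; the sketch assumes the wrapper starts out empty and takes over the element's position among its siblings.

```java
// Hypothetical tree node; not JTidy's Node.
class WrapSketch {
    WrapSketch parent, prev, next, content, last;
    final String name;

    WrapSketch(String name) {
        this.name = name;
    }

    // Helper used only to build the example tree.
    void append(WrapSketch child) {
        child.parent = this;
        child.prev = last;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Put wrapper where element currently sits, then make element the
    // wrapper's only child. Assumes wrapper has no children yet.
    static void insertAsParent(WrapSketch element, WrapSketch wrapper) {
        wrapper.parent = element.parent;
        wrapper.prev = element.prev;
        wrapper.next = element.next;
        if (wrapper.parent != null) {
            if (wrapper.parent.content == element) {
                wrapper.parent.content = wrapper;
            }
            if (wrapper.parent.last == element) {
                wrapper.parent.last = wrapper;
            }
        }
        if (wrapper.prev != null) {
            wrapper.prev.next = wrapper;
        }
        if (wrapper.next != null) {
            wrapper.next.prev = wrapper;
        }
        wrapper.content = element;
        wrapper.last = element;
        element.parent = wrapper;
        element.prev = null;
        element.next = null;
    }
}
```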

insertNodeBeforeElement

public static void insertNodeBeforeElement(Node element,
                                           Node node)
Insert node into markup tree before element.

Parameters:
element - reference node; node is inserted before it
node - node to insert

insertNodeAfterElement

public void insertNodeAfterElement(Node node)
Insert node into markup tree after element.

Parameters:
node - new node to insert

trimEmptyElement

public static void trimEmptyElement(Lexer lexer,
                                    Node element)
Trim an empty element.

Parameters:
lexer - Lexer
element - empty node to be removed

trimTrailingSpace

public static void trimTrailingSpace(Lexer lexer,
                                     Node element,
                                     Node last)
This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>. If the last child of element is a text node, then trim the trailing white space character, moving it to after element's end tag.

Parameters:
lexer - Lexer
element - node
last - last child of element

escapeTag

protected static Node escapeTag(Lexer lexer,
                                Node element)
Escapes the given tag.

Parameters:
lexer - Lexer
element - node to be escaped
Returns:
escaped node

isBlank

public boolean isBlank(Lexer lexer)
Is the node content empty or blank? Assumes node is a text node.

Parameters:
lexer - Lexer
Returns:
true if the node content is empty or blank

trimInitialSpace

public static void trimInitialSpace(Lexer lexer,
                                    Node element,
                                    Node text)
This maps <p>hello<em> world</em> to <p>hello <em>world</em>. Trims the initial space by moving it before the start tag or, if this element is the first in its parent's content, by discarding the space.

Parameters:
lexer - Lexer
element - parent node
text - text node

trimSpaces

public static void trimSpaces(Lexer lexer,
                              Node element)
Move initial and trailing space out. This routine maps hello<em> world</em> to hello <em>world</em>, and <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.

Parameters:
lexer - Lexer
element - Node
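
Moving initial and trailing space out of an inline element can be illustrated at the string level. The sketch below works on plain strings rather than on nodes and byte spans, so it shows only the effect, not how JTidy does it; TrimSpacesSketch and moveSpacesOut are hypothetical names, and only a single space character on each side is handled.

```java
class TrimSpacesSketch {
    // Render <tag>text</tag>, moving one leading and one trailing space,
    // if present, to just outside the start and end tags.
    static String moveSpacesOut(String tag, String text) {
        String lead = "";
        String trail = "";
        String inner = text;
        if (inner.startsWith(" ")) {
            lead = " ";
            inner = inner.substring(1);
        }
        if (inner.endsWith(" ")) {
            trail = " ";
            inner = inner.substring(0, inner.length() - 1);
        }
        return lead + "<" + tag + ">" + inner + "</" + tag + ">" + trail;
    }
}
```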

isDescendantOf

public boolean isDescendantOf(Dict tag)
Is this node contained in a given tag?

Parameters:
tag - tag of the candidate ancestor
Returns:
true if node is contained in tag
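
A containment check of this kind is typically a walk up the parent chain. The sketch below uses a hypothetical AncestorSketch class that compares tag names as strings, whereas the real method takes a Dict; it also assumes that a node is not counted as its own descendant.

```java
// Hypothetical node with only a tag name and a parent link.
class AncestorSketch {
    final String tagName;
    final AncestorSketch parent;

    AncestorSketch(String tagName, AncestorSketch parent) {
        this.tagName = tagName;
        this.parent = parent;
    }

    // Start at the parent (the node itself is not considered, by assumption)
    // and climb until a matching ancestor or the root is reached.
    boolean isDescendantOf(String name) {
        for (AncestorSketch p = parent; p != null; p = p.parent) {
            if (name.equals(p.tagName)) {
                return true;
            }
        }
        return false;
    }
}
```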

insertDocType

public static void insertDocType(Lexer lexer,
                                 Node element,
                                 Node doctype)
The doctype has been found after other tags, and needs moving to before the html element.

Parameters:
lexer - Lexer
element - document
doctype - doctype node to insert at the beginning of element

findBody

public Node findBody(TagTable tt)
Find the body node.

Parameters:
tt - tag table
Returns:
body node

isElement

public boolean isElement()
Is the node an element?

Returns:
true if type is START_TAG or START_END_TAG
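
The check compares the node type against two constants; it is an equality test, not a bitwise operation. In the sketch below the class TypeSketch and the numeric values are placeholders chosen for the example and should not be read as JTidy's actual constant values.

```java
class TypeSketch {
    // Placeholder values; the real constants are listed under
    // "Constant Field Values" and may differ.
    static final short START_TAG = 5;
    static final short END_TAG = 6;
    static final short START_END_TAG = 7;

    // A node counts as an element if it is a start tag or a start-end tag.
    static boolean isElement(short type) {
        return type == START_TAG || type == START_END_TAG;
    }
}
```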

moveBeforeTable

public static void moveBeforeTable(Node row,
                                   Node node,
                                   TagTable tt)
Unexpected content in table row is moved to just before the table in accordance with Netscape and IE. This code assumes that node hasn't been inserted into the row.

Parameters:
row - Row node
node - Node which should be moved before the table
tt - tag table

fixEmptyRow

public static void fixEmptyRow(Lexer lexer,
                               Node row)
If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.

Parameters:
lexer - Lexer
row - row node

coerceNode

public static void coerceNode(Lexer lexer,
                              Node node,
                              Dict tag)
Coerce a node.

Parameters:
lexer - Lexer
node - Node
tag - tag dictionary reference

removeNode

public void removeNode()
Extract this node and its children from a markup tree.
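
Extracting a node together with its children amounts to unlinking it from its parent and siblings while leaving its own content list untouched. The following is a sketch under that assumption; UnlinkSketch, append, and remove are hypothetical names, not JTidy code.

```java
// Hypothetical tree node; not JTidy's Node.
class UnlinkSketch {
    UnlinkSketch parent, prev, next, content, last;
    final String name;

    UnlinkSketch(String name) {
        this.name = name;
    }

    // Helper used only to build the example tree.
    void append(UnlinkSketch child) {
        child.parent = this;
        child.prev = last;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Detach this node from its parent and siblings. Its own children
    // (content .. last) stay attached, so the whole subtree moves with it.
    void remove() {
        if (prev != null) {
            prev.next = next;
        }
        if (next != null) {
            next.prev = prev;
        }
        if (parent != null) {
            if (parent.content == this) {
                parent.content = next;
            }
            if (parent.last == this) {
                parent.last = prev;
            }
        }
        parent = null;
        prev = null;
        next = null;
    }
}
```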


insertMisc

public static boolean insertMisc(Node element,
                                 Node node)
Insert a node at the end.

Parameters:
element - parent node
node - will be inserted at the end of element
Returns:
true if the node has been inserted

isNewNode

public boolean isNewNode()
Is this a new (user defined) node? Used to determine how attributes without values should be printed. This was introduced to deal with user defined tags e.g. Cold Fusion.

Returns:
true if this node represents a user-defined tag.

hasOneChild

public boolean hasOneChild()
Does the node have one (and only one) child?

Returns:
true if the node has one child
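
With a first-child link and a sibling link, "exactly one child" reduces to two null checks. ChildCountSketch below is a hypothetical two-field node used only to show that test; it assumes content is the first child and next is the following sibling.

```java
// Hypothetical node with just the two links the check needs.
class ChildCountSketch {
    ChildCountSketch content; // first child, or null if there are none
    ChildCountSketch next;    // next sibling, or null if this is the last

    // True only when a first child exists and it has no following sibling.
    boolean hasOneChild() {
        return content != null && content.next == null;
    }
}
```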

findHTML

public Node findHTML(TagTable tt)
Find the "html" element.

Parameters:
tt - tag table
Returns:
html node

findHEAD

public Node findHEAD(TagTable tt)
Find the head tag.

Parameters:
tt - tag table
Returns:
head node

findTITLE

public Node findTITLE(TagTable tt)
Find the title element.

Parameters:
tt - tag table
Returns:
title node

checkNodeIntegrity

public boolean checkNodeIntegrity()
Checks for node integrity.

Returns:
false if node is not consistent

addClass

public void addClass(java.lang.String classname)
Add a CSS class to the node. If a class attribute already exists, the value is added to the existing attribute.

Parameters:
classname - css class name
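
The "add to the existing attribute" behaviour can be sketched with a map standing in for the attribute list. AddClassSketch, its attributes map, and the single-space separator are assumptions for the example; JTidy stores attributes as a linked list of AttVal instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AddClassSketch {
    // Stand-in for the attribute list: attribute name mapped to its value.
    final Map<String, String> attributes = new LinkedHashMap<>();

    // Create the class attribute if it is missing or empty; otherwise
    // append the new class name after a space.
    void addClass(String classname) {
        String existing = attributes.get("class");
        if (existing == null || existing.isEmpty()) {
            attributes.put("class", classname);
        } else {
            attributes.put("class", existing + " " + classname);
        }
    }
}
```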

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object
See Also:
Object.toString()

getAdapter

protected org.w3c.dom.Node getAdapter()
Returns a DOM Node which wraps the current tidy Node.

Returns:
org.w3c.dom.Node instance

cloneNode

protected Node cloneNode(boolean deep)
Clone this node.

Parameters:
deep - if true deep clone the node (also clones all the contained nodes)
Returns:
cloned node

setType

protected void setType(short newType)
Setter for node type.

Parameters:
newType - a valid node type constant

isJavaScript

public boolean isJavaScript()
Used to check script node for script language.

Returns:
true if the script node contains javascript

expectsContent

public boolean expectsContent()
Does the node expect contents?

Returns:
false if this node should be empty