Class to convert HTML into objects. #html dom
by Everton da Rosa - 10 years ago (2014-11-12)
Class to convert HTML into objects, XML DOM style.
Hello friends,
I need a class that transforms HTML code (read from a string or a file) into an object in the style of the XML DOM, with access to tags, attributes, and the content of tags.
- 1 Clarification request
1.
by Manuel Lemos - 10 years ago (2014-11-13)
What about the DOM classes that come with PHP?
2.
by Everton da Rosa - 10 years ago (2014-11-13) in reply to comment 1 by Manuel Lemos
Do you mean the XML manipulation classes? I want something like them, but applicable to HTML documents. I thought of using the DOM classes, but I could run into problems with tag content that contains characters such as "<", which in XML documents would have to be wrapped in CDATA markup.
3.
by Manuel Lemos - 10 years ago (2014-11-13) in reply to comment 2 by Everton da Rosa
Yes, DOMDocument has a loadHTML method to parse HTML.
I am not sure what your concern with CDATA sections is. I think they are like regular data sections. They are decoded, but the tag characters < and > are returned without special meaning, just like every other character.
Did you try that or did you have any difficulties?
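For reference, here is a minimal sketch of the approach discussed above, using PHP's bundled DOM extension. The sample markup and variable names are made up for illustration. It parses an HTML string and reads a tag name, an attribute, and text content; the escaped `&lt;` in the markup comes back as a plain `<` character in the text. `DOMDocument::loadHTMLFile()` does the same for a file.

```php
<?php
// Illustrative markup only; not taken from the original question.
$html = '<html><body><p class="lead">Price &lt; 10</p><a href="/docs">Docs</a></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// Access to tags
$p = $doc->getElementsByTagName('p')->item(0);
$a = $doc->getElementsByTagName('a')->item(0);

$tagName = $p->nodeName;            // tag name
$class   = $p->getAttribute('class'); // attribute value
$text    = $p->textContent;         // decoded text content
$href    = $a->getAttribute('href');

echo "$tagName [$class]: $text\n";
echo "link: $href\n";
```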
4.
by Everton da Rosa - 10 years ago (2014-11-18) in reply to comment 3 by Manuel Lemos
Thanks, I will test the DOMDocument class.
1 Recommendation
HTMLPP: Parse HTML code and manage the DOM structure
HTMLPP is a PHP4 library for parsing HTML code. It lets you parse an HTML code string, build the corresponding DOM structure, and work on it with methods similar to those available in JavaScript.
Features:
HTML parsing:
- Simple tags
- Tags without closures
- Autoclosing tags
- Doctype, text and comment parsing
- Modern browser parsing behaviour (adds head, body and html tags if they are not present; wraps table content inside a tbody if it is not present)
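HTMLPP's own API is not shown in this listing, so the following sketch uses PHP's bundled DOM extension instead to illustrate the same idea of implied wrapper tags. It only checks that a bare fragment ends up inside html and body elements; it makes no claim about head or tbody handling. The fragment is made up for illustration.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parse warnings out of the output
$doc->loadHTML('<p>Hello</p>');     // fragment without html/body tags
libxml_clear_errors();

$root = $doc->documentElement->nodeName;      // implied root element
$p = $doc->getElementsByTagName('p')->item(0);
$container = $p->parentNode->nodeName;        // implied container of the paragraph

echo "root: $root, paragraph parent: $container\n";
```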
Dom traversing:
- Access to the parent node using the parentNode property
- Access to child nodes using the childNodes array property
- Access to sibling nodes using nextSibling and previousSibling properties
- Access to the owner document with ownerDocument property
- Document shortcuts to body, head and doctype
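The property names in this list match the ones on nodes in PHP's bundled DOM extension, so traversal can be sketched with that extension. This is not HTMLPP code, and the markup is made up for illustration.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
// No whitespace between the <li> tags, so siblings are elements, not text nodes.
$doc->loadHTML('<html><head><title>T</title></head><body><ul><li>one</li><li>two</li><li>three</li></ul></body></html>');
libxml_clear_errors();

$second = $doc->getElementsByTagName('li')->item(1);

$parentName = $second->parentNode->nodeName;            // parent element
$prevText   = $second->previousSibling->textContent;    // preceding sibling
$nextText   = $second->nextSibling->textContent;        // following sibling
$childCount = $second->parentNode->childNodes->length;  // children of the parent
$sameDoc    = $second->ownerDocument->isSameNode($doc); // owner document

// DOMDocument has no body shortcut property here, so look the element up by tag name.
$body = $doc->getElementsByTagName('body')->item(0);

echo "$prevText < {$second->textContent} < $nextText (in <$parentName>, $childCount items)\n";
```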
Dom manipulation:
- Append nodes with appendChild, append and other methods
- Remove nodes with removeChild and remove methods
- Replace nodes with replaceChild and replace methods
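As a rough illustration of these operations, here is a sketch using the equivalent `appendChild`, `removeChild` and `replaceChild` methods of PHP's bundled DOM extension. It does not use HTMLPP, whose shorter `append`, `remove` and `replace` variants are not documented in this listing. The markup is made up for illustration.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<html><body><div><p>old</p><span>drop me</span></div></body></html>');
libxml_clear_errors();

$div = $doc->getElementsByTagName('div')->item(0);

// Append a new element as the last child of the div.
$em = $doc->createElement('em', 'appended');
$div->appendChild($em);

// Remove the span from the div.
$span = $doc->getElementsByTagName('span')->item(0);
$div->removeChild($span);

// Replace the old paragraph with a new one (new node first, old node second).
$oldP = $doc->getElementsByTagName('p')->item(0);
$newP = $doc->createElement('p', 'new');
$div->replaceChild($newP, $oldP);

echo $doc->saveHTML($div), "\n";
```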
Attributes and style manipulation:
- Add, remove, set and get methods for attributes
- Add, remove, set and get methods for style properties
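For comparison, PHP's bundled DOM extension covers the attribute part directly but has no style-property API, so a style has to be handled as a plain attribute string. The sketch below is not HTMLPP code: `parseStyle` is a hypothetical helper written for this example, and it is deliberately naive (it would break on values that contain a semicolon). It uses PHP 7.1+ syntax.

```php
<?php
// Hypothetical helper: split "a: b; c: d" into ['a' => 'b', 'c' => 'd'].
function parseStyle(string $style): array
{
    $props = [];
    foreach (explode(';', $style) as $declaration) {
        if (strpos($declaration, ':') === false) {
            continue; // skip empty or malformed declarations
        }
        [$name, $value] = explode(':', $declaration, 2);
        $props[strtolower(trim($name))] = trim($value);
    }
    return $props;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<html><body><div id="box" style="color: red; font-size: 12px">x</div></body></html>');
libxml_clear_errors();

$div = $doc->getElementsByTagName('div')->item(0);

// Attributes: set, test, remove.
$div->setAttribute('title', 'A box');
$hadId = $div->hasAttribute('id');
$div->removeAttribute('id');

// Style properties: read, change one, write back as a string.
$style = parseStyle($div->getAttribute('style'));
$style['color'] = 'blue';
$pairs = [];
foreach ($style as $name => $value) {
    $pairs[] = "$name: $value";
}
$div->setAttribute('style', implode('; ', $pairs));

echo $div->getAttribute('style'), "\n";
```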
Node searching functions on every element:
- getElementById
- getElementsByTagName
- getElementsByClassName
- getElementsBySelector (full-featured support for CSS3 selectors, plus support for other non-standard selectors)
- Node iterator class for personalized filter functions
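PHP's bundled `DOMDocument` class has no CSS selector or class-name search, so the closest equivalent there is XPath. The sketch below swaps in `DOMXPath` for the searches listed above; it is not HTMLPP code, and the markup and queries are made up for illustration.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<html><body><div id="main"><p class="note big">a</p><p>b</p><span class="note">c</span></div></body></html>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// By tag name (built into DOMDocument).
$paragraphs = $doc->getElementsByTagName('p');

// By id, expressed as an XPath query.
$main = $xpath->query('//*[@id="main"]')->item(0);

// By class name: match "note" as a whole word inside the class attribute.
$notes = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " note ")]');

// Roughly the CSS selector "#main > p": direct p children of the context node.
$direct = $xpath->query('./p', $main);

echo "p: {$paragraphs->length}, .note: {$notes->length}, #main > p: {$direct->length}\n";
```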
DOM collections with jQuery-like methods:
- Add, remove and filter elements in the collection
- Change the current collection by searching among its elements' siblings, child nodes or parent nodes
- Manipulate elements in the collection
Changelog:
1.0
- first release
1.0.1
- Fixed some bugs in elements parsing regexp
- Fixed a bug in doctype parsing
- Fixed some problems in the parser class
- Fixed a bug in the HTMLFilterIterator::find() function when passing HTML_SEARCH_DESCENDANT as the iteration type
1.0.2
- Fixed error on selector parsing
- Now every element is closed at the end of its parent code if no closing tag is found
- Better support for textarea tag
- Fixed bug on attributes parsing (thanks Mike)
1.0.3
- Fixed bug in getAttribute() method
- Fixed bug in getStyle() method
- Fixed bug on attributes parsing
by Manuel Lemos - 9 years ago (2016-02-14)
There is this old class that can parse HTML using pure PHP and return a DOM-like document structure.
For most purposes the PHP DOM extension may be more useful, but if you have run into a limitation of that extension, you may want to try this package.