Extract the body text from an HTML document #html to text
by Erik Maaløe - 9 years ago (2015-11-10)
I need to parse an HTML document and extract the text part.
How do I extract the text content from an HTML document, skipping all HTML tags?
1 Recommendation
HTML Parser: Parse HTML using DOMDocument
This class can parse HTML documents using DOMDocument.
It can load the HTML markup either from a file or from a text string.
It can parse the entire document, returning an array of elements.
It can parse the document for a specific element, returning an array of every matching element. It can also return each element's child elements.
It can return an element referenced by a given ID.
It can display the returned results in a human readable form.
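The class's own method names are not listed here, so the following is only a sketch of the plain PHP DOMDocument calls that features like these are typically built on. It is not the HTML Parser class's API, and the sample markup and variable names are hypothetical. It shows loading markup from a string, collecting all elements with a given tag name, looking up an element by ID, and listing that element's child elements.

```php
<?php
// Sketch using PHP's built-in DOM extension directly.
// NOT the HTML Parser class's API; sample markup is hypothetical.

$html = '<html><body><div id="main"><p>First</p><p>Second</p></div></body></html>';

$doc = new DOMDocument();
// Keep libxml warnings about malformed HTML from being printed.
libxml_use_internal_errors(true);
// loadHTML() parses a string; loadHTMLFile() would read from a file instead.
$doc->loadHTML($html);
libxml_clear_errors();

// Every element with a given tag name, reduced to its text.
$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = $p->textContent;
}

// A single element looked up by its id attribute.
$main = $doc->getElementById('main');

// That element's child elements (text nodes are skipped).
$children = [];
foreach ($main->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $children[] = $child->tagName;
    }
}

// print_r() gives a human-readable dump of the collected arrays.
print_r($paragraphs);
print_r($children);
```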
by Dave Smith (package author) - 9 years ago (2015-11-10)
This class parses the document using DOMDocument, and if I am not mistaken, the nodeValue of the body element contains all of its text without the HTML tags. So with this class you would process the body tag.
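A minimal sketch of the body-text approach, calling PHP's DOMDocument directly rather than the HTML Parser class (whose methods are not shown here). The function name `extractBodyText` is hypothetical. In PHP, an element's nodeValue equals its textContent, i.e. the concatenated text of all descendant text nodes. Note that text inside any `<script>` or `<style>` elements in the body is included too, and adjacent block elements are not separated by spaces.

```php
<?php
// Hypothetical helper: returns the text of the <body> element with all tags stripped.
function extractBodyText(string $html): string
{
    $doc = new DOMDocument();
    // Suppress warnings that real-world, malformed HTML usually triggers.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // For element nodes, PHP's nodeValue equals textContent:
    // the text of all descendant text nodes, without any tags.
    $text = $body->nodeValue;

    // Collapse runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<html><head><title>Demo</title></head>'
      . '<body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>';

echo extractBodyText($html), PHP_EOL;
```

Because `<h1>Hello</h1><p>Some` has no whitespace between the two elements, their texts run together ("HelloSome"). If word boundaries matter, one option is to walk the text nodes yourself and join them with spaces.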