Details
A good search tool (like Swish-e) will convert all character entities to the characters they represent and ignore all tags when indexing HTML files. One unintended consequence is that it can be difficult to use a search query to find the matching text in the original document, because the original may contain different character patterns than the query. For example, tags can interrupt a phrase, and an entity such as &eacute; appears in the source where the query has é.
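As a rough illustration of what "convert entities and ignore tags" means, the sketch below uses Python's standard html.parser module to collect the visible text. Entities are decoded because convert_charrefs is on. The sketch then lowercases the text and keeps runs of word characters. The function name index_words and the word-splitting rule are assumptions for illustration. This is not Swish-e's code, and Swish-e's own word rules may differ.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content and skips tags. Character entities are
    decoded before handle_data is called because convert_charrefs=True."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def index_words(html_source: str) -> list[str]:
    """Hypothetical indexer: drop tags, decode entities, lowercase,
    and keep only runs of word characters (punctuation is discarded)."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()  # flush any buffered text
    text = "".join(parser.chunks)
    return re.findall(r"\w+", text.lower())


print(index_words("<span>Hello! <b>world</b>!</span>"))  # ['hello', 'world']
print(index_words("<p>Caf&eacute;</p>"))                 # ['café']
```

The indexed word café never appears literally in the source, which contains Caf&eacute; instead.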
Example
<span>Hello! <b>world</b>!</span>
might be indexed as:
hello world
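One way to work around the mismatch is to strip tags and decode entities yourself while recording, for every character of the visible text, where it came from in the raw HTML. You can then search the visible text and translate the hit back into a slice of the source. The sketch below assumes that approach. visible_text_with_offsets and find_in_source are hypothetical names, and the tag scanner is naive: it does not handle comments, scripts, or a ">" inside attribute values.

```python
import html
import re
from typing import Optional

# Named, decimal, or hex character references such as &amp; &#233; &#xE9;
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def visible_text_with_offsets(raw: str) -> tuple[str, list[int], list[int]]:
    """Return the visible text plus, for each of its characters, the
    start and end index of the raw source it was produced from."""
    text: list[str] = []
    starts: list[int] = []
    ends: list[int] = []

    def emit(chars: str, start: int, end: int) -> None:
        for c in chars:
            text.append(c)
            starts.append(start)
            ends.append(end)

    i = 0
    while i < len(raw):
        if raw[i] == "<":
            # Skip the whole tag (naive: stops at the first '>').
            close = raw.find(">", i)
            i = len(raw) if close == -1 else close + 1
            continue
        if raw[i] == "&":
            m = ENTITY.match(raw, i)
            if m:
                decoded = html.unescape(m.group())
                if decoded != m.group():  # a known entity
                    emit(decoded, i, m.end())
                    i = m.end()
                    continue
        emit(raw[i], i, i + 1)
        i += 1
    return "".join(text), starts, ends


def find_in_source(raw: str, query: str) -> Optional[str]:
    """Match the query's words in the visible text (case-insensitive,
    any non-word characters between words) and return the raw slice."""
    text, starts, ends = visible_text_with_offsets(raw)
    words = re.findall(r"\w+", query)
    if not words:
        return None
    pattern = r"\W+".join(re.escape(w) for w in words)
    m = re.search(pattern, text, flags=re.IGNORECASE)
    if m is None:
        return None
    return raw[starts[m.start()]:ends[m.end() - 1]]


page = "<span>Hello! <b>world</b>!</span>"
print("hello world" in page)               # False
print(find_in_source(page, "hello world"))  # Hello! <b>world
```

The recovered slice spans the <b> tag and the "!", which is why searching the raw source for the literal query fails. Mapping each decoded entity back to its whole span means a hit on café comes back as Caf&eacute;.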