
How To Edit All Text Values In Html Tags Using Jsoup

What I want: I am new to Jsoup. I want to parse my HTML string, find every text value that appears inside a tag (any tag), and then change that text value to something else.

Solution 1:

You can try something similar to this code:

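// Imports needed by this snippet
import java.util.ArrayDeque;
import java.util.Queue;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";

Document doc = Jsoup.parse(html);

// We will search nodes in a breadth-first way
Queue<Node> nodes = new ArrayDeque<>();

nodes.addAll(doc.childNodes());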

while (!nodes.isEmpty()) {
    Node n = nodes.remove();

    if (n instanceof TextNode && ((TextNode) n).text().trim().length() > 0) {
        // Do whatever you want with n.
        // Here we just print its text...
        System.out.println(n.parent().nodeName() + " contains text: " + ((TextNode) n).text().trim());
    } else {
        nodes.addAll(n.childNodes());
    }
}

And you'll get the following output:

body contains text: other text
p contains text: Test Data
p contains text: HELLO World
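
The snippet above only prints each text value. To actually change it, as the question asks, the same breadth-first loop can call TextNode.text(String) on each match. Below is a minimal sketch, assuming jsoup is on the classpath; the class name ReplaceTextBfs and the helper replaceAllText are made up for illustration and are not part of Jsoup:

```java
import java.util.ArrayDeque;
import java.util.Queue;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ReplaceTextBfs {

    // Hypothetical helper: parses the HTML, replaces every non-blank
    // text node with `replacement`, and returns the resulting body HTML.
    public static String replaceAllText(String html, String replacement) {
        Document doc = Jsoup.parse(html);

        // Breadth-first walk over the node tree, as in the snippet above
        Queue<Node> nodes = new ArrayDeque<>(doc.childNodes());
        while (!nodes.isEmpty()) {
            Node n = nodes.remove();
            if (n instanceof TextNode) {
                TextNode tn = (TextNode) n;
                // Skip whitespace-only nodes between tags
                if (!tn.isBlank()) {
                    tn.text(replacement);
                }
            } else {
                nodes.addAll(n.childNodes());
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
        System.out.println(replaceAllText(html, "This is changed text"));
    }
}
```

Setting the text does not add or remove nodes, so it should be safe to do while the queue is still being processed.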

Solution 2:

Use the CSS selector * to match every element, and the method textNodes() to get the text nodes that are direct children of a given tag (an Element in Jsoup terms).

This line below

Elements ps = doc1.getElementsByTag("p");

becomes

Elements ps = doc1.select("*");

With this new selector, every element (tag) in your HTML code is selected, not just the p tags.

FULL CODE EXAMPLE

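import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public static void main(String[] args) {
    // The original code called JSoup.setupProxy() here. That is a custom helper,
    // not part of the Jsoup API, and it is not needed to parse an in-memory string.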

    String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
    Document doc1 = Jsoup.parse(html);
    Elements tags = doc1.select("*");
    for (Element tag : tags) {
        for (TextNode tn : tag.textNodes()) {
            String tagText = tn.text().trim();

            if (tagText.length() > 0) {
                tn.text(base64_Dummy(tagText));
            }
        }
    }
    System.out.println("======================");
    String changedHTML = doc1.html();
    System.out.println(changedHTML);
}

public static String base64_Dummy(String abc) {
    return "This is changed text";
}

OUTPUT

======================
<html><head></head><body><div><p>This is changed text</p><div><p>This is changed text</p></div></div>This is changed text
 </body></html>
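
The name base64_Dummy suggests the placeholder stands in for a real transformation such as Base64 encoding. As a sketch under that assumption (jsoup on the classpath; TransformText, transformText and toBase64 are hypothetical names, not Jsoup API), the same select("*") and textNodes() loop can take the transformation as a parameter:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.function.UnaryOperator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class TransformText {

    // Hypothetical helper: applies `fn` to every non-blank text value
    // and returns the resulting body HTML.
    public static String transformText(String html, UnaryOperator<String> fn) {
        Document doc = Jsoup.parse(html);
        for (Element tag : doc.select("*")) {
            // textNodes() returns only the text nodes directly under this tag
            for (TextNode tn : tag.textNodes()) {
                String text = tn.text().trim();
                if (!text.isEmpty()) {
                    tn.text(fn.apply(text));
                }
            }
        }
        return doc.body().html();
    }

    // Assumed stand-in for base64_Dummy: real Base64 encoding via the JDK
    public static String toBase64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String html = "<html><body><div><p>Test Data</p> <div> <p>HELLO World</p></div></div> other text</body></html>";
        System.out.println(transformText(html, TransformText::toBase64));
    }
}
```

Any UnaryOperator<String> works in place of toBase64, for example String::toUpperCase. As in the example above, the text is trimmed before it is replaced, so leading and trailing whitespace inside a text node is dropped.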
