Use WebExecute to get the rendered text content of a node and its descendants.
Using JavaScript Directly...
Begin the session
Use StartWebSession to begin the session:
- If no browser is supplied to StartWebSession, it will default to Google Chrome.
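A minimal sketch of this step, assuming a supported browser and its driver are installed on the machine. The variable name `session` is an illustrative choice, not something fixed by the workflow:

```wolfram
(* Start a web session; with no argument the default browser is used *)
session = StartWebSession[]

(* To request a specific browser instead, name it explicitly, for example: *)
(* session = StartWebSession["Firefox"] *)
```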
Extract text
Open the page you would like to get text from:
Use the "JavascriptExecute" command to run JavaScript directly and return the value of an element's innerText property:
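The two steps above might look like the following sketch. The URL is only a placeholder, since the page used here is not specified, and the JavaScript assumes the text of interest is that of the whole document body:

```wolfram
(* Start a session if one is not already running *)
session = StartWebSession[];

(* Open the page; this URL is a placeholder for the page you want to analyze *)
WebExecute[session, "OpenPage" -> "https://www.wolfram.com"];

(* innerText is a DOM property; return it for the <body> element *)
text = WebExecute[session, "JavascriptExecute" -> "return document.body.innerText;"]
```

Returning `document.body.innerText` gives the rendered text of the body node and all of its descendants as a single string.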
Use Select to remove digit characters and non-English words:
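One possible filter is sketched below. It assumes that "non-English" means "not recognized by DictionaryWordQ" and uses a made-up sample string in place of the extracted page text:

```wolfram
(* Sample input standing in for the extracted page text *)
text = "Wolfram Language 2024 makes computation accessible 42 bonjour";

(* Keep tokens that contain no digits and are recognized dictionary words *)
words = Select[TextWords[text],
  StringFreeQ[#, DigitCharacter] && DictionaryWordQ[#] &]
```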
Analyze the text
Use ToLowerCase so that differently capitalized forms of a word are counted together, and DeleteStopwords to remove prepositions and other common function words from the analysis:
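As a small illustration, with a hand-written word list standing in for the filtered words:

```wolfram
(* Hypothetical word list; in the workflow this comes from the Select step *)
words = {"The", "language", "of", "Computation", "and", "the", "Language"};

(* Lowercase first so that "Language" and "language" become the same word,
   then drop stopwords such as "the", "of" and "and" *)
cleaned = DeleteStopwords[ToLowerCase[words]]
```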
Use WordCloud to create a word cloud of frequently used nontrivial words on the webpage:
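A sketch with a hypothetical cleaned word list; WordCloud sizes each word according to how often it occurs in the list:

```wolfram
(* Hypothetical cleaned word list *)
cleaned = {"language", "computation", "language", "notebook",
   "language", "computation"};

(* Words that occur more often are drawn larger *)
WordCloud[cleaned]
```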
Use StringRiffle to join the words into a single string, separated by spaces:
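For example, with a short hypothetical word list:

```wolfram
(* Hypothetical cleaned word list *)
cleaned = {"language", "computation", "language", "notebook"};

(* With a single argument, StringRiffle inserts a space between elements *)
joined = StringRiffle[cleaned]
(* "language computation language notebook" *)
```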
Use WordCounts to count the number of times each word appears in the string, and take the five most frequently used words:
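A sketch of the counting step on a hypothetical string. TakeLargest with UpTo is used here as one way to pick the top entries without failing on texts that have fewer than five distinct words:

```wolfram
(* Hypothetical joined string *)
joined = "language computation language notebook language computation";

(* WordCounts returns an association of word -> count;
   keep at most the five entries with the largest counts *)
top = TakeLargest[WordCounts[joined], UpTo[5]]
```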
Use BarChart to visualize the frequency of words:
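A minimal sketch, using a hand-written association in place of the real counts:

```wolfram
(* Hypothetical counts; in the workflow this is the top-five association *)
top = <|"language" -> 3, "computation" -> 2, "notebook" -> 1|>;

(* Plot the counts and label each bar with its word *)
BarChart[Values[top], ChartLabels -> Keys[top]]
```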
Close the session
Use DeleteObject to terminate the web session process:
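For instance, assuming `session` holds the WebSessionObject returned by StartWebSession:

```wolfram
session = StartWebSession[];

(* ... work with the session ... *)

(* Shut down the browser process that backs the session *)
DeleteObject[session]
```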
Using WebExecute Commands Related to Elements of Webpages...
Begin the session
Use StartWebSession to begin the session:
- If no browser is supplied to StartWebSession, it will default to Google Chrome.
Extract text
Open the page you would like to get text from:
Use the "LocateElements" command to find the element whose id attribute is "content":
- An id value is meant to be unique within a page, so the result should contain a single WebElementObject.
Use the "ElementText" command to get the text of the located element:
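The locate and extract steps above can be sketched as follows. The URL is a placeholder, the page is assumed to contain an element with id "content", and the `"Id" -> "content"` locator form is written from memory, so it is worth checking against the WebExecute reference page:

```wolfram
(* Start a session and open the page; the URL is a placeholder *)
session = StartWebSession[];
WebExecute[session, "OpenPage" -> "https://www.wolfram.com"];

(* "LocateElements" returns a list of WebElementObject expressions *)
elements = WebExecute[session, "LocateElements" -> "Id" -> "content"];

(* Take the first (and presumably only) match and read its rendered text *)
text = WebExecute[session, "ElementText" -> First[elements]]
```

Unlike the JavaScript approach, this restricts the extracted text to one element and its descendants rather than the whole page body.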
Use Select to remove digit characters and non-English words:
Analyze the text
Use ToLowerCase so that differently capitalized forms of a word are counted together, and DeleteStopwords to remove prepositions and other common function words from the analysis:
Use WordCloud to create a word cloud of frequently used nontrivial words on the webpage:
Use StringRiffle to join the words into a single string, separated by spaces:
Use WordCounts to count the number of times each word appears in the string, and take the five most frequently used words:
Use BarChart to visualize the frequency of words:
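The filtering and analysis steps above can also be chained into one helper. This is only a sketch: `topWords` is a hypothetical name, the sample sentence stands in for the extracted element text, and the filter assumes that "non-English" means "not recognized by DictionaryWordQ":

```wolfram
(* Hypothetical helper: return the n most frequent nontrivial words in text *)
topWords[text_String, n_Integer : 5] := Module[{words, cleaned},
  (* Drop tokens with digits and tokens that are not dictionary words *)
  words = Select[TextWords[text],
    StringFreeQ[#, DigitCharacter] && DictionaryWordQ[#] &];
  (* Normalize case, then remove stopwords *)
  cleaned = DeleteStopwords[ToLowerCase[words]];
  (* Count words in the joined string and keep the most frequent ones *)
  TakeLargest[WordCounts[StringRiffle[cleaned]], UpTo[n]]
]

(* Sample text standing in for the extracted element text *)
counts = topWords[
  "Computation drives language design, and language design drives computation."];

BarChart[Values[counts], ChartLabels -> Keys[counts]]
```

Wrapping the steps in a function makes it easy to run the same analysis on the text returned by either extraction method.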
Close the session
Use DeleteObject to terminate the web session process: