Text Content Types
Natural language processing functions such as TextCases, TextPosition and TextContents allow many different types of content to be identified in text. Some of these types of content are structural or grammatical, while others relate to semantic interpretation.
Containing — define containers (e.g. sentences) for matches
Alternatives — match any of several types
Verbatim — strings to match verbatim
StringExpression ▪ RegularExpression
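As a minimal sketch of how the constructs above combine with content-type names, the following assumes an illustrative sample string (the variable name text and its contents are made up for this example). Results come from statistical language models and can vary between versions, so no output is shown.

```wolfram
(* Hypothetical sample text, used only for illustration *)
text = "Paris is the capital of France. Ada moved to Berlin in 1999.";

(* Alternatives: match nouns or verbs; "Noun" | "Verb" is Alternatives["Noun", "Verb"] *)
TextCases[text, "Noun" | "Verb"]

(* Containing: return sentences that contain a date *)
TextCases[text, Containing["Sentence", "Date"]]

(* Verbatim: locate a literal string rather than a content type *)
TextPosition[text, Verbatim["Berlin"]]
```

Wrapping a string in Verbatim keeps it from being read as a content-type name, which matters when the literal text happens to coincide with one of the identifiers listed on this page.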
Structural Elements
"Word" — word-like unit (usually delimited by whitespace or punctuation)
"Sentence" — sentence-like unit (usually delimited by punctuation marks)
"Paragraph" — paragraph-like unit (delimited by multiple newlines)
"Quotation" — quotation delimited by quotation marks
"Line" — substring delimited by a newline
"NonText" — characters that are not ordinary letter-like text
"Punctuation" — punctuation mark
"Whitespace" — sequence of whitespace characters
"Emoticon" — emoticon (e.g. smiley faces)
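The structural types can be passed directly as the second argument of TextCases or TextPosition. The sketch below assumes a made-up sample string; exact segmentation depends on the underlying models, so outputs are omitted.

```wolfram
(* Hypothetical sample text with two paragraphs *)
text = "Hello there! This is a test.\n\nA second paragraph follows.";

(* Split into sentence-like and word-like units *)
TextCases[text, "Sentence"]
TextCases[text, "Word"]

(* Character positions of punctuation marks *)
TextPosition[text, "Punctuation"]

(* Several types at once: the result is an association keyed by type *)
TextCases[text, {"Sentence", "Paragraph"}]
```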
Parts of Speech
"Noun" ▪ "Verb" ▪ "Adjective" ▪ "Adverb" ▪ "Pronoun" ▪ "Preposition" ▪ "Conjunction" ▪ "Determiner" ▪ "Interjection"
"ProperNoun" — a proper noun, typically beginning with a capital letter
"WhPronoun" ▪ "WhAdverb" ▪ "WhDeterminer"
"Punctuation" ▪ "PossessiveModifier" ▪ "ListItemMarker" ▪ "Symbol" ▪ "ForeignWord"
Phrase Types
"NounPhrase" ▪ "VerbPhrase" ▪ "AdjectivePhrase" ▪ "AdverbPhrase" ▪ "PrepositionalPhrase" ▪ "ConjunctionPhrase"
"WhNounPhrase" ▪ "WhAdjectivePhrase" ▪ "WhAdverbPhrase" ▪ "WhPrepositionalPhrase"
"NounPhraseHead" ▪ "QuantifierPhrase" ▪ "UnlikeCoordinatedPhrase"
"Clause" ▪ "ReducedRelativeClause"
"Sentence" ▪ "Fragment" ▪ "Parenthetical" ▪ "ListMarker"
Quantitative Elements
"Number" — number (e.g. "67", "6.78", "6.78e+10", "two thousand")
"Quantity" — quantity with units (e.g. "4.5 km", "10 ft. 6 in.", "30C", "7 m/s", "three kilometers")
"Unit" — units (e.g. "km", "ft.", "m/s", "kilometers")
"CurrencyAmount" — currency amount (e.g. "$5", "45 pesos", "10.25 GBP", "seven euros")
"Color" — textually described color (e.g. "light blue")
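Quantitative elements can be returned either as the matched substrings or as interpreted expressions. The following is a sketch under the assumption that the form -> "Interpretation" property syntax of TextCases is available in your version; the sample sentence is invented.

```wolfram
(* Hypothetical sample text *)
text = "The trail is 4.5 km long and the permit costs $5.";

(* Matched substrings *)
TextCases[text, "Quantity"]

(* Interpreted results, e.g. Quantity expressions instead of strings *)
TextCases[text, "Quantity" -> "Interpretation"]
TextCases[text, "CurrencyAmount" -> "Interpretation"]
```

Asking for the interpretation is useful when the values feed into further computation, such as unit conversion.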
Time & Place Elements
"Date" — date or date element (e.g. day, month, year, century)
"Location" — named geographic location (e.g. "New York", "France")
"LocationEntity" — named geographic location with entity interpretation
Identification Elements
"EmailAddress" ▪ "IPAddress" ▪ "PhoneNumber" ▪ "URL" ▪ "ZIPCode"
"TwitterHandle" — Twitter handle (e.g. "@Wolfram")
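A short sketch of extracting identification elements, assuming an invented sample string with placeholder addresses:

```wolfram
(* Hypothetical sample text; the address and URL are placeholders *)
text = "Contact support@example.com or visit https://www.example.com for details.";

(* Extract each kind of identifier separately *)
TextCases[text, "EmailAddress"]
TextCases[text, "URL"]

(* Or request both in one call; the result is an association keyed by type *)
TextCases[text, {"EmailAddress", "URL"}]
```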
Entities
Entity — match a specific entity of any type, for instance:
Geographic Entities
"Country" ▪ "AdministrativeDivision" ▪ "City" ▪ "Neighborhood" ▪ "MetropolitanArea" ▪ "GeographicRegion"
"Ocean" ▪ "Island" ▪ "UnderseaFeature" ▪ "Reef" ▪ "Beach" ▪ "Lake" ▪ "Mountain" ▪ "Volcano" ▪ "River" ▪ "Waterfall" ▪ "EarthImpact" ▪ "Desert" ▪ "Forest"
"Airport" ▪ "Park" ▪ "AmusementPark" ▪ "AmusementParkRide" ▪ "Stadium"
"Bridge" ▪ "Canal" ▪ "Tunnel" ▪ "Dam" ▪ "Mine" ▪ "Cave" ▪ "OilField" ▪ "Building" ▪ "Castle" ▪ "Cemetery" ▪ "HistoricalSite" ▪ "ReserveLand" ▪ "Shipwreck"
"University" ▪ "SchoolDistrict" ▪ "PublicSchool" ▪ "PrivateSchool" ▪ "Museum" ▪ "LibraryBranch" ▪ "LibrarySystem"
"WeatherStation" ▪ "AstronomicalObservatory" ▪ "ParticleAccelerator" ▪ "NuclearReactor" ▪ "NuclearTestSite" ▪ "NuclearExplosion"
Astronomical Entities
"Planet" ▪ "PlanetaryMoon" ▪ "MinorPlanet" ▪ "Comet" ▪ "SolarSystemFeature" ▪ "MeteorShower" ▪ "Exoplanet"
"Star" ▪ "Galaxy" ▪ "StarCluster" ▪ "Nebula" ▪ "Supernova" ▪ "Pulsar" ▪ "AstronomicalRadioSource" ▪ "Constellation"
Space-Related
"Satellite" ▪ "Rocket" ▪ "DeepSpaceProbe" ▪ "MannedSpaceMission"
Weather & Earth Science
"WeatherStation" ▪ "TropicalStorm" ▪ "Cloud" ▪ "AtmosphericLayer"
"GeologicalLayer" ▪ "GeologicalPeriod" ▪ "Mineral" ▪ "FamousGem"
Transportation-Related
"Aircraft" ▪ "Airline" ▪ "Airport" ▪ "Ship"
Engineering & Structures
"BroadcastStation" ▪ "MeasurementDevice"
"Building" ▪ "Bridge" ▪ "Tunnel" ▪ "Dam" ▪ "Mine"
Culture & Entertainment
"Language" ▪ "Religion" ▪ "Mythology"
"Movie" ▪ "MusicAct" ▪ "MusicAlbum" ▪ "MusicWork" ▪ "BroadcastStation"
"Book" ▪ "Artwork" ▪ "Periodical" ▪ "FictionalCharacter"
"Museum" ▪ "LibraryBranch" ▪ "LibrarySystem"
Activities & Hobbies
"MusicalInstrument" ▪ "SportObject" ▪ "BoardGame"
Food & Nutrition
"Food" ▪ "FoodBrandName" ▪ "FoodManufacturer" ▪ "FoodSubBrandName"
Finance
Person & Personal Attributes
"Person" ▪ "GivenName" ▪ "Surname" ▪ "PersonTitle" ▪ "Occupation"
History-Related
"HistoricalCountry" ▪ "HistoricalSite"
Linguistic Entities
"Language" ▪ "Alphabet" ▪ "WritingScript"
Physical Science
"Chemical" ▪ "Element" ▪ "Particle" ▪ "Mineral"
"FamousPhysicsProblem" ▪ "FamousChemistryProblem"
Life Science
Medical Entities
"AnatomicalStructure" ▪ "Disease" ▪ "MedicalTest" ▪ "Protein"
Organism Types
"Plant" ▪ "Species" ▪ "DogBreed" ▪ "CatBreed" ▪ "Dinosaur"
Mathematical Entities
"Polyhedron" ▪ "Surface" ▪ "SpaceCurve" ▪ "Graph" ▪ "FiniteGroup" ▪ "IntegerSequence"
"FamousMathProblem" ▪ "FamousMathGame"
Computing-Related
"NotableComputer" ▪ "ProgrammingLanguage"
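The entity types listed above can be used in two ways: as a type name, to find any mention of that kind of entity, or through a specific Entity, to find mentions of that one entity. The sketch below assumes an invented sample sentence and access to the Wolfram Knowledgebase for entity interpretation; outputs are omitted because they depend on the recognizer.

```wolfram
(* Hypothetical sample text *)
text = "France borders Spain, and Paris is the capital of France.";

(* Any country mentioned in the text *)
TextCases[text, "Country"]

(* The same matches, returned as Entity objects rather than strings *)
TextCases[text, "Country" -> "Interpretation"]

(* Only mentions of one specific entity *)
TextCases[text, Entity["Country", "France"]]
```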
Language Styles & Sentiments
"PositiveSentiment" ▪ "NegativeSentiment" ▪ "NeutralSentiment"
"Profanity" — text containing profanity
Content Topics
"BooksTopic" ▪ "CareerAndMoneyTopic" ▪ "FamilyAndFriendsTopic" ▪ "FashionTopic" ▪ "FitnessTopic" ▪ "FoodAndDrinkTopic" ▪ "HealthTopic" ▪ "LeisureTopic" ▪ "MoviesTopic" ▪ "MusicTopic" ▪ "PersonalMoodTopic" ▪ "PetsAndAnimalsTopic" ▪ "PoliticsTopic" ▪ "QuotesAndLifePhilosophyTopic" ▪ "RelationshipsTopic" ▪ "SchoolAndUniversityTopic" ▪ "SocialMediaTopic" ▪ "SpecialOccasionsTopic" ▪ "SportsTopic" ▪ "TechnologyTopic" ▪ "TelevisionTopic" ▪ "TransportTopic" ▪ "TravelTopic" ▪ "VideoGamesTopic" ▪ "WeatherTopic"
Human Languages
"Afrikaans" ▪ "Albanian" ▪ "Amharic" ▪ "Arabic" ▪ "Armenian" ▪ "Azerbaijani" ▪ "Basque" ▪ "Bengali" ▪ "Bosnian" ▪ "Bulgarian" ▪ "Catalan" ▪ "Chinese" ▪ "Croatian" ▪ "Czech" ▪ "Danish" ▪ "Dutch" ▪ "English" ▪ "Esperanto" ▪ "Estonian" ▪ "Finnish" ▪ "French" ▪ "Georgian" ▪ "German" ▪ "Greek" ▪ "Gujarati" ▪ "Hebrew" ▪ "Hindi" ▪ "Hungarian" ▪ "Icelandic" ▪ "InuktitutGreenlandic" ▪ "Italian" ▪ "Japanese" ▪ "Kannada" ▪ "Kazakh" ▪ "Khmer" ▪ "Korean" ▪ "Latvian" ▪ "Lithuanian" ▪ "Macedonian" ▪ "Majhi" ▪ "Malay" ▪ "Malayalam" ▪ "Mongolian" ▪ "Nepali" ▪ "NorwegianBokmal" ▪ "Persian" ▪ "Polish" ▪ "Portuguese" ▪ "Romanian" ▪ "Russian" ▪ "Serbian" ▪ "Sinhala" ▪ "Slovak" ▪ "Slovenian" ▪ "Spanish" ▪ "Swahili" ▪ "Swedish" ▪ "Tagalog" ▪ "Tamil" ▪ "Telugu" ▪ "Thai" ▪ "Turkish" ▪ "Ukrainian" ▪ "Urdu" ▪ "UzbekNorthern" ▪ "Vietnamese" ▪ "Welsh"
Programming Languages
"ABAP" ▪ "Ada" ▪ "AWK" ▪ "BourneShell" ▪ "C" ▪ "CPlusPlus" ▪ "CSharp" ▪ "COBOL" ▪ "CommonLisp" ▪ "D" ▪ "Dart" ▪ "Delphi" ▪ "Erlang" ▪ "FSharp" ▪ "Fortran" ▪ "Groovy" ▪ "Haskell" ▪ "Java" ▪ "JavaScript" ▪ "Logo" ▪ "Lua" ▪ "MATLAB" ▪ "ObjectiveC" ▪ "Perl" ▪ "PHP" ▪ "Prolog" ▪ "Python" ▪ "R" ▪ "Ruby" ▪ "Rust" ▪ "SAS" ▪ "Scala" ▪ "Scheme" ▪ "SQL" ▪ "Swift" ▪ "Tcl" ▪ "VBSCript" ▪ "VisualBasicNET" ▪ "WindowsPowerShell" ▪ "WolframLanguage"