Text Content Types

Natural language processing functions such as TextCases, TextPosition and TextContents allow many different types of content to be identified in text. Some of these types of content are structural or grammatical, while others relate to semantic interpretation.
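
As a minimal sketch of how a content type is passed to these functions, the following applies the same type, "City", with each of the three. The sample sentence and the variable names are illustrative assumptions, and the results depend on the language models available to the system, so no output is shown.

```wolfram
(* Illustrative sample text; any string could be used here. *)
text = "Alice flew from Chicago to Paris and then took a train to Lyon.";

(* Substrings of the text identified as cities. *)
cities = TextCases[text, "City"]

(* {start, end} character positions of the same matches. *)
cityPositions = TextPosition[text, "City"]

(* Structured summary of the detected content of that type. *)
cityContents = TextContents[text, "City"]
```

Any of the type names listed on this page can be substituted for "City", subject to what each function supports.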

Containing define containers (e.g. sentences) for matches

Alternatives match any of multiple types

Verbatim match strings verbatim

StringExpression  ▪  RegularExpression
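
The constructs above can be combined with content types in a call such as TextCases. The sketch below is illustrative only: the sample text is an assumption, and the string-pattern form is assumed to be accepted because StringExpression and RegularExpression are listed here.

```wolfram
(* Illustrative sample text. *)
text = "France borders Spain. The Rhine flows through Germany.";

(* Alternatives: match either of two content types. *)
countriesOrRivers = TextCases[text, "Country" | "River"]

(* Containing: return each sentence that contains a country. *)
sentencesWithCountry = TextCases[text, Containing["Sentence", "Country"]]

(* Verbatim: match the literal string rather than a content type name. *)
literalMatches = TextCases[text, Verbatim["Rhine"]]

(* String pattern: capitalized words, matched with a regular expression. *)
capitalized = TextCases[text, RegularExpression["[A-Z][a-z]+"]]
```

Verbatim is mainly useful when the string to find could otherwise be read as a type name, for example Verbatim["Country"].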

Structural Elements

"Word" word-like unit (usually delimited by whitespace or punctuation)

"Sentence" sentence-like unit (usually delimited by punctuation marks)

"Paragraph" paragraph-like unit (delimited by multiple newlines)

"Quotation" quotation delimited by quotation marks

"Line" substring delimited by a newline

"NonText" characters that are not ordinary letter-like text

"Punctuation" punctuation mark

"Whitespace" sequence of whitespace characters

"Emoticon" emoticon (e.g. smiley faces)

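A short sketch of the structural types listed above, assuming a two-paragraph sample string. Giving a list of types to TextCases returns an association keyed by type, which is convenient when several structural elements are wanted at once.

```wolfram
(* Two paragraphs separated by a blank line; the text is illustrative. *)
text = "It rained all day. We stayed in.\n\nThe next morning was clear.";

(* Sentence-like units. *)
sentences = TextCases[text, "Sentence"]

(* Paragraph-like units, delimited by multiple newlines. *)
paragraphs = TextCases[text, "Paragraph"]

(* Several types at once: the result is an association keyed by type. *)
tokens = TextCases[text, {"Word", "Punctuation"}]
```
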
Parts of Speech

"Noun"  ▪  "Verb"  ▪  "Adjective"  ▪  "Adverb"  ▪  "Pronoun"  ▪  "Preposition"  ▪  "Conjunction"  ▪  "Determiner"  ▪  "Interjection"

"ProperNoun" a proper noun, typically beginning with a capital letter

"WhPronoun"  ▪  "WhAdverb"  ▪  "WhDeterminer"

"Punctuation"  ▪  "PossessiveModifier"  ▪  "ListItemMarker"  ▪  "Symbol"  ▪  "ForeignWord"

Phrase Types

"NounPhrase"  ▪  "VerbPhrase"  ▪  "AdjectivePhrase"  ▪  "AdverbPhrase"  ▪  "PrepositionalPhrase"  ▪  "ConjunctionPhrase"

"WhNounPhrase"  ▪  "WhAdjectivePhrase"  ▪  "WhAdverbPhrase"  ▪  "WhPrepositionalPhrase"

"NounPhraseHead"  ▪  "QuantifierPhrase"  ▪  "UnlikeCoordinatedPhrase"

"Clause"  ▪  "ReducedRelativeClause"

"Sentence"  ▪  "Fragment"  ▪  "Parenthetical"  ▪  "ListMarker"

Quantitative Elements

"Number" number (e.g. "67", "6.78", "6.78e+10", "two thousand")

"Quantity" quantity with units (e.g. "4.5 km", "10 ft. 6 in.", "30C", "7 m/s", "three kilometers")

"Unit" units (e.g. "km", "ft.", "m/s", "kilometers")

"CurrencyAmount" currency amount (e.g. "$5", "45 pesos", "10.25 GBP", "seven euros")

"Color" textually described color (e.g. "light blue")

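Quantitative elements can be returned either as the matched substrings or as interpreted expressions. The following is a minimal sketch using the "Interpretation" property of TextCases; the sample sentence is an assumption, and the interpretations returned depend on the system's interpreters.

```wolfram
(* Illustrative sample text containing a quantity and a currency amount. *)
text = "The trail is 4.5 km long and the permit costs $5.";

(* Matched substrings such as "4.5 km". *)
quantityStrings = TextCases[text, "Quantity"]

(* The same matches interpreted as expressions. *)
quantities = TextCases[text, "Quantity" -> "Interpretation"]

(* Several properties at once for each currency amount found. *)
amounts = TextCases[text, "CurrencyAmount" -> {"String", "Interpretation"}]
```
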
Time & Place Elements

"Date" date or date element (e.g. day, month, year, century)

"Location" named geographic location (e.g. "New York", "France")

"LocationEntity" named geographic location with entity interpretation

Identification Elements

"EmailAddress"  ▪  "IPAddress"  ▪  "PhoneNumber"  ▪  "URL"  ▪  "ZIPCode"

"TwitterHandle" Twitter handle (e.g. "@Wolfram")

Entities

Entity match a specific entity of any type, for instance:

Geographic Entities

"Country"  ▪  "AdministrativeDivision"  ▪  "City"  ▪  "Neighborhood"  ▪  "MetropolitanArea"  ▪  "GeographicRegion"

"Ocean"  ▪  "Island"  ▪  "UnderseaFeature"  ▪  "Reef"  ▪  "Beach"  ▪  "Lake"  ▪  "Mountain"  ▪  "Volcano"  ▪  "River"  ▪  "Waterfall"  ▪  "EarthImpact"  ▪  "Desert"  ▪  "Forest"

"Airport"  ▪  "Park"  ▪  "AmusementPark"  ▪  "AmusementParkRide"  ▪  "Stadium"

"Bridge"  ▪  "Canal"  ▪  "Tunnel"  ▪  "Dam"  ▪  "Mine"  ▪  "Cave"  ▪  "OilField"  ▪  "Building"  ▪  "Castle"  ▪  "Cemetery"  ▪  "HistoricalSite"  ▪  "ReserveLand"  ▪  "Shipwreck"

"University"  ▪  "SchoolDistrict"  ▪  "PublicSchool"  ▪  "PrivateSchool"  ▪  "Museum"  ▪  "LibraryBranch"  ▪  "LibrarySystem"

"WeatherStation"  ▪  "AstronomicalObservatory"  ▪  "ParticleAccelerator"  ▪  "NuclearReactor"  ▪  "NuclearTestSite"  ▪  "NuclearExplosion"

"TimeZone"

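As a sketch of the difference between matching an entity type and matching one specific entity, the following uses the "Country" type and the entity Entity["Country", "France"]. The sample text is an assumption, and entity recognition depends on the data available to the system.

```wolfram
(* Illustrative sample text. *)
text = "France and Germany share a border. Trade between France and Italy is growing.";

(* Every substring identified as a country. *)
countries = TextCases[text, "Country"]

(* The same matches, returned as entities rather than strings. *)
countryEntities = TextCases[text, "Country" -> "Interpretation"]

(* Only mentions of one specific entity. *)
franceMentions = TextCases[text, Entity["Country", "France"]]
```
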
Astronomical Entities

"Planet"  ▪  "PlanetaryMoon"  ▪  "MinorPlanet"  ▪  "Comet"  ▪  "SolarSystemFeature"  ▪  "MeteorShower"  ▪  "Exoplanet"

"Star"  ▪  "Galaxy"  ▪  "StarCluster"  ▪  "Nebula"  ▪  "Supernova"  ▪  "Pulsar"  ▪  "AstronomicalRadioSource"  ▪  "Constellation"

Space-Related

"Satellite"  ▪  "Rocket"  ▪  "DeepSpaceProbe"  ▪  "MannedSpaceMission"

Weather & Earth Science

"WeatherStation"  ▪  "TropicalStorm"  ▪  "Cloud"  ▪  "AtmosphericLayer"

"GeologicalLayer"  ▪  "GeologicalPeriod"  ▪  "Mineral"  ▪  "FamousGem"

Transportation-Related

"Aircraft"  ▪  "Airline"  ▪  "Airport"  ▪  "Ship"

Engineering & Structures

"BroadcastStation"  ▪  "MeasurementDevice"

"Building"  ▪  "Bridge"  ▪  "Tunnel"  ▪  "Dam"  ▪  "Mine"

Culture & Entertainment

"Language"  ▪  "Religion"  ▪  "Mythology"

"Movie"  ▪  "MusicAct"  ▪  "MusicAlbum"  ▪  "MusicWork"  ▪  "BroadcastStation"

"Book"  ▪  "Artwork"  ▪  "Periodical"  ▪  "FictionalCharacter"

"Museum"  ▪  "LibraryBranch"  ▪  "LibrarySystem"

Activities & Hobbies

"MusicalInstrument"  ▪  "SportObject"  ▪  "BoardGame"

Food & Nutrition

"Food"  ▪  "FoodBrandName"  ▪  "FoodManufacturer"  ▪  "FoodSubBrandName"

Finance

"Company"  ▪  "Financial"

Person & Personal Attributes

"Person"  ▪  "GivenName"  ▪  "Surname"  ▪  "PersonTitle"  ▪  "Occupation"

History-Related

"HistoricalCountry"  ▪  "HistoricalSite"

Linguistic Entities

"Language"  ▪  "Alphabet"  ▪  "WritingScript"

Physical Science

"Chemical"  ▪  "Element"  ▪  "Particle"  ▪  "Mineral"

"FamousPhysicsProblem"  ▪  "FamousChemistryProblem"

Life Science

"Gene"  ▪  "Protein"

Medical Entities

"AnatomicalStructure"  ▪  "Disease"  ▪  "MedicalTest"  ▪  "Protein"

Organism Types

"Plant"  ▪  "Species"  ▪  "DogBreed"  ▪  "CatBreed"  ▪  "Dinosaur"

Mathematical Entities

"Polyhedron"  ▪  "Surface"  ▪  "SpaceCurve"  ▪  "Graph"  ▪  "FiniteGroup"  ▪  "IntegerSequence"

"FamousMathProblem"  ▪  "FamousMathGame"

Computing-Related

"NotableComputer"  ▪  "ProgrammingLanguage"

Language Styles & Sentiments

"PositiveSentiment"  ▪  "NegativeSentiment"  ▪  "NeutralSentiment"

"Profanity" text containing profanity

Content Topics

"BooksTopic"  ▪  "CareerAndMoneyTopic"  ▪  "FamilyAndFriendsTopic"  ▪  "FashionTopic"  ▪  "FitnessTopic"  ▪  "FoodAndDrinkTopic"  ▪  "HealthTopic"  ▪  "LeisureTopic"  ▪  "MoviesTopic"  ▪  "MusicTopic"  ▪  "PersonalMoodTopic"  ▪  "PetsAndAnimalsTopic"  ▪  "PoliticsTopic"  ▪  "QuotesAndLifePhilosophyTopic"  ▪  "RelationshipsTopic"  ▪  "SchoolAndUniversityTopic"  ▪  "SocialMediaTopic"  ▪  "SpecialOccasionsTopic"  ▪  "SportsTopic"  ▪  "TechnologyTopic"  ▪  "TelevisionTopic"  ▪  "TransportTopic"  ▪  "TravelTopic"  ▪  "VideoGamesTopic"  ▪  "WeatherTopic"

Human Languages

"Afrikaans"  ▪  "Albanian"  ▪  "Amharic"  ▪  "Arabic"  ▪  "Armenian"  ▪  "Azerbaijani"  ▪  "Basque"  ▪  "Bengali"  ▪  "Bosnian"  ▪  "Bulgarian"  ▪  "Catalan"  ▪  "Chinese"  ▪  "Croatian"  ▪  "Czech"  ▪  "Danish"  ▪  "Dutch"  ▪  "English"  ▪  "Esperanto"  ▪  "Estonian"  ▪  "Finnish"  ▪  "French"  ▪  "Georgian"  ▪  "German"  ▪  "Greek"  ▪  "Gujarati"  ▪  "Hebrew"  ▪  "Hindi"  ▪  "Hungarian"  ▪  "Icelandic"  ▪  "InuktitutGreenlandic"  ▪  "Italian"  ▪  "Japanese"  ▪  "Kannada"  ▪  "Kazakh"  ▪  "Khmer"  ▪  "Korean"  ▪  "Latvian"  ▪  "Lithuanian"  ▪  "Macedonian"  ▪  "Majhi"  ▪  "Malay"  ▪  "Malayalam"  ▪  "Mongolian"  ▪  "Nepali"  ▪  "NorwegianBokmal"  ▪  "Persian"  ▪  "Polish"  ▪  "Portuguese"  ▪  "Romanian"  ▪  "Russian"  ▪  "Serbian"  ▪  "Sinhala"  ▪  "Slovak"  ▪  "Slovenian"  ▪  "Spanish"  ▪  "Swahili"  ▪  "Swedish"  ▪  "Tagalog"  ▪  "Tamil"  ▪  "Telugu"  ▪  "Thai"  ▪  "Turkish"  ▪  "Ukrainian"  ▪  "Urdu"  ▪  "UzbekNorthern"  ▪  "Vietnamese"  ▪  "Welsh"

Programming Languages

"ABAP"  ▪  "Ada"  ▪  "AWK"  ▪  "BourneShell"  ▪  "C"  ▪  "CPlusPlus"  ▪  "CSharp"  ▪  "COBOL"  ▪  "CommonLisp"  ▪  "D"  ▪  "Dart"  ▪  "Delphi"  ▪  "Erlang"  ▪  "FSharp"  ▪  "Fortran"  ▪  "Groovy"  ▪  "Haskell"  ▪  "Java"  ▪  "JavaScript"  ▪  "Logo"  ▪  "Lua"  ▪  "MATLAB"  ▪  "ObjectiveC"  ▪  "Perl"  ▪  "PHP"  ▪  "Prolog"  ▪  "Python"  ▪  "R"  ▪  "Ruby"  ▪  "Rust"  ▪  "SAS"  ▪  "Scala"  ▪  "Scheme"  ▪  "SQL"  ▪  "Swift"  ▪  "Tcl"  ▪  "VBSCript"  ▪  "VisualBasicNET"  ▪  "WindowsPowerShell"  ▪  "WolframLanguage"