NLP++

From Wikipedia, the free encyclopedia
Paradigm: Natural-language processing
Designed by: Amnon Meyers, David de Hilster
Developer: Text Analysis International
First appeared: 1998
Platform: Cross-platform
OS: Linux, Windows, Mac
License: MIT
Filename extensions: .nlp, .dict, .kbb, .txxt
Website: www.visualtext.com

NLP++ is a computer programming language for natural language processing created by Amnon Meyers and David de Hilster in 1998. It operates on an input text via multiple passes that elaborate a best-first parse tree. It can access and update a hierarchical knowledge base management system (KBMS).[1] NLP++ is deployed with an integrated development environment (IDE), which supports development of text analyzers. NLP++ is one of the few computer languages dedicated exclusively to natural language processing.

Overview

NLP++ is a computer language dedicated to building natural language text analyzers. It allows programmers to capture and apply linguistic and world knowledge, emulating processes by which humans read and understand text.[2] NLP++ combines bottom-up, island-driven, recursive grammar, and other methods in a multi-pass architecture that operates on one parse tree. It works with a hierarchical knowledge base (KB), called Conceptual Grammar (CG), to dynamically build and use stored knowledge in analyzing text. Applications range from simple syntactic processing to full natural language understanding.[3][4] VisualText[5] is a developer's environment that uses NLP++ and CG to build text analyzers rapidly. Passes and KBs from one analyzer may be reused to construct and tailor new text analyzers more rapidly.
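
The multi-pass idea can be illustrated outside NLP++. The following is a minimal Python sketch, not NLP++ code and not the actual engine: the Node class, the pass functions, and the node names _ROOT, _quantity, and _measure are all hypothetical, chosen only to show several passes elaborating one shared parse tree in sequence. The names _xALPHA and _xNUM are borrowed from the rule example later in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the shared parse tree (hypothetical structure)."""
    name: str
    text: str
    children: list = field(default_factory=list)


def tokenize(text):
    """Pass 1: build a flat tree with one leaf per whitespace-separated token."""
    leaves = [
        Node("_xNUM" if tok.isdigit() else "_xALPHA", tok)
        for tok in text.split()
    ]
    return Node("_ROOT", text, leaves)


def group_numbers(root):
    """Pass 2: wrap each number leaf in a _quantity node."""
    root.children = [
        Node("_quantity", c.text, [c]) if c.name == "_xNUM" else c
        for c in root.children
    ]
    return root


def group_measures(root):
    """Pass 3: merge a _quantity followed by a word into one _measure node."""
    kids, out, i = root.children, [], 0
    while i < len(kids):
        if (kids[i].name == "_quantity" and i + 1 < len(kids)
                and kids[i + 1].name == "_xALPHA"):
            pair = kids[i:i + 2]
            out.append(Node("_measure", " ".join(n.text for n in pair), pair))
            i += 2
        else:
            out.append(kids[i])
            i += 1
    root.children = out
    return root


def analyze(text):
    """Run every pass in order; each pass sees the tree left by the previous one."""
    tree = tokenize(text)
    for analysis_pass in (group_numbers, group_measures):
        tree = analysis_pass(tree)
    return tree


tree = analyze("add 3 eggs and 2 cups")
print([(n.name, n.text) for n in tree.children])
```

Because pass 3 relies on the _quantity nodes produced by pass 2, the order of passes matters; this dependency is the point of the sketch.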

NLP++

NLP++ includes functions, rules, operators, and variables specific to its internal representations of text and knowledge. Alongside general-purpose programming constructs similar to those of C and C++, it provides constructs that directly address rule matches and the associated knowledge base.


Variables

A variable is written as a single-letter type prefix followed by a quoted string name in parentheses, for example G("People"). Each special variable type applies to a specific context.

Variable  Description             Example         Scope
N         Specific node           N("$text",2)    Rules
S         Suggested node          S("count")      Rules
X         Context node and level  X("concept",3)  Rules
G         Global variable         G("People")     Rules & Functions
L         Local variable          L("num")        Rules & Functions

Regions

An NLP++ file is divided into regions, each introduced by an @-prefixed keyword:

Region  Description                                                                 Position and Scope
@NODES  Specifies the nodes to be matched in the @RULES region                      Comes before the @RULES region
@PATH   Specifies a specific path in the syntax tree to match                       Comes before the @RULES region
@CODE   Specifies a region where NLP++ code is executed outside of a @RULES region  Region ends with @@CODE
@DECL   Declarative area for functions                                              Region ends with @@DECL
@POST   Specifies a region of post-processing for a rule or rules                   Comes right before the @RULES region
@PRE    Specifies a region of preprocessing for a rule or rules                     Comes right before the @RULES region
@CHECK  Specifies certain conditions on rule nodes before trying to match the rule  Comes right before the @POST or @RULES region
@RULES  Specifies a region for rules                                                Region ends with @@

Rules

NLP++ has rules for pattern matching. A rule is written in the form "@RULES _node <- a b c @@": when the elements a, b, and c to the right of "<-" match a sequence of nodes, the matched nodes are placed under the new node "_node". Here is an example of a rule:

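@POST
# Store the text of element 2 (the number) on the suggested node.
S("count") = N("$text",2);
# Make a concept named after element 1 under the concept held in G("Counts").
S("concept") = makeconcept(G("Counts"),N("$text",1));
# Reduce the matched elements to a single _count node.
single();

@RULES
_count <-
    _xALPHA [s]  ### (1)
    _xNUM [s]    ### (2)
    @@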
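
To make the effect of this rule concrete, here is a rough Python emulation. It is a sketch under stated assumptions rather than a description of how the NLP++ engine works: the Node class, the reduce_count function, and the plain dictionary standing in for the G("Counts") concept are all hypothetical, and the joined text of the new node is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Parse-tree node with a variable table (hypothetical structure)."""
    name: str
    text: str
    vars: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def reduce_count(nodes, counts):
    """Emulate `_count <- _xALPHA _xNUM @@` together with its @POST actions."""
    out, i = [], 0
    while i < len(nodes):
        window = nodes[i:i + 2]
        if [n.name for n in window] == ["_xALPHA", "_xNUM"]:
            word, number = window  # rule elements (1) and (2)
            suggested = Node("_count", f"{word.text} {number.text}",
                             children=window)
            # Mirrors S("count") = N("$text",2)
            suggested.vars["count"] = number.text
            # Stand-in for makeconcept(G("Counts"), N("$text",1))
            counts.setdefault(word.text, {})
            suggested.vars["concept"] = word.text
            # Mirrors single(): the matched nodes become one new node
            out.append(suggested)
            i += 2
        else:
            out.append(nodes[i])
            i += 1
    return out


counts = {}  # hypothetical stand-in for the G("Counts") concept
tokens = [Node("_xALPHA", "apples"), Node("_xNUM", "12"), Node("_xALPHA", "sold")]
result = reduce_count(tokens, counts)
print([(n.name, n.vars) for n in result])
```

The two matched tokens end up as children of one _count node carrying the count and concept variables, while the unmatched token "sold" is left in place.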

Conceptual Grammar

The conceptual grammar is a hierarchical knowledge base that can be imported and used by NLP++, and can also be created by NLP++ code and pattern matching. The hierarchy consists of concepts, each of which can have attributes and phrases attached to it.
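
As a rough illustration of this structure, the Python sketch below models a concept hierarchy with attributes and phrases. It is not the Conceptual Grammar implementation or its API: the Concept class and its make_child and find methods are hypothetical names introduced only for this example, and the storage format is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a concept hierarchy (hypothetical structure)."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> list of values
    phrases: list = field(default_factory=list)     # each phrase is a list of words
    children: dict = field(default_factory=dict)    # child name -> Concept

    def make_child(self, name):
        """Return the child concept called `name`, creating it if absent."""
        return self.children.setdefault(name, Concept(name))

    def find(self, path):
        """Follow a list of names down the hierarchy; None if a step is missing."""
        concept = self
        for name in path:
            concept = concept.children.get(name)
            if concept is None:
                return None
        return concept


root = Concept("concept")
fruit = root.make_child("Counts").make_child("apples")
fruit.attributes.setdefault("count", []).append("12")
fruit.phrases.append(["red", "apples"])

found = root.find(["Counts", "apples"])
print(found.attributes, found.phrases)
```

Keeping children in a dictionary keyed by name makes repeated make_child calls return the same concept, so knowledge gathered in one pass can be looked up in a later one.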

VisualText

VisualText is an IDE built specifically to edit, run, and debug NLP++ text analyzers.[6] It includes a directory of texts to process, a dedicated editor for NLP++, highlighting of the text matched by rules in each sequential pass, and tree visualizations of both the syntactic tree and the hierarchical knowledge base. It can also quickly generate rules directly from text.

VisualText VSCode NLP++ Language Extension

The VisualText IDE has been ported to VSCode as a language extension for NLP++ that runs cross-platform.[7] This is now considered the current version of the IDE. It was officially released as a Microsoft VSCode Language Extension on December 22, 2020, and its source code can be found in the VSCode-NLP repository on GitHub.[8]

History

NLP++ grew out of the work of its two creators, Amnon Meyers and David de Hilster, computer programmers who have worked in natural language processing since the early 1980s.

For two decades, the technology was privately owned[9][10] and was licensed by private companies to process medical, social media, historical document, and real estate text.[11]

Open Source

In December 2018, NLP++ and VisualText became open source.

References

  1. ^ "What is a knowledge base management system?". 26 November 2024.
  2. ^ Chikkarangaiah, Jayanth; Uday, Adarsh; De Hilster, David; Gangadhar, Shobha; Shetty, Jyoti (2024). "Enhancing the English natural language processing dictionary using natural language processing++". IAES International Journal of Artificial Intelligence (IJ-AI). 13 (3): 3466. doi:10.11591/ijai.v13.i3.pp3466-3477.
  3. ^ Ashton Williamson; David de Hilster; Amnon Meyers; Nina Hubig; Amy Apon (2024-08-16). "Low-resource ICD Coding of Hospital Discharge Summaries" (PDF). Proceedings of the 23rd Workshop on Biomedical Language Processing. pp. 548–558.
  4. ^ Pedro Lima Rodrigues; Renato de Oliveira Moraes; Hugo Watanuki; David de Hilster. "Emotions detection in social media posts" (PDF).
  5. ^ "NLP++ – VisualText". 24 November 2017.
  6. ^ "Description of the INLET System Used for MUC-3" (PDF).
  7. ^ "NLP++ Language Extension for VSCode".
  8. ^ "GitHub Repository for NLP++ Language Extension for VSCode".
  9. ^ "Text Analysis International: About".
  10. ^ "Text Analysis International: News".
  11. ^ "Text Analysis International: Customers".