spaCyEx

spaCyEx is an extension for spaCy that aims to make token-pattern matching as flexible and easy to write as regular expressions. It builds on spaCy's Matcher and adds a compact string syntax for defining complex patterns. This makes it well suited to extracting linguistic features such as named entities, lemmas, and parts of speech from text.

Installation

You can install spaCyEx via pip:

pip install spacyex

Features

Dynamic Pattern Creation: Create complex token matching patterns using a simple string-based syntax.
Integration with spaCy: Leverage spaCy's Matcher capabilities to find sequences in text that match defined patterns.
Customizable Matching Rules: Define token attributes including text characteristics, lexical attributes, and grammatical properties.

Creating Patterns

Patterns are written as strings in which each token is enclosed in parentheses. Inside the parentheses, each attribute is a key=value pair joined by an equals sign (=), and multiple attributes for the same token are separated by a pipe (|). Consecutive tokens are separated by spaces.
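To make the syntax concrete, here is a minimal sketch of how such a string could be split into one attribute mapping per token. It is not spaCyEx's own parser: the function name split_pattern is hypothetical, and it assumes that values never contain a pipe or parentheses.

```python
from __future__ import annotations

import re

# Hypothetical helper, for illustration only (not spaCyEx's parser).
# Each "( ... )" group is one token; its body holds key=value pairs joined by "|".
TOKEN_RE = re.compile(r"\(([^()]*)\)")


def split_pattern(pattern: str) -> list[dict[str, str]]:
    """Split a pattern string into one {key: raw_value} dict per token."""
    tokens = []
    for body in TOKEN_RE.findall(pattern):
        attrs = {}
        for pair in body.split("|"):
            # Split on the first "=" only; values such as "{2,3}" stay intact
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
        tokens.append(attrs)
    return tokens


print(split_pattern("(pos=NOUN|lemma=run) (pos=ADV)"))
# [{'pos': 'NOUN', 'lemma': 'run'}, {'pos': 'ADV'}]
```

Values are kept as raw strings here. Interpreting forms such as in[...] or op={...} is a separate step.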

Syntax Examples

Single Attribute: (pos=NOUN)
Multiple Attributes: (pos=NOUN|lemma=run)
Using List Values: (lemma=in[run,walk])
Using Operators: (ent_type=person|op={2,3})
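These examples line up with spaCy's Matcher pattern format. In that format, each token is a dict keyed by upper-case attribute names, a list of allowed values is written as {"IN": [...]}, and repetition is controlled by the "OP" key. The sketch below assumes spaCyEx performs a translation along these lines. The helper to_matcher_token is hypothetical and is not the library's API. In the raw Matcher, values are compared exactly, so the PERSON label produced by en_core_web_sm would not match person unless the value is normalized. This README does not say whether spaCyEx performs that normalization. Range quantifiers such as {2,3} in OP also require a recent spaCy v3 release.

```python
from __future__ import annotations

import re

# Hypothetical translation from spaCyEx-style attributes to spaCy Matcher token dicts.
# Assumptions: keys become upper-case Matcher attribute names, "in[a,b]" becomes
# the {"IN": [...]} predicate, "op" becomes "OP", and other values pass through unchanged.
IN_RE = re.compile(r"^in\[(.*)\]$")


def to_matcher_token(attrs: dict[str, str]) -> dict:
    """Convert one token's {key: raw_value} mapping into a Matcher token dict."""
    token: dict = {}
    for key, value in attrs.items():
        list_match = IN_RE.match(value)
        if list_match:
            # "in[run,walk]" -> {"IN": ["run", "walk"]}
            items = [item.strip() for item in list_match.group(1).split(",")]
            token[key.upper()] = {"IN": items}
        else:
            # "pos=NOUN" -> "POS": "NOUN"; "op={2,3}" -> "OP": "{2,3}"
            token[key.upper()] = value
    return token


print(to_matcher_token({"lemma": "in[run,walk]"}))
# {'LEMMA': {'IN': ['run', 'walk']}}
```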

Pattern Matching

Once a pattern is defined, pass it to se.search together with the text and a loaded spaCy pipeline to find every token sequence that matches it.
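Conceptually, searching means sliding the pattern across the token sequence and reporting the start and end offsets of every window in which each token satisfies its constraints. The toy sketch below shows that idea on hand-written annotations. It is not how spaCy's Matcher is implemented, and it does not support operators such as op. The names find_matches and token_matches are hypothetical.

```python
from __future__ import annotations

# Toy sequence matcher for illustration only (not spaCy's Matcher algorithm).
# Each pattern element matches exactly one token; operators such as OP are not supported.


def token_matches(token: dict[str, str], spec: dict) -> bool:
    """Check one token's attributes against one pattern element."""
    for attr, expected in spec.items():
        actual = token.get(attr)
        if isinstance(expected, dict) and "IN" in expected:
            # List predicate: the value must be one of the allowed options
            if actual not in expected["IN"]:
                return False
        elif actual != expected:
            # Plain value: exact, case-sensitive comparison
            return False
    return True


def find_matches(tokens: list[dict[str, str]], pattern: list[dict]) -> list[tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, for every matching window."""
    width = len(pattern)
    if width == 0:
        return []
    results = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(token_matches(tok, spec) for tok, spec in zip(window, pattern)):
            results.append((start, start + width))
    return results


# Hand-written annotations for "John runs fast, but Jacob walks slowly."
# (illustrative values, not output from a spaCy model)
sample_tokens = [
    {"LEMMA": "John", "POS": "PROPN"},
    {"LEMMA": "run", "POS": "VERB"},
    {"LEMMA": "fast", "POS": "ADV"},
    {"LEMMA": ",", "POS": "PUNCT"},
    {"LEMMA": "but", "POS": "CCONJ"},
    {"LEMMA": "Jacob", "POS": "PROPN"},
    {"LEMMA": "walk", "POS": "VERB"},
    {"LEMMA": "slowly", "POS": "ADV"},
    {"LEMMA": ".", "POS": "PUNCT"},
]
sample_pattern = [{"LEMMA": {"IN": ["run", "walk"]}}, {"POS": "ADV"}]

print(find_matches(sample_tokens, sample_pattern))  # [(1, 3), (6, 8)]
```

The end offset is exclusive, following Python slicing and spaCy's Span convention, so sample_tokens[1:3] holds the tokens of the first match.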

Usage

Here is a simple example to get started with spaCyEx:

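import spacyex as se
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
text = "John Smith runs fast, but Jacob Smith walks slowly."
# Exactly two person-entity tokens, then a form of "run" or "walk", then an adverb
pattern = "(ent_type=person|op={2}) (lemma=in[run,walk]) (pos=ADV)"

results = se.search(pattern, text, nlp)
for match in results:
    # Print the matched text followed by its start and end positions
    print(match[0].text, "Start:", match[1], "End:", match[2])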

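The pattern looks for exactly two tokens belonging to a person entity, followed by a token whose lemma is "run" or "walk", followed by an adverb. It therefore combines named entities, lemmas, and parts of speech in a single query. In this text it is intended to match "John Smith runs fast" and "Jacob Smith walks slowly".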

Roadmap

Support for all dictionary properties in patterns.
Additional utilities and helper functions for more complex pattern scenarios.

wjbmattingly/spacyex
