Discover how to master advanced spaCy techniques, including custom pipelines, LLM integration, and model training, to build NLP solutions efficiently and effectively.
Key Features
- Build End-to-End NLP Workflows, From Local Development to Production with Weasel and FastAPI
- Master No-Training NLP Development with spaCy-LLM, From Prompt Engineering to Custom Tasks
- Create Advanced NLP Solutions, From Custom Components to Neural Coreference Resolution
Book Description
Mastering spaCy, Second Edition is your comprehensive guide to building sophisticated NLP applications using the spaCy ecosystem. This revised edition embraces the latest advancements in NLP, featuring new chapters on Large Language Models with spaCy-LLM, transformers integration, and end-to-end workflow management with Weasel.
With this new edition, you’ll learn to enhance NLP tasks using LLMs with spaCy-LLM, manage end-to-end workflows using Weasel, and integrate spaCy with third-party libraries like Streamlit, FastAPI, and DVC. From training custom named entity recognition (NER) pipelines to categorizing emotions in Reddit posts, you’ll explore advanced topics like text classification and coreference resolution. This book takes you on a journey through spaCy’s capabilities, starting with the fundamentals of NLP, such as tokenization, named entity recognition, and dependency parsing. As you progress, you’ll delve into advanced topics like creating custom components, training domain-specific models, and building scalable NLP workflows.
By the end of the book, through practical examples, clear explanations, and tips and tricks, you will be able to build robust NLP pipelines and integrate them with web applications to create end-to-end solutions.
What you will learn
- Apply transformer models and fine-tune them for specialized NLP tasks
- Master spaCy core functionalities including data structures and processing pipelines
- Develop custom pipeline components and semantic extractors for domain-specific needs
- Build scalable applications by integrating spaCy with FastAPI, Streamlit, and DVC
- Master advanced spaCy features including coreference resolution and neural pipeline components
- Train domain-specific models, including NER and coreference resolution
- Prototype rapidly with spaCy-LLM and develop custom LLM tasks
Who this book is for
This book is tailored for NLP engineers, machine learning developers, and LLM engineers looking to build production-grade language processing solutions. While primarily targeting professionals working with language models and NLP pipelines, it's also valuable for software engineers transitioning into NLP development. Basic Python programming knowledge and familiarity with NLP concepts are recommended to leverage spaCy's latest capabilities.
Table of Contents
- Getting started with spaCy
- Exploring spaCy Core Operations
- Extracting Linguistic Features
- Mastering Rule-Based Matching
- Extracting Semantic Representations with spaCy Pipelines
- Utilizing spaCy with Transformers
- Enhancing NLP tasks using LLMs with spacy-llm
- Training a NER pipeline component with spaCy
- Creating End-to-End spaCy Workflows with Weasel
- Training a Coreference Resolution pipeline
- Integrating spaCy with third-party libraries
Déborah is a data science consultant and writer. With a BSc in Computer Science from UFPE, one of Brazil's top computer science programs, she brings a diversified skill set refined through hands-on experience with various technologies. Déborah has thrived in different data science projects, including roles such as lead data scientist and technical contributor for respected publications. Her ability to translate complex concepts into simple language, coupled with her quick learning and broad vision, makes her an effective educator. Actively engaged in community initiatives, she works to ensure equitable access to knowledge, reflecting her belief that technology is not a panacea, but a powerful tool for societal improvement when used for that purpose.
Duygu Altinok is a senior NLP engineer with 12 years of experience in almost all areas of NLP, including search engine technology, speech recognition, text analytics, and conversational AI. She has authored several publications in the NLP area at conferences such as LREC and CLNLP. She also enjoys working on open-source projects and is a contributor to the spaCy library. Duygu earned her undergraduate degree in Computer Engineering from METU, Ankara in 2010 and her Master's degree in Mathematics from Bilkent University, Ankara in 2012. She is currently a senior engineer at German Autolabs with a focus on conversational AI for voice assistants. Originally from Istanbul, Duygu currently resides in Berlin, DE with her cute dog Adele.