spaCy v3.0.0.rc2 Release Notes
Release Date: 2020-10-26
Previous changes from v3.0.0.rc1
> 🌙 This release is a nightly pre-release and not intended for production yet. We recommend using a new virtual environment. For more details on the new features and usage guides, see the v3 documentation.
🚀 Quickstart
pip install -U spacy-nightly --pre
- Introducing spaCy v3.0 nightly
- New in v3.0: New features, backwards incompatibilities and migration guide.
- Installation Quickstart: Install the new version, pipelines and add-ons for your specific setup.
- Training Quickstart: Generate a training config for your specific use case.
- Benchmarks: Results and accuracy comparisons.
- Projects & Project Templates: Get started by cloning a project template.
✨ New features and improvements
- Transformer-based pipelines with support for multi-task learning.
- Retrained model families for 16 languages and 52 trained pipelines in total, including 6 transformer-based pipelines.
- New training workflow and config system.
- Implement custom models using any machine learning framework, including PyTorch, TensorFlow and MXNet.
- spaCy Projects for managing end-to-end multi-step workflows from preprocessing to model deployment.
- Integrations with Data Version Control (DVC), Streamlit, Weights & Biases, Ray and more.
- Parallel training and distributed computing with Ray.
- New built-in pipeline components: `SentenceRecognizer`, `Morphologizer`, `Lemmatizer`, `AttributeRuler` and `Transformer`.
- New and improved pipeline component API and decorators for custom components.
- Source trained components from other pipelines in your training config.
- `DependencyMatcher` for matching patterns within the dependency parse using Semgrex operators.
- Support for greedy patterns in `Matcher`.
- Type hints and type-based data validation for custom registered functions.
- Various new methods, attributes and commands.
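The following is a minimal sketch of the two matcher features listed above. It assumes the stable spaCy v3 API, which may differ in details from this pre-release, and uses only a blank English pipeline. No trained parser is loaded, so the dependency parse is assigned by hand through the `Doc` constructor. The sentence, labels and match keys are illustrative. In the pattern, `>` is the Semgrex-style operator meaning the left node is the immediate head of the right node.

```python
import spacy
from spacy.matcher import DependencyMatcher, Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-annotated parse (illustrative) so no trained pipeline is needed.
# Heads are absolute token indices; "reads" is its own head, i.e. the root.
doc = Doc(
    nlp.vocab,
    words=["She", "reads", "books"],
    heads=[1, 1, 1],
    deps=["nsubj", "ROOT", "dobj"],
    pos=["PRON", "VERB", "NOUN"],
)

# DependencyMatcher: find a verb together with its direct nominal subject.
dep_matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    # "verb > subject": the verb is the immediate head of the subject
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
dep_matcher.add("VERB_SUBJECT", [pattern])
dep_matches = dep_matcher(doc)  # list of (match_id, token indices in pattern order)
for match_id, token_ids in dep_matches:
    print([doc[i].text for i in token_ids])  # ['reads', 'She']

# Matcher with greedy="LONGEST": keep only the longest run instead of every sub-run.
matcher = Matcher(nlp.vocab)
matcher.add("REPEATED_A", [[{"LOWER": "a", "OP": "+"}]], greedy="LONGEST")
greedy_matches = matcher(nlp("a a a"))
print([(start, end) for _, start, end in greedy_matches])  # [(0, 3)]
```

Without `greedy`, the same `Matcher` pattern would also return every shorter overlapping run of "a" tokens.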
⚠️ Backwards incompatibilities
For more info on how to migrate from spaCy v2.x, see the detailed migration guide.
API changes
- Pipeline package symlinks, the `link` command and shortcut names are now deprecated. There can be many different trained pipelines and not just one "English model", so you should always use the full package name like `en_core_web_sm` explicitly.
- A pipeline's `meta.json` is now only used to provide meta information like the package name, author, license and labels. It's no longer used to construct the processing pipeline. This is all defined in the `config.cfg`, which also includes all settings used to train the pipeline.
- The `train`, `pretrain` and `debug data` commands now only take a `config.cfg`.
- `Language.add_pipe` now takes the string name of the component factory instead of the component function.
- Custom pipeline components now need to be decorated with the `@Language.component` or `@Language.factory` decorator.
- The `Language.update`, `Language.evaluate` and `TrainablePipe.update` methods now all take batches of `Example` objects instead of `Doc` and `GoldParse` objects, or raw text and a dictionary of annotations.
- The `begin_training` methods have been renamed to `initialize` and now take a function that returns a sequence of `Example` objects to initialize the model, instead of a list of tuples.
- `Matcher.add` and `PhraseMatcher.add` now only accept a list of patterns as the second argument (instead of a variable number of arguments). The `on_match` callback becomes an optional keyword argument.
- The `Doc` flags like `Doc.is_parsed` or `Doc.is_tagged` have been replaced by `Doc.has_annotation`.
- The `spacy.gold` module has been renamed to `spacy.training`.
- The `PRON_LEMMA` symbol and `-PRON-` as an indicator for pronoun lemmas have been removed.
- The `TAG_MAP` and `MORPH_RULES` in the language data have been replaced by the more flexible `AttributeRuler`.
- The `Lemmatizer` is now a standalone pipeline component and doesn't provide lemmas by default or switch automatically between lookup and rule-based lemmas. You can now add it to your pipeline explicitly and set its mode on initialization.
- Various keyword arguments across functions and methods are now explicitly declared as keyword-only arguments. Those arguments are documented accordingly across the API reference.
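The component changes above can be sketched as follows. This assumes the stable spaCy v3 API, which may differ in details from this pre-release, and a blank English pipeline. The names `token_counter` and `length_filter`, the `LengthFilter` class and the `user_data` keys are hypothetical. The sketch registers one stateless function component and one configurable factory, then adds both by string name. It reads the resulting pipeline order back from `nlp.config`, which is where the pipeline is now defined instead of `meta.json`.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


# Stateless component, registered under a (hypothetical) string name.
@Language.component("token_counter")
def token_counter(doc: Doc) -> Doc:
    doc.user_data["n_tokens"] = len(doc)
    return doc


class LengthFilter:  # hypothetical stateful component
    def __init__(self, min_len: int):
        self.min_len = min_len

    def __call__(self, doc: Doc) -> Doc:
        # Record the texts of tokens with at least min_len characters.
        doc.user_data["long_tokens"] = [t.text for t in doc if len(t) >= self.min_len]
        return doc


# Configurable component: the factory receives nlp, the instance name and config values.
@Language.factory("length_filter", default_config={"min_len": 5})
def create_length_filter(nlp: Language, name: str, min_len: int) -> LengthFilter:
    return LengthFilter(min_len)


nlp = spacy.blank("en")
nlp.add_pipe("token_counter")  # string name, not the function object
nlp.add_pipe("length_filter", config={"min_len": 4})  # overrides the default config

doc = nlp("This is a sentence.")
print(nlp.config["nlp"]["pipeline"])  # ['token_counter', 'length_filter']
print(doc.user_data["n_tokens"])  # 5
print(doc.user_data["long_tokens"])  # ['This', 'sentence']
```

Registering by name is what allows the pipeline to be rebuilt from `config.cfg` alone.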
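Here is a sketch of the new `Matcher.add` call signature and of `Doc.has_annotation`. It makes the same assumptions as before: stable spaCy v3 and a blank English pipeline. The pattern, match key and callback are illustrative.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
callback_calls = []


def on_match(matcher, doc, i, matches):
    # Called once per match; i is the index of the current match in `matches`.
    callback_calls.append(i)


# "hello", an optional punctuation token, then "world"
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "world"}]
# v3 style: all patterns go in one list, and on_match is a keyword argument.
matcher.add("HELLO_WORLD", [pattern], on_match=on_match)

doc = nlp("Hello, world! Hello world!")
matches = matcher(doc)
print([doc[start:end].text for _, start, end in matches])  # ['Hello, world', 'Hello world']

# Doc.has_annotation replaces flags like Doc.is_parsed / Doc.is_tagged.
# A blank pipeline has no tagger or parser, so both checks are False here.
print(doc.has_annotation("TAG"), doc.has_annotation("DEP"))  # False False
```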
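This is a minimal sketch of the `Example`-based training calls (`initialize` replacing `begin_training`, and `update` taking `Example` objects). It assumes stable spaCy v3 and a built-in `textcat` component added to a blank pipeline. The two training texts and the `POS`/`NEG` labels are made up for illustration. A few updates on toy data only show the call pattern; they do not produce a useful model.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")

# Toy annotations, invented for illustration only.
train_data = [
    ("I loved this film", {"cats": {"POS": 1.0, "NEG": 0.0}}),
    ("What a boring plot", {"cats": {"POS": 0.0, "NEG": 1.0}}),
]
# Each Example pairs a predicted Doc (tokenized only) with the reference annotations.
examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in train_data]

# initialize() takes a callable returning Examples, infers the labels
# from them and returns an optimizer.
optimizer = nlp.initialize(get_examples=lambda: examples)

losses = {}
for _ in range(3):
    nlp.update(examples, sgd=optimizer, losses=losses)  # accumulates losses per component

print(sorted(textcat.labels))  # ['NEG', 'POS']
print("textcat" in losses)  # True
```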
✂ Removed or renamed API
| Removed | Replacement |
| --- | --- |
| `Language.disable_pipes` | `Language.select_pipes`, `Language.disable_pipe`, `Language.enable_pipe` |
| `Language.begin_training`, `Pipe.begin_training`, ... | `Language.initialize`, `Pipe.initialize`, ... |
| `Doc.is_tagged`, `Doc.is_parsed`, ... | `Doc.has_annotation` |
| `GoldParse` | `Example` |
| `GoldCorpus` | `Corpus` |
| `KnowledgeBase.load_bulk`, `KnowledgeBase.dump` | `KnowledgeBase.from_disk`, `KnowledgeBase.to_disk` |
| `Matcher.pipe`, `PhraseMatcher.pipe` | not needed |
| `gold.offsets_from_biluo_tags`, `gold.spans_from_biluo_tags`, `gold.biluo_tags_from_offsets` | `training.biluo_tags_to_offsets`, `training.biluo_tags_to_spans`, `training.offsets_to_biluo_tags` |
| `spacy init-model` | `spacy init vectors` |
| `spacy debug-data` | `spacy debug data` |
| `spacy profile` | `spacy debug profile` |
| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, symlinks are deprecated |

The following deprecated methods, attributes and arguments were removed in v3.0. Most of them have been deprecated for a while and many would previously raise errors. Many of them were also mostly internal. If you've been working with more recent versions of spaCy v2.x, it's unlikely that your code relied on them.
| Removed | Replacement |
| --- | --- |
| `Doc.tokens_from_list` | `Doc.__init__` |
| `Doc.merge`, `Span.merge` | `Doc.retokenize` |
| `Token.string`, `Span.string`, `Span.upper`, `Span.lower` | `Span.text`, `Token.text` |
| `Language.tagger`, `Language.parser`, `Language.entity` | `Language.get_pipe` |
| keyword arguments like `vocab=False` on `to_disk`, `from_disk`, `to_bytes`, `from_bytes` | `exclude=["vocab"]` |
| `n_threads` argument on `Tokenizer`, `Matcher`, `PhraseMatcher` | `n_process` |
| `verbose` argument on `Language.evaluate` | logging (`DEBUG`) |
| `SentenceSegmenter` hook, `SimilarityHook` | user hooks, `Sentencizer`, `SentenceRecognizer` |
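The renamed BILUO helpers from the first table can be exercised as shown below. This assumes stable spaCy v3, where they are importable from `spacy.training`, and a blank English tokenizer. The sentence and entity offsets are illustrative.

```python
import spacy
from spacy.training import (
    biluo_tags_to_offsets,
    biluo_tags_to_spans,
    offsets_to_biluo_tags,
)

nlp = spacy.blank("en")
doc = nlp("I like London and Berlin.")
# Character offsets (start, end, label); the labels are illustrative.
entities = [(7, 13, "LOC"), (18, 24, "LOC")]

tags = offsets_to_biluo_tags(doc, entities)  # formerly gold.biluo_tags_from_offsets
print(tags)  # ['O', 'O', 'U-LOC', 'O', 'U-LOC', 'O']

spans = biluo_tags_to_spans(doc, tags)  # formerly gold.spans_from_biluo_tags
print([(span.text, span.label_) for span in spans])  # [('London', 'LOC'), ('Berlin', 'LOC')]

offsets = biluo_tags_to_offsets(doc, tags)  # formerly gold.offsets_from_biluo_tags
print(offsets)  # [(7, 13, 'LOC'), (18, 24, 'LOC')]
```

Each single-token entity gets a `U-` (unit) tag. A multi-token entity would instead be tagged `B-`, `I-`, ..., `L-`.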
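A few more of the replacements from the tables are sketched below under the same assumptions (stable spaCy v3, blank English pipeline). They cover `select_pipes`, `disable_pipe` and `enable_pipe` instead of `disable_pipes`; `Doc.retokenize` instead of `Doc.merge`; and `exclude=["vocab"]` instead of `vocab=False`. The built-in `sentencizer` is used only because it needs no training data.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# select_pipes replaces disable_pipes; components are restored when the block exits.
with nlp.select_pipes(disable=["sentencizer"]):
    names_inside = list(nlp.pipe_names)  # []
print(nlp.pipe_names)  # ['sentencizer']

# disable_pipe / enable_pipe toggle a single component in place.
nlp.disable_pipe("sentencizer")
names_disabled = list(nlp.pipe_names)  # []
nlp.enable_pipe("sentencizer")

# Doc.retokenize replaces Doc.merge / Span.merge.
doc = nlp("I live in New York City.")
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:6])
print([token.text for token in doc])  # ['I', 'live', 'in', 'New York City', '.']

# exclude=["vocab"] replaces keyword arguments like vocab=False.
data = nlp.to_bytes(exclude=["vocab"])
print(type(data))  # <class 'bytes'>
```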