pytext v0.3.3 Release Notes
Release Date: 2020-06-08
🆕 New features
- ➕ Add XLM-R document classification server + console (#1358)
- MLP layer embed for float tensors and FloatListSeqTensorizer for List[List[float]] features. (#1374)
- ➕ Add class_accuracy in MultiLabelSoftClassificationMetrics (#1371)
- ➕ Add an option to skip test run after models have been trained (#1372)
- 👌 Support DP in PyText (#1366)
- Support torchscriptify in multi_label_classification_layer (#1350)
- ➕ Add custom metric class for reporting Joint model metrics (#1339)
- MultiLabel-MultiClass Model for Joint Sequence Tagging (#1335)
- 👍 Scripted tokenizer support for DocModel (#1314)
🛠 Bugfixes
- 🛠 Fixed metric reporter aggregation and output layer for the multi-label classification
- Remove move_state_dict_to_gpu, which is causing CUDA OOM (#1367)
- 🛠 Fix Flow's default conversion of dict to AttrDict
- 🛠 Fix bug in ClassificationOutputLayer that pad_idx is never respected (#1347)
- 🛠 Serializing/Deserializing type Any: bugfix and simplification (#1344)
- 🛠 Fix RoBERTa Q&A Training Bug with multiple BoS tokens. (#1343)
Other
- 👍 Better error message for misconfigured data fields
- 🗄 Replace deprecated integer division with floor division operator
- ➕ Add informative prints to assert statements (#1360)
- TorchScript: Put dense tensor on the same device with other input tensors (#1361)
- ⚡️ Update PyTorch + ONNX (#1340)
- ⚡️ Update PyTorch + ONNX (#1340) - binary ONNX
- ⚡️ Update PR Template (#1349)
- ⬇️ Reduce memory request for pytext train operator
- ➕ Add 'contrib' directory for experimental code (#1333)
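
The floor division item in the list above replaces the deprecated integer use of / with //, so that an integer result stays an integer. A minimal sketch in plain Python, not taken from the PyText source; num_full_batches is a hypothetical helper used only for illustration:

```python
def num_full_batches(num_examples: int, batch_size: int) -> int:
    """Count complete batches (hypothetical helper, not a PyText API)."""
    # "//" floors the quotient and keeps the result an int,
    # whereas "/" would produce a float (true division).
    return num_examples // batch_size


full = num_full_batches(10, 4)
print(full)
```

The same distinction applies to integer tensors, where relying on / for truncating division had been deprecated in PyTorch at the time of this release.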
Previous changes from v0.3.2
🆕 New features
- ➕ Add Roberta model into BertPairwiseModel (#1336)
- 👌 Support read file from http URL (#1317)
- add a new PyText get_num_examples_from_batch function in model (#1319)
- ➕ Add support for length label smoothing (#1308)
- ➕ Add new metrics type for Masked Seq2Seq Joint Model (#1304)
- ➕ Add mask generator and strategy (#1302)
- ➕ Add separate logging for label loss and length loss (#1294)
- ➕ Add tensorizer support for masking of target tokens (#1297)
- ➕ Add length prediction and basic masked generator (#1290)
- Add self attention option to conv_encoder and conv_decoder (#1291)
- Entity Saliency modeling on PyText: EntitySalienceMetricReporter/EntitySalienceTask
- In-batch negative training for BertPairwiseModel
- 👌 Support embedding from decoder (#1284)
- ➕ Add dense features to Roberta
- ➕ Add projection layer to HuggingFace encoder (#1273)
- ➕ add PyText Embedding TorchScript Wrapper
- ➕ Add option to pad missing label in LabelListTensorizer (#1269)
- ↔ Integrate PET and Introduce ElasticTrainer (#1266)
- 👌 support PoolingType in DocNN. (#1259)
- ➕ Added WordSeqEmbedding (#1255)
- Open source Assistant NLU seq2seq model (#1236)
- 👌 Support multi label classification
- BART in decoupled model
🐛 Bug fixes
- 🛠 Fix Incorrect State Dict Assumption (#1326)
- 🐛 Bug fix for "RoBERTaTensorizer object has no attribute is_input" (#1334)
- Cast model output to cpu (#1329)
- 🛠 Fix OSS predict-py API (#1320)
- 🛠 Fix "calling median on empty tensor" issue in MR (#1322)
- ➕ Add ScriptDoNothingTokenizer so that torchscriptification of SPM does not fail (#1316)
- 🛠 Fix creating generator every time (#1301)
- 🛠 fix dense feature for fp16
- 👀 Avoid edge cases with quantization by setting a known seed (#1295)
- 👉 Make torchscript predictions even on empty text / token inputs
- 🛠 fix dense feature TorchScript typing (#1281)
- avoid zero division error in metrics reporter (#1271)
- 🛠 Fix contiguous issue in bilstm export (#1270)
- 🛠 fix debug file generation for multilabel classification (#1247)
- 🛠 Fix fp16 optimizer attribute name
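
Several fixes in the list above guard against degenerate inputs, such as the zero division error in the metrics reporter (#1271). A minimal sketch of that kind of guard, assuming a metric is defined as 0.0 when its denominator is zero; the function names are hypothetical and this is not the actual PyText patch:

```python
def safe_division(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator != 0 else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Precision that stays defined when nothing was predicted positive."""
    return safe_division(true_positives, true_positives + false_positives)


print(precision(3, 1))
print(precision(0, 0))
```

Returning 0.0 for an empty class is one convention among several; a reporter could equally skip such classes when averaging.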
Other
- Simplify contextual embedding dimension computation in PyText (#1331)
- 🆕 New Debug File for masked seq2seq
- 🚚 Move MockConfigLoader to OSS (#1324)
- ⚡️ Pass in optimizer config instead of create_optimizer to trainer
- ✂ Remove unnecessary torch.no_grad() block (#1323)
- 🛠 Fix Memory Issues in Metric Reporter for Classification Tasks over large Label Spaces
- ➕ Add contextual embedding support to OS seq2seq model (#1299)
- recover xlm_r tutorial notebook (#1305)
- Enable controlling bias in MLP decoder
- Migrate serving tutorial to TorchScript (#1310)
- ✂ delete caffe2 export (#1307)
- ➕ add whitelist for ONNX export
- 👉 Use dynamic quantization api for BeamSearch (#1303)
- ✂ Remove requirement that eos/bos be supplied for sequence export. (#1300)
- 👍 Multicolumn support
- 👍 Multicolumn support in torchscriptify
- ➕ Add caching support to RawExample and batch predict API (#1298)
- ➕ Add save-pytext-snapshot command to PyText cmdline (#1285)
- ⚡️ Update with Whatsapp calling data + support dictionary features (#1293)
- add arrange_caffe2_model_inputs in BaseModel (#1292)
- ✅ Replace unit-tests on LMModel and FLLanguageModelingTask by LiteLMModel and FLLiteLMTask (#1296)
- 🔄 changes to make mbart work (#1911)
- 🖐 handle encoder and decoder embedding
- ➕ Add tutorial for semantic parsing. (#1288)
- ➕ Add new fb beam search with fused operator (#1287)
- 🏗 Move generator builder to constructor so that it can be easily overridden. (#1286)
- Torchscriptify ELTensorizer (#1282)
- Torchscript export for Seq2Seq model (#1265)
- 🔄 Change Seq2Seq model from_config() to a more general api (#1280)
- add max_seq_len to DocNN TorchScript model (#1279)
- 👌 support XLM-R model Embedding in TorchScript (#1278)
- Generic PyText Checkpoint Manager Interface (#1267)
- 🛠 Fix backward compatibility issue of pad_missing in LabelListTensorizer (#1277)
- ⚡️ Update mean reduction in NLLLoss (#1272)
- migrate pages.integrity.scam.docnn_models.xxx (#1275)
- Unify model input for ByteTokensDocumentModel (#1274)
- Torchscriptify TokenTensorizer
- 👍 Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)
- 👉 Make WordSeqEmbedding ONNX compatible
- If the snapshot path provided is not valid, throw error (#1268)
- 👌 support vocab filter by min count
- Unify input for TorchScript Tensorizers and Models (#1256)
- Torchscriptify XLM-R
- ➕ Add class logging to task (#1264)
- ➕ Add usage logging to exporter (#1262)
- ➕ Add usage logging across models (#1263)
- 🌲 Usage logging on data classes (#1261)
- 👍 GPT2 BPE add lower casing support (#1260)
- FAISS Embedding Search Space [3/5]
- Return len of tokens of each sequence in SeqTokenTensorizer (#1254)
- Vocab Limited Pretrained Embedding [2/5] (#1248)
- ➕ add Stage.OTHERS and allow TB to print to a separate prefix not in (TRAIN, TEST, EVAL) (#1258)
- ➕ Add option to skip 2 stage tokenizer and bpe decode sequences in the debug file (#1257)
- ➕ Add Testcase for Wordpiece Tokenizer (#1249)
- modify accuracy calculation for multi-label classification (#1244)
- Enable tests in pytext/config:pytext_all_config_test
- 🌲 Introduce Class Usage Logging (#1243)
- 👉 Make PyText compatible with Any type (#1242)
- 👉 Make dict_embedding Torchscript friendly (#1240)
- 👌 Support MultipleData for export and kd generation
- ✂ delete flaky/broken tests (#1238)
- ➕ Add support for returning start & end indices.
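
One item in the list above, vocab filtering by min count, can be illustrated generically. The sketch below assumes that tokens seen fewer than min_count times are dropped from the vocabulary; build_vocab and its parameters are hypothetical names, not the PyText vocabulary builder API:

```python
from collections import Counter
from typing import Iterable, List


def build_vocab(tokens: Iterable[str], min_count: int = 1) -> List[str]:
    """Keep only tokens that occur at least min_count times (illustrative only)."""
    counts = Counter(tokens)
    # Sort for a deterministic vocabulary order.
    return sorted(tok for tok, n in counts.items() if n >= min_count)


vocab = build_vocab("a b a c b a".split(), min_count=2)
print(vocab)
```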