Jaxon makes data prep, pipeline design, and neural architecture a snap. It drastically reduces the need for humans to manually label examples for ML model training, and it automates and augments the human input that remains. Currently focused on text classifiers, Jaxon gives data science teams a platform to build fully trained models with a few hours of work plus whatever machine time training requires (depending on the data), with better performance and far more control and transparency than AutoML.
Jaxon uses the F1 score as its optimization metric and displays an estimated F1 based on a sample of the training set. It also provides a confusion matrix to help you home in on where the model may be making errors. How accurate a training set can be depends on the amount, quality, and breadth of the data imported into Jaxon, relative to the downstream model and use case. The best results come when the ground-truth labels provided are balanced across classes and the training set (including unlabeled examples) is representative of the overall dataset. Jaxon’s synthetic labels, generated from the training set, increase the accuracy of downstream machine learning models and classifiers. Even better results can be achieved by using Jaxon iteratively to refine and enhance the training data.
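As an illustration of how an F1 score relates to a confusion matrix, the minimal Python sketch below computes per-class F1 and its unweighted (macro) average using the standard definitions. It is not Jaxon’s implementation: the helper names and example labels are hypothetical, and whether Jaxon reports a macro, micro, or weighted F1 is an assumption left open here.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_f1(matrix):
    """F1 = 2PR / (P + R) for each class, read off a square confusion matrix."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, but truly another class
        fn = sum(matrix[i]) - tp                        # truly i, but predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of per-class F1 (an assumed averaging choice)."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["pos", "neg", "neu"]  # hypothetical classes
    y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
    y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
    m = confusion_matrix(y_true, y_pred, labels)
    print(m)                      # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
    print(round(macro_f1(m), 3))  # 0.656
```

Reading the matrix row by row shows which true classes leak into which predictions: here one “neu” example was predicted as “pos”, which lowers both the recall for “neu” and the precision for “pos”.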
Jaxon trains classifiers using every word found in the training corpus, including domain-specific language, slang, and common abbreviations and misspellings, and incorporates them into its label generation model(s). Jaxon goes further by using the corpus as a pretraining set for its neural networks, which increases accuracy and helps fine-tune the parameters for each use case and terminology set.
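The idea of taking the vocabulary from the corpus itself, rather than from a fixed dictionary, can be sketched as follows. This minimal Python example is an illustration only: the function name, the whitespace tokenization, and the `min_count` threshold are assumptions, not Jaxon’s actual preprocessing.

```python
from collections import Counter


def build_vocabulary(corpus, min_count=1):
    """Count every whitespace-delimited token in the corpus.

    No external dictionary is consulted, so slang, abbreviations, and
    misspellings are kept if they occur at least `min_count` times.
    """
    counts = Counter(tok for doc in corpus for tok in doc.lower().split())
    return {tok: c for tok, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    # Hypothetical support-ticket snippets with domain shorthand.
    corpus = ["pls reset my pwd", "pwd reset failed", "thx pwd works now"]
    print(build_vocabulary(corpus))
    print(build_vocabulary(corpus, min_count=2))  # {'reset': 2, 'pwd': 3}
```

Tokens such as “pwd” and “thx” would be missing from a general-purpose English dictionary, but they survive here because they are counted directly from the data.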
Jaxon can be used in many ways. It synthetically labels datasets, cutting the time and expense of manual labeling. The platform lets data science teams create custom training schedules for neural networks, maximizing accuracy with just a few clicks. Jaxon also produces custom classifiers that can be deployed to production immediately. You can (and should!) use Jaxon from the very beginning of the data prep process all the way through getting classifiers into production.
Jaxon is distributed in the form of a Docker stack and can be deployed on premises or in the cloud.
Jaxon is generally language agnostic and learns from statistical patterns discovered in a corpus rather than from the actual language(s) contained in the corpus. We currently support over 200 different languages.
Jaxon can also tell apart different meanings of the same word (for example: “I'm feeling blue today” versus “The sky is a lovely shade of blue”).
Assuming that both meanings of the word “blue” make sense for the domain, the surrounding words help to differentiate between them. “Blue” by itself may be ambiguous, but Jaxon processes n-grams of varying lengths and can assemble distinct, meaningful phrases such as “feeling blue” and “shade (of) blue”, distinguishing meaning from the surrounding words just as human readers do.
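To make the n-gram idea concrete, here is a minimal Python sketch that extracts word n-grams of several lengths and keeps the multi-word ones containing a target word. The function names, the regular-expression tokenizer (which handles only lowercase Latin letters and apostrophes), and the default lengths of 2 to 3 words are illustrative assumptions, not a description of Jaxon’s internals.

```python
import re


def ngrams(text, n_min=1, n_max=3):
    """All word n-grams with n_min <= n <= n_max, shorter n-grams first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def context_ngrams(text, target, n_max=3):
    """Multi-word n-grams that contain `target` as a whole word."""
    return [g for g in ngrams(text, 2, n_max) if target in g.split()]


if __name__ == "__main__":
    print(context_ngrams("I'm feeling blue today", "blue"))
    # ['feeling blue', 'blue today', "i'm feeling blue", 'feeling blue today']
    print(context_ngrams("The sky is a lovely shade of blue", "blue"))
    # ['of blue', 'shade of blue']
```

The two sentences share the single word “blue” but have no multi-word n-grams in common, which is the kind of contextual signal a classifier can use to separate the two senses.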
Text data in most common file formats can be ingested, including plain text, JSON, XML, PDF, and Microsoft Word documents. Text does not need to be curated or normalized before being run through Jaxon.
© Copyright 2020. All rights reserved.