At D&D we are constantly looking at ways to make the machine learning models we build more accessible and open them up to a larger audience. To that end we have looked at several projects in the pipeline and decided to bring forward the development of a tool that will do just this. It’s so early in the thought stream that we don’t even have a name for it yet.
Too often we hear that every organisation’s data is unique, so you can’t productionise a machine learning model that will fit all the different users and environments. Turn that around, though, and treat it as a data preparation and ETL problem rather than an ML problem, and you can see how this becomes possible.
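The idea of reframing this as a data preparation problem can be sketched in a few lines: map each source's column names onto one canonical schema before the data ever reaches the model. The field names and aliases below are purely illustrative, not part of any actual tool.

```python
import pandas as pd

# Hypothetical per-site column aliases; names are illustrative only.
SCHEMA_MAP = {
    "attendance_date": ["AttendanceDate", "arrival_dt", "date_of_attendance"],
    "age": ["Age", "patient_age"],
    "outcome": ["Outcome", "discharge_flag"],
}

def to_common_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Rename whatever columns a source provides into one canonical schema,
    so a single trained model can score data from any environment."""
    rename = {}
    for canonical, aliases in SCHEMA_MAP.items():
        for alias in aliases:
            if alias in df.columns:
                rename[alias] = canonical
    out = df.rename(columns=rename)
    missing = set(SCHEMA_MAP) - set(out.columns)
    if missing:
        raise ValueError(f"Source is missing required fields: {missing}")
    return out[list(SCHEMA_MAP)]
```

With a mapping like this maintained per source, the model itself never changes; only the ETL layer does.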
We now have a large and growing ML library, which makes a tool like this ideal for building out several production pipelines, using the models to the best of their capabilities, and deploying everything from one core model-building tool.
Approaches to overcome the challenge
The introduction of R Services into the SQL Server stack has opened up new, accessible avenues for tackling exactly this problem.
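To give a flavour of what R Services makes possible, here is a minimal sketch of scoring data inside SQL Server with `sp_execute_external_script`. It assumes Machine Learning Services (R) is installed and external scripts are enabled; the table names and the `@serialised_model` variable are hypothetical stand-ins for a model stored in the database.

```sql
-- Sketch only: score production rows with a previously trained R model.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        model <- unserialize(as.raw(model_bin));
        OutputDataSet <- data.frame(pred = predict(model, InputDataSet));
    ',
    @input_data_1 = N'SELECT * FROM dbo.ProductionData',  -- hypothetical table
    @params = N'@model_bin varbinary(max)',
    @model_bin = @serialised_model;  -- model binary previously saved to SQL
```

Because the model runs where the data lives, sensitive records never leave the database server.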
Though we are in the early phases, we have a good idea of how the architecture will work. The tool has the following aims:
- Simple User Management
- Model Selection
- ETL from flat files (.csv) and SQL
- Train models from within the tool, using a standard train/test partitioning approach
- Output a confusion matrix for the model. See https://www.draperanddash.com/machinelearning/2019/07/confusion-matrices-evaluating-your-classification-models/ for how to interpret a confusion matrix
- ETL production data from .csv and SQL Server
- Output production data
- Schedule the process – create custom model retraining schedules and refreshes on the production data
- On-premise application. We have listened to our users: when dealing with sensitive data, this is the quickest option for our current base
- Solid SQL backend
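The training and evaluation steps in the list above can be sketched as follows. This is a generic illustration using scikit-learn and synthetic data, not the tool's actual implementation: hold back a test partition, fit a model on the rest, and report a confusion matrix on the unseen rows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a user's training extract.
X, y = make_classification(n_samples=500, random_state=42)

# Partition so the confusion matrix is computed on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)  # rows = actual class, columns = predicted class
```

The off-diagonal cells of `cm` are the misclassifications; the linked article above explains how to read them in detail.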
The tool’s architecture can be seen below:
The next sections show what the tool looks like at this stage.
Early design ideas
There is still a lot of work to do with the interface and design, alongside still needing to work on the back end of the tool. However, the concept is sound and we are excited by what we have achieved so far.
The login screen has been designed with encrypted authentication and will only allow registered users into the tool. This is what the screen looks like in its initial development stage:
If a user is not registered, they will receive an error message.
Simplicity is the driving force behind this tool: we have designed a simple interface that lets users navigate the builder easily. This is what it looks like at present:
The first option you have is to select the type of model required. This loads up our core stack of supervised machine learning models:
This includes all our current ML models, and new options will be added as newer models are released. What else can you do from the central console?
- Browse for a file location and select the training file needed for the specific model
- Load in the training data – see https://www.draperanddash.com/machinelearning/2019/08/deploying-a-trained-supervised-ml-model/ for further clarification of what training data is
- Load in production data
- Build a production model – this will build the predictive model with all the relevant model terms and clever stuff
- Schedule a model – this will allow the refresh scheduling of the model
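The build-then-schedule workflow above boils down to persisting a trained model so that a later, scheduled run can reload it and score fresh production data. Here is a minimal sketch of that pattern with scikit-learn and joblib; the tiny dataset and file path are illustrative assumptions, not the tool's real internals.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Train once from the "training file" (toy, perfectly separable data here).
train = pd.DataFrame({"x1": [0, 1, 2, 3, 4, 5], "y": [0, 0, 0, 1, 1, 1]})
model = LogisticRegression().fit(train[["x1"]], train["y"])

# Persist the model so a scheduler can pick it up on each refresh.
path = os.path.join(tempfile.gettempdir(), "model.joblib")
dump(model, path)

# ...then, on each scheduled run, reload it and score the production extract.
production = pd.DataFrame({"x1": [0.5, 4.5]})
scored = production.assign(prediction=load(path).predict(production))
print(scored)
```

A retraining schedule is then just a job (cron, SQL Agent, or similar) that reruns the fit-and-dump step on a fresh training extract before the next scoring run.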
Please let us know if you have any thoughts around the future direction and development of a tool like this and we will be sure to keep you up to date on our progress.