Description
Summary
The idea is to use smart channel-label embeddings, so that the model learns relationships between the meanings of the different channel names.
Motivation
This is inspired by word2vec and other word embeddings. The basic idea is that the embeddings should represent properties related to the channel name. A popular example of this behavior is that king - man + woman is approximately queen when the arithmetic is applied to their embeddings. By analogy, the model would know what "material X density" means because it would be represented as material X + density within the model.
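The vector-arithmetic idea above can be sketched as follows. This is a minimal illustration with random toy vectors; the names and dimensionality are hypothetical, not from any real embedding model.

```python
import numpy as np

# Hypothetical embedding table: one vector per material and per property.
rng = np.random.default_rng(0)
names = ["copper", "steel", "density", "viscosity"]
emb = {name: rng.normal(size=8) for name in names}

def compose(material: str, prop: str) -> np.ndarray:
    # "material X density" is represented as material X + density.
    return emb[material] + emb[prop]

copper_density = compose("copper", "density")
```

In a trained model the component vectors would be learned so that the composed vector is meaningful; here the addition only demonstrates the intended representation.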
Affected Area
This would affect the area of the code where embeddings are generated for the channels.
Proposed Approach
There are four possible approaches for making embeddings:
- Simple: Generate a random embedding for each material and each property. For example, say material X has embedding A and density has embedding B; to generate the embedding for material X density, concatenate A and B. The concern here is that the model would not generalize to new materials. For example, if someone tries to model with a plastic (such as ABS), the model would not know what to do with its embeddings.
- Smart: Generate a smart embedding for each material and each property, then concatenate them. How to generate smart embeddings is TBD. For materials, it may involve running relevant material properties through a neural network as inputs (for example). This approach is the most robust when the model needs to handle a new material.
- Super smart: Generate a smart embedding for each channel, ensuring that they have appropriate properties.
- Smart material, simple properties: Build a model that generates an embedding for each material. For the properties (such as density), use an existing approach: an existing embedding (like word2vec), a random embedding (not recommended for long-term use, as the model may form relationships that don't exist), or one-hot encoding.
The smart-material, simple-properties approach is the one most likely to be used. The physics involved is TBD, depending on the material properties to be modeled.
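The smart-material, simple-properties option could look roughly like the sketch below: a small network maps known physical features of a material to an embedding, while properties get one-hot vectors, and the two are concatenated. Everything here is hypothetical and untrained — the feature values, network shape, and property list are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, standardized material features (e.g. density, melting point,
# conductivity); values are made up for illustration.
material_features = {
    "copper": np.array([1.2, 0.8, 1.5]),
    "steel":  np.array([1.0, 1.1, 0.4]),
    "abs":    np.array([-1.3, -1.0, -0.9]),  # a new material only needs its features
}

# Minimal untrained MLP standing in for the "smart" material-embedding network.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)

def material_embedding(name: str) -> np.ndarray:
    h = np.tanh(material_features[name] @ W1 + b1)
    return h @ W2 + b2

# Simple one-hot property embeddings (one slot per known property).
properties = ["density", "viscosity", "thermal_conductivity"]

def property_embedding(prop: str) -> np.ndarray:
    v = np.zeros(len(properties))
    v[properties.index(prop)] = 1.0
    return v

def channel_embedding(material: str, prop: str) -> np.ndarray:
    # Concatenate the smart material embedding with the simple property embedding.
    return np.concatenate([material_embedding(material), property_embedding(prop)])
```

Because the material embedding is computed from physical features rather than looked up, a new material like ABS gets a usable embedding without retraining the lookup table — which is the main advantage claimed above.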
Alternatives Considered
- Concatenated One-Hot Encoding: Generate a one-hot encoding for the material and one for the property, then concatenate them. This approach does not scale: the vector length grows linearly with the number of materials and properties, so every new material or property changes the input dimension. The vectors are also extremely sparse, which makes them harder for neural networks to learn from.
- Existing embeddings: Use an existing embedding engine for all components, including the materials. Given that existing embedding models likely encode far less information about materials, the model would suffer. For example, it might know that copper and steel are both metals but not understand the difference between the two.
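For concreteness, the rejected concatenated one-hot scheme can be sketched in a few lines; the material and property lists are hypothetical.

```python
import numpy as np

materials = ["copper", "steel", "aluminum"]
properties = ["density", "viscosity"]

def one_hot(index: int, size: int) -> np.ndarray:
    v = np.zeros(size)
    v[index] = 1.0
    return v

def channel_one_hot(material: str, prop: str) -> np.ndarray:
    # One hot slot per material, one per property, concatenated.
    return np.concatenate([
        one_hot(materials.index(material), len(materials)),
        one_hot(properties.index(prop), len(properties)),
    ])

vec = channel_one_hot("steel", "density")
# Only 2 of the len(materials) + len(properties) entries are nonzero,
# and the vector grows with every new material or property added.
```

The sparsity is visible directly: regardless of how many materials and properties exist, each channel vector has exactly two nonzero entries.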