
[Code Idea]: Generate smart channel label embeddings #53

@galegozi

Summary

The idea is to use smart channel label embeddings so that the model learns the relationships between the meanings of the different channel names.

Motivation

This is inspired by word2vec and other word embeddings. The basic idea is that the embeddings should capture properties related to the channel name. A popular example of this behavior in word embeddings is that the vector for "king" minus the vector for "man" plus the vector for "woman" is approximately the vector for "queen". Analogously, the model would know what "material X density" is because it would be represented as material X + density within the model.
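In learned word embeddings such as word2vec, the "king" analogy only holds approximately. If a channel embedding is built by summing a material vector and a property vector (one possible reading of "material X + density"), the same kind of analogy holds exactly by construction. The sketch below assumes NumPy and uses random toy vectors; the material and property names and the 8-dimensional size are hypothetical, and nothing here is learned.

```python
import numpy as np

DIM = 8  # toy embedding size (hypothetical)
rng = np.random.default_rng(0)

# Hypothetical vocabularies with random toy vectors; nothing here is learned.
materials = {name: rng.normal(size=DIM) for name in ["copper", "steel"]}
properties = {name: rng.normal(size=DIM) for name in ["density", "conductivity"]}


def channel_embedding(material: str, prop: str) -> np.ndarray:
    """Additive composition: channel vector = material vector + property vector."""
    return materials[material] + properties[prop]


def nearest_channel(query: np.ndarray) -> tuple[str, str]:
    """Return the (material, property) pair whose embedding is closest to `query`."""
    candidates = {(m, p): channel_embedding(m, p) for m in materials for p in properties}
    return min(candidates, key=lambda key: np.linalg.norm(candidates[key] - query))


# "copper density" - "copper" + "steel" should land on "steel density".
query = channel_embedding("copper", "density") - materials["copper"] + materials["steel"]
print(nearest_channel(query))  # ('steel', 'density')
```

With concatenation instead of summation, the equivalent operation is replacing the material slice of the vector, which is equally exact.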

Affected Area

This would affect the area of the code where embeddings are generated for the channels.

Proposed Approach

There are four possible approaches for making embeddings:

  1. Simple: Generate a random embedding for each material and each property. For example, say that material X has embedding A and density has embedding B. To generate the embedding for "material X density", concatenate A and B. The concern here is that the model would not generalize to new materials. For example, if someone tries to model a plastic (such as ABS) that was not in the original set of materials, the model would have no embedding for it.
  2. Smart: Generate a smart embedding for each material and each property, then concatenate them. How to generate the smart embeddings is TBD. For materials, one option is to run them through a neural network that takes relevant material properties as inputs. This approach is the most robust when the model must handle a new material.
  3. Super smart: Generate a smart embedding directly for each channel, ensuring that the embeddings have the appropriate properties.
  4. Smart material, simple properties: Build a model that generates an embedding for each material. For the properties (such as density), use an existing approach: an existing embedding (like word2vec), a random embedding (not recommended for long-term use, as the model may learn relationships that don't exist), or a one-hot encoding.
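Approach 1 can be sketched as a lookup table of fixed random vectors. This is a minimal illustration assuming NumPy; the class name `RandomConcatEmbedder`, the 4-dimensional sizes, and the vocabularies are hypothetical. The lookup for ABS fails because the table has no row for a material it was not built with.

```python
import numpy as np

MATERIAL_DIM = 4  # hypothetical sizes
PROPERTY_DIM = 4


class RandomConcatEmbedder:
    """Approach 1 (simple): a fixed random vector per material and per property,
    concatenated to form the channel embedding."""

    def __init__(self, materials, properties, seed=0):
        rng = np.random.default_rng(seed)
        self.material_table = {m: rng.normal(size=MATERIAL_DIM) for m in materials}
        self.property_table = {p: rng.normal(size=PROPERTY_DIM) for p in properties}

    def embed(self, material, prop):
        # A material that was not in the table when it was built has no row:
        # this is the generalization problem described in approach 1.
        if material not in self.material_table:
            raise KeyError(f"no embedding for unseen material {material!r}")
        return np.concatenate([self.material_table[material], self.property_table[prop]])


embedder = RandomConcatEmbedder(["copper", "steel"], ["density", "conductivity"])
vec = embedder.embed("steel", "density")
print(vec.shape)  # (8,)

try:
    embedder.embed("ABS", "density")
except KeyError as err:
    print(err)
```

A learned lookup table (for example, an embedding layer trained with the rest of the model) has the same limitation, since it still only has rows for materials seen during training.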

The "smart material, simple properties" approach (4) is the most likely choice. The physics involved is TBD and will depend on which material properties are to be modeled.
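A minimal sketch of approach 4, assuming NumPy: a small, untrained MLP (the hypothetical `MaterialEncoder`) maps a material descriptor to an embedding, and the property is one-hot encoded. The descriptor features (approximate density in g/cm^3 and an is-metal flag) and the property list are placeholders; choosing the real features is the TBD physics question.

```python
import numpy as np

# Hypothetical property vocabulary, one-hot encoded (the "simple properties" half).
PROPERTIES = ["density", "conductivity", "yield_strength"]


def one_hot_property(prop: str) -> np.ndarray:
    vec = np.zeros(len(PROPERTIES))
    vec[PROPERTIES.index(prop)] = 1.0
    return vec


class MaterialEncoder:
    """Tiny 2-layer MLP mapping material descriptor features to an embedding.
    Weights are random here; in practice they would be trained with the model."""

    def __init__(self, n_features, hidden=16, out_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=(hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, features: np.ndarray) -> np.ndarray:
        h = np.maximum(0.0, features @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2


# Placeholder descriptors: [approximate density in g/cm^3, is_metal flag].
MATERIAL_FEATURES = {
    "copper": np.array([8.96, 1.0]),
    "steel": np.array([7.85, 1.0]),
    "ABS": np.array([1.05, 0.0]),
}

encoder = MaterialEncoder(n_features=2)


def channel_embedding(material: str, prop: str) -> np.ndarray:
    """Concatenate the learned material embedding with the one-hot property."""
    return np.concatenate([encoder(MATERIAL_FEATURES[material]), one_hot_property(prop)])


# ABS needs no dedicated embedding row, only its descriptor features.
vec = channel_embedding("ABS", "density")
print(vec.shape)  # (11,)
```

In a real setup the descriptor features would likely need normalization (density and a binary flag live on very different scales), and the encoder would be trained jointly with the main model.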

Alternatives Considered

  1. Concatenated one-hot encoding: Generate a one-hot encoding for the material and one for the property, then concatenate them. This approach is not scalable: the vector length grows linearly with the vocabulary (O(m + p) for m materials and p properties), so every new material widens the input. The vectors are also extremely sparse, which makes them harder for neural networks to learn from.
  2. Existing embeddings: Use an existing embedding model for all components, including the materials. Because general-purpose embedding models likely encode far less information about materials, the model's performance would suffer. For example, such an embedding would likely capture that copper and steel are both metals, but it may not capture the difference between the two.
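The layout of alternative 1 can be sketched as follows, assuming NumPy. The vector length is the number of materials plus the number of properties, and exactly two entries are non-zero no matter how large the vocabulary grows. The vocabulary sizes (500 materials, 50 properties) are hypothetical.

```python
import numpy as np


def concat_one_hot(material, prop, materials, properties):
    """Alternative 1: one-hot(material) concatenated with one-hot(property)."""
    vec = np.zeros(len(materials) + len(properties))
    vec[materials.index(material)] = 1.0
    vec[len(materials) + properties.index(prop)] = 1.0
    return vec


# Hypothetical vocabulary sizes, for illustration only.
materials = [f"material_{i}" for i in range(500)]
properties = [f"property_{j}" for j in range(50)]

vec = concat_one_hot("material_42", "property_7", materials, properties)
print(vec.size)                          # 550 = 500 materials + 50 properties
print(np.count_nonzero(vec))             # 2, regardless of vocabulary size
print(np.count_nonzero(vec) / vec.size)  # fraction of non-zero entries
```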

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests