Commit 69b7275

rachellj218 and Edd Wilder-James authored
RFC: TensorFlow Official Model Garden Redesign (#130)
This should have been merged some time ago. The design is undergoing iteration from this original RFC. Thank you everyone for your conversation and contribution.

* Create 20190802-model-garden-redesign.md
* Update 20190802-model-garden-redesign.md
* Update 20190802-model-garden-redesign.md
* Update 20190802-model-garden-redesign.md
* Update 20190802-model-garden-redesign.md
* Update 20190802-model-garden-redesign.md
* Status -> Accepted

Co-authored-by: Edd Wilder-James <[email protected]>
1 parent cb1f36d commit 69b7275

# TensorFlow Official Model Garden Redesign

| Status | Accepted |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Jing Li ([email protected]), Hongkun Yu ([email protected]), Xiaodan Song ([email protected]) |
| **Sponsor** | Edd Wilder-James ([email protected]) |
| **Updated** | 2019-08-02 |

## Objective

This document presents a proposal to redesign the TensorFlow official model garden.
We aim to provide a central, reliable place for popular examples,
state-of-the-art models, and tutorials that demonstrate best practices in TF2.0
and illustrate real-world use cases.

## Motivation

The current [TF official model garden](https://github.com/tensorflow/models/tree/master/official)
mainly receives ad hoc support. Example models are implemented with a mix of TensorFlow
APIs in different coding styles, and some of them have convergence and/or
performance regressions. With the TensorFlow 2.0 launch, there is strong demand
for a clear, central place that showcases reliable TF2.0 models built with
best practices that TensorFlow users can follow.

We want to take this opportunity to substantially improve the state of the
official model garden and provide a seamless end-to-end training and inference
user experience on a wide range of accelerators and mobile device chips. We hope
to encourage the community to contribute innovations and improve TensorFlow
efficiency and usability.

## User Benefit

We aim to provide the best modeling experience through this revamp effort:

* Usability and reliability
  * keep official models well maintained and tested for both performance and
    convergence.
  * provide accessible model distribution via [TensorFlow Hub](https://www.tensorflow.org/hub) and share state-of-the-art research accomplishments.
  * make switching training between GPU and TPU easy.
  * provide reusable components for research and production.
* End-to-end solutions
  * provide seamless end-to-end training and inference solutions, where inference covers serving on TPU, GPU, mobile, and edge devices.
  * provide hyperparameter sets to tune models for various resource constraints.
  * provide solutions with hyperparameters to scale model training to TPU pods or multi-worker GPUs.
  * provide variants derived from standard models to tackle various practical tasks.

## Design Proposal

### Official model directory reorganization

We are going to reorganize the official model directory to provide:

* common libraries, mainly of two types:
  * Common training utility library in TF2.0, plus model configuration and
    hyperparameter definitions in a consistent style.
  * Model-category-specific common libraries, e.g. primitives as basic building
    blocks for NLP models, or common networks like ResNet and MobileNet. We will follow the fundamental design of Keras
    layer/network/model to define and utilize model building blocks.

**NOTE:** we are still figuring out what level of building-block extraction would be the most useful and shareable
during refactoring. Once we confirm an implementation is really useful, we will move it to tensorflow/addons and/or tf.text.

* popular state-of-the-art (SOTA) models for end users as a product.
* reference models for performance benchmark testing.
  * For models provided as SOTA models, we will share the network and
    modeling code but have separate *main* modules. The main
    module for benchmark testing will have additional flags and setup for
    performance testing.

70+
The following table shows the detailed view of proposed model directory
71+
structure. The SOTA model list will be updated to cover more categories.
72+
73+
| Directory | Subdirectories | | Explainations |
74+
:-------------- |:---------------------|:--|:------------------------------ |
75+
| nlp | | | models/tasks for Natural Language Processing |
76+
| | modeling | | NLP modeling library |
77+
| | BERT | | |
78+
| | ALBERT | | |
79+
| | XLNET | | |
80+
| | Transformer | | |
81+
| | ... | | |
82+
| vision | | | models/tasks for Computer Vision |
83+
| | image_classification | | e.g. resnet, EfficientNet, ... |
84+
| | detection | | e.g. RetinaNet, Mask-RCNN, ... |
85+
| | ... | | |
86+
| recommendation| | | |
87+
| | NCF | | |
88+
| utils | | | Miscellaneous Utilities. |
89+
| | ... | | |
90+
| benchmarks | | | benchmark testing and reference models to validate tensorflow |
91+
| staging | | | Utilities not in TF core yet, and not suitable for tf addons |
92+
| r1 | | | tf1.x models and utils |
93+
| | utils | | |
94+
| | resnet50 | | |
95+
| | transformer | | |
96+
| | wide_deep | | |
97+
| | boosted_trees | | |
98+
99+
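As a rough illustration of the common library's "model configuration and hyperparameter definition in a consistent style", the sketch below uses plain Python dataclasses. The class names (`TrainConfig`, `ModelConfig`, `ExperimentConfig`) and the `override` helper are hypothetical and not part of any existing model garden library. The point is that every model declares its hyperparameters in one typed structure, and presets for different resource constraints are derived by overriding dotted keys rather than by editing code.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training hyperparameters shared by all models."""
    batch_size: int = 32
    learning_rate: float = 1e-3
    train_steps: int = 10_000


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical model-specific hyperparameters."""
    num_layers: int = 12
    hidden_size: int = 768


@dataclass(frozen=True)
class ExperimentConfig:
    """Bundles model and training settings into one immutable object."""
    model: ModelConfig = field(default_factory=ModelConfig)
    train: TrainConfig = field(default_factory=TrainConfig)


def _set(config: Any, path: List[str], value: Any) -> Any:
    """Returns a copy of `config` with the field at `path` replaced."""
    name = path[0]
    if name not in {f.name for f in dataclasses.fields(config)}:
        raise KeyError(f"unknown hyperparameter: {name}")
    if len(path) == 1:
        return dataclasses.replace(config, **{name: value})
    child = _set(getattr(config, name), path[1:], value)
    return dataclasses.replace(config, **{name: child})


def override(config: Any, params: Dict[str, Any]) -> Any:
    """Applies dotted overrides such as {'train.batch_size': 8}; typos raise KeyError."""
    for dotted_key, value in params.items():
        config = _set(config, dotted_key.split("."), value)
    return config


# A smaller preset for a resource-constrained setting, derived from the defaults.
small = override(ExperimentConfig(), {"model.num_layers": 4, "train.batch_size": 8})
print(small.model.num_layers, small.train.batch_size)  # prints: 4 8
```

Making the configs frozen means a preset can never be mutated in place by one model and silently affect another; rejecting unknown keys catches misspelled hyperparameters at startup.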

### Pretrained model repository

We are going to provide pretrained models for research exploration and
real-world application development. The plan is to integrate with [TensorFlow Hub](https://www.tensorflow.org/hub),
where users can access Hub modules and SavedModels of the pretrained checkpoints, along with links to the code in the model
garden.

### Convergence and Performance Testing

We have a benchmark testing framework that executes continuous performance and
accuracy tests for TensorFlow on different types of accelerators. All official
TF2.0 models are required to provide accuracy tests, and these tests will be
automatically expanded into performance tests for continuous regression testing
and monitoring.

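One way to picture accuracy tests being "automatically expanded to performance tests" is a registry that derives a second, shorter benchmark from each accuracy benchmark. Everything here is a hypothetical sketch, not the actual benchmark framework's API: the `register_accuracy_test` function, the `run_fn(steps) -> accuracy` contract, the default of 100 performance steps, and the `fake_run` stand-in for a real training loop. Timing uses the standard `time.perf_counter`.

```python
import time
from typing import Callable, Dict

# Assumed contract: run_fn(steps) trains/evaluates for `steps` steps and returns accuracy.
RunFn = Callable[[int], float]

BENCHMARKS: Dict[str, Callable[[], Dict[str, float]]] = {}


def register_accuracy_test(name: str, run_fn: RunFn, full_steps: int,
                           target_accuracy: float, perf_steps: int = 100) -> None:
    """Registers an accuracy benchmark and derives a performance benchmark from it."""

    def accuracy_benchmark() -> Dict[str, float]:
        # Full-length run: only the final metric matters.
        accuracy = run_fn(full_steps)
        return {"accuracy": accuracy, "passed": float(accuracy >= target_accuracy)}

    def performance_benchmark() -> Dict[str, float]:
        # Same model code, short run: only throughput is measured.
        start = time.perf_counter()
        run_fn(perf_steps)
        elapsed = time.perf_counter() - start
        return {"steps_per_sec": perf_steps / max(elapsed, 1e-9)}

    BENCHMARKS[f"{name}_accuracy"] = accuracy_benchmark
    BENCHMARKS[f"{name}_performance"] = performance_benchmark


def fake_run(steps: int) -> float:
    """Stand-in for a real training loop: accuracy saturates with more steps."""
    return min(0.76, steps / 1000)


register_accuracy_test("resnet50", fake_run, full_steps=1000, target_accuracy=0.75)
print(sorted(BENCHMARKS))  # prints: ['resnet50_accuracy', 'resnet50_performance']
```

Because the performance variant reuses the exact callable behind the accuracy test, a model author writes one test and the regression dashboard gets both signals.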

## Model Garden Sustainability

### Model Launch Criteria

To ensure that official models are well maintained and tested, we are going to enforce the following criteria for launching a new model in the official model garden (models in the staging folder are exempt):

* Follow the best-practice guidelines for each model category.
* Provide unit tests to verify the basics of the model.
* Integrate the model into benchmark testing to ensure the model's accuracy is on par with the original paper / SOTA results.
* Provide a README with commands and procedures to reproduce the SOTA results, including:
  * input data generation, if necessary
  * model execution, including all hyperparameters.

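The criterion that accuracy be "on par with the original paper / SOTA results" needs a concrete tolerance before it can be tested automatically. The helper below is a sketch under that assumption: the function name, the 1% relative tolerance, and the reference numbers in the example are illustrative only and are not values set by this RFC. It handles higher-is-better metrics such as top-1 accuracy.

```python
from dataclasses import dataclass


@dataclass
class ParityResult:
    metric: str
    observed: float
    reference: float
    passed: bool


def check_accuracy_parity(metric: str, observed: float, reference: float,
                          rel_tolerance: float = 0.01) -> ParityResult:
    """Passes if `observed` falls at most `rel_tolerance` (relative) below `reference`.

    Assumes a higher-is-better metric; beating the reference always passes.
    """
    if reference <= 0:
        raise ValueError("reference must be positive")
    shortfall = (reference - observed) / reference
    return ParityResult(metric, observed, reference, shortfall <= rel_tolerance)


# Hypothetical numbers for illustration only.
print(check_accuracy_parity("top_1_accuracy", observed=0.758, reference=0.763).passed)  # True
```

A relative tolerance keeps the same threshold meaningful across metrics of different scales; lower-is-better metrics (for example perplexity) would need the sign of the shortfall flipped.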

### Community contribution and staging

Because ML develops so quickly, we can't possibly keep all best-in-class models
up to date on our own. We highly encourage users to contribute to the official
model garden. After the model garden refactoring (Phase 1), we plan to provide
a full list of wanted models to the TensorFlow community and encourage TensorFlow
users to claim and contribute those models to the model garden.

We have different requirements, ranging from unifying interfaces and supporting all chips
and platforms to enabling benchmarks for reference models. Thus, models can be at
different stages. As we may have immediate needs to add some quick
models for benchmarking and debugging, we will provide a staging folder to host
drafts of SOTA or popular models. Once a staging model converges and
supports the major functionalities of standard official models, we will judge whether
it meets the launch standard and then migrate it either to the official models or to the
benchmark references.

### Maintenance and Deprecation

Given the nature of this repository, old models may become less and less
useful to the community as time goes on. In order to keep the repository
sustainable, we will perform bi-annual reviews of our models to ensure
everything still belongs in the repo. For models to be retired, the current plan
is to move them to an archive directory; archived models will no longer run regression
tests for quality and convergence.

The following details the policy for models in the mature and staging phases:

* Models graduated from the staging subdirectory

  These models will be maintained by the model garden team. After we start to
  accept community contributions, we will list the contributors as model owners.

  These models will have continuous convergence and performance testing to
  make sure there are no regressions. In general, we won't deprecate these models unless:
  * the model is no longer compatible with the TF APIs and has to be replaced by a new version, or
  * a strictly better model shows up and the old model isn't needed by the community/market.

* Models in staging:

  The model garden team will do a quarterly review to check the status with the
  model contributors, covering items such as:
  * model convergence
  * unit tests
  * convergence tests
  * whether the coding style meets the TF2.0 best practices.

  If there is no further commitment to improve the status within the next 90 days, we
  will mark the model as deprecated, which makes it subject to deletion.

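The staging rule above (quarterly reviews, with deprecation if no commitment arrives within 90 days) can be expressed as a small status function. This is a hypothetical sketch: `StagingModel`, its fields, and the assumption that a contributor "commitment" is recorded as a date are illustrative choices, since the RFC does not specify how commitments are tracked.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional

# Grace period after a quarterly review, per the staging policy.
GRACE_PERIOD = timedelta(days=90)


class Status(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"  # subject to deletion


@dataclass
class StagingModel:
    """Hypothetical record kept for each model in the staging folder."""
    name: str
    last_review: date                # quarterly review that found open issues
    last_commitment: Optional[date]  # latest contributor commitment, if any
    has_open_issues: bool            # e.g. not converging, missing tests, style


def review_status(model: StagingModel, today: date) -> Status:
    """Applies the 90-day rule to one staging model."""
    if not model.has_open_issues:
        return Status.ACTIVE
    committed_since_review = (model.last_commitment is not None
                              and model.last_commitment >= model.last_review)
    if committed_since_review:
        return Status.ACTIVE
    if today - model.last_review > GRACE_PERIOD:
        return Status.DEPRECATED
    return Status.ACTIVE
```

For example, a model reviewed on 2019-10-01 with open issues and no commitment is still active on 2019-12-01 (61 days later) but is marked deprecated on 2020-01-15 (106 days later); a commitment dated on or after the review keeps it active.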

### Official Model Releases

We will publish model garden releases starting from TF 2.0. Unit tests and
regression tests need to pass against the corresponding TF release. Deprecated models will be
removed from the release branch.

We will also create a pip package for each release version.

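The release policy combines two rules: deprecated models are dropped from the release branch, and every remaining model's tests must pass against the targeted TF release. The sketch below models that gate with hypothetical types (`ModelEntry`, `plan_release`); it does not reflect actual release tooling or package names. It assumes, as one reading of the policy, that a deprecated model's failing tests do not block a release because that model is not shipped.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelEntry:
    name: str
    deprecated: bool
    tests_passed: Dict[str, bool]  # test name -> result against the TF release


def plan_release(tf_version: str, models: List[ModelEntry]) -> List[str]:
    """Returns the sorted names of models to ship with the release for `tf_version`.

    Deprecated models are excluded; any failing test in a shipped model blocks the release.
    """
    shipped = [m for m in models if not m.deprecated]
    failures = [f"{m.name}:{test}" for m in shipped
                for test, ok in m.tests_passed.items() if not ok]
    if failures:
        raise RuntimeError(f"release for TF {tf_version} blocked by: {', '.join(failures)}")
    return sorted(m.name for m in shipped)
```

Raising instead of silently skipping a failing model keeps the release branch and the pip package in lockstep: either every listed model passed, or nothing is published.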

## Milestones

| Phases | Milestones | Notes |
|:-------- |:-----------------| :----------------------|
| Phase_1 | 1. Finish the directory reorganization. 2. Add the common modeling library. 3. Have 2-3 SOTA models for both NLP and Vision. | Not accepting community contributions during refactoring. |
| Phase_2 | Expand the repository to cover more model types. | Will accept community contributions on the solicited model list. |
