
Commit b455b12

xnnpack-bot authored and Maratyszcza committed
Initial open-source release
PiperOrigin-RevId: 271685289
0 parents · commit b455b12

660 files changed: +258,289 −0 lines changed


CONTRIBUTING.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# How to Contribute

We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.

## Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.

## Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.

## Community Guidelines

This project follows [Google's Open Source Community
Guidelines](https://opensource.google.com/conduct/).

LICENSE

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
BSD License

For XNNPACK software

Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
Copyright 2019 Google LLC

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name Facebook nor the names of its contributors may be used to
  endorse or promote products derived from this software without specific
  prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# XNNPACK

XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 (SSE2 level) platforms. XNNPACK is not intended for direct use by deep learning practitioners and researchers; instead it provides low-level performance primitives for accelerating high-level machine learning frameworks, such as [MediaPipe](https://mediapipe.dev), [TensorFlow Lite](https://www.tensorflow.org/lite), and [TensorFlow.js](https://www.tensorflow.org/js).

## Supported Architectures

- ARM on Android, Linux, and iOS
- ARM64 on Android, Linux, and iOS
- WebAssembly MVP
- WebAssembly SIMD (experimental)
- x86 and x86-64 (up to SSE2 only) on Android, Linux, and Mac

## Operator Coverage

XNNPACK implements the following neural network operators:

- 2D Convolution (including grouped and depthwise)
- 2D Deconvolution (AKA Transposed Convolution)
- 2D Average Pooling
- 2D Max Pooling
- 2D ArgMax Pooling (Max Pooling + indices)
- 2D Unpooling
- Add (tensors of same shape)
- Global Average Pooling
- Channel Shuffle
- Clamp (includes ReLU and ReLU6)
- HardSwish
- PReLU

All operators in XNNPACK support NHWC layout, but additionally allow a custom stride along the **C**hannel dimension. Thus, operators can consume a subset of channels in the input tensor, and produce a subset of channels in the output tensor, providing zero-cost Channel Split and Channel Concatenation operations.
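For example, a channel-split Add can be expressed purely through strides: the sketch below consumes only the first 128 channels of two 256-channel tensors and writes the first 128 channels of a 256-channel output. This is a minimal sketch, not part of the original README; the batch size, channel counts, and quantization parameters are illustrative assumptions, and the Q8 Add API is the one exercised in `bench/add.cc` in this same commit.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

#include <xnnpack.h>

int main() {
  const size_t batch_size = 4;        // illustrative values
  const size_t full_channels = 256;   // channels physically stored per row
  const size_t split_channels = 128;  // channels this operator consumes

  std::vector<uint8_t> a(batch_size * full_channels);
  std::vector<uint8_t> b(batch_size * full_channels);
  std::vector<uint8_t> y(batch_size * full_channels);

  if (xnn_initialize() != xnn_status_success) return 1;

  // The operator touches only the first split_channels of each row; strides
  // equal to full_channels skip over the remaining channels at zero cost.
  xnn_operator_t add_op = nullptr;
  xnn_status status = xnn_create_add_nc_q8(
      split_channels,
      full_channels /* a_stride */, full_channels /* b_stride */,
      full_channels /* sum_stride */,
      127 /* a:zero point */, 1.0f /* a:scale */,
      127 /* b:zero point */, 1.0f /* b:scale */,
      127 /* y:zero point */, 1.0f /* y:scale */,
      1 /* y:min */, 254 /* y:max */,
      0 /* flags */, &add_op);
  if (status != xnn_status_success) return 1;

  status = xnn_setup_add_nc_q8(
      add_op, batch_size, a.data(), b.data(), y.data(),
      nullptr /* thread pool */);
  if (status != xnn_status_success) return 1;

  status = xnn_run_operator(add_op, nullptr /* thread pool */);
  if (status != xnn_status_success) return 1;

  xnn_delete_operator(add_op);
  return 0;
}
```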
## Acknowledgements

XNNPACK is based on the [QNNPACK](https://github.com/pytorch/QNNPACK) library. However, unlike QNNPACK, XNNPACK focuses entirely on floating-point operators, and its API is no longer compatible with QNNPACK.

bench/add.cc

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
// Copyright (c) Facebook, Inc. and its affiliates.
// All rights reserved.
//
// Copyright 2019 Google LLC
//
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree.

#include <algorithm>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

#include <xnnpack.h>

#include <benchmark/benchmark.h>


static void add_nc_q8(benchmark::State& state) {
  const size_t batch_size = static_cast<size_t>(state.range(0));
  const size_t channels = static_cast<size_t>(state.range(1));

  std::random_device random_device;
  auto rng = std::mt19937(random_device());
  auto u8rng = std::bind(std::uniform_int_distribution<uint8_t>(), rng);

  std::vector<uint8_t> a(batch_size * channels);
  std::vector<uint8_t> b(batch_size * channels);
  std::vector<uint8_t> y(batch_size * channels);
  std::generate(a.begin(), a.end(), std::ref(u8rng));
  std::generate(b.begin(), b.end(), std::ref(u8rng));

  xnn_status status = xnn_initialize();
  if (status != xnn_status_success) {
    state.SkipWithError("failed to initialize XNNPACK");
    return;
  }

  // With unit scales and zero points of 127 on both inputs and the output,
  // the quantized addition reduces to y = clamp(a + b - 127, 1, 254).
  xnn_operator_t add_op = nullptr;
  status = xnn_create_add_nc_q8(
    channels, channels /* a_stride */, channels /* b_stride */, channels /* sum_stride */,
    127 /* a:zero point */, 1.0f /* a:scale */,
    127 /* b:zero point */, 1.0f /* b:scale */,
    127 /* y:zero point */, 1.0f /* y:scale */,
    1 /* y:min */, 254 /* y:max */,
    0 /* flags */, &add_op);
  if (status != xnn_status_success || add_op == nullptr) {
    state.SkipWithError("failed to create Q8 Add operator");
    return;
  }

  status = xnn_setup_add_nc_q8(
    add_op,
    batch_size,
    a.data(), b.data(), y.data(),
    nullptr /* thread pool */);
  if (status != xnn_status_success) {
    state.SkipWithError("failed to setup Q8 Add operator");
    return;
  }

  for (auto _ : state) {
    status = xnn_run_operator(add_op, nullptr /* thread pool */);
    if (status != xnn_status_success) {
      state.SkipWithError("failed to run Q8 Add operator");
      return;
    }
  }

  status = xnn_delete_operator(add_op);
  if (status != xnn_status_success) {
    state.SkipWithError("failed to delete Q8 Add operator");
    return;
  }

  const size_t elements_per_iteration = batch_size * channels;
  state.counters["elements"] =
    benchmark::Counter(uint64_t(state.iterations()) * elements_per_iteration, benchmark::Counter::kIsRate);

  // Two input streams plus one output stream of uint8_t per element.
  const size_t bytes_per_iteration = 3 * elements_per_iteration * sizeof(uint8_t);
  state.counters["bytes"] =
    benchmark::Counter(uint64_t(state.iterations()) * bytes_per_iteration, benchmark::Counter::kIsRate);
}

// Same benchmark, but the second input aliases the output (in-place addition).
static void add_nc_q8_inplace(benchmark::State& state) {
  const size_t batch_size = static_cast<size_t>(state.range(0));
  const size_t channels = static_cast<size_t>(state.range(1));

  std::random_device random_device;
  auto rng = std::mt19937(random_device());
  auto u8rng = std::bind(std::uniform_int_distribution<uint8_t>(), rng);

  std::vector<uint8_t> a(batch_size * channels);
  std::vector<uint8_t> y(batch_size * channels);
  std::generate(a.begin(), a.end(), std::ref(u8rng));

  xnn_status status = xnn_initialize();
  if (status != xnn_status_success) {
    state.SkipWithError("failed to initialize XNNPACK");
    return;
  }

  xnn_operator_t add_op = nullptr;
  status = xnn_create_add_nc_q8(
    channels, channels /* a_stride */, channels /* b_stride */, channels /* sum_stride */,
    127 /* a:zero point */, 1.0f /* a:scale */,
    127 /* b:zero point */, 1.0f /* b:scale */,
    127 /* y:zero point */, 1.0f /* y:scale */,
    1 /* y:min */, 254 /* y:max */,
    0 /* flags */, &add_op);
  if (status != xnn_status_success || add_op == nullptr) {
    state.SkipWithError("failed to create Q8 Add operator");
    return;
  }

  status = xnn_setup_add_nc_q8(
    add_op,
    batch_size,
    a.data(), y.data(), y.data(),
    nullptr /* thread pool */);
  if (status != xnn_status_success) {
    state.SkipWithError("failed to setup Q8 Add operator");
    return;
  }

  for (auto _ : state) {
    status = xnn_run_operator(add_op, nullptr /* thread pool */);
    if (status != xnn_status_success) {
      state.SkipWithError("failed to run Q8 Add operator");
      return;
    }
  }

  status = xnn_delete_operator(add_op);
  if (status != xnn_status_success) {
    state.SkipWithError("failed to delete Q8 Add operator");
    return;
  }

  const size_t elements_per_iteration = batch_size * channels;
  state.counters["elements"] =
    benchmark::Counter(uint64_t(state.iterations()) * elements_per_iteration, benchmark::Counter::kIsRate);

  const size_t bytes_per_iteration = 3 * elements_per_iteration * sizeof(uint8_t);
  state.counters["bytes"] =
    benchmark::Counter(uint64_t(state.iterations()) * bytes_per_iteration, benchmark::Counter::kIsRate);
}

// Sweep characteristic CNN shapes: the spatial extent halves from 224x224
// down to 7x7 while the channel count doubles from 16, so the batch argument
// N = H*W shrinks as C grows.
static void CharacteristicArguments(benchmark::internal::Benchmark* b)
{
  b->ArgNames({"N", "C"});

  int32_t c = 16;
  for (int32_t n = 224; n >= 7; n /= 2) {
    b->Args({n * n, c});
    c *= 2;
  }
}

BENCHMARK(add_nc_q8)->Apply(CharacteristicArguments)->UseRealTime();
BENCHMARK(add_nc_q8_inplace)->Apply(CharacteristicArguments)->UseRealTime();

#ifndef XNNPACK_BENCHMARK_NO_MAIN
BENCHMARK_MAIN();
#endif
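As a usage note, the trailing `#ifndef XNNPACK_BENCHMARK_NO_MAIN` guard lets this file either run standalone or be linked with other benchmark files into one binary. A minimal sketch of the combined case, assuming a hypothetical aggregator file (not part of this commit) and that every bench/*.cc is compiled with `-DXNNPACK_BENCHMARK_NO_MAIN`:

```c++
// benchmark_main.cc (hypothetical aggregator): the only translation unit
// that expands BENCHMARK_MAIN(); the guarded entry points in the individual
// bench/*.cc files drop out under -DXNNPACK_BENCHMARK_NO_MAIN.
#include <benchmark/benchmark.h>

BENCHMARK_MAIN();
```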
