Commit 4a545ee

update KeyFeatures
1 parent dfef7f3 commit 4a545ee

1 file changed: +22 -13 lines changed
docs/Architecture/KeyFeatures.md

Lines changed: 22 additions & 13 deletions
@@ -30,23 +30,25 @@ layout: doc_page
 configurable by trading off sketch size with accuracy.
 * Designed for <a href="{{site.docs_dir}}/Architecture/LargeScale.html">Large-scale</a> computing environments
 that must handle <b>Big Data</b>, e.g.:
-* [Hadoop](https://hadoop.apache.org/)
-* [Pig](https://pig.apache.org)
-* [Hive](https://hive.apache.org)
+* [Google/BigQuery](https://cloud.google.com/blog/products/data-analytics/bigquery-supports-apache-datasketches-for-approximate-analytics)
 * [Druid](https://druid.apache.org)
-* [Spark](https://spark.apache.org)
-* <b>Maven deployable</b> and registered with the [Central Repository](https://search.maven.org/#search|ga|1|DataSketches).
+* [Spark](https://github.com/apache/datasketches-spark)
+* [PostgreSQL](https://github.com/apache/datasketches-postgresql)
+* [Hadoop/Hive](https://github.com/apache/datasketches-hive)
+* [Pig](https://github.com/apache/datasketches-pig)
+
+* The Java-based sketches are registered with the <b>Maven Central Repository</b>. For example: [DataSketches-Java](https://search.maven.org/search?q=datasketches-java).
 * Extensive documentation with the systems developer in mind.
 * Designed for production environments:
-* Available in multiple languages: Java, C++, [Python](https://github.com/apache/datasketches-python)
-* Binary compatible across systems and languages
+* Available in multiple languages: [Java](https://github.com/apache/datasketches-java), [C++](https://github.com/apache/datasketches-cpp), [Python](https://github.com/apache/datasketches-python), and [Go](https://github.com/apache/datasketches-go).
+* Binary compatible across systems and languages. For example, a sketch can be built and loaded in a C++ platform, then serialized and transported to a Java platform where it can be merged with other sketches and queried.

 ### Built-In, General Purpose Functions

 * General purpose [Memory Component]({{site.docs_dir}}/Memory/MemoryComponent.html) for managing data off the Java Heap.
 This enables systems designers the ability to manage their own large data heaps with
 dedicated processor threads that would otherwise put undue pressure on the Java heap and
-its garbage collection.
+its garbage collection. Starting with Java Version 9.0.0, this functionality is now native to the Java 25 language.
 * General purpose implementation of Austin Appleby's 128-bit MurmurHash3 algorithm,
 with a number of useful extensions.

@@ -58,8 +60,7 @@ its garbage collection.
 * Reproducible Characterization Studies
 * All our published speed and accuracy performance results can be reproduced using the code included in the
 [Characterization](https://github.com/apache/datasketches-characterization) repository.
-* Comprehensive Javadocs that satisfy
-[JDK8 Javadoc](https://docs.oracle.com/javase/8/docs/technotes/guides/javadoc/index.html) standards.
+* Comprehensive Javadocs.

 ### Opportunities to Extend

@@ -86,15 +87,23 @@ its garbage collection.

 ### Quantiles

-* [Quantiles Sketch Overview]({{site.docs_dir}}/Quantiles/QuantilesSketchOverview.html). Get normal or inverse PDFs or CDFs of the distributions of any numeric value from your raw data in a single pass with well defined error bounds on the results.
-
-### Frequent Items
+#### [Four families of Quantile algorithms]({{site.docs_dir}}/QuantilesAll/QuantilesOverview.html)
+Get normal or inverse PDFs or CDFs of the distributions of any numeric value from your raw data in a single pass with well defined error bounds on the results.
+
+### Frequency

 * [Frequent Items Sketches]({{site.docs_dir}}/Frequency/FrequencySketchesOverview.html) Get the most frequent items from a stream of items.
+* [CountMin sketch of Cormode and Muthukrishnan](https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/count/CountMinSketch.java)
+* [Frequent Distinct Tuples](https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/fdt/FdtSketch.java)

 ### Sampling

 * [Reservoir Sampling]({{site.docs_dir}}/Sampling/ReservoirSampling.html) Knuth's well known Reservoir sampling "Algorithm R", but extended to enable merging across different sized reservoirs.
 * [Weighted Sampling]({{site.docs_dir}}/Sampling/VarOptSampling.html) Edith Cohen's famous sampling algorithm that enables computing subset sums of weighted samples with optimum variance.
+* [Exact and Bounded Sampling Proportional to Size](https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/sampling/EbppsItemsSketch.java)
+
+### Filters and Set Membership
+
+* [Bloom Filter](https://github.com/apache/datasketches-java/blob/main/src/main/java/org/apache/datasketches/filters/bloomfilter/BloomFilter.java)


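The Sampling entry in the diff above mentions Knuth's reservoir sampling "Algorithm R". As a rough illustration of the basic idea only, here is a minimal Python sketch. The function name `reservoir_sample` and its parameters are hypothetical and are not part of the DataSketches API, and the merging extension described in the documentation is not shown.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable.

    Hypothetical helper illustrating basic Algorithm R; not library code.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item n replaces a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing a seeded `random.Random` instance makes a run repeatable. When the stream holds fewer than `k` items, every item is returned in order.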