-
Notifications
You must be signed in to change notification settings - Fork 32
/
Copy pathuser_guide.html
1762 lines (1685 loc) · 114 KB
/
user_guide.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<html>
<head>
<title>In-Memory Columnar Store (IMCS)</title>
<h1>In-Memory Columnar Store (IMCS)</h1>
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#overview">Overview</a></li>
<li><a href="#functions">Functions</a></li>
<ul>
<li><a href="#gen">General columnar store functions</a></li>
<li><a href="#ddl">Generated data manipulation functions</a></li>
<li><a href="#single">Generated data access functions for single timeseries</a></li>
<li><a href="#multiple">Generated data access functions for multiple timeseries (identified by timeseries ID)</a></li>
<li><a href="#cons">Timeseries constructors</a></li>
<li><a href="#bin">Binary operations</a></li>
<li><a href="#unary">Unary operations</a></li>
<li><a href="#math">Mathematical functions</a></li>
<li><a href="#datetime">Date/time functions</a></li>
<li><a href="#scalar">Binary scalar functions</a></li>
<li><a href="#transform">Timeseries transformation functions</a></li>
<li><a href="#grand">Grand aggregates</a></li>
<li><a href="#groupby">Group-by aggregates</a></li>
<li><a href="#wingroupby">Group-by windows aggregates</a></li>
<li><a href="#grid">Grid aggregates</a></li>
<li><a href="#window">Window (moving) aggregates</a></li>
<li><a href="#hash">Hash aggregates (group-by using hash function)</a></li>
<li><a href="#cum">Cumulative aggregates</a></li>
<li><a href="#sort">Sort functions</a></li>
<li><a href="#spec">Special functions</a></li>
</ul>
<li><a href="#operators">Operators</a></li>
<li><a href="#projection">Projection issues</a></li>
<li><a href="#implementation">Implementation details</a></li>
<li><a href="#disk">Scaling beyond physical memory</a></li>
<li><a href="#installation">Installation and tuning</a></li>
<li><a href="#performance">Performance comparison</a></li>
<li><a href="#license">License</a></li>
</ul>
</head>
<body>
<h2><a name="introduction">Introduction</a></h2>
<p>
Columnar store or vertical representation of data allows to achieve better performance in comparison with classical horizontal representation due to three factors:
<ol>
<li>Reducing size of fetched data: only columns involved in query are accessed.</li>
<li>Vector operations. Applying an operator to set of values (tile) makes it possible to minimize interpretation cost.
Also SIMD instructions of modern processors accelerate execution of vector operations.</li>
<li>Compression of data. Certainly compression can also be used for all the records, but independent compression of each column can give much better results without significant extra CPU overhead. For example such simple compression algorithm like RLE
(run-length-encoding) allows not only to reduce used space, but also minimize number of performed operations.
</ol>
There are several database systems based on vertical data model: Vertica, SciDB,... There are also extensions to existed DBMSes, such as
"Oracle In-Memory Option". This plug-in tries to provide such functionality for PostgreSQL.
</p>
<h2><a name="overview">Overview</a></h2>
<p>
As it is clear from the abbreviation (IMCS: In-Memory Columnar Store) this plugin adds to PostgreSQL in-memory columnar store.
So vertical representation of data is complementary to standard horizontal representation.
Data is imported in PostgreSQL database in usual way and is stored in normal table.
Then columns from this table are fetched and stored in shared memory. IMCS provides a lot of timeseries functions which can be used for data
analysis. Operations with timeseries are performed in vector mode allowing to reach maximal possible speed of such operations.
Also IMCS makes it possible to parallelize execution of some queries (for example calculation of aggregates) and utilize all CPU cores. All this three factors: in-memory location of data, vector operations, parallel query execution, makes it possible
to increase speed of some queries more than 100 times comparing with standard PostgreSQL queries.
</p><p>
To make access to timeseries as convenient as possible, IMCS provides generator of access functions. You should specify name of source table or view (from which data will be imported), name of timestamp field (this is a main key by which timeseries elements are accessed) and
optionally timeseries identifier. The last one needs some explanations. In some cases all data from the table should be placed in a single timeseries. For example assume that we collect data about phone calls (date, duration, caller, callee,...). It will be a single timeseries. But for example in trading systems there are separate data (ticks) associated with each symbol. So we have separate timeseries for ABB, GOOG, IBM, YHOO,... In this case securities identifier (symbol) can be considered as identifier of timeseries.
</p><p>
IMCS supports the following element type for timeseries: <code>"char", int2, int4, date, int8, time, timestamp, money, float4, float8, bpchar</code>.
All timeseries elements should have the same size, so only fixed size character types are supported: for example <code>char(10)</code>, but not <code>varchar</code>. But it is possible map varying size strings into integer identifiers using IMCS dictionary.
It will greatly reduce space used by columnar store and reduce queries execution time (manipulations with integers are more efficient than with strings).
Certainly this approach works only if cardinality of such column is not so large: dictionary should fit in memory.
Size of dictionary can be specified using <code>"imcs.dictionary.size"</code> parameter. Default value is 64kb.
If size of dictionary is less or equal than 64kb, then IMCS uses two bytes integer to store string identifier. If it is larger than 64kb, then
four bytes identifier is used. Please notice that the same dictionary is used for all table and columns. So dictionary size should be greater or equal than total size of cardinalities of all unlimited varchar columns loaded in columnar store.
IMCS is able to automatically converts strings to identifiers and visa verse in output/input functions. But you can also explicitly translate identifier to string using <code>cs_code2str</code> function.
</p><p>
Also IMCS is not able to represent NULL values. It is not enforced that fields of the source table were declared as <code>NOT NULL</code>, but attempt to insert NULL value in timeseries will cause error (or optionally NULL can be substituted with zero). Please use default values instead of NULLs.
</p><p>
Given all this information IMCS generates corresponding types and functions for loading/appending/accessing this timeseries.
Assume that we have table <code>Quote</code>. After calling <code>cs_create('Quote', 'Day', 'Symbol')</code>
we will get <code>Quote_load()</code> function for loading data from table in memory,
<code>Quote_get(symbol char(10), low date, high date)</code> function for fetching/slicing corresponding timeseries and triggers which will keep track updates in <code>Quote</code> table and propagate this changes to timeseries.
</p><p>
There are two ways of synchronizing original table and timeseries:
<ol>
<li>Automatic: using triggers. In this case all inserts/deletes in original table are immediately reflected in timeseries.</li>
<li>Manual: using explicit invocation of load/append/delete methods.</li>
</ol>
Execution of <code>load()</code> is significantly more efficient than propagation of updates using triggers. Mostly because of slowness of PL/pgSQL.
Also please notice, that been stored in shared memory, timeseries have to be reloaded after restart of
the server. Unfortunately PostgreSQL doesn't support database level triggers (like <code>after startup on database</code> in Oracle).
IMCS provides two alternatives: use <i>autoload</i> mode or manually load data. In case of using <i>autoload</i> mode, data will be automatically loaded from table to columnar store on demand when it is first accessed by any query. Please notice that for large tables loading data can take substantial amount of time and so increase execution time of the query initiated this load (it can confuse an user which expects this query to complete very fast).
Fortunately database servers are not restarted frequently...
</p><p>
When data is loaded from the table, records are sorted by timestamp and inserted in ascending order.
You can append data to existed timeseries, but timestamps of inserted elements should be greater than already loaded.
When timeseries is populated using insert trigger it is necessary to enforce that the data is inserted in the table in timestamp ascending order. Otherwise <i>out-of-order</i> error will be reported while inserting element in timeseries.
</p><p>
<code><b>TABLE</b>_get</code> functions returns row of type <code><b>TABLE</b>_timeseries</code> (this type is also generated by IMCS) which has the same columns as original table, but type of this columns is <code>timeseries</code>. So it is possible to refer to this timeseries as to any other columns and apply timeseries functions to them. For example query:
<pre>
select cs_max(Close) from Quote_get('IBM');
</pre>
returns maximal close price for IBM.
IMCS provide standard operators for timeseries type, allowing to write queries with more complex expressions in standard way:
<pre>
select cs_avg(High - (Open + Close)/2) from Quote_get('IBM');
</pre>
Result of the query above is scalar value (because of used grand aggregate). But most of timeseries functions take timeseries as input and return also timeseries. For example result of the query below is timeseries:
<pre>
select cs_filter(Open < Close, Day) from Quote_get('IBM');
</pre>
When you print result of execution of this query at the screen (for example by running this query in psql), it will be represented as large string literal in braces: 'date:{01/01/2010, 01/02/2010,...}'
Certainly it is not convenient for really large timeseries and may even cause memory exhaustion.
Alternatively it is possible to change vertical representation back to horizontal representation using <code><b>TABLE</b>_project</code> or <code>cs_project</code> functions. Then produced tuples can be accessed in normal way using all SQL stuff.
For example it is possible to sort them or perform more grouping/filtering.
</p>
<h2><a name="functions">Functions</a></h2>
<h3><a name="gen">General columnar store functions</a></h3>
<p>
General columnar store functions are used to generate table-specific API functions, get information about columnar store and perform cleanup.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_create(table_name text, timestamp_id text, timeseries_id text default null, autoupdate bool default false) returns void</code></td>
<td>This function is used to generate all API functions, types and triggers for the specified table or view <code>table_name</code>.
These can be latter removed using <code><b>table_name</b>_drop</code> function. <code>timestamp_id</code> is name of timestamp field by which timeseries elements are sorted in ascending order, allowing to efficiently extract time slices. <code>timeseries_id</code> is optional field identifying timeseries. For example for quotes it can be a symbol name. If this field is specified, then separate timeseries will be maintained for each symbol. If <code>autoupdate</code> parameter is true, then IMCS will create triggers which automatically update timeseries when new data is added/deleted to/from the source table. Alternatively it is possible to explicitly load/append/delete data to timeseries. Please notice that explicit bulk update/delete is significantly more efficient than row-level updated performed by trigger. If columnar store interface for a table was generated with <code>autoupdate=false</code>, then triggers are still generated but are disabled. You can enable them later using <code>alter table <b>TABLE</b> enable trigger user</code> command. As far as views cannot have row-level BEFORE or AFTER triggers in PostgreSQL, IMCS doesn't generate them if <code>table_name</code> is a view.</td>
</tr>
<tr>
<td><code>function cs_delete_all() returns bigint</code></td>
<td>Deletes all timeseries in columnar store. This function can be used for most efficient cleanup of columnar store.
Please notice that PostgreSQL doesn't allow to free shared memory, so it still be in use. But it can be reused in subsequent
allocation requests of columnar store. This function returns total number of removed elements (in all timeseries)</td>
</tr>
<tr>
<td><code>function cs_used_memory() returns bigint</code></td>
<td>Returns amount of memory used by columnar store.</td>
</tr>
<tr>
<td><code>function cs_profile(reset bool default false) returns setof cs_profile_item</code></td>
<td>Returns number of calls of each IMCS command. If <code>parameter</code> is true, then all counters
are reset after execution of this call.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_timestamp() returns varchar</code></td><td>Returns name of timeseries timestamp column for this table</td>
</tr>
<tr>
<td><code>function cs_str2code(str varchar) returns integer</code></td>
<td>Returns code of string in IMCS dictionary or -1 if there is not such string. This function may be used to find some particular values in varying string timeseries.</td>
</tr>
<tr>
<td><code>function cs_code2str(code integer) returns varchar</code></td>
<td>Returns string value for specified IMCS dictionary code.</td>
</tr>
<tr>
<td><code>function cs_code2str(str bytea, column_no integer) returns varchar</code></td>
<td>Extracts identifier from compound (concatenated) key and returns correspondent name from dictionary. Column number is 1-based.</td>
</tr>
</table>
</p>
<h3><a name="ddl">Generated data manipulation functions</a></h3>
<p>
Generated functions for loading/storing/deleting timeseries.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function <b>TABLE</b>_drop() returns void</code></td>
<td>Deletes all generated functions and types for table <b>TABLE</b>.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_load(already_sorted bool default false, filter text default null) returns bigint</code></td>
<td>Populates timeseries with data from PostgreSQL table. If <code>already_sorted</code> parameter is true, then it is assumed that
records in the table are stored in proper (timestamp ascending) order. Otherwise IMCS will add "order by" clause
to select statement. Please notice that PostgreSQL vacuuming can change original order of the records. So disable vacuuming for the
table if you want to preserve insert order. Optional <code>filter</code> parameter allows to specify additional selection criteria for table records. It allows to include in timeseries only some subset of the table.
Particularly it can be used to append existed timeseries with new data.
This function returns number of inserted timeseries elements. If <code>filter</code> is not specified then this function loads data from the table only if timeseries are not yet initialized. If <code>filter</code> is not null, then this functions always tries to load data, assuming that programmer has specified proper filter condition allowing to avoid duplicates and preserve proper timeseries order. If <code>filter</code> is null and timeseries are already initialized, then this function does nothing and immediately returns zero. Example of loading quotes past '12.02.2021': <code>select Quote_load(filter='date>=''12.02.2021''');</code></td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_is_loaded() returns bool</code></td>
<td>Checks if data was already loaded to columnar store. If you just need to ensure that data is loaded,
there is no need to call this function: you can always call <b>TABLE</b>_load, it will perform this check
itself and do nothing if data was already loaded. But if behavior of your application depends on state of
columnar store, then this function may be useful.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_append(start_from <b>TIMESTAMP_TYPE</b>) returns bigint</code></td>
<td>Appends to timeseries records from the source table starting from <code>start_from</code> timestamp (inclusive).
Use this function if on-update trigger is disabled (autoupdate=false in parameters of <code>cs_create</code>).
Please also notice that this function is implemented in PL/pgSQL and so it is significantly slower than <code><b>TABLE</b>_load</code> with the same filter condition.
This function returns number of added timeseries elements.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_truncate() returns void</code></td>
<td>Truncates all timeseries for this table. This is most efficient way to delete vertical representation for the specific table.
If you need to delete all data in columnar store, better use <code>cs_delete_all()</code> function.
</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_project(input <b>TABLE</b>_timeseries input, positions timeseries default null,disable_caching bool default false) returns setof <b>TABLE</b></td>
<td>Makes horizontal projection of timeseries. Optional <code>positions</code> parameter specifies positions of selected timeseries elements.
If parameter <code>positions</code> is omitted, then all timeseries elements are transformed to horizontal representation.
So this function is opposite to <code><b>TABLE</b>_get()</code>: <code>get</code> transforms horizontal representation to vertical and <code>project</code> does backward transformation. It is possible to use this function only if number of columns returned by <b>TABLE</b>_get()</code>
and element types of corresponding timeseries are not changed. For example it is possible to run query like this:
<pre>
select (Quote_project(abb.*,cs_top_max_pos(Close, 10))).*
from Quote_get('ABB',date('01-Jan-2010'),date('31-Mar-2010'))abb;
select (Quote_project(abb.*)).*
from (select Symbol,Day,cs_maxof(Open,Close),
High,Low,cs_minof(Open,Close),Volume
from Quote_get('ABB')) abb;
</pre>
but not
<pre>
select (Quote_project(abb.*)).*
from (select Symbol,cs_maxof(Open,Close)
from Quote_get('ABB')) abb;
</pre>
In the last case it is possible to use <code>cs_project()</code> function:
<pre>
select cs_project(abb.*)
from (select Symbol,cs_maxof(Open,Close)
from Quote_get('ABB')) abb;
</pre>
Please notice that we can not use <code>().*</code> clause in this case because <code>cs_project</code> returns anonymous row.
But in PostgreSQL 9.3 we can use <i>lateral join</i>:
<pre>
select p.*
from (select Symbol,cs_maxof(Open,Close)
from Quote_get('ABB')) abb,
cs_project(abb.*) p(symbol char(10), max real);
</pre>
Please find more information about projection of timeseries, problems with <code>(...).*</code> construction in PostgreSQL and
purpose of <code>disable_caching</code> parameter in section <a href="#projection">Projection issues</a>.
</td>
</tr>
</table>
</p>
<h3><a name="single">Generated data access functions for single timeseries</a></h3>
<p>
Functions generated for accessing single timeseries (timeseries having no identifier).
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function <b>TABLE</b>_first() returns <b>TIMESTAMP_TYPE</b></code></td>
<td>Returns oldest timestamp.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_last() returns <b>TIMESTAMP_TYPE</b></code></td>
<td>Returns most recent timestamp.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_count() returns bigint</code></td>
<td>Returns number of elements in timeseries.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_get(low <b>TIMESTAMP_TYPE</b> default null, high <b>TIMESTAMP_TYPE</b> default null, limit_ts bigint default null) returns <b>TABLE</b>_timeseries</code></td>
<td>Returns vertical representation of the whole table or its time slice. Returned record contains the same columns as record of the original table, but they have <code>timeseries</code> type instead of original scalar types. These columns can be used in timeseries functions (cs_*). If <code>high</code> or <code>low</code> parameters are not null, then them specify correspondingly upper/lower inclusive boundary for timestamp value. If some or both parameters are omitted, then corresponding boundary is open. It is possible to limit number of selected elements by specifying <code>limit_ts</code> parameter (if low boundary is open then last <code>limit_ts</code> elements will be selected, otherwise first <code>limit_ts</code> elements will be selected).</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_span(from_pos bigint default 0, till_pos bigint default 9223372036854775807) returns <b>TABLE</b>_timeseries</code></td>
<td>Returns vertical representation of the whole table or its horizontal slice. Returned record contains the same columns as record of the original table, but they have <code>timeseries</code> type instead of original scalar types. These columns can be used in timeseries functions (cs_*).
Parameter <code>from_pos</code> specifies start position in timeseries (inclusive) and parameter <code>till_pos</code> specifies end position (inclusive). If <code>till_pos</code> parameter is missed, then subsequence spans till end of timeseries. Values of both <code>from_pos</code> and <code>till_pos</code> parameters can be negative. In this case position is calculated from end of timeseries, i.e. <code><b>TABLE</b>_span(from_pos:=-1)</code> extracts last element of the timeseries.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_delete(low <b>TIMESTAMP_TYPE</b>, high <b>TIMESTAMP_TYPE</b>) returns bigint</code></td>
<td>Deletes timeseries elements belonging to the specified interval. If <code>high</code> or <code>low</code> parameters are not null, then them specify correspondingly upper/lower inclusive boundary for timestamp value. If some or both parameters are nulls, then corresponding boundary is open. This function returns number of deleted elements.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_delete(till <b>TIMESTAMP_TYPE</b> default null) returns bigint</code></td>
<td>Deletes timeseries elements from the beginning till specified timestamp <code>till</code> (inclusive) or delete all elements if this parameter is null/omitted.
This function is equivalent to <code><b>TABLE</b>_delete(null, till)</code>. IMCS provides separate function for it because it is intended to be the
most frequent case of deleting elements from timeseries: it corresponds to shifting data window when new elements are appended and
deteriorated are thrown away. This function returns number of deleted elements.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_join(other timeseries, direction integer default 1) returns timeseries</code></td>
<td>Joins timestamp with <code>other</code> unsorted timeseries. It returns positions of elements in this timeseries which timestamp matches correspondent element of joined timeseries. Semantic of matching depends on value of the <code>direction</code> parameter:
<ul>
<li>If <code>direction</code> is less than zero, then this timestamp should be less or equal than other timestamp (locate timeseries element preceding timestamp).</li>
<li>If <code>direction</code> is zero, then this timestamp should be equal to other timestamp (exact match of timestamps). </li>
<li>If <code>direction</code> is greater than zero, then this timestamp should be greater or equal than other timestamp (locate timeseries element succeeding timestamp).</li>
</ul>
</td>
</tr>
</table>
</p>
<h3><a name="multiple">Generated data access functions for multiple timeseries (identified by timeseries ID)</a></h3>
<p>
Functions generated for accessing multiple timeseries (source table contains identifier of timeseries, for example 'Symbol').
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function <b>TABLE</b>_first(id <b>TIMESERIES_ID_TYPE</b>) returns <b>TIMESTAMP_TYPE</b></code></td>
<td>Returns oldest timestamp.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_last(id <b>TIMESERIES_ID_TYPE</b>) returns <b>TIMESTAMP_TYPE</b></code></td>
<td>Returns most recent timestamp.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_count(id <b>TIMESERIES_ID_TYPE</b>) returns bigint</code></td>
<td>Returns number of elements in timeseries.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_get(id <b>TIMESERIES_ID_TYPE</b>, <b>TIMESTAMP_TYPE</b> low default null, <b>TIMESTAMP_TYPE</b> high default null, limit_ts bigint default null) returns <b>TABLE</b>_timeseries</code></td>
<td>Returns timeseries with specified identifier for the corresponding table or its time slice. Returned record contains the same columns as record of original table, but they have <code>timeseries</code> type instead of original scalar types. These columns can be used in timeseries functions (cs_*). If <code>high</code> or <code>low</code> parameters are not null, then them specify correspondingly upper/lower inclusive boundary for timestamp value. If some or both parameters are omitted, then corresponding boundary is open. It is possible to limit number of selected elements by specifying <code>limit_ts</code> parameter (if low boundary is open then last <code>limit_ts</code> elements will be selected, otherwise first <code>limit_ts</code> elements will be selected).</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_get(id <b>TIMESERIES_ID_TYPE</b>[], <b>TIMESTAMP_TYPE</b> low default null, <b>TIMESTAMP_TYPE</b> high default null, limit_ts bigint default null) returns setof <b>TABLE</b>_timeseries</code></td>
<td>Does the same as function described above but for array of timeseries identifiers. For each timeseries identifier this function returns <code><b>TABLE</b>_timeseries</code> record, so output will contain as much rows as there are identifiers.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_span(id <b>TIMESERIES_ID_TYPE</b>, from_pos bigint default 0, till_pos bigint default 9223372036854775807) returns <b>TABLE</b>_timeseries</code></td>
<td>Returns timeseries with specified identifier for the corresponding table or its horizontal slice. Returned record contains the same columns as record of the original table, but they have <code>timeseries</code> type instead of original scalar types. These columns can be used in timeseries functions (cs_*).
Parameter <code>from_pos</code> specifies start position in timeseries (inclusive) and parameter <code>till_pos</code> specifies end position (inclusive). If <code>till_pos</code> parameter is missed, then subsequence spans till end of timeseries. Values of both <code>from_pos</code> and <code>till_pos</code> parameters can be negative. In this case position is calculated from end of timeseries, i.e. <code><b>TABLE</b>_span(id,from_pos:=-1)</code> extracts last element of the timeseries.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_span(id <b>TIMESERIES_ID_TYPE</b>[], from_pos bigint default 0, till_pos bigint default 9223372036854775807) returns setof <b>TABLE</b>_timeseries</code></td>
<td>Does the same as function described above but for array of timeseries identifiers. For each timeseries identifier this function returns <code><b>TABLE</b>_timeseries</code> record, so output will contain as much rows as there are identifiers.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_concat(id <b>TIMESERIES_ID_TYPE</b>[], <b>TIMESTAMP_TYPE</b> low default null, <b>TIMESTAMP_TYPE</b> high default null) returns <b>TABLE</b>_timeseries</code></td>
<td>Concatenates slices of timeseries for the specified identifiers. Returned record contains the same columns as record of original table, but they have <code>timeseries</code> type instead of original scalar types. Each such timeseries is concatenation of slices of timeseries for all specified identifiers. These columns can be used in timeseries functions (cs_*). If <code>high</code> or <code>low</code> parameters are not null, then them specify correspondingly upper/lower inclusive boundary for timestamp value. If some or both parameters are omitted, then corresponding boundary is open.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_delete(id <b>TIMESERIES_ID_TYPE</b>, low <b>TIMESTAMP_TYPE</b>, high <b>TIMESTAMP_TYPE</b>) returns bigint</code></td>
<td>Deletes timeseries elements belonging to the specified interval. If <code>high</code> or <code>low</code> parameters are not null, then them specify correspondingly upper/lower inclusive boundary for timestamp value. If some or both parameters are nulls, then corresponding boundary is open. This function returns number of deleted elements.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_delete(id <b>TIMESERIES_ID_TYPE</b>, till <b>TIMESTAMP_TYPE</b> default null) returns bigint</code></td>
<td>Deletes timeseries elements from the beginning till specified timestamp <code>till</code> (inclusive) or delete all elements if this parameter is null/omitted.
This function is equivalent to <code><b>TABLE</b>_delete(id, null, till)</code>. IMCS provides separate function for it because it is intended to be the
most frequent case of deleting elements from timeseries: it corresponds to shifting data window when new elements are appended and
deteriorated are thrown away. This function returns number of deleted elements.</td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_join(id <b>TIMESERIES_ID_TYPE</b>, other timeseries) returns timeseries</code></td>
<td>Joins timestamp with <code>other</code> unsorted timeseries. It returns positions of elements in this timeseries which timestamp matches correspondent element of joined timeseries. Semantic of matching depends on value of the <code>direction</code> parameter:
<ul>
<li>If <code>direction</code> is less than zero, then this timestamp should be less or equal than other timestamp (locate timeseries element preceding timestamp).</li>
<li>If <code>direction</code> is zero, then this timestamp should be equal to other timestamp (exact match of timestamps). </li>
<li>If <code>direction</code> is greater than zero, then this timestamp should be greater or equal than other timestamp (locate timeseries element succeeding timestamp).</li>
</ul></td>
</tr>
<tr>
<td><code>function <b>TABLE</b>_id() returns varchar</code></td><td>Returns name of timeseries identifier for this table</td>
</tr>
</table>
</p>
<h3><a name="cons">Timeseries constructors</a></h3>
<p>
Functions constructing constant timeseries (timeseries of repeated value) or timseries created by parsing string literal.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_parse(str text, elem_type cs_elem_type, elem_size integer default 0) returns timeseries<code></td>
<td>Creates timeseries from string, for example '{1,2,3,4,5}'. Type of timeseries is specified by <code>elem_type</code> parameter. For timeseries of characters it is also necessary to specify size of timeseries element - <code>elem_size</code>.
Please notice that PostgreSQL allows implicit cast from string to the target type using this type input function, but in this case information about timeseries element type and size should be encoded in the string: <code>'int4:{1,2,3,4,5}'</code>.</td>
</tr>
<tr>
<td><code>function cs_const(val float8, elem_type cs_elem_type default 'float8') returns timeseries<code></td>
<td>Creates timeseries of numeric (integer or floating point) elements. Type of timeseries is specified by <code>elem_type</code> parameter.
Should be one of: <code>'char', 'int2', 'int4', 'int8', 'float4', 'float8'</code>.</td>
</tr>
<tr>
<td><code>function cs_const(val timestamp, elem_type cs_elem_type) returns timeseries<code></td>
<td>Creates timeseries of date/time elements. Type of timeseries is specified by <code>elem_type</code> parameter.
Should be one of: <code>'date', 'time', 'timestamp'</code>.</td>
</tr>
<tr>
<td><code>function cs_const(val text, elem_size integer) returns timeseries<code></td>
<td>Creates timeseries of character type. Size of timeseries element is specified by <code>elem_size</code> parameter.
</tr>
<tr>
<td><code>function cs_const(val text) returns timeseries<code></td>
<td>Creates timeseries of character type. Size of timeseries element is equal to the length of <code>val</code>.
</tr>
</table>
</p>
<h3><a name="bin">Binary operations</a></h3>
<p>
Binary operations with timeseries. These functions take two timeseries arguments and return result timeseries.
IMCS tries to automatically adjust types of input arguments (for example if one timeseries has "int8" element type and another - "float8", then first one will be converted to "float8").
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_add(timeseries,timeseries) returns timeseries</code></td>
<td>Adds elements of two timeseries</td>
</tr>
<tr>
<td><code>function cs_sub(timeseries,timeseries) returns timeseries</code></td>
<td>Subtracts elements of two timeseries</td>
</tr>
<tr>
<td><code>function cs_mul(timeseries,timeseries) returns timeseries</code></td>
<td>Multiplies elements of two timeseries</td>
</tr>
<tr>
<td><code>function cs_div(timeseries,timeseries) returns timeseries</code></td>
<td>Divides elements of two timeseries</td>
</tr>
<tr>
<td><code>function cs_pow(timeseries,timeseries) returns timeseries</code></td>
<td>Raises element of first timeseries to power specified by element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_and(timeseries,timeseries) returns timeseries</code></td>
<td>Bitwise AND of elements of two integer or boolean timeseries</td>
</tr>
<tr>
<td><code>function cs_or(timeseries,timeseries) returns timeseries</code></td>
<td>Bitwise OR of elements of two integer or boolean timeseries</td>
</tr>
<tr>
<td><code>function cs_xor(timeseries,timeseries) returns timeseries</code></td>
<td>Bitwise XOR of elements of two integer or boolean timeseries</td>
</tr>
<tr>
<td><code>function cs_eq(timeseries,timeseries) returns timeseries</code></td>
<td>Checks if element of first timeseries is equal to element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_ne(timeseries,timeseries) returns timeseries</code></td>
<td>Checks if element of first timeseries is not equal to element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_gt(timeseries,timeseries) returns timeseries</code></td>
<td>Checks if element of first timeseries is greater than element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_ge(timeseries,timeseries) returns timeseries</code></td>
<td>Checks if element of first timeseries is greater or equal than element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_lt(timeseries,timeseries) returns timeseries</code></td>
<td>Checks if element of first timeseries is less than element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_le(timeseries,timeseries) returns timeseries</code></td>
<td>Checks if element of first timeseries is less or equal than element of second timeseries</td>
</tr>
<tr>
<td><code>function cs_maxof(timeseries,timeseries) returns timeseries</code></td>
<td>Maximum of two elements</td>
</tr>
<tr>
<td><code>function cs_minof(timeseries,timeseries) returns timeseries</code></td>
<td>Minimum of two elements</td>
</tr>
<tr>
<td><code>function cs_like(timeseries,pattern text) returns timeseries</code></td>
<td>Finds elements of character timeseries matching specified pattern (case sensitive). Rules of matching are the same as for PostgreSQL <code>LIKE</code> predicate.</td>
</tr>
<tr>
<td><code>function cs_ilike(timeseries,pattern text) returns timeseries</code></td>
<td>Finds elements of character timeseries matching specified pattern (ignore case). Rules of matching are the same as for PostgreSQL <code>ILIKE</code> predicate.</td>
</tr>
<tr>
<td><code>function cs_cat(timeseries,timeseries) returns timeseries</code></td>
<td>Concatenates elements of two timeseries. Input timeseries can have any element type, result is always timeseries of characters which element size
is equal to sum of element sizes of concatenated timeseries. For example <code>cs_cat('bpchar1:{a,b,c}', 'bpchar1:{x,y,z}') = 'bpchar2:{ax,by,cz}'</code>.
In case of concatenation of character strings which actual length is smaller than fixed element size, result will contains filler character ('\0').
So if element size of concatenated timeseries in the above example is 3, then result will be <code>E'{a\\000\\000x\\000\\000,b\\000\\000y\\000\000,c\\000\\000z\\000\000}'</code>. If you prefer to get <code>'{ax,by,cz}'</code>, then please use <code>cs_add</code> instead of <code>cs_cat</code>.
Function <code>cs_cat</code> is intended to be used for concatenation of group-by keys (character or numeric) for aggregation.
</td>
</tr>
<tr>
<td><code>function cs_concat(head timeseries,tail timeseries) returns timeseries</code></td>
<td>Concatenates two timeseries. Result of this function is timeseries containing elements both of <code>head</code> and <code>tail</code> timeseries.
For example <code>cs_concat('int4:{1,2,3}','int4:{4,5,6}') = '{int4:1,2,3,4,5,6}'</code>.
Parameters <code>head</code> or <code>tail</code> may be null. In this case <code>cs_concat</code> returns just not-null timeseries.
</td>
</tr>
</table>
</p>
<h3><a name="unary">Unary operations</a></h3>
<p>
Unary operations with timeseries. These functions take single timeseries and return timeseries as result.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_neg(timeseries) returns timeseries</code></td>
<td>Negates timeseries elements</td>
</tr>
<tr>
<td><code>function cs_not(timeseries) returns timeseries</code></td>
<td>Logical NOT of boolean timeseries elements</td>
</tr>
<tr>
<td><code>function cs_bit_not(timeseries) returns timeseries</code></td>
<td>Bitwise NOT of integer timeseries elements</td>
</tr>
<tr>
<td><code>function cs_abs(timeseries) returns timeseries</code></td>
<td>Absolute value of timeseries element</td>
</tr>
<tr>
<td><code>function cs_norm(timeseries) returns timeseries</code></td>
<td>Normalizes timeseries elements (divides each element by square root of sum of all elements)</td>
</tr>
</table>
</p>
<h3><a name="math">Mathematical functions</a></h3>
<p>
Calculation of mathematical functions for all timeseries elements.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_sin(timeseries) returns timeseries</code></td>
<td>Sine function</td>
</tr>
<tr>
<td><code>function cs_cos(timeseries) returns timeseries</code></td>
<td>Cosine function</td>
</tr>
<tr>
<td><code>function cs_tan(timeseries) returns timeseries</code></td>
<td>Tangent function</td>
</tr>
<tr>
<td><code>function cs_exp(timeseries) returns timeseries</code></td>
<td>Exponent function</td>
</tr>
<tr>
<td><code>function cs_asin(timeseries) returns timeseries</code></td>
<td>Arcsine function</td>
</tr>
<tr>
<td><code>function cs_acos(timeseries) returns timeseries</code></td>
<td>Arccosine function</td>
</tr>
<tr>
<td><code>function cs_atan(timeseries) returns timeseries</code></td>
<td>Arctangent function</td>
</tr>
<tr>
<td><code>function cs_sqrt(timeseries) returns timeseries</code></td>
<td>Square root function</td>
</tr>
<tr>
<td><code>function cs_log(timeseries) returns timeseries</code></td>
<td>Natural logarithm function</td>
</tr>
<tr>
<td><code>function cs_ceil(timeseries) returns timeseries</code></td>
<td>Rounds timeseries element to the smallest integer greater or equal than the element value</td>
</tr>
<tr>
<td><code>function cs_floor(timeseries) returns timeseries</code></td>
<td>Rounds timeseries element to the largest integer less or equal than the element value</td>
</tr>
<tr>
<td><code>function cs_isnan(timeseries) returns timeseries</code></td>
<td>Checks if floating point timeseries element is NaN</td>
</tr>
</table>
</p>
<h3><a name="datetime">Date/time functions</a></h3>
<p>
Extracts components of date/time type. These functions are mostly needed in group-by operations to calculate aggregates for various intervals (days, weeks, months, quarters, years...).
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_year(timeseries) returns timeseries</code></td>
<td>Extracts year from date/timestamp</td>
</tr>
<tr>
<td><code>function cs_month(timeseries) returns timeseries</code></td>
<td>Extracts month (1..12) from date/timestamp</td>
</tr>
<tr>
<td><code>function cs_mday(timeseries) returns timeseries</code></td>
<td>Extracts month day (1..31) from date/timestamp</td>
</tr>
<tr>
<td><code>function cs_wday(timeseries) returns timeseries</code></td>
<td>Extracts week day (0..6 starting from Sunday) from date/timestamp</td>
</tr>
<tr>
<td><code>function cs_week(timeseries) returns timeseries</code></td>
<td>Extracts week number since start of epoch from date/timestamp</td>
</tr>
<tr>
<td><code>function cs_quarter(timeseries) returns timeseries</code></td>
<td>Extracts quarter (1..4) from date/timestamp</td>
</tr>
<tr>
<td><code>function cs_hour(timeseries) returns timeseries</code></td>
<td>Extracts hour (0..23) from time/timestamp</td>
</tr>
<tr>
<td><code>function cs_minute(timeseries) returns timeseries</code></td>
<td>Extracts minute (0..59) from time/timestamp</td>
</tr>
<tr>
<td><code>function cs_second(timeseries) returns timeseries</code></td>
<td>Extracts second (0..59) from time/timestamp</td>
</tr>
</table>
</p>
<h3><a name="scalar">Binary scalar functions</a></h3>
<p>
Functions of this group take two timeseries arguments and calculate single scalar value as result.
IMCS tries to automatically adjust types of input arguments (for example if one timeseries has "int8" element type and another - "float8", then first one will be converted to "float8").
</p><p>
Execution of these functions can be parallelized.
<p></p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_wsum(timeseries,timeseries) returns float8</code></td>
<td>Weighted sum of timeseries elements</td>
</tr>
<tr>
<td><code>function cs_wavg(a timeseries,b timeseries) returns float8</code></td>
<td>Weighted average of timeseries elements: <code>sum(a*b)/sum(a)</code></td>
</tr>
<tr>
<td><code>function cs_corr(a timeseries,b timeseries) returns float8</code></td>
<td>Correlation of two timeseries</td>
</tr>
<tr>
<td><code>function cs_cov(a timeseries,b timeseries) returns float8</code></td>
<td>Covariation of two timeseries</td>
</tr>
</table>
</p>
<h3><a name="transform">Timeseries transformation functions</a></h3>
<p>
Functions performing various transformations of input timeseries.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_cast(input timeseries, elem_type cs_elem_type, elem_size default 0) returns timeseries</code></td>
<td>Casts timeseries elements to the specified type defined in cs_elem_type enum: ('char', 'int2', 'int4', 'date', 'int8', 'time', 'timestamp', 'money', 'float4', 'float8', 'bpchar'). For character type it is necessary to specify element size. If converted value doesn't fit in specified size, it will be truncated. Explicit casts are rarely needed, in most cases IMCS performs implicit type conversion.</td>
</tr>
<tr>
<td><code>function cs_to_<b>TYPE</b>_array(timeseries) returns <b>TYPE</b>[]</code></td>
<td>Converts timeseries to array. <b>TYPE</b> should be one of <code>"char", int2, int4, date, int8, time, timestamp, money, float4, float8, bpchar</code> and should match element type of the converted timeseries. Please notice that array is constructed in memory and large timeseries can cause memory overflow.</td>
</tr>
<tr>
<td><code>function cs_from_array(anyarray, elem_size integer default 0) returns timeseries</code></td>
<td>Converts array to timeseries. This function creates timeseries iterator for the input array, allowing to apply to it any timeseries functions. Type of the result timeseries element is the same as type of the array element.
Optional <code>elem_size</code> parameter is needed only for text array, it should specify maximal size of array element.</td>
</tr>
<tr>
<td><code>function cs_thin(timeseries, origin integer, step integer) returns timeseries</code></td>
<td>Leaves only each <code>step</code>-th element of timeseries starting from <code>origin</code>.</td>
</tr>
<tr>
<td><code>function cs_limit(timeseries, from_pos bigint default 0, till_pos bigint default 9223372036854775807) returns timeseries</code></td>
<td>Extracts subsequence from timeseries. Parameter <code>from_pos</code> specifies start position of subsequence (inclusive) and parameter <code>till_pos</code> specifies end position (inclusive). If <code>till_pos</code> parameter is missed, then subsequence spans till end of timeseries. Values of both <code>from_pos</code> and <code>till_pos</code> parameters can be negative. In this case position is calculated from end of timeseries, i.e. <code>cs_limit(s, from_pos:=-1)</code> extracts last element of the timeseries.</td>
</tr>
<tr>
<td><code>function cs_head(timeseries, n bigint default 1) returns timeseries</code></td>
<td>Extracts <code>n</code> first elements of timeseries. This function is equivalent to <code>cs_limit(0, n-1)</code>.</td>
</tr>
<tr>
<td><code>function cs_tail(timeseries, n bigint default 1) returns timeseries</code></td>
<td>Extracts <code>n</code> last elements of timeseries. This function is equivalent to <code>cs_limit(-n)</code>.</td>
</tr>
<tr>
<td><code>function cs_cut_head(timeseries, n bigint default 1) returns timeseries</code></td>
<td>Extracts all except first <code>n</code> elements of timeseries. This function is equivalent to <code>cs_limit(n)</code>.</td>
</tr>
<tr>
<td><code>function cs_cut_tail(timeseries, n bigint default 1) returns timeseries</code></td>
<td>Extracts all except last <code>n</code> elements of timeseries. This function is equivalent to <code>cs_limit(0,-n-1)</code>.</td>
</tr>
<tr>
<td><code>function cs_call(input timeseries, func oid) returns timeseries</code></td>
<td>Calls specified function for all elements of <code>input</code> timeseries.
To specify function cast function name to <code>regproc</code> or function prototype (name and argument types) to <code>recprocedure</code>:
<pre>
select cs_call(Close,'sin'::regproc)
from Quote_get('IBM');
select cs_call(Close,'sin(float)'::regprocedure)
from Quote_get('IBM');
</pre>
Please notice that <code>sin</code> is taken here only as example. There is special <code>cs_sin</code> in IMCS API.
But you can specify here name of any function, including plpgsql function:
<pre>
create function mul2(x real) returns real as
$$ begin return x*2; end; $$
language plpgsql strict immutable;
select cs_call(Close, 'mult2'::regproc)
from Quote_get('IBM');
</pre></td>
</tr>
<tr>
<td><code>function cs_union(left timeseries, right timeseries)</code></td>
<td>Unions two sorted timeseries (usually timestamps). For example <code>cs_union('int8:{1,5,7,8}', 'int8:{2,3,5,6}') = 'int8:{1,2,3,5,5,6,7,8}'</td>
</tr>
<tr>
<td><code>function cs_iif(cond timeseries, then_ts timeseries, else_ts timeseries) returns timeseries</code></td>
<td>Chooses one of two alternatives: if element of <code>cond</code> boolean timeseries is true, then use element of <code>then_ts</code> timeseries, otherwise use element of <code>else_ts</code> timeseries. All timeseries are traversed with the same speed: if we take element from <code>then_ts</code> timeseries, then corresponding element of <code>else_ts</code> timeseries is skipped.
For example <code>cs_iif('char:{1,0,1}', 'float4:{1.0,2.0,3.0}', 'float4:{0.1,0.2,0.3}') = 'float4:{1.0,0.2,3.0}'</code></td>
</tr>
<tr>
<td><code>function cs_if(cond timeseries, then_ts timeseries, else_ts timeseries) returns timeseries</code></td>
<td>Conditional computation: if element of <code>cond</code> boolean timeseries is true, then take next element of <code>then_ts</code> timeseries, otherwise use element of <code>else_ts</code> timeseries. Unlike <code>cs_iff</code> then/else timeseries are accessed only on demand, so number of elements fetched from this timeseries depends on condition.
For example <code>cs_if('char:{1,0,1}', 'float4:{1.0,2.0,3.0}', 'float4:{0.1,0.2,0.3}') = 'float4:{1.0,0.1,2.0}'</code></td>
</tr>
<tr>
<td><code>function cs_filter(cond timeseries, input timeseries) returns timeseries</code></td>
<td>Leaves only those elements from timeseries <code>input</code> for which condition <code>cond</code> is true.
For example <code>cs_filter('char:{1,0,1}', 'float4:{1.0,2.0,3.0}') = 'float4:{1.0,3.0}'</code></td>
</tr>
<tr>
<td><code>function cs_filter_pos(cond timeseries) returns timeseries</code></td>
<td>Returns positions of timeseries elements for which condition <code>cond</code> is true.
For example <code>cs_filter_pos('char:{1,0,1}') = 'int8:{0,2}'</code></td>
</tr>
<tr>
<td><code>function cs_filter_first_pos(cond timeseries, n integer) returns timeseries</code></td>
<td>Finds first N positions of timeseries elements for which condition <code>cond</code> is true.
For example <code>cs_filter_first_pos('char:{1,0,1}', 1) = 'int8:{0}'</code>
Execution of this function can be parallelized.
</td>
</tr>
<tr>
<td><code>function cs_unique(timeseries) returns timeseries</code></td>
<td>Removes subsequent duplicate values. To eliminate all duplicates in timeseries it should be sorted prior applying <code>cs_unique</code>. For example <code>cs_unique('int4:{1,1,2,2,2,1,3}') = 'int4:{1,2,1,3}'</code></td>
</tr>
<tr>
<td><code>function cs_reverse(timeseries) returns timeseries</code></td>
<td>Reverses order of timeseries elements .For example <code>cs_reverse('int4:{1,2,3}') = 'int4:{3,2,1}'</code></td>
</tr>
<tr>
<td><code>function cs_trend(input timeseries) returns timeseries</code></td>
<td>Calculates sequence trend: sign of difference between pairs of non-equal sequence elements, for example <code>cs_trend('int8:{1,2,3,3,2,2,4,5,6,5,5}') = 'int1:{0,1,1,1,-1,-1,1,1,1,-1,-1}'</code></td>
</tr>
<tr>
<td><code>function cs_diff(input timeseries) returns timeseries</code></td>
<td>Calculates difference between pairs of subsequent timeseries elements: <code>result[0] = 0, result[i] = input[i] - input[i-1]</code>. For example <code>cs_diff('int8:{1,3,2,5}') = 'int8:{0,2,-1,3}'</code></td>
</tr>
<tr>
<td><code>function cs_project(anyelement, positions timeseries default null, disable_caching book default false) returns setof record</code></td>
<td>Transforms vertical representation (all timeseries elements or just elements on specified positions) to horizontal representation. This is more generic version of <code><b>TABLE</b>_project</code> which can be applied to arbitrary set of columns.
But as far as result row is anonymous, it is not possible to unnest it using PostgreSQL <code>().*</code> clause.
In PostgreSQL 9.3 it is possible to use <code>cs_project</code> in FROM list (<i>lateral join</i>) providing alias with
description of returned columns.
Concerning optional <code>disable_caching</code> parameter please read section <a href="#projection">Projection issues</a>.
</td>
</tr>
<tr>
<td><code>function cs_project_agg(anyelement, positions timeseries default null, disable_caching book default false) returns setof cs_agg_result</code></td>
<td>This is specialized version of <code>cs_project</code> for transposing result of <code>hash_agg_*</code> functions.
They return two timeseries: the first one with values of aggregate and the second one with values of group-by key.
<code>cs_project_agg</code> transforms this result to set of <code>cs_agg_result</code> rows, consisting of two columns: <code>(agg_val float8, group_by bytea)</code>. In case of combining several keys for group-by key, it can be splitted back into separate values using <code>cs_cut</code>, <code>cs_as</code> or <code>cs_as_array</code> functions.
Concerning optional <code>disable_caching</code> parameter please read section <a href="#projection">Projection issues</a>.
</td>
</tr>
<tr>
<td><code>function cs_map(input timeseries, positions timeseries) returns timeseries</code></td>
<td>Extracts from first timeseries elements with positions specified in the second timeseries.
Example of <code>cs_map</code> usage: <code>cs_map('float8:{3.14,0.1,-10}', 'int8:{1,2}')='float8:{-0.1,10}'</code></td>
</tr>
</table>
</p>
<h3><a name="grand">Grand aggregates</a></h3>
<p>
Functions calculating grand aggregates: aggregation of all timeseries elements.
</p><p>
Execution of these functions can be parallelized (except <code>cs_median</code>).
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_count(timeseries) returns bigint</code></td>
<td>Counts number of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_empty(timeseries) returns bool</code></td>
<td>Checks if timeseries contains no elements. This function is usually more efficient than <code>cs_count() %lt;%gt; 0</code> except cases when filter is applied to large timeseries and relatively small number of elements fits filter condition (unlike <code>cs_empty</code>, <code>cs_count</code> can be executed in parallel)</td>
</tr>
<tr>
<td><code>function cs_approxdc(timeseries) returns bigint</code></td>
<td>Approximates number of different timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_max(timeseries) returns float8</code></td>
<td>Maximal value of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_min(timeseries) returns float8</code></td>
<td>Minimum value of timeseries elements</td>
</tr>
<tr>
<td><code>function cs_avg(timeseries) returns float8</code></td>
<td>Average value of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_sum(timeseries) returns float8</code></td>
<td>Sum of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_prd(timeseries) returns float8</code></td>
<td>Product of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_var(timeseries) returns float8</code></td>
<td>Variation of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_dev(timeseries) returns float8</code></td>
<td>Standard deviation of timeseries elements.</td>
</tr>
<tr>
<td><code>function cs_median(timeseries) returns float8</code></td>
<td>Median element of timeseries.</td>
</tr>
<tr>
<td><code>function cs_all(timeseries) returns bigint</code></td>
<td>Bitwise AND of elements of integer timeseries.</td>
</tr>
<tr>
<td><code>function cs_any(timeseries) returns bigint</code></td>
<td>Bitwise OR of elements of integer timeseries.</td>
</tr>
</table>
</p>
<h3><a name="groupby">Group-by aggregates</a></h3>
<p>
Functions calculating aggregates for each group.
Groups are identified by sequence of elements with the same value in <code>group-by</code> timeseries.
It is not mandatory to sort this timeseries. But you should realize that sequences of the same value in different parts of the timeseries will form different groups. For example, there are four groups in timeseries <code>'{1, 1, 2, 1, 1, 1, 2, 2,}': ('{1, 1}', '{2}', '{1, 1, 1}', '{2, 2}')</code>. If you want to perform aggregation for all timeseries element with the same value, then use <code>cs_hash_*</code> functions instead.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_group_count(timeseries) returns timeseries</code></td>
<td>Returns number of elements in each group (sequences of repeated values)</td>
</tr>
<tr>
<td><code>function cs_group_apporaxdc(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Approximates number of distinct values in each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_max(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Maximal value of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_min(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Minimal value of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_sum(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Sum of elements of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_avg(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Average value of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_var(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Variation of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_dev(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Standard deviation of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_first(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>First element of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_last(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Last element of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_any(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Bitwise OR of elements of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_group_all(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Bitwise AND of elements of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
</table>
</p>
<h3><a name="wingroupby">Group-by windows aggregates</a></h3>
<p>
Functions calculating aggregates for each group. But unlike normal aggregates it preserves length of input sequence,
repeating aggregate value as much times as there are members in the group. It helps to compare for example
day's close price with average price for the week.
Groups are identified by sequence of elements with the same value in <code>group-by</code> timeseries.
It is not mandatory to sort this timeseries. But you should realize that sequences of the same value in different parts of the timeseries will form different groups. For example, there are four groups in timeseries <code>'{1, 1, 2, 1, 1, 1, 2, 2,}': ('{1, 1}', '{2}', '{1, 1, 1}', '{2, 2}')</code>. If you want to perform aggregation for all timeseries element with the same value, then use <code>cs_hash_*</code> functions instead.
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_win_group_count(timeseries) returns timeseries</code></td>
<td>Returns number of elements in each group (sequences of repeated values)</td>
</tr>
<tr>
<td><code>function cs_win_group_apporaxdc(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Approximates number of distinct values in each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_max(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Maximal value of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_min(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Minimal value of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_sum(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Sum of elements of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_avg(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Average value of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_var(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Variation of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_dev(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Standard deviation of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_first(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>First element of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_last(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Last element of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_any(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Bitwise OR of elements of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
<tr>
<td><code>function cs_win_group_all(input timeseries, group_by timeseries) returns timeseries</code></td>
<td>Bitwise AND of elements of each group. <code>group_by</code> timeseries identifies groups: sequences of repeated values.</td>
</tr>
</table>
</p>
<h3><a name="grid">Grid aggregates</a></h3>
<p>
Splitting timeseries into intervals intervaland calculating aggregate for each interval (grid cell).
</p><p>
<table border width='100%'>
<tr><th width='30%'>Function</th><th width='70%'>Description</th></tr>
<tr>
<td><code>function cs_grid_max(input timeseries, step integer) returns timeseries</code></td>
<td>Maximal value of each interval. Parameter <code>step</code> specifies size of interval.</td>
</tr>
<tr>
<td><code>function cs_grid_min(input timeseries, step integer) returns timeseries</code></td>
<td>Minimal value of each interval. Parameter <code>step</code> specifies size of interval.</td>
</tr>
<tr>
<td><code>function cs_grid_avg(input timeseries, step integer) returns timeseries</code></td>
<td>Average value of each interval. Parameter <code>step</code> specifies size of interval.</td>
</tr>
<tr>
<td><code>function cs_grid_sum(input timeseries, step integer) returns timeseries</code></td>