|
6 | 6 | <!--
|
7 | 7 | This HTML was auto-generated from MATLAB code.
|
8 | 8 | To make changes, update the MATLAB code and republish this document.
|
9 |
| - --><title>Working with annotated matrices using the GCT and GCTX data formats in MATLAB</title><meta name="generator" content="MATLAB 8.4"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="DC.date" content="2017-11-22"><meta name="DC.source" content="gctx_tutorial.m"><style type="text/css"> |
| 9 | + --><title>Working with annotated matrices using the GCT and GCTX data formats in MATLAB</title><meta name="generator" content="MATLAB 8.4"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="DC.date" content="2017-11-27"><meta name="DC.source" content="gctx_tutorial.m"><style type="text/css"> |
10 | 10 | html,body,div,span,applet,object,iframe,h1,h2,h3,h4,h5,h6,p,blockquote,pre,a,abbr,acronym,address,big,cite,code,del,dfn,em,font,img,ins,kbd,q,s,samp,small,strike,strong,sub,sup,tt,var,b,u,i,center,dl,dt,dd,ol,ul,li,fieldset,form,label,legend,table,caption,tbody,tfoot,thead,tr,th,td{margin:0;padding:0;border:0;outline:0;font-size:100%;vertical-align:baseline;background:transparent}body{line-height:1}ol,ul{list-style:none}blockquote,q{quotes:none}blockquote:before,blockquote:after,q:before,q:after{content:'';content:none}:focus{outine:0}ins{text-decoration:none}del{text-decoration:line-through}table{border-collapse:collapse;border-spacing:0}
|
11 | 11 |
|
12 | 12 | html { min-height:100%; margin-bottom:1px; }
|
|
66 | 66 |
|
67 | 67 |
|
68 | 68 |
|
69 |
| - </style></head><body><div class="content"><h1>Working with annotated matrices using the GCT and GCTX data formats in MATLAB</h1><!--introduction--><p>Script used to generate this tutorial: <a href="gctx_tutorial.m">gctx_tutorial.m</a></p><!--/introduction--><h2>Contents</h2><div><ul><li><a href="#1">Reading a GCT or GCTX file</a></li><li><a href="#2">GCT data representation</a></li><li><a href="#3">Layout of the GCT structure</a></li><li><a href="#4">For large files, it can be useful to read just the metadata</a></li><li><a href="#5">Extracting a subset of data from a GCTX file</a></li><li><a href="#6">Working with metadata</a></li><li><a href="#7">List all available row metadata fields</a></li><li><a href="#8">Read all row metadata into a structure</a></li><li><a href="#9">Annotate a dataset from a structure</a></li><li><a href="#10">Read contents of metadata field</a></li><li><a href="#11">Add metadata fields from cell arrays</a></li><li><a href="#12">Remove metadata fields</a></li><li><a href="#13">Merging GCT/x files</a></li><li><a href="#14">Slicing GCT/x files</a></li><li><a href="#15">Transpose a GCT/x</a></li><li><a href="#16">Writing GCT/x files</a></li><li><a href="#17">Compute correlations</a></li><li><a href="#18">Clean-up</a></li></ul></div><h2>Reading a GCT or GCTX file<a name="1"></a></h2><p>GCT and GCTx files can be read in the same way. We'll use the same two files throughout this tutorial.</p><pre class="codeinput">gct_file_location = fullfile(cmapmpath, <span class="string">'resources'</span>, <span class="string">'example.gct'</span>); |
| 69 | + </style></head><body><div class="content"><h1>Working with annotated matrices using the GCT and GCTX data formats in MATLAB</h1><!--introduction--><p>Script used to generate this tutorial: <a href="gctx_tutorial.m">gctx_tutorial.m</a></p><!--/introduction--><h2>Contents</h2><div><ul><li><a href="#1">Reading a GCT or GCTX file</a></li><li><a href="#2">GCT data representation</a></li><li><a href="#3">Layout of the GCT structure</a></li><li><a href="#4">For large files, it can be useful to read just the metadata</a></li><li><a href="#5">Extracting a subset of data from a GCTX file</a></li><li><a href="#6">Working with metadata</a></li><li><a href="#7">List all available row metadata fields</a></li><li><a href="#8">Read all row metadata into a structure</a></li><li><a href="#9">Annotate a dataset from a structure</a></li><li><a href="#10">Read contents of a metadata field</a></li><li><a href="#11">Add metadata fields from cell arrays</a></li><li><a href="#12">Remove metadata fields</a></li><li><a href="#13">Merging GCT/x files</a></li><li><a href="#14">Slicing GCT/x files</a></li><li><a href="#15">Transpose a GCT/x</a></li><li><a href="#16">Writing GCT/x files</a></li><li><a href="#17">Compute correlations</a></li><li><a href="#18">Clean-up</a></li></ul></div><h2>Reading a GCT or GCTX file<a name="1"></a></h2><p>GCT and GCTx files can be read in the same way. We'll use the same two files throughout this tutorial.</p><pre class="codeinput">gct_file_location = fullfile(cmapmpath, <span class="string">'resources'</span>, <span class="string">'example.gct'</span>); |
70 | 70 | gctx_file_location = fullfile(cmapmpath, <span class="string">'resources'</span>, <span class="string">'example.gctx'</span>);
|
71 | 71 | ds1 = cmapm.Pipeline.parse_gctx(gct_file_location);
|
72 | 72 | ds2 = cmapm.Pipeline.parse_gctx(gctx_file_location);
|
|
76 | 76 | Done.
|
77 | 77 |
|
78 | 78 | Reading /Users/narayan/workspace/cmapM/resources/example.gctx [978x1476]
|
79 |
| -Done [0.77 s]. |
| 79 | +Done [0.78 s]. |
80 | 80 | </pre><h2>GCT data representation<a name="2"></a></h2><p>GCT and GCTx files are both represented in memory as structures.</p><pre class="codeinput">disp(class(ds1));
|
81 | 81 | disp(class(ds2));
|
82 | 82 | </pre><pre class="codeoutput">struct
|
|
99 | 99 | </pre><h2>For large files, it can be useful to read just the metadata<a name="4"></a></h2><pre class="codeinput">ds_with_only_meta = cmapm.Pipeline.parse_gctx(gctx_file_location, <span class="string">'annot_only'</span>, true);
|
100 | 100 | disp(ds_with_only_meta);
|
101 | 101 | <span class="comment">% Note that the mat field is empty, but the metadata is the same as above</span>
|
102 |
| -</pre><pre class="codeoutput">Reading /Users/narayan/workspace/cmapM/resources/example.gctx Done [0.76 s]. |
| 102 | +</pre><pre class="codeoutput">Reading /Users/narayan/workspace/cmapM/resources/example.gctx Done [0.73 s]. |
103 | 103 | mat: []
|
104 | 104 | rid: {978x1 cell}
|
105 | 105 | rhd: {11x1 cell}
|
|
121 | 121 | ds_subset = cmapm.Pipeline.parse_gctx(gctx_file_location, <span class="string">'rid'</span>, my_rids, <span class="string">'cid'</span>, my_cids);
|
122 | 122 | </pre><pre class="codeoutput">Reading /Users/narayan/workspace/cmapM/resources/example.gctx [3x1]
|
123 | 123 | Performing 3 hyperslab selections
|
124 |
| -Done [0.77 s]. |
| 124 | +Done [0.76 s]. |
125 | 125 | </pre><h2>Working with metadata<a name="6"></a></h2><p>We provide several convenience functions to operate on the metadata in a dataset.</p><p>Note that while you can modify the attributes of a dataset object directly, it is not recommended since it could affect the integrity of the data structure.</p><h2>List all available row metadata fields<a name="7"></a></h2><pre class="codeinput">row_fields = ds_subset.rhd;
|
126 | 126 | col_fields = ds_subset.chd;
|
127 | 127 |
|
|
195 | 195 | ds_subset = cmapm.Pipeline.ds_set_annotations(ds_subset, new_meta, <span class="string">'dim'</span>, <span class="string">'row'</span>);
|
196 | 196 | <span class="comment">% verify if the new fields have been added</span>
|
197 | 197 | assert(all(ismember({<span class="string">'new_field1'</span>, <span class="string">'new_field2'</span>}, ds_subset.rhd)));
|
198 |
| -</pre><h2>Read contents of metadata field<a name="10"></a></h2><pre class="codeinput">gene_symbol = cmapm.Pipeline.ds_get_meta(ds_subset, <span class="string">'row'</span>, <span class="string">'pr_gene_symbol'</span>); |
| 198 | +</pre><h2>Read contents of a metadata field<a name="10"></a></h2><pre class="codeinput">gene_symbol = cmapm.Pipeline.ds_get_meta(ds_subset, <span class="string">'row'</span>, <span class="string">'pr_gene_symbol'</span>); |
199 | 199 | disp(gene_symbol);
|
200 | 200 | </pre><pre class="codeoutput"> 'VDAC1'
|
201 | 201 | 'SORBS3'
|
|
218 | 218 | beadset_ids = cmapm.Pipeline.ds_get_meta(ds, <span class="string">'row'</span>, <span class="string">'pr_bset_id'</span>);
|
219 | 219 | dp52_bool_array = strcmp(<span class="string">'dp52'</span>, beadset_ids);
|
220 | 220 | dp52_rids = ds.rid(dp52_bool_array);
|
221 |
| -length(dp52_rids) |
222 | 221 |
|
223 | 222 | <span class="comment">% Get cids corresponding to DMSO samples.</span>
|
224 | 223 | pert_inames = cmapm.Pipeline.ds_get_meta(ds, <span class="string">'column'</span>, <span class="string">'pert_iname'</span>);
|
225 | 224 | dmso_bool_array = strcmp(<span class="string">'DMSO'</span>, pert_inames);
|
226 | 225 | dmso_cids = ds.cid(dmso_bool_array);
|
227 |
| -length(dmso_cids) |
228 | 226 |
|
229 |
| -<span class="comment">% Confirm that the size of sliced is correct: 489 probes x 100 samples.</span> |
| 227 | +<span class="comment">% Confirm that the dimensions of sliced are correct: 489 probes x 100 samples.</span> |
230 | 228 | sliced = cmapm.Pipeline.ds_slice(ds, <span class="string">'rid'</span>, dp52_rids, <span class="string">'cid'</span>, dmso_cids);
|
| 229 | +assert(isequal(size(sliced.mat), [length(dp52_rids), length(dmso_cids)]), <span class="string">'Dimension mismatch'</span>); |
231 | 230 | disp(size(sliced.mat));
|
232 | 231 | </pre><pre class="codeoutput">Reading /Users/narayan/workspace/cmapM/resources/example.gctx [978x1476]
|
233 |
| -Done [0.76 s]. |
234 |
| - |
235 |
| -ans = |
236 |
| - |
237 |
| - 489 |
238 |
| - |
239 |
| - |
240 |
| -ans = |
241 |
| - |
242 |
| - 100 |
243 |
| - |
| 232 | +Done [0.75 s]. |
244 | 233 | 489 100
|
245 | 234 |
|
246 | 235 | </pre><h2>Transpose a GCT/x<a name="15"></a></h2><pre class="codeinput">transposed = cmapm.Pipeline.ds_transpose(ds);
|
|
250 | 239 | out_gctx = cmapm.Pipeline.mkgctx(<span class="string">'example_out.gctx'</span>, ds);
|
251 | 240 |
|
252 | 241 | <span class="comment">% Note that the same dataset object can be written out as either a GCT or GCTx.</span>
|
253 |
| -<span class="comment">% Note alsa that for convenience the dimensions of the matrix is automatically appended to</span> |
| 242 | +<span class="comment">% Note also that for convenience the dimensions of the matrix are automatically appended to</span> |
254 | 243 | <span class="comment">% the filename, and the columns go first.</span>
|
255 | 244 | </pre><pre class="codeoutput">Saving file to example_out_n1476x978.gct
|
256 | 245 | Dimensions of matrix: [978x1476]
|
|
262 | 251 | done [0.26s].
|
263 | 252 | </pre><h2>Compute correlations<a name="17"></a></h2><p>Compute pairwise Spearman correlations between columns of the dataset</p><pre class="codeinput">cc = cmapm.Pipeline.ds_corr(ds);
|
264 | 253 |
|
265 |
| -<span class="comment">% cc is itself a GCT structure</span> |
| 254 | +<span class="comment">% cc is a GCT structure whose matrix is square and symmetric</span> |
| 255 | +assert(isequal(size(cc.mat), [size(ds.mat, 2), size(ds.mat, 2)]), <span class="string">'CC is not square'</span>); |
| 256 | +assert(isequal(cc.mat, cc.mat'), <span class="string">'CC is not symmetric'</span>); |
| 257 | + |
266 | 258 | <span class="comment">% Examine its contents</span>
|
267 |
| -disp(cc.mat(1:5, 1:5)); |
268 |
| -</pre><pre class="codeoutput"> 1.0000 0.9042 0.8794 0.8476 0.8184 |
269 |
| - 0.9042 1.0000 0.9022 0.8620 0.8363 |
270 |
| - 0.8794 0.9022 1.0000 0.8636 0.8455 |
271 |
| - 0.8476 0.8620 0.8636 1.0000 0.8187 |
272 |
| - 0.8184 0.8363 0.8455 0.8187 1.0000 |
273 |
| - |
274 |
| -</pre><h2>Clean-up<a name="18"></a></h2><pre class="codeinput">delete(out_gct) |
| 259 | +imagesc(cc.mat(1:20, 1:20)); |
| 260 | +colorbar |
| 261 | +caxis([0.5, 1]); |
| 262 | +axis <span class="string">square</span> |
| 263 | +title(<span class="string">'Pairwise Spearman Correlation'</span>); |
| 264 | +</pre><img vspace="5" hspace="5" src="gctx_tutorial_01.png" alt=""> <h2>Clean-up<a name="18"></a></h2><pre class="codeinput">delete(out_gct) |
275 | 265 | delete(out_gctx)
|
276 | 266 | </pre><p class="footer"><br><a href="http://www.mathworks.com/products/matlab/">Published with MATLAB® R2014b</a><br></p></div><!--
|
277 | 267 | ##### SOURCE BEGIN #####
|
|
347 | 337 | % verify if the new fields have been added
|
348 | 338 | assert(all(ismember({'new_field1', 'new_field2'}, ds_subset.rhd)));
|
349 | 339 |
|
350 |
| -%% Read contents of metadata field |
| 340 | +%% Read contents of a metadata field |
351 | 341 | gene_symbol = cmapm.Pipeline.ds_get_meta(ds_subset, 'row', 'pr_gene_symbol');
|
352 | 342 | disp(gene_symbol);
|
353 | 343 | %% Add metadata fields from cell arrays
|
|
376 | 366 | beadset_ids = cmapm.Pipeline.ds_get_meta(ds, 'row', 'pr_bset_id');
|
377 | 367 | dp52_bool_array = strcmp('dp52', beadset_ids);
|
378 | 368 | dp52_rids = ds.rid(dp52_bool_array);
|
379 |
| -length(dp52_rids) |
380 | 369 |
|
381 | 370 | % Get cids corresponding to DMSO samples.
|
382 | 371 | pert_inames = cmapm.Pipeline.ds_get_meta(ds, 'column', 'pert_iname');
|
383 | 372 | dmso_bool_array = strcmp('DMSO', pert_inames);
|
384 | 373 | dmso_cids = ds.cid(dmso_bool_array);
|
385 |
| -length(dmso_cids) |
386 | 374 |
|
387 |
| -% Confirm that the size of sliced is correct: 489 probes x 100 samples. |
| 375 | +% Confirm that the dimensions of sliced are correct: 489 probes x 100 samples. |
388 | 376 | sliced = cmapm.Pipeline.ds_slice(ds, 'rid', dp52_rids, 'cid', dmso_cids);
|
| 377 | +assert(isequal(size(sliced.mat), [length(dp52_rids), length(dmso_cids)]), 'Dimension mismatch'); |
389 | 378 | disp(size(sliced.mat));
|
390 | 379 |
|
391 | 380 | %% Transpose a GCT/x
|
|
397 | 386 | out_gctx = cmapm.Pipeline.mkgctx('example_out.gctx', ds);
|
398 | 387 |
|
399 | 388 | % Note that the same dataset object can be written out as either a GCT or GCTx.
|
400 |
| -% Note alsa that for convenience the dimensions of the matrix is automatically appended to |
| 389 | +% Note also that for convenience the dimensions of the matrix are automatically appended to |
401 | 390 | % the filename, and the columns go first.
|
402 | 391 |
|
403 | 392 | %% Compute correlations
|
404 | 393 | % Compute pairwise Spearman correlations between columns of the dataset
|
405 | 394 | cc = cmapm.Pipeline.ds_corr(ds);
|
406 | 395 |
|
407 |
| -% cc is itself a GCT structure |
| 396 | +% cc is a GCT structure whose matrix is square and symmetric |
| 397 | +assert(isequal(size(cc.mat), [size(ds.mat, 2), size(ds.mat, 2)]), 'CC is not square'); |
| 398 | +assert(isequal(cc.mat, cc.mat'), 'CC is not symmetric'); |
| 399 | +
|
408 | 400 | % Examine its contents
|
409 |
| -disp(cc.mat(1:5, 1:5)); |
| 401 | +imagesc(cc.mat(1:20, 1:20)); |
| 402 | +colorbar |
| 403 | +caxis([0.5, 1]); |
| 404 | +axis square |
| 405 | +title('Pairwise Spearman Correlation'); |
410 | 406 |
|
411 | 407 | %% Clean-up
|
412 | 408 | delete(out_gct)
|
|