Commit 519576c

update README
1 parent 14a7b76 commit 519576c

1 file changed: +12 -1 lines changed

README.md

Lines changed: 12 additions & 1 deletion
@@ -14,6 +14,7 @@ Read files on Hdfs.
 - **config** overwrites configuration parameters (hash, default: `{}`)
 - **input_path** file path on Hdfs. you can use glob and Date format like `%Y%m%d/%s`.
 - **rewind_seconds** When you use Date format in input_path property, the format is executed by using the time which is Now minus this property.
+- **partition** when this is true, partition input files and increase task count. (default: `true`)
 
 ## Example
 
@@ -24,12 +25,13 @@ in:
     - /opt/analytics/etc/hadoop/conf/core-site.xml
     - /opt/analytics/etc/hadoop/conf/hdfs-site.xml
   config:
-    fs.defaultFS: 'hdfs://hdp-nn1:8020'
+    fs.defaultFS: 'hdfs://hadoop-nn1:8020'
     dfs.replication: 1
     fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
     fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
   input_path: /user/embulk/test/%Y-%m-%d/*
   rewind_seconds: 86400
+  partition: true
   decoders:
     - {type: gzip}
   parser:
@@ -50,6 +52,15 @@ in:
     - {name: c3, type: long}
 ```
 
+## Note
+- the feature of the partition supports only 3 line terminators.
+    - `\n`
+    - `\r`
+    - `\r\n`
+
+## The Reference Implementation
+- [hito4t/embulk-input-filesplit](https://github.com/hito4t/embulk-input-filesplit)
+
 ## Build
 
 ```
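
The note added in this commit says partitioning recognizes only `\n`, `\r`, and `\r\n`. As a rough illustration of why the terminator set matters, here is a minimal Python sketch of cutting a byte buffer into ranges that always end just past one of those three terminators. This is not the plugin's code: `split_on_line_boundaries` and `_end_of_line` are hypothetical names, and the real plugin (and the referenced embulk-input-filesplit) may split differently.

```python
def _end_of_line(data: bytes, pos: int) -> int:
    """Advance pos (>= 1) until it sits just past a line terminator, or at EOF."""
    size = len(data)
    while pos < size:
        prev = data[pos - 1]
        if prev == 0x0A:  # previous byte is "\n" (also the tail of "\r\n")
            return pos
        if prev == 0x0D and data[pos] != 0x0A:  # lone "\r", not half of "\r\n"
            return pos
        pos += 1
    return size


def split_on_line_boundaries(data: bytes, num_parts: int) -> list:
    """Return (start, end) ranges of roughly equal size, each ending on a line boundary."""
    size = len(data)
    target = max(1, size // max(1, num_parts))  # approximate bytes per part
    ranges = []
    start = 0
    while start < size:
        end = _end_of_line(data, min(start + target, size))
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    sample = b"a,1\r\nb,2\rc,3\nd,4\r\n"
    for start, end in split_on_line_boundaries(sample, 3):
        print(start, end, sample[start:end])
```

Because a cut is only ever placed after a recognized terminator, a file using some other record separator would end up as a single range in this sketch, which is one plausible reading of why the README restricts the feature to these three terminators.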

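
For context on the unchanged `rewind_seconds` option that the example config pairs with `input_path`, the following Python sketch shows the arithmetic the README describes: the date format is evaluated against the current time minus `rewind_seconds`. The function name `resolve_input_path` is hypothetical, and Python's `strftime` is only a stand-in for whatever formatter the plugin actually uses.

```python
from datetime import datetime, timedelta


def resolve_input_path(pattern: str, rewind_seconds: int, now: datetime) -> str:
    """Expand date directives in pattern using (now - rewind_seconds)."""
    shifted = now - timedelta(seconds=rewind_seconds)
    # Glob characters such as "*" pass through strftime unchanged.
    return shifted.strftime(pattern)


if __name__ == "__main__":
    # With rewind_seconds: 86400, a run on 2015-09-08 targets the previous day's directory.
    print(resolve_input_path("/user/embulk/test/%Y-%m-%d/*", 86400,
                             datetime(2015, 9, 8, 12, 0, 0)))
```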