Update natural_language_processing.ipynb #117

Open
wants to merge 606 commits into
base: master
606 commits
3237916
add manimml
khuyentran1401 Jan 9, 2023
0522b78
highlight predictions
khuyentran1401 Jan 10, 2023
0915d03
add union in 3.10
khuyentran1401 Jan 15, 2023
c745e8b
add union in 3.10
khuyentran1401 Jan 15, 2023
1da5df4
add tips
khuyentran1401 Jan 16, 2023
915ec76
add tips
khuyentran1401 Jan 18, 2023
3cd14a1
add sketch ai
khuyentran1401 Jan 25, 2023
878819a
fix walrus operator
khuyentran1401 Jan 30, 2023
e5e1b43
fix walrus operator
khuyentran1401 Jan 30, 2023
139310d
add itertools dropwhile
khuyentran1401 Feb 21, 2023
4dc49ff
add prefect timeline view
khuyentran1401 Feb 22, 2023
19c0e10
add slots
khuyentran1401 Feb 23, 2023
fe93146
add yellowbrick freqdistribution plot
khuyentran1401 Feb 24, 2023
1b0d940
add pandas testing check_like
khuyentran1401 Feb 28, 2023
662bb6c
add memory_profiler
khuyentran1401 Mar 1, 2023
b986e4b
add testing good practice
khuyentran1401 Mar 2, 2023
09e5deb
add name complex conditions
khuyentran1401 Mar 7, 2023
54a0f28
add make_pipeline
khuyentran1401 Mar 8, 2023
9d02e4b
add exception handling vs if-else
khuyentran1401 Mar 9, 2023
a993e99
add python fire and fix typer
khuyentran1401 Mar 10, 2023
249dab4
add loguru
khuyentran1401 Mar 13, 2023
6eeca35
add catch specific exceptions
khuyentran1401 Mar 14, 2023
c57d0b3
add pyheat
khuyentran1401 Mar 15, 2023
c9984de
add parquet
khuyentran1401 Mar 16, 2023
a372cf4
add test fails
khuyentran1401 Mar 21, 2023
4ddf458
add fluke
khuyentran1401 Mar 22, 2023
6749849
add lambda
khuyentran1401 Mar 23, 2023
f388343
add removestar
khuyentran1401 Mar 24, 2023
1f4cd84
restructure
khuyentran1401 Mar 26, 2023
de38289
restructure
khuyentran1401 Mar 26, 2023
84f9f91
edit texthero clean
khuyentran1401 Mar 28, 2023
9c065b4
add monkeytype
khuyentran1401 Mar 29, 2023
c35d126
add set union
khuyentran1401 Mar 30, 2023
d651695
edit code review
khuyentran1401 Apr 1, 2023
b06fa31
add whylogs
khuyentran1401 Apr 3, 2023
6111780
add new section
khuyentran1401 Apr 4, 2023
407062d
add deepchecks simple model comparison
khuyentran1401 Apr 5, 2023
12e26dd
add stacked df
khuyentran1401 Apr 6, 2023
6a17425
add delta lake
khuyentran1401 Apr 6, 2023
f1121d3
add prettymaps
khuyentran1401 Apr 10, 2023
7bbb9f0
add df.infer_objects
khuyentran1401 Apr 11, 2023
f2b38d1
Update README.md
khuyentran1401 Apr 12, 2023
13d6ed6
Update README.md
khuyentran1401 Apr 12, 2023
c342129
add df.merge outer
khuyentran1401 Apr 13, 2023
335d094
Merge branch 'master' of https://github.com/khuyentran1401/Efficient_…
khuyentran1401 Apr 13, 2023
5076e3e
add polars
khuyentran1401 Apr 14, 2023
4c0b0ce
edit manage data
khuyentran1401 Apr 15, 2023
37f7291
add pandas 2.0
khuyentran1401 Apr 18, 2023
9a08a88
add unyt
khuyentran1401 Apr 19, 2023
6efe036
add unyt
khuyentran1401 Apr 19, 2023
542dd46
add copy on write
khuyentran1401 Apr 20, 2023
618df8e
add mlem
khuyentran1401 Apr 21, 2023
08a86b7
add delta lake overwrite partition
khuyentran1401 Apr 23, 2023
dd46dad
add pd chained_assignment
khuyentran1401 Apr 25, 2023
b60f47f
add pd chained_assignment
khuyentran1401 Apr 25, 2023
0b6e954
edit delta lake
khuyentran1401 Apr 26, 2023
808aa35
add omit else clause
khuyentran1401 Apr 27, 2023
e0aefaa
edit delta lake
khuyentran1401 Apr 27, 2023
b72c707
add covalent
khuyentran1401 Apr 28, 2023
877dff0
add sqlfluff
khuyentran1401 May 1, 2023
d28824c
add string template
khuyentran1401 May 2, 2023
fe71a3b
add mlforecast
khuyentran1401 May 3, 2023
0f7eaae
edit property decorator
khuyentran1401 May 4, 2023
c65909b
edit property decorator
khuyentran1401 May 4, 2023
5c8f09c
edit property decorator
khuyentran1401 May 4, 2023
483a382
add splitting strategy
khuyentran1401 May 5, 2023
0fb0587
add merge operation with delta lake
khuyentran1401 May 7, 2023
3223f4e
add mock
khuyentran1401 May 9, 2023
6178885
edit title
khuyentran1401 May 10, 2023
0f200af
add typing TypeVar
khuyentran1401 May 16, 2023
f4f7f11
delete .py files and add eradicate
khuyentran1401 May 17, 2023
e37e88c
clean up Chapter5 dir
khuyentran1401 May 19, 2023
32358e4
clean up Chapter1 dir
khuyentran1401 May 19, 2023
067bebc
clean up the rest of dirs
khuyentran1401 May 19, 2023
1b1cbb3
add pandas ai
khuyentran1401 May 19, 2023
8d7ed83
add delta lake data appending
khuyentran1401 May 20, 2023
8fca85e
add delta lake data appending
khuyentran1401 May 20, 2023
236f1d8
edit delta lake parquet
khuyentran1401 May 22, 2023
8c7c77e
add testing class
khuyentran1401 May 25, 2023
81347f1
add panda read_sql
khuyentran1401 May 30, 2023
5a220e3
add scikit-llm
khuyentran1401 May 31, 2023
91ce709
add scikit-llm
khuyentran1401 May 31, 2023
91616b5
add pyinstaller
khuyentran1401 Jun 2, 2023
2099f58
update dictionary
khuyentran1401 Jun 3, 2023
1a30df4
add polars - deltalake
khuyentran1401 Jun 5, 2023
7c1bcca
add caplog for pytest
khuyentran1401 Jun 6, 2023
0f4a65d
add pyserde
khuyentran1401 Jun 8, 2023
1fcacd5
add freezegun
khuyentran1401 Jun 13, 2023
3dfec70
add gpt-commit-summarizer
khuyentran1401 Jun 14, 2023
c115d7e
add leAB
khuyentran1401 Jun 16, 2023
1c60043
edit map and filter
khuyentran1401 Jun 20, 2023
8a59b88
add delta lake transaction log
khuyentran1401 Jun 21, 2023
95422c4
add dtale
khuyentran1401 Jun 23, 2023
97f1f87
add dtale
khuyentran1401 Jun 23, 2023
572189b
add dtale
khuyentran1401 Jun 23, 2023
5f07693
add dtale
khuyentran1401 Jun 23, 2023
5572172
add dtale
khuyentran1401 Jun 23, 2023
a3b1e0e
add dtale
khuyentran1401 Jun 23, 2023
754d258
add if __name__ == "__main__"
khuyentran1401 Jun 27, 2023
2843eec
delte GIF files
khuyentran1401 Jun 28, 2023
dbe9f83
add quadratic
khuyentran1401 Jun 28, 2023
815667f
remove some Python scripts
khuyentran1401 Jun 28, 2023
c5f1b3b
add sqlmodel
khuyentran1401 Jun 29, 2023
3f52ea9
add postgresml
khuyentran1401 Jul 5, 2023
328292f
edit sql notebook
khuyentran1401 Jul 5, 2023
da0f8b0
edit pathlib
khuyentran1401 Jul 11, 2023
9ade488
add delta lake append mistmatched data
khuyentran1401 Jul 12, 2023
bc22020
add ItsDangerous
khuyentran1401 Jul 13, 2023
1815df0
add pytest-postgres
khuyentran1401 Jul 14, 2023
b1d1255
add pytest-postgres
khuyentran1401 Jul 14, 2023
1ee14f5
add polars lazy
khuyentran1401 Jul 18, 2023
45f4aea
edit good practices
khuyentran1401 Jul 20, 2023
f8531b1
add chromadb
khuyentran1401 Jul 21, 2023
fe56948
add duckdb
khuyentran1401 Jul 24, 2023
1205764
edit duckdb
khuyentran1401 Jul 25, 2023
371f2e8
edit duckdb
khuyentran1401 Jul 25, 2023
e52f4e0
edit duckdb and pyarrow
khuyentran1401 Jul 26, 2023
e4fe3a3
editor decorator
khuyentran1401 Jul 27, 2023
0ab5dd9
add sqlparse
khuyentran1401 Jul 28, 2023
ee0a58b
add format with f-string
khuyentran1401 Aug 1, 2023
94f5659
edit args and kwargs
khuyentran1401 Aug 3, 2023
69a45d9
add pypdf
khuyentran1401 Aug 4, 2023
bd1afaa
update book
khuyentran1401 Aug 10, 2023
f064751
update delta lake
khuyentran1401 Aug 11, 2023
e12a2bf
edit comparing if-else and try-except
khuyentran1401 Aug 15, 2023
73f4022
update autoscraper
khuyentran1401 Aug 18, 2023
4be91ed
edit pandera
khuyentran1401 Aug 21, 2023
7b9e385
edit classmethod
khuyentran1401 Aug 22, 2023
91b4c78
add checking multiple types
khuyentran1401 Aug 24, 2023
1e3b5bf
add delta lake partitions
khuyentran1401 Aug 27, 2023
dc7d8d6
add delta lake partitions
khuyentran1401 Aug 28, 2023
5181275
add try except else
khuyentran1401 Aug 29, 2023
4b2f3eb
add pampy
khuyentran1401 Aug 30, 2023
0b9c4af
add tips
khuyentran1401 Sep 21, 2023
2d7af95
add df.align
khuyentran1401 Sep 26, 2023
f2941e0
add gif
khuyentran1401 Sep 27, 2023
c74cbb8
add vizro
khuyentran1401 Sep 29, 2023
29d5652
add doctest
khuyentran1401 Oct 2, 2023
6ce2c35
add parquet data filtering
khuyentran1401 Oct 5, 2023
61686e3
add pmdarima
khuyentran1401 Oct 9, 2023
f8b6a46
add vulture
khuyentran1401 Oct 11, 2023
3c958f7
add private variables
khuyentran1401 Oct 12, 2023
4514902
update class composition
khuyentran1401 Oct 24, 2023
67913f3
add fuzzy matching
khuyentran1401 Oct 25, 2023
1a83fa2
add set_output in scikit-learn
khuyentran1401 Oct 26, 2023
64cdfbb
add llm section
khuyentran1401 Oct 27, 2023
b6ad496
add pandas api on spark
khuyentran1401 Oct 31, 2023
cde021b
add column transformer
khuyentran1401 Oct 31, 2023
b7251ad
edit pandas api on spark
khuyentran1401 Nov 1, 2023
09dfcef
add keyword-only arguments
khuyentran1401 Nov 9, 2023
94247b0
add single point of return
khuyentran1401 Nov 14, 2023
636245c
add pandera hypothesis
khuyentran1401 Nov 15, 2023
58233b2
add mllib
khuyentran1401 Nov 17, 2023
2e911ea
add rocketry
khuyentran1401 Nov 20, 2023
07d4e54
add rocketry
khuyentran1401 Nov 20, 2023
2b7eb00
add vectorized operation in pandas
khuyentran1401 Nov 21, 2023
7056740
add deduplicate
khuyentran1401 Nov 22, 2023
7d6218f
add deduplicate
khuyentran1401 Nov 22, 2023
162d505
edit python-dotenv
khuyentran1401 Nov 23, 2023
0f8ae83
add pytube
khuyentran1401 Nov 29, 2023
b110989
add enum
khuyentran1401 Nov 30, 2023
7f5446b
add aeon
khuyentran1401 Dec 8, 2023
000f87d
Update broken linke to Feature-engine library in feature_engineer.ipynb
solegalli Dec 10, 2023
8af5162
Update feature_engineer.html
solegalli Dec 14, 2023
817ae81
Update feature_engineer.ipynb
solegalli Dec 14, 2023
c1204a0
fix merging with custom fixes
khuyentran1401 Dec 14, 2023
e8a6443
Merge pull request #17 from solegalli/solegalli-patch-3
khuyentran1401 Dec 14, 2023
a7f96b2
Merge pull request #16 from solegalli/solegalli-patch-2
khuyentran1401 Dec 14, 2023
84aeb34
Merge pull request #15 from solegalli/patch-1
khuyentran1401 Dec 14, 2023
ee32114
add pytest.raises
khuyentran1401 Dec 19, 2023
4803bda
Merge branch 'master' of https://github.com/khuyentran1401/Efficient_…
khuyentran1401 Dec 20, 2023
9375c10
add brand image
khuyentran1401 Dec 20, 2023
890b473
add polars expression
khuyentran1401 Dec 26, 2023
d5b78ae
add ruptures
khuyentran1401 Dec 27, 2023
201b217
add open-closed principle
khuyentran1401 Jan 2, 2024
a4ef938
add mixin
khuyentran1401 Jan 4, 2024
4977d93
add parameterized queries
khuyentran1401 Jan 8, 2024
61632fc
add pyspark
khuyentran1401 Jan 8, 2024
2a027c5
add jupyter-ai
khuyentran1401 Jan 10, 2024
41bb013
add zip strict=True
khuyentran1401 Jan 11, 2024
b23dfe8
add outlines
khuyentran1401 Jan 12, 2024
0747170
Create convert-to-discussion.yml
khuyentran1401 Jan 16, 2024
d744ea8
Update convert-to-discussion.yml
khuyentran1401 Jan 16, 2024
d1645a3
add tmp_path
khuyentran1401 Jan 18, 2024
50e61b0
Merge branch 'master' of https://github.com/khuyentran1401/Efficient_…
khuyentran1401 Jan 18, 2024
afdc650
add lazy predict
khuyentran1401 Jan 23, 2024
4d128ba
add delta lake constraints
khuyentran1401 Jan 23, 2024
38dff90
add sql-metadata
khuyentran1401 Jan 26, 2024
c36785e
add testbook
khuyentran1401 Jan 29, 2024
2cc8dbc
add gluonts
khuyentran1401 Jan 30, 2024
b216e4d
add parse_dates
khuyentran1401 Feb 1, 2024
cf16a41
add spark dataframe
khuyentran1401 Feb 5, 2024
b38a979
fix change_values section
khuyentran1401 Feb 6, 2024
889b71a
add tfcausalimpact
khuyentran1401 Feb 7, 2024
16b58a9
add galatic
khuyentran1401 Mar 4, 2024
f4388a6
add string pyarrow
khuyentran1401 Mar 5, 2024
f0ed409
add spark array functions
khuyentran1401 Mar 6, 2024
ec71734
add spark array functions
khuyentran1401 Mar 6, 2024
2bb580b
add polars
khuyentran1401 Mar 19, 2024
3f01c15
add safetensors
khuyentran1401 Mar 27, 2024
30afc87
add quantstats
khuyentran1401 Mar 28, 2024
de8ff58
add spark udf
khuyentran1401 Mar 30, 2024
3e65950
edit python match
khuyentran1401 Apr 2, 2024
522bfe2
add kneed
khuyentran1401 Apr 10, 2024
ff86432
update version in visualization
khuyentran1401 Apr 10, 2024
ed28299
update version in visualization
khuyentran1401 Apr 10, 2024
b57b1be
update time series notebook
khuyentran1401 Apr 10, 2024
b1e03b7
add functiontransformer
khuyentran1401 Apr 11, 2024
017feeb
add udf reusability
khuyentran1401 Apr 15, 2024
9b780ff
add pytest mark
khuyentran1401 Apr 16, 2024
914451a
add match and dataclasses
khuyentran1401 Apr 18, 2024
89e5142
add great tables
khuyentran1401 Apr 21, 2024
66b1a3a
add pydantic validation
khuyentran1401 Apr 21, 2024
46ef7b1
add itertools.islice
khuyentran1401 Apr 21, 2024
9d9eeda
add neuralforecast
khuyentran1401 Apr 25, 2024
85446bd
add mirascope
khuyentran1401 May 7, 2024
f7e44a3
add unit testing with pyspark
khuyentran1401 May 10, 2024
3a77103
add pydantic number constraint
khuyentran1401 May 10, 2024
c6e7d3c
add mirascope
khuyentran1401 May 20, 2024
8298c12
add duckdb csv
khuyentran1401 May 20, 2024
be8f8db
add timegpt
khuyentran1401 May 28, 2024
fbea331
update old posts
khuyentran1401 May 28, 2024
f2d1646
add pyspark
khuyentran1401 May 28, 2024
0a0ff2b
make changes to the existing content
khuyentran1401 Jun 3, 2024
30332b6
add statsforecast
khuyentran1401 Jun 3, 2024
97b3788
fix warning in statsforecast
khuyentran1401 Jun 4, 2024
aa99f96
add spark udf
khuyentran1401 Jun 9, 2024
153dec7
add FlashText
khuyentran1401 Jun 10, 2024
3da19ba
add FlashText
khuyentran1401 Jun 10, 2024
46cba8c
edit any and list comprehension
khuyentran1401 Jun 10, 2024
57ff1e4
edit any and list comprehension
khuyentran1401 Jun 10, 2024
41fb788
add autogluon
khuyentran1401 Jun 10, 2024
266d856
add autogluon
khuyentran1401 Jun 11, 2024
3519dd7
add magika
khuyentran1401 Jun 15, 2024
21beb1f
add pyspark
khuyentran1401 Jun 23, 2024
1210a51
add tsfresh
khuyentran1401 Jun 25, 2024
ec541ac
add polars vs pandas
khuyentran1401 Jun 27, 2024
f7f8ea3
add mlforecast
khuyentran1401 Jun 28, 2024
34f05b8
edit spark output
khuyentran1401 Jul 3, 2024
33061e0
add pyspark
khuyentran1401 Jul 3, 2024
ae3e449
add tsmoothie
khuyentran1401 Jul 4, 2024
bd423eb
add pygwalker
khuyentran1401 Jul 4, 2024
5e1d988
add backtesting
khuyentran1401 Jul 8, 2024
bcbb0af
add pyspark join
khuyentran1401 Jul 14, 2024
334cc00
add mlforecast cross_validation
khuyentran1401 Jul 14, 2024
c382e3d
add title to code
khuyentran1401 Jul 16, 2024
0117e84
replace brand image
khuyentran1401 Jul 16, 2024
c0b9fad
replace brand image
khuyentran1401 Jul 16, 2024
ebfa1f6
change brand name
khuyentran1401 Jul 16, 2024
c252eaa
Update natural_language_processing.ipynb
hagarshalabyy Jul 16, 2024
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
data/MLTollsStackOverflow.csv filter=lfs diff=lfs merge=lfs -text
2 changes: 2 additions & 0 deletions .gitconfig
@@ -0,0 +1,2 @@
[commit]
	template = ~/.git-commit-template.txt
26 changes: 26 additions & 0 deletions .github/workflows/convert-to-discussion.yml
@@ -0,0 +1,26 @@
name: Convert Issue to Discussion
on:
  issues:
    types:
      - opened
    labels:
      - product-request
jobs:
  convert-to-discussion:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Convert Issue to Discussion
        uses: abirismyname/[email protected]
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          title: ${{ github.event.issue.title }}
          body: ${{ github.event.issue.body }}
          repository-id: ${{ secrets.REPO_ID }}
          category-id: ${{ secrets.CAT_ID }}
      - name: Print discussion url and id
        run: |
          echo discussion-id: ${{steps.create-discussion.outputs.discussion-id}}
          echo discussion-url: ${{steps.create-discussion.outputs.discussion-url}}
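Note on the workflow above: the last step reads `steps.create-discussion.outputs.*`, and GitHub Actions resolves `steps.<id>` through a step's `id` key. In the listing as captured here, no step declares `id: create-discussion`, so those expressions may expand to empty strings. The Python sketch below is one rough way to spot that kind of mismatch. It is an illustration only: `undeclared_step_ids` is a hypothetical helper, it uses regular expressions rather than a YAML parser, and the `WORKFLOW` string is a shortened excerpt whose indentation is assumed.

```python
import re

# Shortened excerpt of the workflow in this diff (indentation assumed).
WORKFLOW = """\
jobs:
  convert-to-discussion:
    runs-on: ubuntu-latest
    steps:
      - name: Convert Issue to Discussion
        with:
          repository-id: ${{ secrets.REPO_ID }}
      - name: Print discussion url and id
        run: |
          echo discussion-id: ${{steps.create-discussion.outputs.discussion-id}}
          echo discussion-url: ${{steps.create-discussion.outputs.discussion-url}}
"""


def undeclared_step_ids(workflow_text: str) -> set[str]:
    """Return ids used as steps.<id>.outputs that no `id:` key declares.

    Hypothetical helper: a text-level check, not a full workflow validator.
    """
    # Every id referenced through the steps context.
    referenced = set(re.findall(r"steps\.([A-Za-z0-9_-]+)\.outputs", workflow_text))
    # Every `id: value` key that starts a line (optionally after a list dash).
    declared = set(
        re.findall(
            r"^\s*(?:-\s*)?id:\s*([A-Za-z0-9_-]+)\s*$",
            workflow_text,
            flags=re.MULTILINE,
        )
    )
    return referenced - declared


# Same excerpt with an `id` added to the step that creates the discussion.
FIXED = WORKFLOW.replace(
    "      - name: Convert Issue to Discussion\n",
    "      - name: Convert Issue to Discussion\n        id: create-discussion\n",
)

missing_before = undeclared_step_ids(WORKFLOW)
missing_after = undeclared_step_ids(FIXED)
print(missing_before, missing_after)
```

If the check reports `create-discussion` as missing, adding `id: create-discussion` to the "Convert Issue to Discussion" step is the usual way to make the reference resolve.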
13 changes: 12 additions & 1 deletion .gitignore
@@ -7,4 +7,15 @@ _build
requirements_2.txt
Chapter4/Twitter.zip
Chapter4/SWEETVIZ_REPORT.html
build.sh
Chapter5/s2v_old
Chapter5/._s2v_old
build.sh
.mypy_cache
.hypothesis
wandb
venv
.DS_Store
data
*_scripts/*
*.csv
generate_url.py
189 changes: 0 additions & 189 deletions Chapter1/best_practices.ipynb

This file was deleted.
