Commit d956220: Update RStudio_integrations.md
1 parent 4af7566

1 file changed: +17, −7 lines

Developing_on_Databricks/RStudio_integrations.md (+17, −7)
@@ -48,7 +48,7 @@ sc <- spark_connect(method = "databricks")

This will display the tables registered in the metastore.

<img src="https://github.com/marygracemoesta/R-User-Guide/blob/master/Developing_on_Databricks/images/sparklyr_tables_ui_view.png?raw=true" height=300 width=420>

Tables from the `default` database will be shown, but you can switch the database using `sparklyr::tbl_change_db()`.
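As a minimal sketch of how this looks from the R console, assuming a live connection from the hosted RStudio session (`sales_db` is a hypothetical database name):

```r
library(sparklyr)
library(dplyr)

# Connect to the Databricks cluster from the hosted RStudio session
sc <- spark_connect(method = "databricks")

# List the tables visible in the current (default) database
src_tbls(sc)

# Switch to another database (name is illustrative), then list its tables
tbl_change_db(sc, "sales_db")
src_tbls(sc)
```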
@@ -80,26 +80,36 @@ You can store RStudio Projects on DBFS and any other arbitrary file. When your

#### Git Integration

The first step is to disable websockets from within RStudio's options:

<img src="https://github.com/marygracemoesta/R-User-Guide/blob/master/Developing_on_Databricks/images/disable_websockets.png?raw=true" width=300 height=300>

Once that is complete, a GitHub repo can be connected by creating a new project from the Project dropdown menu at the top right of RStudio. Select *Version Control*, and on the next window select the git repo that you want to work with on Databricks. When you click *Create Project*, the repo will be cloned to the subdirectory you chose on the driver node, and git integration will be visible from RStudio.

At this point you can resume your usual workflow of checking out branches, committing new code, and pushing changes to the remote repo.

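That workflow can be sketched with `system()` calls, matching how you would run git from the RStudio terminal on the driver. A throwaway repo is initialized here so the commands are self-contained; on Databricks you would instead start inside the repo cloned by *Create Project*, and the paths, branch, and file names below are illustrative:

```r
# Set up a throwaway repo so the example is self-contained
dir.create("/tmp/my_r_project", showWarnings = FALSE)
setwd("/tmp/my_r_project")
system("git init -q")

# Check out a feature branch, edit a file, and commit the change
system("git checkout -q -b my-feature")
writeLines('print("hello")', "analysis.R")
system("git add analysis.R")
system(paste(
  'git -c user.email=you@example.com -c user.name="Your Name"',
  'commit -q -m "Add analysis script"'
))

# Inspect history; from a cloned repo you would finish with `git push`
system("git log --oneline")
```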
#### Persisting R Project Files in DBFS

Instead of GitHub, you can also use the Databricks File System (DBFS) to persist files associated with an R project. Since DBFS enables users to treat buckets in object storage as local storage by prepending the write path with `/dbfs/`, this is easy to do from the RStudio terminal window or with `system()` commands.

For example, an entire R project can be copied into DBFS with a single `cp` command.

```r
# Copy the project directory from the driver's local disk into DBFS
system("cp -r /driver/my_r_project /dbfs/my_r_project")
```
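Because files in DBFS persist after the cluster terminates, the copy can be restored onto a fresh driver the same way (the paths mirror the hypothetical ones above):

```r
# Restore the project from DBFS onto a new driver's local disk
system("cp -r /dbfs/my_r_project /driver/my_r_project")
```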

___

## RStudio Desktop Integration
Databricks also supports integration with RStudio Desktop using Databricks Connect. Please refer to the [PDF](https://github.com/marygracemoesta/R-User-Guide/blob/master/Developing_on_Databricks/DB%20Connect%20with%20RStudio%20Dekstop.pdf) for step-by-step instructions.

## Differences in Integrations
The distinction between the two RStudio integrations becomes apparent when you look at the architecture. In the RStudio Server integration, RStudio lives inside the driver node.

RStudio Desktop + Databricks Connect, by contrast, uses the local machine as the driver and submits queries to the nodes managed by the Databricks cluster.

## Gotchas
Something important to note when using the RStudio integrations:
- Loss of notebook functionality: magic commands that work in Databricks notebooks do not work within RStudio

___
[Back to table of contents](https://github.com/marygracemoesta/R-User-Guide#contents)
