Introduction to go2cell
go2cell_vignette.Rmd
go2cell
go2cell is an R package that links Gene Ontology (GO) information to cell types through the marker genes of those cell types. It does this by drawing on the open linked data in Wikidata.
To get started with go2cell, we first need to load the package:
library(go2cell)
For this showcase, we’ll use go2cell_test_data, a small test dataset of GO IDs that comes with the package.
head(go2cell_test_data)
#> gene process goid
#> 1 FASN metabolism GO:0008152
#> 2 APBA2 protein transport GO:0015031
#> 3 SHH striated muscle tissue development GO:0014706
#> 4 SELENON positive regulation of skeletal muscle cell proliferation GO:0014858
#> 5 EDN1 positive regulation of mitotic nuclear division GO:0045840
#> 6 BIRC5 protein phosphorylation GO:0006468
The dataset contains 10 Gene Ontology identifiers (head() shows only the first six), related to processes such as protein metabolism and glucose homeostasis. Let’s see which cell types can be matched to these identifiers.
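Since the query is driven entirely by the identifiers, a quick format check beforehand can catch typos. The following is a minimal sketch in base R using the six GO IDs printed above. The helper is_go_id is hypothetical (it is not part of go2cell), and the pattern assumes the usual form of "GO:" followed by seven digits.

```r
# The six GO IDs shown by head(go2cell_test_data) above
goids <- c("GO:0008152", "GO:0015031", "GO:0014706",
           "GO:0014858", "GO:0045840", "GO:0006468")

# Hypothetical helper (not part of go2cell):
# TRUE for strings shaped like "GO:" plus seven digits
is_go_id <- function(x) grepl("^GO:[0-9]{7}$", x)

valid <- is_go_id(goids)

# Are all identifiers well formed, and are there any repeats?
all(valid)
any(duplicated(goids))
```

Any identifier that fails the check can be inspected with goids[!valid] before running the query.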
# The function accepts a vector of GO IDs
results <- go2cell(go2cell_test_data$goid)
#> Rows: 978 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (5): cell_type, cell_typeLabel, go_ids, go_termLabel, geneLabel
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(results)
#> # A tibble: 6 × 5
#> cell_type cell_typeLabel go_ids go_termLabel geneLabel
#> <chr> <chr> <chr> <chr> <chr>
#> 1 wd:Q71292398 human oligodendrocyte GO:0008152 metabolism FASN
#> 2 wd:Q105533311 human adipocyte (heart) GO:0008152 metabolism FASN
#> 3 wd:Q68620794 human hepatocyte GO:0008152 metabolism PAH
#> 4 wd:Q68620794 human hepatocyte GO:0008152 metabolism GCK
#> 5 wd:Q101404894 human melanocyte GO:0008152 metabolism TYR
#> 6 wd:Q101404913 human purkinje neuron GO:0008152 metabolism PDE5A
The results list the cell types associated with our identifiers, each with its Wikidata item identifier (QID). They include, for example, the human myoblast (Q101404940).
The same results also show the marker genes that support each link. For example, FGFR4 is the marker gene that links the human myoblast to protein phosphorylation (GO:0006468).
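Because the results are an ordinary table, the usual data-frame tools apply to them. The sketch below rebuilds the six rows printed by head(results) as a plain data frame, so it runs without querying Wikidata, and shows two small steps: stripping the wd: prefix to get bare QIDs, and grouping marker genes by cell type. It uses base R only; the names res, qid, and genes_by_cell are illustrative and not part of go2cell.

```r
# The six rows printed by head(results) above, rebuilt as a plain data frame
res <- data.frame(
  cell_type = c("wd:Q71292398", "wd:Q105533311", "wd:Q68620794",
                "wd:Q68620794", "wd:Q101404894", "wd:Q101404913"),
  cell_typeLabel = c("human oligodendrocyte", "human adipocyte (heart)",
                     "human hepatocyte", "human hepatocyte",
                     "human melanocyte", "human purkinje neuron"),
  go_ids = rep("GO:0008152", 6),
  go_termLabel = rep("metabolism", 6),
  geneLabel = c("FASN", "FASN", "PAH", "GCK", "TYR", "PDE5A"),
  stringsAsFactors = FALSE
)

# Bare QIDs, without the "wd:" prefix
res$qid <- sub("^wd:", "", res$cell_type)

# Marker genes grouped by cell type label
genes_by_cell <- split(res$geneLabel, res$cell_typeLabel)
genes_by_cell[["human hepatocyte"]]
#> [1] "PAH" "GCK"
```

The bare QID is the form that the cell2go call in this vignette takes as input.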
Now, suppose we want to go in the reverse direction. Say we are interested in which GO IDs are associated with the marker genes of cell type Q101405206, the human neuroendocrine cell. We can get this information with the cell2go function:
# Now we use the Wikidata identifier
human_neuroendocrine_cell <- cell2go("Q101405206")
#> Rows: 744 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (5): cell_type, cell_typeLabel, go_ids, go_termLabel, geneLabel
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(human_neuroendocrine_cell)
#> # A tibble: 6 × 5
#> cell_type cell_typeLabel go_ids go_termLabel geneLabel
#> <chr> <chr> <chr> <chr> <chr>
#> 1 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding PENK
#> 2 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding FXYD7
#> 3 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding NEGR1
#> 4 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding CCK
#> 5 wd:Q101405206 human neuroendocrine cell GO:0046872 metal ion binding DPF1
#> 6 wd:Q101405206 human neuroendocrine cell GO:0008236 serine-type pept… PRSS12
We have now retrieved all marker genes of the human neuroendocrine cell type and all GO identifiers associated with those genes.
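With 744 rows returned, a per-term summary is often easier to read than the raw table. As a minimal sketch, the code below rebuilds the go_ids and geneLabel columns of the six rows printed above (so it runs offline), counts marker genes per GO ID, and pulls out the genes behind one term. It uses base R only; the object names are illustrative and not part of go2cell.

```r
# go_ids and geneLabel from the six rows printed by
# head(human_neuroendocrine_cell) above
nec <- data.frame(
  go_ids = c("GO:0005515", "GO:0005515", "GO:0005515",
             "GO:0005515", "GO:0046872", "GO:0008236"),
  geneLabel = c("PENK", "FXYD7", "NEGR1", "CCK", "DPF1", "PRSS12"),
  stringsAsFactors = FALSE
)

# Number of marker genes per GO ID, most frequent first
genes_per_term <- sort(table(nec$go_ids), decreasing = TRUE)

# Marker genes annotated with protein binding (GO:0005515)
protein_binding_genes <- nec$geneLabel[nec$go_ids == "GO:0005515"]
protein_binding_genes
#> [1] "PENK"  "FXYD7" "NEGR1" "CCK"
```

The same two steps should work on the full cell2go output, since it has the same go_ids and geneLabel columns.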
And that’s it! To sum up, go2cell links cell types to GO IDs through two simple functions: go2cell, which goes from GO IDs to cell types, and cell2go, which goes from a cell type to GO IDs.