Introduction to go2cell

go2cell

go2cell is an R package that links Gene Ontology (GO) information to cell types via the marker genes of those cell types. It does this by drawing on open linked data in Wikidata.

To get started with go2cell, we first need to load the package:

library(go2cell)

For the purposes of this showcase, we’ll use a small test dataset of GO IDs that comes with the package, called go2cell_test_data.

head(go2cell_test_data)
#>      gene                                                   process       goid
#> 1    FASN                                                metabolism GO:0008152
#> 2   APBA2                                         protein transport GO:0015031
#> 3     SHH                        striated muscle tissue development GO:0014706
#> 4 SELENON positive regulation of skeletal muscle cell proliferation GO:0014858
#> 5    EDN1           positive regulation of mitotic nuclear division GO:0045840
#> 6   BIRC5                                   protein phosphorylation GO:0006468

The dataset holds 10 Gene Ontology identifiers (head() prints only the first six), related to processes such as protein metabolism and glucose homeostasis. Let’s see which cell types can be matched to these identifiers.

# The function accepts a vector of GO IDs
results <- go2cell(go2cell_test_data$goid)
#> Rows: 978 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (5): cell_type, cell_typeLabel, go_ids, go_termLabel, geneLabel
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(results)
#> # A tibble: 6 × 5
#>   cell_type     cell_typeLabel           go_ids     go_termLabel geneLabel
#>   <chr>         <chr>                    <chr>      <chr>        <chr>    
#> 1 wd:Q71292398  human oligodendrocyte    GO:0008152 metabolism   FASN     
#> 2 wd:Q105533311 human adipocyte  (heart) GO:0008152 metabolism   FASN     
#> 3 wd:Q68620794  human hepatocyte         GO:0008152 metabolism   PAH      
#> 4 wd:Q68620794  human hepatocyte         GO:0008152 metabolism   GCK      
#> 5 wd:Q101404894 human melanocyte         GO:0008152 metabolism   TYR      
#> 6 wd:Q101404913 human purkinje neuron    GO:0008152 metabolism   PDE5A

The results list the cell types associated with our identifiers. One example is the human myoblast, shown together with its Wikidata item identifier (QID), Q101404940.
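The cell_type column carries a `wd:` prefix (for example `wd:Q101404894`), while the QID quoted in the text is written bare. A minimal base R sketch for converting between the two forms follows. It uses values copied from the output above, and the helper name `strip_wd_prefix` is hypothetical and not part of go2cell.

```r
# Hypothetical helper (not part of go2cell): drop a leading "wd:" prefix
# so that only the bare Wikidata QID remains. Values without the prefix
# are returned unchanged.
strip_wd_prefix <- function(x) {
  sub("^wd:", "", x)
}

# Values copied from the cell_type column shown above
cell_types <- c("wd:Q71292398", "wd:Q105533311", "wd:Q68620794")

qids <- strip_wd_prefix(cell_types)
print(qids)

# Build the corresponding Wikidata item page URLs
urls <- paste0("https://www.wikidata.org/wiki/", qids)
print(urls)
```

Bare QIDs are convenient when looking an item up on Wikidata or when passing an identifier around as a plain string.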

The same results show the marker genes that support each link. For example, FGFR4 links the human myoblast to protein phosphorylation (GO:0006468).
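To locate rows like the one just described, the results can be filtered with ordinary data frame subsetting. The sketch below does not query Wikidata. Instead it rebuilds a small stand-in data frame (named `example_results` here, an assumption for illustration) from rows printed above plus the myoblast row described in the text, using the same column names as the real output.

```r
# Stand-in for a few rows of `results`, copied from the output and text above.
# This is illustrative data, not a live query result.
example_results <- data.frame(
  cell_type      = c("wd:Q71292398", "wd:Q68620794", "wd:Q68620794", "wd:Q101404940"),
  cell_typeLabel = c("human oligodendrocyte", "human hepatocyte",
                     "human hepatocyte", "human myoblast"),
  go_ids         = c("GO:0008152", "GO:0008152", "GO:0008152", "GO:0006468"),
  go_termLabel   = c("metabolism", "metabolism", "metabolism",
                     "protein phosphorylation"),
  geneLabel      = c("FASN", "PAH", "GCK", "FGFR4"),
  stringsAsFactors = FALSE
)

# Marker genes that link the human myoblast to protein phosphorylation
myoblast_rows <- subset(
  example_results,
  cell_typeLabel == "human myoblast" & go_ids == "GO:0006468"
)
print(myoblast_rows$geneLabel)

# Distinct cell types matched to a given GO ID
metabolism_cells <- unique(
  example_results$cell_typeLabel[example_results$go_ids == "GO:0008152"]
)
print(metabolism_cells)
```

The same subsetting should work on the tibble returned by go2cell, since it exposes the same five columns.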

Now, suppose we want to go in the other direction. Say we want to know which GO IDs are associated with the marker genes of cell type Q101405206, the human neuroendocrine cell. We can get this information with the cell2go function:

# Now we use the Wikidata identifier
human_neuroendocrine_cell <- cell2go("Q101405206")
#> Rows: 744 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (5): cell_type, cell_typeLabel, go_ids, go_termLabel, geneLabel
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(human_neuroendocrine_cell)
#> # A tibble: 6 × 5
#>   cell_type     cell_typeLabel            go_ids     go_termLabel      geneLabel
#>   <chr>         <chr>                     <chr>      <chr>             <chr>    
#> 1 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding   PENK     
#> 2 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding   FXYD7    
#> 3 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding   NEGR1    
#> 4 wd:Q101405206 human neuroendocrine cell GO:0005515 protein binding   CCK      
#> 5 wd:Q101405206 human neuroendocrine cell GO:0046872 metal ion binding DPF1     
#> 6 wd:Q101405206 human neuroendocrine cell GO:0008236 serine-type pept… PRSS12

We have now retrieved all marker genes of the human neuroendocrine cell type and all GO identifiers associated with those genes.
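A common next step is to count how many marker genes support each GO term. The following is a minimal sketch on a stand-in data frame (`example_cell2go`, a name assumed for illustration) built from the six rows printed above rather than from a live query. Only the go_ids and geneLabel columns are reproduced.

```r
# Stand-in for the first six rows of the cell2go output shown above.
# This is illustrative data, not a live query result.
example_cell2go <- data.frame(
  go_ids    = c("GO:0005515", "GO:0005515", "GO:0005515",
                "GO:0005515", "GO:0046872", "GO:0008236"),
  geneLabel = c("PENK", "FXYD7", "NEGR1", "CCK", "DPF1", "PRSS12"),
  stringsAsFactors = FALSE
)

# Group gene labels by GO ID and count the distinct genes in each group
genes_per_term <- vapply(
  split(example_cell2go$geneLabel, example_cell2go$go_ids),
  function(genes) length(unique(genes)),
  integer(1)
)

# Put the GO IDs with the most supporting marker genes first
genes_per_term <- sort(genes_per_term, decreasing = TRUE)
print(genes_per_term)
```

Counting distinct genes rather than rows guards against a gene being listed more than once for the same GO ID.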

And that’s it! To sum up, go2cell links cell types to GO IDs through two simple functions: go2cell goes from GO IDs to cell types, and cell2go goes from a cell type to GO IDs.