Preprocessing module

downloads_panglao()

Downloads metadata from PanglaoDB

Reads each PanglaoDB metadata file into a pandas DataFrame, dropping duplicate rows from the tissues table, and returns the four DataFrames.

Parameters:

Name | Type | Description | Default
data_urls | dict(list) | A dictionary with the URLs of each metadata file. | required

Returns:

Type | Description
tuple | A tuple containing the DataFrames tissues, genes, cells_organs_germlayers and cells_w_descriptions.

Source code in wikidata_panglaodb/pre.py
import numpy as np
import pandas as pd


def downloads_panglao(data_urls):
    """Downloads metadata from PanglaoDB

    Reads each metadata file into a pandas DataFrame, dropping
    duplicate rows from the tissues table.

    Args:
        data_urls (dict(list)): A dictionary with the urls of each metadata file.

    Returns:
        tuple: A tuple containing the DataFrames tissues, genes, cells_organs_germlayers
            and cells_w_descriptions.
    """

    # Keep only the tissue and species columns, then de-duplicate the pairs.
    tissues = pd.read_csv(
        data_urls["tissues"], usecols=[2, 4], names=["tissue", "species"]
    ).drop_duplicates()

    # Infer the species from the Ensembl ID prefix: "ENSMUS" marks mouse IDs,
    # and every other ID is labelled as human.
    genes = pd.read_csv(data_urls["genes"], names=["ensg_panglao", "symbol"])
    genes["species"] = np.where(
        genes["ensg_panglao"].str.startswith("ENSMUS"), "Mus musculus", "Homo sapiens"
    )

    cells_organs_germlayers = pd.read_csv(
        data_urls["cells_organs_germlayers"],
        names=["cell_type", "germ_layer", "organ"],
    )

    cells_w_descriptions = pd.read_csv(
        data_urls["cells_w_descriptions"],
        names=["cell_type", "description", "synonyms"],
    )

    return tissues, genes, cells_organs_germlayers, cells_w_descriptions
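The species column in the genes table is derived purely from the Ensembl ID prefix. The following is a minimal sketch of that step on two inline rows instead of a PanglaoDB download; the in-memory CSV and the variable name `raw_csv` are illustrative assumptions, not part of the module.

```python
import io

import numpy as np
import pandas as pd

# Illustrative two-row stand-in for the PanglaoDB genes file (no header row).
raw_csv = io.StringIO("ENSMUSG00000000001,Gnai3\nENSG00000000003,TSPAN6\n")

# Passing names= without header= makes pandas treat every row as data.
genes = pd.read_csv(raw_csv, names=["ensg_panglao", "symbol"])

# IDs starting with "ENSMUS" are labelled mouse; everything else human.
genes["species"] = np.where(
    genes["ensg_panglao"].str.startswith("ENSMUS"), "Mus musculus", "Homo sapiens"
)

print(genes)
```

Because the check is a two-way split, any identifier that does not start with "ENSMUS" is labelled "Homo sapiens", including malformed ones, so it may be worth validating the ID column if other species or formats can appear.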

reconcile_more_types()

Reconcile a dataframe column against one or more type QIDs

This function loops through every QID in the type_qids list, reconciles the pandas column against each one, and returns a concatenated dataframe with all the matches. Types whose reconciliation raises an exception are skipped silently.

Parameters:

Name | Type | Description | Default
dataframe_column | Series | A pandas dataframe column with the values to reconcile. | required
type_qids | list | A list of the type QIDs to reconcile against. | required

Returns:

Type | Description
DataFrame | A dataframe containing all possible matches for each item type, or None if no type produced any matches.

Source code in wikidata_panglaodb/pre.py
import pandas as pd
from reconciler import reconcile  # assumed origin of reconcile(); not shown in this excerpt


def reconcile_more_types(dataframe_column, type_qids):
    """Reconcile a dataframe column against one or more type QIDs

    This function loops through every QID in the type_qids list and reconciles the pandas
    column against each one, returning a concatenated dataframe with all the matches.

    Args:
        dataframe_column (Series): A pandas dataframe column with the values to reconcile.
        type_qids (list): A list of the type QIDs to reconcile against.

    Returns:
        DataFrame: A dataframe containing all possible matches for each item type,
            or None if no type produced any matches.

    """

    all_matches = []
    # "type_qid" avoids shadowing the built-in type().
    for type_qid in type_qids:
        try:
            current = reconcile(dataframe_column, type_id=type_qid, top_res="all")
            all_matches.append(current)
        except Exception:
            # Skip types whose reconciliation fails and keep going.
            pass
    try:
        full_df_matches = pd.concat(all_matches)
    except ValueError:
        # pd.concat raises ValueError when given an empty list.
        return None
    else:
        return full_df_matches
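The loop-and-concatenate behaviour can be tried offline. The sketch below swaps the real reconciliation call for a hypothetical stand-in, `fake_reconcile`, and uses placeholder type IDs ("Q_A", "Q_B", "Q_BAD") rather than real Wikidata QIDs; `collect_matches` is likewise an illustrative name that mirrors the function above without reproducing the module itself.

```python
import pandas as pd


def fake_reconcile(column, type_id, top_res="all"):
    """Hypothetical stand-in for reconcile(): fails for one placeholder type."""
    if type_id == "Q_BAD":
        raise ValueError("unknown type")
    # One row per input value, tagged with the type it was matched against.
    return pd.DataFrame({"input_value": column, "type_id": type_id})


def collect_matches(column, type_qids, reconcile_fn):
    """Gather matches per type, skipping types that raise, as in the source."""
    all_matches = []
    for type_qid in type_qids:
        try:
            all_matches.append(reconcile_fn(column, type_id=type_qid, top_res="all"))
        except Exception:
            continue
    if not all_matches:
        return None  # nothing to concatenate
    return pd.concat(all_matches)


cells = pd.Series(["T cells", "Neurons"])
matches = collect_matches(cells, ["Q_A", "Q_BAD", "Q_B"], fake_reconcile)
nothing = collect_matches(cells, ["Q_BAD"], fake_reconcile)

print(matches)
print(nothing)
```

Since a failing type is dropped without any message, a caller cannot tell "no matches" from "every request failed"; both cases yield None, so checking the result before using it is advisable.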