Item quality module
summarize_matches()
Return a simple summary table for item matches, regardless of category.
The summary table has the following structure:
n_unique_matches is the number of uniquely named input values that matched.
perc_matched is the percentage of uniquely named input values that matched.
n_item_matches is the number of unique items they matched against, including duplicate concepts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dfs_list | list(DataFrame) | A list of dataframes with the histology-related reconciled tables (cells, tissues and organs). | required |
total_list | list(int) | A list of the number of original input values. | required |
Returns:
Type | Description |
---|---|
DataFrame | A summary table for the matches. |
Source code in wikidata_panglaodb/quality.py
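The three metrics described above can be sketched with pandas as follows. This is only an illustration, not the module's implementation: the function name `summarize_matches_sketch`, the column names `input_value` and `id`, the one-row-per-table layout, and the sample data are all assumptions.

```python
import pandas as pd


def summarize_matches_sketch(dfs_list, total_list, input_col="input_value", item_col="id"):
    """Build one summary row per reconciled table (hypothetical column names)."""
    rows = []
    for df, total in zip(dfs_list, total_list):
        # Keep only rows where an item was actually matched.
        matched = df.dropna(subset=[item_col])
        n_unique = matched[input_col].nunique()
        rows.append(
            {
                "n_unique_matches": n_unique,
                # Percentage relative to the number of original input values.
                "perc_matched": round(100 * n_unique / total, 2),
                # Distinct items, so one input matching two items counts twice.
                "n_item_matches": matched[item_col].nunique(),
            }
        )
    return pd.DataFrame(rows)


# Hand-written sample data with placeholder identifiers.
cells = pd.DataFrame(
    {"input_value": ["T cell", "B cell", "B cell"], "id": ["Q1", "Q2", "Q3"]}
)
tissues = pd.DataFrame({"input_value": ["liver"], "id": ["Q4"]})

summary = summarize_matches_sketch([cells, tissues], [4, 2])
print(summary)
```

Here "B cell" matches two different items, so the first table has 2 unique matches but 3 item matches.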
summarize_histology()
Return a summary table for item matches in the histology category.
The summary table has the following structure:
n_unique_matches is the number of uniquely named input values that matched.
perc_matched is the percentage of uniquely named input values that matched.
n_item_matches is the number of unique items they matched against, including duplicate concepts.
how_many_perfect_matches is the percentage of matched items that got a perfect 100.0 score in the reconciliation.
how_many_no_p31 is the percentage of matched items that don't have an 'instance of' property.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dfs_list | list(DataFrame) | A list of dataframes with the histology-related reconciled tables (cells, tissues and organs). | required |
total_list | list(int) | A list of the number of original input values. | required |
Returns:
Type | Description |
---|---|
DataFrame | A summary table for the matches. |
Source code in wikidata_panglaodb/quality.py
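The two histology-specific metrics could be computed along these lines. This is a sketch under assumptions: the helper name `extra_histology_metrics`, the column names `score` and `instance_of`, and the sample data are hypothetical, and the table is assumed to be non-empty with one row per matched item.

```python
import pandas as pd


def extra_histology_metrics(df, score_col="score", p31_col="instance_of"):
    """Percentage of perfect-score matches and of items lacking 'instance of'."""
    n_items = len(df)  # assumes a non-empty table
    perfect = (df[score_col] == 100.0).sum()
    # A missing value in the (hypothetical) instance_of column means no P31.
    no_p31 = df[p31_col].isna().sum()
    return {
        "how_many_perfect_matches": round(100 * perfect / n_items, 2),
        "how_many_no_p31": round(100 * no_p31 / n_items, 2),
    }


# Hand-written sample data with placeholder identifiers.
organs = pd.DataFrame(
    {
        "id": ["Q1", "Q2", "Q3", "Q4"],
        "score": [100.0, 100.0, 87.5, 100.0],
        "instance_of": ["Q5", None, None, "Q6"],
    }
)

metrics = extra_histology_metrics(organs)
print(metrics)
```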
get_number_of_statements_for_items()
Return a pandas dataframe of items and their number of statements
This function takes in a list of QIDs and uses the Wikibase API to return a table with the number of statements each item has.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
qid_list | list | A list containing the QIDs you want to analyse. | required |
Returns:
Type | Description |
---|---|
DataFrame | A dataframe of two columns, one for the input QIDs, another with the number of statements for each QID. |
Source code in wikidata_panglaodb/quality.py
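As an offline illustration of the counting step, the sketch below tallies statements from an already parsed response. It assumes a structure like the `entities` object returned by the Wikibase `wbgetentities` action, where each entity carries a `claims` mapping from property IDs to lists of statements. The function name `count_statements`, the output column names, and the sample data are assumptions; the network request is left out, and the module itself may fetch and count differently.

```python
import pandas as pd


def count_statements(entities):
    """Count statements per entity in a wbgetentities-style `entities` dict."""
    rows = []
    for qid, entity in entities.items():
        # `claims` maps each property ID to a list of statements.
        claims = entity.get("claims", {})
        n_statements = sum(len(statements) for statements in claims.values())
        rows.append({"qid": qid, "n_statements": n_statements})
    return pd.DataFrame(rows, columns=["qid", "n_statements"])


# Hand-written stand-in for a parsed API response (not real Wikidata content).
sample_entities = {
    "Q1": {"claims": {"P31": [{}, {}], "P279": [{}]}},
    "Q2": {"claims": {}},
}

counts = count_statements(sample_entities)
print(counts)
```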
get_genes_item_quality()
Change ID columns to binary values
Changes the alternative ID columns in the reconciled gene data to binary values. Also drops unnecessary columns if drop == True.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Reconciled gene data. | required |
drop | bool | Whether or not to drop the other columns. | True |
Returns:
Type | Description |
---|---|
DataFrame | Simplified dataframe for plotting. |
Source code in wikidata_panglaodb/quality.py
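A minimal sketch of the binary conversion, assuming that a missing value means the item has no such identifier. The helper name `to_binary_id_columns`, the `keep_cols` parameter, and the column names `entrez_id` and `ensembl_id` are hypothetical, chosen only for illustration.

```python
import pandas as pd


def to_binary_id_columns(df, id_cols, keep_cols=("id",), drop=True):
    """Replace each ID column with 1 (value present) or 0 (value missing)."""
    out = df.copy()
    for col in id_cols:
        out[col] = out[col].notna().astype(int)
    if drop:
        # Keep only the item identifier and the binary ID columns.
        out = out[list(keep_cols) + list(id_cols)]
    return out


# Hand-written sample data with placeholder identifiers.
genes = pd.DataFrame(
    {
        "id": ["Q1", "Q2"],
        "name": ["gene A", "gene B"],
        "entrez_id": ["1234", None],
        "ensembl_id": [None, None],
    }
)

simplified = to_binary_id_columns(genes, ["entrez_id", "ensembl_id"])
print(simplified)
```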
aggregate_altID_data()
Summarize alternative IDs for final bar plot
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | Dataframe with either the histological or gene data. | required |
group | list | Columns to use for the groupby. | required |
Returns:
Type | Description |
---|---|
DataFrame | Summarized counts and percentages. |
Source code in wikidata_panglaodb/quality.py
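One way to produce grouped counts and percentages with pandas is sketched below. The helper name `aggregate_counts`, the output column names `count` and `percentage`, and the long-format sample data are assumptions; in this sketch percentages are taken relative to all rows, which may differ from how the module computes them.

```python
import pandas as pd


def aggregate_counts(dataframe, group):
    """Count rows per group and express each count as a share of all rows."""
    counts = dataframe.groupby(group).size().reset_index(name="count")
    counts["percentage"] = (100 * counts["count"] / counts["count"].sum()).round(2)
    return counts


# Hand-written sample: one row per item, flagging whether it has the ID.
long_ids = pd.DataFrame(
    {
        "id_type": ["entrez", "entrez", "entrez", "entrez"],
        "has_id": [1, 1, 1, 0],
    }
)

summary = aggregate_counts(long_ids, ["id_type", "has_id"])
print(summary)
```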