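This function processes a tibble that contains accession and accession_source columns. For each row with accession_source == "UniProt", it retrieves the entry from the UniProt API and writes the parsed values to the entry_name, protein_name, and gene_name columns, creating these columns if they do not exist. A value is written only when the parsed value is neither NULL nor NA, so existing values are never replaced by missing ones. Rows from other sources are skipped.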
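The update rule described above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: `update_if_present()` is a hypothetical helper, and the `row` and `parsed` lists stand in for one tibble row and one parsed UniProt response.

```r
# Hypothetical helper (not part of the package): keep `current` unless
# `parsed` carries a usable value, i.e. one that is neither NULL nor NA.
update_if_present <- function(current, parsed) {
  if (is.null(parsed) || length(parsed) == 0 || is.na(parsed[[1]])) {
    return(current)
  }
  parsed[[1]]
}

# Stand-in for one row of the tibble
row <- list(
  entry_name   = "GORS2_RAT",
  protein_name = NA_character_,
  gene_name    = "Gorasp2"
)

# Stand-in for the values parsed from a UniProt response
parsed <- list(
  entry_name   = NULL,                                   # missing: keep old value
  protein_name = "Golgi reassembly-stacking protein 2",  # usable: write it
  gene_name    = NA_character_                           # NA: keep old value
)

# Apply the rule column by column
for (col in names(row)) {
  row[[col]] <- update_if_present(row[[col]], parsed[[col]])
}

str(row)
```

In this sketch only protein_name changes, because it is the only column for which the parsed value is usable.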
Usage
process_tibble_uniprot(
  data,
  accession_col = "accession",
  accession_source_col = "accession_source",
  entry_name_col = "entry_name",
  protein_name_col = "protein_name",
  gene_name_col = "gene_name"
)
Arguments
- data
A tibble containing at least accession and accession_source columns.
- accession_col
The column name for accession numbers (default: "accession").
- accession_source_col
The column name for accession sources (default: "accession_source").
- entry_name_col
The column name for entry names (default: "entry_name").
- protein_name_col
The column name for protein names (default: "protein_name").
- gene_name_col
The column name for gene names (default: "gene_name").
Examples
# Example usage:
# \donttest{
# Load necessary library
library(tibble)
# Reduced example data as an R tibble
test_data <- tibble::tibble(
  id = c(1, 78, 83, 87),
  species = c("mouse", "mouse", "rat", "mouse"),
  sample_type = c("brain", "brain", "brain", "brain"),
  accession = c("O88737", "O35927", "Q9R064", "P51611"),
  accession_source = c("OtherDB", "UniProt", "UniProt", "UniProt"),
  entry_name = c("BSN_MOUSE", NA, "GORS2_RAT", NA),
  protein_name = c("Protein bassoon", NA, "Golgi reassembly-stacking protein2", NA),
  gene_name = c("Bsn", NA, "Gorasp2", NA)
)
# Process the tibble
result_data <- process_tibble_uniprot(test_data)
#> Skipping non-UniProt row 1
#> Processing accession: O35927
#> ℹ Sending request to UniProt for accession: O35927
#> ✔ Successfully retrieved data for O35927
#> ℹ Parsing UniProt data...
#> ✔ Entry name retrieved: CTND2_MOUSE
#> ✔ Protein name retrieved: Catenin delta-2
#> ✔ Gene name retrieved: Ctnnd2
#> ℹ Parsing completed.
#> Data updated for row 2
#> Processing accession: Q9R064
#> ℹ Sending request to UniProt for accession: Q9R064
#> ✔ Successfully retrieved data for Q9R064
#> ℹ Parsing UniProt data...
#> ✔ Entry name retrieved: GORS2_RAT
#> ✔ Protein name retrieved: Golgi reassembly-stacking protein 2
#> ✔ Gene name retrieved: Gorasp2
#> ℹ Parsing completed.
#> Data updated for row 3
#> Processing accession: P51611
#> ℹ Sending request to UniProt for accession: P51611
#> ✔ Successfully retrieved data for P51611
#> ℹ Parsing UniProt data...
#> ✔ Entry name retrieved: HCFC1_MESAU
#> ✔ Protein name retrieved: Host cell factor 1
#> ✔ Gene name retrieved: HCFC1
#> ℹ Parsing completed.
#> Data updated for row 4
#> ✔ Processing completed for all rows.
# Compare the original and processed tibbles
compare_tibbles_uniprot(test_data, result_data)
#> Row 1: No changes detected.
#> Row 2: entry_name updated from NA to CTND2_MOUSE
#> Row 2: protein_name updated from NA to Catenin delta-2
#> Row 2: gene_name updated from NA to Ctnnd2
#> Row 3: protein_name updated from Golgi reassembly-stacking protein2 to Golgi
#> reassembly-stacking protein 2
#> Row 4: entry_name updated from NA to HCFC1_MESAU
#> Row 4: protein_name updated from NA to Host cell factor 1
#> Row 4: gene_name updated from NA to HCFC1
#> Comparison completed.
#> [1] "Row 1: No changes detected."
#> [2] "Row 2: entry_name updated from NA to CTND2_MOUSE"
#> [3] "Row 2: protein_name updated from NA to Catenin delta-2"
#> [4] "Row 2: gene_name updated from NA to Ctnnd2"
#> [5] "Row 3: protein_name updated from Golgi reassembly-stacking protein2 to Golgi reassembly-stacking protein 2"
#> [6] "Row 4: entry_name updated from NA to HCFC1_MESAU"
#> [7] "Row 4: protein_name updated from NA to Host cell factor 1"
#> [8] "Row 4: gene_name updated from NA to HCFC1"
#> [9] "Comparison completed."
# }
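The "Parsing UniProt data..." step in the log above can be illustrated with a small base R sketch. It assumes the entry has already been fetched and parsed into a nested list shaped like the UniProtKB JSON format (fields `uniProtkbId`, `proteinDescription`, `genes`). The helpers `pluck_or_na()` and `parse_uniprot_entry()` are hypothetical, and the package's own parsing may differ. The mock entry reuses the values shown for O35927 in the output above, so no network access is needed.

```r
# Hypothetical helper: walk a nested list key by key and return NA
# as soon as a level is missing, instead of raising an error.
pluck_or_na <- function(x, ...) {
  for (key in list(...)) {
    if (!is.list(x)) return(NA_character_)
    if (is.numeric(key) && length(x) < key) return(NA_character_)
    x <- x[[key]]
  }
  if (is.null(x)) NA_character_ else as.character(x)
}

# Hypothetical parser: extract the three fields this function fills in.
# Assumes a list shaped like a UniProtKB JSON entry.
parse_uniprot_entry <- function(entry) {
  list(
    entry_name   = pluck_or_na(entry, "uniProtkbId"),
    protein_name = pluck_or_na(entry, "proteinDescription",
                               "recommendedName", "fullName", "value"),
    gene_name    = pluck_or_na(entry, "genes", 1, "geneName", "value")
  )
}

# Mock entry mirroring the values reported for accession O35927 above
entry <- list(
  primaryAccession = "O35927",
  uniProtkbId = "CTND2_MOUSE",
  proteinDescription = list(
    recommendedName = list(fullName = list(value = "Catenin delta-2"))
  ),
  genes = list(list(geneName = list(value = "Ctnnd2")))
)

parsed <- parse_uniprot_entry(entry)
str(parsed)

# An entry without gene information yields NA rather than an error
no_gene <- parse_uniprot_entry(list(uniProtkbId = "EXAMPLE_ENTRY"))
```

Returning NA for missing fields fits the documented behaviour: a missing parsed value leaves the existing column value untouched.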