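This function processes a tibble that contains accession and accession_source columns. For each row where accession_source == "UniProt", it retrieves the entry from the UniProt API and writes the parsed values into the entry_name, protein_name, and gene_name columns, creating any of those columns that do not yet exist. A value is written only if the parsed result is neither NULL nor NA, so existing entries are never replaced by missing values. Rows with any other accession source are skipped.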
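The "write only if not NULL or NA" rule can be illustrated with a minimal sketch. The helper update_if_present() below is hypothetical and is not part of the package; it only mirrors the rule described above, under the assumption that each parsed field is a single value.

```r
# Hypothetical helper (not part of the package): keep `current` unless `new`
# carries a usable value, i.e. it is not NULL, not empty, and not NA.
update_if_present <- function(current, new) {
  if (is.null(new) || length(new) == 0 || is.na(new[[1]])) {
    return(current)
  }
  new[[1]]
}

# A parsed value replaces a missing entry
update_if_present(NA_character_, "CTND2_MOUSE")
#> [1] "CTND2_MOUSE"

# A NULL or NA parsed value leaves the existing entry untouched
update_if_present("BSN_MOUSE", NULL)
#> [1] "BSN_MOUSE"
update_if_present("BSN_MOUSE", NA_character_)
#> [1] "BSN_MOUSE"
```

The actual function may implement this check differently; the sketch only shows why a failed or incomplete lookup does not erase data that is already present.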

Usage

process_tibble_uniprot(
  data,
  accession_col = "accession",
  accession_source_col = "accession_source",
  entry_name_col = "entry_name",
  protein_name_col = "protein_name",
  gene_name_col = "gene_name"
)

Arguments

data

A tibble containing at least the accession and accession source columns (named accession and accession_source by default).

accession_col

The column name for accession numbers (default: "accession").

accession_source_col

The column name for accession sources (default: "accession_source").

entry_name_col

The column name for entry names (default: "entry_name").

protein_name_col

The column name for protein names (default: "protein_name").

gene_name_col

The column name for gene names (default: "gene_name").
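When the input uses different column names, the *_col arguments can point the function at them. The following is a sketch only: the column names acc and db are invented for illustration, and the call is guarded because it needs the package loaded and network access to UniProt.

```r
library(tibble)

# Hypothetical input: the column names `acc` and `db` are made up for this sketch
my_data <- tibble::tibble(
  acc = c("O35927", "Q9R064"),
  db  = c("UniProt", "UniProt")
)

# Runs only when process_tibble_uniprot() is available in the session;
# the call itself contacts the UniProt API.
if (exists("process_tibble_uniprot")) {
  result <- process_tibble_uniprot(
    my_data,
    accession_col = "acc",
    accession_source_col = "db"
  )
}
```

The remaining arguments keep their defaults here, so the retrieved values would be expected in columns named entry_name, protein_name, and gene_name.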

Value

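The input tibble, with the entry name, protein name, and gene name columns updated (or created) for the rows whose accession source is "UniProt". All other rows and columns are returned unchanged.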

Examples

# Example usage:
# \donttest{
# Load necessary library
library(tibble)

# Reduced example data as an R tibble
test_data <- tibble::tibble(
  id = c(1, 78, 83, 87),
  species = c("mouse", "mouse", "rat", "mouse"),
  sample_type = c("brain", "brain", "brain", "brain"),
  accession = c("O88737", "O35927", "Q9R064", "P51611"),
  accession_source = c("OtherDB", "UniProt", "UniProt", "UniProt"),
  entry_name = c("BSN_MOUSE", NA, "GORS2_RAT", NA),
  protein_name = c("Protein bassoon", NA, "Golgi reassembly-stacking protein2", NA),
  gene_name = c("Bsn", NA, "Gorasp2", NA)
)

# Process the tibble
result_data <- process_tibble_uniprot(test_data)
#> Skipping non-UniProt row 1
#> Processing accession: O35927
#>  Sending request to UniProt for accession: O35927
#>  Successfully retrieved data for O35927
#>  Parsing UniProt data...
#>  Entry name retrieved: CTND2_MOUSE
#>  Protein name retrieved: Catenin delta-2
#>  Gene name retrieved: Ctnnd2
#>  Parsing completed.
#> Data updated for row 2
#> Processing accession: Q9R064
#>  Sending request to UniProt for accession: Q9R064
#>  Successfully retrieved data for Q9R064
#>  Parsing UniProt data...
#>  Entry name retrieved: GORS2_RAT
#>  Protein name retrieved: Golgi reassembly-stacking protein 2
#>  Gene name retrieved: Gorasp2
#>  Parsing completed.
#> Data updated for row 3
#> Processing accession: P51611
#>  Sending request to UniProt for accession: P51611
#>  Successfully retrieved data for P51611
#>  Parsing UniProt data...
#>  Entry name retrieved: HCFC1_MESAU
#>  Protein name retrieved: Host cell factor 1
#>  Gene name retrieved: HCFC1
#>  Parsing completed.
#> Data updated for row 4
#>  Processing completed for all rows.

# Compare the original and processed tibbles
compare_tibbles_uniprot(test_data, result_data)
#> Row 1: No changes detected.
#> Row 2: entry_name updated from NA to CTND2_MOUSE
#> Row 2: protein_name updated from NA to Catenin delta-2
#> Row 2: gene_name updated from NA to Ctnnd2
#> Row 3: protein_name updated from Golgi reassembly-stacking protein2 to Golgi
#> reassembly-stacking protein 2
#> Row 4: entry_name updated from NA to HCFC1_MESAU
#> Row 4: protein_name updated from NA to Host cell factor 1
#> Row 4: gene_name updated from NA to HCFC1
#> Comparison completed.
#> [1] "Row 1: No changes detected."                                                                               
#> [2] "Row 2: entry_name updated from NA to CTND2_MOUSE"                                                          
#> [3] "Row 2: protein_name updated from NA to Catenin delta-2"                                                    
#> [4] "Row 2: gene_name updated from NA to Ctnnd2"                                                                
#> [5] "Row 3: protein_name updated from Golgi reassembly-stacking protein2 to Golgi reassembly-stacking protein 2"
#> [6] "Row 4: entry_name updated from NA to HCFC1_MESAU"                                                          
#> [7] "Row 4: protein_name updated from NA to Host cell factor 1"                                                 
#> [8] "Row 4: gene_name updated from NA to HCFC1"                                                                 
#> [9] "Comparison completed."                                                                                     
# }