| Title: | BLAST and Sequence Analysis Tools |
| Version: | 0.1.1 |
| Description: | Provides streamlined tools for retrieving sequences from NCBI, performing sequence alignments (pairwise and multiple), and building phylogenetic trees. Implements the Needleman-Wunsch algorithm for global alignment (Needleman & Wunsch (1970) <doi:10.1016/0022-2836(70)90057-4>), Smith-Waterman for local alignment (Smith & Waterman (1981) <doi:10.1016/0022-2836(81)90087-5>), and Neighbor-Joining for tree construction (Saitou & Nei (1987) <doi:10.1093/oxfordjournals.molbev.a040454>). |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | rentrez, ape, Biostrings, dplyr, tibble |
| Suggests: | msa, pwalign, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| URL: | https://github.com/loukesio/blastar |
| BugReports: | https://github.com/loukesio/blastar/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-01-09 20:09:51 UTC; theodosiou |
| Author: | Loukas Theodosiou [aut, cre] |
| Maintainer: | Loukas Theodosiou <theodosiou@evolbio.mpg.de> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-14 18:00:22 UTC |
Align DNA Sequences (Pairwise or Multiple)
Description
Takes a tibble with a "sequence" column (and an optional "accession" column of names) and performs either a pairwise alignment between two of the sequences or a multiple sequence alignment (MSA) across all of them.
Usage
align_sequences(
df,
method = c("pairwise", "msa"),
pairwise_type = "global",
msa_method = "ClustalOmega",
seq_indices = c(1, 2)
)
Arguments
df |
A tibble or data.frame containing at least a "sequence" column of DNA strings; an optional "accession" column supplies the sequence names. |
method |
One of "pairwise" (align two sequences) or "msa" (align all sequences). |
pairwise_type |
For pairwise only, alignment type: "global" (Needleman–Wunsch), "local" (Smith–Waterman), or "overlap". |
msa_method |
For MSA only, method name: "ClustalOmega", "ClustalW", or "Muscle". |
seq_indices |
Integer vector of length 2; indices of the two sequences to align when method = "pairwise". |
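To make the "global" option more concrete, here is a minimal base-R sketch of Needleman-Wunsch scoring. It is not the package's implementation (the Value section suggests the actual alignment is returned as a pwalign object); the function name nw_score and the match, mismatch, and gap values are illustrative assumptions, and only the optimal score is computed, not the traceback.

```r
# Minimal Needleman-Wunsch scoring sketch (base R only).
# nw_score and its scoring values are hypothetical, chosen for illustration.
nw_score <- function(a, b, match_score = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # score[i + 1, j + 1] is the best score for aligning x[1:i] with y[1:j]
  score <- matrix(0, nrow = n + 1, ncol = m + 1)
  score[, 1] <- gap * (0:n)  # x prefix aligned against gaps only
  score[1, ] <- gap * (0:m)  # y prefix aligned against gaps only

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      substitution <- if (x[i] == y[j]) match_score else mismatch
      score[i + 1, j + 1] <- max(
        score[i, j] + substitution,  # align x[i] with y[j]
        score[i, j + 1] + gap,       # gap in y
        score[i + 1, j] + gap        # gap in x
      )
    }
  }
  score[n + 1, m + 1]
}

nw_score("ACGTACGTACGT", "ACGTACGTTTGT")
```

A local (Smith-Waterman) variant differs mainly in that the first row and column are zero, cell scores are floored at zero, and the result is the maximum over the whole matrix rather than the bottom-right cell.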
Value
If method = "pairwise", a list with:
- alignment: a PairwiseAlignmentsSingleSubject object
- pid: percent identity (numeric)
If method = "msa", an object of class MsaDNAMultipleAlignment or similar.
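As a rough illustration of what a percent-identity value measures, the base-R sketch below computes it from two already-aligned strings. The helper percent_identity is hypothetical, and its denominator (all alignment columns, gaps included) is an assumption: several conventions exist, and pwalign::pid() exposes a choice among them through its type argument, so the value returned in pid may be defined differently.

```r
# Percent identity from two aligned strings of equal length ("-" marks a gap).
# Assumption: the denominator is the total number of alignment columns.
percent_identity <- function(aln_a, aln_b) {
  a <- strsplit(aln_a, "")[[1]]
  b <- strsplit(aln_b, "")[[1]]
  stopifnot(length(a) == length(b), length(a) > 0)

  # A column counts as identical only if both residues match and are not gaps
  identical_cols <- a == b & a != "-"
  100 * sum(identical_cols) / length(a)
}

percent_identity("ACGTACGTACGT", "ACGTACGTTTGT")
```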
Examples
# Pairwise alignment example (requires pwalign package)
if (requireNamespace("pwalign", quietly = TRUE)) {
data <- data.frame(
accession = c("seq1", "seq2"),
sequence = c("ACGTACGTACGT", "ACGTACGTTTGT"),
stringsAsFactors = FALSE
)
res_pw <- align_sequences(
df = data,
method = "pairwise",
pairwise_type = "global"
)
res_pw$pid
}
# Multiple sequence alignment (requires msa package)
if (requireNamespace("msa", quietly = TRUE)) {
data_msa <- data.frame(
accession = c("seq1", "seq2", "seq3"),
sequence = c("ATGCATGC", "ATGCTAGC", "ATGGATGC")
)
res_msa <- align_sequences(data_msa, method = "msa", msa_method = "ClustalOmega")
print(res_msa)
}
Build a Neighbor-Joining tree from a multiple sequence alignment
Description
This function takes a Multiple Sequence Alignment (MSA) object (e.g., output of
align_sequences(method = "msa")) and generates a Neighbor-Joining (NJ) tree.
Usage
build_nj_tree(msa, model = "raw", pairwise.deletion = TRUE)
Arguments
msa |
A multiple alignment object (class |
model |
Evolutionary model for distance calculation passed to |
pairwise.deletion |
Logical. If TRUE, compute distances with pairwise deletion |
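The effect of the model = "raw" and pairwise.deletion arguments can be sketched in base R. The text above does not show which distance function the arguments are passed to, so this is only an illustration under stated assumptions: "raw" is taken to mean the uncorrected proportion of differing sites, and "-" and "N" are taken to be missing data. The helper raw_distance is hypothetical.

```r
# Uncorrected ("raw") distance: proportion of compared sites that differ.
# Assumptions: equal-length aligned sequences; "-" and "N" are missing data.
raw_distance <- function(seqs, pairwise_deletion = TRUE, missing = c("-", "N")) {
  chars <- do.call(rbind, strsplit(toupper(seqs), ""))  # one row per sequence
  is_missing <- matrix(chars %in% missing, nrow = nrow(chars))

  # Columns with no missing data in any sequence (used for complete deletion)
  globally_ok <- colSums(is_missing) == 0

  n <- nrow(chars)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      keep <- if (pairwise_deletion) {
        !is_missing[i, ] & !is_missing[j, ]  # drop sites missing in this pair only
      } else {
        globally_ok                          # drop sites missing in any sequence
      }
      d[i, j] <- mean(chars[i, keep] != chars[j, keep])
    }
  }
  d
}

aln <- c(seq1 = "ATGCATGC", seq2 = "ATGC-AGC", seq3 = "ATGGATGC")
raw_distance(aln, pairwise_deletion = TRUE)
raw_distance(aln, pairwise_deletion = FALSE)
```

With pairwise deletion, the gap in seq2 only shortens comparisons that involve seq2; without it, that column is dropped from every comparison, so the seq1 to seq3 distance changes even though neither sequence has a gap. A matrix like this could then be handed to a Neighbor-Joining routine such as ape::nj().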
Value
An object of class phylo (NJ tree)
Examples
# Build NJ tree from multiple sequence alignment (requires msa package)
if (requireNamespace("msa", quietly = TRUE)) {
# Create example sequences
df <- data.frame(
accession = c("seq1", "seq2", "seq3"),
sequence = c("ATGCATGC", "ATGCTAGC", "ATGGATGC")
)
# Generate MSA
msa_result <- align_sequences(df, method = "msa", msa_method = "ClustalOmega")
# Build NJ tree
tree <- build_nj_tree(msa_result, model = "raw")
print(tree)
}
Fetch Metadata (and optionally sequence ranges) from NCBI
Description
Retrieves metadata from NCBI for the given accessions and, optionally, restricts the returned sequence to a given range.
Usage
fetch_metadata(accessions, db = c("nuccore", "protein"), seq_range = NULL)
Arguments
accessions |
Character vector of accession numbers. |
db |
Either "nuccore" or "protein". |
seq_range |
Either:
|
Value
A tibble with columns
accession, accession_version, title, organism, sequence
Examples
# Fetch metadata for a nucleotide sequence
result <- fetch_metadata("NM_000546", db = "nuccore")
# Fetch specific sequence range (positions 1-100)
result_range <- fetch_metadata("NM_000546", db = "nuccore", seq_range = c(1, 100))
# Fetch multiple accessions
result_multi <- fetch_metadata(c("NM_000546", "NM_001126"), db = "nuccore")