Exploring DNA Sequence Data Using BLAST — [Bioinformatics Eps.2]
DNA is the molecule that carries the hereditary information behind the traits of every living thing. In cells, DNA is packed into chromosomes.
In this episode we will explore DNA sequence data: we take the DNA sequence of an organism and search for the sequences that are most similar to it.
BLAST, which stands for Basic Local Alignment Search Tool, is an algorithm and program for comparing primary biological sequence information, such as protein amino acid sequences, DNA nucleotide sequences, and RNA sequences.
The NCBI database has the information we will use.
First, we choose the data we will use on the NCBI website. Go to the home page and click on the “top search” section.

In the NCBI database, select the Nucleotide section.

Type the nucleotide data you are looking for into the search field. In this case, the term “sars-cov2” is used to get the genome of the coronavirus. The result we get back is “Severe acute respiratory syndrome coronavirus 2 reference genome”.


Select the genomic data.

Select the FASTA format to observe the DNA sequence.
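The FASTA view can also be read from a script. The sketch below is a minimal illustration, not part of the original walkthrough: it builds an NCBI E-utilities `efetch` URL for a nucleotide record and parses FASTA text into headers and sequences. The accession `NC_045512.2` (the SARS-CoV-2 reference genome) and the helper names `efetch_fasta_url` and `parse_fasta` are assumptions made for this example, and the toy record is made-up data.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for downloading records.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build a URL that returns a nucleotide record as plain FASTA text."""
    params = {
        "db": "nuccore",      # the Nucleotide database
        "id": accession,      # e.g. "NC_045512.2" (assumed accession)
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


def parse_fasta(text):
    """Map each FASTA header (without '>') to its sequence as one string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are wrapped; join them back together.
            records[header] += line.upper()
    return records


# Toy record for illustration only; this is not real genome data.
toy = ">toy_record example sequence\nACGTAC\nGGTA\n"
records = parse_fasta(toy)
for name, seq in records.items():
    print(name, len(seq), seq)
```

Opening the URL returned by `efetch_fasta_url` in a browser should show the same FASTA text as the web page; the download step is left out here so the sketch runs without a network connection.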

To compare the DNA sequence against others, we use the BLAST tool provided. Select the ‘Run BLAST’ option in the right-hand bar of the page, under the ‘Analyze this sequence’ section.

On the BLAST page, the query sequence is already filled in. Click the BLAST button to start the search.
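For readers who prefer a script to the web form, NCBI also exposes BLAST through a URL API. The sketch below only builds the form-encoded body of a submission request and does not send it. The choices of `blastn`, the `nt` database, and the accession `NC_045512.2` are assumptions for illustration and may not match the defaults the web page selects; `build_blast_submission` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(query, program="blastn", database="nt"):
    """Return the form-encoded body for a BLAST 'Put' (submit) request.

    query can be an accession or a raw sequence. The program and database
    defaults are assumptions for this sketch.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return urlencode(params)


body = build_blast_submission("NC_045512.2")
print(BLAST_URL)
print(body)
```

Sending this body to `BLAST_URL` returns a request ID that is later used to fetch the results; that polling step is omitted here. NCBI asks scripted users to keep request rates low.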

After BLAST has run, the results page lists the database sequences that match the query genome we chose earlier.

The results contain a table with several columns. It shows how similar each database sequence is to the genome we submitted (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome). For instance, the hits include:
- ‘Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/KAZ/KZ_Almaty/2020, complete genome’, is the coronavirus genome data obtained from Kazakhstan samples
- ‘Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/UT-UPHL-2102921286/2020, complete genome’, is data obtained from USA samples, etc.
The result table has several columns. Let’s discuss them one by one.
- Description. The name of the genome data in the database is shown in this column.
- Scientific Name. This is a column that lists the scientific name of the organism that was used to get the genome data.
- Max Score. The highest alignment score of any single aligned segment between the query and the database sequence. Matching bases add to the score, while mismatches and gaps subtract from it.
- Total Score. The sum of the alignment scores of all aligned segments from the same database sequence. When the query aligns in a single segment, the total score equals the max score. The higher the score, the more similar the sequences.
- Query Cover. The percentage of the query length that is covered by the aligned segments.
- E-value. A parameter that says how many hits are “expected” to be found by chance when searching a database of a specific size. It decreases exponentially as the Score (S) of the match goes up. The E-value is a way to describe the random noise in the background. The match is more “significant” when the E-value is low or close to zero.
- Percent Identity. The percentage of aligned bases that are identical between the query and this database sequence.
- Acc Length. The length, in bases, of the database sequence that the accession refers to.
- Accession. A unique identifier for the sequence record in the NCBI database. The accession number stays the same when the sequence is revised, and it is the primary way to find the sequence data in the database.
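To make the numeric columns more concrete, here is a small sketch that computes them on toy values. It is a simplified illustration, not BLAST's actual implementation: the alignment, the segment scores, and the lengths are made up, the function names are hypothetical, and the E-value uses the textbook bit-score relation E = m × n × 2^(−S′) with raw lengths, whereas BLAST applies corrected "effective" lengths.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences share the same base."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(query_start, query_end, query_length):
    """Percentage of the query spanned by one aligned segment (1-based, inclusive)."""
    return 100.0 * (query_end - query_start + 1) / query_length


def e_value(bit_score, query_length, database_length):
    """Expected number of chance hits: E = m * n * 2**(-S'), S' being the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Toy alignment ('-' marks a gap); not real genome data.
aligned_q = "ACGT-ACGTA"
aligned_s = "ACGTTACCTA"
print(percent_identity(aligned_q, aligned_s))   # 8 of 10 columns match

# Toy scores for two aligned segments of the same database sequence.
segment_scores = [50.0, 30.0]
print(max(segment_scores))   # corresponds to Max Score
print(sum(segment_scores))   # corresponds to Total Score

print(query_cover(1, 90, 100))   # segment spans 90 of 100 query bases

# A higher bit score gives a much smaller E-value for the same search space.
print(e_value(10, 4, 256), e_value(20, 4, 256))
```

Note that `query_cover` handles a single segment only; with several segments, overlapping regions of the query would need to be merged before computing the percentage.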
There is much more data to explore in the primary database at NCBI.
That’s all for this episode. Happy learning!