AMR Data Analysis Using Stata

Posted: 22 December 2021
Author: Julhas Sujan

About AMR and Stata

Stata, a statistical package from StataCorp, can help us analyze antimicrobial resistance (AMR) data. We can use frequency tables, cross tabulation, and Pearson's chi-squared test to examine distributions and associations, and bar, pie, line, box, and histogram plots, as well as regression analysis, to explore the data further.

Outline

  • Stata installation
  • Import CSV/Excel data
  • Data management
  • Frequency check
  • Number of isolates by year
  • Cross tabulation
  • Pearson's Chi-squared test
  • Plotting results
  • Yearly antibiotics by organism

1. Stata installation

Visit the following website and follow the installation steps: https://julhas.com/jsedutech/materials/Level-1/Stata-Session-1.pdf

2. Import CSV/Excel data

Visit the following website and follow the steps: https://julhas.com/jsedutech/materials/Level-1/Stata-Session-3.pdf
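
For readers who prefer typing commands, the import can also be sketched as follows. This is a minimal example, not the method from the linked session: the file names amr_data.csv and amr_data.xlsx are hypothetical and should be replaced with your own.

```stata
* Hypothetical file names; adjust the path to your own data.
* Read a CSV file whose first row holds the variable names.
import delimited using "amr_data.csv", varnames(1) clear

* Alternatively, read the first sheet of an Excel workbook,
* treating its first row as variable names.
* import excel using "amr_data.xlsx", firstrow clear

* Check what was loaded: variable names, storage types, and row count.
describe
count
```

Running describe straight after the import is a quick way to spot a column that was read as text when numbers were expected.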

3. Data management

Visit the following website and follow the steps: https://julhas.com/jsedutech/materials/Level-1/Stata-Session-6.pdf
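
As a rough illustration of typical cleaning steps, the sketch below normalizes the sex codes, checks for duplicates and missing values, and builds an age-group variable. It assumes that sex is a string variable, that age is recorded in whole years, and that ageGroup1 does not exist yet; the cut points are inferred from the group labels used later in this post and may differ from the author's.

```stata
* Run from a do-file (the /// line continuations need one).
* Normalize sex codes such as " F" or "M " to "f" and "m".
replace sex = lower(strtrim(sex))

* Report exact duplicate rows, then summarize missing values
* in the numeric variables.
duplicates report
misstable summarize

* Build labeled age groups; missing ages stay missing.
recode age (min/4 = 1 "<=4 Years")   (5/14 = 2 "5-14 Years")   ///
           (15/24 = 3 "15-24 Years") (25/34 = 4 "25-34 Years") ///
           (35/44 = 5 "35-44 Years") (45/54 = 6 "45-54 Years") ///
           (55/64 = 7 "55-64 Years") (65/max = 8 ">= 65 Years"), ///
           generate(ageGroup1)
```

If age contains fractional values, the integer ranges above would leave gaps (for example 4.5), so the rules would need adjusting.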

4. Frequency check

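Frequency check: A frequency distribution is a list, table, or graph that shows how often each value of a variable occurs in a sample. The commands below count the isolates and tabulate sex, year, sample type, and organism; age group (ageGroup1) can be tabulated the same way.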

Commands: 

count
tab sex
tab year
tab sample
tab organism

Output: 
. count
  23,411

. tab sex
        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          f |     13,568       57.96       57.96
          m |      9,843       42.04      100.00
------------+-----------------------------------
      Total |     23,411      100.00

. tab year
       Year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2017 |      6,878       29.38       29.38
       2018 |      8,072       34.48       63.86
       2019 |      8,461       36.14      100.00
------------+-----------------------------------
      Total |     23,411      100.00

. tab sample
     Sample |      Freq.     Percent        Cum.
------------+-----------------------------------
      Blood |      1,214        5.19        5.19
        Pus |      5,753       24.57       29.76
     Sputum |      1,079        4.61       34.37
      Stool |          2        0.01       34.38
      Urine |     12,814       54.73       89.11
 Wound Swab |      2,549       10.89      100.00
------------+-----------------------------------
      Total |     23,411      100.00

. tab organism

             Organism |      Freq.     Percent        Cum.
----------------------+-----------------------------------
     Escherichia coli |     11,887       50.78       50.78
       Klebsiella sp. |      7,709       32.93       83.70
Staphylococcus aureus |      3,815       16.30      100.00
----------------------+-----------------------------------
                Total |     23,411      100.00
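
The four tables above can also be produced with a single command. The sketch below additionally shows two tabulate options that are often useful for a first look at the data; it uses only the variable names that appear in this post.

```stata
* One-way tables for several variables in a single command.
tab1 sex year sample organism

* Sort categories by descending frequency and show missing values
* as their own row, so unrecorded organisms are not silently dropped.
tabulate organism, sort missing
```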


5. Number of isolates by year

Commands: 

count if year == 2019
tab amk if year == 2017

Output: 

. count if year == 2019
  8,461

. tab amk if year == 2017

        AMK |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |          2        0.03        0.03
          R |        999       14.91       14.94
          S |      5,697       85.06      100.00
------------+-----------------------------------
      Total |      6,698      100.00
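
Note that the 2017 AMK table totals 6,698 although 6,878 isolates were recorded in 2017: tabulate leaves out observations with a missing value unless told otherwise, so 180 isolates have no AMK result. A minimal sketch for making that gap visible, using the variable names from this post:

```stata
* Isolate counts for every year at once.
bysort year: count

* Include isolates without an AMK result as a separate row.
tabulate amk if year == 2017, missing

* Count the 2017 isolates with no AMK result
* (missing() works for both string and numeric variables).
count if year == 2017 & missing(amk)
```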

6. Cross tabulation

Cross tabulation: A cross tabulation (contingency table) shows the joint frequency distribution of two categorical variables, which makes it easier to examine the relationship between them.

Commands: 
tab sex year 
tab organism year, col nofreq // Display only percentage
tab sample year, col // Number and percentage

Output: 

. tab sex year 

           |               Year
       Sex |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         f |     4,038      4,602      4,928 |    13,568 
         m |     2,840      3,470      3,533 |     9,843 
-----------+---------------------------------+----------
     Total |     6,878      8,072      8,461 |    23,411 

. tab organism year, col nofreq // Display only percentage

                      |               Year
             Organism |      2017       2018       2019 |     Total
----------------------+---------------------------------+----------
     Escherichia coli |     54.04      49.55      49.28 |     50.78 
       Klebsiella sp. |     28.58      34.64      34.83 |     32.93 
Staphylococcus aureus |     17.37      15.81      15.88 |     16.30 
----------------------+---------------------------------+----------
                Total |    100.00     100.00     100.00 |    100.00 

. tab sample year, col // Number and percentage

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |               Year
    Sample |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
     Blood |       361        399        454 |     1,214 
           |      5.25       4.94       5.37 |      5.19 
-----------+---------------------------------+----------
       Pus |     1,578      1,979      2,196 |     5,753 
           |     22.94      24.52      25.95 |     24.57 
-----------+---------------------------------+----------
    Sputum |       336        376        367 |     1,079 
           |      4.89       4.66       4.34 |      4.61 
-----------+---------------------------------+----------
     Stool |         0          0          2 |         2 
           |      0.00       0.00       0.02 |      0.01 
-----------+---------------------------------+----------
     Urine |     3,993      4,256      4,565 |    12,814 
           |     58.05      52.73      53.95 |     54.73 
-----------+---------------------------------+----------
Wound Swab |       610      1,062        877 |     2,549 
           |      8.87      13.16      10.37 |     10.89 
-----------+---------------------------------+----------
     Total |     6,878      8,072      8,461 |    23,411 
           |    100.00     100.00     100.00 |    100.00 
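
Column percentages answer "what share of each year's isolates came from each sample type". Row percentages and stratified tables answer different questions; the sketch below shows a few variations, using only variables that appear in this post.

```stata
* Row percentages: within each sample type, the share of each organism.
tabulate sample organism, row nofreq

* Frequencies with both row and column percentages in each cell.
tabulate organism year, row column

* The same two-way table repeated separately for each sex.
bysort sex: tabulate organism year, column nofreq
```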

7. Pearson's Chi-squared test

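Pearson's Chi-squared test: Pearson's chi-squared test (χ²) is a statistical test applied to categorical data to evaluate how likely it is that an observed association between two variables arose by chance. In the Stata output, Pr is the p-value; Pr = 0.000 means the p-value rounds to zero at three decimal places (p < 0.0005), not that it is exactly zero.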
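
For reference, the statistic reported by the chi2 option is the standard Pearson statistic. It is written here for a table with r rows and c columns, observed counts O_ij, row totals R_i, column totals C_j, and N observations in total (the symbols are notation chosen for this note, not Stata names):

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N},
\qquad \mathrm{df} = (r - 1)(c - 1)
```

For the organism by sample table in this section (3 rows, 6 columns), df = (3 - 1)(6 - 1) = 10, and for the age group by sample table (8 rows, 6 columns), df = (8 - 1)(6 - 1) = 35, matching chi2(10) and chi2(35) in the output. As a worked cell, the expected count for Escherichia coli in blood is 11,887 × 1,214 / 23,411 ≈ 616.4, against an observed count of 640.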

Commands: 
tabulate ageGroup1 sample, chi2
tabulate organism sample, chi2

Output: 

. tabulate ageGroup1 sample, chi2

            |                              Sample
  ageGroup1 |     Blood        Pus     Sputum      Stool      Urine  Wound S.. |     Total
------------+------------------------------------------------------------------+----------
  <=4 Years |        47         66          0          1        241        202 |       557 
 5-14 Years |        21         54          2          0        103        347 |       527 
15-24 Years |        27        120         23          1        257        329 |       757 
25-34 Years |        34        299         41          0        731        328 |     1,433 
35-44 Years |       117        818         91          0      1,388        275 |     2,689 
45-54 Years |       247      1,580        215          0      2,792        404 |     5,238 
55-64 Years |       337      1,636        299          0      3,503        392 |     6,167 
>= 65 Years |       384      1,180        408          0      3,799        272 |     6,043 
------------+------------------------------------------------------------------+----------
      Total |     1,214      5,753      1,079          2     12,814      2,549 |    23,411 

         Pearson chi2(35) =  3.9e+03   Pr = 0.000

. tabulate organism sample, chi2

                      |                              Sample
             Organism |     Blood        Pus     Sputum      Stool      Urine  Wound S.. |     Total
----------------------+------------------------------------------------------------------+----------
     Escherichia coli |       640      1,041        105          2      9,800        299 |    11,887 
       Klebsiella sp. |       351      2,299        848          0      2,508      1,703 |     7,709 
Staphylococcus aureus |       223      2,413        126          0        506        547 |     3,815 
----------------------+------------------------------------------------------------------+----------
                Total |     1,214      5,753      1,079          2     12,814      2,549 |    23,411 

         Pearson chi2(10) =  9.9e+03   Pr = 0.000
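
The Stool column holds only 2 isolates, so its expected counts are far below 5, the range where the chi-squared approximation is commonly considered unreliable. The sketch below shows one way to inspect expected counts and to check whether the conclusion depends on that sparse column; it assumes sample is a string variable.

```stata
* Show expected counts under each observed count, and add
* Cramer's V as a measure of the strength of association.
tabulate organism sample, chi2 expected V

* tabulate stores its results in r(); display them with more
* digits than the rounded values printed under the table.
display "chi2 = " %12.2f r(chi2) "   p = " %12.4e r(p)

* Sensitivity check: repeat the test without the sparse Stool column.
tabulate organism sample if sample != "Stool", chi2
```

With more than 23,000 isolates, even small differences produce very small p-values, so an effect-size measure such as Cramer's V is a useful complement to the test.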


8. Plotting results


// Histogram 
histogram age

// Box plot
graph box age

// Bar graph 
graph bar (count), over(ageGroup1) stack

// Bar graph with more than one categorical variable
graph bar (count), over(ageGroup1) over(sex) stack

// Pie chart
graph pie, over(sex)
graph pie, over(sex) title(Gender distribution)
graph pie, over(sex) title(Gender distribution) legend(on) scheme(s2color)
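
To reuse the graphs in a report, each one can be exported right after it is drawn. The following is a minimal sketch: the file names and the 5-year bin width are assumptions chosen for illustration.

```stata
* Histogram with counts on the y-axis and 5-year bins (assumed width).
histogram age, frequency width(5) title("Age distribution")

* Save the graph currently in memory as a PNG file
* (hypothetical file name).
graph export "age_histogram.png", replace

* Number of isolates per organism, one group of bars per year.
graph bar (count), over(organism) over(year) ///
    title("Isolates by organism and year")
graph export "organism_by_year.png", replace
```

The /// line continuation works in a do-file; when typing interactively, put the whole command on one line.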

9. Yearly antibiotics by organism

Commands: 
tab amc year if organism == "Escherichia coli", col nofreq
tab amc year if organism == "Klebsiella sp.", col nofreq
tab amc year if organism == "Staphylococcus aureus", col

Output: 
. tab amc year if organism == "Escherichia coli", col nofreq

           |               Year
       AMC |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         R |     75.14      70.07      73.57 |     72.84 
         S |     24.86      29.93      26.43 |     27.16 
-----------+---------------------------------+----------
     Total |    100.00     100.00     100.00 |    100.00 


. tab amc year if organism == "Klebsiella sp.", col nofreq

           |               Year
       AMC |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         R |     68.89      71.05      70.59 |     70.33 
         S |     31.11      28.95      29.41 |     29.67 
-----------+---------------------------------+----------
     Total |    100.00     100.00     100.00 |    100.00 


. tab amc year if organism == "Staphylococcus aureus", col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |               Year
       AMC |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         R |       412        446        530 |     1,388 
           |     38.98      36.44      40.30 |     38.60 
-----------+---------------------------------+----------
         S |       645        778        785 |     2,208 
           |     61.02      63.56      59.70 |     61.40 
-----------+---------------------------------+----------
     Total |     1,057      1,224      1,315 |     3,596 
           |    100.00     100.00     100.00 |    100.00
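
Repeating the same command for every organism and antibiotic quickly becomes tedious. The sketch below shows some ways to shorten it. It assumes amc is a string variable coded "R" and "S" as in the tables above, and amc_r is a hypothetical helper variable introduced only for this example.

```stata
* AMC tables for all three organisms in one command.
bysort organism: tabulate amc year, column nofreq

* Loop over several antibiotics for a single organism.
foreach abx of varlist amk amc {
    tabulate `abx' year if organism == "Escherichia coli", column nofreq
}

* Indicator: 1 = resistant, 0 = tested but not resistant,
* missing = no AMC result.
generate byte amc_r = (amc == "R") if !missing(amc)

* Proportion resistant for each year and organism in one table.
tabulate year organism, summarize(amc_r) means
```

Because amc_r is 0 or 1, its mean in each cell is the proportion of tested isolates that were resistant, which should agree with the R rows in the tables above.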