Number of Acyclic Alkane Isomers

Introduction

At the organic chemistry lecture sometimes this question arises: how to find out how many isomers does have an open-chain alkane CnH2n+2? The question has no simple answer. Despite some youtube chemistry teachers offering a ready magic formula, the fact is that the number of isomers has be count separately for each n. In practice, this means generating every possible isomer for a given number of C-atoms. Please refer to the proffesional sources if you need the specific details covering this issue [1,2].

The Goal

I wanted to design a simple and comprehensible algorithm for the demonstration purposes. The algorithm should allow counting all possible alkane isomers and could be recreated, modified and employed by users without a deep knowledge of the mathematical sciences. To simplify the task, the stereoisomers will be ignored.

The Method 

In order to calculate all combination of the isomers, I have built the logical scheme shown at the flowchart (Fig. 1) and described below.
Fig. 1. Generation of every possible isomer with 4 carbon atoms from propane and filtering the set. 


1. Starting from propane (it is the largest alkane having one single isomer), add a methyl group to the already existing carbon atom of the propane molecule. This new structure is saved into a set. Then iterate through each C-atom in propane, each time producing a new isomer and stacking it to the set of structures.
Although it is quite demanding in terms of the calculation time, this way we will be sure that no isomer has been skipped. 
2. The obtained set is reduced to contain only unique structures. There is only two isomers for butane. 
3. Now to calculate all isomers of the next homologue, pentane, both isobutane and n-butane are treated as above, producing two sets which are combined into a single set and the latter is filtered to remove the duplicates.
4. This procedure is repeated until the desired alkane size is reached.

Implementation

The starting point for the searching of the appropriate method was the question: how to represent and store an alkane molecule in the computer memory? After a few unsuccessfull attempts, I have accidentally found the library igraph in R and could not help noticing that graphs are strikingly resembling chemical structures. For instance, the following code will produce the isopentane structure (2-methylbutane)[3]:

library(igraph)
g <- make_graph(c(1, 2, 2, 3, 2, 4, 3, 5), directed = FALSE)
plot(g, vertex.label=NA, vertex.size = 25)

Fig. 2. A graph plotted in R representing the chemical structure of 2-methyl butane 

Library igraph has been created to deal with the mathematical graphs. In a nutshell, a graph is a network of points (vertices) connected to each other. Each connection between two vertices is called an "edge". To learn more about graphs please be referred to Wikipedia and relevant literature. 
For the method being described here, I have found that the following functions of the igraph package are particularly useful:
  1. graph.edgelist() creates a graph (molecule) from a 2D array of edge pairs; the argument directed = FALSE creates an undirected graph, which is the most suitable for our purpose
  2. gorder() outputs the number of all vertices of a graph (number of carbon atoms in a molecule)
  3. graph.isomorphic(g1, g2) is a particularly useful function. This boolean function returns TRUE if  two graphs are representing the same chemical structure (or in graph terms: two graphs are isomorphic) and FALSE otherwise.
  4. Finally, the construction length(E(g)[[inc(vert)]]) will give us the number of edges of a vertex vert in the graph g (this corresponds to the number of C-C bonds a particular carbon has). This construction is required to ensure that no carbon will have more that 4 C-C bonds.
All the further R functions derive from the base package and their usage can be found in the official documentation or elsewhere.

The Algorithm

The first helper function add_carbon() receives a graph, adds a new C-atom to every C-atom of the mother structure and outputs a raw list of isomers:

add_carbon <- function(alkane){
  g <- graph.edgelist(alkane, directed = FALSE)
  next_index <- gorder(g) + 1
  ncarb <- paste0("C", as.character(next_index))
  output_list <- list()
  for (j in 1:(next_index-1)) {
   jcarb <- paste0("C", as.character(j))
   if (length(E(g)[[inc(jcarb)]]) < 4){
      jmat <- matrix(c(alkane[,1], jcarb, alkane[,2], ncarb), ncol=2)
      output_list[[length(output_list) + 1]] <- jmat
    }
  }
   return(output_list)
}

The next helper function check_nth_elem() checks whether the nth element of the list has duplicates:

# checking for isomorphism
# removes all isomorphes to a single element
 check_nth_elem <- function(n, graph_list) {
  res <- sapply(graph_list, function(x) `!`(graph.isomorphic(x, graph_list[[n]]) ))
  res[n] <- TRUE
  graph_list <- graph_list[res]
  return(graph_list)  
  }

Yet another function check_all() filters the duplicates out of the list and outputs the tidy list:

 check_all_elem <- function(graph_list){
 a <- 1
 continue <- TRUE 
 while (continue){
   graph_list <- check_nth_elem(a, graph_list)
   a <- a + 1
 if (a > length(graph_list))
   {
    continue = FALSE
   } 
 }
 return(graph_list)
 }

The last function creates internally the structure of propane, calls add_carbon(),  filters the list and prints the length of it. This is repeated several times until the size of alkane has reached the value stored in the depth variable.
I have also added a command to print out the current time before printing the size of the list to be able to monitor the calculation time for different alkanes.

generate_isomers_old <- function(depth){
  propane <- matrix(c("C1","C2","C2","C3"), nrow=2)
  iter_list <- list(propane)
  nisomers <- c()
  # iteration
  for (j in 4:depth){  
  new_list <- unlist(lapply(iter_list, add_carbon), recursive = FALSE)
  graphs_clean <- check_all_elem(lapply(new_list, graph.edgelist, directed = FALSE))
  iter_list <- lapply(graphs_clean, function(x) as.matrix(as_data_frame(x)))  
  nisomers <- c(nisomers, length(iter_list))
  print(Sys.time())
  cat(paste("N carbon:",
              as.character(j),
            ";   N isomers",
            as.character(nisomers[length(nisomers)])
  ) ) 
  cat("\n")
  }
  return()
}
To start the calculation of isomers it is enough to write a single command:

generate_isomers_old(10) #put any number you want

Results

The calculation time increases drastically as the alkane size grows. Hexadecane was the largest alkane for which the isomers were counted (Table 1). The obtained results are in a perfect agreement with the numbers of open-chain alkane isomers on Wikipedia[4].

Time, seconds

N carbons

N isomers

<1

4

2

<1

5

3

<1

6

5

<1

7

9

<1

8

18

<1

9

35

1

10

75

4

11

159

18

12

355

126

13

802

1073

14

1858

4379

15

4347

23249

16

10359

Table 1. Calculation times of numbers of open-chain alkanes isomers with 4 to 16 carbon atoms.

Conclusions

Counting of alkane isomers is a non-trivial task which can be solved programmatically. I have successfully demonstrated that using available tools and basic programming knowledge it is possible to obtain the number of isomers of alkanes with 4 to 16 carbon atoms within a reasonable time. To calculate the number of isomers for the higher alkanes the approach could be optimized by (1) choosing a faster programming environment or (2) modifying the code in such a way that would allow to avoid redundant calculations. Lastly, the calculation of stereoisomers requires an additional method that allows to identify the chiral centers.

References

  1. S. Kurz, University of Bayreuth. Anzahl von Strukturisomeren der Alkane (in German).
  2. University of Cambridge, Isomer Count
  3. An excellent R online interpreter could be found at the homepage of Jupyter Project
  4. Wikipedia, "List of straight-chain alkanes"

Glory to Ukraine! 
Слава Україні! Смерть ворогам! 



Comments

Popular posts from this blog

Number of Acyclic Alkane Isomers - Python Version