/datum/pathogendna

Pathogen DNA
The pathogen DNA sequence consists of two separately handled parts:
- The "private" sequence is basically a sequence that contains
  - the unique identifier of the microbody of the pathogen, which must be defined in the /datum/microbody subclass. The pathogen controller contains a lookup table of the existing microbody IDs for fast lookup, reducing search times from O(n) to O(1).
  - the numeric values of the pathogen, sequentially, as two byte signed integers, which means each numeric value is encoded in the DNA sequence as four hexadecimal digits
  - a single digit signifying the number of stages of the pathogen
  - a single digit signifying whether the pathogen is symptomatic

At the time of writing this documentation, the sequence of the numeric values is: advance speed | suppression threshold | stages

EXAMPLE: A pathogen that is a virus, with an advance speed of 5 and a suppression threshold of 5, 5 stages and symptomatic carries the following DNA sequence:
00010005000551

DEVELOPER NOTE: The encoding is NOT two's complement as any coder would expect, because I have no idea how BYOND numbers are represented. The two bytes are encoded as 2 byte one's complement. (I think. This is unverified.)
NOTE: While any of these values are highly unlikely to ever pass 255, I'll leave it open for two bytes.

This private sequence is only ever printed and cannot be directly spliced. Modifying the numeric values is done through seed mutation logic. Since this part of the DNA is never directly modified, it is always calculated from the numeric values.
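The private sequence layout described above can be sketched as follows. This is a minimal Python illustration, not the actual DM implementation. It assumes the microbody ID is a 4 digit string (as the example suggests), that the stage count is a decimal digit, that the symptomatic flag is 1 or 0, and that negative values really are stored as 16 bit one's complement, which the developer note itself is unsure about. All function names are hypothetical.

```python
def encode_ones_complement_16(value: int) -> str:
    """Encode a signed integer as four hex digits (assumed 16 bit one's complement)."""
    if not -0x7FFF <= value <= 0x7FFF:
        raise ValueError("value does not fit in 16 bit one's complement")
    if value < 0:
        # One's complement negation: invert all 16 bits of the magnitude.
        value = ~(-value) & 0xFFFF
    return f"{value:04X}"


def decode_ones_complement_16(digits: str) -> int:
    """Inverse of encode_ones_complement_16."""
    raw = int(digits, 16)
    if raw & 0x8000:
        # Sign bit set: invert the bits back to recover the magnitude.
        return -(~raw & 0xFFFF)
    return raw


def build_private_sequence(microbody_id: str, advance_speed: int,
                           suppression_threshold: int, stages: int,
                           symptomatic: bool) -> str:
    """Concatenate the private sequence parts in the documented order."""
    if not 0 <= stages <= 9:
        raise ValueError("stages must fit in a single digit")
    return (
        microbody_id                                        # assumed 4 digit ID
        + encode_ones_complement_16(advance_speed)          # 4 hex digits
        + encode_ones_complement_16(suppression_threshold)  # 4 hex digits
        + str(stages)                                       # single digit
        + ("1" if symptomatic else "0")                     # assumed 1/0 flag
    )
```

With the virus from the example, `build_private_sequence("0001", 5, 5, 5, True)` reproduces the documented sequence, and a negative value such as -5 encodes to FFFA under the one's complement assumption.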
- The "public" sequence is the sequence that contains
  - The suppressant of the pathogen. Each suppressant is given a round-randomized 3 quartet (1.5 byte) unique identifier by the pathogen controller upon round setup. The pathogen controller keeps a lookup table of the available suppressants for fast processing.
  - A single separator to signal the end of suppressants and the beginning of carriers.
  - All carriers, sequentially. Carriers are assigned 3 quartets as well, and they have their own lookup table. (Note: Carriers are not currently implemented or used.)
  - Anything I may have forgotten to mention is also here.
  - A single separator to signal the beginning of the symptoms.
  - All symptoms of the pathogen, sequentially.
A symptom is composed of R * 3 quartets, where R represents rarity, followed by a 'DNA separator' marked by a | in the DNA.
Rarity is a value from 1 to 5, where 1 is VERY COMMON and 5 is VERY RARE. Every symptom is assigned a rarity, and a unique identifier is generated for
each symptom at round setup following this pattern:
- All VERY_COMMON symptoms are assigned a symptom-unique, round-randomized 3 quartet (e.g. 1F3 is now sweating and EE2 is now farting).
- All symptoms of rarity R (where R is the next lowest rarity category not yet processed) are assigned a 3R quartet identifier. First, a set of available identifiers is generated by taking
each identifier for rarity R-1 symptoms, and prepending and appending the identifier of each VERY_COMMON symptom. This will generate a moderate number of collisions (for example,
prepending 1F3 to EE20AB yields the same as appending 0AB to 1F3EE2), which are then eliminated. Then, rarity R symptoms are each assigned a randomly pick()-ed identifier from this
list. If, due to an imbalance in the number of symptoms available, no ID is left available, the sequence will be a randomly generated 3R quartet. This, of course,
means that the symptom cannot be synthesized via pathology science that round. Tough luck.
At the time of writing this documentation, the numbers add up and all symptoms should be synthesizable. At the current time, it is also hard to synthesize a VERY_RARE symptom, which
is intended and a good thing.
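The identifier generation pattern above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the pathogen controller's code: the function names and the `symptoms_by_rarity` input shape are hypothetical, a shuffled pool stands in for pick(), and the fallback avoids reusing an already assigned identifier, which the description does not specify.

```python
import random

HEX_DIGITS = "0123456789ABCDEF"


def random_id(length, rng):
    """Return a random hex string of the given length."""
    return "".join(rng.choice(HEX_DIGITS) for _ in range(length))


def fresh_random_id(length, used, rng):
    """Draw random IDs until one is found that has not been assigned yet."""
    ident = random_id(length, rng)
    while ident in used:
        ident = random_id(length, rng)
    return ident


def assign_symptom_ids(symptoms_by_rarity, rng=None):
    """Map each symptom name to an identifier of 3 * rarity hex digits.

    symptoms_by_rarity: dict of rarity (1..5) -> list of symptom names.
    """
    rng = rng or random.Random()
    assigned, used = {}, set()

    # Rarity 1 (VERY_COMMON): unique random 3 quartets.
    common = []
    for name in symptoms_by_rarity.get(1, []):
        ident = fresh_random_id(3, used, rng)
        used.add(ident)
        assigned[name] = ident
        common.append(ident)

    previous = common
    for rarity in range(2, 6):
        # Prepend and append every VERY_COMMON ID to every rarity R-1 ID.
        # Building a set eliminates the collisions mentioned in the text.
        candidates = {c + p for p in previous for c in common}
        candidates |= {p + c for p in previous for c in common}
        pool = sorted(candidates)
        rng.shuffle(pool)

        current = []
        for name in symptoms_by_rarity.get(rarity, []):
            if pool:
                ident = pool.pop()
            else:
                # No composable ID left: fall back to a random 3R quartet,
                # which makes the symptom unsynthesizable this round.
                ident = fresh_random_id(3 * rarity, used, rng)
            used.add(ident)
            assigned[name] = ident
            current.append(ident)
        previous = current
    return assigned
```

For instance, with two VERY_COMMON symptoms and one rarity 2 symptom, the rarity 2 identifier is always a concatenation of two VERY_COMMON identifiers. With a single VERY_COMMON symptom and two rarity 2 symptoms, only one composable identifier exists, so the second symptom gets the random fallback.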
The DNA separators are a sort of 'resource' available to pathologists, although an infinite number of them can be produced by replicating DNA. If someone can come up with any reasonable way
to limit DNA separators, I am open to implementing it.

What is a DNA separator though? Well, as the above algorithm shows, the identifier sequence of each high tier symptom is composed of VALID sequences for lower tier symptoms. To be able
to determine which symptom is supposed to be a higher tier one, symptoms must be separated by a magical nucleic acid we call the DNA separator. This means that a pathogen of K symptoms will contain K-1 DNA separators.

The pathogen controller will contain a symptom lookup table and an inverse table for looking up the numeric identifier. Due to the nature of this, we are limited to 4096 (2^12, 3 quartets) VERY_COMMON symptoms. Oh the horror.

This part of the DNA is modifiable through splicing. During a splicing session, you can scrap parts of the DNA and introduce new parts from already existing DNA. Once the splicing session is complete, the DNA is evaluated (compiled) and destroyed if it contains an invalid sequence (such as an attempt to create a tier three symptom when no tier three symptom has that specific DNA sequence). I believe pathologists will be kept busy making their fancy symptoms all round, and it's not as straightforward as chemistry or as boring as genetics.

EXAMPLE: Suppose that sweating is 1F3, farting is EE2, coughing is sweating + farting (1F3EE2) and the heat weakness suppressant is 0C3. The DNA sequence for a pathogen with the above three symptoms, no carriers and heat weakness would be:
0C3||1F3|EE2|1F3EE2

During splicing, parts can only be moved as coherent parts. A coherent part is a 3 quartet beginning at a 3 quartet boundary or after a DNA separator. This means one could move 1F3 or EE2 out of 1F3EE2 but not 3EE or F3E. A single DNA separator is also a coherent part. A coherent part can only be moved to a boundary (i.e. inserted between two existing coherent parts).

EXAMPLE: Suppose that no symptom gained the unique identifier of farting + sweating (EE21F3). A pathologist splicing the above DNA sequence into 0C3||1F3|EE21F3 would be in for a nasty surprise, as the DNA collapses due to EE21F3 being an invalid sequence.
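The evaluation step from the two examples above can be sketched like this. It is a hedged Python illustration, not the actual compile logic: it assumes the public sequence is exactly suppressant, separator, carriers, separator, symptoms, that the section separators and the DNA separators are the same | character (as the example sequence suggests), and that the lookup tables are plain dictionaries. The function and parameter names are hypothetical.

```python
def split_quartets(block, size=3):
    """Cut a block into consecutive 3 quartet chunks."""
    return [block[i:i + size] for i in range(0, len(block), size)]


def evaluate_public_sequence(dna, suppressants, carriers, symptoms):
    """Return (suppressant, carriers, symptoms) names, or None if the DNA collapses.

    suppressants, carriers, symptoms: dicts mapping identifier -> name,
    standing in for the pathogen controller's lookup tables.
    """
    blocks = dna.split("|")
    if len(blocks) < 3:
        return None  # both section separators are required

    suppressant_id, carrier_block, symptom_ids = blocks[0], blocks[1], blocks[2:]
    if symptom_ids == [""]:
        symptom_ids = []  # assumption: an empty symptom section is allowed

    if suppressant_id not in suppressants:
        return None
    if len(carrier_block) % 3 != 0:
        return None
    carrier_ids = split_quartets(carrier_block)
    if any(c not in carriers for c in carrier_ids):
        return None
    # Each separated block must be the identifier of exactly one symptom.
    if any(s not in symptoms for s in symptom_ids):
        return None

    return (
        suppressants[suppressant_id],
        [carriers[c] for c in carrier_ids],
        [symptoms[s] for s in symptom_ids],
    )
```

Using the identifiers from the examples, `0C3||1F3|EE2|1F3EE2` evaluates to heat weakness with sweating, farting and coughing, while `0C3||1F3|EE21F3` returns None because no symptom owns EE21F3.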
Due to the nature of the rolling unique identifiers, this of course does not mean EE21F3AB0 couldn't be a valid sequence, as a symptom with the sequence 1F3AB0 could exist, and a higher tier symptom could have EE2 prepended to it. This means that only ELEMENTARY (3 quartet starting at a boundary) subsets of a symptom's unique identifier are guaranteed to be valid unique identifiers themselves.
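The coherent part rule for splicing, and the elementary 3 quartet units it operates on, can be illustrated with a small tokenizer. This is only a Python sketch with hypothetical function names. It assumes | is the DNA separator and that every non-separator run has a length that is a multiple of three.

```python
def coherent_parts(dna):
    """Split a public sequence into coherent parts.

    A coherent part is either a single DNA separator or a 3 quartet that
    starts at a 3 quartet boundary or right after a separator.
    """
    parts = []
    i = 0
    while i < len(dna):
        if dna[i] == "|":
            parts.append("|")
            i += 1
        else:
            chunk = dna[i:i + 3]
            if len(chunk) < 3 or "|" in chunk:
                raise ValueError("sequence is not aligned to 3 quartets")
            parts.append(chunk)
            i += 3
    return parts


def move_part(parts, source, destination):
    """Move one coherent part to a boundary between the remaining parts.

    destination is an index into the list after the part has been removed.
    """
    part = parts[source]
    remaining = parts[:source] + parts[source + 1:]
    remaining.insert(destination, part)
    return remaining
```

For the example sequence 0C3||1F3|EE2|1F3EE2, the tokenizer yields 0C3, two separators, 1F3, a separator, EE2, a separator, and then 1F3 and EE2 as two separate movable parts. Fragments such as 3EE or F3E never appear, so they cannot be moved.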