About this document
What: a survey, or notes
Status: draft
I am coding an algorithm of my own design. Shared code.
I plan to test either using simulation or TI Launchpads using CC2650 mcu/radios.
Explanation of the title and focus of this document
Distributed: there are many units (processors.) They communicate and cooperate.
Non-centralized: units are the same. They execute the same code. No unit has more resources. Units may take different roles, but they all can take any role. No central unit has complete information.
Sleep synchronization: synchronizing wake periods. Units have sleep and wake periods. More specialized than clock synchronization where units are always awake.
Balanced: all units have the same power. None are powered by an electric utility grid.
Ultra-low power: units use microamps of power on average. They use milliamps when active, but they are duty cycled (sleeping often.) They use harvested energy such as solar energy. They have a small or no battery. They often reboot as energy is exhausted.
Single-hop: all units can communicate directly with all other units. No unit needs to relay or route messages. The network topology is bipartite.
Sensors: the units do not sense external events not synchronized with the network. They can sleep for as long as necessary, without concern for missing sensing external events (that don’t have interrupts.)
Wireless: the units have a radio. Messages can be broadcast. There is contention for shared channels. A radio may not be able to transmit and receive at the same time.
Relevant Literature
Digi Corporation’s proprietary DigiMesh product apparently has a sleep synchronization algorithm. It apparently is a leader-election algorithm. Many details are unavailable?
“Firefly Clock Synchronization in an 802.15.4 Wireless Network”, Leidenfrost and Elmenreich, 2009 describes an algorithm where units can sleep and synchronize. They call it a “reach back firefly” algorithm. Unlike a firefly (where a message is just a one-bit blink at the time of the synchronization point, without content), here a sync message is transmitted at an offset from the synchronization point and has the offset (many bits of information) as content. Like most firefly algorithms, all units send sync messages.
Wireless HART has sleep synchronization, but it is centralized: synchronization propagates from a gateway device that has more resources (a connection to the internet) than other units.
There is much more literature, see later.
Background
In clock synchronization, a network (the units together) maintain a global notion of time. This lets units schedule internal events to occur simultaneously, or timestamp external events. The units communicate to synchronize clocks.
Sleep synchronization means: synchronize the events: unit awaken and unit sleep. Many clock synchronization algorithms assume the units are always awake. It is harder to synchronize when units may sleep.
Time is kept by a Clock. It need not be referenced to an external, real time, such as sidereal (sun) time.
Each unit has a hardware clock, really just a counter.
A GlobalClock keeps the global time. Each unit has a GlobalClock. It is based on the hardware clock. A GlobalClock is just an offset from the hardware clock. Initially the offset is zero. When a unit receives a sync message from a master, a unit can calculate an offset and the local GlobalClock becomes ‘in sync’ with the master’s GlobalClock.
Hardware clocks drift (have slightly different tick rates) and so each unit’s GlobalClock drifts from every other units GlobalClock (especially the master’s.) Thus to maintain an accurate GlobalClock, a unit must periodically receive a sync giving message.
The problem of maintaining the global clock is:
– getting an initial offset from a master (startup)
– periodically receiving offsets to correct for hardware clock drift (maintenance)
The global clock is NOT received from an external source e.g. GPS.
Assumptions
Each unit is duty-cycled: sleeping and waking. When sleeping, unable to receive since mcu and radio are powered off (only a timer to wake the mcu is powered.)
The network is symmetric-powered. No unit has more power than others. No unit is not duty-cycled.
Units are the same, hardware and software. Each unit is capable of assuming any role.
Units have one transceiver that cannot transmit and receive at the same time. A unit cannot detect its own transmission has collided.
There is contention: collisions can occur on the radio channels.
Network topology is totally connected. Each unit can hear all other units (and does so, except for collisions.) No unit relays messages to other units.
Duty-cycle is high (sleeping mostly.) That is, put a high priority on conserving power at the cost of slow execution or missing external events. For example, when energy is harvested and units are small, the power available may be very low, supporting only a very high duty cycle (sleeping most of the time.)
Low performance. The work being done is not critical and extremely low data rate. Don’t care if the work doesn’t get done always. Only care that the work gets done sometimes. Here the work is both the synchronization and some additional coordination or data exchange.
Not Constrained
Not assume that units play equal, same role.
Not assume that the communication is uniform. Some units may play roles having more communication than others.
Not assume that lower layers of the protocol stack, e.g. the MAC layer, support synchronization, e.g. by timestamping messages with clocks used in those layers.
Not assume a sensing network. That is, the network may do something self-contained, without sensing external conditions or events. Because units sleep so often, they probably would miss sensing many external events.
Classes of Algorithms
In “firefly” algorithms, roles are equal. Firefly insects are leaderless.
In “leader election” algorithms, some units play the role of leader (master of the clock.) Opposite of firefly.
In “time triggered protocol” algorithms, units have a common, slotted schedule, a priori, built into the protocol. Not only is the wake period scheduled, but the wake period is divided into slots reserved for certain units and tasks. Eventually, the schedules of different units are synchronized. This is TDMA.
Keywords
This is a list of search terms. In searching, you must cast a wide net, since authors use diverse terms for subject matter that overlaps.
clock synchronization
firefly synchronization
sleep scheduling
distributed algorithms
wireless networks
wireless sensor networks or mobile ad-hoc networks
energy conservation or duty-cycling
clusters or cliques
self-organizing networks
flooding
Leader elections algorithms
Some algorithms form cliques (clusters) of units where units within a clique are synchronized, but cliques not synchronized with each other. The algorithms are said to go through phases of startup and maintenance. In the startup phase, cliques form and merge. Eventually all units are synchronized in one clique, and the algorithm is in the maintenance phase.
The maintenance phase corrects clock drift, and merges new cliques, when units join the network dynamically. Joining units may form new cliques until they synchronize with existing cliques.
General review of concerns in the literature
The general literature has many concerns. Here I briefly discuss those concerns.
Coverage: in multi-hop networks (not bi-partite), insuring that the area covered by the awake network is maximal, so that any external events e.g. are sensed.
Precision: the goal may be more clock precision (e.g. uSec) than is needed just to synchronize sleep (e.g. mSec). This might require support in lower layers such as MAC.
Dynamic: networks in which units (possibly mobile) may join/leave the network.
Fault tolerance: if one unit fails or has inadequate power, the algorithm still keeps or recovers synchronization.
Drift correction: the drift of one hardware clock can be estimated. Drift is generally linear. Thus it can be corrected for. That saves effort in global synchronization.
Correctness: whether you can prove mathematically that the algorithm works: 1. doesn’t suffer deadlock 2. converges to a single synchronized clique.
Routing under energy constraints: when the network is multi-hop (not bipartite), routing through nodes having the most energy reserves.
Flooding: spreading a message quickly and with minimum transmissions (and/or energy.)
General review of the history of the literature
Radio has existed for a century, but it was controlled by people and transmitted analog messages. Computer control of radios is relatively new.
Clock synchronization has been studied for centuries.
DARPA explored digital, packet-switched, computer controlled radio networks circa 1970 in the “ALOHA net”.
Radio networks where energy must be conserved and units sleep often is still a subject of research, with many recent papers.
Review of Recent Literature
Most papers use a capitalized acronym to describe their algorithm/protocol.
“Firefly Clock Synchronization in an 802.15.4 Wireless Network”
Discusses an algorithm that is both firefly and time triggered. It is reachback: sync messages have an offset. It is firefly: all units transmit sync messages.
RBS
FTSP “Fine-grained network time synchronization using reference broadcasts”. Epson et al. 2002.
Distributed Sleep-Scheduling Protocols for Energy Conservation in Wireless Networks, Naik et al
The protocol is in the app/transport layer. No master, each unit sends sync (firefly). A unit sends sync in the wake period of its own schedule. A unit periodically listens for the entire sync period (to hear other cliques.) A unit also transmits sync in the sleep period of its schedule, to apprise other cliques (redundantly with listening.)
“A Survey of Energy-Efficient Scheduling Mechanisms in Sensor Networks”, Wang and Xiao.
“An application-specific protocol architecture for wireless microsensor networks” Heinzelman et al. 2002. LEACH-C. Multi-hop. Low-energy routing. Clustering.
“Self-organizing distributed sensor net- works” Clare et al. 1999. An early description of the basic idea of TDMA to conserve energy while organizing networks.
“Randomized Independent Scheduling (RIS)” Kumar et al. 2004. Glosses time synchronization in favor of coverage concerns.
Sponsored Sector” Tian et al.
GLOSSY. The concern here is flooding, with clock synchronization as a secondary benefit. The novel idea: when two nodes simultaneously send an exact same message (with no distinguishing characteristics such as sender ID), the messages do NOT destructively interfere, but are heard by many receivers as one message.
FLOPSYNC-2
Firefly algorithms
Firefly algorithms are non-centralized and leaderless.
In general, firefly algorithms are not concerned with sleeping. Firefly insects don’t sleep while they are synchronizing. They also can transmit and receive at the same time. Also, they can communicate with multiple others at the same instant, since their eyes are multi-processing.
Firefly algorithms may be concerned with multi-hop networks, since firefly insects can be spread so far that each firefly cannot see some others.
Firefly algorithms (especially for insects, i.e. models of biology) may assume only a binary message, a pulse.
IEEE 1588 Precision Time Protocol (PTP)
This is “leader election.” An advance form sends two messages to establish a precise time: the first announces intent to synchronize at a time, and the second gives the precise time the first message was sent. The timestamp in the first message is not precise because of inherent unpredictable delays in the protocol stack, while some stacks will give you the precise time that a timestamped message WAS sent (via the return code or interrupt from the sending of the first message.)
Thus there are two sub-classes of “leader election”: “imprecise subclass” in which only one message is sent, and “precise subclass” in which two messages are sent.
PTP is not concerned with energy conservation and assumes network is always on.
Bluetooth as the Communication Stack
Classic Bluetooth is asymmetric power and connection oriented. A Central is typically a phone with much power. A Peripheral is typically low power and listens periodically for connections, but after a connection is made, stays awake.
Bluetooth radios have a choice of counter/clocks: precise crystals or imprecise resistor capacitor.
The Scanner/Observer roles can be used for broadcasting (without connections), not just for classic advertising of services.
BT advertiser is multiple access, but not carrier sense?
Typical jitter in the stack is as much as 5 milliseconds.
Beacons
My specific use case
Ultra low power. Units lack batteries, instead use harvested solar power. They may sleep at night. They may have periods without enough power to sustain the volatile memory or a timer, and reboot often, say every morning. Duty cycle is very low, say 1%. Bipartite: units are all within range of each other. The units just display in sync, they blink an led or move.