Explaining the main algorithm
For a while I’ve been thinking of writing a scientific article, and I wanted it to be of practical use.
Morse code is binary: it uses only two values, a dot (short) and a hyphen (long). I decided that short (s) would be a blink of both eyes, while long (l) would be a blink of the left eye. That raised another question: how do I tell where the recording of one symbol ends?
The gap between two symbols can be represented by a blink of the right eye, r (the code below records it as p). Once I have entered the dots and hyphens that make up one symbol, I blink my right eye once to mark the gap before the next symbol.
To separate words, I blink my right eye twice, which gives rr.
This way I collect an ordered sequence of the symbols r, l and s that can be converted into full text. Once the conversion is done, I have the decoded message.
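To make the scheme concrete, here is a minimal sketch that goes the other way: it turns plain text into the blink sequence a user would have to produce. The helper name encode_to_blinks and the MORSE table are illustrative assumptions and are not part of the original code. The sketch writes the right-eye separator as p, as the code further down does (it is called r in the description above).

```python
# Hypothetical helper: turns text into the blink sequence described above.
# "s" = both eyes (dot), "l" = left eye (hyphen), "p" = right eye (separator).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def encode_to_blinks(text):
    words = []
    for word in text.upper().split():
        letters = []
        for ch in word:
            code = MORSE[ch]  # raises KeyError for characters outside A-Z
            letters.append(code.replace(".", "s").replace("-", "l"))
        # one right-eye blink between letters
        words.append("p".join(letters))
    # two right-eye blinks between words
    return "pp".join(words)


print(encode_to_blinks("hi eve"))  # sssspssppspssslps
```

Reading the output left to right: ssss (H), p, ss (I), pp (word gap), s (E), p, sssl (V), p, s (E).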
Deciphering Python Code
Let’s take a closer look at the functions I used.
eye_aspect_ratio(eye). I use the dlib library for face detection. Out of its 68 facial landmarks (points that trace the outline of the face and its features), I pick the 6 that mark each eye. The function determines how open an eye is by computing two Euclidean distances between the upper and lower eyelids, taken at points equally offset from the center of the eye (A and B), and the Euclidean distance between the left and right corners of the eye (C). The ratio (A + B) / (2 * C) is the eye aspect ratio (EAR): the larger it is, the more open the eye.
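from scipy.spatial import distance as dist  # assumed import; not shown in the original listing

def eye_aspect_ratio(eye):
    # Vertical distances between the upper and lower eyelid landmarks
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    # Horizontal distance between the two eye corners
    C = dist.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear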
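As a quick sanity check of the formula, the sketch below evaluates it on two sets of made-up landmark coordinates, one for an open eye and one for a nearly closed eye. The coordinates are invented for illustration, and math.dist (Python 3.8+) stands in for dist.euclidean so that the example needs no third-party packages.

```python
import math


def eye_aspect_ratio(eye):
    # Same formula as above, with math.dist standing in for dist.euclidean
    a = math.dist(eye[1], eye[5])
    b = math.dist(eye[2], eye[4])
    c = math.dist(eye[0], eye[3])
    return (a + b) / (2.0 * c)


# Made-up (x, y) landmarks in the order the function expects:
# corner, two upper-lid points, opposite corner, two lower-lid points.
open_eye = [(0, 0), (10, -6), (20, -6), (30, 0), (20, 6), (10, 6)]
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]

print(eye_aspect_ratio(open_eye))    # 0.4
print(eye_aspect_ratio(closed_eye))  # about 0.067
```

The functions below multiply the ratio by 100 and truncate it to an integer, so these two eyes would be reported as 40 and 6.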
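build_plots(name, value, plot) adds the latest value to a live plot and shows the updated plot in a window with the given name.
import cv2  # assumed import; not shown in the original listing

def build_plots(name, value, plot):
    # update() returns the redrawn plot as an image
    plot = plot.update(value)
    cv2.imshow(name, plot)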
draw_outline(frame, eye) takes a camera frame and the coordinates of one eye, and outlines the eye with a thin green contour.
def draw_outline(frame, eye):
    # Convex hull of the six eye landmarks, drawn as a 1-pixel green contour
    eyeHull = cv2.convexHull(eye)
    cv2.drawContours(frame, [eyeHull], -1, (0, 255, 0), 1)
get_args() parses the one required command-line argument, --shape-predictor, which is the path to the facial landmark predictor file.
import argparse

def get_args():
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--shape-predictor", required=True, help="path to facial landmark predictor")
    # vars() turns the parsed namespace into a dict
    args = vars(ap.parse_args())
    return args
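One detail that is easy to miss: argparse stores the option --shape-predictor under the key shape_predictor, with an underscore, which is why open_video() reads args["shape_predictor"]. The short sketch below shows this by passing an explicit argument list instead of reading the real command line. The file name predictor.dat is a placeholder, not a real path.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
                help="path to facial landmark predictor")

# An explicit list replaces sys.argv here; "predictor.dat" is a placeholder.
args = vars(ap.parse_args(["--shape-predictor", "predictor.dat"]))
print(args)  # {'shape_predictor': 'predictor.dat'}
```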
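open_video() opens my front camera and returns the video stream together with the face detector and the landmark predictor.
# Assumed imports; the original listing does not show them
import time
import dlib
from imutils.video import VideoStream

def open_video():
    args = get_args()
    print("[INFO] loading facial landmark predictor")
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(args["shape_predictor"])
    print("[INFO] starting video stream thread...")
    vs = VideoStream(0).start()
    # Give the camera a moment to warm up
    time.sleep(1)
    return vs, detector, predictor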
calibration(name, flag). Every person’s average EAR (eye aspect ratio) is different, so there is no universal threshold I could use. This is where calibration comes in handy. It measures reference values for two states: both eyes open (flag 0) and both eyes closed (flag 1). One-eye blinks are later detected relative to these two values. The function is called once per state. Each time, the camera opens and a beep sounds; after a short pause, samples are collected for about five seconds until the next beep. I keep my eyes open for the whole first run and closed for the whole second run. From the eyes-open samples I return the minimum of the left/right average EAR; from the eyes-closed samples I return the maximum. These two numbers are the thresholds that the recording step relies on.
# Assumed imports; the original listing does not show them
import imutils
from imutils import face_utils
from cvzone.PlotModule import LivePlot

def calibration(name, flag):
    plotLeft = LivePlot(640, 360, [5, 35], invert=True)
    plotRight = LivePlot(640, 360, [5, 35], invert=True)
    (lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
    (rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
    vs, detector, predictor = open_video()
    print('\a')  # beep: get ready
    time.sleep(3)
    time_start = time.time()
    both_eyes_open = []
    both_eyes_close = []
    while True:
        frame = vs.read()
        frame = imutils.resize(frame, width=450)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rects = detector(gray, 0)
        for rect in rects:
            shape = predictor(gray, rect)
            shape = face_utils.shape_to_np(shape)
            leftEye = shape[lStart:lEnd]
            rightEye = shape[rStart:rEnd]
            # EAR scaled by 100 and truncated to an integer
            leftEAR = int(100 * eye_aspect_ratio(leftEye))
            rightEAR = int(100 * eye_aspect_ratio(rightEye))
            if flag == 0:
                earAvg = (leftEAR + rightEAR) / 2.0
                both_eyes_open.append(earAvg)
            elif flag == 1:
                earAvg = (leftEAR + rightEAR) / 2.0
                both_eyes_close.append(earAvg)
            draw_outline(frame, leftEye)
            draw_outline(frame, rightEye)
            build_plots("ImagePlotLeft", leftEAR, plotLeft)
            build_plots("ImagePlotRight", rightEAR, plotRight)
        cv2.imshow(name, frame)
        time_dif = time.time() - time_start
        if cv2.waitKey(25) == ord("q"):
            break
        # Stop sampling after five seconds
        if time_dif > 5:
            print('\a')  # beep: done
            break
    cv2.destroyAllWindows()
    vs.stop()
    if flag == 0: return min(both_eyes_open)
    if flag == 1: return max(both_eyes_close)
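The last two lines of calibration() can be looked at in isolation. The sketch below applies the same min/max rule to made-up sample lists and also guards against an empty list, which would otherwise make min() or max() raise a ValueError if no face was detected during the run. The helper threshold_from_samples and the sample values are illustrative assumptions, not part of the original code.

```python
def threshold_from_samples(samples, flag):
    # Hypothetical helper, not part of the original code.
    # flag 0: eyes-open run   -> lowest average EAR seen
    # flag 1: eyes-closed run -> highest average EAR seen
    if not samples:
        raise RuntimeError("no face was detected during calibration")
    return min(samples) if flag == 0 else max(samples)


open_samples = [31.0, 29.5, 30.0, 28.5]    # made-up averages, eyes open
closed_samples = [14.0, 15.5, 13.0, 16.0]  # made-up averages, eyes closed

both_open = threshold_from_samples(open_samples, 0)
both_close = threshold_from_samples(closed_samples, 1)
print(both_open, both_close)  # 28.5 16.0
```

Taking the lowest open value and the highest closed value gives the two thresholds closest to each other, so everything between them can be treated as "neither clearly open nor clearly closed".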
morse_code_from_eyes(). This is the main function of my code. It monitors and analyzes the eyes in real time. Calibration runs first and is followed by a beep, after which recording starts. Recording works much like calibration, except that each camera reading is now compared with the thresholds obtained during calibration. A counter makes sure that one blink yields one symbol: after a symbol is recorded, the next few frames are skipped. Without the counter, a single blink, which spans several frames, would be stored as a long run of identical symbols. Once all symbols have been entered, I press the “q” key to finish recording and close the camera. The function then returns the recorded string of symbols.
def morse_code_from_eyes():
    both_open = calibration("both_eyes_open", 0)
    both_close = calibration("both_eyes_close", 1)
    plotLeft = LivePlot(640, 360, [5, 35], invert=True)
    plotRight = LivePlot(640, 360, [5, 35], invert=True)
    (lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
    (rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
    vs, detector, predictor = open_video()
    print('\a')  # beep: recording starts
    time.sleep(3)
    counter = 0   # 0 means ready to record; otherwise frames are being skipped
    points = ""
    while True:
        frame = vs.read()
        frame = imutils.resize(frame, width=450)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rects = detector(gray, 0)
        for rect in rects:
            shape = predictor(gray, rect)
            shape = face_utils.shape_to_np(shape)
            leftEye = shape[lStart:lEnd]
            rightEye = shape[rStart:rEnd]
            leftEAR = int(100 * eye_aspect_ratio(leftEye))
            rightEAR = int(100 * eye_aspect_ratio(rightEye))
            earAvg = (leftEAR + rightEAR) / 2.0
            draw_outline(frame, leftEye)
            draw_outline(frame, rightEye)
            build_plots("ImagePlotLeft", leftEAR, plotLeft)
            build_plots("ImagePlotRight", rightEAR, plotRight)
            if counter == 0:
                if earAvg <= both_close + 1:
                    # Both eyes closed: short (dot)
                    points += "s"
                    counter += 1
                    print(earAvg, "ssssssssssssssssssss")
                elif leftEAR - rightEAR >= 0 and earAvg <= both_open - 3:
                    # Right eye more closed than left: separator
                    points += "p"
                    counter += 1
                    print(leftEAR - rightEAR, earAvg, "pppppppppppppppppppp")
                elif rightEAR - leftEAR >= 0 and earAvg <= both_open - 3:
                    # Left eye more closed than right: long (hyphen)
                    points += "l"
                    counter += 1
                    print(rightEAR - leftEAR, earAvg, "llllllllllllllllllll")
            else:
                # Skip frames after a recorded symbol, then get ready again
                if counter == 5:
                    counter = 0
                else:
                    counter += 1
        cv2.imshow("Frame", frame)
        if cv2.waitKey(25) == ord("q"):
            break
    cv2.destroyAllWindows()
    vs.stop()
    return points
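The decision logic and the counter can be tried out without a camera. The sketch below pulls them into two small functions and feeds them a made-up series of (leftEAR, rightEAR) readings. The function names classify_frame and record, the threshold values 28 and 15, and the readings are all illustrative assumptions. The comparisons themselves follow the if/elif chain in morse_code_from_eyes().

```python
def classify_frame(left_ear, right_ear, both_open, both_close):
    # Hypothetical helper mirroring the if/elif chain in morse_code_from_eyes().
    ear_avg = (left_ear + right_ear) / 2.0
    if ear_avg <= both_close + 1:
        return "s"  # both eyes closed: dot
    if ear_avg <= both_open - 3:
        # one eye is noticeably more closed than the other
        return "p" if left_ear >= right_ear else "l"
    return ""  # eyes open: nothing to record


def record(frames, both_open, both_close, skip=5):
    # After a symbol is stored, frames are ignored until the counter wraps to 0.
    points, counter = "", 0
    for left_ear, right_ear in frames:
        if counter == 0:
            symbol = classify_frame(left_ear, right_ear, both_open, both_close)
            if symbol:
                points += symbol
                counter = 1
        elif counter == skip:
            counter = 0
        else:
            counter += 1
    return points


# Made-up readings: a 4-frame blink of both eyes, a pause, a 3-frame left-eye blink.
frames = [(10, 10)] * 4 + [(30, 30)] * 4 + [(12, 30)] * 3 + [(30, 30)] * 3

print(record(frames, both_open=28, both_close=15))                  # sl
print("".join(classify_frame(l, r, 28, 15) for l, r in frames))    # sssslll
```

The second line shows what would be stored if every frame were classified on its own: seven symbols for two blinks. With the counter, the same readings give "sl", which is the letter A.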
text_from_morse_code(points) converts the recorded string of symbols into readable text. Note that the code writes the right-eye blink as p (the separator called r above), so the string consists of p, l and s. First, I define a dictionary whose keys are Morse codes written with s and l and whose values are the corresponding letters and digits. I split the string on “pp” to get the individual words. Then I split each word on “p” and convert each code into a character; a code that is not in the dictionary becomes “-”. Finally, the characters are joined into words and the words into the text that the function returns.
def text_from_morse_code(points):
    # Morse codes written with s (dot) and l (hyphen)
    alphabet = {"sl": "A", "lsss": "B", "lsls": "C", "lss": "D", "s": "E",
                "ssls": "F", "lls": "G", "ssss": "H", "ss": "I", "slll": "J",
                "lsl": "K", "slss": "L", "ll": "M", "ls": "N", "lll": "O",
                "slls": "P", "llsl": "Q", "sls": "R", "sss": "S", "l": "T",
                "ssl": "U", "sssl": "V", "sll": "W", "lssl": "X", "lsll": "Y",
                "llss": "Z",
                "sllll": "1", "sslll": "2", "sssll": "3", "ssssl": "4",
                "sssss": "5", "lssss": "6", "llsss": "7", "lllss": "8",
                "lllls": "9", "lllll": "0"}
    # "pp" separates words, a single "p" separates letters
    points = points.split("pp")
    answer = ""
    for word in points:
        letters = word.split("p")
        new_word = ""
        for letter in letters:
            if letter in alphabet:
                new_word += alphabet[letter]
            else:
                # Unknown (or empty) code
                new_word += "-"
        answer += new_word + " "
    return answer
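One consequence of splitting on “p” is that a stray separator at the start or end of the recording produces an empty code, which the function above turns into “-”. The sketch below is a hypothetical variant, not the author's code, that skips empty codes instead. It uses a deliberately reduced table to stay short; the name decode and the table ALPHABET are assumptions made for this example.

```python
# Reduced table for illustration; the full one is in text_from_morse_code().
ALPHABET = {"s": "E", "ss": "I", "ssss": "H", "sl": "A", "l": "T"}


def decode(points, alphabet=ALPHABET):
    # Hypothetical variant: empty codes (from a leading or trailing "p") are skipped.
    words = []
    for word in points.split("pp"):
        letters = [alphabet.get(code, "-") for code in word.split("p") if code]
        if letters:
            words.append("".join(letters))
    return " ".join(words)


print(decode("sssspssppslplp"))  # HI AT
```

For the same input, which ends with an extra p, the original function would return "HI AT- " (with a trailing space). Whether skipping empty codes is the better behavior depends on whether you want stray blinks to be visible in the output.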
Conclusion
First, run the morse_code_from_eyes() function and save the result to a variable. Then pass that string to the text_from_morse_code() function to get the decoded text.