Explaining the main algorithm
For a while I’ve been thinking of writing a scientific article, and I wanted it to have some practical use.
Morse code is binary in the sense that it uses only two signals: a dot (short) and a dash (long). I decided to encode a short signal (s) as a blink of both eyes and a long signal (l) as a blink of the left eye only. This raised another question: how can the program tell where one letter ends and the next one begins?
The boundary between two letters is marked by a right-eye blink, which the code records as p. After entering all the dots and dashes of one letter, I blink my right eye once before starting the next letter.
To separate words, I blink my right eye twice, which produces pp.
The result is an ordered sequence of three symbols, s, l and p, that can be converted into ordinary text. Once that conversion is done, I have the message.
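The two levels of separators are easiest to see on a short example. The recording below is invented for illustration (it is not output from the program): splitting on pp first and then on p recovers the Morse code of each letter.

```python
# A made-up recording that spells "HI ME":
# H = ssss, I = ss, M = ll, E = s
recording = "sssspssppllps"

words = recording.split("pp")                  # pp marks a word break
letters = [word.split("p") for word in words]  # p marks a letter break
print(words)    # ['sssspss', 'llps']
print(letters)  # [['ssss', 'ss'], ['ll', 's']]
```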
Deciphering Python Code
Let’s take a closer look at the functions I used.
eye_aspect_ratio(eye). I use the dlib library to detect the face and its landmarks. dlib’s landmark predictor places 68 points along the contours of the face, and I take the 6 points that outline each eye. The function measures how open an eye is by computing two vertical Euclidean distances between the upper and lower eyelids, taken at points placed symmetrically about the center of the eye (A and B), and the horizontal Euclidean distance between the two corners of the eye (C). The eye aspect ratio is EAR = (A + B) / (2C): the larger the value, the more open the eye.
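```python
from scipy.spatial import distance as dist


def eye_aspect_ratio(eye):
    # Vertical distances between upper- and lower-eyelid landmarks
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    # Horizontal distance between the two eye corners
    C = dist.euclidean(eye[0], eye[3])
    # Eye aspect ratio: drops toward zero as the eye closes
    ear = (A + B) / (2.0 * C)
    return ear
```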
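To get a feel for the numbers, the sketch below applies the same formula to two invented sets of six eye points, one wide open and one almost shut. It uses math.dist from the standard library instead of SciPy so it runs without extra dependencies; the coordinates are made up. Since the main code multiplies EAR by 100 and truncates it with int(), these two eyes would be reported as 66 and 6.

```python
import math


def eye_aspect_ratio(eye):
    # Same formula as above, using math.dist (Python 3.8+) instead of SciPy
    A = math.dist(eye[1], eye[5])
    B = math.dist(eye[2], eye[4])
    C = math.dist(eye[0], eye[3])
    return (A + B) / (2.0 * C)


# Made-up landmarks in the order eye[0]..eye[5]:
# corner, two upper-lid points, other corner, two lower-lid points
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
nearly_shut = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

for eye in (open_eye, nearly_shut):
    ear = eye_aspect_ratio(eye)
    print(round(ear, 3), int(100 * ear))  # "0.667 66", then "0.067 6"
```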
build_plots(name, value, plot) adds the latest EAR value to a live plot and shows the updated plot in a window called name.
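```python
import cv2


def build_plots(name, value, plot):
    # plot is a LivePlot (assumed to be cvzone's); update() appends the
    # value and returns the redrawn plot image
    img_plot = plot.update(value)
    cv2.imshow(name, img_plot)
```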
draw_outline(frame, eye) takes a camera frame and the landmark coordinates of one eye, and outlines that eye with a thin bright-green contour (the convex hull of its landmarks).
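```python
import cv2


def draw_outline(frame, eye):
    # Convex hull of the six eye landmarks, drawn as a 1-pixel green (BGR) contour
    eyeHull = cv2.convexHull(eye)
    cv2.drawContours(frame, [eyeHull], -1, (0, 255, 0), 1)
```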
get_args() reads the command-line arguments the script needs. There is only one, --shape-predictor (-p), the path to the facial landmark model used by dlib.
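```python
import argparse


def get_args():
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--shape-predictor", required=True,
                    help="path to facial landmark predictor")
    # argparse stores "--shape-predictor" under the key "shape_predictor"
    args = vars(ap.parse_args())
    return args
```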
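In practice the path is passed on the command line, for example `python blink_morse.py --shape-predictor shape_predictor_68_face_landmarks.dat`. The script name here is hypothetical; shape_predictor_68_face_landmarks.dat is the usual file name of dlib’s pretrained 68-point model, which has to be downloaded separately. The sketch below builds the same parser but feeds it an explicit argument list, so you can see how the option name turns into the dictionary key; parse_shape_predictor is an illustrative helper, not part of the original code.

```python
import argparse


def parse_shape_predictor(argv):
    # Same parser as get_args(), but reading an explicit list instead of sys.argv
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--shape-predictor", required=True,
                    help="path to facial landmark predictor")
    return vars(ap.parse_args(argv))


args = parse_shape_predictor(["-p", "shape_predictor_68_face_landmarks.dat"])
print(args)  # {'shape_predictor': 'shape_predictor_68_face_landmarks.dat'}
```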
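open_video() loads dlib’s face detector and landmark predictor, starts the video stream from the front camera (device 0), and returns the stream, the detector and the predictor.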
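```python
import time

import dlib
from imutils.video import VideoStream


def open_video():
    args = get_args()
    print("[INFO] loading facial landmark predictor")
    detector = dlib.get_frontal_face_detector()                # face detector
    predictor = dlib.shape_predictor(args["shape_predictor"])  # 68-point landmark model
    print("[INFO] starting video stream thread...")
    vs = VideoStream(0).start()
    time.sleep(1)  # give the camera a moment to warm up
    return vs, detector, predictor
```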
calibration(name, flag). Every person’s typical EAR (eye aspect ratio) is different, so no universal threshold works for everyone; this is where calibration comes in. Each call measures one condition: both eyes open (flag = 0) or both eyes closed (flag = 1). The function starts the camera and beeps (print('\a') rings the terminal bell), waits three seconds, records for about five seconds while showing the video, and beeps again. During the first run I keep my eyes open until the second beep; during the second run I keep them closed. For every frame, the function averages the left and right EAR. For the open-eyes run it returns the minimum of these averages, and for the closed-eyes run it returns the maximum. One-eye blinks are not calibrated separately; they are detected later by comparing the two eyes against these two values.
```python
import time

import cv2
import imutils
from cvzone.PlotModule import LivePlot  # assumed source of LivePlot
from imutils import face_utils


def calibration(name, flag):
    # flag == 0: both eyes open; flag == 1: both eyes closed
    plotLeft = LivePlot(640, 360, [5, 35], invert=True)
    plotRight = LivePlot(640, 360, [5, 35], invert=True)
    (lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
    (rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
    vs, detector, predictor = open_video()
    print('\a')  # beep: get ready
    time.sleep(3)
    time_start = time.time()
    both_eyes_open = []
    both_eyes_close = []
    while True:
        frame = vs.read()
        frame = imutils.resize(frame, width=450)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rects = detector(gray, 0)
        for rect in rects:
            shape = predictor(gray, rect)
            shape = face_utils.shape_to_np(shape)
            leftEye = shape[lStart:lEnd]
            rightEye = shape[rStart:rEnd]
            # EAR scaled by 100 and truncated to an integer
            leftEAR = int(100 * eye_aspect_ratio(leftEye))
            rightEAR = int(100 * eye_aspect_ratio(rightEye))
            earAvg = (leftEAR + rightEAR) / 2.0
            if flag == 0:
                both_eyes_open.append(earAvg)
            elif flag == 1:
                both_eyes_close.append(earAvg)
            draw_outline(frame, leftEye)
            draw_outline(frame, rightEye)
            build_plots("ImagePlotLeft", leftEAR, plotLeft)
            build_plots("ImagePlotRight", rightEAR, plotRight)
        cv2.imshow(name, frame)
        time_dif = time.time() - time_start
        if cv2.waitKey(25) == ord("q"):
            break
        if time_dif > 5:  # record for about five seconds
            print('\a')   # beep: this calibration step is over
            break
    cv2.destroyAllWindows()
    vs.stop()
    if flag == 0:
        return min(both_eyes_open)   # least open frame while the eyes were open
    if flag == 1:
        return max(both_eyes_close)  # most open frame while the eyes were closed
```
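The two calibration values become thresholds in morse_code_from_eyes(): an average EAR of at most both_close + 1 counts as both eyes closed, and an average of at most both_open - 3 counts as a one-eye blink. The hypothetical helper below isolates the aggregation step from the camera loop so the arithmetic can be checked offline; the EAR values are made up. It also raises a clear error when no face was detected, a case in which calibration() would fail on min([]) or max([]).

```python
def summarize_calibration(ear_pairs, flag):
    """Reduce per-frame (leftEAR, rightEAR) pairs to one calibration value.

    flag == 0 (eyes open): minimum of the per-frame averages.
    flag == 1 (eyes closed): maximum of the per-frame averages.
    """
    if not ear_pairs:
        # calibration() would fail on min([]) / max([]) here
        raise ValueError("no face detected during calibration")
    averages = [(left + right) / 2.0 for left, right in ear_pairs]
    return min(averages) if flag == 0 else max(averages)


# Made-up EAR values, already multiplied by 100
open_run = [(31, 29), (30, 30), (28, 27)]
closed_run = [(9, 11), (12, 10), (10, 9)]

both_open = summarize_calibration(open_run, 0)     # 27.5
both_close = summarize_calibration(closed_run, 1)  # 11.0

# Thresholds as used in morse_code_from_eyes()
print("both eyes closed if earAvg <=", both_close + 1)  # 12.0
print("one eye closed if earAvg <=", both_open - 3)     # 24.5
```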
morse_code_from_eyes(). This is the main function of the program: it tracks and analyzes the eyes in real time. It first runs the two calibrations (eyes open, then eyes closed), then beeps, waits three seconds and starts recording. Each frame is processed as during calibration, but now the current EAR values are compared with the calibrated ones. If the average EAR is at most both_close + 1, both eyes are closed and s is recorded. Otherwise, if the average EAR is at most both_open - 3, the eye with the lower EAR is treated as the one blinking: a right-eye blink adds p and a left-eye blink adds l. A counter acts as a cooldown: after a symbol is recorded, the next five frames in which a face is detected are skipped. Without it, a single blink, which spans several frames, would be recorded as several symbols. When all symbols have been entered, I press “q” to stop recording and release the camera, and the function returns the recorded string.
```python
import time

import cv2
import imutils
from cvzone.PlotModule import LivePlot  # assumed source of LivePlot
from imutils import face_utils


def morse_code_from_eyes():
    # Per-user reference values from the two calibration runs
    both_open = calibration("both_eyes_open", 0)
    both_close = calibration("both_eyes_close", 1)
    plotLeft = LivePlot(640, 360, [5, 35], invert=True)
    plotRight = LivePlot(640, 360, [5, 35], invert=True)
    (lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
    (rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
    vs, detector, predictor = open_video()
    print('\a')    # beep: recording starts after a short pause
    time.sleep(3)
    counter = 0    # 0 = ready for the next symbol; 1..5 = cooldown frames
    points = ""    # recorded symbols: s, l, p
    while True:
        frame = vs.read()
        frame = imutils.resize(frame, width=450)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        rects = detector(gray, 0)
        for rect in rects:
            shape = predictor(gray, rect)
            shape = face_utils.shape_to_np(shape)
            leftEye = shape[lStart:lEnd]
            rightEye = shape[rStart:rEnd]
            leftEAR = int(100 * eye_aspect_ratio(leftEye))
            rightEAR = int(100 * eye_aspect_ratio(rightEye))
            earAvg = (leftEAR + rightEAR) / 2.0
            draw_outline(frame, leftEye)
            draw_outline(frame, rightEye)
            build_plots("ImagePlotLeft", leftEAR, plotLeft)
            build_plots("ImagePlotRight", rightEAR, plotRight)
            if counter == 0:
                if earAvg <= both_close + 1:
                    # both eyes closed: short signal (dot)
                    points += "s"
                    counter += 1
                    print(earAvg, "ssssssssssssssssssss")
                elif leftEAR - rightEAR >= 0 and earAvg <= both_open - 3:
                    # right eye closed: letter separator
                    points += "p"
                    counter += 1
                    print(leftEAR - rightEAR, earAvg, "pppppppppppppppppppp")
                elif rightEAR - leftEAR >= 0 and earAvg <= both_open - 3:
                    # left eye closed: long signal (dash)
                    points += "l"
                    counter += 1
                    print(rightEAR - leftEAR, earAvg, "llllllllllllllllllll")
            else:
                # cooldown: skip five frames after each recorded symbol
                if counter == 5:
                    counter = 0
                else:
                    counter += 1
        cv2.imshow("Frame", frame)
        if cv2.waitKey(25) == ord("q"):
            break
    cv2.destroyAllWindows()
    vs.stop()
    return points
```
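To check the decision rules and the cooldown without a camera, the hypothetical helper below replays the same per-frame logic on a list of (leftEAR, rightEAR) pairs. It is a sketch for experimentation, not part of the original program; the frames and the calibration values 27.5 and 11.0 are invented.

```python
def symbols_from_frames(frames, both_open, both_close, cooldown=5):
    """Replay the per-frame rules of morse_code_from_eyes() on recorded data.

    frames: sequence of (leftEAR, rightEAR) pairs, already scaled by 100.
    cooldown: frames to skip after each symbol (must be at least 1).
    """
    counter = 0
    points = ""
    for leftEAR, rightEAR in frames:
        earAvg = (leftEAR + rightEAR) / 2.0
        if counter == 0:
            if earAvg <= both_close + 1:
                points += "s"   # both eyes closed: dot
                counter = 1
            elif leftEAR >= rightEAR and earAvg <= both_open - 3:
                points += "p"   # right eye closed: letter break
                counter = 1
            elif rightEAR >= leftEAR and earAvg <= both_open - 3:
                points += "l"   # left eye closed: dash
                counter = 1
        elif counter == cooldown:
            counter = 0         # cooldown finished, ready for the next symbol
        else:
            counter += 1        # still cooling down
    return points


# Made-up data: eyes open, a two-eye blink, a right-eye blink, a left-eye blink
frames = ([(30, 30)] + [(10, 10)] * 3 + [(30, 30)] * 3
          + [(30, 10)] * 2 + [(30, 30)] * 4 + [(10, 30)] * 2)
print(symbols_from_frames(frames, both_open=27.5, both_close=11.0))  # spl
print(symbols_from_frames([(10, 10)] * 8, 27.5, 11.0))               # ss
```

The second call shows a limitation of a fixed cooldown: a two-eye blink held for seven or more consecutive frames is recorded twice, so the right cooldown length depends on the frame rate and on how long the user keeps the eyes shut.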
text_from_morse_code(points) converts the recorded string of symbols (s, l, p) into readable text. First, it defines a dictionary whose keys are Morse codes written with s and l and whose values are the corresponding letters and digits. It splits the string on “pp” to get the individual words, then splits each word on “p” to get its letters and looks each one up in the dictionary; unrecognized sequences become “-”. The letters are combined into words, and the function returns the decoded text, with each word followed by a space.
```python
def text_from_morse_code(points):
    # Morse codes written with s (dot) and l (dash)
    alphabet = {
        "sl": "A", "lsss": "B", "lsls": "C", "lss": "D", "s": "E",
        "ssls": "F", "lls": "G", "ssss": "H", "ss": "I", "slll": "J",
        "lsl": "K", "slss": "L", "ll": "M", "ls": "N", "lll": "O",
        "slls": "P", "llsl": "Q", "sls": "R", "sss": "S", "l": "T",
        "ssl": "U", "sssl": "V", "sll": "W", "lssl": "X", "lsll": "Y",
        "llss": "Z",
        "sllll": "1", "sslll": "2", "sssll": "3", "ssssl": "4", "sssss": "5",
        "lssss": "6", "llsss": "7", "lllss": "8", "lllls": "9", "lllll": "0",
    }
    words = points.split("pp")      # two right-eye blinks separate words
    answer = ""
    for word in words:
        letters = word.split("p")   # one right-eye blink separates letters
        new_word = ""
        for letter in letters:
            # unknown sequences are shown as "-"
            new_word += alphabet.get(letter, "-")
        answer += new_word + " "
    return answer
```
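One fragile spot is the separators. An accidental extra right-eye blink turns pp into ppp, and a stray p at the start or end of a word produces an empty letter that text_from_morse_code() renders as “-”: for the input "psssspsspppllps" it returns "-HI -ME ". The sketch below is a hypothetical, more tolerant variant, not the author’s code. It writes the table in standard dot/dash notation and converts it to s/l with str.translate, treats any run of two or more p as a word break, and ignores empty pieces.

```python
import re

# Standard International Morse code for letters and digits
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..", "0": "-----", "1": ".----", "2": "..---",
    "3": "...--", "4": "....-", "5": ".....", "6": "-....", "7": "--...",
    "8": "---..", "9": "----.",
}
# Same table keyed by the s/l notation: dot -> s, dash -> l
TO_SL = str.maketrans(".-", "sl")
ALPHABET = {code.translate(TO_SL): char for char, code in MORSE.items()}


def decode_blinks(points):
    """Decode an s/l/p string, tolerating stray right-eye blinks."""
    words = []
    for word in re.split(r"p{2,}", points):  # two or more p: word break
        letters = [ALPHABET.get(code, "-") for code in word.split("p") if code]
        if letters:
            words.append("".join(letters))
    return " ".join(words)


print(decode_blinks("sssplllpsss"))      # SOS
print(decode_blinks("psssspsspppllps"))  # HI ME
```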
Conclusion
To use the program, first run the morse_code_from_eyes() function and store the string it returns in a variable. Then pass that string to the text_from_morse_code() function to get the decoded text.
